- General Strategies to track and improve Performance
- v8 Performance Profiling
Analyse performance only once you have a problem, and do so in a top-down manner:
- ensure it's JavaScript and not the DOM
- reduce testcase to pure JavaScript and run in `v8` shell
- collect metrics and locate bottlenecks
- sample profiling to narrow down the general problem area
- at this point think about the algorithm, data structures, techniques, etc. used in this area and evaluate whether improvements are possible here, since that will most likely yield greater impact than any of the more fine-grained improvements
- structural profiling to isolate the exact area i.e. function in which most time is spent
- evaluate what can be improved here again thinking about algorithm first
- only once algorithm and data structures seem optimal evaluate how the code structure affects assembly code generated by v8 and possible optimizations (small functions, `try/catch`, closures, loops vs. `forEach`, etc.)
- optimize slowest section of code and repeat structural profiling
- at a fixed frequency the program is instantaneously paused by setting stack size to 0 and the call stack is sampled
- assumes that the sample is representative of workload
- gives no sense of flow due to gaps between samples
- functions that were inlined by compiler aren't shown
- collect data for longer period of time, sampling every 1ms
- ensure code is exercising the right code paths
- functions are instrumented to record entry and exit times
- three data points per function
- Inclusive Time: time spent in function including its children
- Exclusive Time: time spent in function excluding its children
- Call Count: number of times the function was called
- data points are taken at much higher frequency than sampling
- higher cost than sampling due to instrumentation
- goal of optimization is to minimize inclusive time
- inlined functions retain markers
- think about data being processed
- is one piece of data slower?
- name time ranges based on data
- use variables/properties to dynamically name ranges
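A minimal sketch of naming time ranges after the data being processed, so that a slow piece of data stands out in the timings. The `processItem` workload and the `id`/`kind` fields are hypothetical; only `console.time` and `console.timeEnd` are assumed from the runtime.

```javascript
// Hypothetical workload: the cost depends on the kind of data.
function processItem(item) {
  let sum = 0;
  const rounds = item.kind === 'large' ? 1e6 : 1e3;
  for (let i = 0; i < rounds; i++) sum += i % 7;
  return sum;
}

// Name each range after properties of the data instead of a fixed string,
// so the output shows which piece of data is slow.
function processAll(items) {
  const results = [];
  for (const item of items) {
    const label = 'process:' + item.kind + ':' + item.id;
    console.time(label);
    results.push(processItem(item));
    console.timeEnd(label);
  }
  return results;
}

const results = processAll([
  { id: 1, kind: 'small' },
  { id: 2, kind: 'large' }
]);
```

Labels need to be unique among ranges that are open at the same time, which is why the item id is part of the name here.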
+-------------------------------------------------------------------------+
|                              | Sampling    | Structural / Instrumenting |
|------------------------------+-------------+----------------------------|
| Time                         | Approximate | Exact                      |
| Invocation count             | Approximate | Exact                      |
| Overhead                     | Small       | High(er)                   |
| Accuracy                     | Good - Poor | Good - Poor                |
| Extra code / instrumentation | No          | Yes                        |
+-------------------------------------------------------------------------+
- need both
- manual instrumentation can reduce overhead
- instrumentation affects performance and may affect behavior
- samples are very accurate, but inaccurate for extracting time
- sampling requires no program modification
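The three data points of structural profiling, applied manually to selected functions only, could be sketched roughly as follows. The `instrument` helper, the `stats` map and the sample functions are hypothetical names for illustration; `performance.now()` is assumed to be available (browsers and recent Node versions).

```javascript
const stats = new Map(); // name -> { calls, inclusive, exclusive }
const stack = [];        // frames of instrumented calls currently running

// Wrap fn so that every call records entry and exit times.
function instrument(name, fn) {
  return function instrumented(...args) {
    const frame = { childTime: 0 };
    stack.push(frame);
    const start = performance.now();
    try {
      return fn.apply(this, args);
    } finally {
      const elapsed = performance.now() - start;
      stack.pop();
      // Report our inclusive time to the caller so it can derive its exclusive time.
      if (stack.length > 0) stack[stack.length - 1].childTime += elapsed;
      const s = stats.get(name) || { calls: 0, inclusive: 0, exclusive: 0 };
      s.calls += 1;
      s.inclusive += elapsed;
      s.exclusive += elapsed - frame.childTime;
      stats.set(name, s);
    }
  };
}

const leaf = instrument('leaf', function () {
  let x = 0;
  for (let i = 0; i < 1e5; i++) x += i;
  return x;
});
const parent = instrument('parent', function () {
  return leaf() + leaf();
});

parent();
```

Only the wrapped functions pay the instrumentation cost. Note that in this sketch the inclusive time of a recursive function is counted once per nesting level.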
- each module of app should have time budget
- sum of modules should be `< 16ms` for smooth client side apps
- track performance daily or per commit in order to catch budget busters right away
- queue up key handlers and execute inside Animation Frame
- optimize for lowest common denominator that your app will run on
- for mobile stay below `8-10ms` since remaining time is needed for chrome to do its work, i.e. render
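Queueing key handlers and running them inside an animation frame might look like the sketch below. `createInputQueue` is a hypothetical helper; the scheduler is passed in so the example also runs outside a browser, where `requestAnimationFrame` does not exist.

```javascript
// Collect input events and handle them once per frame instead of once per event.
function createInputQueue(handle, schedule) {
  const pending = [];
  let scheduled = false;

  function flush() {
    scheduled = false;
    // Take everything queued so far; events arriving during handling wait for the next frame.
    const events = pending.splice(0, pending.length);
    for (const ev of events) handle(ev);
  }

  return function enqueue(ev) {
    pending.push(ev);
    if (!scheduled) {
      scheduled = true;
      schedule(flush);
    }
  };
}

// Browser usage (assumes a page with a global `document`):
//   const enqueue = createInputQueue(onKey, cb => requestAnimationFrame(cb));
//   document.addEventListener('keydown', enqueue);

// Stand-in scheduler that stores frame callbacks so they can be run by hand.
const handled = [];
const frames = [];
const enqueue = createInputQueue(ev => handled.push(ev.key), cb => frames.push(cb));

enqueue({ key: 'a' });
enqueue({ key: 'b' });
const handledBeforeFrame = handled.length; // still 0: work is deferred to the frame
frames.shift()();                          // simulate the animation frame firing
```

Both key events are handled in a single frame callback, which keeps handler work inside the per-frame budget rather than spread across arbitrary points in time.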
- Profile Tab -> Start -> Record Sample
- tree view gives idea of flow (call stack) and allows drilling into tree nodes
- save profiles to load them later, e.g. for bug reports
- use octane benchmark to experiment with the profiler
- access at chrome://tracing
- hidden feature like `chrome://memory` originally designed by chrome developers for chrome developers
- view into guts of what chrome is doing
- timeline of what code is doing framed in larger chrome context
- allows optimizing low level gpu performance
- instrument code
- a) manually add calls to `console.time` and `console.timeEnd` with a unique `name` as argument to mark entry and exit points of an area in the code
- b) Firefox does automatic instrumentation via Firebug (Chrome's Profiler is sample based, while Firebug's is structural)
- c) use compiler/automatic tool to add calls
- d) use runtime instrumentation, similar to valgrind in C
- instrumentation achieved via trace macros
- can be nested (hierarchy reflected in profiling display)
- when turned off cost at most a few dozen clocks
- when turned on cost a few thousand clocks (0.01ms)
- arguments passed to macro are only computed when macro is enabled
- `time`/`timeEnd` spam the dev tools console (keep it closed)
- in order to easily remove the macro in production, wrap `time`/`timeEnd` calls
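One way to wrap the `time`/`timeEnd` calls so they can be switched off for production is sketched here. `createTracer` and the `enabled` switch are hypothetical; in a real project the switch might come from a build setting.

```javascript
// Returns an object with the same two methods either way, so call sites never change.
function createTracer(enabled) {
  if (!enabled) {
    // Disabled: both calls do nothing.
    return { time() {}, timeEnd() {} };
  }
  return {
    time: label => console.time(label),
    timeEnd: label => console.timeEnd(label)
  };
}

const tracer = createTracer(false); // pass true while profiling

function sortNumbers(numbers) {
  tracer.time('sortNumbers');
  const sorted = numbers.slice().sort((a, b) => a - b);
  tracer.timeEnd('sortNumbers');
  return sorted;
}

const sorted = sortNumbers([3, 1, 2]);
```

Unlike the trace macros described above, arguments passed to the wrapper are still evaluated when it is disabled, so labels should be cheap to compute.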
- close all other tabs in order to have the least noise caused by other tabs and thus get cleaner samples
- |Record| to start recording a trace
- switch to app and interact with it, limit this to 10s as buffer gets large very quickly
- switch back and |Stop Tracing|
- |Save| / |Load| trace
- data includes lots of noise since each tab/process will include activity from the following pieces:
- IO thread
- renderer thread
- compositor thread
- find pid of your page via
chrome://memory
- in order to get nice timeline
- remove unnecessary threads and components by selecting only rows with your pid
- filter by categories, v8 and webkit are most relevant for JS profiling
- navigation is based on Quake keys (W, A, S, D) and is not mouse friendly, although it seems to be improving
                +---+
                | W | zoom in
+---+           +---+            +---+             +---+
| A | pan left  | S | zoom out   | D | pan right   | ? | help (other shortcuts)
+---+           +---+            +---+             +---+
- trace-viewer supports streaming trace data over web sockets
- trace event format: a JSON format that allows interfacing with other tools
- web tracing framework: an alternative to the built-in tracer
- about:tracing
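To illustrate how another tool could produce data in the trace event format, here is a small sketch that emits "complete" events (`ph: 'X'`) with timestamps and durations in microseconds. The `createTrace` helper, the event names and the `pid`/`tid` values are placeholders, and the field names reflect the published format as best understood here, so check them against the format description before relying on them.

```javascript
// Build a trace in the JSON object form: { "traceEvents": [ ... ] }.
function createTrace() {
  const events = [];
  return {
    // Record a finished range; start and duration are given in milliseconds.
    complete(name, category, startMs, durationMs, args) {
      events.push({
        name: name,
        cat: category,
        ph: 'X',                          // complete event: carries ts and dur
        ts: Math.round(startMs * 1000),   // microseconds
        dur: Math.round(durationMs * 1000),
        pid: 1,                           // placeholder process id
        tid: 1,                           // placeholder thread id
        args: args || {}
      });
    },
    // JSON.stringify picks this up automatically.
    toJSON() {
      return { traceEvents: events };
    }
  };
}

const trace = createTrace();
trace.complete('parse', 'app', 0, 2.5, { bytes: 1024 });
trace.complete('render', 'app', 2.5, 4);
const json = JSON.stringify(trace);
```

The resulting string can be written to a file and opened via the |Load| button described above.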
- ship with v8 source code
- plot-timer-events: generates `png` showing v8 timeline
- (mac|linux|windows)-tick-processor: generates table of functions sorted by time spent in them
Chrome --no-sandbox --js-flags="--prof --noprof-lazy --log-timer-events"
[ .. ]
tools/plot-timer-events /chrome/dir/v8.log
- `v8.GCScavenger` young generation collection
- `v8.Execute` executing JavaScript
- scavenges interrupt script execution
- shows code kind
- bright green - optimized
- blue/purple - unoptimized
- shows pauses
- lots in beginning since scripts are being parsed
- no pauses when running optimized code
- scavenges (top band) correlate with pause time spikes
Chrome --no-sandbox --js-flags="--prof --noprof-lazy --log-timer-events"
[ .. ]
tools/mac-tick-processor /chrome/dir/v8.log
- generates table of functions sorted by time spent in them
- includes C++ functions
- `*` indicates optimized functions
- functions without `*` could not be optimized
/v8/out/native/d8 test.js --prof
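A `test.js` for the command above could be a reduced, DOM-free test case along these lines. The `fib` workload and the iteration count are placeholders for whatever code path is being investigated.

```javascript
// test.js: pure JavaScript, no DOM, so it runs in the v8 shell as well as in Node.
function fib(n) {
  return n < 2 ? n : fib(n - 1) + fib(n - 2);
}

// Repeat the workload so the sampling profiler collects enough ticks;
// raise the iteration count if the profile contains too few samples.
function run(iterations) {
  let checksum = 0;
  for (let i = 0; i < iterations; i++) {
    checksum += fib(20);
  }
  return checksum;
}

// Keep the result so the work cannot be discarded as unused.
const checksum = run(200);
```

Keeping the script free of output calls avoids differences between the shell's and the browser's printing functions.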
"/Applications/Google Chrome.app/Contents/MacOS/Google Chrome" \
--no-sandbox --js-flags="--trace-deopt --trace-opt-verbose --trace-bailout"
[ . lots of other output. ]
[disabled optimization for xxx, reason: The Reason why function couldn't be optimized]
- lots of output which is best piped into file and evaluated
- especially watch out for deoptimized functions with lots of arithmetic operations
d8 --trace-opt
Log optimizing compiler bailouts:
d8 --trace-bailout
Log deoptimizations:
d8 --trace-deopt
- don't use construct that caused function to be deoptimized
- or move all code inside construct into separate function and call it instead
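As a sketch of the second option, assume the trace output named `try/catch` as the reason a hot function was not optimized. Whether that construct is actually a problem depends on the v8 version, so confirm with the flags above first. The function names are hypothetical.

```javascript
// Hot numeric work lives in its own function, free of the offending construct,
// so the optimizing compiler can treat it independently.
function sumSquares(values) {
  let total = 0;
  for (let i = 0; i < values.length; i++) {
    total += values[i] * values[i];
  }
  return total;
}

// The construct stays in a thin wrapper that only calls the hot function.
function safeSumSquares(values) {
  try {
    return sumSquares(values);
  } catch (err) {
    // e.g. values was null or undefined
    return NaN;
  }
}

const total = safeSumSquares([1, 2, 3]);
const failed = safeSumSquares(null);
```

After such a change, rerun with the trace flags to check that the hot function no longer appears among the bailouts or deoptimizations.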