Skip to content

Latest commit

 

History

History
353 lines (253 loc) · 13.3 KB

performance-profiling.md

File metadata and controls

353 lines (253 loc) · 13.3 KB

Table of Contents generated with DocToc

General Strategies to track and improve Performance

Identify and Understand Performance Problem

watch | slide watch profiling workflow

Analyse performance only once you have a problem in a top down manner like so:

  • ensure it's JavaScript and not the DOM
  • reduce testcase to pure JavaScript and run in v8 shell
  • collect metrics and locate bottlenecks
  • sample profiling to narrow down the general problem area
    • at this point think about the algorithm, data structures, techniques, etc. used in this area and evaluate if improvements in this area are possible since that will most likely yield greater impact than any of the more fine grained improvments
  • structural profiling to isolate the exact area i.e. function in which most time is spent
    • evaluate what can be improved here again thinking about algorithm first
    • only once algorithm and data structures seem optimal evaluate how the code structure affects assembly code generated by v8 and possible optimizations (small functions, try/catch, closures, loops vs. forEach, etc.)
  • optimize slowest section of code and repeat structural profiling

Sampling CPU Profilers

watch watch walkthrough

  • at fixed frequency program is instantaneously paused by setting stacksize to 0 and the call stack sampled
  • assumes that the sample is representative of workload
  • gives no sense fo flow to due gaps between samples
  • functions that were inlined by compiler aren't shown
  • collect data for longer period of time, sampling every 1ms
  • ensure code is exercising the right code paths

Structural CPU Profilers

watch watch walkthrough

  • functions are instrumented to record entry and exit times
  • three data points per function
    • Inclusive Time: time spent in function including its children
    • Exclusive Time: time spent in function excluding its children
    • Call Count: number of times the function was called
  • data points are taken at much higher frequency than sampling
  • higher cost than sampling dut to instrumentation
  • goal of optimization is to minimize inclusive time
  • inlined functions retain markers

Instrumentation Techniques

watch

  • think about data being processed
    • is one piece of data slower?
  • name time ranges based on data
    • use variables/properties to dynamically name ranges

Instrumenting vs. Sampling

watch

+--------------------------------------------------------------------------------------------+
|                                   |      Sampling          |    Structural / Instrumenting |
|-----------------------------------+------------------------+-------------------------------|
| Time                              |       Approximate      |            Exact              |
| Invocation count                  |       Approximate      |            Exact              |
| Overhead                          |       Small            |            High(er)           |
| Accuracy                          |       Good - Poor      |            Good - Poor        |
| Extra code / instrumentation      |       No               |            Yes                |
+--------------------------------------------------------------------------------------------+
  • need both
  • manual instrumentation can reduce overhead
  • instrumentation affects performance and may affect behavior
  • samples are very accurate, but inaccurate for extacting time
  • samping requires no program modification

Plan for Performance

watch

  • each module of app sould have time budget
  • sum of modules should be < 16ms for smooth client side apps
  • track performance daily or per commit in order to catch budget busters right away

Animation Frame

watch watch walkthrough

  • queue up key handlers and execute inside Animation Frame
  • optimize for lowest common denominator that your app will run on
  • for mobile stay below 8-10ms since remaining time is needed for chrome to do its work, i.e. render

v8 Performance Profiling

Chrome Devtools Profiler

watch

  • Profile Tab -> Start -> Record Sample
  • tree view gives idea of flow (call stack) and allows drilling into tree nodes
  • save profiles to load them later i.e. for bug reports
  • use octane benchmark to experiment with the profiler

Chrome Tracing aka chrome://tracing

watch

  • access at chrome://tracing
  • hidden feature like chrome://memory originally designed by chrome developers for chrome developers
  • view into guts of what chrome is doing
  • timeline of what code is doing framed in larger chrome context
  • allows optimizing low level gpu performance

Preparation

watch

  • instrument code
    • a) manually add calls to console.time and console.timeEnd with a unique name as argument to mark entry and exit points of an area in the code
    • b) Firefox does automatic instrumentation via Firebug (Chrome's Profiler is sample based, while Firebug's is structural)
    • c) use compiler/automatic tool to add calls
    • d) use runtime instrumentation, similar to valgrind in C
  • instrumentation archieved via trace macros
    • can be nested (hierarchy reflected in profiling display)
    • when turned off cost at most a few dozen clocks
    • when turned on cost a few thousand clocks (0.01ms)
    • arguments passed to macro are only computed when macro is enabled
  • time/timeEnd spam dev tools console (keep it closed)
  • in order to easily remove macro in production wrap time/timeEnd calls

Running

watch

  • close all other tabs in order to have the least noise caused by other tabs and thus get cleaner samples
  • |Record| to start recording a trace
  • switch to app and interact with it, limit this to 10s as buffer gets large very quickly
  • switch back |Stop Tracing|
  • |Save| / |Load| trace

Evaluation

watch

  • data includes lots of noise since each tab/process will include activity from the following pieces:
    • IO thread
    • renderer thread
    • compositor thread
  • find pid of your page via chrome://memory

Filter for Signal

watch

  • in order to get nice timeline
  • remove unnec. threads and components by selecting only rows with your pid
  • filter by categories, v8 and webkit are most relevant for JS profiling

Inspect

watch

  • navigation based on quake keys and is not mouse friendly, although it seems to be improving
                  +---+
                  | W | zoom in
+---+             +---+           +---+               +---+
| A | pan left    | S | zoom out  | D | pan right     | ? |  help (other shortcuts)
+---+             +---+           +---+               +---+

Resources

v8 tools

  • ship with v8 source code
  • plot-time-events: generates png showing v8 timeline
  • (mac|linux|windows)-tick-processor: generates table of functions sorted by time spent in them

Using Chrome

v8 timeline

Capturing

watch

Chrome --no-sandbox --js-flags="--prof --noprof-lazy --log-timer-events"

[ .. ]

tools/plot-timer-events /chrome/dir/v8.log

Analyzing

watch

Top Band
  • v8.GCScavenger young generation collection
  • v8.Execute executing JavaScript
  • scavenges interrupt script execution

Middle Band

  • shows code kind
  • bright green - optimized
  • blue/purple - unoptimized

Bottom Graph

  • shows pauses
  • lots in beginning since scripts are being parsed
  • no pauses when running optimized code
  • scavenges (top band) correllate with pause time spikes

Finding Slow Running Unoptimized Functions

watch

Chrome --no-sandbox --js-flags="--prof --noprof-lazy --log-timer-events"

[ .. ]

tools/mac-timer-events /chrome/dir/v8.log

watch | slide

  • generates table of functions sorted by time spent in them
  • includes C++ functions
  • * indicates optimized functions
  • functions without * could not be optimized

d8

watch | slide

/v8/out/native/d8 test.js --prof

Determining why a Function was not Optimized

watch watch | slide

"/Applications/Google Chrome.app/Contents/MacOS/Google Chrome" \
  --no-sandbox --js-flags="--trace-deopt --trace-opt-verbose --trace-bailout"

[ . lots of other output. ]

[disabled optimization for xxx, reason: The Reason why function couldn't be optimized]
  • lots of output which is best piped into file and evaluated
  • especially watch out for deoptimized functions with lots of arithmetic operations

d8

watch | slide

d8 --trace-opt

Log optimizing compiler bailouts:

watch | slide

d8 --trace-bailout

Log deoptimizations:

watch | slide

d8 --trace-deopt

Improvments

  • don't use construct that caused function to be deoptimized
  • or move all code inside construct into separate function and call it instead

Resources