Skip to content

Latest commit

 

History

History
457 lines (341 loc) · 16.6 KB

data-types.md

File metadata and controls

457 lines (341 loc) · 16.6 KB

Table of Contents generated with DocToc

Data Types

Efficiently Representing Values and Tagging

watch | slide

read Numbered properties: fast elements

  • most objects in heap are 4-byte aligned
  • according to spec all numbers in JS are 64-bit floating doubles
  • v8 passes around 32-bit numbers to represent all values for improved efficiency
  • bottom bit reserved as tag to signify if value is a SMI (small integer) or a pointer to an object

watch | slide

| object pointer              | 1 |

or

| 31-bit-signed integer (SMI) | 0 |
  • numbers bigger than 31 bits are boxed
  • stored inside an object referenced via a pointer
  • adds extra overhead (at a minimum an extra lookup)

Considerations

  • prefer SMIs for numeric values whenever possible

Objects

Structure

+-------------------+
|  Object           |    +----> +------------------+     +---->  +------------------+
|-------------------|    |      |  FixedArray      |     |       |  FixedArray      |
|  Map              |    |      |------------------|     |       |------------------|
|-------------------|    |      |  Map             |     |       |  Map             |
|  Extra Properties |----+      |------------------|     |       |------------------|
|-------------------|           |  Length          |     |       |  Length          |
|  Elements         |------+    |------------------|     |       |------------------|
|-------------------|      |    |  Property "poo"  |     |       |  Property "0"    |
|  Property "foo"   |      |    |------------------|     |       |------------------|
|-------------------|      |    |  Property "baz"  |     |       |  Property "1"    |
|  Property "bar"   |      |    +__________________+     |       +__________________+
+___________________+      |                             |
                           |                             |
                           |                             |
                           +-----------------------------+
  • above shows most common optimized representation
  • most objects contain all their properties in single block of memory "foo", "bar"
  • all blocks have a Map property describing their structure
  • named properties that don't fit are stored in overflow array "poo", "baz"
  • numbered properties are stored in a separate contiguous array "1", "2"

Object Properties

read Some surprising properties of properties

  • object is a collection of properties aka key-value pairs
  • property names are always strings
  • any name used as property name that is not a string is stringified via .toString(), even numbers, so 1 becomes "1"
  • Arrays in JavaScript are just objects with magic length property

Hash Tables

  • hash table used for difficult objects
  • aka objects in dictionary mode
  • accessing hash table property is much slower than accessing a field at a known offset
  • if non-symbol string is used to access a property it is uniquified first
  • v8 hash tables are large arrays containing keys and values
  • initially all keys and values are undefined

Key Value Insertion

  • on key-vaule pair insertion the key's hash code is computed
  • low bits of hash code are used as initial insertion index
  • if that slot is taken the hash table attempts to insert it at next index (modulo length) and so on

Key Value Retrieval

  • computing hash code and comparing keys for equality is commonly a fast operation
  • still slow to execute these non-trivial routines on every property read/write
  • v8 avoids hash table representation whenever possible

Fast, In-Object Properties

read Fast, in-object properties | read

  • v8 describes the structure of objects using maps used to create hidden classes and match data types
    • resembles a table of descriptors with one entry for each property
    • map contains info about size of the object
    • map contains info about pointers to constructors and prototypes
    • objects with same structure share same map
  • objects created by the same constructor and have the same set of properties assigned in the same order
    • have regular logical structure and therefore regular structure in memory
    • share same map
  • adding new property is handled via transition descriptor
    • use existing map
    • transition descriptor points at other map

Assigning Properties inside Constructor Call

function Point () {
  // Map M0
  //  "x": Transition to M1 at offset 12

  this.x = x;

  // Map M1
  //  "x": Field at offset 12
  //  "y": Transition to M2 at offset 16

  this.y = y;

  // Map M2
  //  "x": Field at offset 12
  //  "y": Field at offset 16
}
  • Point starts out without any fields with M0
  • this.x =x -> map pointer set to M1 and value x is stored at offset 12 and "x" Transition descriptor added to M0
  • this.y =y -> map pointer set to M2 and value y is stored at offset 16 and "y" Transition descriptor added to M1

Assigning More Properties Later

var p = new Point();

// Map M2
//  "x": Field at offset 12
//  "y": Field at offset 16
//  "z": Transition at offset 20

p.z = z;

// Map M3
//  "x": Field at offset 12
//  "y": Field at offset 16
//  "z": Field at offset 20
  • assigning z later
    • create M3, a copy of M2
    • add Transition descriptor to M2
    • add Field descriptor to M3

Assigning Same Properties in Different Order

function Point(x, y, reverse) {
  // Map M0
  //  "x": Transition to M1 at offset 12ak
  //  "y": Transition to M2 at offset 12
  if (reverse) {
    // variation 1

    // Map M1
    //  "x": Field at offset 12
    //  "y": Transition to M4 at offset 16

    this.x = x;

    // Map M4
    //  "x": Field at offset 12
    //  "y": Field at offset 16

    this.y = y;
  } else {
    // variation 2

    // Map M2
    //  "y": Field at offset 12
    //  "x": Transition to M5 at offset 16

    this.y = x;

    // Map M5
    //  "y": Field at offset 12
    //  "x": Field at offset 16
    this.x = y;
  }
}
  • both variations share M0 which has two transitions
  • not all Points share same map
  • in worse cases v8 drops object into dictionary mode in order to prevent huge number of maps to be allocated
    • when assigning random properties to objects from same constructor in random order
    • when deleting properties

In-object Slack Tracking

read In-object slack tracking

  • objects allocated by a constructor are given enough memory for 32 fast properties to be stored
  • after certain number of objects (8) were allocated from same constructor
    • v8 traverses transition tree from initial map to determine size of largest of these initial objects
    • new objects of same type are allocated with exact amount of memory to store max number of properties
    • initial objects are resized (down)

Methods And Prototypes

read Methods and prototypes

Assigning Functions to Properties

function Point () {
  // Map M0
  //  "x": Transition to M1 at offset 12

  this.x = x;

  // Map M1
  //  "x": Field at offset 12
  //  "y": Transition to M2 at offset 16

  this.y = y;

  // Map M2
  //  "x": Field at offset 12
  //  "y": Field at offset 16
  // "distance": Transition to M3 <pointDistance>

  this.distance = pointDistance;

  // Map M3
  //  "x": Field at offset 12
  //  "y": Field at offset 16
  // "distance": Constant_Function <pointDistance>
}

function pointDistance(p) { /* calculates distance */ }
  • properties pointing to Functions are handled via constant functions descriptor
  • constant_function descriptor indicates that value of property is stored with descriptor itself rather than in the object
  • pointers to functions are directly embedded into optimized code
  • if distance is reassigned, a new map has to be created since the Transition breaks

Assigning Functions to Prototypes

function Point(x, y) {
  this.x = x;
  this.y = y;
}

Point.prototype.pointDistance = function () { /* calculates distance */ }
  • v8 represents prototype methods using constant_function descriptors
  • calling prototype methods maybe a tiny bit slower due to overhead of the following:
    • check receiver's map (as with own properties)
    • check maps of prototype chain (extra step)
  • the above lookup overhead won't make measurable performance difference and shouldn't impact how you write code

Numbered Properties

read Numbered properties: fast elements

see Arrays

  • numbered properties are treated and ordered differently than others since any object can behave like an array
  • element === any property whose key is non-negative integer
  • v8 stores elements separate from named properties in an elements kind field (see structure diagram)
  • if object drops into dictionary mode for elements, access to named properties remains fast and vice versa
  • maps don't need transitions to maps that are identical except for element kinds
  • most elements are fast elements which are stored in a contiguous array

Arrays

watch | slide

read Numbered properties: fast elements

[read]((https://v8.dev/blog/fast-properties)

  • v8 has two methods for storing arrays, fast elements and dictionary elements

Fast Elements

see Numbered Properties

  • compact keysets
  • linear storage buffer

Characteristics

  • contiguous (non-sparse)
  • 0 based
  • smaller than 100K elements see this test

Packed vs. Holey Elements

  • v8 makes distinction whether the elements backing store is packed or has holes
  • holes in a backing store are created by deleting an indexed element
  • missing properties are marked with special hole value to keep Array functions performant
  • however missing properties cause expensive lookups on prototype chain
  • in the past reading beyond the length of an array made it holey, however this has been fixed and is no longer a problem (TODO: resource)

Elements Kinds

read

  • fast elements kinds in order of increasing generality:
    • fast SMIs (small integers)
    • fast doubles (Doubles stored in unboxed representation)
    • fast values (strings or other objects)
Elements Kind Lattice
+--------------------+ 
| PACKED_SMI_ELEMENT |---+ 
+--------------------+   |    +------------------------+
          |              +--->| PACKED_DOUBLE_ELEMENTS |---+ 
          ↓                   +------------------------+   |    +-------------------+
+--------------------+                   |                 +--->|  PACKED_ELEMENTS  |
| HOLEY_SMI_ELEMENTS |---+               ↓                      +-------------------+
+--------------------+   |    +------------------------+                   |
                         +--->|  HOLEY_DOUBLE_ELEMENTS |---+               ↓
                              +------------------------+   |    +-------------------+
                                                           +--->|  HOLEY_ELEMENTS   |
                                                                +-------------------+
  • can only transition downwards through the lattice
  • more specific elements kinds enable more fine-grained optimizations

Dictionary Elements

see Hash Tables

  • hash table storage
  • slow access

Characteristics

  • sparse
  • large

Double Array Unboxing

watch | slide

  • Array's hidden class tracks element types
  • if all doubles, array is unboxed aka upgraded to fast doubles
    • wrapped objects layed out in linear buffer of doubles
    • each element slot is 64-bit to hold a double
    • SMIs that are currently in Array are converted to doubles
    • very efficient access
    • storing requires no allocation as is the case for boxed doubles
    • causes hidden class change
    • requires expensive copy-and-convert operation
  • careless array manipulation may cause overhead due to boxing/unboxing watch | slide

Typed Arrays

blog | spec

  • difference is in semantics of indexed properties
  • v8 uses unboxed backing stores for such typed arrays

Float64Array

  • gets 64-bit allocated for each element

Considerations

  • once array is marked as holey it is holey forever
  • don't pre-allocate large arrays (>=100K elements), instead grow as needed, to avoid them being considered sparse
  • do pre-allocate small arrays to correct size to avoid allocations due to resizing
  • avoid createing holes, and thus don't delete elements
  • don't load uninitialized or deleted elements watch | slide
  • use literal initializer for Arrays with mixed values
  • don't store non-numeric values in numeric arrays
    • causes boxing and efficient code that was generated for manipulating values can no longer be used
  • use typed arrays whenever possible especially when performing mathematical operations on an array of numbers
  • copying an array, you should avoid copying from the back (higher indices to lower indices) because this will almost certainly trigger dictionary mode
  • avoid elements kind transitions, i.e. edge case of adding -0, NaN, Infinity to a SMI array as they are represented as doubles

Strings

  • string representation and how it maps to each bit
map |len |hash|characters
0123|4567|8901|23.........
  • contain only data (no pointers)
  • content not tagged
  • immutable except for hashcode field which is lazily computed (at most once)

Resources