Data Types

find the previous version of this document at crankshaft/data-types.md

Table of Contents generated with DocToc

Efficiently Representing Values and Tagging
- Considerations
Objects
Arrays
Strings
Resources

Efficiently Representing Values and Tagging

read Numbered properties: fast elements

most objects in heap are 4-byte aligned
according to spec all numbers in JS are 64-bit floating doubles
V8 passes around 32-bit numbers to represent all values for improved efficiency
bottom bit reserved as tag to signify if value is a Smi (small integer) or a pointer to an object

| object pointer              | 1 |

or

| 31-bit-signed integer (Smi) | 0 |

numbers bigger than 31 bits are boxed
stored inside an object referenced via a pointer
adds extra overhead (at a minimum an extra lookup)
on 64 bit architectures Smis are 32-bit-signed instead of the 31-bit-signed on 32 bit architectures

Considerations

prefer Smis for numeric values whenever possible

Objects

Structure

+-------------------+
|  Object           |    +----> +------------------+     +---->  +------------------+
|-------------------|    |      |  FixedArray      |     |       |  FixedArray      |
|  Map              |    |      |------------------|     |       |------------------|
|-------------------|    |      |  Map             |     |       |  Map             |
|  Extra Properties |----+      |------------------|     |       |------------------|
|-------------------|           |  Length          |     |       |  Length          |
|  Elements         |------+    |------------------|     |       |------------------|
|-------------------|      |    |  Property "poo"  |     |       |  Property "0"    |
|  Property "foo"   |      |    |------------------|     |       |------------------|
|-------------------|      |    |  Property "baz"  |     |       |  Property "1"    |
|  Property "bar"   |      |    +__________________+     |       +__________________+
+___________________+      |                             |
                           |                             |
                           |                             |
                           +-----------------------------+

above shows most common optimized representation
most objects contain all their properties in single block of memory "foo", "bar"
all blocks have a Map property describing their structure
named properties that don't fit are stored in overflow array "poo", "baz"
numbered properties are stored in a separate contiguous array "1", "2"

Object Properties

read Some surprising properties of properties

object is a collection of properties aka key-value pairs
property names are always strings
any name used as property name that is not a string is stringified via .toString(), even numbers, so 1 becomes "1"
Arrays in JavaScript are just objects with magic length property

Hash Tables

hash table used for difficult objects
aka objects in dictionary mode
accessing hash table property is much slower than accessing a field at a known offset
if non-symbol string is used to access a property it is uniquified first
V8 hash tables are large arrays containing keys and values
initially all keys and values are undefined

HashTables and Hash Codes

on key-vaule pair insertion the key's hash code is computed
computing hash code and comparing keys for equality is commonly a fast operation
still slow to execute these non-trivial routines on every property read/write
data structures such as Map, Set, WeakSet and WeakMap use hash tables under the hood
a hash function returns a hash code for given keys which is used to map them to a location in the hash table
hash code is a random number (independent of object value) and thus needs to be stored
storing the hash code as private symbol on the object, like was done previously, resulted in a variety of performance problems
- led to slow megamorphic IC lookups of the hash code and
- triggered hidden class transition in the key on storing the hash code
performance issues were fixed (~500% improvement for Maps and Sets) by hiding the hashcode and storing it in unused memory space that is connected to the JSObject
- if properties backing store is empty: directly stored in the offset of JSObject
- if properties backing store is array: stored in extranous 21 bits of 31 bits to store array length
- if properties backing store is dictionary: increase dictionary size by 1 word to store hashcode in dedicated slot at the beginning of the dictionary

Resources

Optimizing hash tables: hiding the hash code - 2018

Fast, In-Object Properties

read Fast, in-object properties | read

V8 describes the structure of objects using maps used to create hidden classes and match data types
- resembles a table of descriptors with one entry for each property
- map contains info about size of the object
- map contains info about pointers to constructors and prototypes
- objects with same structure share same map
objects created by the same constructor and have the same set of properties assigned in the same order
- have regular logical structure and therefore regular structure in memory
- share same map
adding new property is handled via transition descriptor
- use existing map
- transition descriptor points at other map

Assigning Properties inside Constructor Call

class Point {
  constructor(x, y) {
    // Map M0
    //  "x": Transition to M1 at offset 12

    this.x = x

    // Map M1
    //  "x": Field at offset 12
    //  "y": Transition to M2 at offset 16

    this.y = y

    // Map M2
    //  "x": Field at offset 12
    //  "y": Field at offset 16
  }
}

Point starts out without any fields with M0
this.x =x -> map pointer set to M1 and value x is stored at offset 12 and "x" Transition descriptor added to M0
this.y =y -> map pointer set to M2 and value y is stored at offset 16 and "y" Transition descriptor added to M1

Assigning More Properties Later

var p = new Point(1, 2)

// Map M2
//  "x": Field at offset 12
//  "y": Field at offset 16
//  "z": Transition at offset 20

p.z = z

// Map M3
//  "x": Field at offset 12
//  "y": Field at offset 16
//  "z": Field at offset 20

assigning z later
- create M3, a copy of M2
- add Transition descriptor to M2
- add Field descriptor to M3

Assigning Same Properties in Different Order

class Point {
  constructor(x, y, reverse) {
    // Map M0
    //  "x": Transition to M1 at offset 12ak
    //  "y": Transition to M2 at offset 12
    if (reverse) {
      // variation 1

      // Map M1
      //  "x": Field at offset 12
      //  "y": Transition to M4 at offset 16

      this.x = x

      // Map M4
      //  "x": Field at offset 12
      //  "y": Field at offset 16

      this.y = y
    } else {
      // variation 2

      // Map M2
      //  "y": Field at offset 12
      //  "x": Transition to M5 at offset 16

      this.y = x

      // Map M5
      //  "y": Field at offset 12
      //  "x": Field at offset 16
      this.x = y
    }
  }
}

both variations share M0 which has two transitions
not all Points share same map
in worse cases V8 drops object into dictionary mode in order to prevent huge number of maps to be allocated
- when assigning random properties to objects from same constructor in random order
- when deleting properties

In-object Slack Tracking

read In-object slack tracking

objects allocated by a constructor are given enough memory for 32 fast properties to be stored
after certain number of objects (8) were allocated from same constructor
- V8 traverses transition tree from initial map to determine size of largest of these initial objects
- new objects of same type are allocated with exact amount of memory to store max number of properties
- initial objects are resized (down)

Methods And Prototypes

read Methods and prototypes

Assigning Functions to Properties

function pointDistance(p) { /* calculates distance */ }

class Point {
  constructor(x, y) {
    // Map M0
    //  "x": Transition to M1 at offset 12

    this.x = x

    // Map M1
    //  "x": Field at offset 12
    //  "y": Transition to M2 at offset 16

    this.y = y

    // Map M2
    //  "x": Field at offset 12
    //  "y": Field at offset 16
    // "distance": Transition to M3 <pointDistance>

    this.distance = pointDistance

    // Map M3
    //  "x": Field at offset 12
    //  "y": Field at offset 16
    // "distance": Constant_Function <pointDistance>
  }
}

properties pointing to Functions are handled via constant functions descriptor
constant_function descriptor indicates that value of property is stored with descriptor itself rather than in the object
pointers to functions are directly embedded into optimized code
if distance is reassigned, a new map has to be created since the Transition breaks

Assigning Functions to Prototypes

class Point {
  constructor(x, y) {
    this.x = x
    this.y = y
  }

  pointDistance() { /* calculates distance */ }
}

V8 represents prototype methods (aka class methods) using constant_function descriptors
calling prototype methods maybe a tiny bit slower due to overhead of the following:
- check receiver's map (as with own properties)
- check maps of prototype chain (extra step)
the above lookup overhead won't make measurable performance difference and shouldn't impact how you write code

Numbered Properties

read Numbered properties: fast elements

see Arrays

numbered properties are treated and ordered differently than others since any object can behave like an array
element === any property whose key is non-negative integer
V8 stores elements separate from named properties in an elements kind field (see structure diagram)
if object drops into dictionary mode for elements, access to named properties remains fast and vice versa
maps don't need transitions to maps that are identical except for element kinds
most elements are fast elements which are stored in a contiguous array

Arrays

watch | slide

read Numbered properties: fast elements

read

V8 has two methods for storing arrays, fast elements and dictionary elements

Fast Elements

see Numbered Properties

compact keysets
linear storage buffer
contiguous (non-sparse)
0 based
smaller than 64K

Dictionary Elements

hash table storage
slow access
sparse
large

Packed vs. Holey Elements

V8 makes distinction whether the elements backing store is packed or has holes
holes in a backing store are created by deleting an indexed element
missing properties are marked with special hole value to keep Array functions performant
however missing properties cause expensive lookups on prototype chain

Elements Kinds

read

fast elements kinds in order of increasing generality:
- fast Smis (small integers)
- fast doubles (Doubles stored in unboxed representation)
- fast values (strings or other objects)

Elements Kind Lattice

+--------------------+
| PACKED_SMI_ELEMENT |---+
+--------------------+   |    +------------------------+
          |              +--->| PACKED_DOUBLE_ELEMENTS |---+
          ↓                   +------------------------+   |    +-------------------+
+--------------------+                   |                 +--->|  PACKED_ELEMENTS  |
| HOLEY_SMI_ELEMENTS |---+               ↓                      +-------------------+
+--------------------+   |    +------------------------+                   |
                         +--->|  HOLEY_DOUBLE_ELEMENTS |---+               ↓
                              +------------------------+   |    +-------------------+
                                                           +--->|  HOLEY_ELEMENTS   |
                                                                +-------------------+

Double Array Unboxing

watch | slide

Array's hidden class tracks element types
if all doubles, array is unboxed aka upgraded to fast doubles
- wrapped objects layed out in linear buffer of doubles
- each element slot is 64-bit to hold a double
- Smis that are currently in Array are converted to doubles
- very efficient access
- storing requires no allocation as is the case for boxed doubles
- causes hidden class change
- requires expensive copy-and-convert operation
careless array manipulation may cause overhead due to boxing/unboxing watch | slide

Typed Arrays

blog | spec

difference is in semantics of indexed properties
V8 uses unboxed backing stores for such typed arrays

Float64Array

gets 64-bit allocated for each element

Considerations

once array is marked as holey it is holey forever
don't pre-allocate large arrays (>64K), instead grow as needed, to avoid them being considered sparse
do pre-allocate small arrays to correct size to avoid allocations due to resizing
avoid creating holes, and thus don't delete elements
don't load uninitialized or deleted elements watch | slide
use literal initializer for Arrays with mixed values
don't store non-numeric values in numeric arrays
- causes boxing and efficient code that was generated for manipulating values can no longer be used
use typed arrays whenever possible especially when performing mathematical operations on an array of numbers
when copying an array, you should avoid copying from the back (higher indices to lower indices) because this will almost certainly trigger dictionary mode
avoid elements kind transitions, i.e. edge case of adding -0, NaN, Infinity to a Smi array as they are represented as doubles

Strings

string representation and how it maps to each bit

map |len |hash|characters
0123|4567|8901|23.........

contain only data (no pointers)
content not tagged
immutable except for hashcode field which is lazily computed (at most once)

Resources

video: breaking the javascript speed limit with V8 | slides
tour of V8: garbage collection - 2013
tour of V8: object representation - 2013
V8-design
Fast Properties in V8 - 2017
“Elements kinds” in V8 - 2017
video: V8 internals for JavaScript developers - 2018
slides: V8 internals for JavaScript developers - 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

data-types.md

data-types.md

Data Types

Efficiently Representing Values and Tagging

Considerations

Objects

Structure

Object Properties

Hash Tables

HashTables and Hash Codes

Resources

Fast, In-Object Properties

Assigning Properties inside Constructor Call

Assigning More Properties Later

Assigning Same Properties in Different Order

In-object Slack Tracking

Methods And Prototypes

Assigning Functions to Properties

Assigning Functions to Prototypes

Numbered Properties

Arrays

Fast Elements

Dictionary Elements

Packed vs. Holey Elements

Elements Kinds

Elements Kind Lattice

Double Array Unboxing

Typed Arrays

Float64Array

Considerations

Strings

Resources

Files

data-types.md

Latest commit

History

data-types.md

File metadata and controls

Data Types

Efficiently Representing Values and Tagging

Considerations

Objects

Structure

Object Properties

Hash Tables

HashTables and Hash Codes

Resources

Fast, In-Object Properties

Assigning Properties inside Constructor Call

Assigning More Properties Later

Assigning Same Properties in Different Order

In-object Slack Tracking

Methods And Prototypes

Assigning Functions to Properties

Assigning Functions to Prototypes

Numbered Properties

Arrays

Fast Elements

Dictionary Elements

Packed vs. Holey Elements

Elements Kinds

Elements Kind Lattice

Double Array Unboxing

Typed Arrays

Float64Array

Considerations

Strings

Resources