This is a WIP and subject to change.

TableCloth

A tool for keeping your interactions with Bigtable nice and tidy.


UNDER DEVELOPMENT (coming 2019)

This library was originally built for internal use at @Precognitive. We are working on finishing up some abstractions and pulling out some @Precognitive-specific code. Once we do, we will release our initial version (2019).

The document below is a high-level look at the API we intend to bake into TableCloth. Some portions of the API could change; this is meant to give interested parties a base-level feel for the API and implementation details.

If you are interested in being contacted/emailed when we release TableCloth, fill out the contact form at https://precognitive.io/contact/ with the message content of "Bigtable" and we will add you to the mailing list.

Why create TableCloth?

We love working with Bigtable! We love its ability to scale predictably, handle tens of thousands of requests per second, and serve low-latency I/O (sub-5 ms responses). We don't love working with (raw byte) strings as the only data type. Our applications need multiple complex data types and a clean interface for modeling data. We also need multiple indexes, since our users expect to be able to query off multiple data points. TableCloth was built to meet these requirements (and more).

Features

  • Schema Enforcement - Column Family and Column level data type enforcement.
  • Multiple Data Type Support - Arrays, Strings, Numbers, etc. are all supported out of the box. No more casting, JSON parse errors, and other issues with storing byte strings.
  • Multiple Indexes - Bigtable has a single index (the rowKey), and for most applications this just won't work. TableCloth supports multiple indexes out of the box.
  • BigQuery Schema Generation - Generate a BigQuery schema to be used when querying Bigtable via BigQuery.
  • Automated Migrations - Migrations are run based off of schema updates and managed by TableCloth.

NOTE: The documentation below purposefully omits the fact that Bigtable stores multiple versions of column data. An interface will be provided for returning multiple and specific versions as allowed by the Bigtable API.

Example

Below is an example that demonstrates the high-level API for interacting with TableCloth.

const {TableCloth, Schema} = require('@precognitive/tablecloth');

const db = new TableCloth({
  // pass in connection configs, e.g. projectId
});

const {ColumnFamilyTypes, DataTypes} = Schema;

const userSchema = Schema({
  id: {
    type: ColumnFamilyTypes.Base,
    columns: {
      userId: {type: DataTypes.String}
    }
  },
  data: {
    type: ColumnFamilyTypes.Base,
    columns: {
      email: {type: DataTypes.String},
      created: {type: DataTypes.DateTime},
      updated: {type: DataTypes.DateTime}
    }
  }
}, {
  rowKey: ['id.userId', 'data.email'],
  indexes: {
    email: ['data.email'],
    userId: ['id.userId'],
    
    // composite index example
    userId_email: ['id.userId', 'data.email']
  }
});

// The above will create a rowKey of `<id.userId>#<data.email>` and three index tables, one for userId, one for email and a composite for userId + email (`<id.userId>#<data.email>`).

// 'users' is the name of the table
const User = db.model('users', userSchema);

module.exports = User;

// This will be run in a separate task/file not in the main application.
// NOTE: Due to the nature of TableCloth, escalated permissions are required when initially creating the Base, Schema and Index tables.
User.saveSchema({destroy: true});

API & Features

Design

The API mimics the Mongoose API. This is for a couple of reasons:

  • Developers are used to it.
  • It's well developed and meets our needs.
  • You can utilize Bigtable in a manner similar to a document store.
  • We were using Mongo before, so it made sense.

NOTE: TableCloth is NOT API compliant or compatible with Mongoose, but it mirrors the feel of the Mongoose API.

Column-Family Types

  • Base - Key/value pairs with the values being any schema.
  • HashMap - All column keys can match a regex pattern, while all values are of the same type.
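
For example, a HashMap column family might be declared like this (a hedged sketch; the keyPattern and valueType option names are our assumptions, not confirmed parts of the API):

const sessionSchema = Schema({
  id: {
    type: ColumnFamilyTypes.Base,
    columns: {
      sessionId: {type: DataTypes.String}
    }
  },
  // HashMap family: column keys match a regex, values share one type
  counters: {
    type: ColumnFamilyTypes.HashMap,
    keyPattern: /^count_[a-z]+$/, // assumption: hypothetical option name
    valueType: DataTypes.Number   // assumption: hypothetical option name
  }
}, {
  rowKey: ['id.sessionId']
});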

Column Data Types

  • String
  • Number (we will look to separate Integer & Float if possible in v1)
  • DateTime
  • Object
  • Array
  • Set
  • Binary
  • Boolean
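
As a rough sketch of how these types might appear together in a schema (the column names here are illustrative, not part of the API):

const eventSchema = Schema({
  data: {
    type: ColumnFamilyTypes.Base,
    columns: {
      name: {type: DataTypes.String},
      count: {type: DataTypes.Number},
      occurred: {type: DataTypes.DateTime},
      payload: {type: DataTypes.Object},
      tags: {type: DataTypes.Array},
      uniqueTags: {type: DataTypes.Set},
      raw: {type: DataTypes.Binary},
      processed: {type: DataTypes.Boolean}
    }
  }
}, {
  rowKey: ['data.name']
});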

Nil

In TableCloth, null, undefined or "" are treated as Nil and will not be stored.
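
For example (a sketch, assuming a Mongoose-style instantiation and save API, which is our assumption based on the Design section below):

// `email` is Nil here, so no cell would be written for `data.email`
const user = new User({
  id: {userId: 'abc-123'},
  data: {email: null, created: new Date()}
});
await user.save();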

BigQuery Schema Generation

An API is provided to generate the necessary BigQuery schema based off the Schema definition.

Example:

const User = require('../models/User.js');

const bigQuerySchemaDefinition = User.generateBigQuery();
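
The generated definition could then be written to disk for use with BigQuery tooling (a minimal sketch; the file name is arbitrary):

const fs = require('fs');

// persist the generated schema for use with the BigQuery CLI or API
fs.writeFileSync(
  'users.bigquery-schema.json',
  JSON.stringify(bigQuerySchemaDefinition, null, 2)
);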

Hooks

Hooks are executed at different points during the Model's lifecycle. These hooks can be used to implement custom validation and custom de/serialization.

Hook Lifecycle

  • preSave - Executed before saving a Model, Column-Family or Column to Bigtable.
  • postSave - Executed after saving a Model, Column-Family or Column to Bigtable.
  • preFetch - Executed before fetching a Model, Column-Family or Column from Bigtable.
  • postFetch - Executed after fetching a Model, Column-Family or Column from Bigtable.

Example:

const User = db.model('users', userSchema);

User.preSave(function(data) {
  console.log('fired - presave');
  return data;
});
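
A postFetch hook follows the same pattern and is a natural place for custom deserialization (a sketch mirroring the preSave signature above):

User.postFetch(function(data) {
  // e.g. hydrate a stored value back into a richer object here
  console.log('fired - postfetch');
  return data;
});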

RowKey Generation

The row key can be defined in the Schema as an Array of Strings or Functions.

Example:

const colSchema = {
  id: {
    type: ColumnFamilyTypes.Base,
    columns: {
      userId: {type: DataTypes.String}
    }
  },
  data: {
    type: ColumnFamilyTypes.Base,
    columns: {
      email: {type: DataTypes.String},
      created: {type: DataTypes.DateTime},
      updated: {type: DataTypes.DateTime}
    }
  }
};

const userSchema1 = Schema(colSchema, {
  rowKey: ['id.userId', 'data.email'],
});

function reverseTimeStamp (cols) {
  return Number.MAX_SAFE_INTEGER - cols.data.created.getTime();
}

// use a function to get/build a reverse timestamp
const userSchema2 = Schema(colSchema, {
  rowKey: ['id.userId', reverseTimeStamp],
});
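
Assuming the same `#` delimiter as in the earlier example, userSchema2 would produce row keys of the form `<id.userId>#<reverseTimestamp>`, so the most recently created rows sort first.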

Indexes

Multiple indexes are supported in TableCloth and can be created based off of the Schema definition.

How they work

When querying via an index table, multiple calls are made under the hood:

Example:

// `data` includes the entire `User` record.
const data = await User.findByEmail('user@example.com');

In the above example, what looks like one call is actually two Bigtable queries:

  1. User.findByEmail is invoked.
  2. First Bigtable query: the "users_email" index table is queried, returning the rowKey(s) of the matching user(s).
  3. Second Bigtable query: those rowKey(s) are used to fetch the full user record(s) from the base table.

Special Circumstances: If you are less concerned about storage, consistency, etc. and more concerned about latency, it is possible to define an index as a "duplicate key" index. Instead of storing the data in a separate index table, the data is stored in the base table under a different rowKey. This effectively copies the data multiple times into the same table.
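
A duplicate-key index might be declared like this (a heavily hedged sketch; the object shape and the duplicateKey option name are our assumptions, not confirmed parts of the API):

const userSchema = Schema(colSchema, {
  rowKey: ['id.userId'],
  indexes: {
    // stored as extra rows in the base table rather than
    // in a separate "users_email" index table
    email: {fields: ['data.email'], duplicateKey: true} // hypothetical shape
  }
});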

MetaData & Migrations

At some point Schemas change: data is dropped and new fields are added. This is supported via Lazy Migrations in TableCloth. Every Bigtable table has a column family named "metadata" that holds fields TableCloth uses under the hood (i.e. created_at, updated_at).

Migrations will be handled via the "version" column in "metadata". The version column holds the schema version the record was last written at; on migration, the record walks through each migration step until it reaches the desired version.

The API is not fully defined, but we plan on supporting:

  • Full ETL based migrations - transforms all data record by record to the desired version.
  • Lazy Migration (Write Only) - will only update the version when writing to the record; it will still transform on read but won't save the transformed record.
  • Lazy Migration (Read/Write) - will transform to the desired version and resave, whether the operation is a read or a write.
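
As a minimal sketch of the version walk described above (the migrations array and its shape are assumptions; the final API is not yet defined):

// Each entry migrates a record from version n to version n + 1.
// Purely illustrative -- not the final TableCloth API.
const migrations = [
  // v1 -> v2: add a new field with a default
  (record) => ({...record, data: {...record.data, fullName: null}}),
  // v2 -> v3: drop a deprecated field
  (record) => {
    const {legacyFlag, ...data} = record.data;
    return {...record, data};
  }
];

function migrate(record, targetVersion) {
  let current = record;
  while (current.metadata.version < targetVersion) {
    const version = current.metadata.version;
    current = migrations[version - 1](current); // step from v(n) to v(n+1)
    current = {...current, metadata: {...current.metadata, version: version + 1}};
  }
  return current;
}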

Features on Roadmap

  • User Interface for managing Bigtable Models
  • TTL based indexes
  • Immutable Schemas - Prevents unintentional Schema changes/mutations
  • Polymorphism - allow multiple Schemas on the same Row
  • Python Support
  • Go Support
  • Java Support

Contact Us

If you have other ideas, input, etc., feel free to reach out directly at [email protected].
