Leonya edited this page Nov 27, 2014 · 29 revisions

Introduction to JetBrains Xodus

Welcome to JetBrains Xodus. These wiki pages give a brief introduction to Xodus concepts and features. The name Xodus stands for "exodus" and is pronounced the same way. Xodus is licensed under the Apache License, Version 2.0.

Overview
    Snapshot Isolation
    Garbage Collector
    Performance

[Kinds of API](https://github.com/JetBrains/xodus/wiki#kinds-of-api)
    [Environments](https://github.com/JetBrains/xodus/wiki#environments)
    [Entity Stores](https://github.com/JetBrains/xodus/wiki#entity-stores)
    [Virtual File Systems](https://github.com/JetBrains/xodus/wiki#virtual-file-systems)

[Getting Started](https://github.com/JetBrains/xodus/wiki#getting-started)
    [Managing Dependencies](https://github.com/JetBrains/xodus/wiki#managing-dependencies)

Overview

Xodus is a transactional, schema-less, embedded, high-performance database written in Java. It is used in several JetBrains server-side products, including YouTrack.

  • It is written in Java, which means that Xodus runs on any platform that can run a Java virtual machine.
  • Xodus transactions have the full set of properties that guarantee reliability: atomicity, consistency, isolation and durability. Xodus is therefore a general-purpose database that can be used in traditional database applications with high requirements for consistency and isolation.
  • On the other hand, Xodus is schema-less, which sets it apart from traditional databases that require a schema. This lets you avoid migrations, schema refactorings, etc., and makes developers' lives much easier when applications must be compatible with different versions of the database.
  • An embedded database runs inside your application. The main features of an embedded database are zero deployment and zero administration. It requires no dedicated server to store and access data. Applications that use Xodus have no overhead for establishing connections to a database server, SQL parsing, etc.

Snapshot Isolation

Xodus supports only one isolation level: snapshot isolation. It does not offer the read-uncommitted (dirty reads), read-committed, repeatable-read or serializable levels. Within a transaction, snapshot isolation guarantees that all reads see a consistent snapshot of the whole database.

Snapshot isolation follows from the log-structured design of Xodus. In log-structured databases, all changes are written sequentially to a log. In Xodus, this log is an infinite sequence of .xd files. Data stored in the log is never modified; any change is appended to the log, creating a new version of the data. Every committed transaction creates a new snapshot (version) of the database, and any transaction started just after the commit holds a reference to this snapshot. Thus, any Xodus database can be represented as a persistent functional data structure, which naturally provides lock-free multi-version concurrency control (MVCC).
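
The idea that a reader keeps seeing the version it started with can be illustrated with a small toy model. This is only a sketch and not how Xodus is implemented: the class SnapshotSketch and its methods are hypothetical, it is single-threaded, and it copies the whole map on every commit, whereas a real persistent data structure would share unchanged parts between versions.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

/** Toy model of snapshot isolation: every commit publishes a new immutable version. */
final class SnapshotSketch {

    // The latest committed version; older versions stay valid for whoever still references them.
    private Map<String, String> current = Collections.emptyMap();

    /** A new "transaction" simply captures a reference to the latest committed version. */
    Map<String, String> beginSnapshot() {
        return current;
    }

    /** A commit never modifies an existing version: it copies, changes the copy and publishes it. */
    void commitPut(String key, String value) {
        Map<String, String> next = new HashMap<>(current);
        next.put(key, value);
        current = Collections.unmodifiableMap(next);
    }

    public static void main(String[] args) {
        SnapshotSketch db = new SnapshotSketch();
        db.commitPut("issue-1", "open");

        Map<String, String> reader = db.beginSnapshot(); // started before the next commit
        db.commitPut("issue-1", "fixed");                // a later commit creates a new version

        System.out.println(reader.get("issue-1"));             // prints "open"
        System.out.println(db.beginSnapshot().get("issue-1")); // prints "fixed"
    }
}
```

The reader that started before the second commit still sees "open", while a snapshot taken afterwards sees "fixed".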

Garbage Collector

Because modifications are append-only, every modified data record leaves behind an outdated version that is no longer used. Such records are garbage, and the database has to compact itself to keep its physical size reasonable. In Xodus, you don't need to worry about that, since it collects garbage in the background. In most cases, GC works seamlessly with default settings in a single background thread. GC tries to balance the need to limit the database size against the need to affect user transactions as little as possible. In particular cases, GC can be tuned with a set of additional properties.
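
To make the notion of garbage in an append-only log concrete, here is a minimal sketch of compaction. It is an illustration under simplifying assumptions, not the Xodus garbage collector: the Record class, the use of a null value as a deletion marker, and the compact method are all hypothetical, and the whole log is rewritten in one pass instead of incrementally in the background.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy append-only log: updates append new records, compaction drops superseded ones. */
final class LogCompactionSketch {

    /** One immutable log record; a null value marks a deletion (an assumption of this sketch). */
    static final class Record {
        final String key;
        final String value;

        Record(String key, String value) {
            this.key = key;
            this.value = value;
        }
    }

    /** Returns a new log that contains only the latest live record for each key. */
    static List<Record> compact(List<Record> log) {
        Map<String, Record> latest = new LinkedHashMap<>();
        for (Record r : log) {
            latest.remove(r.key);       // the older record for this key is now garbage
            if (r.value != null) {
                latest.put(r.key, r);   // keep only the newest live version
            }
        }
        return new ArrayList<>(latest.values());
    }

    public static void main(String[] args) {
        List<Record> log = new ArrayList<>();
        log.add(new Record("a", "1"));
        log.add(new Record("b", "2"));
        log.add(new Record("a", "3"));  // supersedes the first record
        log.add(new Record("b", null)); // deletes "b"

        List<Record> compacted = compact(log);
        System.out.println(log.size() + " -> " + compacted.size()); // prints "4 -> 1"
    }
}
```

Three of the four records are garbage here: two were superseded or deleted, and the deletion marker itself is no longer needed once the deleted record is gone.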

Performance

The main JetBrains YouTrack instance holds an issue database that has been in use for more than 10 years. It contains nearly one million issues, and the physical database size exceeds 80 GB. YouTrack runs on a moderate 8-CPU server with a 20 GB Java heap. Xodus achieves its performance through compact data storage, lock-free reads, lock-free optimistic writes and intelligent lock-free caching. Xodus is a highly concurrent database: read operations experience zero contention even when write operations run in parallel.
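
The general technique behind lock-free optimistic writes can be sketched as follows. This is an assumption-laden illustration of the pattern, not Xodus code: the classes OptimisticCommitSketch and State are hypothetical, and the "database" is reduced to a single immutable counter. A writer computes a new version from the snapshot it read and publishes it with a compare-and-set; if another writer committed in the meantime, the update is simply retried.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.UnaryOperator;

/** Toy optimistic writer: compute against a snapshot, publish with compare-and-set, retry on conflict. */
final class OptimisticCommitSketch {

    /** Immutable database state; a single counter stands in for real data. */
    static final class State {
        final int counter;

        State(int counter) {
            this.counter = counter;
        }
    }

    private final AtomicReference<State> committed = new AtomicReference<>(new State(0));

    /** Reads never block: they just dereference the latest committed state. */
    State read() {
        return committed.get();
    }

    /** Applies the change to a snapshot and retries if another writer committed first. */
    int update(UnaryOperator<State> change) {
        int attempts = 0;
        while (true) {
            attempts++;
            State snapshot = committed.get();
            State next = change.apply(snapshot);
            if (committed.compareAndSet(snapshot, next)) {
                return attempts; // number of attempts that were needed
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        OptimisticCommitSketch db = new OptimisticCommitSketch();
        Thread[] writers = new Thread[4];
        for (int i = 0; i < writers.length; i++) {
            writers[i] = new Thread(() -> {
                for (int n = 0; n < 1000; n++) {
                    db.update(s -> new State(s.counter + 1));
                }
            });
            writers[i].start();
        }
        for (Thread t : writers) {
            t.join();
        }
        System.out.println(db.read().counter); // prints 4000
    }
}
```

No update is lost even though no lock is taken: a conflicting writer only pays with an extra attempt, and readers are never blocked.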

Kinds of API

There are three essentially different ways to deal with data, which give three different kinds of API or API layers.

Environments

An Environment is a transactional key-value storage. An environment consists of an arbitrary number of named stores. Key-value pairs can be put into, retrieved from and deleted from stores. A cursor can be opened over a store. Cursors make it possible to enumerate key-value pairs in either direction and to navigate within a store by different criteria. The Environments API is the lowest-level API of Xodus; it is used by the other layers, Entity Stores and VFS.
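
As a rough analogy for what a store with a cursor offers, the sketch below models a store with a java.util.TreeMap. It assumes that a store keeps its keys in sorted order, uses strings instead of binary data, and ignores transactions entirely; SortedStoreSketch and its method names are hypothetical and are not part of the Xodus API.

```java
import java.util.TreeMap;

/** Toy stand-in for a named store: a sorted map with put, get, delete and cursor-like navigation. */
final class SortedStoreSketch {

    // Keys are kept in sorted order, which is what makes ordered enumeration possible.
    private final TreeMap<String, String> pairs = new TreeMap<>();

    void put(String key, String value) {
        pairs.put(key, value);
    }

    String get(String key) {
        return pairs.get(key);
    }

    boolean delete(String key) {
        return pairs.remove(key) != null;
    }

    /** Forward enumeration, like repeatedly moving a cursor to the next pair. */
    String keysAscending() {
        return String.join(",", pairs.keySet());
    }

    /** Backward enumeration, like repeatedly moving a cursor to the previous pair. */
    String keysDescending() {
        return String.join(",", pairs.descendingKeySet());
    }

    /** Range navigation: the first key greater than or equal to the given one, or null. */
    String firstKeyAtOrAfter(String key) {
        return pairs.ceilingKey(key);
    }

    public static void main(String[] args) {
        SortedStoreSketch store = new SortedStoreSketch();
        store.put("b", "2");
        store.put("a", "1");
        store.put("d", "4");

        System.out.println(store.keysAscending());        // prints "a,b,d"
        System.out.println(store.keysDescending());       // prints "d,b,a"
        System.out.println(store.firstKeyAtOrAfter("c")); // prints "d"
    }
}
```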

Entity Stores

An Entity Store describes a data model as a set of typed entities with named properties (attributes) and named entity relations (links). The type of an entity is specified as a string value when the entity is created. A property value must implement java.lang.Comparable. By default, only Boolean, Byte, Double, Float, Integer, Long, Short and String property values are supported. Custom Comparable types can be added. Links can be established between arbitrary entities with arbitrary cardinality. There are different kinds of queries: searching by property values and ranges, traversing links, sorting, etc. Complex queries can be composed by means of set operations: intersect, union and minus.
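
How complex queries compose out of set operations can be shown with plain Java sets. The sketch is only an illustration of the idea: entities are reduced to integer ids, the three input sets stand for the results of hypothetical simple queries, and EntityQuerySketch is not part of the Xodus API.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

/** Toy query composition: each simple query yields a set of entity ids, sets are then combined. */
final class EntityQuerySketch {

    static Set<Integer> intersect(Set<Integer> a, Set<Integer> b) {
        Set<Integer> result = new TreeSet<>(a);
        result.retainAll(b); // keep only ids present in both sets
        return result;
    }

    static Set<Integer> union(Set<Integer> a, Set<Integer> b) {
        Set<Integer> result = new TreeSet<>(a);
        result.addAll(b);    // ids present in either set
        return result;
    }

    static Set<Integer> minus(Set<Integer> a, Set<Integer> b) {
        Set<Integer> result = new TreeSet<>(a);
        result.removeAll(b); // ids of a that are not in b
        return result;
    }

    public static void main(String[] args) {
        // Results of three hypothetical simple queries over issue entities.
        Set<Integer> open = new TreeSet<>(Arrays.asList(1, 2, 3, 5));
        Set<Integer> assignedToMe = new TreeSet<>(Arrays.asList(2, 3, 4));
        Set<Integer> critical = new TreeSet<>(Arrays.asList(3, 5));

        // "Open issues assigned to me that are not critical".
        System.out.println(minus(intersect(open, assignedToMe), critical)); // prints "[2]"

        // "Issues that are assigned to me or critical".
        System.out.println(union(assignedToMe, critical)); // prints "[2, 3, 4, 5]"
    }
}
```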

Virtual File Systems

Virtual File System (VFS) deals with files and streams transactionally. With it you can create, rename and delete virtual files, and open streams to read from or write to them. Xodus includes an example of using the VFS layer: the class ExodusDirectory from xodus-lucene-directory implements Lucene's core abstract class Directory. This makes it possible to store and maintain Lucene text index files directly within a Xodus database.

Getting Started

Before you start coding, choose the API layer most suitable for your application's needs. The choice determines the set of artifacts your application will depend on. Whichever API you choose, you have to create an instance of Environment:

final Environment env = Environments.newInstance("/Users/me/.myAppData");

All the Environment data will be physically stored in the /Users/me/.myAppData directory. Next, create a named store that will contain your data:

final Store store = env.computeInTransaction(new TransactionalComputable<Store>() {
    @Override
    public Store compute(@NotNull final Transaction txn) {
        return env.openStore("MyStore", StoreConfig.WITHOUT_DUPLICATES, txn);
    }
});

Here a transactional closure is used as the simplest way to manage transactions and updates within a transaction. Once you have a Store object, you can put data into it and get data from it. On the Environment layer, all data is binary and untyped, and is represented by ByteIterable instances. A ByteIterable is a kind of byte array or Iterable<Byte>. Prepare the data and use a closure to put it into the store:

final ByteIterable key = StringBinding.stringToEntry("myKey");
final ByteIterable value = StringBinding.stringToEntry("myValue");

env.executeInTransaction(new TransactionalExecutable() {
    @Override
    public void execute(@NotNull final Transaction txn) {
        store.put(txn, key, value);
    }
});

Note that here we used a TransactionalExecutable closure instead of a TransactionalComputable one. The only difference is that TransactionalExecutable does not return a value, whereas TransactionalComputable does. Then get the data like this:

env.computeInReadonlyTransaction(new TransactionalComputable<ByteIterable>() {
    @Override
    public ByteIterable compute(@NotNull final Transaction txn) {
        return store.get(txn, key);
    }
});
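
What a transactional closure spares you can be sketched with a toy runner. This is a hypothetical illustration of the closure pattern, not the Xodus implementation: ClosureSketch, its Txn class and the method names compute and execute are invented for this example, and a real transaction does far more than buffer strings.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;

/** Toy transaction runner: the closure body never has to call commit or abort itself. */
final class ClosureSketch {

    /** Hypothetical stand-in for a transaction; it only buffers writes until commit. */
    static final class Txn {
        final List<String> pending = new ArrayList<>();
    }

    private final List<String> committed = new ArrayList<>();

    /** "Computable" flavour: runs the closure, commits on success and returns its result. */
    <T> T compute(Function<Txn, T> body) {
        Txn txn = new Txn();           // begin
        T result = body.apply(txn);    // an exception here skips the commit, so nothing is applied
        committed.addAll(txn.pending); // commit
        return result;
    }

    /** "Executable" flavour: the same thing without a return value. */
    void execute(Consumer<Txn> body) {
        compute(txn -> {
            body.accept(txn);
            return null;
        });
    }

    List<String> committed() {
        return committed;
    }

    public static void main(String[] args) {
        ClosureSketch env = new ClosureSketch();
        env.execute(txn -> txn.pending.add("myKey=myValue"));

        int size = env.compute(txn -> env.committed().size());
        System.out.println(size); // prints 1

        try {
            env.execute(txn -> {
                txn.pending.add("lost");
                throw new IllegalStateException("boom");
            });
        } catch (IllegalStateException e) {
            System.out.println(env.committed()); // prints "[myKey=myValue]"
        }
    }
}
```

In this sketch the executable variant is just a computable that returns nothing, which mirrors the difference described above.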

Using Environments, you can manage transactions in a more sophisticated manner and implement more complex data operations and control flow. When you have finished using the Environment, invoke env.close().

Managing Dependencies

Xodus artifacts are published to the Maven Central repository. Currently, only the "1.0-SNAPSHOT" version is available. To use snapshot versions, you have to specify the URL of the snapshot repository. If you are using Gradle, it is enough to define the repositories in your project as follows:

repositories {
    mavenCentral()  // for release artifacts
    maven {         // for snapshot artifacts
        url 'https://oss.sonatype.org/content/groups/public'
    }
}

To build the sample above, you only need to define a dependency on the xodus-environment artifact:

dependencies {
    compile group: 'org.jetbrains.xodus', name: 'xodus-environment', version: '1.0-SNAPSHOT'
}

A shorter definition is also ok:

dependencies {
    compile 'org.jetbrains.xodus:xodus-environment:1.0-SNAPSHOT'
}

To work with Entity Stores, make your project dependent on the xodus-entity-store artifact:

dependencies {
    compile 'org.jetbrains.xodus:xodus-entity-store:1.0-SNAPSHOT'
}

If your project consists of separate API and implementation parts (projects, modules, etc.), you should probably define different dependencies for them. If you use Environments or Entity Stores, it is enough to make your API part depend only on the xodus-openAPI artifact:

dependencies {
    compile 'org.jetbrains.xodus:xodus-openAPI:1.0-SNAPSHOT'
}

Xodus openAPI contains the interfaces and abstract classes needed to work with these two layers. The Virtual File System interface is simple and self-contained and can be described in terms of the java.io API, whereas the implementation part should depend on the xodus-vfs artifact:

dependencies {
    compile 'org.jetbrains.xodus:xodus-vfs:1.0-SNAPSHOT'
}

This can be presented with the following table:

| Layer | API Dependencies | Implementation Dependencies |
|---|---|---|
| Environments | xodus-openAPI | xodus-environment |
| Entity Stores | xodus-openAPI | xodus-entity-store |
| VFS | java.io API | xodus-vfs |