The core reusable library to build a query engine for streaming Big Data without needing to store it

NathanSpeidel/bullet-core

Bullet Core

Bullet is a streaming query engine that can be plugged into any single data stream using a Stream Processing framework like Apache Storm, Spark or Flink. It lets you run queries on this data stream, including hard queries such as Count Distinct and Top K.

Background

In Bullet, both the queries and the data flow through the system. There is absolutely no persistence layer! Queries live only as long as their duration and operate on in-memory data only. So, queries in Bullet look forward in time: they see only the data that arrives after they are submitted, which is unusual among querying systems.
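
To make the forward-looking model concrete, here is a minimal sketch of a query that has a fixed duration, keeps its state in memory only, and counts just the records that arrive while it is alive. It is an illustration under stated assumptions, not Bullet Core's API: the names `ForwardQuerySketch`, `CountingQuery`, `consume` and `isExpired` are hypothetical, and timestamps are passed in explicitly so the behavior is deterministic.

```java
import java.util.Map;
import java.util.function.Predicate;

public class ForwardQuerySketch {
    /** Hypothetical in-memory query: counts matching records that arrive before it expires. */
    static final class CountingQuery {
        private final Predicate<Map<String, String>> filter;
        private final long expiresAtMillis;
        private long count = 0; // the only state, never persisted

        CountingQuery(Predicate<Map<String, String>> filter, long startMillis, long durationMillis) {
            this.filter = filter;
            this.expiresAtMillis = startMillis + durationMillis;
        }

        boolean isExpired(long nowMillis) {
            return nowMillis >= expiresAtMillis;
        }

        /** Offers one record seen at nowMillis; returns false once the query has expired. */
        boolean consume(Map<String, String> record, long nowMillis) {
            if (isExpired(nowMillis)) {
                return false; // the query is over; later data is ignored
            }
            if (filter.test(record)) {
                count++;
            }
            return true;
        }

        long result() {
            return count;
        }
    }

    public static void main(String[] args) {
        // A query submitted at t=0 that lives for 1000 ms and counts "click" records.
        CountingQuery query = new CountingQuery(r -> "click".equals(r.get("type")), 0L, 1000L);
        query.consume(Map.of("type", "click"), 100L);  // counted
        query.consume(Map.of("type", "view"), 200L);   // filtered out
        query.consume(Map.of("type", "click"), 1500L); // arrives after expiry, ignored
        System.out.println(query.result()); // prints 1
    }
}
```

Records that flowed past before the query was created are never seen, because nothing is stored for the query to look back on.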

We initially created Bullet as a simple, distributed, grep-like tool to find events in a click stream at Yahoo. The stream contains high-volume user interaction data (1 million events per second). In particular, we use it to validate the instrumentation that generates these events: we interact with the pages ourselves, find our own events in the data stream and check them for the proper key/value pairs. There was nothing as lightweight and cheap as Bullet for this task. There are many other use cases for Bullet, and how you use it depends on your data stream. If you put Bullet on performance metric data, for example, your queries might mostly be finding the 99th percentile of some latency metric.

This project is the core library for Bullet. It lets us implement Bullet on any JVM-based Stream Processor without tying it to a particular one. See Bullet Storm, which uses this library to implement Bullet on Storm, and Bullet Spark, which does the same on Spark Streaming. This code lived inside the Bullet Storm code base up to Bullet Storm version 0.4.3.

Install

Bullet Core is a library written in Java, published to Bintray and mirrored to JCenter. It is meant to be used to implement Bullet on different Stream Processors or to implement a Bullet PubSub. To see the various versions and set up your project for your package manager (Maven, Gradle, etc.), see here.

Usage

Once you have added a dependency for Bullet Core, use our abstractions for the PubSub, Parsing, Querying, Windowing, Partitioning, and Sketching as you need to. In particular, see how we abstract running a Bullet Query. You can also look at our reference implementations in Storm and Spark to get a better idea.

Documentation

All documentation is available at GitHub Pages here.

Contributing

All contributions are welcome! Feel free to submit PRs for bug fixes, improvements or anything else you like! Submit issues and ask questions using GitHub issues as normal, and we will classify them accordingly. See Contributing for a more in-depth policy. We just ask you to respect our Code of Conduct while you're here.

License

Code licensed under the Apache 2 license. See the LICENSE for terms.
