A set of reusable real-time streaming algorithms for big data. They can be used from Spark Streaming, Storm, or any other stream computation framework.
- Plain java API that can be used from any stream computation framework
The following blog posts of mine are a good source of details. They are currently the only detailed documentation.
- http://pkghosh.wordpress.com/2014/09/10/realtime-trending-analysis-with-approximate-algorithms/
- http://pkghosh.wordpress.com/2014/10/05/tracking-web-site-bounce-rate-in-real-time/
- https://pkghosh.wordpress.com/2015/02/19/real-time-detection-of-outliers-in-sensor-data-using-spark-streaming/
- https://pkghosh.wordpress.com/2016/09/19/alarm-flooding-control-with-event-clustering-using-spark-streaming/
- Probabilistic frequent item count with sketch-based and count-based algorithms
- Probabilistic cardinality (unique item count)
- Probabilistic set inclusion
- Different sampling methods
- Windowing, including simple statistics
- Pattern detection
- Event cluster detection
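
To give a feel for the sketch-based frequency counting mentioned in the list above, here is a minimal Count-Min sketch in plain Java. It is only an illustrative sketch and not this project's API: the class name `CountMinSketch`, its methods, and the `depth`, `width`, and `seed` parameters are all hypothetical. The estimate never undercounts an item, but it may overcount when items collide in every row.

```java
import java.util.Random;

/** Illustrative Count-Min sketch; hypothetical class, not part of this project. */
final class CountMinSketch {
    private final int depth;      // number of hash rows
    private final int width;      // number of counters per row
    private final long[][] table; // depth x width counter matrix
    private final int[] seeds;    // one hash seed per row

    CountMinSketch(int depth, int width, long seed) {
        if (depth <= 0 || width <= 0) {
            throw new IllegalArgumentException("depth and width must be positive");
        }
        this.depth = depth;
        this.width = width;
        this.table = new long[depth][width];
        this.seeds = new int[depth];
        Random rnd = new Random(seed);
        for (int i = 0; i < depth; i++) {
            seeds[i] = rnd.nextInt();
        }
    }

    /** Maps an item to a counter index in the given row. */
    private int bucket(String item, int row) {
        int h = item.hashCode() ^ seeds[row];
        // Bit mixing steps borrowed from the MurmurHash3 finalizer
        h ^= h >>> 16;
        h *= 0x85ebca6b;
        h ^= h >>> 13;
        h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return Math.floorMod(h, width);
    }

    /** Records one occurrence of the item. */
    void add(String item) {
        add(item, 1L);
    }

    /** Records count occurrences of the item (count is assumed non-negative). */
    void add(String item, long count) {
        for (int r = 0; r < depth; r++) {
            table[r][bucket(item, r)] += count;
        }
    }

    /** Returns the minimum counter across rows, an upper bound on the true count. */
    long estimate(String item) {
        long min = Long.MAX_VALUE;
        for (int r = 0; r < depth; r++) {
            min = Math.min(min, table[r][bucket(item, r)]);
        }
        return min;
    }

    public static void main(String[] args) {
        CountMinSketch sketch = new CountMinSketch(4, 1024, 42L);
        String[] stream = {"home", "cart", "home", "checkout", "home"};
        for (String page : stream) {
            sketch.add(page);
        }
        System.out.println("approximate count for home: " + sketch.estimate("home"));
    }
}
```

A larger `width` lowers the chance of collisions and a larger `depth` lowers the chance that every row collides, so both trade memory for accuracy.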
The project's resource directory contains tutorial documents for the use cases described in the blog posts.
Please feel free to email me at [email protected]