GitHub - bikash/R2Time: R connector for OpenTSDB: Analyzing large time-series data in R environment using data-intensive capabilities.

R2Time connector for R to Hbase for time-series data stored by OpenTSDB.
Analyzing time-series datasets at a massive scale is one of the biggest challenges that data scientists are facing. This implementation of a tool is used for analyzing large time-series data. It describes a way to analyze the data stored by OpenTSDB. OpenTSDB is an open source distributed and scalable time series database. Currently tools available for time-series analysis are time and memory consuming. Moreover, no single tool exists that specializes on providing an efficient implementations of analyzing time-series data through MapReduce programming model at massive scale. For these reason, we have designed an efficient and distributed computing framework - R2Time.

###Implementation for R2Time:###

Master Thesis: 
http://brage.bibsys.no/xmlui/handle/11250/181819

Published Paper CloudCom 2014:
http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=7037792

##Prerequisite Installation for R2Time.##

RHIPE Installation steps are mention in the following link: https://www.datadr.org/install.html
OpenTSDB Installation steps are mention in the following link: http://opentsdb.net/docs/build/html/installation.html#id1

Check OpenTSDB is runing successfully.

http://127.0.0.1:4242

##Installation of R2Time.##

1. $ git clone https://github.com/bikash/R2Time.git
2. $ cd R2Time
3. $ R CMD INSTALL r2time_1.0.tar.gz

To run R2Time it is necessary to have r2time.jar, hbase.jar, zookeeper.jar, asynchbase.jar to your HDFS location. Using rhput command from RHIPE, we can copy to HDFS location.

$ R
> library(RHIPE)
> rhinit()
> rhput("src_location_hbase_jar", "hdfs_location")
> rhput("src_location_zookeeper_jar", "hdfs_location")
> rhput("src_location_r2time_jar", "hdfs_location")
> rhput("src_location_asynchbase_jar", "hdfs_location")

R2time.jar can be downloaded from GitHUB

###Example:### Now running simple count example in R2Time.

#Load all the necessary libraries
library(r2time)
library(Rhipe)
rhinit()	## Initialize rhipe framework.
library(rJava)
r2t.init()  ## Initialize R2Time  framework.
library(bitops) ## Load library for bits operation, It is used for conversion between float and integer numbers.
library(gtools)


tagk = c("host") ## Tag keys. It could be list
tagv = c("*")	## Tag values. It could be list or can be separate multiple by pipe
metric = 'r2time.load.test1' ## Assign multiple metrics
startdate ='2000/01/01-00:00:00' ## Start datetime of timeseries
enddate ="2003/01/31-10:00:00"   ## END datetime of timeseries
outputdir = "/home/bikash/tmp/mean/ex1.1" ## Output file location , should be in HDFS file system.
jobname= "Calculation for number of DP for 75 million Data points with 2 node" ## Assign relevant job description name.
mapred <- list(mapred.reduce.tasks=0) ## Mapreduce configuration, you can assign number of mapper and reducer for a task. For this case is 0, no reducer is required.
#Location of jar file in HDFS file system. Replace "/home/ekstern/haisen/bikash/tmp/r2time.jar" with your required hdfs_location of jar files.
jars=c("/home/ekstern/haisen/bikash/tmp/r2time.jar","/home/ekstern/haisen/bikash/tmp/zookeeper.jar", "/home/ekstern/haisen/bikash/tmp/hbase.jar")
# This jars need to be in HDFS file system. You can copy jar in HDFS using RHIPE rhput command
 
## Assign Zookeeper configuration. For HBase to read data, zookeeper quorum must be define.
zooinfo=list(zookeeper.znode.parent='/hbase',hbase.zookeeper.quorum='haisen24.ux.uis.no')
 
## running map function
map <- expression({
	library(bitops)
	library(r2time)
	library(gtools)
	len <- 0
	m <- lapply(seq_along(map.values), function(r) {
	 	attr <- names(map.values[[r]]);
		leng <-  length(attr)
		})
	 	rhcollect(1,sum(unlist(m)))
 })

#Reduce function 
reduce <- expression(
	pre={ len <- 0 }, 
	reduce={ len <- len + sum(sapply(reduce.values, function(x) sum(x))) }
	post={ rhcollect(reduce.key, len)
})
 
## Run job.
r2t.job(table='tsdb',sdate=startdate, edate=enddate, metrics=metric, tagk=tagk, tagv=tagv, jars=jars, zooinfo=zooinfo,	output=outputdir, jobname=jobname, mapred=mapred, map=map, reduce=reduce, setup=NULL)
t = rhread(outputdir)

LIST OF FUNCTIONS

r2t.job: Submit job to MapReduce. Input Parameters:

table = name of the table by default 'tsdb'
sdate = Start data of metric
edate = End date of Metric
Metrics= Name of metric
tagk = List of tag keys
tagv = List of tag values
jars = list of jars files in HDFS location
output = Path of output result to be stored in HDFS.
zooinfo = Zookeeper informations
jobname = Name of job by default MapReduce Job
map = name of map function
reduce = name of reduce function, if no reduce assign 0
setup = initialization function need before map function.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

LIST OF FUNCTIONS

About

Releases

Packages

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 38 Commits
R		R
examples		examples
lib		lib
opentsdb		opentsdb
r2time		r2time
src		src
CONTRIBUTORS		CONTRIBUTORS
README.md		README.md
exampleSpark.md		exampleSpark.md
r2time_1.0.tar.gz		r2time_1.0.tar.gz

bikash/R2Time

Folders and files

Latest commit

History

Repository files navigation

LIST OF FUNCTIONS

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages