A few nice functions for everyday use
- Install the devtools package:
install.packages("devtools")
- Load the devtools package:
library(devtools)
- Install the niceR package from GitHub:
install_github("AZhyrenkov/niceR")
- Load the package:
library(niceR)
x %+% y
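The source does not document %+% here, but it reads like the common infix shortcut for paste0; a minimal sketch under that assumption:
`%+%` <- function(x, y) paste0(x, y) # assumed behavior, not confirmed by the package docs
'id_' %+% 42 # "id_42"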
read_query(path)
- reads a single query from a .sql file; only one query per file.
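For example, assuming read_query returns the file contents as a single string (the path is hypothetical):
query <- read_query('sql/users.sql')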
Read all queries from the sql/ directory and execute them:
library(magrittr)
library(purrr)
library(dplyr)
library(RPostgreSQL) # your database driver library
library(stringr)

# db is an open DBI connection created earlier, e.g. with dbConnect()
queries <- list.files('sql', full.names = TRUE)
all_data <- tibble(queries) %>%
  mutate(query = map_chr(queries, read_query),                           # read every query file
         name = str_sub(str_extract(string = queries, '/.+\\.'), 2, -2), # extract the file name as the query name
         result = map(query, ~ dbGetQuery(db, .x)))                      # execute each query
killMySQLConnections()
- prints all open MySQL connections and closes them.
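A typical pattern is cleaning up stale connections before opening a fresh one; the connection details below are placeholders:
library(RMySQL)
killMySQLConnections() # print and close every open MySQL connection
con <- dbConnect(MySQL(), host = 'localhost', dbname = 'analytics',
                 user = 'user', password = 'password')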
write_data_redshift(con, df, name, chunk = 100000)
- con - RPostgreSQL formal-class database connection
- df - data frame with the data to write
- name - name of the target table (the table will be overwritten)
- chunk - chunk size, default 100000
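A usage sketch; the connection details, my_df, and the table name are placeholders:
library(RPostgreSQL)
con <- dbConnect(PostgreSQL(), host = 'example.redshift.amazonaws.com',
                 port = 5439, dbname = 'analytics',
                 user = 'user', password = 'password')
write_data_redshift(con, df = my_df, name = 'my_table_copy', chunk = 100000)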
read_data(con, chunk = 100000, table, order_column = 'Id')
- con - DBI formal-class database connection (RMySQL, RPostgreSQL)
- table - name of the table to read, as a string
- order_column - name of the primary-key column used for sorting when splitting into chunks; must be unique
- chunk - chunk size, default 100000
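For example, assuming con is an open connection and the users table (hypothetical) has a unique Id column:
users <- read_data(con, chunk = 100000, table = 'users', order_column = 'Id')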
Copy data in chunks from a DBI formal-class database connection (tested with MySQL) to Redshift:
copy_data(redshift, mysql, from_table, to_table, chunk = 100000, order_column = 'Id', rewrite = T, test = F)
- redshift - RPostgreSQL formal-class database connection, the target database
- mysql - DBI formal-class database connection (RMySQL, RPostgreSQL), the source database
- from_table - name of the source table, as a string
- to_table - name of the final table, as a string
- order_column - name of the primary-key column used for sorting when splitting into chunks; must be unique
- chunk - chunk size, default 100000
- rewrite - if FALSE, appends data to the existing table instead of overwriting it
- test - if TRUE, the copy is limited to 3 iterations
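A usage sketch, assuming rs_con and my_con are open Redshift and MySQL connections; the table names are placeholders:
copy_data(redshift = rs_con, mysql = my_con,
          from_table = 'orders', to_table = 'orders',
          chunk = 100000, order_column = 'Id',
          rewrite = TRUE, test = FALSE)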
Thanks to lxii for the contribution!
Note: the dataset must be in tidy format, and months must be ordered factor variables. If not, run something like this:
mutate(x, month = factor(month, levels = c('Feb', 'Mar', 'Apr', 'May'), ordered = T))
Using:
getBiathlon(x, length)
- x - the month or time period factor variable
- length - the number of periods to build the biathlon for
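A hypothetical call, assuming sales is a tidy dataset whose month column is the ordered factor described in the note above:
biathlon <- getBiathlon(sales$month, length = 4)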
Note: monthly data must be in columns
make_biatlon(dataset, columns, base_column)
- dataset - input data frame
- columns - character vector with the names of the monthly-data columns; must not contain NAs
- base_column - the column to compare against.
Example:
make_biatlon(ds,month_range$MYM[1:4],threshold)
get_retentions(biathlon)
- biathlon - vector with all finpattern data
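Putting the two calls together, reusing the example above:
biathlon <- make_biatlon(ds, month_range$MYM[1:4], threshold)
retentions <- get_retentions(biathlon)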
pool_ds consists of two columns:
- user_id (key for the distributed row)
- distribution_value - partially missing identifier (branch, region, sales rep, etc.)
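A minimal illustration of the expected shape of pool_ds (the values are made up):
library(tibble)
pool_ds <- tibble(user_id = 1:6,
                  distribution_value = c('East', NA, 'West', NA, 'East', NA))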
Example:
# Distribution
distribution <- distribute_honestly(pool_ds)
rm(pool_ds)
Gathering results:
# how many rows per distribution_value
capacity <- distribution$capacity
# final dataset
distribution_result <- distribution$distribution_result
Reusing the computed capacity for a subsequent pool (a fresh pool_ds, since the original one was removed above):
distribution <- distribute(pool_ds, capacity = capacity)
Log errors and warnings to a Slack channel.
Thanks to lxii for the contribution!
Using: errorHandler(code, prefix)
Example:
library(slackr)
slackr_setup(config_file = '.slackr', echo = TRUE) # configure the slackr bot
errorHandler({
  # some code, e.g. a connection to a database
}, prefix = 'Connection to database is broken')