A few nice functions for everyday use
- Install the devtools package:
install.packages("devtools")
- Load the devtools package:
library(devtools)
- Install the niceR package from GitHub:
install_github("AZhyrenkov/niceR")
- Load the package:
library(niceR)
x %+% y
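The source does not document %+% here, but it reads like the common infix shortcut for paste0; a minimal sketch under that assumption:
`%+%` <- function(x, y) paste0(x, y) # assumed behavior, not confirmed by the package docs
'id_' %+% 42 # "id_42"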
read_query(path)
- reads a single query from a .sql file; only one query per file.
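For example, assuming read_query returns the file contents as a single string (the path is hypothetical):
query <- read_query('sql/users.sql')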
Read all queries from the sql/ directory and execute them:
library(magrittr)
library(purrr)
library(dplyr)
library(RPostgreSQL) # your database driver library
library(stringr)

# db is an open DBI connection created earlier, e.g. with dbConnect()
queries <- list.files('sql', full.names = TRUE)
all_data <- tibble(queries) %>%
  mutate(query = map_chr(queries, read_query),                           # read every query file
         name = str_sub(str_extract(string = queries, '/.+\\.'), 2, -2), # extract the file name as the query name
         result = map(query, ~ dbGetQuery(db, .x)))                      # execute each query
killMySQLConnections()
- prints all open MySQL connections and closes them.
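A typical pattern is cleaning up stale connections before opening a fresh one; the connection details below are placeholders:
library(RMySQL)
killMySQLConnections() # print and close every open MySQL connection
con <- dbConnect(MySQL(), host = 'localhost', dbname = 'analytics',
                 user = 'user', password = 'password')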
write_data_redshift(con, df, name, chunk = 100000)
- con - RPostgreSQL formal-class database connection
- df - data frame with the data to write
- name - name of the target table (the table will be overwritten)
- chunk - chunk size, default 100000
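A usage sketch; the connection details, my_df, and the table name are placeholders:
library(RPostgreSQL)
con <- dbConnect(PostgreSQL(), host = 'example.redshift.amazonaws.com',
                 port = 5439, dbname = 'analytics',
                 user = 'user', password = 'password')
write_data_redshift(con, df = my_df, name = 'my_table_copy', chunk = 100000)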
read_data(con, chunk = 100000, table, order_column = 'Id')
- con - DBI formal-class database connection (RMySQL, RPostgreSQL)
- table - name of the table to read, as a string
- order_column - name of the primary-key column used for sorting when splitting into chunks; must be unique
- chunk - chunk size, default 100000
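For example, assuming con is an open connection and the users table (hypothetical) has a unique Id column:
users <- read_data(con, chunk = 100000, table = 'users', order_column = 'Id')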
Copy data in chunks from a DBI formal-class database connection (tested with MySQL) to Redshift:
copy_data(redshift, mysql, from_table, to_table, chunk = 100000, order_column = 'Id', rewrite = T, test = F)
- redshift - RPostgreSQL formal-class database connection, the target database
- mysql - DBI formal-class database connection (RMySQL, RPostgreSQL), the source database
- from_table - name of the source table, as a string
- to_table - name of the final table, as a string
- order_column - name of the primary-key column used for sorting when splitting into chunks; must be unique
- chunk - chunk size, default 100000
- rewrite - if FALSE, appends data to the existing table instead of overwriting it
- test - if TRUE, the copy is limited to 3 iterations
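A usage sketch, assuming rs_con and my_con are open Redshift and MySQL connections; the table names are placeholders:
copy_data(redshift = rs_con, mysql = my_con,
          from_table = 'orders', to_table = 'orders',
          chunk = 100000, order_column = 'Id',
          rewrite = TRUE, test = FALSE)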
Thanks to lxii for the contribution!
Note: the dataset must be in tidy format, and months must be ordered factor variables. If not, run something like this:
mutate(x, month = factor(month, levels = c('Feb', 'Mar', 'Apr', 'May'), ordered = T))
Using:
getBiathlon(x, length)
- x - the month or time period factor variable
- length - the number of periods to build the biathlon for
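A hypothetical call, assuming sales is a tidy dataset whose month column is the ordered factor described in the note above:
biathlon <- getBiathlon(sales$month, length = 4)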
Note: monthly data must be in columns
make_biatlon(dataset, columns, base_column)
- dataset - input data frame
- columns - character vector with the names of the monthly-data columns; must not contain NAs
- base_column - the column to compare against.
Example:
make_biatlon(ds,month_range$MYM[1:4],threshold)
get_retentions(biathlon)
- biathlon - vector with all finpattern data
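Putting the two calls together, reusing the example above:
biathlon <- make_biatlon(ds, month_range$MYM[1:4], threshold)
retentions <- get_retentions(biathlon)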
pool_ds consists of two columns:
- user_id (key for the distributed row)
- distribution_value - partially missing identifier (branch, region, sales rep, etc.)
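A minimal illustration of the expected shape of pool_ds (the values are made up):
library(tibble)
pool_ds <- tibble(user_id = 1:6,
                  distribution_value = c('East', NA, 'West', NA, 'East', NA))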
Example:
# Distribution
distribution <- distribute_honestly(pool_ds)
rm(pool_ds)
Gathering results:
# how many rows per distribution_value
capacity <- distribution$capacity
# final dataset
distribution_result <- distribution$distribution_result
Reusing the computed capacity for a subsequent pool (a fresh pool_ds, since the original one was removed above):
distribution <- distribute(pool_ds, capacity = capacity)
Log errors and warnings to a Slack channel.
Thanks to lxii for the contribution!
Using: errorHandler(code, prefix)
Example:
library(slackr)
slackr_setup(config_file = '.slackr', echo = TRUE) # configure the slackr bot
errorHandler({
  # some code, e.g. a connection to a database
}, prefix = 'Connection to database is broken')