ROBUSTA

Author: Eitan Hemed

robusta is a statistics package in Python3 providing an interface to many common statistical analyses, performed using through R and RPY2.

PLEASE NOTE robusta is under active development and is supplied as-is with no guarantees.

Installation

Install with pip using pip install robusta-stats, see also Installation.

Documentation

See here.

Usage

For the most recent, thorough tutorial in the different features of robusta, head on to Google Colab.

Some of the features are shown below.

Importing the library and loading data

This could take ~15 seconds as many R libraries are imported under the hood. If you begin with an empty R environment the first you import robusta should take at least a couple of minutes, as R dependencies will be installed.

import robusta as rst

First, define a helper function used to pretty-print output of dataframes when converting the notebook to .md (credit).

from tabulate import tabulate
import IPython.display as d

def md_print_df(df):
    md = tabulate(df, headers='keys', tablefmt='pipe')
    md = md.replace('|    |','| %s |' % (df.index.name if df.index.name else ''))
    return d.Markdown(md)

First off, we need data. Using robusta we can import R built-in and some imported datasets. You can get a full list of the datasets, similarly to calling to data() with no input arguments in R.

md_print_df(rst.get_available_datasets().tail())

	Package	Item	Description
284	ARTool	Higgins1990Table5	Split-plot Experiment Examining Effect of Moisture and Fertilizer on Dry Matter in Peat Pots
285	ARTool	Higgins1990Table1.art	Aligned Rank Transformed Version of Higgins1990Table1
286	ARTool	Higgins1990Table1	Synthetic 3x3 Factorial Randomized Experiment
287	ARTool	ElkinABC	Synthetic 2x2x2 Within-Subjects Experiment
288	ARTool	ElkinAB	Synthetic 2x2 Within-Subjects Experiment

We can import a dataset using rst.load_dataset

iris = rst.load_dataset('iris')
md_print_df(iris.head())

	dataset_rownames	Sepal.Length	Sepal.Width	Petal.Length	Petal.Width	Species
0	1	5.1	3.5	1.4	0.2	setosa
1	2	4.9	3	1.4	0.2	setosa
2	3	4.7	3.2	1.3	0.2	setosa
3	4	4.6	3.1	1.5	0.2	setosa
4	5	5	3.6	1.4	0.2	setosa

Running statistical analyses

Analyses are performed through using designated model objects that also store the . The model objects are returned through calls to the function API. In this example we create a model (m) object by calling t2samples. m will be used to fit the statistical model, returning the results object.

Here is a paired-samples t-test using the Students' sleep dataset previously loaded:

# Create the model
m = rst.groupwise.T2Samples(
    data=rst.load_dataset('sleep'), independent='group', 
    dependent='extra', subject='ID', paired=True, tail='less')

# Dataframe format of the results
md_print_df(m.report_table())

	t	df	p-value	Cohen-d Low	Cohen-d	Cohen-d High
1	-4.06213	9	0.00141645	-2.11801	-1.28456	-0.414622

# Textual report of the results - copy and paste into your results section!
m.report_text()

't(9) = -4.06, p = 0.001'

We can reset the models in order to update the model parameters and re-fit it. In this example, we run the same model an an independent samples t-test:

m.reset(paired=False, assume_equal_variance=True, refit=True)
md_print_df(m.report_table())

	t	df	p-value	Cohen-d Low	Cohen-d	Cohen-d High
1	-1.86081	18	0.0395934	-1.73882	-0.832181	0.0954595

Bayesian t-tests

bayes_t2samples and bayes_t1sample allow you to calculate Bayes factors or sample from the posterior distribution:

m = rst.groupwise.BayesT2Samples(
        data=rst.load_dataset('mtcars'), subject='dataset_rownames',
        dependent='mpg', independent='am', prior_scale=0.5,
        paired=False)

md_print_df(m.report_table())

	model	bf	error
0	Alt., r=0.5	71.3861	7.97835e-07

# Test different null intervals and prior values:
m.reset(prior_scale=0.1, null_interval=[0, 0.5], refit=True)
print(f'{m.report_text()}\n\n')
md_print_df(m.report_table())

Alt., r=0.1 [BF1:0 = 18.64, Error = 0.001%]

	model	bf	error
0	Alt., r=0.1	18.6411	2.33663e-05

Analysis of variance

use Anova to run between, within or mixed-design ANOVA, we load the anxiety dataset for the next demonstrations.

For non-parametric ANOVAs see KruskalWallisTest, FriedmanTest and AlignedRanksTest

# Load the dataset and modify it from a 'wide' to 'long' format dataframe
anxiety = rst.load_dataset('anxiety').set_index(['id', 'group']
                                           ).filter(regex='^t[1-3]$').stack().reset_index().rename(
    columns={0: 'score',
             'level_2': 'time'})
md_print_df(anxiety.head())

	id	group	time	score
0	1	grp1	t1	14.1
1	1	grp1	t2	14.4
2	1	grp1	t3	14.1
3	2	grp1	t1	14.5
4	2	grp1	t2	14.6

m = rst.groupwise.Anova(
        data=anxiety, subject='id',
        dependent='score', between='group', within='time')
md_print_df(m.report_table())

R[write to console]: Contrasts set to contr.sum for the following variables: group

	Term	p-value	Partial Eta-Squared	F	df1	df2
1	group	0.019	0.172	4.35	2	42
2	time	0.001	0.904	394.91	1.79	75.24
3	group:time	0.001	0.84	110.19	3.58	75.24

Similarly, we run the model usign only the between subject term (group). As the model was already generated we can simpyl drop the within-subject term:

m.reset(within=None, refit=True)
md_print_df(m.report_table())

R[write to console]: Contrasts set to contr.sum for the following variables: group

	Term	p-value	Partial Eta-Squared	F	df1	df2
1	group	0.019	0.172	4.35	2	42

R and many other statistical packages (e.g., statsmodels support a formula interface to fit statistical models. Here it is shown that a model can also be specified by the formula kwargs rather than specifying dependent, between etc. The formula indicates that the score column is regressed by the time variable, with observations nested within the id column.

m.reset(formula='score~time|id', refit=True)
md_print_df(m.report_table())

	Term	p-value	Partial Eta-Squared	F	df1	df2
1	time	0.001	0.601	66.23	1.15	50.55

We can also run a similar, bayesian ANOVA using BayesAnova comparing the specified terms to the null model:

m = rst.groupwise.BayesAnova(data=anxiety, within='time',
                             dependent='score', subject='id')
md_print_df(m.report_table())

	model	bf	error
0	time	496.129	7.82496e-05

Work in progress and planned features

robusta includes several other features that are either under development or planned for the future.

Currently under work

Regressions and correlations modules

Planned

Sequential analysis plots (inspired by JASP)

How to contribute

All help is welcome, but currently there are no specific guidelines. Please contact Eitan Hemed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

README.md

ROBUSTA

Author: Eitan Hemed

Installation

Documentation

Usage

For the most recent, thorough tutorial in the different features of robusta, head on to Google Colab.

Importing the library and loading data

Running statistical analyses

Bayesian t-tests

Analysis of variance

Work in progress and planned features

How to contribute

Files

README.md

Latest commit

History

README.md

File metadata and controls

ROBUSTA

Author: Eitan Hemed

Installation

Documentation

Usage

For the most recent, thorough tutorial in the different features of robusta, head on to Google Colab.

Importing the library and loading data

Running statistical analyses

Bayesian t-tests

Analysis of variance

Work in progress and planned features

How to contribute