Clr #1

Open
wants to merge 7 commits into
base: master
Binary file added .RData
Binary file not shown.
4 changes: 3 additions & 1 deletion DESCRIPTION
@@ -13,4 +13,6 @@ Imports: compositions,
robCompositions,
testthat,
ggplot2
RoxygenNote: 7.1.1
RoxygenNote: 7.1.0
Suggests: knitr, rmarkdown
VignetteBuilder: knitr
152 changes: 10 additions & 142 deletions README.md
@@ -1,169 +1,37 @@

# The `deltacomp` package

Functions to analyse compositional data and produce predictions (with confidence intervals) for relative increases and decreases in the compositional components
The functions in the `deltacomp` package produce predictions (with confidence intervals) for relative increases and decreases in the compositional parts.

## 1. Background
The development of the package was initiated by Ty Stanford and Dorothea Dumuid in 2018 and is still under development. Changes and corrections are expected to be made during 2021.

For an outcome variable `Y`, *D* compositional variables (`x_1, ..., x_D`) and *C* covariates (`z_1, ..., z_C`), this package fits the compositional data analysis model (notation inexact):

## Installing `deltacomp`

`Y = b_0 + b_1 ilr_1 + ... + b_{D-1} ilr_{D-1} + a_1 z_1 + ... + a_C z_C + e`

where `ilr_i` are the *D-1* isometric log ratio variables derived from the *D* compositional variables (`x_1, ..., x_D`), `b_0, ..., b_{D-1}, a_1, ..., a_C` are *D+C* parameters to be estimated and `e ~ N(0, sigma)` is the error. Based on this model, the package then makes predictions for alterations of the time-use variables (the linearly dependent set of compositional components).
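As a rough illustration of the model above, the following base R sketch builds *D-1* ilr coordinates by hand and fits the linear model with `lm()`. The helper `ilr_pivot()` and the simulated data are assumptions made for illustration only. They are not part of `deltacomp`, and the ilr basis the package uses internally may differ from the pivot-style basis chosen here.

```R
# Hypothetical helper (not part of deltacomp): pivot-style ilr coordinates.
# Coordinate i contrasts part i with the geometric mean of parts i+1, ..., D.
ilr_pivot <- function(x) {
  x <- as.matrix(x)
  D <- ncol(x)
  sapply(seq_len(D - 1), function(i) {
    rest <- x[, (i + 1):D, drop = FALSE]
    r <- D - i
    sqrt(r / (r + 1)) * (log(x[, i]) - rowMeans(log(rest)))
  })
}

# Simulated data (an assumption): D = 3 compositional parts, C = 1 covariate
set.seed(1)
n <- 50
raw <- matrix(runif(n * 3, 0.1, 1), ncol = 3)
comp <- raw / rowSums(raw)            # each row sums to 1
z <- rnorm(n)

ilrs <- ilr_pivot(comp)
colnames(ilrs) <- c("ilr_1", "ilr_2")
y <- 1 + 0.5 * ilrs[, "ilr_1"] - 0.3 * ilrs[, "ilr_2"] + 0.2 * z + rnorm(n, sd = 0.1)
dat <- data.frame(y = y, ilrs, z = z)

# Y = b_0 + b_1 ilr_1 + b_2 ilr_2 + a_1 z_1 + e, i.e. D + C = 4 coefficients
fit <- lm(y ~ ilr_1 + ilr_2 + z, data = dat)
coef(fit)
```

The fitted object has one intercept, *D-1* ilr coefficients and *C* covariate coefficients, matching the *D+C* parameter count stated above.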


For a starting point to learn about compositional data analysis please see [Aitchison (1982)](https://doi.org/10.1111/j.2517-6161.1982.tb01195.x) or [van den Boogaart and Tolosana-Delgado (2013)](https://link.springer.com/book/10.1007%2F978-3-642-36809-7). However the articles [Dumuid et al. (2017a)](https://doi.org/10.1177/0962280217710835) and [Dumuid et al. (2017b)](https://doi.org/10.1177%2F0962280217737805) may be more approachable introductions.


## 2. Reallocation of time-use component options

Please note that 'mean composition' refers to the geometric mean on the compositional simplex and *not* the arithmetic mean. If these words have little meaning to you, that is no problem, as these differently calculated means likely do not differ much in your dataset. `deltacomp` only uses the simplex geometric mean in its calculations from version 0.2.0 onwards.

### 2.1. Option `comparisons = "prop-realloc"`

For information on outcome prediction with time-use exchange between one component and the remaining compositional components proportionally (i.e., the `comparisons = "prop-realloc"` option of the `predict_delta_comps()` function), please see [Dumuid et al. (2017a)](https://doi.org/10.1177/0962280217710835).

### 2.1.1. Example

Suppose you have three (predictor) components that sum to 1 (i.e., one day) and are used to predict an outcome variable. The three components are `sedentary`, `sleep` and `activity`. Let's assume the mean sampled composition is:

* `sedentary = 0.5` (i.e., half a day)
* `sleep = 0.3` (i.e., 30% of a day)
* `activity = 0.2` (i.e., 20% of a day)

If you wanted to predict the change in the outcome variable when `delta = +0.05` (5% of the day) is added to `sedentary` in the above mean composition, the option `comparisons = "prop-realloc"` reduces the remaining components by a total of 5% of the day, split in proportion to their mean values, as illustrated below:

* `sedentary* = 0.5 + delta = 0.5 + 0.05 = 0.55`
* `sleep* = 0.3 - delta * sleep / (sleep + activity) = 0.3 - 0.05 * 0.3 / (0.3 + 0.2) = 0.3 - 0.03 = 0.27`
* `activity* = 0.2 - delta * activity / (sleep + activity) = 0.2 - 0.05 * 0.2 / (0.3 + 0.2) = 0.2 - 0.02 = 0.18`

Note that the new composition still sums to 1: `sedentary* + sleep* + activity* = 0.55 + 0.27 + 0.18 = 1`.

Note that, for the example above, the option `comparisons = "prop-realloc"` in `predict_delta_comps()` will automatically produce separate predictions for `delta = +0.05` on each of the components against the remaining components, i.e., not only the `sedentary* = 0.5 + delta` scenario illustrated above but also the `sleep* = 0.3 + delta` and `activity* = 0.2 + delta` cases.
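The proportional reallocation arithmetic in this example can be reproduced with a few lines of base R. This is only a sketch: `prop_realloc()` is a hypothetical helper written for illustration, not a function exported by `deltacomp`, and it works on a single composition rather than a dataset.

```R
# Hypothetical helper (not part of deltacomp): add `delta` to one component
# and remove it from the others in proportion to their current values.
prop_realloc <- function(comp, target, delta) {
  others <- setdiff(names(comp), target)
  new_comp <- comp
  new_comp[target] <- comp[target] + delta
  new_comp[others] <- comp[others] - delta * comp[others] / sum(comp[others])
  new_comp
}

mean_comp <- c(sedentary = 0.5, sleep = 0.3, activity = 0.2)

# The scenario worked through above: +0.05 on sedentary
new_comp <- prop_realloc(mean_comp, "sedentary", 0.05)
new_comp

# One reallocation per component, mirroring what "prop-realloc" considers
all_scenarios <- t(sapply(names(mean_comp), function(nm) prop_realloc(mean_comp, nm, 0.05)))
all_scenarios
```

Each row of `all_scenarios` is a reallocated composition that still sums to 1.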

### 2.2. Option `comparisons = "one-v-one"`

For information on outcome prediction with time-use exchange between two compositional components (i.e., the `comparisons = "one-v-one"` option of the `predict_delta_comps()` function), please see
[Dumuid et al. (2017b)](https://doi.org/10.1177%2F0962280217737805).

### 2.2.1. Example

Similarly to the previous example, suppose you have three (predictor) components that sum to 1 (i.e., one day) and are used to predict an outcome variable. The three components are `sedentary`, `sleep` and `activity`. Let's assume the mean sampled composition is:

* `sedentary = 0.5` (i.e., half a day)
* `sleep = 0.3` (i.e., 30% of a day)
* `activity = 0.2` (i.e., 20% of a day)

If you wanted to predict the change in the outcome variable from the above mean composition with `delta = +0.05` (5% of the day), the option `comparisons = "one-v-one"` looks at all pairwise exchanges between the components `(sedentary*, sleep*, activity*)`:

* `(0.5 + 0.05, 0.3 - 0.05, 0.2 )`
* `(0.5 + 0.05, 0.3 , 0.2 - 0.05)`
* `(0.5 , 0.3 + 0.05, 0.2 - 0.05)`
* `(0.5 - 0.05, 0.3 + 0.05, 0.2 )`
* `(0.5 - 0.05, 0.3 , 0.2 + 0.05)`
* `(0.5 , 0.3 - 0.05, 0.2 + 0.05)`
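The six pairwise exchanges listed above can be enumerated with a short base R sketch. The helper `one_v_one()` is hypothetical and written only for illustration. It is not the package's implementation, and the row order may differ from the list above.

```R
# Hypothetical helper (not part of deltacomp): move `delta` from one component
# to another, for every ordered pair of distinct components.
one_v_one <- function(comp, delta) {
  pairs <- expand.grid(
    from = names(comp), to = names(comp),
    stringsAsFactors = FALSE
  )
  pairs <- pairs[pairs$from != pairs$to, ]
  rows <- lapply(seq_len(nrow(pairs)), function(i) {
    new_comp <- comp
    new_comp[pairs$to[i]] <- new_comp[pairs$to[i]] + delta
    new_comp[pairs$from[i]] <- new_comp[pairs$from[i]] - delta
    new_comp
  })
  out <- do.call(rbind, rows)
  rownames(out) <- paste0("+", pairs$to, " / -", pairs$from)
  out
}

mean_comp <- c(sedentary = 0.5, sleep = 0.3, activity = 0.2)
exchanges <- one_v_one(mean_comp, 0.05)
exchanges
```

With *D* components there are `D * (D - 1)` such exchanges, so 6 rows for the three components here, and each row still sums to 1.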


### 2.3. Option `comparisons = "one-v-all"`

Deprecated.


## 3. Datasets in package

Two datasets are supplied with the package:

* `fairclough` and
* `fat_data`.

The `fairclough` dataset was kindly provided by the authors of [Fairclough et al. (2017)](https://doi.org/10.1186/s12966-017-0521-z). `fat_data` is a randomly generated test dataset that might roughly mimic a real dataset.

## 4. Example usage
Run the following code to install and load the `deltacomp` package:

```R
library(devtools) # see https://www.r-project.org/nosvn/pandoc/devtools.html
devtools::install_github('tystan/deltacomp')
library(deltacomp)
### see help file to run example
?predict_delta_comps

predict_delta_comps(
dataf = fat_data,
y = "fat",
comps = c("sl", "sb", "lpa", "mvpa"),
covars = c("sibs", "parents", "ed"),
deltas = seq(-60, 60, by = 5) / (24 * 60),
comparisons = "prop-realloc",
alpha = 0.05
)

# OR

predict_delta_comps(
dataf = fat_data,
y = "fat",
comps = c("sl", "sb", "lpa", "mvpa"),
covars = c("sibs", "parents", "ed"),
deltas = seq(-60, 60, by = 5) / (24 * 60),
comparisons = "one-v-one",
alpha = 0.05
)

```


## 5. Output and plotting results

Output is a `data.frame` that can be turned into the plot below using the following code.
Run the following code to see the help files:

```R

pred_df <-
predict_delta_comps(
dataf = fairclough,
y = "z_bmi",
comps = c("sleep", "sed", "lpa", "mvpa"),
covars = c("decimal_age", "sex"),
# careful: deltas greater than 25 min in magnitude induce negative compositions
# predict_delta_comps() will warn you about this :-)
deltas = seq(-20, 20, by = 5) / (24 * 60),
comparisons = "prop-realloc", # or try "one-v-one"
alpha = 0.05
)

plot_delta_comp(
pred_df, # provide the returned object from predict_delta_comps()
# x-axis can be converted from proportion of composition to meaningful units
comp_total = 24 * 60, # minutes available in the composition
units_lab = "min" # just a label for plotting
)


?predict_delta_comps
```

## How to use `deltacomp`?

![](https://github.com/tystan/deltacomp/blob/master/inst/img/delta_comps2.png)


### 5.1. Prediction for the mean composition

The function `predict_delta_comps()` now outputs the predicted outcome value for the mean composition (with `100 * (1 - alpha)`% confidence interval). This is printed to the console and can also be extracted from the output of `predict_delta_comps()` using the code below:
Please see the package vignette for examples of what the `deltacomp` package can do. To view, run the following:

```R

# produces a 1 line data.frame that contains
# the (simplex/geometric) mean composition,
# the "average" covariates (the median of the factor variables in order of the levels are taken as default),
# the ilr coords of the (simplex/geometric) mean composition, and
# the predicted outcome value with 100*(1-alpha)% confidence interval
attr(pred_df, "mean_pred")


vignette("deltacomp vignette")
```


## 6. Release notes
## Release notes

See [/change-notes.md](https://github.com/tystan/deltacomp/blob/master/change-notes.md).

10 changes: 10 additions & 0 deletions tests/testthat/test_create_seq_bin_part.R
@@ -0,0 +1,10 @@
context("create_seq_bin_part() checks")

test_that("create_seq_bin_part() throws error if wrong inputs", {

expect_error(create_seq_bin_part("b"))
expect_error(create_seq_bin_part(c(2,3)))

})


28 changes: 28 additions & 0 deletions tests/testthat/test_extract_lm_quantities.R
@@ -0,0 +1,28 @@
context("extract_lm_quantities() checks")

x <- runif(10)
y <- 3 * x + 7 + rnorm(10)
example_lm1 <- lm(y ~ x)

test_that("extract_lm_quantities() correctly throws errors for bad input", {

expect_error(
extract_lm_quantities(example_lm1, "alpha")
)

expect_error(
extract_lm_quantities(y~x, "alpha")
)

})

test_that("extract_lm_quantities() is a list", {

expect_output(str(extract_lm_quantities(example_lm1)), "List of 5")

# use TRUE rather than T: T is an ordinary variable that can be reassigned
expect_output(str(extract_lm_quantities(example_lm1)), "List of 5", fixed = TRUE)

})


