-
Notifications
You must be signed in to change notification settings - Fork 0
/
README.Rmd
116 lines (81 loc) · 3.33 KB
/
README.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
---
output: github_document
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-",
out.width = "100%"
)
```
# bigsimr <a href='https://github.com/SchisslerGroup/r-bigsimr'><img src='man/figures/logo.png' align="right" height="139" /></a>
`bigsimr` is an R package for simulating high-dimensional multivariate data with a target correlation and arbitrary marginal distributions via Gaussian copula. It utilizes [Bigsimr.jl](https://github.com/SchisslerGroup/Bigsimr.jl) for its core routines. For full documentation and examples, please see the [Bigsimr.jl docs](https://schisslergroup.github.io/Bigsimr.jl/stable/).
## Features
* **Pearson matching** - employs a matching algorithm (Xiao and Zhou 2019) to account for the non-linear transformation in the Normal-to-Anything (NORTA) step
* **Spearman and Kendall matching** - Use explicit transformations (Lebrun and Dutfoy 2009)
* **Nearest Correlation Matrix** - Calculate the nearest positive [semi]definite correlation matrix (Qi and Sun 2006)
* **Fast Approximate Correlation Matrix** - Calculate an approximation to the nearest positive definite correlation matrix
* **Random Correlation Matrix** - Generate random positive [semi]definite correlation matrices
* **Fast Multivariate Normal Generation** - Utilize multithreading to generate multivariate normal samples in parallel
## Installation
You can install the release version of the package from GitHub:
```r
remotes::install_github("SchisslerGroup/r-bigsimr")
```
To get a bug fix or to use a new feature, you can install the development version from GitHub:
```r
remotes::install_github("SchisslerGroup/r-bigsimr", ref="develop")
```
Note that the first invocation of `bigsimr::bigsimr_setup()` will install both Julia
and the required packages if they are missing. If you wish to have it use an existing Julia binary, make sure that `julia` is found in the path. For more information see the `julia_setup()` function from [JuliaCall](https://github.com/Non-Contradiction/JuliaCall).
## Usage
```{r, message=FALSE, warning=FALSE}
library(bigsimr)
bs <- bigsimr_setup()
dist <- distributions_setup()
set.seed(2024-02-20)
```
### Examples
Pearson matching
```{r}
(target_corr <- bs$cor_randPD(3))
margins <- c(
dist$Binomial(20, 0.2),
dist$Beta(2, 3),
dist$LogNormal(3, 1)
)
(adjusted_corr <- bs$pearson_match(target_corr, margins))
x <- bs$rvec(100000, adjusted_corr, margins)
bs$cor(x, bs$Pearson)
```
Spearman/Kendall matching
```{r}
(spearman_corr <- bs$cor_randPD(3))
(adjusted_corr <- bs$cor_convert(spearman_corr, bs$Spearman, bs$Pearson))
x <- bs$rvec(100000, adjusted_corr, margins)
bs$cor(x, bs$Spearman)
```
Nearest correlation matrix
```{r}
s <- bs$cor_randPSD(200)
r <- bs$cor_convert(s, bs$Spearman, bs$Pearson)
bs$is_correlation(r)
```
```{r}
p <- bs$cor_nearPD(r)
bs$is_correlation(p)
```
Fast approximate nearest correlation matrix
```{r}
s <- bs$cor_randPSD(2000)
r <- bs$cor_convert(s, bs$Spearman, bs$Pearson)
bs$is_correlation(r)
```
```{r}
p <- bs$cor_fastPD(r)
bs$is_correlation(p)
```
## Issues
This package is just a wrapper for the Julia package. Please file any bug reports or feature requests over at the [Bigsimr.jl](https://github.com/SchisslerGroup/Bigsimr.jl/issues) package repo.