---
output: github_document
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  fig.align = "center",
  out.width = "100%"
)
```
# Rdimtools <a href='https://www.kisungyou.com/Rdimtools/'><img src='man/figures/logo.png' align="right" height="139" /></a>
<!-- badges: start -->
[![CRAN_Status_Badge](https://www.r-pkg.org/badges/version/Rdimtools?color=green)](https://cran.r-project.org/package=Rdimtools)
[![Travis-CI Build Status](https://travis-ci.org/kisungyou/Rdimtools.svg?branch=master)](https://travis-ci.org/kisungyou/Rdimtools)
[![](https://cranlogs.r-pkg.org/badges/Rdimtools)](https://cran.r-project.org/package=Rdimtools)
<!-- badges: end -->
```{r echo=FALSE, include=FALSE}
library(Rdimtools)
ndo = (sum(unlist(lapply(ls("package:Rdimtools"), startsWith, "do."))))
nest = (sum(unlist(lapply(ls("package:Rdimtools"), startsWith, "est."))))
```
**Rdimtools** is an R package for dimension reduction (DR), including feature selection and manifold learning, and for intrinsic dimension estimation (IDE). We aim to build one of the *most comprehensive* toolboxes available online; the current version delivers `r ndo` DR algorithms and `r nest` IDE methods.
The philosophy is simple: **the more we have at hand, the better we can play**.
## Elephant
Our logo reflects the foundational nature of multivariate data analysis: like the blind men and the [elephant](https://en.wikipedia.org/wiki/Blind_men_and_an_elephant), we wrangle the data to grasp what it looks like, using the partial information that each algorithm provides.
## Installation
You can install a release version from CRAN:
```r
install.packages("Rdimtools")
```
or the development version from GitHub:
```r
## install.packages("devtools")
devtools::install_github("kisungyou/Rdimtools")
```
## Minimal Example : Dimension Reduction
Here is an example of dimension reduction on the famous `iris` dataset. We compare Principal Component Analysis (`do.pca`), Laplacian Score (`do.lscore`), and Diffusion Maps (`do.dm`), which represent the families of algorithms for linear reduction, feature extraction, and nonlinear reduction, respectively.
```{r message=FALSE, warning=FALSE, fig.align='center', fig.width=7}
# load the library
library(Rdimtools)
# load the data
X = as.matrix(iris[,1:4])
lab = as.factor(iris[,5])
# run 3 algorithms mentioned above
mypca = do.pca(X, ndim=2)
mylap = do.lscore(X, ndim=2)
mydfm = do.dm(X, ndim=2, bandwidth=10)
# visualize
par(mfrow=c(1,3))
plot(mypca$Y, pch=19, col=lab, xlab="axis 1", ylab="axis 2", main="PCA")
plot(mylap$Y, pch=19, col=lab, xlab="axis 1", ylab="axis 2", main="Laplacian Score")
plot(mydfm$Y, pch=19, col=lab, xlab="axis 1", ylab="axis 2", main="Diffusion Maps")
```
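To make the idea of linear reduction concrete, here is a minimal base-R sketch of the projection that PCA performs on the same data. It is an illustration under the usual textbook definition (eigendecomposition of the sample covariance) and is not necessarily how `do.pca` is implemented; the variable names `Xc`, `W`, and `Y` are chosen here for illustration only.

```r
# Textbook PCA in base R (illustrative sketch, not the Rdimtools implementation)
X  <- as.matrix(iris[, 1:4])

# center each column so that the covariance describes spread around the mean
Xc <- scale(X, center = TRUE, scale = FALSE)

# eigenvectors of the sample covariance are the principal directions
eig <- eigen(cov(Xc), symmetric = TRUE)
W   <- eig$vectors[, 1:2]   # keep the two leading directions

# project the centered data onto those directions: a 150 x 2 embedding
Y <- Xc %*% W
dim(Y)
```

The two columns of `Y` are uncorrelated by construction, which is the sense in which PCA finds orthogonal axes of maximal variance.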
## Minimal Example : Dimension Estimation
![](https://people.cs.uchicago.edu/~dinoj/manifold/swissroll.gif)
The Swiss Roll is a classic example of a 2-dimensional manifold embedded in $\mathbb{R}^3$ and one of 11 famous model-based samples available from the `aux.gensamples()` function. Given the ground truth that $d=2$, let's apply several methods for intrinsic dimension estimation.
```{r message=FALSE, warning=FALSE, fig.align='center', fig.width=7, fig.height=3}
# generate sample data
set.seed(100)
roll = aux.gensamples(dname="swiss")
# we will compare 5 methods (out of 17 methods from version 1.0.0)
vecd = rep(0,5)
vecd[1] = est.Ustat(roll)$estdim # convergence rate of U-statistic on manifold
vecd[2] = est.correlation(roll)$estdim # correlation dimension
vecd[3] = est.made(roll)$estdim # manifold-adaptive dimension estimation
vecd[4] = est.mle1(roll)$estdim # MLE with Poisson process
vecd[5] = est.twonn(roll)$estdim # minimal neighborhood information
# let's visualize
plot(1:5, vecd, type="b", ylim=c(1.5,2.5),
     main="true dimension is d=2",
     xaxt="n",xlab="",ylab="estimated dimension")
xtick = seq(1,5,by=1)
axis(side=1, at=xtick, labels = FALSE)
text(x=xtick, par("usr")[3],
     labels = c("Ustat","correlation","made","mle1","twonn"), pos=1, xpd = TRUE)
```
We can observe that all 5 methods estimated the intrinsic dimension to be around $d=2$. Note that the estimated dimension need not be integer-valued, owing to the characteristics of each method.
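To see why an estimate is typically not an integer, here is a simplified base-R sketch in the spirit of a two-nearest-neighbor estimator. It is an assumption-laden illustration, not the code behind `est.twonn`: it uses hypothetical toy data (a flat 2-dimensional sheet embedded in $\mathbb{R}^3$) and a plain maximum-likelihood formula based on the ratio of second to first nearest-neighbor distances.

```r
# Simplified two-nearest-neighbor style estimate (illustrative sketch only)
set.seed(1)
n <- 500

# hypothetical toy data: a flat 2-dimensional sheet embedded in R^3
u <- runif(n)
v <- runif(n)
X <- cbind(u, v, 0.5 * u + 0.25 * v)

D <- as.matrix(dist(X))   # pairwise Euclidean distances
diag(D) <- Inf            # exclude each point's distance to itself

# for every point, distances to its first and second nearest neighbors
r  <- t(apply(D, 1, function(row) sort(row)[1:2]))
mu <- r[, 2] / r[, 1]     # ratio of second to first neighbor distance

# maximum-likelihood estimate, assuming the ratios follow a Pareto law
# whose exponent equals the intrinsic dimension
d_hat <- n / sum(log(mu))
print(d_hat)
```

Because `d_hat` is a continuous function of the observed distances, it lands near the true dimension rather than exactly on it; rounding is a separate decision left to the analyst.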
## Acknowledgements
The logo icon was made by [Freepik](https://www.flaticon.com/authors/freepik/) from [www.flaticon.com](https://www.flaticon.com/). The rotating Swiss Roll image is taken from [Dinoj Surendran](https://people.cs.uchicago.edu/~dinoj/manifold/swissroll.html)'s website.