Preparing release for JOSS
gvegayon committed Sep 15, 2023
1 parent bc36ab7 commit 36c5d72
Showing 5 changed files with 233 additions and 9 deletions.
2 changes: 2 additions & 0 deletions .Rbuildignore
@@ -15,6 +15,8 @@ Makefile
defm\.code-workspace
README\.html
paper/
paper\.md
bibliography\.bib
\.vscode
^docker$
\.gitattributes$
2 changes: 2 additions & 0 deletions DESCRIPTION
@@ -11,6 +11,8 @@ Authors@R: c(
comment = "Award/W81XWH-18-PH/TBIRP-LIMBIC under Award No. I01 RX003443"
))
Description: Multi-binary response models are a class of models for estimating multiple binary outcomes simultaneously. This package provides functions to estimate and simulate these models using the Discrete Exponential-Family Models [DEFM] framework. In it, we implement the models described in Vega Yon, Valente, and Pugh (2023) <doi:10.48550/arXiv.2211.00627>. DEFMs include Exponential-Family Random Graph Models [ERGMs], which characterize graphs using sufficient statistics, the same idea at the core of DEFMs. Using sufficient statistics, we can describe the data through meaningful motifs, for example, transitions between different states or the joint distribution of the outcomes.
URL: https://github.com/UofUEpiBio/defm, https://uofuepibio.github.io/defm/
BugReports: https://github.com/UofUEpiBio/defm/issues
License: MIT + file LICENSE
Encoding: UTF-8
LazyData: true
175 changes: 175 additions & 0 deletions bibliography.bib
@@ -0,0 +1,175 @@
@article{Martin2021,
author = {Glen P. Martin and Matthew Sperrin and Kym I. E. Snell and Iain Buchan and Richard D. Riley},
doi = {10.1002/sim.8787},
issn = {0277-6715},
number = {2},
journal = {Statistics in Medicine},
month = {1},
pages = {498--517},
title = {Clinical prediction models to predict the risk of multiple binary outcomes: a comparison of approaches},
volume = {40},
year = {2021},
}

@article{Bai2020,
abstract = {In clinical research, study outcomes usually consist of various patients’ information corresponding to the treatment. To have a better understanding of the effects of different treatments, one often needs to analyze multiple clinical outcomes simultaneously, while the data are usually mixed with both continuous and discrete variables. We propose the multivariate mixed response model to implement statistical inference based on the conditional grouped continuous model through a pairwise composite-likelihood approach. It can simplify the multivariate model by dealing with three types of bivariate models and incorporating the asymptotical properties of the composite likelihood via the Godambe information. We demonstrate the validity and the statistic power of the multivariate mixed response model through simulation studies and clinical applications. This composite-likelihood method is advantageous for statistical inference on correlated multivariate mixed outcomes.},
author = {Hao Bai and Yuan Zhong and Xin Gao and Wei Xu},
doi = {10.3390/stats3030016},
issn = {2571-905X},
number = {3},
journal = {Stats},
month = {7},
pages = {203--220},
title = {Multivariate Mixed Response Model with Pairwise Composite-Likelihood Method},
volume = {3},
year = {2020},
}


@article{Davenport2018,
author = {Clemontina A. Davenport and Arnab Maity and Patrick F. Sullivan and Jung-Ying Tzeng},
doi = {10.1007/s12561-017-9189-9},
issn = {1867-1764},
number = {1},
journal = {Statistics in Biosciences},
month = {4},
pages = {117--138},
title = {A Powerful Test for SNP Effects on Multivariate Binary Outcomes Using Kernel Machine Regression},
volume = {10},
year = {2018},
}

@article{CAREY1993,
author = {Vincent Carey and Scott L. Zeger and Peter Diggle},
doi = {10.1093/biomet/80.3.517},
issn = {0006-3444},
number = {3},
journal = {Biometrika},
pages = {517--526},
title = {Modelling multivariate binary data with alternating logistic regressions},
volume = {80},
year = {1993},
}

@article{TeixeiraPinto2009,
author = {Armando Teixeira-Pinto and Sharon-Lise T. Normand},
doi = {10.1002/sim.3588},
issn = {0277-6715},
number = {13},
journal = {Statistics in Medicine},
month = {6},
pages = {1753--1773},
title = {Correlated bivariate continuous and binary outcomes: Issues and applications},
volume = {28},
year = {2009},
}

@article{Holland1981,
author = {Holland, Paul W. and Leinhardt, Samuel},
doi = {10.2307/2287037},
journal = {Journal of the American Statistical Association},
keywords = {generalized iterative scaling,networks,random digraphs,sociometry},
number = {373},
pages = {33--50},
title = {{An exponential family of probability distributions for directed graphs}},
volume = {76},
year = {1981}
}

@article{Frank1986,
abstract = {Log-linear statistical models are used to characterize random graphs with general dependence structure and with Markov dependence. Sufficient statistics for Markov graphs are shown to be given by counts of various triangles and stars. In particular, we show under which assumptions the triad counts are sufficient statistics. We discuss inference methodology for some simple Markov graphs.},
author = {Frank, O and Strauss, David},
doi = {10.2307/2289017},
journal = {Journal of the American Statistical Association},
keywords = {log-linear network model,markov field},
number = {395},
pages = {832--842},
pmid = {7439394},
title = {{Markov graphs}},
url = {http://amstat.tandfonline.com/doi/abs/10.1080/01621459.1986.10478342},
volume = {81},
year = {1986}
}


@article{Wasserman1996,
author = {Wasserman, Stanley and Pattison, Philippa},
doi = {10.1007/BF02294547},
journal = {Psychometrika},
keywords = {categorical data analysis,random graphs,social network analysis},
number = {3},
pages = {401--425},
pmid = {10613111},
title = {{Logit models and logistic regressions for social networks: I. An introduction to Markov graphs and p*}},
volume = {61},
year = {1996}
}

@article{Snijders2006,
author = {Snijders, Tom A B and Pattison, Philippa E and Robins, Garry L and Handcock, Mark S},
doi = {10.1111/j.1467-9531.2006.00176.x},
issn = {0081-1750},
journal = {Sociological Methodology},
month = {12},
number = {1},
pages = {99--153},
title = {{New specifications for exponential random graph models}},
url = {http://www.jstor.org/stable/25046693},
volume = {36},
year = {2006}
}


@article{Robins2007,
author = {Robins, Garry and Pattison, Pip and Kalish, Yuval and Lusher, Dean},
doi = {10.1016/j.socnet.2006.08.002},
journal = {Social Networks},
keywords = {Exponential random graph models,Statistical models for social networks,p* models},
number = {2},
pages = {173--191},
pmid = {18449326},
title = {{An introduction to exponential random graph (p*) models for social networks}},
volume = {29},
year = {2007}
}

@Manual{handcock2023,
author = {Mark S. Handcock and David R. Hunter and Carter T. Butts and Steven M. Goodreau and Pavel N. Krivitsky and Martina Morris},
title = {ergm: Fit, Simulate and Diagnose Exponential-Family Models for Networks},
organization = {The Statnet Project (\url{https://statnet.org})},
year = {2023},
note = {R package version 4.5.0},
url = {https://CRAN.R-project.org/package=ergm},
}

@Article{ergmpkg,
title = {{ergm}: A Package to Fit, Simulate and Diagnose Exponential-Family Models for Networks},
author = {David R. Hunter and Mark S. Handcock and Carter T. Butts and Steven M. Goodreau and Martina Morris},
journal = {Journal of Statistical Software},
year = {2008},
volume = {24},
number = {3},
pages = {1--29},
doi = {10.18637/jss.v024.i03},
}


@Misc{defmarxiv,
title = {{Discrete Exponential-Family Models for Multivariate Binary Outcomes}},
author = {George {Vega Yon} and Thomas Valente and Mary Jo Pugh},
year = {2022},
eprint = {2211.00627},
archiveprefix = {arXiv},
primaryclass = {stat.ME},
doi = {10.48550/arXiv.2211.00627},
}

@Manual{R,
title = {R: A Language and Environment for Statistical Computing},
author = {{R Core Team}},
organization = {R Foundation for Statistical Computing},
address = {Vienna, Austria},
year = {2023},
url = {https://www.R-project.org/},
}
54 changes: 54 additions & 0 deletions paper.md
@@ -0,0 +1,54 @@
---
title: 'defm: Estimation and Simulation of Multi-Binary Response Models'
tags:
  - R
  - statistics
  - graphical models
  - Markov random fields
authors:
  - given-names: George G.
    family: Vega Yon
    affiliation: '1'
    orcid: 0000-0002-3171-0844
affiliations:
  - index: 1
    name: Division of Epidemiology, University of Utah, Salt Lake City, UT, United States of America
---

# Introduction

Datasets featuring multiple outcomes, and methods for analyzing them, are increasingly common [@Martin2021]. With binary outcomes, the usual statistical approach is either to analyze one variable at a time or, for longitudinal data, to use Markov random fields [MRF] to estimate transitions between states. With $k$ binary outcomes there are $2^k$ possible joint states, so the number of transition parameters grows exponentially, on the order of $2^{2k}$. While MRFs are a better modeling alternative than analyzing the outcomes separately, they can become awkward to estimate and interpret. The `defm` R package [@R] implements Discrete Exponential-Family Models [DEFMs], a modeling framework that simplifies the problem using sufficiency [@Holland1981; @Frank1986; @Wasserman1996; @Snijders2006; @Robins2007]. The most popular DEFM is the Exponential-Family Random Graph Model [ERGM], currently available in the R package ergm [@handcock2023; @ergmpkg]. `defm` provides the first resource for fitting DEFMs to data other than network data.
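
To make the growth concrete: with $k$ binary outcomes there are $2^k$ joint states, and a saturated first-order transition model has one cell per (previous state, current state) pair. That gives $2^k \times 2^k = 2^{2k}$ cells, or $2^k(2^k - 1)$ free parameters once each row is constrained to sum to one. The short Python sketch below tabulates both counts. It is illustrative arithmetic only and not part of `defm`.

```python
def saturated_transition_cells(k: int) -> int:
    """Cells in a 2^k x 2^k transition matrix over k binary outcomes."""
    n_states = 2 ** k
    return n_states * n_states  # equals 2 ** (2 * k)


def free_transition_parameters(k: int) -> int:
    """Free parameters once each row of the transition matrix sums to one."""
    n_states = 2 ** k
    return n_states * (n_states - 1)


# e.g., k = 3 gives 64 cells and 56 free parameters;
# k = 10 gives 1048576 cells and 1047552 free parameters.
for k in (1, 2, 3, 5, 10):
    print(k, saturated_transition_cells(k), free_transition_parameters(k))
```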

# Statement of Need

As mentioned in the introduction, multi-outcome data are increasingly common. Although statistical methods exist for multi-binary outcomes [@Bai2020; @Davenport2018; @CAREY1993; @TeixeiraPinto2009], most of them focus on controlling for correlations between outcomes rather than on testing hypotheses about their interactions. With DEFMs, we can answer questions such as:

- Does consuming tobacco and alcohol lead to a higher risk of using marijuana?
- Are depression and substance use more likely to co-occur than depression and anxiety?
- Are phenotypes A and B jointly mediated by gender?
- Is a joint disease A and B prevalence model more appropriate than independent logistic regressions?

DEFMs provide an elegant framework for answering these questions and a formal strategy for assessing independence between outcomes. @defmarxiv introduces the method and describes the model in detail.
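
As a minimal sketch of the idea, the code below follows the general exponential-family form and is not `defm`'s code or API. The probability of a vector of $k$ binary outcomes $\mathbf{y}$ is written as $\Pr(\mathbf{Y} = \mathbf{y}) = \exp\left(\theta^\top s(\mathbf{y})\right) / \kappa(\theta)$. Here $s(\cdot)$ is a vector of sufficient statistics and $\kappa(\theta)$ sums $\exp\left(\theta^\top s(\mathbf{y}')\right)$ over all $2^k$ outcome vectors. Covariates are omitted. The statistics are assumptions chosen for illustration: one count per outcome, plus a hypothetical co-occurrence motif for outcomes 0 and 1. When the co-occurrence coefficient is zero, the joint distribution factorizes into independent margins. Testing that coefficient is therefore one way to assess independence.

```python
import itertools
import math
from typing import Dict, Sequence, Tuple

Outcome = Tuple[int, ...]


def sufficient_stats(y: Outcome) -> Tuple[int, ...]:
    # Hypothetical statistics: one indicator per outcome, plus the
    # co-occurrence motif y[0] * y[1] (both outcomes present). Requires k >= 2.
    return tuple(y) + (y[0] * y[1],)


def joint_distribution(theta: Sequence[float], k: int) -> Dict[Outcome, float]:
    # Enumerate the full support {0, 1}^k and normalize by kappa(theta).
    support = list(itertools.product((0, 1), repeat=k))
    weights = {
        y: math.exp(sum(t * s for t, s in zip(theta, sufficient_stats(y))))
        for y in support
    }
    kappa = sum(weights.values())
    return {y: w / kappa for y, w in weights.items()}


def marginal(dist: Dict[Outcome, float], j: int) -> float:
    # Pr(Y_j = 1), obtained by summing the joint distribution.
    return sum(p for y, p in dist.items() if y[j] == 1)


# Three outcomes; the last coefficient weights the co-occurrence of outcomes 0 and 1.
independent = joint_distribution([-1.0, 0.5, 0.0, 0.0], k=3)
dependent = joint_distribution([-1.0, 0.5, 0.0, 1.5], k=3)

for name, dist in (("independent", independent), ("dependent", dependent)):
    p_both = sum(p for y, p in dist.items() if y[0] == 1 and y[1] == 1)
    # Compare Pr(Y_0 = 1, Y_1 = 1) with Pr(Y_0 = 1) * Pr(Y_1 = 1).
    print(name, round(p_both, 4), round(marginal(dist, 0) * marginal(dist, 1), 4))
```

In the independent case the two printed numbers coincide. A positive co-occurrence coefficient pushes the joint probability above the product of the margins.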

# Key Features

The package's core functionality is implemented in C++. `defm` is a module of [`barry`](https://github.com/USCbiostats/barry), a larger project we also maintain: a header-only C++ library that provides the building blocks for DEFMs. Key features of this R package are:

- Estimation of DEFMs using maximum likelihood.
- Simulation of DEFM data.
- Construction of Markov models of arbitrary order.
- Fast motif counting using the C++ backend.
- Hashing of the models' normalizing constants to avoid recomputing them.
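
The Markov-order, motif-counting, and hashing features fit together. In a first-order Markov model, the sufficient statistics can include transition motifs that depend on the previous observation. The normalizing constant then depends only on that conditioning information (the previous state here, and in general also covariates). Observations that share it also share the constant, so the constant can be computed once and reused. The Python sketch below illustrates the idea with a plain dictionary keyed by the previous state. The motif, class name, and caching scheme are hypothetical and do not describe how `barry` hashes supports internally. For a model of order $m$, the key would be the previous $m$ states instead.

```python
import itertools
import math
from typing import Dict, List, Sequence, Tuple

State = Tuple[int, ...]


def stats(prev: State, curr: State) -> Tuple[int, ...]:
    # Hypothetical statistics: one count per current outcome, plus a
    # transition motif "outcome 0 at t-1 and outcome 1 at t".
    return tuple(curr) + (prev[0] * curr[1],)


class CachedKappa:
    """Memoizes the normalizing constant per conditioning (previous) state."""

    def __init__(self, theta: Sequence[float], k: int) -> None:
        self.theta = list(theta)
        self.support = list(itertools.product((0, 1), repeat=k))
        self.cache: Dict[State, float] = {}
        self.computed = 0  # times the full support was actually summed

    def _weight(self, prev: State, curr: State) -> float:
        return math.exp(sum(t * s for t, s in zip(self.theta, stats(prev, curr))))

    def kappa(self, prev: State) -> float:
        # Sum over the whole support only on a cache miss.
        if prev not in self.cache:
            self.computed += 1
            self.cache[prev] = sum(self._weight(prev, y) for y in self.support)
        return self.cache[prev]

    def log_likelihood(self, sequence: List[State]) -> float:
        # Sum of log Pr(y_t | y_{t-1}) over consecutive pairs.
        total = 0.0
        for prev, curr in zip(sequence, sequence[1:]):
            total += math.log(self._weight(prev, curr) / self.kappa(prev))
        return total


model = CachedKappa(theta=[-0.5, -0.5, 1.0], k=2)
seq = [(1, 0), (0, 1), (1, 0), (0, 1), (1, 1), (1, 0)]
# Five transitions, but only three distinct previous states, so three sums.
print(model.log_likelihood(seq), model.computed)
```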

The package is available on the Comprehensive R Archive Network ([CRAN](https://cran.r-project.org/package=defm)) and on [GitHub](https://github.com/UofUEpiBio/defm).


# Conclusion

The `defm` R package provides the first implementation of Discrete Exponential-Family Models for data other than network data. The package's core functionality is implemented in C++ for speed. Using `defm`, researchers can model data featuring multiple binary outcomes and test hypotheses about their interactions. The package is available on CRAN and GitHub.

# Acknowledgements

This work was supported by the Assistant Secretary of Defense for Health Affairs endorsed by the Department of Defense, through the Psychological Health/Traumatic Brain Injury Research Program Long-Term Impact of Military-Relevant Brain Injury Consortium (LIMBIC) Award/W81XWH-18-PH/TBIRP-LIMBIC under Award No. I01 RX003443. The U.S. Army Medical Research Acquisition Activity, 839 Chandler Street, Fort Detrick MD 21702-5014 is the awarding and administering acquisition office. Opinions, interpretations, conclusions, and recommendations are those of the author and are not necessarily endorsed by the Department of Defense. Any opinions, findings, conclusions, or recommendations expressed in this publication are those of the author(s) and do not necessarily reflect the views of the U.S. Government, the U.S. Department of Veterans Affairs, or the Department of Defense, and no official endorsement should be inferred.

# References
9 changes: 0 additions & 9 deletions paper/paper.md

This file was deleted.
