Merge pull request #120 from kosukeimai/dev
Development Release 3.0.0
1beb authored Feb 28, 2024
2 parents fefdd59 + 33fbbea commit f10531b
Showing 71 changed files with 2,952 additions and 864 deletions.
2 changes: 2 additions & 0 deletions .Rbuildignore
@@ -12,3 +12,5 @@ ChangeLog

^cran-comments\.md$
^CRAN-SUBMISSION$
^README\.Rmd$
^data-raw$
5 changes: 4 additions & 1 deletion .gitignore
@@ -1,3 +1,5 @@
.DS_Store

# History files
.Rhistory
.Rapp.history
@@ -8,6 +10,7 @@
# RStudio files
.Rproj.user/
.Rproj
.lazytest

# produced vignettes
vignettes/*.html
@@ -21,4 +24,4 @@ vignettes/*.pdf
src/RcppExports.o
src/aux_funs.o
src/sample_me.o
src/wru.so
src/wru.so
1 change: 1 addition & 0 deletions ChangeLog
@@ -16,3 +16,4 @@ Date Version Comment
2022-06-17 1.0.0 Updates to BISG, inclusion of fBISG and other package improvements
2022-10-04 1.0.1 Bug fixes for census url and census year
2023-06-12 2.0.0 Updated defaults to 2020 data, specifiy as next major version 2.0.
2024-02-15 3.0.0 Adding back age and sex functionality. Other improvements.
61 changes: 36 additions & 25 deletions DESCRIPTION
@@ -1,47 +1,58 @@
Package: wru
Version: 2.0.0
Date: 2023-07-12
Title: Who are You? Bayesian Prediction of Racial Category Using Surname, First Name, Middle Name, and
Geolocation
Title: Who are You? Bayesian Prediction of Racial Category Using Surname,
First Name, Middle Name, and Geolocation
Version: 3.0.0
Date: 2024-02-14
Authors@R: c(
person("Kabir", "Khanna", email = "[email protected]", role = c("aut")),
person("Brandon", "Bertelsen", email = "[email protected]", role = c("aut","cre")),
person("Santiago", "Olivella", email = "[email protected]", role = c("aut")),
person("Evan", "Rosenman", email = "[email protected]", role = c("aut")),
person("Kosuke", "Imai", email = "[email protected]", role = c("aut"))
person("Kabir", "Khanna", , "[email protected]", role = "aut"),
person("Brandon", "Bertelsen", , "[email protected]", role = c("aut", "cre")),
person("Santiago", "Olivella", , "[email protected]", role = "aut"),
person("Evan", "Rosenman", , "[email protected]", role = "aut"),
person("Alex", "Rossell Hayes", , "[email protected]", role = "aut"),
person("Kosuke", "Imai", , "[email protected]", role = "aut")
)
Description: Predicts individual race/ethnicity using surname, first name, middle name, geolocation,
and other attributes, such as gender and age. The method utilizes Bayes'
Rule (with optional measurement error correction) to compute the posterior probability of each racial category for any given
individual. The package implements methods described in Imai and Khanna (2016)
"Improving Ecological Inference by Predicting Individual Ethnicity from Voter
Registration Records" Political Analysis <DOI:10.1093/pan/mpw001> and Imai, Olivella, and Rosenman (2022)
"Addressing census data problems in race imputation via fully Bayesian Improved Surname Geocoding and name supplements"
<DOI:10.1126/sciadv.adc9824>. The package also incorporates the data described in Rosenman, Olivella, and Imai (2023)
"Race and ethnicity data for first, middle, and surnames" <DOI:10.1038/s41597-023-02202-2>.
Description: Predicts individual race/ethnicity using surname, first name,
middle name, geolocation, and other attributes, such as gender and
age. The method utilizes Bayes' Rule (with optional measurement error
correction) to compute the posterior probability of each racial
category for any given individual. The package implements methods
described in Imai and Khanna (2016) "Improving Ecological Inference by
Predicting Individual Ethnicity from Voter Registration Records"
Political Analysis <DOI:10.1093/pan/mpw001> and Imai, Olivella, and
Rosenman (2022) "Addressing census data problems in race imputation
via fully Bayesian Improved Surname Geocoding and name supplements"
<DOI:10.1126/sciadv.adc9824>. The package also incorporates the data
described in Rosenman, Olivella, and Imai (2023) "Race and ethnicity
data for first, middle, and surnames"
<DOI:10.1038/s41597-023-02202-2>.
License: GPL (>= 3)
URL: https://github.com/kosukeimai/wru
BugReports: https://github.com/kosukeimai/wru/issues
Depends:
R (>= 4.1.0),
utils
Imports:
cli,
dplyr,
tidyr,
furrr,
future,
piggyback (>= 0.1.4),
PL94171,
purrr,
Rcpp,
piggyback (>= 0.1.4),
PL94171
rlang
Suggests:
covr,
testthat (>= 3.0.0),
covr
tidycensus
LinkingTo:
Rcpp,
RcppArmadillo
LazyLoad: yes
Config/testthat/edition: 3
Encoding: UTF-8
LazyData: yes
LazyDataCompression: xz
License: GPL (>= 3)
LazyLoad: yes
Roxygen: list(markdown = TRUE)
RoxygenNote: 7.2.3
Encoding: UTF-8
Config/testthat/edition: 3
4 changes: 4 additions & 0 deletions NAMESPACE
@@ -1,14 +1,18 @@
# Generated by roxygen2: do not edit by hand

export(as_fips_code)
export(as_state_abbreviation)
export(format_legacy_data)
export(get_census_data)
export(predict_race)
import(PL94171)
importFrom(Rcpp,evalCpp)
importFrom(dplyr,coalesce)
importFrom(dplyr,pull)
importFrom(furrr,future_map_dfr)
importFrom(piggyback,pb_download)
importFrom(purrr,map_dfr)
importFrom(rlang,"%||%")
importFrom(stats,rmultinom)
importFrom(utils,setTxtProgressBar)
importFrom(utils,txtProgressBar)
7 changes: 1 addition & 6 deletions R/RcppExports.R
@@ -5,21 +5,16 @@
#'
#' @param last_name Integer vector of last name identifiers for each record (zero indexed; as all that follow). Must match columns numbers in M_rs.
#' @param first_name See last_name
#' @param middle_name See last_name
#' @param mid_name See last_name
#' @param geo Integer vector of geographic units for each record. Must match column number in N_rg
#' @param N_rg Integer matrix of race | geography counts in census (geograpgies in columns).
#' @param M_rs Integer matrix of race | surname counts in dictionary (surnames in columns).
#' @param M_rf Same as `M_rs`, but for first names (can be empty matrix for surname only models).
#' @param M_rm Same as `M_rs`, but for middle names (can be empty matrix for surname, or surname and first name only models).
#' @param alpha Numeric matrix of race | geography prior probabilities.
#' @param pi_s Numeric matrix of race | surname prior probabilities.
#' @param pi_f Same as `pi_s`, but for first names.
#' @param pi_m Same as `pi_s`, but for middle names.
#' @param pi_nr Matrix of marginal probability distribution over missing names; non-keyword names default to this distribution.
#' @param which_names Integer; 0=surname only. 1=surname + first name. 2= surname, first, and middle names.
#' @param samples Integer number of samples to take after (in total)
#' @param burnin Integer number of samples to discard as burn-in of Markov chain
#' @param me_race Boolean; should measurement error in race | geography be corrected?
#' @param race_init Integer vector of initial race assignments
#' @param verbose Boolean; should informative messages be printed?
#'
24 changes: 4 additions & 20 deletions R/census_data_preflight.R
@@ -1,31 +1,15 @@
#' Preflight census data
#'
#' @param census.data See documentation in \code{race_predict}.
#' @param census.geo See documentation in \code{race_predict}.
#' @param year See documentation in \code{race_predict}.
#' @inheritParams predict_race
#' @keywords internal

census_data_preflight <- function(census.data, census.geo, year) {

if (year != "2020"){
vars_ <- c(
pop_white = 'P005003', pop_black = 'P005004',
pop_aian = 'P005005', pop_asian = 'P005006',
pop_nhpi = 'P005007', pop_other = 'P005008',
pop_two = 'P005009', pop_hisp = 'P005010'
)
} else {
vars_ <- c(
pop_white = 'P2_005N', pop_black = 'P2_006N',
pop_aian = 'P2_007N', pop_asian = 'P2_008N',
pop_nhpi = 'P2_009N', pop_other = 'P2_010N',
pop_two = 'P2_011N', pop_hisp = 'P2_002N'
)
}
vars_ <- unlist(census_geo_api_names(year = year))
legacy_vars <- unlist(census_geo_api_names_legacy(year = year))

test <- lapply(census.data, function(x) {
nms_to_test <- names(x[[census.geo]])
all(vars_ %in% nms_to_test)
all(vars_ %in% nms_to_test) || all(legacy_vars %in% nms_to_test)
})
missings <- names(test)[!unlist(test)]
