Skip to content

Utilities for loading data from MAG-Proquest database

License

Notifications You must be signed in to change notification settings

f-hafner/magutils

Repository files navigation

magutils

R-CMD-check

The goal of magutils is to facilitate loading and extracting data from a database with records from Microsoft Academic Graph and ProQuest Dissertations and make the functions available to co-authors and RAs. In the future, we may publish a “back-end” package to generate the database.

Installation

You can install the development version of magutils from GitHub with:

# install.packages("devtools")
devtools::install_github("f-hafner/magutils", build_vignettes = TRUE)

Example

If you do not have access to the full database, use the example database like this:

library(magutils)

db_file <- db_example("AcademicGraph.sqlite")
conn <- connect_to_db(db_file)
#> The database connection is: 
#> src:  sqlite 3.39.3 [/tmp/RtmpfnS8S0/temp_libpath1e5445e211e0e/magutils/extdata/AcademicGraph.sqlite]
#> tbls: author_coauthor, author_output, AuthorAffiliation, current_links,
#>   current_links_advisors, FieldsOfStudy, FirstNamesGender, pq_advisors,
#>   pq_authors, pq_fields_mag, pq_unis

Then query the graduate links:

links <- get_links(conn, from = "graduates", lazy = TRUE)

Or query info on graduates:

graduates <- get_proquest(conn, from = "graduates", lazy = FALSE, limit = 3)

You can join the two together

library(magrittr)
links <- get_links(conn, from = "graduates", lazy = TRUE)
d_full <- get_proquest(conn, from = "graduates", limit = 5) %>%
  dplyr::left_join(links, by = "goid") %>%
  dplyr::collect()

At the end, do not forget to disconnect from the database:

DBI::dbDisconnect(conn)

Main functions

Extracting key tables

  • get_proquest: Source data on dissertations in United States from ProQuest.

  • get_links: Load links between ProQuest and MAG. Can be links from PhD graduates to MAG authors, or from PhD advisors to MAG authors

  • define_field: define the field of study for records in a table.

  • define_gender: define gender of a table of persons with firstnames.

  • augment_tbl: augment a table with various additional information: output, affiliations, co-authors. Because output and affiliations are at the unit-year level, the result will be a table at the unit-year level. I am not sure if this is the best way to do it (also the naming wrt to the previous functions), but we have to see how it works in practice.

Suggested usage

Load the links and/or proquest data, augment them as necessary, and then collect into memory.

For more details, browseVignettes("magutils").

About

Utilities for loading data from MAG-Proquest database

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages