A repo to discover workflows that utilize multiple paleogeoscience data resources

throughput-ec/crossResourceGithubScraper

Throughput Cross-Resource Workflow Scraper

One component of the broader Throughput project is linking to web resources that show how individuals have combined records, data resources, or objects to provide scientific insight.

To help establish a baseline of data integration, this project uses Python to search for code on GitHub (with other implementations to come) that invokes commands such as import packageName (Python) or library(packageName) (R). Matches are added to a graph database, described elsewhere, using the W3C annotation model.
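As a rough illustration of the matching step described above, the sketch below checks whether a piece of source text loads a given package. It is not the repository's implementation: the function names `import_patterns` and `uses_package` and the exact regular expressions are assumptions made for this example, and it works on text you already have rather than calling the GitHub API.

```python
import re


def import_patterns(package):
    """Build regexes (hypothetical, for illustration) that detect a package being loaded."""
    name = re.escape(package)  # package names such as "data.table" contain regex metacharacters
    return {
        # Matches "import pkg", "import pkg.sub", "from pkg import x", "from pkg.sub import x"
        "python": re.compile(
            rf"^\s*(?:import\s+{name}\b|from\s+{name}\b[\w.]*\s+import\b)",
            re.MULTILINE,
        ),
        # Matches library(pkg), library("pkg"), require(pkg), require('pkg')
        "r": re.compile(rf"\b(?:library|require)\(\s*[\"']?{name}[\"']?\s*\)"),
    }


def uses_package(source, package, language):
    """Return True if `source` appears to load `package` in the given language."""
    return import_patterns(package)[language].search(source) is not None
```

For example, `uses_package("library(neotoma)", "neotoma", "r")` returns `True`, while `uses_package("import lipdx", "lipd", "python")` returns `False` because the word boundary prevents a partial-name match.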

Contributions

  • Chris Heiser - Northern Arizona University
  • Nick McKay - Northern Arizona University
  • Simon Goring - University of Wisconsin-Madison

We welcome contributions from all individuals, but expect contributors to follow the Code of Conduct for this repository.

Current Packages of Interest

The list of packages to be searched includes packages from the ROpenSci registry, as well as Python packages, including lipd and packages in the SciTools repository.

Using this repository

To query the GitHub API you must have a valid user token. The .gitignore and the current R script look for the token in a file named gh.token. You can generate a token in your GitHub developer settings.
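A minimal Python sketch of how such a token file might be read and turned into request headers is shown below. The repository's own script is written in R, so this is only an illustration: the helper names `read_token` and `auth_headers` are hypothetical, and the sketch assumes gh.token holds the token on a single line.

```python
from pathlib import Path


def read_token(path="gh.token"):
    """Read a personal access token from a file (assumed to hold one line)."""
    token = Path(path).read_text(encoding="utf-8").strip()
    if not token:
        raise ValueError(f"No token found in {path}")
    return token


def auth_headers(token):
    """Build headers for an authenticated GitHub REST API request."""
    return {
        "Authorization": f"Bearer {token}",
        "Accept": "application/vnd.github+json",
    }
```

The resulting dictionary can be passed as the headers of any HTTP client request. Keeping gh.token listed in .gitignore, as this repository does, prevents the token from being committed by accident.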

Support

This work is funded by the National Science Foundation's EarthCube program through awards 1740699 and 1740667.
