DSC-SPIDAL/SciDatBench

SciDatBench: Principles and Prototypes of Science Data Benchmarks

This is the Digital Science Center component of the SciDatBench Science Data Benchmarking project. It works with the MLPerf Science Data Experimental Working Group.

The importance of Big Data is now recognized across a broad range of scientific, societal, and commercial problems. Analyzing this data requires new research in both the analysis methods and the information technology hardware and software used to run them. SciDatBench is establishing a new collection of important and representative big scientific datasets, together with typical software implementations of the machine learning algorithms needed for best-practice analysis. It generates particular instances of these benchmarks and is establishing a sustainable process for maintaining and enhancing them. The collection includes both standalone examples and end-to-end examples that need multiple components, as seen in the analysis of many science experiments. SciDatBench is affiliated, as an approved Science Data working group, with the very successful MLPerf activity, whose 80 organizational members look at industry machine learning benchmarks. The state-of-the-art examples in SciDatBench contribute to progress in scientific discovery that advances the national health, prosperity, and welfare, as stated in NSF's mission. The project proactively involves under-represented communities in its activities.

The SciDatBench collection is accompanied by documentation that allows it to be used to train researchers in rapidly evolving Big Data analysis techniques. SciDatBench pursues performance, quality, and pedagogical goals. The heart of the project is a set of virtual working group meetings associated with Science Data and other MLPerf activities of importance to SciDatBench. The project naturally impacts a broad range of scientific disciplines, eventually including materials science, environmental sciences, life sciences (including epidemiology), fusion, particle physics, astronomy, earthquake science, and earth sciences, with more than one representative problem from each of these domains. SciDatBench supports comparative studies and identifies requirements for future cyberinfrastructure to support scientific data analysis. The benchmarks record not only the time to solution but also multiple measures of the quality of the solution.
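
As a rough illustration of recording both time to solution and several quality measures for one benchmark run, here is a minimal Python sketch. It is not taken from the SciDatBench code base: the names `BenchmarkResult`, `run_benchmark`, `mean_absolute_error`, and `max_absolute_error`, as well as the toy solver and data, are hypothetical and chosen only for this example.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Sequence


@dataclass
class BenchmarkResult:
    """Hypothetical record of one benchmark run (not a SciDatBench type)."""
    name: str
    time_to_solution_s: float
    quality: Dict[str, float] = field(default_factory=dict)


Metric = Callable[[Sequence[float], Sequence[float]], float]


def mean_absolute_error(pred: Sequence[float], true: Sequence[float]) -> float:
    """Average absolute difference between predictions and targets."""
    return sum(abs(p - t) for p, t in zip(pred, true)) / len(true)


def max_absolute_error(pred: Sequence[float], true: Sequence[float]) -> float:
    """Largest absolute difference between predictions and targets."""
    return max(abs(p - t) for p, t in zip(pred, true))


def run_benchmark(
    name: str,
    solve: Callable[[Sequence[float]], List[float]],
    inputs: Sequence[float],
    targets: Sequence[float],
    metrics: Dict[str, Metric],
) -> BenchmarkResult:
    """Time the solver, then score its output with every supplied metric."""
    start = time.perf_counter()
    predictions = solve(inputs)
    elapsed = time.perf_counter() - start
    quality = {label: fn(predictions, targets) for label, fn in metrics.items()}
    return BenchmarkResult(name, elapsed, quality)


if __name__ == "__main__":
    # Toy stand-in for a real science workload: the target is 2*x,
    # and the "solver" is deliberately biased by 0.5.
    inputs = [0.0, 1.0, 2.0]
    targets = [0.0, 2.0, 4.0]
    result = run_benchmark(
        "toy_doubling",
        lambda xs: [2.0 * x + 0.5 for x in xs],
        inputs,
        targets,
        {"mae": mean_absolute_error, "max_err": max_absolute_error},
    )
    print(result)
```

Keeping the quality measures in a dictionary next to the timing means a faster run can be compared against a slower one only alongside the accuracy it actually achieved.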

Early deliverables include:

  • Building a community interested in Science Data Benchmarks and MLPerf
  • Weekly working group meetings
  • A Jupyter notebook approach to accessing Science and the other MLPerf benchmarks
  • Initial benchmarks, including many collected at the Rutherford Laboratory, UK, by Tony Hey and Jeyan Thiyagalingam
  • Tutorial material built around the benchmarks

Useful links:

SciDatBench Initial Timetable

SciDatBench at IU is funded by NSF through EAGER grant NSF-OAC-2038007.
