HEST-Library: Bringing Spatial Transcriptomics and Histopathology together

Designed for querying and assembling HEST-1k dataset

Welcome to the official GitHub repository of the HEST-Library introduced in "HEST-1k: A Dataset for Spatial Transcriptomics and Histology Image Analysis". This project was developed by the Mahmood Lab at Harvard Medical School and Brigham and Women's Hospital.

HEST-1k, HEST-Library, and HEST-Benchmark are released under the Attribution-NonCommercial-ShareAlike 4.0 International license.

What does this repository provide?

HEST-1k: Free access to HEST-1K, a dataset of 1,108 paired Spatial Transcriptomics samples with HE-stained whole-slide images
HEST-Library: A series of helpers to assemble new ST samples (from ST, Visium, Visium HD, or Xenium) and work with HEST-1k
HEST-Benchmark: A new benchmark to assess the predictive performance of foundation models for histology in predicting gene expression from morphology

Download/Query HEST-1k (743GB)

To download/query HEST-1k, follow the tutorial 1-Downloading-HEST-1k.ipynb or follow instructions on Hugging Face.

NOTE: The entire dataset weighs 743 GB but you can easily download a subset by querying per id, organ, species...

HEST-Library installation

git clone https://github.com/mahmoodlab/hest.git
cd hest
conda create -n "hest" python=3.9
conda activate hest
pip install -e .

Additional dependencies (for WSI manipulation):

sudo apt install libvips libvips-dev openslide-tools

Additional dependencies (GPU acceleration):

If a GPU is available on your machine, we recommend installing cucim on your conda environment. (hest was tested with cucim-cu12==24.4.0 and CUDA 12.1)

pip install \
    --extra-index-url=https://pypi.nvidia.com \
    cudf-cu12==24.6.* dask-cudf-cu12==24.6.* cucim-cu12==24.6.* \
    raft-dask-cu12==24.6.*

NOTE: HEST-Library was only tested on Linux/macOS machines, please report any bugs in the GitHub issues.

Inspect HEST-1k with HEST-Library

You can then simply view the dataset as,

from hest import load_hest

print('Lazy loading of hest...')
hest_data = load_hest('hest_data') # location of the data
print('loaded hest')
for d in hest_data:
    print(d)

HEST-Library API

The HEST-Library allows assembling new samples using HEST format and interacting with HEST-1k. We provide two tutorials:

2-Interacting-with-HEST-1k.ipynb: Playing around with HEST data for loading patches. Includes a detailed description of each scanpy object.
3-Assembling-HEST-Data.ipynb: Walkthrough to transform a Visum sample into HEST.

In addition, we provide complete documentation.

HEST-Benchmark

The HEST-Benchmark was designed to assess foundation models for pathology under a new, diverse, and challenging benchmark. HEST-Benchmark includes 10 tasks for gene expression prediction (50 highly variable genes) from morphology (112 x 112 um regions at 0.5 um/px) in 10 different organs and 9 cancer types. We provide a step-by-step tutorial to run HEST-Benchmark and reproduce our results in 4-Running-HEST-Benchmark.ipynb.

HEST-Benchmark results (06.24.24)

HEST-Benchmark was used to assess 10 publicly available models. Reported results are based on a Random Forest regression model (70 trees). Model performance measured with Pearson correlation. Best is bold, second best is underlined. Models sorted by publication date. Additional results based on Ridge regression are provided in the paper.

	ResNet50	KimiaNet	Ciga	CTransPath	Remedis	Phikon	PLIP	UNI	CONCH	GigaPath
IDC	0.440	0.420	0.406	0.454	0.491	0.430	0.436	0.502	0.504	0.492
PRAD	0.318	0.328	0.332	0.346	0.335	0.377	0.362	0.357	0.373	0.372
PAAD	0.389	0.410	0.397	0.406	0.451	0.372	0.392	0.424	0.431	0.425
SKCM	0.446	0.452	0.484	0.535	0.577	0.516	0.461	0.613	0.582	0.541
COAD	0.107	0.080	0.102	0.123	0.125	0.137	0.112	0.147	0.124	0.139
READ	0.051	0.038	0.046	0.083	0.099	0.138	0.063	0.162	0.132	0.156
CCRCC	0.136	0.136	0.127	0.171	0.200	0.178	0.124	0.186	0.149	0.182
HCC	0.034	0.028	0.045	0.060	0.059	0.041	0.038	0.051	0.040	0.055
LUAD	0.497	0.507	0.515	0.531	0.573	0.541	0.533	0.511	0.569	0.547
LYMPH_IDC	0.205	0.206	0.218	0.238	0.243	0.243	0.229	0.234	0.249	0.248
Average	0.262	0.261	0.267	0.295	0.315	0.297	0.275	0.319	0.315	0.316

Benchmarking your own model

Our tutorial in 4-Running-HEST-Benchmark.ipynb will guide users interested in benchmarking their own model on HEST-Benchmark.

Note: Spontaneous contributions are encouraged if researchers from the community want to include new models. To do so, simply create a Pull Request.

Issues

The preferred mode of communication is via GitHub issues.
If GitHub issues are inappropriate, email [email protected] (and cc [email protected]).
Immediate response to minor issues may not be available.

Citation

If you find our work useful in your research, please consider citing:

@article{jaume2024hest,
	author = {Jaume, Guillaume and Doucet, Paul and Song, Andrew H. and Lu, Ming Y. and Almagro-Perez, Cristina and Wagner, Sophia J. and Vaidya, Anurag J. and Chen, Richard J. and Williamson, Drew F. K. and Kim, Ahrong and Mahmood, Faisal},
	title = {{HEST-1k: A Dataset for Spatial Transcriptomics and Histology Image Analysis}},
	journal = {arXiv},
	year = {2024},
	month = jun,
	eprint = {2406.16192},
	url = {https://arxiv.org/abs/2406.16192v1}
}

Name		Name	Last commit message	Last commit date
Latest commit History 12 Commits
barcode_coords		barcode_coords
bench_config		bench_config
docs		docs
figures		figures
metadata		metadata
models		models
spot_templates		spot_templates
src/hest		src/hest
tests		tests
tutorials		tutorials
.gitignore		.gitignore
.readthedocs.yaml		.readthedocs.yaml
LICENSE.md		LICENSE.md
README.md		README.md
pyproject.toml		pyproject.toml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

HEST-Library: Bringing Spatial Transcriptomics and Histopathology together

Designed for querying and assembling HEST-1k dataset

What does this repository provide?

Download/Query HEST-1k (743GB)

HEST-Library installation

Additional dependencies (for WSI manipulation):

Additional dependencies (GPU acceleration):

Inspect HEST-1k with HEST-Library

HEST-Library API

HEST-Benchmark

HEST-Benchmark results (06.24.24)

Benchmarking your own model

Issues

Citation

About

Releases 2

Packages

Contributors 2

Languages

License

mahmoodlab/HEST

Folders and files

Latest commit

History

Repository files navigation

HEST-Library: Bringing Spatial Transcriptomics and Histopathology together

Designed for querying and assembling HEST-1k dataset

What does this repository provide?

Download/Query HEST-1k (743GB)

HEST-Library installation

Additional dependencies (for WSI manipulation):

Additional dependencies (GPU acceleration):

Inspect HEST-1k with HEST-Library

HEST-Library API

HEST-Benchmark

HEST-Benchmark results (06.24.24)

Benchmarking your own model

Issues

Citation

About

Topics

Resources

License

Stars

Watchers

Forks

Releases 2

Packages 0

Contributors 2

Languages

Packages