ragdata builds knowledge bases for Retrieval Augmented Generation (RAG).
This project provides processes for building txtai embeddings databases from common datasets.
The currently supported datasets are:
Each of the links above has full instructions on how to build those datasets, including using this project.
The easiest way to install is via pip and PyPI:
pip install ragdata
Python 3.9+ is supported. Using a Python virtual environment is recommended.
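As a minimal sketch of the recommended setup, the commands below create and activate a virtual environment and confirm the interpreter meets the 3.9+ requirement. This assumes a POSIX shell with `python3` on the PATH; the directory name `venv` is an arbitrary choice, not something this project requires.

```shell
# Create an isolated environment in ./venv (directory name is an arbitrary choice)
python3 -m venv venv

# Activate the environment for the current shell session
. venv/bin/activate

# Fail early if the interpreter is older than the supported minimum (3.9)
python -c "import sys; assert sys.version_info >= (3, 9), sys.version"
```

With the environment active, the pip install command shown above installs into `venv` rather than the system Python. On Windows, the activation script is `venv\Scripts\activate` instead.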
ragdata can also be installed directly from GitHub to access the latest, unreleased features.
pip install git+https://github.com/neuml/ragdata