gsajko/dharmaQA

dharmaQA

This is my "Hello World!" in the realm of RAGs.

This is a project to build a basic question-answering system with RAG (Retrieval-Augmented Generation). The dataset is one that's important to me: it is made from Rob Burbea's Dharma talks.

Unfortunately, that dataset isn't well suited to a RAG system: it's not factual, and its answers are long-winded and sometimes not directly related to the question.

For this kind of dataset, fine-tuning a language model would be more appropriate.

I'll explore RAG using a different dataset, and then come back to this dataset later.

Notes

The app is deployed on Streamlit Cloud.

It retrieves context from transcripts of Rob Burbea's Dharma talks, and generates a response based on the context.
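To make the retrieve-then-generate flow concrete, here is a minimal sketch in plain Python. It is not the app's actual code: the word-count cosine similarity stands in for the vector search that LanceDB would perform, the chunk texts are made-up placeholders rather than real transcript excerpts, and the names `retrieve` and `build_prompt` are hypothetical. The resulting prompt would then be sent to a language model.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase and split into simple word tokens.
    return re.findall(r"[a-z']+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two bag-of-words count vectors.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question: str, chunks: list[str], k: int = 2) -> list[str]:
    # Rank chunks by similarity to the question and keep the top k.
    # (Assumption: a real system would use embeddings and a vector store.)
    q = Counter(tokenize(question))
    ranked = sorted(
        chunks,
        key=lambda c: cosine(q, Counter(tokenize(c))),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(question: str, context: list[str]) -> str:
    # Assemble the text that would be passed to the language model.
    joined = "\n\n".join(context)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{joined}\n\n"
        f"Question: {question}\nAnswer:"
    )


# Placeholder chunks, invented for illustration only.
chunks = [
    "Placeholder text about posture and sitting still.",
    "Placeholder text about attention and the breath.",
    "Placeholder text about walking outdoors.",
]
question = "How do I work with attention and the breath?"
prompt = build_prompt(question, retrieve(question, chunks, k=1))
```

Keeping retrieval and prompt assembly as separate functions makes it easy to swap the toy similarity for a real vector search without touching the generation step.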

Transcripts were downloaded from https://airtable.com/appe9WAZCVxfdGDnX/shr9OS6jqmWvWTG5g/tblHlCKWIIhZzEFMk/viw3k0IfSo0Dve9ZJ as PDF files.

I used marker to convert the PDFs to Markdown files before ingesting them.
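Ingesting Markdown for retrieval usually means splitting each file into smaller chunks first. The notes above don't say how this project splits its files, so the following is only a sketch under an assumed strategy: pack blank-line-separated paragraphs into chunks of at most `max_chars` characters. The function name `chunk_markdown` and the default size are hypothetical.

```python
def chunk_markdown(text: str, max_chars: int = 1000) -> list[str]:
    # Split on blank lines, dropping empty paragraphs.
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks: list[str] = []
    current = ""
    for para in paragraphs:
        candidate = f"{current}\n\n{para}" if current else para
        if len(candidate) <= max_chars or not current:
            # Fits in the current chunk, or is the first paragraph of a
            # new chunk (an oversized paragraph is kept whole, not cut).
            current = candidate
        else:
            # Current chunk is full: store it and start a new one.
            chunks.append(current)
            current = para
    if current:
        chunks.append(current)
    return chunks


# Tiny made-up input to show the packing behaviour.
example = chunk_markdown("aaaa\n\nbbbb\n\ncccc", max_chars=10)
```

Splitting on paragraph boundaries keeps each chunk readable on its own, which may matter for long, discursive talks where a cut mid-sentence would lose the thread.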

About

A chatbot using LanceDB, LangChain, and Streamlit.
