RAG Application for Any File

Welcome to the first version of the RAG (Retrieval-Augmented Generation) application! You can try the application here. You can upload as many PDF documents as you like, with a limit of 200 MB per file.

Once the files are uploaded, you can ask questions about their content in a chat-like interface, whether you are looking up company policies or the rules of a board game.

Installation

To install the app scripts, follow these steps:

1. Clone the repository to your local machine: git clone https://github.com/pr0fi7/RAG_for_any_file.git
2. Install the required dependencies: pip install -r requirements.txt
3. Create a .env file in the cloned repository directory containing your OpenAI API key (replace the placeholder value with your own key): OPENAI_API_KEY = 'sk-hghghghhhfhfhffhhf'
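
To check that the .env file is in the expected format, a minimal sketch along the following lines can help. It uses only the standard library; read_env_file and load_api_key are hypothetical helpers written for illustration and are not part of the repository, which may load the key in a different way.

```python
import os
from pathlib import Path


def read_env_file(path=".env"):
    """Parse simple KEY = 'value' lines into a dict (hypothetical helper)."""
    values = {}
    env_path = Path(path)
    if not env_path.exists():
        return values
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Tolerate spaces around "=" and optional single or double quotes.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def load_api_key(path=".env"):
    """Return the key from the environment, falling back to the .env file."""
    key = os.environ.get("OPENAI_API_KEY") or read_env_file(path).get("OPENAI_API_KEY")
    if not key:
        raise RuntimeError("OPENAI_API_KEY is missing; add it to the .env file")
    return key
```

Calling load_api_key() from the repository directory should return your key, or raise an error if the file is missing or the variable name is misspelled.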

What is RAG?

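RAG (Retrieval-Augmented Generation) lets you chat with a large language model (LLM) that answers from a specific corpus of data. The model is not retrained on that data; instead, the passages most relevant to each question are retrieved and passed to the model as context, so that responses are grounded in the provided documents.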
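
The idea can be illustrated with a toy sketch. This is not the application's code: the embed function below is a simple word-count vector standing in for a real embedding model, and all function names and sample texts are made up for illustration. It shows only the two core steps, retrieving the most relevant chunk and placing it in the prompt that would be sent to the LLM.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy 'embedding': a bag-of-words count vector (stand-in for a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question, chunks, k=2):
    """Return the k chunks most similar to the question."""
    q = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


def build_prompt(question, chunks, k=2):
    """Augment the question with retrieved context before calling an LLM."""
    context = "\n".join(f"- {c}" for c in retrieve(question, chunks, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


chunks = [
    "Employees may work remotely up to three days per week.",
    "Each player draws two cards at the start of a turn.",
    "Expense reports are due by the fifth of each month.",
]
print(build_prompt("How many cards does a player draw each turn?", chunks, k=1))
```

In a real pipeline the word-count vectors would be replaced by model embeddings and the prompt would be sent to the LLM, but the retrieve-then-augment flow stays the same.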

How it Works

Here's a breakdown of the steps involved, as seen in app.py:

Getting PDF Files: The uploaded PDF files are read and their text is combined into one large string.
Text Preprocessing: The combined text is split into smaller chunks for processing.
Creating Vector Store: A vector store is created to hold the embedded chunks and perform vector searches. The application uses FAISS for this because it runs locally.
Conversational Retrieval Chain: The application uses a Conversational Retrieval Chain mechanism to provide answers based on the provided data. If you want to learn more about it, here is the article to explore.

Streamlit Integration: Everything is combined in Streamlit, providing a chat interface for user interaction. Custom HTML and CSS templates for chat styling can be found in htmlTemplates.py.
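
The text preprocessing step can be sketched as a fixed-size splitter with overlap, so that sentences cut at a chunk boundary still appear whole in a neighbouring chunk. This is an illustrative sketch only: split_text, chunk_size, and overlap are assumed names and values, and app.py may use a library splitter with different settings.

```python
def split_text(text, chunk_size=1000, overlap=200):
    """Split text into chunks of chunk_size characters that overlap by `overlap`."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in the range [0, chunk_size)")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current chunk reaches the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


# Text from several PDFs, combined into one string before splitting.
pages = ["abcde", "fghij"]
combined = "".join(pages)
print(split_text(combined, chunk_size=4, overlap=1))  # ['abcd', 'defg', 'ghij']
```

Each chunk would then be embedded and added to the vector store, so the chunk size controls how much context a single retrieved result carries.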

Future Changes

For future versions, the following improvements are planned:

Ability to upload any type of document, not just PDFs.
Prompt engineering to enhance performance.
Consideration of alternative embedding tools, such as Instructor, which produces better embeddings than OpenAI according to benchmarks.
