OCR to scan texts from a PDF and return the text as output.

JaynouOliver/NLPTextMapping

📖 NLP Text Mapping Project

Dive into Natural Language Processing with ease!


📌 Introduction

This NLP project explores text mapping techniques for handling and analyzing textual data. Jupyter notebooks hold the proofs of concept, and the application code in src is split into modules so that it is easy to read and extend.

🛠 Setup

Prerequisites

  • Python 3.8 or above
  • pip for managing Python packages
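
To confirm the Python prerequisite before installing anything, a quick check along these lines can help. This is a minimal sketch and not part of the repository; the function name `meets_minimum` and the constant `MIN_VERSION` are made up for illustration.

```python
import sys

# Minimum version taken from the prerequisites list above.
MIN_VERSION = (3, 8)


def meets_minimum(version_info=sys.version_info, minimum=MIN_VERSION):
    """Return True if the (major, minor) version is at least `minimum`.

    Hypothetical helper, not part of this repository.
    """
    return tuple(version_info[:2]) >= minimum


if __name__ == "__main__":
    if meets_minimum():
        print("Python version OK")
    else:
        print("Python 3.8 or above is required")
```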

Installation

  1. Clone this repository (the command below uses the GitHub CLI, gh):
    gh repo clone JaynouOliver/NLPTextMapping
  2. Navigate to the project directory:
    cd NLPTextMapping
  3. Install the required Python packages:
    pip install -r requirements.txt

🚀 Usage

Running Notebooks

To explore the text mapping concepts:

  1. Navigate to the Notebooks directory.
  2. Open the desired notebook (e.g., PoC - Text mapping.ipynb) using Jupyter Notebook or JupyterLab.
  3. Run the cells sequentially to understand the workflow and outputs.

Running the Application

  1. Navigate to the src directory.
  2. Run the main application:
    python main.py

🔧 Development

This project is structured for easy understanding and further development:

  • Notebooks: For experimenting with text mapping concepts and visualizing results.
  • src: Contains the core logic, split into modular components for ease of enhancement and maintenance.

Key Components:

  • config.py: Central configuration file.
  • embedding.py: Handles embedding generation and manipulation.
  • main.py: The starting point of the application.
  • match.py: Implements matching logic.
  • pre_process.py: Prepares data for processing.
  • split_doc.py: Splits documents for easier handling.
  • utils.py: Utility functions supporting various tasks.
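
The module names above suggest a split, pre-process, embed, match flow. The sketch below illustrates how such a flow could fit together using only the standard library. It is an assumption-laden illustration, not the code in src: every function name, signature, and the bag-of-words "embedding" are hypothetical stand-ins, and the actual modules may use a different embedding model and matching strategy.

```python
import math
import re
from collections import Counter

# NOTE: every function here is a hypothetical stand-in for illustration.
# None of these signatures are taken from the repository's src modules.


def split_doc(text):
    """Split a document into sentence-like chunks."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def pre_process(chunk):
    """Lowercase the text and extract alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", chunk.lower())


def embed(tokens):
    """Bag-of-words count vector; a real embedding model would go here."""
    return Counter(tokens)


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    norm = norm_a * norm_b
    return dot / norm if norm else 0.0


def match(query, chunks):
    """Return the (chunk, score) pair most similar to the query.

    Raises ValueError if `chunks` is empty.
    """
    query_vec = embed(pre_process(query))
    scored = [(c, cosine(query_vec, embed(pre_process(c)))) for c in chunks]
    return max(scored, key=lambda pair: pair[1])


if __name__ == "__main__":
    doc = (
        "Invoices are due in 30 days. "
        "Refunds take one week. "
        "Support is open on weekdays."
    )
    best, score = match("When are invoices due?", split_doc(doc))
    print(best, round(score, 3))
```

Keeping each stage as a small function mirrors the one-module-per-step layout listed above, so a stage such as `embed` could be swapped for a learned embedding without touching the matching code.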

📈 Contributing

Contributions are what make the open-source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.

  1. Fork the Project
  2. Create your Feature Branch (git checkout -b feature/AmazingFeature)
  3. Commit your Changes (git commit -m 'Add some AmazingFeature')
  4. Push to the Branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

Made with ❤️ by Suvrakamal

