persian_pdf_converter

A Python package for converting PDF files to Word documents and modifying URLs. It uses Tesseract OCR to recognize the text on each PDF page.

Features

Convert PDF files to Word documents with text recognition
Modify URLs based on directory paths

Requirements

Python 3.6 or higher
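Tesseract OCR installed and configured, with language data for every language you pass in lang (the default "fas+eng" needs the Persian and English data files)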
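
Before converting anything, it can help to confirm that Tesseract is on your PATH and has the language data you need. The snippet below is a minimal sketch and not part of this package: the helper names missing_languages and installed_languages are made up for illustration, and the parsing assumes the usual output of `tesseract --list-langs` (a header line followed by one language code per line).

```
import shutil
import subprocess


def missing_languages(lang, available):
    """Return the codes in a Tesseract lang string (e.g. "fas+eng") that are not in `available`."""
    return [code for code in lang.split("+") if code not in available]


def installed_languages():
    """Ask the tesseract command-line tool which language packs are installed."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract executable not found on PATH")
    result = subprocess.run(
        ["tesseract", "--list-langs"],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # some versions print the list to stderr
        universal_newlines=True,
        check=True,
    )
    # Skip the "List of available languages ..." header and blank lines.
    return {
        line.strip()
        for line in result.stdout.splitlines()
        if line.strip() and not line.startswith("List of")
    }
```

Calling `missing_languages("fas+eng", installed_languages())` returns an empty list when both Persian and English data are available.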

Installation

To install the package, use pip:

```
pip install persian-pdf-converter
```

Usage

The following example converts a PDF file to a Word document with pdf_to_word:

```
from persian_pdf_converter.pdf_converter import pdf_to_word

# Path to your PDF file and output directory
pdf_path = 'path/to/example.pdf'
output_dir = 'path/to/output/dir'

# Convert PDF to Word
output_file = pdf_to_word(pdf_path, output_dir, lang="fas+eng", dpi=300)
print(f"Converted file saved as: {output_file}")
```
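
To convert several files at once, the same call can be wrapped in a loop. This is only a sketch: convert_folder is a hypothetical helper, not something this package provides, and it takes the conversion function as an argument so it makes no assumptions about how pdf_to_word behaves beyond the call shown above.

```
from pathlib import Path


def convert_folder(input_dir, output_dir, convert, **kwargs):
    """Run `convert` on every .pdf in input_dir; map each PDF name to the returned value."""
    results = {}
    for pdf in sorted(Path(input_dir).glob("*.pdf")):
        # Extra keyword arguments (lang, dpi, ...) are passed through unchanged.
        results[pdf.name] = convert(str(pdf), output_dir, **kwargs)
    return results
```

For example, `convert_folder("path/to/pdfs", "path/to/output/dir", pdf_to_word, lang="fas+eng", dpi=300)` would convert each PDF in the folder in name order.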

pdf_to_word Function

This function converts a PDF file to a Word document with text recognition.

Parameters:

pdf_path (str): Path to the PDF file.
output_dir (str): Directory where the output Word file will be saved.
lang (str): Languages Tesseract uses for text recognition, joined with "+" (default is "fas+eng", that is, Persian plus English).
Additional keyword arguments (such as dpi in the usage example) are passed on to convert_from_path.

Returns:

str: Name of the output Word file.
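
For readers curious how such a pipeline fits together, here is a minimal sketch under stated assumptions. It is not the package's actual implementation. It assumes that convert_from_path comes from pdf2image, that OCR goes through pytesseract, and that the Word file is written with python-docx. The names pdf_to_word_sketch and docx_name_for, and the rule that example.pdf becomes example.docx, are hypothetical.

```
import os


def docx_name_for(pdf_path):
    """Hypothetical naming rule: path/to/example.pdf -> example.docx."""
    stem = os.path.splitext(os.path.basename(pdf_path))[0]
    return stem + ".docx"


def pdf_to_word_sketch(pdf_path, output_dir, lang="fas+eng", **kwargs):
    # Third-party imports are local so docx_name_for works without them installed.
    from pdf2image import convert_from_path
    import pytesseract
    from docx import Document

    # Render each PDF page to a PIL image; kwargs such as dpi=300 go to pdf2image.
    pages = convert_from_path(pdf_path, **kwargs)

    document = Document()
    for image in pages:
        text = pytesseract.image_to_string(image, lang=lang)
        # Tesseract may append a form feed, which is not valid in Word XML.
        document.add_paragraph(text.replace("\x0c", "").strip())

    os.makedirs(output_dir, exist_ok=True)
    name = docx_name_for(pdf_path)
    document.save(os.path.join(output_dir, name))
    return name
```

The sketch writes one paragraph per page and does not set right-to-left paragraph direction, so Persian text may need layout adjustments in Word.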

Development

To contribute to this project, follow these steps:

Clone the repository:

```
git clone https://github.com/mahdiramezanii/persian_pdf_converter.git
```

Navigate to the project directory:
```
cd persian_pdf_converter
```

Create a virtual environment and activate it:

```
python -m venv venv
source venv/bin/activate  # On Windows use `venv\Scripts\activate`
```

Install the dependencies:
```
pip install -r requirements.txt
```

Make your changes and run tests.

License

This project is licensed under the MIT License. See the LICENSE file for more details.

Contact

If you have any questions or suggestions, feel free to contact me at [email protected].
