By: Azfar Lari
Files
- extract.py: contains functions for extracting data from PDFs of different banks (currently Yes Bank and Allahabad Bank)
- analysis.py: contains functions for processing and analyzing the data
- main.py: the driver code that accepts the filename of the PDF
Setup
Prerequisites: Python 3 and Tesseract need to be installed. For Tesseract, see: https://github.com/tesseract-ocr/tesseract/wiki
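To confirm both prerequisites before installing the Python packages, a small check along these lines may help. It is only a sketch: the function name `missing_prerequisites` is made up for illustration and is not part of this repo, and it assumes the Tesseract executable is named `tesseract` and is reachable on PATH.

```python
import shutil
import sys


def missing_prerequisites(which=shutil.which, version=sys.version_info):
    """Return a list of problems; an empty list means both checks passed.

    Hypothetical helper, not part of this repo. `which` and `version`
    are parameters only so the check can be exercised without touching
    the real environment.
    """
    problems = []
    # The README asks for Python 3.
    if version[0] < 3:
        problems.append("Python 3 is required")
    # Assumption: the Tesseract binary is called "tesseract" and is on PATH.
    if which("tesseract") is None:
        problems.append("tesseract executable not found on PATH")
    return problems


if __name__ == "__main__":
    for problem in missing_prerequisites():
        print(problem)
```

If the script prints nothing, both checks passed.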
- Run pip install -r requirements.txt
- Run python main.py <filename.pdf>
This starts the process, writes the results to Excel sheets, and returns JSON output.
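The flow described here (accept a PDF filename, extract, analyze, return JSON) could be sketched roughly as follows. This is not the actual main.py: the name `run` and the `extract` and `analyze` callables are hypothetical stand-ins for whatever extract.py and analysis.py really expose, and the Excel-writing step is left out because it depends on third-party packages from requirements.txt.

```python
import argparse
import json


def run(argv, extract, analyze):
    """Parse the PDF filename, run extraction and analysis, return JSON text.

    `extract` and `analyze` are hypothetical callables standing in for
    functions from extract.py and analysis.py; their real names and
    signatures are not documented here.
    """
    parser = argparse.ArgumentParser(description="Process a bank statement PDF")
    parser.add_argument("filename", help="path to the statement PDF")
    args = parser.parse_args(argv)

    rows = extract(args.filename)   # assumed: returns the extracted records
    summary = analyze(rows)         # assumed: returns JSON-serializable data
    return json.dumps(summary)
```

For example, with trivial stand-ins, `run(["statement.pdf"], extract=lambda path: [path], analyze=lambda rows: {"count": len(rows)})` returns a JSON string describing one record. Passing the two steps in as arguments keeps the driver independent of any particular bank's extraction logic.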