The repository contains a model for binary semantic segmentation of documents.
- Left: input.
- Center: prediction.
- Right: overlay of the image and predicted mask.
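The overlay in the right panel can be produced by alpha-blending the predicted mask over the input image. A minimal numpy sketch (the function name, color, and blend weight are illustrative, not part of the package):

```python
import numpy as np

def overlay_mask(image: np.ndarray, mask: np.ndarray, alpha: float = 0.5) -> np.ndarray:
    """Blend a green tint over pixels where the binary mask is 255."""
    color = np.array([0, 255, 0], dtype=np.float64)  # overlay color (green)
    out = image.astype(np.float64).copy()
    selected = mask == 255
    out[selected] = (1 - alpha) * out[selected] + alpha * color
    return out.astype(np.uint8)

# toy 4x4 gray image with a 2x2 "document" mask in the middle
image = np.full((4, 4, 3), 100, dtype=np.uint8)
mask = np.zeros((4, 4), dtype=np.uint8)
mask[1:3, 1:3] = 255
blended = overlay_mask(image, mask)
```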
Install the package:

```bash
pip install -U midv500models
```
Jupyter notebook with an example:
The model is trained on MIDV-500: A Dataset for Identity Documents Analysis and Recognition on Mobile Devices in Video Stream.
Download the dataset from the FTP server:

```bash
wget -r ftp://smartengines.com/midv-500/
```
Unpack the dataset:

```bash
cd smartengines.com/midv-500/dataset/
unzip \*.zip
```
The resulting folder structure will be:

```
smartengines.com
    midv-500
        dataset
            01_alb_id
                ground_truth
                    CA
                        CA01_01.json
                        ...
                    ...
                images
                    CA
                        CA01_01.tif
                        ...
                    ...
            ...
```
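Each frame under `images/` has a matching annotation under `ground_truth/` with the same stem. A small sketch of how that pairing follows from the layout above (the helper name is ours, not part of the package):

```python
from pathlib import Path

def ground_truth_for(image_path: str) -> str:
    """Map an image path to its ground-truth JSON path by swapping
    the 'images' folder for 'ground_truth' and .tif for .json."""
    p = Path(image_path)
    parts = ["ground_truth" if part == "images" else part for part in p.parts]
    return str(Path(*parts).with_suffix(".json"))

gt = ground_truth_for(
    "smartengines.com/midv-500/dataset/01_alb_id/images/CA/CA01_01.tif"
)
```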
To preprocess the data, use the script:

```bash
python midv500models/preprocess_data.py -i <input_folder> \
                                        -o <output_folder>
```

where `<input_folder>` is the folder with the unpacked dataset. The output folder will look like:
```
images
    CA01_01.jpg
    ...
masks
    CA01_01.png
    ...
```
Target binary masks have values {0, 255}, where 0 is background and 255 is the document.
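The MIDV-500 annotations store the document as a quadrangle of vertices, so preprocessing amounts to rasterising that polygon into a {0, 255} mask. A pure-Python sketch using an even-odd ray-casting test (illustrative only; `preprocess_data.py` may use a library rasteriser instead):

```python
import numpy as np

def polygon_to_mask(vertices, height, width):
    """Rasterise a polygon into a uint8 mask: 255 inside, 0 outside."""
    mask = np.zeros((height, width), dtype=np.uint8)
    n = len(vertices)
    for y in range(height):
        for x in range(width):
            inside = False
            cx, cy = x + 0.5, y + 0.5  # test pixel centres
            for i in range(n):
                x1, y1 = vertices[i]
                x2, y2 = vertices[(i + 1) % n]
                # ray casting: count edge crossings to the right of the pixel
                if (y1 > cy) != (y2 > cy):
                    x_cross = x1 + (cy - y1) * (x2 - x1) / (y2 - y1)
                    if x_cross > cx:
                        inside = not inside
            if inside:
                mask[y, x] = 255
    return mask

# toy quadrangle on an 8x8 image
mask = polygon_to_mask([(1, 1), (6, 1), (6, 6), (1, 6)], 8, 8)
```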
To train the model, run:

```bash
python midv500models/train.py -c midv500models/configs/2020-05-19.yaml \
                              -i <path to train>
```
To run inference:

```bash
python midv500models/inference.py -c midv500models/configs/2020-05-19.yaml \
                                  -i <path to images> \
                                  -o <path to save predictions> \
                                  -w <path to weights>
```
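A common way to sanity-check saved predictions against ground-truth masks is intersection over union (IoU); a small numpy sketch (not part of the repository's CLI):

```python
import numpy as np

def binary_iou(pred: np.ndarray, target: np.ndarray) -> float:
    """IoU between two {0, 255} masks; defined as 1.0 when both are empty."""
    p = pred == 255
    t = target == 255
    union = np.logical_or(p, t).sum()
    if union == 0:
        return 1.0
    return float(np.logical_and(p, t).sum() / union)

# toy example: two 2x2 squares overlapping in 2 pixels
pred = np.zeros((4, 4), dtype=np.uint8)
target = np.zeros((4, 4), dtype=np.uint8)
pred[0:2, 0:2] = 255
target[1:3, 0:2] = 255
iou = binary_iou(pred, target)  # intersection 2, union 6
```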