Repository to host GStreamer-based Edge AI applications for TI devices
Barcodes are crucial in inventory management, asset tracking, ticketing, and information sharing. 1-D and 2-D barcodes condense information into a visually coded form. 1-D codes can be scanned with lasers, but 2-D codes, which can encode much more information and include error correction to recover damaged codes, require imaging with cameras.
The most challenging part of reading 1-D or 2-D barcodes is not extracting the information, but finding the barcode within an image. Complex conventional computer vision algorithms have been used for this task for many years, but improvements in deep neural networks (a.k.a. deep learning) have shown that technology to be similarly effective and much easier to develop. Significant industry effort has gone into accelerating neural networks at the edge, meaning barcode-localization models benefit directly from that acceleration rather than requiring custom CV algorithm implementation and optimization.
This demo runs a custom-trained YOLOX-nano neural network on the AM62A and performs object detection to find 1-D and 2-D barcodes in the input imagery. Regions with detected codes are cropped and converted to grayscale for decoding with the open-source zbar library. The decoded text is displayed alongside the bounding box from object detection. C++ and Python implementations are available in this repo.
See Resources for links to AM62A and other Edge AI training material.
DEVICE | Supported |
---|---|
AM62A | ✔️ |
Follow the AM62A Quick Start guide for the AM62A Starter Kit
- Download the Edge AI SDK from ti.com.
- Ensure that the tisdk-edgeai-image-am62axx.wic.xz image is used.
- Install the SDK onto an SD card using a tool like Balena Etcher.
- Establish a network connection to the device and log in through an SSH session.
Run the setup script below within this repository on the EVM. This requires a network connection on the EVM.
- An ethernet connection is recommended.
- Proxy settings for HTTPS_PROXY may be required if the EVM is behind a firewall.
```
./setup_barcode_demo.sh
```
- If the network download fails, clone the zbar repository and download the barcode model artifacts on a PC, transfer them to the SD card manually, and rerun the setup script.
This will download several tools to the EVM.
- The setup_zbar.sh script will download, compile, and install the "zbar" open-source library for barcode decoding. Compiling zbar will take several minutes.
- The setup_model.sh script will download a pretrained barcode-detection model and install it under /opt/model_zoo in the filesystem.
This demo runs with both C++ and Python to show an example of how post-processing code of edgeai-gst-apps can be modified for application-specific models. Note that running other object detection models may be less effective due to these changes. Run commands as follows from the base directory of this repo on the EVM.
For C++:
```
./apps_cpp/bin/Release/app_edgeai ./configs/barcode-reader.yaml
```
For Python3:
```
python3 ./apps_python/app_edgeai.py ./configs/barcode-reader.yaml
```
On the AM62A starter kit EVM, the barcode-detection model uses the YOLOX-nano architecture and runs at >100 fps. However, overall performance slows as more barcodes appear, since each one must be decoded individually, adding linear overhead. With 2+ barcodes in the field of view, the application is likely to operate in the 15-20 fps range. It works on multiple types of barcodes, such as QR codes and EAN-8.
There is significant opportunity for improving the performance with more multiprocessing on the Arm CPU cores and by developing a more specific, optimized implementation of 1-D or 2-D barcode decoding.
This section of the Readme describes how the application was developed.
A survey of research papers and other literature on barcode scanning showed that localizing the barcode takes substantially more processing than decoding. This makes the problem ripe for a 2-stage solution of
- deep learning for localization and
- conventional methods for decoding the barcode in a cropped region. We'll use an open-source library for this.
Note that since barcodes contain dense, highly structured information, developing a deep neural network that decodes this information directly is nigh-impossible given the size of the search space (2^NUM_BITS possible codes), short of a very complex, custom architecture trained on a massive dataset. Conventional decoding methods are more appropriate.
The first stage is building a model to localize a variety of barcode types. We selected among several public, openly licensed datasets found on Kaggle and through a GitHub repository that points to many existing labelled barcode datasets.
After selecting several datasets, we realized they had inconsistent labeling formats. TI's training tools use the COCO-JSON format, so we wrote a (rough) script to "COCOify" these datasets. A checker script verifies that the COCO JSON is valid and that the referenced image files exist.
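A minimal sketch of such a checker, assuming command-line paths and bbox-style COCO JSON (the repo's actual script may differ):

```python
# Minimal COCO checker sketch (illustrative, not the repo's exact script).
# Verifies top-level structure and that every referenced image file exists.
import json
import os
import sys

def check_coco(json_path, image_dir):
    with open(json_path) as f:
        coco = json.load(f)

    # A valid COCO detection file needs these three top-level keys.
    for key in ("images", "annotations", "categories"):
        if key not in coco:
            raise ValueError(f"missing top-level key: {key}")

    # Flag image entries whose files are missing on disk.
    missing = [img["file_name"] for img in coco["images"]
               if not os.path.isfile(os.path.join(image_dir, img["file_name"]))]

    # Flag annotations that reference a nonexistent image id.
    image_ids = {img["id"] for img in coco["images"]}
    orphans = [ann["id"] for ann in coco["annotations"]
               if ann["image_id"] not in image_ids]

    print(f"{len(missing)} missing image files, {len(orphans)} orphaned annotations")
    return not missing and not orphans

if __name__ == "__main__":
    sys.exit(0 if check_coco(sys.argv[1], sys.argv[2]) else 1)
```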
- Each of the selected datasets already had labels associated. We assume these labels are high enough quality to use, although they did need to be reformatted.
- Some datasets use segmentation masks instead of bounding boxes. We found an (unoptimized) algorithm online to extract bounding boxes from segmentation masks (see the sketch after this list).
- One of the datasets was very large and constructed of synthetic data. In fact, many barcode datasets in the research space use synthetically generated images, given the lack of otherwise publicly available datasets. Synthetic images are typically less useful than real-world images, so we selected only a small percentage of those datasets for use within our own.
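For the mask-to-box conversion mentioned in the list above, a simple version over COCO polygon segmentations (an assumption about the label format; RLE masks would need different handling) just takes the min/max of the polygon vertices:

```python
# Sketch: derive a COCO [x, y, width, height] box from polygon segmentations.
def segmentation_to_bbox(segmentation):
    # COCO polygons are flat lists [x0, y0, x1, y1, ...]; one object may
    # carry several polygons, so gather coordinates across all of them.
    xs = [x for poly in segmentation for x in poly[0::2]]
    ys = [y for poly in segmentation for y in poly[1::2]]
    x_min, y_min = min(xs), min(ys)
    return [x_min, y_min, max(xs) - x_min, max(ys) - y_min]

print(segmentation_to_bbox([[10, 20, 60, 22, 58, 80, 12, 78]]))  # [10, 20, 50, 60]
```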
The data_manipulation.py script performs operations like combining multiple COCO-formatted datasets, making a test-train split, and performing (heavy) augmentation. This produces outputs for training (with and without augmentation) and testing.
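As an illustration of what the augmentation step can look like, the snippet below uses the albumentations library with COCO-style boxes; the library choice and specific transforms are assumptions, not necessarily what data_manipulation.py implements:

```python
# Illustrative augmentation pipeline; transforms and parameters are examples.
import albumentations as A
import cv2

transform = A.Compose(
    [
        A.HorizontalFlip(p=0.5),
        A.Rotate(limit=20, p=0.7),
        A.RandomBrightnessContrast(p=0.5),
        A.MotionBlur(blur_limit=5, p=0.3),
    ],
    # Keep COCO [x, y, w, h] boxes consistent with the transformed image.
    bbox_params=A.BboxParams(format="coco", label_fields=["labels"]),
)

image = cv2.imread("sample.jpg")  # hypothetical input image
out = transform(image=image, bboxes=[[10, 20, 50, 60]], labels=["barcode"])
aug_image, aug_boxes = out["image"], out["bboxes"]
```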
A small set of images (25) was also collected and manually labelled using Edge AI Studio. The entire dataset was around 1k images before augmentation, and we used 8k augmented images for training.
- Note that augmentation is important! It significantly boosts accuracy and robustness. Comparing augmented and unaugmented training results, accuracy improves by 16.5% on the distinct testing set (0.50 -> 0.58 mAP50-95), and the model performs noticeably better on live input.
The fully combined and augmented dataset was uploaded to Edge AI Studio, and the YOLOX-nano model was trained with a batch size of 4 for 20 epochs. The trained model was then compiled and the artifacts were downloaded to the PC.
Barcode decoding requires extracting structured information from a (generally) grayscale image. Internet searches showed several libraries available for use, including open-source libraries like ZBar and ZXing. The latter is less preferable here because it is developed specifically for mobile applications in Java.
We selected ZBar since it has ports to several languages beyond its base C++ implementation. Importantly, Python bindings are available. Once we figured out how to build and install the library, we could give it a test run to ensure it could decode 1-D and 2-D codes.
We generated a few QR codes online, copied the files, and used zbar example code to extract the encoded text from the image. The same was done for 1-D barcodes.
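A minimal version of that smoke test might look like the sketch below, assuming the pyzbar Python bindings (pip install pyzbar) and an illustrative file name; the repo may use zbar's own bindings instead:

```python
# Decode any barcodes/QR codes found in a test image.
import cv2
from pyzbar.pyzbar import decode

img = cv2.imread("qr_sample.png", cv2.IMREAD_GRAYSCALE)
for symbol in decode(img):
    # symbol.type is e.g. "QRCODE" or "EAN8"; symbol.data is raw bytes.
    print(symbol.type, symbol.data.decode("utf-8"))
```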
Additionally, we ran a small standalone test program on the AM62A to evaluate the performance boost of DL localization + decoding on a cropped image vs. decoding on the entire image (of which <1/20th of the pixels were relevant to a barcode).
On a 1280x720 image, running zbar took around 120 ms and failed to find the QR code within the image.
Alternatively, we ran the deep learning model to localize the barcode and crop the image to it. The model ran in <10 ms, and decoding the small portion of the image containing the QR code (approx. 100x100 pixels) took ~5 ms. Altogether, this is nearly a 10x improvement in speed, along with an improvement in accuracy. We did not find noticeable performance differences between the C++ and Python implementations.
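The comparison can be reproduced with a short harness like the following sketch; the file name and box coordinates are illustrative, and pyzbar again stands in for the zbar binding:

```python
# Time full-frame decoding vs. decoding only the cropped detection region.
import time
import cv2
from pyzbar.pyzbar import decode

frame = cv2.imread("frame_1280x720.png", cv2.IMREAD_GRAYSCALE)

t0 = time.perf_counter()
full_result = decode(frame)                  # entire 1280x720 image
t1 = time.perf_counter()

x, y, w, h = 590, 310, 100, 100              # detector output (hypothetical)
crop_result = decode(frame[y:y + h, x:x + w])
t2 = time.perf_counter()

print(f"full frame: {(t1 - t0) * 1e3:.1f} ms, {len(full_result)} codes")
print(f"cropped:    {(t2 - t1) * 1e3:.1f} ms, {len(crop_result)} codes")
```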
We did note that when codes are rotated, the bounding box coordinates from the barcode-localization model will cut off corners, so additional margin needs to be added to the original coordinates prior to cropping. Running on a larger area increases compute time per zbar call.
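A sketch of that margin-adding step, with an assumed pad fraction and clamping at the image border:

```python
# Expand a detection box by a fraction of its size before cropping so that
# rotated codes keep their corners; pad=0.15 is a tunable assumption.
def padded_crop(frame, box, pad=0.15):
    x0, y0, x1, y1 = box
    dx, dy = int((x1 - x0) * pad), int((y1 - y0) * pad)
    h, w = frame.shape[:2]
    x0, y0 = max(0, x0 - dx), max(0, y0 - dy)   # clamp to the image bounds
    x1, y1 = min(w, x1 + dx), min(h, y1 + dy)
    return frame[y0:y1, x0:x1]
```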
The original repo from which this is forked, edgeai-gst-apps, handles the bulk of the work in creating this application. For the end-application, we wanted to simply show detection results from the deep learning model and print text to the screen corresponding to the code's data.
To accomplish this, all that's needed is to add logic to the post-processing code for either C++ or Python. Here, we look at the bounding boxes for where barcodes should be and crop a section of the image to include that code. Then, we run the zbar library to decode the code within that image. The decoded text is drawn onto the image with OpenCV.
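In outline, the added post-processing logic looks like the following Python sketch; the box format and function names are illustrative rather than the repo's exact API:

```python
# For each detected box: crop, decode with zbar (via the assumed pyzbar
# bindings), and draw the decoded text back onto the frame with OpenCV.
import cv2
from pyzbar.pyzbar import decode

def postprocess(frame, boxes):
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    for (x0, y0, x1, y1) in boxes:
        symbols = decode(gray[y0:y1, x0:x1])
        text = symbols[0].data.decode("utf-8") if symbols else "?"
        cv2.rectangle(frame, (x0, y0), (x1, y1), (0, 255, 0), 2)
        cv2.putText(frame, text, (x0, y0 - 5),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 255, 0), 2)
    return frame
```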
The rest of the application (grabbing live input, preprocessing, running the deep learning model, and outputting the final result to a display/file) is handled by the rest of edgeai-gst-apps. For another example that constructs a GStreamer pipeline for a more specific use-case (and is thus somewhat easier to interpret), please see the retail-checkout repo.
The current program is not running at maximum efficiency. The detection model is running at >100 fps, but the rest of the pipeline is slower than this. Reasons for the bottleneck are as follows:
- Slow cameras: USB 2.0 cameras are generally frame-rate limited at their maximum resolution. For instance, a C920 1080p webcam only produces 15 fps
- Too many detections: Many barcodes in the field of view cause linear scaling from additional crops and barcode-decoding API calls
- Drawing on images with CPU: OpenCV calls can add substantial latency
Accuracy for decoding is also not perfect. Reasons include:
- The barcode is oriented relative to the camera such that part of the code falls outside the bounding box produced by the deep learning model
- We attempt to decode directly after cropping, without regard for rotations or perspective. Cropping to a larger area and running edge/corner detection before rotating and/or recropping may help improve decoding accuracy
- Calls to the decoding library are generic with respect to code type, so that many types can be recognized. Restricting to a particular type, like QR codes, may substantially boost performance and accuracy
- Depending on the camera, the focus settings may cause the image to be blurry. This makes it difficult for the decoding library to distinguish where one bit/bar is in the code versus an adjacent one