In this project, we empirically evaluate the inference performance of two mobile DL frameworks, TensorFlow Lite and CoreML, on Android and iOS devices respectively, using several convolutional neural network architectures.
- TensorFlow Lite (for Android)
- CoreML (for iOS)
- CPU
  - XNNPACK-optimized [TF Lite, Android]
- GPU
- Apple Neural Engine (ANE) [CoreML, iOS]
- Neural Network API (NNAPI) [TF Lite, Android]
- In Android Studio, use the `Open an Existing Project` option and select the `Android-TFLite\TFLitePerformance` folder.
- If you wish to use a custom TF Lite model, copy the `.tflite` file to the `app\assets` folder. Also update the path in the model's classifier Java file (e.g. `ClassifierSqueezeNet.java`) in the `lib_support` library; the path is returned by the `getModelPath` function.
- Use the Device File Explorer to upload the test data onto the device. The default data location is `/data/local/tmp/DataSet/`. This can be changed by updating line 114 of `MainActivity.java`: `File dataset_folder = new File(<Dataset location on device>);`
- Build the project using `Build > Make Project`.
- Run the application on a device/emulator using `Run > Run 'app'`. If you wish to start profiling at application launch, use `Run > Profile 'app'` instead.
- On the device/emulator, ensure the correct model and compute device are selected.
- Click the `Benchmark` button in the mobile application to initiate inferencing.
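The Benchmark flow above, stripped of Android specifics, amounts to walking the dataset folder and timing each inference. The sketch below is a plain-JVM illustration of that loop, not the app's actual code: the `Classifier` interface and `averageLatencyMs` helper are hypothetical stand-ins for the real TF Lite classifier call.

```java
import java.io.File;
import java.util.Objects;

public class BenchmarkSketch {
    // Hypothetical stand-in for the real TF Lite classifier; the actual
    // app runs the interpreter on the image here.
    interface Classifier {
        int classify(File image);
    }

    // Walk the dataset folder and report average inference latency in ms.
    static double averageLatencyMs(File datasetFolder, Classifier classifier) {
        File[] images = Objects.requireNonNull(datasetFolder.listFiles());
        long totalNanos = 0;
        for (File image : images) {
            long start = System.nanoTime();
            classifier.classify(image);          // one inference per file
            totalNanos += System.nanoTime() - start;
        }
        return totalNanos / 1e6 / images.length;
    }

    public static void main(String[] args) {
        // Time a dummy classifier over whatever files the current dir holds.
        double ms = averageLatencyMs(new File("."), image -> 0);
        System.out.println("average latency (ms): " + ms);
    }
}
```

On device, the same loop would read images from `/data/local/tmp/DataSet/` and feed them to the selected model and delegate.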
These metrics were recorded using the Android Studio CPU Profiler.
For evaluation purposes, the frame rate of the real-time video capture is restricted to 50 FPS; the following shows the FPS actually processed by each model in real time.
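With a 50 FPS capture cap and sequential frame processing, the FPS a model can sustain is bounded by both the capture rate and the model's inference latency. This helper (an illustrative formula, not from the project's code) makes that relationship explicit:

```java
public class EffectiveFps {
    // Frames per second the model can actually process when the camera
    // delivers at most captureFps frames per second and each inference
    // takes latencyMs milliseconds, with frames handled one at a time.
    static double effectiveFps(double captureFps, double latencyMs) {
        double modelFps = 1000.0 / latencyMs;
        return Math.min(captureFps, modelFps);
    }

    public static void main(String[] args) {
        // A 40 ms model keeps up with only 25 of the 50 captured frames.
        System.out.println(effectiveFps(50, 40)); // 25.0
        // A 10 ms model is capture-bound at the full 50 FPS.
        System.out.println(effectiveFps(50, 10)); // 50.0
    }
}
```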
*(Figures: Size vs. Accuracy; Inference Time vs. Accuracy.)*
- The same AI model can yield different accuracy on different platforms.
- The best choice of on-board compute device (CPU, GPU) depends heavily on the specific mobile device.
- Neural-network-optimized hardware paths such as the ANE (CoreML, iOS) and NNAPI (TF Lite, Android) are not always the best choice for inference.
- There is a substantial trade-off between accuracy and throughput. This trade-off must be addressed as per the application requirements.
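One common way to address the accuracy/throughput trade-off is to fix a latency budget and pick the most accurate model that fits it. The sketch below illustrates that policy; the model names and numbers are illustrative placeholders, not measured results from this project.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ModelPicker {
    // Each entry maps a model name to {top-1 accuracy, latency in ms}.
    // Returns the most accurate model within the latency budget,
    // or null if none fits.
    static String pick(Map<String, double[]> models, double budgetMs) {
        String best = null;
        double bestAcc = -1;
        for (Map.Entry<String, double[]> e : models.entrySet()) {
            double acc = e.getValue()[0];
            double latency = e.getValue()[1];
            if (latency <= budgetMs && acc > bestAcc) {
                bestAcc = acc;
                best = e.getKey();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Illustrative numbers only.
        Map<String, double[]> models = new LinkedHashMap<>();
        models.put("squeezenet",   new double[]{0.57, 8});
        models.put("mobilenet_v2", new double[]{0.71, 20});
        models.put("inception_v3", new double[]{0.78, 90});
        System.out.println(pick(models, 30)); // mobilenet_v2
    }
}
```

A real deployment would substitute the accuracy and latency numbers measured on the target device, since (per the findings above) both vary across hardware and delegates.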