This project localizes and classifies an object in an image. Note: it only detects cucumbers, eggplants, and mushrooms, because the dataset I used contains only those objects. I also used Flask as the backend to create an API and HTML as the interface to turn it into a web app.
You can get the dataset from Kaggle - Image Localization Dataset. The dataset contains object images in JPG format, and each XML file contains the annotation for the corresponding image.
I built the model in an .ipynb file using Google Colab. Here is what the .ipynb file does:
- First, I tested plotting an image with its bounding box. I used the `xml.etree.ElementTree` library to parse the XML file for the corresponding image, extracted xmin, ymin, xmax, and ymax, and drew the bounding box on the image using `cv2.rectangle()` with those coordinates. This is the result:
- Then I read all XML files, extracted the label, xmin, ymin, xmax, and ymax from each, and appended them to a list. I encoded the categorical labels as numerical values {"cucumber": 0, "eggplant": 1, "mushroom": 2}. I also read all image files and appended the images to a list.
- I used `np.array()` to convert the lists of images and outputs (label, xmin, ymin, xmax, and ymax) into arrays.
- Then I split the input and output arrays into x_train, x_test, y_train, and y_test using `sklearn.model_selection.train_test_split()` with test_size = 0.3 and random_state = 42.
- Because each row of y_train and y_test has 5 values (label, xmin, ymin, xmax, and ymax), I separated the label from the other values (the xmin, ymin, xmax, and ymax coordinates that define the bounding box), since the model has 2 outputs (label and bounding box coordinates) and 1 input (image array).
- I one-hot encoded the labels using `tf.keras.utils.to_categorical()`.
- For the model I used a pretrained MobileNetV2 with input_shape = (224,224,3), weights = 'imagenet', and include_top = False as the base for a classifier with 3 classes.
- Then I added my own layers on top of the pretrained model. I compiled the model with optimizer = Adam(lr=1e-4) and 2 loss functions: categorical_crossentropy for classification and mse for the bounding box. I also used 2 metrics: accuracy for classification and mse for the bounding box. Then I fit the model for 50 epochs and got this result:
- I saved the model to use in the API later.
- I tested the model on an image and got the following predicted object localization:
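The notebook steps above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: it assumes the annotations follow a Pascal VOC-style layout (`object/name` and `object/bndbox`) with one object per image, and the helper names `parse_annotation`, `split_targets`, and `build_model`, the pooling layer, the output layer names, and the sample XML are all made up for the example. TensorFlow is imported inside `build_model` so the parsing helpers can be tried without it.

```python
import xml.etree.ElementTree as ET

# Label encoding described above.
LABEL_TO_ID = {"cucumber": 0, "eggplant": 1, "mushroom": 2}


def parse_annotation(xml_text):
    """Return [label_id, xmin, ymin, xmax, ymax] for the first object.

    Assumes a Pascal VOC-style annotation; adjust the tag names if the
    dataset's XML is structured differently.
    """
    root = ET.fromstring(xml_text)
    obj = root.find("object")
    box = obj.find("bndbox")
    coords = [int(float(box.find(tag).text))
              for tag in ("xmin", "ymin", "xmax", "ymax")]
    return [LABEL_TO_ID[obj.find("name").text]] + coords


def split_targets(rows):
    """Separate the label column from the four box coordinates."""
    labels = [row[0] for row in rows]
    boxes = [row[1:] for row in rows]
    return labels, boxes


def build_model(num_classes=3, weights="imagenet"):
    """Two-headed model on a MobileNetV2 base (head layers are assumptions)."""
    import tensorflow as tf  # imported lazily; only needed for the model

    base = tf.keras.applications.MobileNetV2(
        input_shape=(224, 224, 3), include_top=False, weights=weights)
    x = tf.keras.layers.GlobalAveragePooling2D()(base.output)
    class_out = tf.keras.layers.Dense(
        num_classes, activation="softmax", name="class_output")(x)
    box_out = tf.keras.layers.Dense(4, name="box_output")(x)

    model = tf.keras.Model(inputs=base.input, outputs=[class_out, box_out])
    model.compile(
        # `learning_rate` is the current name of the older `lr` argument.
        optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
        loss={"class_output": "categorical_crossentropy", "box_output": "mse"},
        metrics={"class_output": "accuracy", "box_output": "mse"},
    )
    return model


# Made-up sample annotation, for illustration only.
SAMPLE_XML = """<annotation>
  <filename>eggplant_1.jpg</filename>
  <object>
    <name>eggplant</name>
    <bndbox><xmin>20</xmin><ymin>35</ymin><xmax>200</xmax><ymax>180</ymax></bndbox>
  </object>
</annotation>"""

row = parse_annotation(SAMPLE_XML)
labels, boxes = split_targets([row])
```

With real data, the labels would then go through `tf.keras.utils.to_categorical()` and the model would be fit with a dictionary of targets keyed by the two output names.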
For the web app I have:
- app.py for the backend, which builds the API
- static folder for static files such as images and predicted images
- template folder for the HTML (front-end) files
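As a rough idea of how app.py could wire the saved model into an API, here is a hedged sketch. The `/predict` route, the `image` form field, the `create_app` and `decode_prediction` names, and the assumption that the model returns `[class probabilities, box]` in that order are all hypothetical; the preprocessing (resize only, no scaling or color conversion) must be changed to match whatever was used in training.

```python
# Index order follows the label encoding used for training.
CLASS_NAMES = ["cucumber", "eggplant", "mushroom"]


def decode_prediction(class_probs, box):
    """Turn raw model outputs into a JSON-friendly dict."""
    best = max(range(len(class_probs)), key=lambda i: class_probs[i])
    xmin, ymin, xmax, ymax = (int(round(float(v))) for v in box)
    return {
        "label": CLASS_NAMES[best],
        "confidence": float(class_probs[best]),
        "box": [xmin, ymin, xmax, ymax],
    }


def create_app(model):
    """Build the Flask app around an already loaded Keras model."""
    # Imported lazily so decode_prediction works without these packages.
    import cv2
    import numpy as np
    from flask import Flask, jsonify, request

    # Flask looks for "templates" and "static" folders by default; pass
    # template_folder=... if the HTML folder has a different name.
    app = Flask(__name__)

    @app.route("/predict", methods=["POST"])  # hypothetical route
    def predict():
        file = request.files.get("image")  # hypothetical form field
        if file is None:
            return jsonify(error="no image uploaded"), 400

        # Decode the uploaded bytes into an image array.
        data = np.frombuffer(file.read(), dtype=np.uint8)
        image = cv2.imdecode(data, cv2.IMREAD_COLOR)
        if image is None:
            return jsonify(error="could not decode image"), 400

        # Assumed preprocessing: resize to the model's input size only.
        image = cv2.resize(image, (224, 224))
        class_probs, box = model.predict(image[np.newaxis, ...])
        return jsonify(decode_prediction(class_probs[0], box[0]))

    return app


# Example of decoding one (made-up) prediction.
example = decode_prediction([0.1, 0.7, 0.2], [10.4, 20.6, 100.0, 150.2])
```

The app would be started with something like `create_app(tf.keras.models.load_model(path)).run()`, where `path` points to the model saved from the notebook.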
Here's the result