
Welcome to the advanced Image Data Cleansing Algorithm, a powerful tool designed to enhance data quality by determining the gender of individuals in images through facial analysis. With its robust face detection capabilities, the algorithm identifies and verifies gender information, facilitating data cleansing processes.


nelson123-lab/Gender_based_cleaning_algorithm


Gender Based Cleaning Algorithm


Data cleansing is the main use for this algorithm. It determines the gender of the person in an image by analyzing the face. If no face can be detected, the image is deleted. The algorithm can be adapted to suit different requirements.

General Script

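from deepface import DeepFace # Pretrained model which is present in DeepFace library.
from tqdm import tqdm # Used to create a bar that represents process progress.
import cv2
import matplotlib.pyplot as plt
import time
import os
start = time.time()

dire = r"Location of folder in which all the files are present"
for fname in tqdm(os.listdir(dire)):
    path = os.path.join(dire, fname)
    img = cv2.imread(path)
    if img is None: # cv2.imread returns None for corrupt or non-image files.
        os.remove(path)
        continue
    # plt.imshow(img[:,:,::-1])
    # plt.show() # To display the image if required (BGR converted to RGB).
    try:
        result = DeepFace.analyze(img, actions=['gender'])
    except ValueError: # Raised when no face can be detected in the image.
        os.remove(path)
        continue
    # Newer DeepFace versions return a list with one dict per detected face and
    # store the label in 'dominant_gender'; older versions return one dict whose
    # 'gender' value is already "Man" or "Woman". Normalize to the older format.
    if isinstance(result, list):
        result = result[0]
    if 'dominant_gender' in result:
        result['gender'] = result['dominant_gender']
    # print("Gender: ", result['gender'])
    if False: # Placeholder: replace with a condition on result['gender'] for custom use.
        os.remove(path)
print("All is done.") # To understand that all the process is finished.
end = time.time()
print(f"Runtime of the program is {end-start}") # To print out the final execution time.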
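The customizable condition in the general script can also be passed in as a parameter. The sketch below is a hypothetical refactoring, not code from this repository: `gender_label`, `clean_folder`, `should_delete`, and the injectable `analyze` argument are illustrative names. It assumes `DeepFace.analyze` accepts an image path and raises `ValueError` when no face is detected, and it handles both the older (single dict) and newer (list of dicts with `dominant_gender`) result formats.

```python
import os
from typing import Callable, List, Optional


def gender_label(result) -> str:
    """Extract the predicted label ("Man" or "Woman") from a DeepFace result.

    Assumption: newer DeepFace versions return a list of per-face dicts that
    carry the label in 'dominant_gender'; older versions return a single dict
    whose 'gender' value is already the label.
    """
    if isinstance(result, list):
        result = result[0]  # Use the first detected face.
    return result.get("dominant_gender", result["gender"])


def clean_folder(folder: str,
                 should_delete: Callable[[str], bool],
                 analyze: Optional[Callable] = None) -> List[str]:
    """Delete every image whose label satisfies `should_delete`, plus every
    image in which no face is detected. Returns the deleted file names."""
    if analyze is None:
        from deepface import DeepFace  # Imported lazily so a stub can be injected.

        def analyze(path):
            return DeepFace.analyze(path, actions=["gender"])

    deleted = []
    for name in sorted(os.listdir(folder)):
        path = os.path.join(folder, name)
        if not os.path.isfile(path):
            continue  # Skip sub-folders.
        try:
            label = gender_label(analyze(path))
        except ValueError:  # DeepFace raises ValueError when no face is found.
            label = None
        if label is None or should_delete(label):
            os.remove(path)
            deleted.append(name)
    return deleted
```

For example, `clean_folder(dire, lambda gender: gender != "Man")` would correspond to use case 3, and `lambda gender: False` deletes only the images without a detectable face.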

Use Cases

1. To eliminate noisy photos and only keep images with human faces.

Multiple photographs collected from the internet are combined in one folder. The folder contains photos of people of various genders, and some of the files are corrupt. These noisy photos can be removed with our script.

Noise in Face data

The noisy images in this folder are not just arbitrary snapshots. Each of them depicts the attributes of a face in some way, because they are regions cropped out by a face detector model based on MTCNN.

Implementation

We only need to change one line of the general script, as follows:

if result['gender'] != "Man" and result['gender'] != "Woman": # Replace the customizable condition in the general script with this line.
    os.remove(path)

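After running the script, the only photographs left are those with human faces. Because DeepFace labels every detected face as either "Man" or "Woman", it is in practice the except ValueError branch, triggered when no face can be detected, that removes the noisy images.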

A progress bar shows the cleaning status, and the total execution time is printed at the end along with the text "All is done."
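The general script measures runtime with `time.time()` around the loop. As an optional alternative that is not part of the original repository, `time.perf_counter()` is the standard-library clock intended for measuring elapsed intervals, since it is monotonic and uses the highest available resolution. The helper name `run_timed` is hypothetical, and the summation below only stands in for the cleaning loop.

```python
import time
from typing import Any, Callable, Tuple


def run_timed(task: Callable[[], Any]) -> Tuple[Any, float]:
    """Run `task` and return its result together with the elapsed seconds."""
    start = time.perf_counter()  # Monotonic clock, unaffected by system time changes.
    result = task()
    elapsed = time.perf_counter() - start
    return result, elapsed


# Stand-in workload; in the cleaning script, `task` would wrap the folder loop.
result, seconds = run_timed(lambda: sum(range(1_000_000)))
print("All is done.")
print(f"Runtime of the program is {seconds:.2f} seconds")
```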

2. To determine how many photos contain human faces.

This uses the same directory as above. To determine the number of photos that contain human faces, we add a counter variable, count, and adjust the loop accordingly.

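from deepface import DeepFace
from tqdm import tqdm
import cv2
import os

dire = r"Location of folder in which all the files are present"
count = 0 # Initialize the counter.
for fname in tqdm(os.listdir(dire)):
    path = os.path.join(dire, fname)
    img = cv2.imread(path)
    if img is None: # Skip corrupt or non-image files.
        continue
    try:
        result = DeepFace.analyze(img, actions=['gender'])
    except ValueError: # No face detected; do not count this image.
        continue
    if isinstance(result, list): # Newer DeepFace versions return one dict per face.
        result = result[0]
    # Newer versions store the label in 'dominant_gender'; older ones in 'gender'.
    gender = result.get('dominant_gender', result['gender'])
    if gender == "Man" or gender == "Woman":
        count += 1 # Count value is incremented when a face is found.
print("No of human faces =",count)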

For the example folder, the output is:

No of human faces = 9
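If you also want the split between the two labels, `collections.Counter` can tally them in a single pass. This is a hypothetical extension: `count_by_gender` and its injectable `analyze` parameter are illustrative names. It assumes that `DeepFace.analyze` raises `ValueError` when no face is found, and that the label sits in `dominant_gender` (newer versions, which return a list) or in `gender` (older versions, which return a dict).

```python
import os
from collections import Counter
from typing import Callable, Optional


def count_by_gender(folder: str, analyze: Optional[Callable] = None) -> Counter:
    """Count images per predicted label; images without a face go under 'No face'."""
    if analyze is None:
        from deepface import DeepFace  # Imported lazily so a stub can be injected.

        def analyze(path):
            return DeepFace.analyze(path, actions=["gender"])

    counts = Counter()
    for name in os.listdir(folder):
        try:
            result = analyze(os.path.join(folder, name))
        except ValueError:  # Raised by DeepFace when no face is detected.
            counts["No face"] += 1
            continue
        if isinstance(result, list):  # Newer DeepFace: one dict per detected face.
            result = result[0]
        counts[result.get("dominant_gender", result["gender"])] += 1
    return counts
```

The number of human faces is then `counts["Man"] + counts["Woman"]`.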

3. To keep only pictures with male faces.

Implementation

We only need to change one line of the general script, as follows:

if result['gender'] != "Man": # Replace the customizable condition in the general script with this line.
    os.remove(path)

After executing the script, the folder will contain only photographs of men; all other images are deleted.

4. To keep only pictures with female faces.

Implementation

We only need to change one line of the general script, as follows:

if result['gender'] != "Woman": # Replace the customizable condition in the general script with this line.
    os.remove(path)

After executing the script, we will receive a folder with just photographs of women in it, with the rest of the images being deleted.
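The scripts above delete rejected files permanently with `os.remove`. A safer variant, which is an assumption and not part of this repository, moves rejected images into a separate folder with `shutil.move` so the results can be reviewed before anything is discarded. `keep_only`, `rejected_dir`, and the injectable `analyze` are hypothetical names, and the result handling assumes the older (`gender` label in a dict) and newer (list of dicts with `dominant_gender`) DeepFace output formats.

```python
import os
import shutil
from typing import Callable, List, Optional


def keep_only(folder: str, wanted: str, rejected_dir: str,
              analyze: Optional[Callable] = None) -> List[str]:
    """Move every image whose label is not `wanted` (or that has no detectable
    face) from `folder` into `rejected_dir`. Returns the moved file names."""
    if analyze is None:
        from deepface import DeepFace  # Imported lazily so a stub can be injected.

        def analyze(path):
            return DeepFace.analyze(path, actions=["gender"])

    os.makedirs(rejected_dir, exist_ok=True)
    moved = []
    for name in sorted(os.listdir(folder)):
        path = os.path.join(folder, name)
        if not os.path.isfile(path):
            continue  # Skip sub-folders, including rejected_dir if nested here.
        try:
            result = analyze(path)
            if isinstance(result, list):  # Newer DeepFace: one dict per face.
                result = result[0]
            label = result.get("dominant_gender", result["gender"])
        except ValueError:  # No face detected.
            label = None
        if label != wanted:
            shutil.move(path, os.path.join(rejected_dir, name))
            moved.append(name)
    return moved
```

For example, `keep_only(dire, "Woman", dire + "_rejected")` would mirror use case 4 while keeping the other images recoverable.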

Dependency Installation

The required libraries can be installed from PyPI with pip, which also installs their dependencies.

pip install deepface

-DeepFace is a lightweight face recognition and facial attribute analysis (age, gender, emotion and race) framework for Python. It is a hybrid face recognition framework wrapping state-of-the-art models: VGG-Face, Google FaceNet, OpenFace, Facebook DeepFace, DeepID, ArcFace and Dlib. The library is mainly powered by TensorFlow and Keras. Experiments show that human beings achieve 97.53% accuracy on facial recognition tasks, and those models have already reached and surpassed that level.

pip install tqdm

-tqdm instantly makes your loops show a smart progress meter: just wrap any iterable with tqdm(iterable), and you're done!

pip install opencv-python

-OpenCV (Open Source Computer Vision Library: http://opencv.org) is an open-source library that includes several hundred computer vision algorithms.

pip install matplotlib

-Matplotlib is a comprehensive library for creating static, animated, and interactive visualizations in Python.

-The time and os modules are part of Python's standard library, so they do not need to be installed.

Then you will be able to import the libraries and use their functionality.
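The name used with pip install is not always the name used in the import statement: opencv-python, for instance, is imported as `cv2`. The standard-library check below is a convenience sketch (the `REQUIRED` mapping and `missing_packages` function are hypothetical names) that reports which of the packages listed above still need installing.

```python
import importlib.util

# Map each pip package name to the module name it is imported under.
REQUIRED = {
    "deepface": "deepface",
    "tqdm": "tqdm",
    "opencv-python": "cv2",
    "matplotlib": "matplotlib",
}


def missing_packages(required: dict = REQUIRED) -> list:
    """Return the pip names of packages whose top-level module cannot be found."""
    return [pip_name for pip_name, module in required.items()
            if importlib.util.find_spec(module) is None]


missing = missing_packages()
if missing:
    print("Run: pip install " + " ".join(missing))
else:
    print("All dependencies are installed.")
```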

Contribution

Pull requests are welcome.
