This tutorial introduces object detection in Python using the OpenCV library and shows how you can use it to perform tasks like face detection.
Face detection is a computer vision technology that helps to locate/visualize human faces in digital images. This technique is a specific use case of object detection technology that deals with detecting instances of semantic objects of a certain class (such as humans, buildings or cars) in digital images and videos. With the advent of technology, face detection has gained a lot of importance especially in fields like photography, security, and marketing.
Hands-on knowledge of NumPy and Matplotlib is required before working through the OpenCV concepts. Make sure that both packages are installed and working before installing OpenCV.
Table of Contents
2. Face Detection
OpenCV was started at Intel in 1999 by Gary Bradski. The first release came a little later, in 2000. OpenCV stands for Open Source Computer Vision Library. Although it is written in optimized C/C++, it has interfaces for Python and Java along with C++. OpenCV has an active user base all over the world, and its use is increasing day by day due to the surge in computer vision applications.
OpenCV-Python supports all the leading platforms like Mac OS, Linux, and Windows. It can be installed in either of the following ways:
Packages for standard desktop environments (Windows, macOS, almost any GNU/Linux distribution)
You can either use Jupyter notebooks or any Python IDE of your choice for writing the scripts.
2. Face Detection
Basics Of Haar-cascade
Object Detection using Haar feature-based cascade classifiers is an effective object detection method proposed by Paul Viola and Michael Jones in their paper, “Rapid Object Detection using a Boosted Cascade of Simple Features” in 2001. It is a machine learning based approach where a cascade function is trained from a lot of positive and negative images. It is then used to detect objects in other images.
Here we will work with face detection. Initially, the algorithm needs a lot of positive images (images of faces) and negative images (images without faces) to train the classifier. Then we need to extract features from them. For this, the Haar features shown in the image below are used. They are just like a convolutional kernel. Each feature is a single value obtained by subtracting the sum of pixels under the white rectangle from the sum of pixels under the black rectangle.
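To make the feature value concrete, here is a minimal sketch of a two-rectangle Haar-like feature computed directly with NumPy. The function name `two_rect_feature` and the choice of the left half as the white rectangle and the right half as the black one are assumptions made for illustration only.

```python
import numpy as np

def two_rect_feature(window, x, y, w, h):
    """Value of a left/right two-rectangle feature inside `window`.

    Left half = white rectangle, right half = black rectangle
    (an illustrative layout, not the only one used in practice).
    """
    half = w // 2
    white = window[y:y + h, x:x + half].sum()
    black = window[y:y + h, x + half:x + 2 * half].sum()
    # Convert to Python ints so the subtraction cannot wrap around
    return int(black) - int(white)

# Toy 24x24 window: left half has intensity 10, right half has intensity 50
window = np.full((24, 24), 10, dtype=np.uint8)
window[:, 12:] = 50
value = two_rect_feature(window, 0, 0, 24, 24)
flat_value = two_rect_feature(np.full((24, 24), 7, dtype=np.uint8), 0, 0, 24, 24)
```

A uniform window gives a value of zero, while a window whose two halves differ in brightness gives a large magnitude, which is what lets a single threshold on this value act as a very simple classifier.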
Now all possible sizes and locations of each kernel are used to calculate plenty of features. (Just imagine how much computation that needs. Even a 24x24 window results in over 160000 features.) For each feature calculation, we need to find the sum of the pixels under the white and black rectangles. To solve this, the authors introduced the integral image. It reduces the calculation of the sum of pixels in a rectangle, however large the number of pixels, to an operation involving just four values of the integral image. Nice, isn’t it? It makes things super-fast.
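The four-lookup trick can be sketched with NumPy as follows. The helper names `integral_image` and `rect_sum` are hypothetical, and the table is padded with a leading row and column of zeros so that rectangles touching the image border need no special case.

```python
import numpy as np

def integral_image(img):
    """Summed-area table with a zero row and column prepended."""
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1), dtype=np.int64)
    ii[1:, 1:] = img.cumsum(axis=0).cumsum(axis=1)
    return ii

def rect_sum(ii, x, y, w, h):
    """Sum of the w x h rectangle whose top-left corner is (x, y),
    using exactly four lookups in the integral image."""
    return int(ii[y + h, x + w] - ii[y, x + w] - ii[y + h, x] + ii[y, x])

img = np.arange(16, dtype=np.int64).reshape(4, 4)
ii = integral_image(img)
total = rect_sum(ii, 1, 1, 2, 2)          # pixels 5, 6, 9, 10
direct = int(img[1:3, 1:3].sum())         # same sum computed the slow way
```

The integral image is built once per image, after which every rectangle sum costs the same four lookups regardless of the rectangle's size.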
But among all these features we calculated, most of them are irrelevant. For example, consider the image below. The top row shows two good features. The first feature selected seems to focus on the property that the region of the eyes is often darker than the region of the nose and cheeks. The second feature selected relies on the property that the eyes are darker than the bridge of the nose. But the same windows applied to the cheeks or any other place are irrelevant. So how do we select the best features out of 160000+ features? It is achieved by AdaBoost.
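The 160000+ figure can be checked by brute-force enumeration. The sketch below assumes five base shapes (two-rectangle features in both orientations, three-rectangle features in both orientations, and one four-rectangle feature); that set of shapes is an assumption of this example, and a different set would give a different count.

```python
def count_haar_features(window=24,
                        shapes=((2, 1), (1, 2), (3, 1), (1, 3), (2, 2))):
    """Count every position and scale of each base shape inside a square window.

    Each shape is (base_width, base_height); a feature may be stretched to any
    multiple of its base size that still fits in the window.
    """
    total = 0
    for base_w, base_h in shapes:
        for w in range(base_w, window + 1, base_w):
            for h in range(base_h, window + 1, base_h):
                # Number of top-left corners where a w x h feature fits
                total += (window - w + 1) * (window - h + 1)
    return total

n_features = count_haar_features()  # 162336 for a 24x24 window
```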
For this, we apply each and every feature to all the training images. For each feature, it finds the best threshold which will classify the images as faces (positive) or non-faces (negative). But obviously, there will be errors or misclassifications. We select the features with the minimum error rate, which means they are the features that best classify the face and non-face images. (The process is not as simple as this. Each image is given equal weight in the beginning. After each classification, the weights of misclassified images are increased. Then the same process is repeated. New error rates are calculated, and new weights too. The process continues until the required accuracy or error rate is achieved or the required number of features is found.)
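One round of this selection loop might look like the sketch below. It is a simplified illustration, not the authors' implementation: the names `best_stump` and `boosting_round` are hypothetical, labels are assumed to be 1 for face and 0 for non-face, and the reweighting shrinks the weights of correctly classified images and then renormalizes, which raises the relative weight of the misclassified ones.

```python
import numpy as np

def best_stump(values, labels, weights):
    """Threshold and polarity with the lowest weighted error for one feature."""
    best = (np.inf, None, None)
    for thr in np.unique(values):
        for polarity in (1, -1):
            pred = (polarity * values < polarity * thr).astype(int)
            err = float(weights[pred != labels].sum())
            if err < best[0]:
                best = (err, float(thr), polarity)
    return best

def boosting_round(features, labels, weights):
    """Select the feature with minimum weighted error, then reweight images."""
    stumps = [best_stump(features[:, j], labels, weights)
              for j in range(features.shape[1])]
    j = int(np.argmin([s[0] for s in stumps]))
    err, thr, polarity = stumps[j]
    pred = (polarity * features[:, j] < polarity * thr).astype(int)
    beta = max(err, 1e-10) / (1.0 - err)
    # Correctly classified images are down-weighted; after normalization the
    # misclassified images carry a larger share of the total weight.
    new_w = weights * np.where(pred == labels, beta, 1.0)
    return j, thr, polarity, err, new_w / new_w.sum()

# Toy data: 5 images, 2 features; feature 0 separates all but image 2
features = np.array([[1, 3], [2, 3], [9, 3], [7, 3], [8, 3]])
labels = np.array([1, 1, 1, 0, 0])
weights = np.full(5, 0.2)
j, thr, polarity, err, new_weights = boosting_round(features, labels, weights)
```

In the toy data the selected stump misclassifies only image 2, so that image's weight grows from 0.2 to 0.5 and the next round is pushed to find a feature that handles it.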
The final classifier is a weighted sum of these weak classifiers. Each one is called weak because it alone can’t classify the image, but together with the others it forms a strong classifier. The paper says even 200 features provide detection with 95% accuracy. Their final setup had around 6000 features. (Imagine a reduction from 160000+ features to 6000 features. That is a big gain.)
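The weighted sum can be sketched as a weighted vote. This example assumes each weak classifier outputs 0 or 1 and carries a value `beta` (its weighted error divided by one minus that error) from training; the function name `strong_classify` and the sample numbers are made up for illustration.

```python
import math

def strong_classify(weak_outputs, betas):
    """Weighted vote of weak classifiers.

    A weak classifier with a small beta (low training error) gets a large
    weight alpha = log(1 / beta). The window is accepted when the weighted
    votes reach half of the total weight.
    """
    alphas = [math.log(1.0 / b) for b in betas]
    score = sum(a * h for a, h in zip(alphas, weak_outputs))
    return 1 if score >= 0.5 * sum(alphas) else 0

betas = [0.1, 0.5, 0.5]                              # hypothetical training results
accurate_one_wins = strong_classify([1, 0, 0], betas)
two_weak_lose = strong_classify([0, 1, 1], betas)
```

Here the single accurate classifier outvotes the two less accurate ones, which is the sense in which the sum is weighted rather than a simple majority.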
So now you take an image. Take each 24x24 window. Apply 6000 features to it. Check if it is a face or not. Wow. Isn’t that a little inefficient and time-consuming? Yes, it is. The authors have a good solution for that.
For this, they introduced the concept of a Cascade of Classifiers. Instead of applying all 6000 features to a window, group the features into different stages of classifiers and apply them one by one. (Normally the first few stages contain very few features.) If a window fails the first stage, discard it. We don’t consider the remaining features on it. If it passes, apply the second stage of features and continue the process. A window which passes all stages is a face region. How is that for a plan!
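The early-rejection idea can be sketched in a few lines. Everything here is a toy assumption: a "window" is reduced to a single number, each weak classifier is a lambda returning 0 or 1, and the stage thresholds are invented. Only the control flow mirrors the cascade described above.

```python
def run_cascade(window, stages):
    """Evaluate stages in order and stop at the first stage that fails.

    `stages` is a list of (weak_classifiers, threshold) pairs, where
    weak_classifiers is a list of (function, weight) pairs.
    Returns (is_face, number_of_features_evaluated).
    """
    evaluated = 0
    for weak_classifiers, threshold in stages:
        score = 0.0
        for classify, alpha in weak_classifiers:
            score += alpha * classify(window)
            evaluated += 1
        if score < threshold:
            return False, evaluated   # rejected: later stages are skipped
    return True, evaluated

# Stage 1 has one cheap feature, stage 2 has two
stages = [
    ([(lambda w: int(w > 0.2), 1.0)], 1.0),
    ([(lambda w: int(w > 0.5), 1.0), (lambda w: int(w > 0.7), 1.0)], 2.0),
]
rejected_early = run_cascade(0.1, stages)
rejected_late = run_cascade(0.6, stages)
accepted = run_cascade(0.9, stages)
```

Most windows in a real image look like the first case: they are thrown away after a handful of feature evaluations, which is why the average cost per window stays small.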
The authors’ detector had 6000+ features in 38 stages, with 1, 10, 25, 25 and 50 features in the first five stages. (The two features in the above image were actually obtained as the best two features from AdaBoost.) According to the authors, on average, 10 features out of 6000+ are evaluated per sub-window.
Haar-cascade Detection in OpenCV
OpenCV comes with a trainer as well as a detector. If you want to train your own classifier for any object, like cars or planes, you can use OpenCV to create one.
There are many pre-trained models which can find eyes, noses, cars, planes, etc. You can download those models from GitHub.
First, we need to load the required XML classifiers. Then load our input image (or video) in grayscale mode.
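import numpy as np
import cv2

# Load the pre-trained frontal face classifier
face_cascade = cv2.CascadeClassifier('haarcascade_frontalface_default.xml')

# Load the input image and convert it to grayscale for detection
img = cv2.imread('sachin.jpg')
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)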
Now we find the faces in the image. If faces are found, detectMultiScale returns the positions of the detected faces as Rect(x,y,w,h). Once we get these locations, we can create an ROI for the face and apply eye detection on this ROI (since eyes are always on the face!).
faces = face_cascade.detectMultiScale(gray, 1.3, 5)
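Creating the ROI from each returned rectangle is plain NumPy slicing. The sketch below uses a blank array and a made-up rectangle in place of a real image and a real detection; `crop_rois` is a hypothetical helper name.

```python
import numpy as np

def crop_rois(image, rects):
    """Return one sub-array per (x, y, w, h) rectangle.

    Rows are indexed by y and columns by x, so the slice order is
    image[y:y+h, x:x+w].
    """
    return [image[y:y + h, x:x + w] for (x, y, w, h) in rects]

image = np.zeros((100, 120), dtype=np.uint8)   # stand-in for a grayscale image
faces = [(10, 20, 30, 40)]                     # made-up rectangle in (x, y, w, h) layout
rois = crop_rois(image, faces)

# Each ROI is a view into the original array, so drawing on it
# changes the full image as well.
rois[0][0, 0] = 255
```

Because the ROI is a view, eye rectangles found inside it use coordinates relative to the ROI's top-left corner, and drawing them on the ROI places them correctly in the full image.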
The result looks like below:
Real-time Face Detection
First, we have to import these libraries:
import numpy as np
import cv2
Then, after importing those libraries, we write the code below to detect a face using the webcam. The program keeps a counter of face detections and closes once the counter reaches 50, or earlier if you press Esc.
base = 700  # intended image width and height (not used below)

# Haar cascade files for face and eye detection
face_cascade = cv2.CascadeClassifier(r'C:\Users\avish\Downloads\faace.xml')
eye_cascade = cv2.CascadeClassifier(r'C:\Users\avish\Downloads\eye1.xml')

# Settings for the text overlay
font = cv2.FONT_HERSHEY_SIMPLEX
bottomLeftCornerOfText = (10, 50)
fontScale = 1
fontColor = (255, 255, 255)
lineType = 2

# Capture frames from the default camera
cap = cv2.VideoCapture(0)
c_var = 0  # detection counter

while True:
    # Read one frame; stop if the camera did not return an image
    ret, img = cap.read()
    if not ret:
        break
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    faces = face_cascade.detectMultiScale(gray, 1.3, 5)

    # No face in this frame: count down
    if len(faces) == 0:
        c_var -= 1

    for (x, y, w, h) in faces:
        # Draw a box around the face and cut out the face region
        cv2.rectangle(img, (x, y), (x + w, y + h), (255, 0, 0), 2)
        roi_gray = gray[y:y + h, x:x + w]
        roi_color = img[y:y + h, x:x + w]

        # Restart from zero after a gap, otherwise count up
        if c_var < 0:
            c_var = 0
        else:
            c_var += 1
        cv2.putText(img, 'Face Detected', bottomLeftCornerOfText,
                    font, fontScale, fontColor, lineType)

        # Look for eyes only inside the face region
        eyes = eye_cascade.detectMultiScale(roi_gray)
        for (ex, ey, ew, eh) in eyes:
            cv2.rectangle(roi_color, (ex, ey), (ex + ew, ey + eh), (0, 255, 0), 2)

    cv2.putText(img, 'c_var : ' + str(c_var), (10, 100),
                font, fontScale, fontColor, lineType)
    cv2.imshow('img', img)

    # Stop once the counter reaches 50 (>= in case several faces push it past 50)
    if c_var >= 50:
        break

    # Wait for the Esc key to stop
    k = cv2.waitKey(30) & 0xff
    if k == 27:
        break

# Release the camera and close the window
cap.release()
cv2.destroyAllWindows()
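The counter rule used in the webcam loop can be isolated and tried without a camera. This is a sketch of the same logic as a pure function; the name `update_counter` is hypothetical and the frame sequences are invented.

```python
def update_counter(c_var, num_faces):
    """Apply the per-frame counter rule from the webcam loop.

    No faces: count down by one.
    For each face: restart from zero if the counter is negative,
    otherwise count up by one.
    """
    if num_faces == 0:
        return c_var - 1
    for _ in range(num_faces):
        c_var = 0 if c_var < 0 else c_var + 1
    return c_var

# Simulate 50 consecutive frames, each containing one face
counter = 0
for _ in range(50):
    counter = update_counter(counter, 1)

# A face reappearing after three empty frames restarts the count at zero
after_gap = update_counter(-3, 1)
```

With one face per frame, the counter reaches 50 after 50 frames, so the time until the program closes depends on the camera's frame rate rather than on a fixed number of seconds.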
Visit our site: https://pytholabs.com/