How does the face recognition feature work in gadgets?
The world we live in is such that everyday interaction is not limited to communicating and expressing thoughts; it also involves observing the disposition and ‘vibe’ of the person before us. While interacting, each of us subconsciously reads facial expressions and the general attitude with which the other person communicates. These signals help us form an impression of the person standing before us and caution us about how much information or knowledge to share from our end.
The unique human ability to recognise and differentiate between individuals rests on the facial appearance and expressions we read. For a long time this ability set the human brain apart from machines, whose operating systems could not efficiently recognise users, regulate access, or authenticate who was reaching the data stored on them.
However, with the introduction of Facial Recognition Technology (FRT) in the field of biometrics, a benchmark has been set: electronic gadgets, software and e-payment gateways can now regulate access to information by first recognising and authenticating the user's identity through their facial features.
This groundbreaking application of Machine Learning turns one's facial features into a source of data, a password of sorts that the device matches against the user's identity before allowing access.
The main question is: how can a device running an in-built algorithm perform this procedure effectively and match a person to the correct details using facial analysis alone?
In layman's terms, the procedure can be described as a game of “Where's Waldo?”. Its steps, with respect to FRT, are as follows:
First, the scene is converted into a form the computer understands. The picture is converted to greyscale, and only the brightness at each pixel is considered. For each pixel, the surrounding pixels are examined and a vector is calculated showing the direction and intensity of the change in brightness. Doing this at every point yields a Histogram of Oriented Gradients, or HOG. A strong change in brightness tells the computer there is an edge. To detect a face in a captured scene, the HOG of a face is computed and the entire picture is scanned for a match, i.e. the region of the scene whose HOG resembles the face HOG.
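The gradient step above can be sketched in a few lines of NumPy. This is only an illustration of the idea, not the exact cell-and-block scheme a production HOG detector uses; the tiny 8x8 image and the 8-bin histogram are arbitrary choices for the example.

```python
import numpy as np

def hog_histogram(gray, n_bins=8):
    """Histogram of gradient orientations, weighted by gradient magnitude."""
    gray = gray.astype(float)
    # Brightness change along each axis (simple finite differences).
    gy, gx = np.gradient(gray)
    magnitude = np.hypot(gx, gy)          # intensity of the change
    angle = np.arctan2(gy, gx) % np.pi    # direction, folded into [0, pi)
    # Accumulate magnitudes into orientation bins.
    hist, _ = np.histogram(angle, bins=n_bins, range=(0, np.pi),
                           weights=magnitude)
    return hist

# A synthetic image with a vertical edge: dark left half, bright right half.
img = np.zeros((8, 8))
img[:, 4:] = 255.0
hist = hog_histogram(img)
print(hist)  # the mass concentrates in the bin for horizontal gradients
```

Strong responses cluster in one orientation bin, which is exactly the "strong amount of difference means an edge" signal the text describes.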
Once the HOGs are obtained, landmarks are identified on each face as coordinates. Researchers typically use 68 landmarks, corresponding to points on the chin, eyebrows, nose and so on. The image is now ready to be processed by a neural network. The interesting part is that a match can be found between two images of the same face captured at different orientations. The reference image is processed and its HOG obtained, which would differ from that of the off-angle image. Hence, in an additional step, the facial landmark data of both images are obtained, and the off-angle image is scaled and rotated until its landmarks line up; in other words, it is normalised. Snapchat uses this same technique when overlaying its filters. So, in a scene containing many faces at different angles, all of them are oriented to a normalised position using facial landmarks before we try to find a match.
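The scale-and-rotate normalisation can be sketched as a least-squares similarity transform between two sets of landmark coordinates. A minimal sketch follows, using an Umeyama-style alignment; the three landmark points (a hypothetical subset standing in for the 68) and the exact transform are made up for the example.

```python
import numpy as np

def align_landmarks(src, dst):
    """Least-squares similarity transform (scale, rotation, translation)
    mapping src landmark coordinates onto dst."""
    src_c = src - src.mean(axis=0)
    dst_c = dst - dst.mean(axis=0)
    # Optimal rotation from the SVD of the cross-covariance matrix.
    U, S, Vt = np.linalg.svd(dst_c.T @ src_c)
    R = U @ Vt
    if np.linalg.det(R) < 0:            # guard against reflections
        U[:, -1] *= -1
        R = U @ Vt
    scale = S.sum() / (src_c ** 2).sum()
    t = dst.mean(axis=0) - scale * (R @ src.mean(axis=0))
    return scale, R, t

# Reference landmarks (hypothetical: two eye corners and the nose tip).
ref = np.array([[30.0, 40.0], [70.0, 40.0], [50.0, 60.0]])
# The same face rotated 30 degrees, scaled 1.5x and shifted.
theta = np.radians(30)
Rt = np.array([[np.cos(theta), -np.sin(theta)],
               [np.sin(theta),  np.cos(theta)]])
off_angle = 1.5 * ref @ Rt.T + np.array([10.0, -5.0])

scale, R, t = align_landmarks(off_angle, ref)
recovered = scale * off_angle @ R.T + t
print(np.allclose(recovered, ref))  # True: off-angle landmarks normalised
```

The recovered transform undoes the rotation, scaling and shift, which is the "scaled and rotated" normalisation step described above.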
At this stage we realise that comparing facial landmark data alone is not enough for recognition when a region's data set contains a large amount of facial data, with multiple images of the same person. This is solved with a Convolutional Neural Network (CNN).
A person's picture contains a lot of information: each pixel is represented by three numbers, corresponding to its red, green and blue values. A 64-by-64-pixel picture therefore gives 12,288 data points to work with. Training on a set of even 500 known images of the same person means 6,144,000 data points, enough to find the specific features in each picture unique to a person.
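The arithmetic above can be checked directly:

```python
# Checking the data-point arithmetic for a 64x64 RGB image.
pixels = 64 * 64            # 4096 pixels per image
channels = 3                # red, green and blue values per pixel
per_image = pixels * channels
print(per_image)            # 12288 data points per image
print(500 * per_image)      # 6144000 data points across 500 images
```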
Various feature maps are created using different filter patterns that mark specific edges on one's face. The stack of resulting feature maps forms a convolutional layer in the neural network. The interesting part of such an architecture is that another convolutional layer can read the layer before it, so simple features are combined into more complex representations. To reduce the size of these layers, a process called ‘pooling’ is applied, saving computational power when analysing a picture. In this way, a large network of convolutional and pooling layers produces a mathematical representation of a person. Based on the learned weights of these filters, the network outputs a score for a person's image. This is the actual learning aspect of Machine Learning: the network is trained so that the scores of two different people's images are far apart, while the scores of different images of the same person fall in the same range.
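One convolution-plus-pooling stage can be sketched in plain NumPy. The 3x3 vertical-edge filter and 2x2 max pooling below are illustrative choices; a real face-recognition network learns its filter weights from data rather than using hand-picked ones.

```python
import numpy as np

def convolve2d(image, kernel):
    """Valid-mode 2D convolution (cross-correlation, as CNNs compute it)."""
    kh, kw = kernel.shape
    h = image.shape[0] - kh + 1
    w = image.shape[1] - kw + 1
    out = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = (image[i:i + kh, j:j + kw] * kernel).sum()
    return out

def max_pool(fmap, size=2):
    """Non-overlapping max pooling: shrinks the feature map to save compute."""
    h = (fmap.shape[0] // size) * size
    w = (fmap.shape[1] // size) * size
    return fmap[:h, :w].reshape(h // size, size, w // size, size).max(axis=(1, 3))

# An 8x8 image with a vertical edge down the middle.
img = np.zeros((8, 8))
img[:, 4:] = 1.0
# A hand-picked Sobel-like filter that responds to vertical edges.
kernel = np.array([[-1.0, 0.0, 1.0],
                   [-2.0, 0.0, 2.0],
                   [-1.0, 0.0, 1.0]])
fmap = convolve2d(img, kernel)   # 6x6 map with strong responses at the edge
pooled = max_pool(fmap)          # 3x3 summary of the feature map
print(pooled.shape)              # (3, 3)
```

Stacking many such stages, with learned rather than hand-picked filters, yields the "large network of convolutional and pooling layers" the paragraph describes.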
Thus, it is important to know what kind of data facial recognition collects, and what different companies are doing with it, both good and bad.
-Lakshya Rohera, Techniche Media