Motivation


Face identification systems are widely used in biometric security applications. Face identification algorithms have improved steadily over the years, but they still fall short of human-level performance in most situations. Unconstrained face identification means identification in non-tailor-made settings: real-world conditions with real-world problems such as noise, occlusion, and lighting variation.


Isn't full-frontal face identification a long-solved problem?

It is widely held in the computer vision community that full-frontal face identification is no longer a challenge, and many commercial algorithms do achieve excellent results on most datasets. It is not true, however, that algorithms have reached a performance ceiling. Outside of the most favorable conditions, algorithms still fail badly on images that pose little difficulty for humans. Face recognition algorithms are particularly sensitive to changes in illumination, shadows, expression, hairstyle, and the like. Humans, on the other hand, are largely insensitive to these variations, performing nearly as well on "hard" pairs as on "easy" ones. There is a clear need for a face identification feature set that is invariant to changes in pose, illumination, and expression.


The Idea


Given human performance on these tasks, it is natural to look to the human brain as a starting point for algorithm development. Biologically-inspired algorithms, as the name suggests, draw inspiration from the computational architecture of the brain, attempting to recreate, and eventually mimic, its computational capabilities. Prof. Cox and his group recently proposed a set of neuromorphic feature descriptors obtained through a large-scale random search over the parameter space of a multilayer convolutional network.
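To give a flavor of what such a search involves, here is a minimal sketch. The parameter space and the evaluate_descriptor() screening function are illustrative placeholders, not the actual parameters or evaluation procedure from that work:

    import random

    # Illustrative search space for a multilayer convolutional feature
    # descriptor; the space actually searched is far larger.
    SPACE = {
        'filter_size':   [3, 5, 7, 9],
        'num_filters':   [16, 32, 64, 128],
        'pool_size':     [3, 5, 7],
        'normalization': ['divisive', 'none'],
    }

    def sample_params(n_layers=3):
        # Draw one random candidate: independent choices per layer.
        return [{k: random.choice(v) for k, v in SPACE.items()}
                for _ in range(n_layers)]

    def evaluate_descriptor(params):
        # Placeholder: in the real procedure this would build the network,
        # extract features on a screening set, and return a match accuracy.
        return random.random()

    best_score, best_params = -1.0, None
    for _ in range(10000):        # "large-scale" random search
        params = sample_params()
        score = evaluate_descriptor(params)
        if score > best_score:
            best_score, best_params = score, params

The appeal of this approach is that no single architecture is assumed a priori; candidates are generated cheaply and only the best-screening descriptors survive.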

We apply this feature set to NIST's "The Good, The Bad and The Ugly" (GBU) dataset. GBU comprises three partitions: 'good', 'bad' and 'ugly'. Images in the 'good' partition are well suited to face identification algorithms, free of strong illumination or expression variation. Most algorithms do quite well on this partition, with the best reported accuracy being ~98%. The 'ugly' partition, at the other end of the spectrum, is highly ill-suited to existing face identification frameworks; the best reported result is ~15%, achieved by a fusion-based algorithm. The 'bad' partition sits between these two extremes, with a best reported result of ~80%.

We are particularly interested in the 'ugly' partition because it best exposes the weaknesses of existing face identification frameworks. It is on this partition that we can best understand where the current state of the art falls short and how to fix it. We employ a wide range of techniques with the aim of ultimately achieving near-human performance on this dataset.


How did we do?



Given the strong variation in lighting throughout the GBU 'ugly' partition, our first step was to normalize the images using an eye-detector-based alignment technique. We then performed a large-scale random search in parameter space to find the features that best describe the challenge problem, and trained a linear SVM on the training set using these feature descriptors. Using this SVM alone as a classifier, we achieved a verification rate of ~14%, the highest reported on the GBU 'ugly' partition for a single classifier (the 15% figure above was achieved with a fusion-based technique).
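The sketch below illustrates the two generic ingredients of this step, eye-based alignment and SVM-based pair verification. The extract_features() helper and the synthetic training pairs are hypothetical stand-ins; our actual descriptors and training data are part of the unpublished pipeline:

    import numpy as np
    import cv2
    from sklearn.svm import LinearSVC

    def align_face(img, left_eye, right_eye, size=200, eye_y=0.35, eye_gap=0.5):
        # Rotate and scale the image so that both eyes land on fixed
        # canonical positions in a size x size output crop.
        dx = right_eye[0] - left_eye[0]
        dy = right_eye[1] - left_eye[1]
        angle = np.degrees(np.arctan2(dy, dx))
        scale = (eye_gap * size) / np.hypot(dx, dy)
        center = ((left_eye[0] + right_eye[0]) / 2.0,
                  (left_eye[1] + right_eye[1]) / 2.0)
        M = cv2.getRotationMatrix2D(center, angle, scale)
        M[0, 2] += size / 2.0 - center[0]        # move eye midpoint to
        M[1, 2] += eye_y * size - center[1]      # its canonical location
        return cv2.warpAffine(img, M, (size, size))

    def extract_features(img):
        # Placeholder for the neuromorphic descriptor: a downsampled,
        # flattened image is enough to show the verification plumbing.
        return cv2.resize(img, (32, 32)).astype(np.float32).ravel()

    # Synthetic stand-in data; in practice pairs come from the GBU training set,
    # with eye coordinates supplied by the eye detector.
    rng = np.random.default_rng(0)
    faces = [align_face(f, left_eye=(70, 90), right_eye=(130, 90))
             for f in rng.random((20, 200, 200)).astype(np.float32)]
    pairs = [(faces[i], faces[i]) for i in range(10)] + \
            [(faces[i], faces[i + 10]) for i in range(10)]
    labels = [1] * 10 + [0] * 10                 # 1 = same person

    # Verification as binary classification on feature-difference vectors.
    X = [np.abs(extract_features(a) - extract_features(b)) for a, b in pairs]
    clf = LinearSVC(C=1.0).fit(X, labels)
    scores = clf.decision_function(X)            # thresholded for verification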

To improve on this result, we drew further inspiration from biological face identification and made use of attribute classifiers. Using the scores of different facial-attribute classifiers, we established a measure of confidence in each attribute. Suppose, for example, that in one image the person is smiling and in the other he or she is not. The "smiling" attribute classifier will then yield widely differing scores for the two images. This can be interpreted as low confidence in using that facial region (the lips, in this case) for identification, since the apparent similarity or difference of the region across the two images may be due simply to the presence or absence of the attribute. This is quite similar to the way our own brains identify faces: if a person is smiling in one image and not in the other, we intuitively ignore the mouth region and base our judgement on other regions of the face (the eyes, for example).
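A toy version of this confidence weighting might look as follows. The attribute scores, the attribute-to-region mapping, and the Gaussian fall-off are all illustrative choices, not our actual implementation:

    import numpy as np

    # Hypothetical per-attribute classifier scores for two images, e.g.
    # from SVMs trained to detect "smiling", "eyeglasses", etc.
    scores_a = {'smiling': 1.8,  'eyeglasses': -0.9, 'bangs': 0.2}
    scores_b = {'smiling': -1.5, 'eyeglasses': -1.1, 'bangs': 0.3}

    # Which facial region each attribute informs (illustrative mapping).
    region_of = {'smiling': 'mouth', 'eyeglasses': 'eyes', 'bangs': 'forehead'}

    def region_confidence(scores_a, scores_b, sigma=1.0):
        # A large disagreement in attribute scores suggests the region's
        # appearance is dominated by the attribute, so down-weight it.
        conf = {}
        for attr in scores_a:
            gap = abs(scores_a[attr] - scores_b[attr])
            conf[region_of[attr]] = np.exp(-(gap / sigma) ** 2)
        return conf

    weights = region_confidence(scores_a, scores_b)
    # e.g. {'mouth': ~2e-5, 'eyes': ~0.96, 'forehead': ~0.99}: the mouth
    # region is effectively ignored when one subject smiles and the other
    # does not, mirroring the intuition described above.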

Since this work is still underway and has not yet been published, technical aspects of our face identification pipeline cannot be described here. For more details, contact me.




Current and Future Work


YouTube Faces is a dataset for unconstrained face identification in video sequences. We are currently extending our face identification framework to video and benchmarking our performance against the results reported on that set.

Since it is impractical to search the entire parameter space of feature descriptors exhaustively, it is, in theory, always possible to find better features simply by searching for longer. A faster alternative to random search is HyperOpt, a hyperparameter optimization library that, in principle, can converge to a good set of parameters much faster than random search. We are currently using HyperOpt to further explore the parameter space in search of descriptors that better fit the face identification problem.
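As a rough illustration of how such a search is set up with HyperOpt's real API, consider the sketch below. The search space and the placeholder objective are hypothetical; the real space and scoring function belong to the unpublished pipeline:

    import random
    from hyperopt import fmin, tpe, hp

    # Illustrative space over descriptor parameters; the space we actually
    # search is considerably larger.
    space = {
        'filter_size': hp.choice('filter_size', [3, 5, 7, 9]),
        'num_filters': hp.choice('num_filters', [16, 32, 64, 128]),
        'pool_order':  hp.uniform('pool_order', 1.0, 10.0),
    }

    def objective(params):
        # Placeholder: in the real pipeline this would extract features
        # with the candidate descriptor and score a verification model.
        # HyperOpt minimizes, so return the negated verification rate.
        return -random.random()

    # TPE builds a model of the objective from past trials, so it tends to
    # need far fewer evaluations than blind random sampling.
    best = fmin(objective, space, algo=tpe.suggest, max_evals=500)

Unlike pure random search, each new trial here is informed by all previous ones, which is what makes faster convergence plausible in a space this large.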