With the ongoing reduction in size and cost of computing and optical
monitoring equipment, many governmental and commercial groups are
attempting to develop "intelligent automobiles." According to a review
of the field, most major automakers have been pursuing some sort of drowsy
driver detection, using methods ranging from monitoring the weaving of the
car using yaw rate sensors to monitoring the driver's eye with an in-car
camera.
Particularly notable is the U.S. Department of Transportation's Intelligent
Vehicle Initiative. Intelligent cars are cars that can respond to the state of
the driver to increase both safety and convenience. Because 90% of accidents
occur due to driver error, devices that help drivers avoid accidents could save
many lives and prevent substantial financial loss.
Driver drowsiness is one specific form of human error that has been well
studied. Studies have shown that immediately prior to fatigue-induced
accidents, the driver's eye exhibits a change in blinking behavior.
Specifically, the frequency of blinking increases and the percentage of the
eye covered by the lid increases. We reproduce a graph summarizing eyelid
closure percentage over time before accidents (from Eye-Activity
Measures of Fatigue and Napping as a Fatigue Countermeasure, Federal Highway
Administration) as Figure 1.
As the eye closure occurrences dramatically increase during the 10-second
period preceding an accident, monitoring such closures could allow the car
to take some form of automated response to wake up the driver, e.g. a loud
noise, a bright light, possibly even the activation of an "autopilot" if
that capability is developed. It is also known that the duration of the eye
closures one minute before an accident is much higher than at earlier times.
Finally, partial eye closures (measured by the ratio between horizontal and
vertical portions of the visible pupil) have been shown to be an excellent
way to detect drowsiness as much as 10-12 minutes prior to an accident (IVI
Brochure, http://www.its.dot.gov/ivi/ivi.htm).
Consequently, we set out to build a camera and image-processing system that
takes images of a driver's face and processes them to determine whether the
eyes are open or closed.
After trying to use cameras in the visible range of the spectrum to detect
eye closure, and having considerable difficulty due to false recognition
of eyes in the background of the image and due to the large change in
image from day to night, we decided to implement an infrared imaging system
consisting of a camera and an infrared LED as a source of illumination. This
offers several benefits. First, the illumination level can be held constant:
an IR LED can illuminate the driver day and night without distracting the
driver. This makes the development of image processing algorithms simpler.
Second, we can select the intensity and focus of the IR light such that the
driver's face is the only object illuminated. By placing appropriate spectral
filters in front of the camera aperture, one could restrict the signal to
only the IR light scattered from the face.
Theory and Methodology
Using IR illumination and IR camera
Below are some examples taken with an "ordinary" digital camera during daytime
and nighttime driving conditions. As you can see, the particular camera we were
using does not capture enough light to handle both situations. We needed an
imaging system that could handle both daytime and nighttime conditions, so,
following the literature, we chose an IR camera.
Selected images taken with a normal visible light camera
Selected images taken with an IR light camera
Human face during daylight
Human face in total visual darkness illuminated with IR light
The daytime images are especially encouraging, since only the eyes seem to
show up in the picture at all.
Neural Network
Neural networks are programming abstractions that learn a decision rule from
examples rather than from hand-written logic. Training adjusts the weights of a
network of simple interconnected units so that its output measures how close an
input is to the space spanned by the training set. Thresholding this closeness
measure produces a result. Neural networks are notorious for being difficult to
train.
Originally, we had an idea to use some code developed for another class to
perform face detection and modify this code to perform eye detection.
Conveniently, MATLAB has a neural network toolbox which this implementation
utilized. It seemed like a good chance to learn about neural networks and their
applications to image processing. However, as you will see below, we quickly
abandoned this idea for more conventional image processing techniques.
Correlation Methods
Template matching is one possible technique for searching for a pattern
within an image. Typically, a suitable "template" is chosen as a feature to
search for within an image. In our case, we chose an "average" eye by
averaging over test cases. An example of an average is shown below.
"facemask" used for finding 2 eyes in image
"eyemask" used for finding 1 eye
Template matching finds the minimum error between "windows" of the image and
the template. Alternatively, this is searching for a maximum of the correlation
of the image with the template (a convolution with the flipped template).
However, to avoid a bias towards bright sections of the image, each window
should be "mean removed" to ensure proper correlation.
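The mean removal described above can be illustrated with a minimal sketch in Python (not the MATLAB code used in this project). It assumes images and templates are plain lists of rows of grey levels; the names `mean_removed_correlation` and `best_match` are hypothetical. Each window and the template have their means subtracted, and the product is normalized, so a flat bright region scores zero instead of dominating the map.

```python
import math

def mean_removed_correlation(image, template):
    """Return a 2-D list of normalized, mean-removed correlation scores.

    image and template are lists of rows of grey levels (an assumed layout).
    scores[r][c] describes the window whose top-left corner is at (r, c).
    """
    th, tw = len(template), len(template[0])
    ih, iw = len(image), len(image[0])
    n = th * tw

    # Remove the template mean once, up front.
    t_mean = sum(map(sum, template)) / n
    t_zero = [[v - t_mean for v in row] for row in template]
    t_norm = math.sqrt(sum(v * v for row in t_zero for v in row))

    scores = []
    for r in range(ih - th + 1):
        row_scores = []
        for c in range(iw - tw + 1):
            window = [image[r + i][c:c + tw] for i in range(th)]
            w_mean = sum(map(sum, window)) / n
            num = 0.0
            w_sq = 0.0
            for i in range(th):
                for j in range(tw):
                    w = window[i][j] - w_mean  # mean-removed window pixel
                    num += w * t_zero[i][j]
                    w_sq += w * w
            denom = math.sqrt(w_sq) * t_norm
            # A perfectly flat window carries no pattern: score it as 0.
            row_scores.append(num / denom if denom > 0 else 0.0)
        scores.append(row_scores)
    return scores

def best_match(scores):
    """Return (row, col, score) of the highest-scoring window."""
    score, r, c = max(
        (s, r, c) for r, row in enumerate(scores) for c, s in enumerate(row)
    )
    return r, c, score
```

Scores lie between -1 and 1, so a single threshold can be reused across frames with different overall brightness. The direct double loop is slow on full frames; it is meant to show the arithmetic, not to be an efficient implementation.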
Hough Transform
The Hough transform is a transform that searches for maxima in a parametric
space. Thus, any "shapes" that can be expressed parametrically are well suited
to techniques using the Hough transform. In our images, the pupils of the eye
formed nearly perfect circles. Restricting the distance to the camera also
roughly fixed the radius of the pupil at 5 pixels.
Once we find the maxima in the parametric space, we perform the inverse Hough
transform to determine where the original circles are in the image. Subsequent
processing can occur in these regions to further increase SNR.
Our implementation of the Hough transform for circular objects is based on code
developed at the University of Minnesota by Dan Pou.
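With the radius fixed, the parametric space reduces to the two coordinates of the circle center, and the voting step can be sketched as follows. This is an illustrative Python sketch, not the code referenced above; the function names, the (row, col) edge-point format, and the 72-step rasterization of the circle are all assumptions.

```python
import math

def circle_offsets(radius, steps=72):
    """Rasterize a circle into a set of integer (d_row, d_col) offsets."""
    offsets = set()
    for k in range(steps):
        theta = 2.0 * math.pi * k / steps
        offsets.add((round(radius * math.sin(theta)),
                     round(radius * math.cos(theta))))
    return offsets

def hough_circle_votes(edge_points, height, width, radius):
    """Accumulate votes for circle centers, given edge pixels and a known radius.

    Each edge pixel votes for every cell that lies one radius away from it,
    since any of those cells could be the center of a circle through it.
    """
    offsets = circle_offsets(radius)
    acc = [[0] * width for _ in range(height)]
    for r, c in edge_points:
        for dr, dc in offsets:
            cr, cc = r - dr, c - dc
            if 0 <= cr < height and 0 <= cc < width:
                acc[cr][cc] += 1
    return acc

def strongest_center(acc):
    """Return ((row, col), votes) for the accumulator cell with the most votes."""
    votes, r, c = max(
        (v, r, c) for r, row in enumerate(acc) for c, v in enumerate(row)
    )
    return (r, c), votes
```

In practice one would keep every accumulator peak above some threshold rather than only the strongest one, since two pupils are expected in each frame.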
Results and Discussion
Neural Network
We attempted to use a MATLAB neural network face recognition
routine developed by Scott Sanner for CS223B. Our original idea was to
modify this routine to detect eyes. Though the previously noted literature by
Wierwille points to neural networks for pattern recognition as a promising
approach, this routine had severe difficulty actually detecting faces, most
likely due to insufficient or improper training. A selected result is shown for
the face detection implementation.
It doesn't seem to find a face at all! After playing with this method, we
dropped it to develop our own techniques from methods presented in class.
Correlation Methods
Correlation is much harder than it sounds. First, we did not implement a true
correlation function that removes the mean of each window; performing
correlation on the mean-removed image and mean-removed window is quite tricky.
In addition, many spurious maxima are found in the image, including eyebrows,
hair, and nostrils.
To combat this, we used many "tricks" to try to zero in on the eyes
themselves. In one implementation, we first search for the maximum correlation
with the "facemask" to find the general area of the eyes and nose. Then we
search only in that area for the eyes themselves. This performs better than a
global eye search over the whole image.
Another problem is determining what the "eye" and "face" templates should
look like. Different people have different eye shapes and sizes. In addition,
subjects sit closer to or farther from the camera, changing the eye's size
relative to any fixed template. Moreover, different illumination conditions
change the response of the pupils to IR light. This makes template matching a
hard problem indeed.
Correlation maps with different templates
Original Image
Correlation map with "facemask." The large bright center correctly indicates
the location of the center of the eyemask in the test image.
Correlation map with "eyemask." Although local maxima occur at the eye
locations, the global maximum occurs in the lower right section of the image,
where the hair and background intermix.
Below are results of the implementation on different video frames. A rough
estimate puts eye location accuracy at about 75%.
Selected results of convolution template matching.
Error checking can be done on video to ensure that the proper eyes are being
found. Constraints such as the "maximum" motion of the eyes from frame to
frame, the last position of the eyes, and the distance between the eyes can
reduce spurious eye detections. The results are in "nathanprocessed.avi".
Here this correlation implementation seems to work quite well. The frames
where the correlation finds no match are frames where the subject has blinked.
In fact, a simple blink counter (treating blinks as consecutive frames of
unfound eyes) correctly estimates the number of blinks at 3 for this sequence.
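A blink counter of this kind can be sketched in a few lines of Python. This is an illustrative version under assumed names: `eyes_found` is a per-frame sequence of booleans, and the optional `min_frames` parameter (not part of the original counter) lets single-frame dropouts be ignored.

```python
from itertools import groupby

def closure_durations(eyes_found):
    """Return the length, in frames, of each run of frames with no eyes found."""
    return [
        len(list(run))
        for found, run in groupby(eyes_found, key=bool)
        if not found
    ]

def count_blinks(eyes_found, min_frames=1):
    """Count blinks as runs of at least min_frames consecutive missed frames."""
    return sum(1 for d in closure_durations(eyes_found) if d >= min_frames)
```

The run lengths returned by `closure_durations` are also a rough proxy for closure duration, which the introduction notes is higher shortly before an accident.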
Hough Transform
When attempting circle detection with the Hough transform, it is important to
remember that the function depends on a black/white edge map (shown on the
bottom left of the animation). We used an edge detection function with varying
thresholds to find the pupils with the least amount of background noise. In the
first example, the threshold had to be lowered to such a level that it also
captured a lot of background edges in the hair and ears. Unfortunately, when
this edge map is passed into the Hough detection function, it shows that there
are many potential circles with a radius of five pixels in the image (shown on
the bottom right of the animation). This Hough image is then morphologically
thresholded (using the erode and dilate functions) such that the most likely
circles are distinguished from the noise (top right of animation), and the
corresponding coordinates are plotted on the original image.
Error Checking
The animation to the left uses the same processes mentioned above, except that
an error-checking filter is also applied. This filter only passes potential eye
locations that are the correct distance apart (which we assume to be constant
within 10%). Another improvement is an angle analysis method that weights
potential eye locations by their angle with the horizontal. (This is based on
the assumption that the line between the eyes will typically form an angle
close to zero with the horizontal.) Although not perfect, this seems to work
with over 80% accuracy.
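The distance and angle checks can be sketched as a filter over pairs of candidate locations. This Python version is a sketch under assumptions: the function name, the `expected_distance` input, and the hard `max_angle_deg` cutoff are hypothetical, and where the filter described above weights candidates by angle, this sketch simply sorts the surviving pairs from most to least level.

```python
import math
from itertools import combinations

def plausible_eye_pairs(candidates, expected_distance,
                        tolerance=0.10, max_angle_deg=20.0):
    """Return candidate (row, col) pairs that could be a pair of eyes.

    A pair passes if its separation is within `tolerance` (a fraction) of
    expected_distance and its tilt from the horizontal is at most
    max_angle_deg. Pairs are returned most-level first.
    """
    pairs = []
    for a, b in combinations(candidates, 2):
        d_row, d_col = b[0] - a[0], b[1] - a[1]
        distance = math.hypot(d_row, d_col)
        if abs(distance - expected_distance) > tolerance * expected_distance:
            continue  # wrong separation for a pair of eyes
        angle = abs(math.degrees(math.atan2(d_row, d_col)))
        angle = min(angle, 180.0 - angle)  # point order should not matter
        if angle <= max_angle_deg:
            pairs.append((angle, a, b))
    pairs.sort()
    return [(a, b) for _, a, b in pairs]
```

The expected distance could be taken from the first few confidently detected frames, on the assumption that the driver's distance to the camera stays roughly constant.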
Conclusions
We had a lot of trouble using a neural network. The training of the network
seems unable to handle the various conditions inherent to this problem. Template
matching gave better results, but a similar (though less severe) training
problem occurs. In addition, in the presence of noise, various image features
can match the template better than the actual eye, leading to spurious
measurements. Similarly, with the Hough transform, spurious edges form circles
with radii similar to that of the actual pupil, causing false measurements. The
addition of error checking, such as constraints on head motion, eye-to-eye
distance, and head angle, drastically improves the performance of both
techniques.
References
Dan Pou, "Image Processing Homework 5", Hough Circle Transform.
http://www.ece.umn.edu/users/dpou/hw1-5.html University of Minnesota.
M. Yang, D. Kriegman, N. Ahuja, Detecting Faces in Images: A Survey,
Department of Computer Science and Beckman Institute Technical Monograph,
University of Illinois at Urbana-Champaign, Urbana IL, 61801
H. Rowley, S. Baluja, and T. Kanade, "Neural Network-Based Face Detection,"
IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 20, No. 1,
January, 1998, pp. 23-38.
Sanner, Scott, "CS223B Winter quarter, Final Project"
http://www.stanford.edu/~sanner/Vision/Project.html
M. Eriksson and N. Papanikolopoulos. Eye tracking for detection of driver
fatigue. In IEEE Conference on Intelligent Transportation Systems, pages
314-319, 1997.
M. Funada, S. Ninomija, S. Suzuki, I. Idogawa, Y. Yazu, and H. Ide. On an
image processing of eye blinking to monitor awakening levels of human beings. In
18th Annual International Conference of the IEEE Engineering in Medicine and
Biology, volume 3, pages 966-967, 1996.
S. Kumakura. Apparatus for estimating the drowsiness level of a vehicle
driver. U.S. patent no. 5786765.
Hernandez-Gress, N. Driver drowsiness detection : past, present and
prospective work, N. Hernandez-Gress and D. Esteve. Traffic technology
international. June/July 1997
Katahara, Shunji. Driver drowsiness detection by eyelids movement from face
image, Shunji Katahara, Satoko Nara and Masayoshi Aoki (Seikei University).
World Congress on Intelligent Transport Systems (2nd : 1995 : Yokohama-shi,
Japan). Steps forward. Vol. 3. Tokyo, Japan : VERTIS, 1995.
Research on vehicle-based driver status/performance monitoring : development,
validation, and refinement of algorithms for detection of driver drowsiness,
W.W. Wierwille ... et al., Washington, DC, National Highway Traffic Safety
Administration, 1994.
Sherman, Peter J. The potential of steering wheel information to detect
driver drowsiness and associated lane departure, Peter J. Sherman, Michael
Elling, Monty Brekke. Ames, Iowa : Midwest Transportation Center, Iowa State
University, 1996.
Taoka, George T. Driver drowsiness and falling asleep at the wheel, George T.
Taoka. Transportation quarterly. Vol. 47, no. 4 (Oct. 1993)
Wierwille, Walter W. Development of improved algorithms for on-line detection
of driver drowsiness, Walter W. Wierwille, Stephen S. Wreggit, Ronald R.
Knipling. International Congress on Transportation Electronics (1994 : Dearborn,
Mich.). Leading change. Warrendale, PA : Society of Automotive Engineers, 1994.
Wierwille, Walter W. Evaluation of driver drowsiness by trained raters,
Walter W. Wierwille and Lynne A. Ellsworth. Accident analysis and prevention.
Vol. 26, no. 5 (Oct. 1994)
Code and Who did What
Dion wrote the convolution "template matching" code and some of the error
checking code, and worked with Nathan on many of the error checking algorithms.
He also wrote many sections of the report, including the discussion of IR
light, the template matching sections, the theory of the Hough transform
section, and (with Nathan) the conclusions. He also performed the painstaking
task of making sure the links worked (there must be a better way!). His code
includes:
Nathan attempted to adapt the neural network functions to be
more responsive to drivers' faces. He started the research on IR cameras by
building an IR flashlight and discovered the unique IR signature of a face in
daylight. He also wrote the code that detects eyes using the Hough transform
(with help from some of Dion's error checking code). For the report, he
wrote the results and discussion section on the Hough transform, handled the
processing of the AVI files, and created the animated GIFs.
Ben performed most of the literature search, built IR light sources that were
later replaced by a commercial camera (in the end, we used a Sony Digital
HandyCam in its "NightShot" mode, where the camera has an IR light source and
some sort of filtering for IR), took digital images and videos, and worked (with
Dion and Nathan) on code to convert the images to a Unix MATLAB-readable format.
He wrote the introduction to the report, debugged the HTML code, and edited the
entire report for clarity and style.