
Face and Head Action Detection

Detection of face and head actions is done by locating feature points on the face and tracking them across consecutive frames in a video.

 

The concept of tracking feature points is based on the Facial Action Coding System, published in 1978 by Paul Ekman and Wallace Friesen, in which the contraction or relaxation of one muscle or a group of muscles is described as an Action Unit. There are over 40 defined Action Units (AUs), such as the cheek raiser (AU6) and the lip corner puller (AU12), the two units responsible for a smiling gesture.

 

[Diagram: facial feature points tracked on the face]
Facial feature points being tracked are shown in the diagram. They are identified by comparing the initial frame of the video against a face template. The system recognises displacement of the feature points from their initial positions as motion or geometric change.

A change in colour (luminance) can also be recognised, for example when white teeth are shown between red lips.
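As a rough sketch of the tracking step, the snippet below follows feature points between consecutive frames with OpenCV's pyramidal Lucas-Kanade optical flow (one possible tracker; the source does not name one) and measures displacement from hypothetical initial positions:

```python
import cv2
import numpy as np

# Hypothetical initial feature-point positions (x, y), e.g. located by
# matching the first frame against a face template (the template matching
# itself is not shown here).
initial_pts = np.array([[120.0, 180.0], [162.0, 181.0]], dtype=np.float32)

def track_points(prev_gray, next_gray, prev_pts):
    """Follow the points from one grayscale frame to the next using
    pyramidal Lucas-Kanade optical flow (one possible tracker)."""
    next_pts, status, _err = cv2.calcOpticalFlowPyrLK(
        prev_gray, next_gray, prev_pts.reshape(-1, 1, 2), None)
    return next_pts.reshape(-1, 2), status.ravel().astype(bool)

def displacement(current_pts):
    """Displacement from the initial positions -- the geometric change
    the system interprets as facial motion."""
    return current_pts - initial_pts
```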

 

Example of Facial Action Detection from Feature Points -- Detecting Mouth Actions:
 

The mouth is represented as a polygon formed of eight feature points (in blue) surrounding an anchor point (in yellow).

The polar angles of the feature points are tracked with respect to the anchor point, and the displacement from the anchor point is measured as a ratio to the initial position.
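A minimal sketch of this representation, assuming hypothetical pixel coordinates for the eight lip points and the anchor:

```python
import numpy as np

# Hypothetical pixel coordinates for the eight lip feature points
# (clockwise around the mouth) and the anchor point, taken from the
# initial, neutral frame.
initial_lip_pts = np.array(
    [[150, 200], [160, 194], [170, 192], [180, 194],
     [190, 200], [180, 206], [170, 208], [160, 206]], dtype=float)
anchor = np.array([170.0, 200.0])

def polar_coords(lip_pts, anchor):
    """Polar angle and radial distance of each lip point with respect
    to the anchor point."""
    rel = lip_pts - anchor
    angles = np.arctan2(rel[:, 1], rel[:, 0])   # radians
    radii = np.hypot(rel[:, 0], rel[:, 1])
    return angles, radii

# Radial distances in the initial frame, used as the reference.
_, initial_radii = polar_coords(initial_lip_pts, anchor)

def displacement_ratios(current_lip_pts):
    """Displacement from the anchor as a ratio to the initial position:
    >1 means the point moved away from the anchor, <1 towards it."""
    _, radii = polar_coords(current_lip_pts, anchor)
    return radii / initial_radii
```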

 

[Diagrams: Lip Pull (left) and Lip Pucker (right)]

Lip pull and lip pucker actions are determined by the magnitude and direction of the change in displacement of the lip corners; a code sketch follows the two definitions below.

 

    Lip pull:

    Displacement between the lip corners increases by more than a threshold value

    Lip pucker:

    Displacement between the lip corners decreases by more than a threshold value
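A minimal sketch of these two rules, assuming a hypothetical threshold value (the source does not state one):

```python
import numpy as np

# Hypothetical threshold on the relative change in lip-corner distance;
# the source does not publish an actual value.
THRESHOLD = 0.10   # 10% change relative to the neutral frame

def classify_lip_corners(neutral_left, neutral_right, left, right):
    """Lip pull vs. lip pucker from the change in lip-corner separation."""
    d0 = np.linalg.norm(np.subtract(neutral_right, neutral_left))
    d = np.linalg.norm(np.subtract(right, left))
    change = (d - d0) / d0
    if change > THRESHOLD:
        return "lip pull"      # corners moved apart
    if change < -THRESHOLD:
        return "lip pucker"    # corners moved together
    return "neutral"
```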

 

[Diagrams: Lips Part (left), Mouth Stretch (centre), Jaw Drop (right)]

A colour filter is applied to show the aperture of the mouth as red, and the teeth as green.

 

    Lips part:

    A small ratio of aperture to the overall mouth size, with no teeth visible

    Mouth stretch:

    A small ratio of aperture to the overall mouth size, with teeth visible

    Jaw drop:

    A large ratio of aperture to the overall mouth size
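The sketch below shows one way such a colour filter and the aperture ratios could be combined; the HSV colour ranges and ratio thresholds are hypothetical, as the source gives no values:

```python
import cv2

# Hypothetical HSV ranges for the red aperture and the whitish teeth,
# and illustrative ratio thresholds; none of these values come from
# the source.
APERTURE_LARGE = 0.15   # aperture/mouth-size ratio above this: "large"
TEETH_MIN = 0.02        # minimum teeth fraction to count as "teeth shown"

def mouth_action(mouth_bgr):
    """Classify lips part / mouth stretch / jaw drop from the ratio of
    the mouth aperture to the overall mouth size, plus teeth visibility."""
    hsv = cv2.cvtColor(mouth_bgr, cv2.COLOR_BGR2HSV)
    aperture = cv2.inRange(hsv, (0, 70, 50), (10, 255, 255))   # reddish
    teeth = cv2.inRange(hsv, (0, 0, 180), (180, 60, 255))      # bright, unsaturated
    mouth_area = mouth_bgr.shape[0] * mouth_bgr.shape[1]
    aperture_ratio = cv2.countNonZero(aperture) / mouth_area
    teeth_shown = cv2.countNonZero(teeth) / mouth_area > TEETH_MIN
    if aperture_ratio > APERTURE_LARGE:
        return "jaw drop"
    return "mouth stretch" if teeth_shown else "lips part"
```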

 

Head Actions:

 

Out-of-plane motion of the head can skew the tracking of feature points, hence anchor points are used in an attempt to compensate.

An anchor point remains unaffected by head rotations about the three axes, and is normalised against the distance between the two eye corners to account for scale variations.

 

The diagram shows the initial position of the anchor point. It lies between the two mouth corners when the mouth is at rest, at a distance d from the line l joining the two inner eye corners. In subsequent frames the point is measured at distance d from l, after accounting for head turns.
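A minimal sketch of how the anchor point and its normalised distance d could be computed, assuming (x, y) pixel coordinates for the inner eye corners and the mouth corners:

```python
import numpy as np

def anchor_and_distance(inner_eye_l, inner_eye_r, mouth_l, mouth_r):
    """Anchor point midway between the mouth corners (neutral frame), and
    its perpendicular distance d from the line l joining the inner eye
    corners, normalised by the inter-eye distance for scale invariance."""
    p0 = np.asarray(inner_eye_l, float)
    p1 = np.asarray(inner_eye_r, float)
    anchor = (np.asarray(mouth_l, float) + np.asarray(mouth_r, float)) / 2.0
    eye_vec = p1 - p0
    eye_dist = np.linalg.norm(eye_vec)
    rel = anchor - p0
    # Perpendicular distance from the anchor to line l (2D cross product).
    d = abs(eye_vec[0] * rel[1] - eye_vec[1] * rel[0]) / eye_dist
    return anchor, d / eye_dist   # distance normalised against eye spacing
```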

 

This model also allows the detection of head actions as follows:

 

  Head action               Detection
  Head pitch (up or down)   Vertical displacement of the nose tip
  Head yaw (turn)           Ratio of the left to right eye widths
  Head roll (tilt)          Slope of the line joining the two inner eye corners
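These three cues could be computed along the following lines, a minimal sketch assuming hypothetical (x, y) feature-point coordinates:

```python
import numpy as np

def head_action_cues(nose_tip, nose_tip0, eye_l_outer, eye_l_inner,
                     eye_r_inner, eye_r_outer):
    """The three head-pose cues from the table above; all arguments are
    hypothetical (x, y) feature-point coordinates."""
    # Pitch: vertical displacement of the nose tip from its initial position.
    pitch_cue = nose_tip[1] - nose_tip0[1]
    # Yaw: the near eye appears wider than the far one when the head turns.
    left_width = np.linalg.norm(np.subtract(eye_l_outer, eye_l_inner))
    right_width = np.linalg.norm(np.subtract(eye_r_outer, eye_r_inner))
    yaw_cue = left_width / right_width
    # Roll: slope of the line joining the two inner eye corners.
    dx, dy = np.subtract(eye_r_inner, eye_l_inner)
    roll_cue = dy / dx
    return pitch_cue, yaw_cue, roll_cue
```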

* All diagrams on this page (El Kaliouby & Robinson, 2004, 2005)
