
Detecting Gestures Using a Kinect-like System

 

Another way to carry out gesture recognition is to use data captured from an RGB camera and a depth sensor, as in the well-known Microsoft Kinect system.

 

 

Figure 1: Microsoft Kinect

 
Motion History Image

 

The depth sensor captures the subject's pixels over time, forming an image of the action the subject is performing. Brighter pixels represent more recently moving regions, while older motion fades. The background, which should not move, is kept at zero intensity (displayed in black), leaving only the subject's moving silhouette visible.

 

 

 

Figure 2: Example of a motion history image
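As a minimal sketch, the update rule behind such an image can be written in a few lines of NumPy. The frame-difference threshold and the decay duration below are assumed, illustrative values rather than parameters of the actual system.

import numpy as np

MHI_DURATION = 30   # how many frames a motion trace stays visible (assumed value)

def frame_silhouette(depth_prev, depth_curr, threshold=10):
    # Pixels whose depth changed by more than `threshold` are treated as moving.
    diff = np.abs(depth_curr.astype(np.int32) - depth_prev.astype(np.int32))
    return diff > threshold

def update_mhi(mhi, silhouette, decay=1.0):
    # Older motion fades towards zero (black); freshly moving pixels are set
    # to the maximum value so the most recent movement appears brightest.
    mhi = np.maximum(mhi - decay, 0.0)
    mhi[silhouette] = MHI_DURATION
    return mhi

Calling update_mhi once per frame with the latest silhouette produces an image like Figure 2, where intensity encodes how recently each pixel moved.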

 
 
 
 
Extracting the Subject from Background Noise

 

In order to accurately segment detailed body parts such as hands and fingertips, image noise that causes ambiguity must be handled. Each pixel goes through a Bayesian decision process that determines whether it belongs to the background or the foreground.
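As a rough sketch of such a decision, the snippet below compares the posterior probabilities of foreground and background for each depth pixel under simple Gaussian models; the means, standard deviations, and prior used here are placeholder assumptions, not values from the actual system.

import numpy as np

def bayesian_foreground_mask(depth, bg_mean=2500.0, bg_std=80.0,
                             fg_mean=1200.0, fg_std=150.0, p_fg=0.3):
    # Unnormalised Gaussian likelihood; the evidence term cancels when the
    # two posteriors are compared, so it can be ignored.
    def likelihood(x, mu, sigma):
        return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / sigma

    posterior_fg = likelihood(depth, fg_mean, fg_std) * p_fg
    posterior_bg = likelihood(depth, bg_mean, bg_std) * (1.0 - p_fg)
    # A pixel is kept as foreground when its foreground posterior wins.
    return posterior_fg > posterior_bg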

 

Here small focus points (e.g. fingertips) are marked so that they can be easily spotted by the sensors. This is vital, as hand movements play a big part in recognising emotion through gesture.

 

 

Figure 3: An example of ambiguous RGB-D input

 

 

Dealing with the Data

 

The system then calculates two values for each action to be used in emotion recognition.

 

 

   - Arousal measures how stimulating an emotion is, which can be seen from the swiftness of movements. It can be calculated from the motion history image of an action and the orientation of each pixel (a rough sketch of this calculation follows the list).

 

   - Valence complements arousal: it measures the pleasure or displeasure (which can be negative) associated with each emotion.
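The snippet below is a rough sketch of how an arousal score might be derived from a motion history image: the gradient of the MHI gives a per-pixel motion orientation, and its mean magnitude serves as a simple proxy for the swiftness of movement. This is an illustrative simplification, not the exact formulation used by the system.

import numpy as np

def arousal_from_mhi(mhi):
    # The gradient of the motion history image encodes, per pixel, the
    # direction the motion trace swept through (its orientation) and how
    # sharply the intensity falls off (a proxy for swiftness).
    gy, gx = np.gradient(mhi.astype(np.float64))
    magnitude = np.hypot(gx, gy)
    orientation = np.arctan2(gy, gx)

    moving = mhi > 0                 # ignore the static (black) background
    if not moving.any():
        return 0.0, orientation
    return float(magnitude[moving].mean()), orientation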

 

 

By mapping these values, together with the depth data, into a cuboid graph we can see the correlation between the video input and the emotion of the user.
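As an illustration of this mapping, the short sketch below scatters per-action depth, valence, and arousal values in a 3D plot similar to Figure 4; matplotlib is an assumed choice of plotting library here, not necessarily what the original system used.

import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D  # noqa: F401 (registers the "3d" projection)

def plot_emotion_cuboid(depth_vals, valence_vals, arousal_vals):
    # One point per observed action, placed in depth / valence / arousal space.
    fig = plt.figure()
    ax = fig.add_subplot(111, projection="3d")
    ax.scatter(depth_vals, valence_vals, arousal_vals)
    ax.set_xlabel("Depth")
    ax.set_ylabel("Valence")
    ax.set_zlabel("Arousal")
    plt.show()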

 


Figure 4: Cuboid graph mapped from depth, valence, and arousal
