A context-aware attention system is fundamental for regulating a robot’s behaviour in a social interaction. It enables social robots to autonomously select the right interactive target (human or non-human) at the right time in a multiparty social interaction. In addition, it controls the robot’s expressive behaviour according to the target’s intention and modality. The system is a necessary part of the robot’s intelligence and allows the robot to function successfully in various social situations, particularly in collaboration-oriented tasks.
We have designed a modular context-aware attention system which selects the environmental target and drives the robot’s behaviour in a multiparty social interaction. As shown in the figure below, it is composed of two major modules: the scene analyser module and the attention module.
The Scene Analyser Module
This module is responsible for providing the robot with a human-like understanding of the surrounding environment. It consists of three units: pre-processing, attentive feature extraction and meta-scene creation. The pre-processing unit employs a software layer to extract: a 2D map of the visual saliency of the scene (based on the FastSUN algorithm); socially relevant features such as the subject’s distance and orientation, body shape, skeletal information and gestures (through the Microsoft Kinect SDK); and facial expressions and the subject’s name (through a facial analysis engine and a PCA engine). Some other features, such as the non-human target and the human body saliency, cannot be directly inferred by the pre-processing unit and are identified as follows.
Non-human target: this is the most salient point of the saliency map identified by the robot. It allows the robot to be attracted by environmental stimuli during a social interaction. To obtain it, the module performs a local spatial competition across the image and analyses the low-level features of each pixel. The pixel with the highest contrast of luminance, colour and orientation wins the competition and becomes the non-human target.
Human body saliency: it is the average score of the saliency values in the area circumscribed by the body shape. It allows the robot’s attention to be influenced by parameters such as the colour of the clothes worn by the subject.
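The two derived features above can be illustrated with a minimal sketch. It assumes the saliency map is already available as a 2D grid of scores and that the body shape is given as a binary mask of the same size; the function names and the plain argmax (standing in for the local spatial competition) are illustrative assumptions, not the system's actual implementation.

```python
def non_human_target(saliency):
    """Return (row, col) of the most salient pixel in a 2D saliency map.

    Simplification: a global argmax over precomputed scores stands in for
    the local spatial competition described in the text.
    """
    best = max(
        ((value, r, c)
         for r, row in enumerate(saliency)
         for c, value in enumerate(row)),
        key=lambda item: item[0],
    )
    return best[1], best[2]


def body_saliency(saliency, body_mask):
    """Average the saliency values of the pixels inside the body mask."""
    values = [
        saliency[r][c]
        for r, row in enumerate(body_mask)
        for c, inside in enumerate(row)
        if inside
    ]
    # An empty mask (no body detected) yields zero saliency by convention.
    return sum(values) / len(values) if values else 0.0


# Hypothetical 3x3 saliency map and body mask, for illustration only.
saliency_map = [
    [0.1, 0.2, 0.1],
    [0.3, 0.9, 0.4],
    [0.2, 0.5, 0.8],
]
mask = [
    [0, 0, 0],
    [0, 1, 1],
    [0, 1, 1],
]
target_pixel = non_human_target(saliency_map)
subject_score = body_saliency(saliency_map, mask)
```

In this toy input the non-human target falls on the centre pixel, and the subject's score is the mean of the four masked values.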
Finally, the scene analyser module creates a meta-scene object that stores all the extracted low-level and socially relevant features of the human and non-human targets, each with its corresponding ID. The output is streamed to the attention module.
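One possible shape for such a meta-scene object is sketched below. The class names, field names and units are hypothetical and chosen only to mirror the features listed in the text; the real data structure may differ.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Target:
    """One entry of the meta-scene (all field names are assumptions)."""
    target_id: int
    is_human: bool
    position: Tuple[float, float, float]  # 3D position, assumed in metres
    saliency: float                       # low-level saliency score
    name: Optional[str] = None            # recognised subject name, if any
    distance: Optional[float] = None
    orientation: Optional[float] = None
    gesture: Optional[str] = None
    facial_expression: Optional[str] = None


@dataclass
class MetaScene:
    """Container indexed by target ID, passed on to the attention module."""
    targets: Dict[int, Target] = field(default_factory=dict)

    def add(self, target: Target) -> None:
        self.targets[target.target_id] = target

    def humans(self) -> List[Target]:
        return [t for t in self.targets.values() if t.is_human]


scene = MetaScene()
scene.add(Target(1, True, (0.5, 0.0, 1.8), 0.42, name="subject_1",
                 gesture="hand_raised"))
scene.add(Target(2, False, (1.2, 0.3, 2.5), 0.77))
```

Keeping human and non-human targets in the same container lets a later stage compare them directly.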
The Attention Module
The core of the module is a computational model that calculates, for each subject, the total elicited attention (EA), considering both the saliency of low-level features and high-level, human-relevant features. For example, in the current version of the attention model, a subject who speaks or raises his or her hand is more important than a subject who smiles. The total EA is assigned to each subject in order to select the winner among the human and non-human targets through a competition. The winner is the target with the highest EA, which is the one the robot should look at. The presence of the non-human target gives the robot dynamic gaze behaviour. For instance, if none of the subjects is interesting enough, the robot will switch to an environmental target, just as a human would in a similar situation. The 3D position of the target is sent to the behaviour engine, which controls the robot’s gaze in terms of the amplitude and velocity of head-eye movement on the basis of a human-like gaze model.
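The competition for elicited attention (EA) can be sketched as follows. The additive combination of saliency and feature weights, and the weight values themselves, are assumptions made for illustration; the text only states that speaking or raising a hand counts for more than smiling.

```python
# Hypothetical weights: only their ordering (speaking and hand raising
# above smiling) is taken from the description of the model.
FEATURE_WEIGHTS = {"speaking": 1.0, "hand_raised": 1.0, "smiling": 0.4}


def elicited_attention(saliency, active_features, weights=FEATURE_WEIGHTS):
    """Combine low-level saliency with weighted high-level features.

    A plain sum is assumed here; unknown features contribute nothing.
    """
    return saliency + sum(weights.get(f, 0.0) for f in active_features)


def select_winner(candidates):
    """Return the ID of the candidate with the highest EA.

    `candidates` maps a target ID to a (saliency, active_features) pair.
    The non-human target competes with an empty feature list.
    """
    return max(candidates, key=lambda k: elicited_attention(*candidates[k]))


# One subject speaks, so that subject wins the competition.
active_scene = {
    "subject_a": (0.3, ["smiling"]),
    "subject_b": (0.2, ["speaking"]),
    "environment": (0.6, []),
}
# No subject shows any relevant behaviour, so the environment wins.
idle_scene = {
    "subject_a": (0.3, []),
    "subject_b": (0.2, []),
    "environment": (0.6, []),
}
winner_active = select_winner(active_scene)
winner_idle = select_winner(idle_scene)
```

With these toy values the speaking subject wins the first competition, while in the second the gaze moves to the environmental target because no subject elicits more attention than the most salient point of the scene.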