Foreword
This project aims to enable robots to effectively control their gaze by implementing a biologically inspired gaze control system. The approach is grounded in neuroscience research, particularly in the study of visual attention mechanisms. The foundational concept originates from computational models of visual attention, as introduced in the seminal work:
Itti, L., Koch, C., & Niebur, E. (1998). A model of saliency-based visual attention for rapid scene analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(11), 1254-1259.
This paper proposed a model in which attention is guided by a combination of bottom-up (stimulus-driven) and top-down (task-driven) processes. The central idea is that gaze selection emerges from the interplay of spontaneous exploration, salient stimuli, and task-specific objectives.
In this project, we implement a similar multi-layered system. Gaze targets are categorized into one of three layers:
- Random: This layer represents spontaneous gaze movements that are not influenced by external stimuli or specific tasks. It mimics natural exploratory behavior, allowing the robot to scan its environment without a predefined objective.
- Stimulus-based: This layer focuses on gaze targets that are driven by external stimuli. These targets are selected based on their saliency, such as bright colors, sudden movements, or other visually prominent features in the environment.
- Task-driven: This layer prioritizes gaze targets that are directly related to the robot's current task or objective. These targets are determined by the specific requirements of the task, ensuring that the robot's gaze aligns with its operational goals.
Additionally, within each layer, gaze targets are assigned both a priority and a duration. The priority determines the importance of a target relative to others within the same layer, while the duration specifies how long the target remains active. A central component, the GazeScheduler, dynamically evaluates all available gaze targets and selects the one with the highest priority for execution. This ensures that the robot's gaze is always directed toward the most relevant target based on the current context.
Publications
This work is described in the following publications:
List of papers that build upon this work.
Terms and Concepts
Gaze targets
A gaze target consists of the following main properties:
- Name: A string representing the name of the gaze target.
- Position: The spatial position of the gaze target. It can be specified either in the global frame or relative to the robot or to an object in the scene.
- Priority: It defines the layer (random, stimulus-based, or task-driven) and the priority within that layer (a scalar value; the higher the value, the more important the gaze target).
- Duration: It specifies how long the gaze target should remain active.
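The following minimal sketch illustrates this data model. The actual type is defined in the gaze_targets library; the names used here (GazeLayer, GazeTargetSketch, and the field names) are illustrative assumptions, not the real interface.

```cpp
// Illustrative sketch of a gaze target; the real type lives in the gaze_targets
// library, and all names here are assumptions for explanation only.
#include <array>
#include <string>

enum class GazeLayer { Random, StimulusBased, TaskDriven };

struct GazeTargetSketch
{
    std::string name;               // Name: identifier of the gaze target
    std::array<float, 3> position;  // Position: e.g. in the global frame or relative to a robot/object frame
    GazeLayer layer;                // Priority (layer part): random, stimulus-based, or task-driven
    float priority = 0.0f;          // Priority (in-layer part): higher means more important
    float durationSeconds = 0.0f;   // Duration: how long the target should stay active
};
```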
Gaze scheduler
The GazeScheduler is the central component responsible for managing and prioritizing gaze targets. It receives gaze targets from multiple sources through the memory system and dynamically schedules them based on a strict priority system.
The selected gaze target will then be sent to a robot-specific gaze controller.
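The selection rule can be sketched roughly as follows. This is not the actual GazeScheduler implementation; in particular, the ordering assumption that task-driven targets outrank stimulus-based ones, which in turn outrank random ones, as well as the type and field names, are illustrative only.

```cpp
// Minimal sketch of a strict-priority selection rule; not the actual GazeScheduler code.
// Assumption: task-driven > stimulus-based > random; within a layer, the larger scalar
// priority wins; targets whose duration has elapsed are ignored.
#include <algorithm>
#include <optional>
#include <tuple>
#include <vector>

enum class GazeLayer { Random = 0, StimulusBased = 1, TaskDriven = 2 };

struct ScheduledTarget
{
    GazeLayer layer;   // layer of the target
    float priority;    // in-layer importance
    double expiresAt;  // point in time at which the target's duration runs out
};

std::optional<ScheduledTarget> selectTarget(std::vector<ScheduledTarget> targets, double now)
{
    // Drop targets whose duration has already elapsed.
    std::erase_if(targets, [now](const ScheduledTarget& t) { return t.expiresAt <= now; });
    if (targets.empty())
    {
        return std::nullopt;  // nothing left; the robot would fall back to an idle target
    }
    // Pick the target that is maximal with respect to (layer, in-layer priority).
    return *std::max_element(targets.begin(), targets.end(),
                             [](const ScheduledTarget& a, const ScheduledTarget& b)
                             { return std::make_tuple(a.layer, a.priority) < std::make_tuple(b.layer, b.priority); });
}
```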
Gaze controller
The robot-specific gaze controller receives a gaze target as either a global position or a position relative to a robot node. It then uses an Inverse-Kinematics-based approach to control the robot's gaze towards the target.
Gaze Control Strategies
The following strategies are currently implemented for controlling the robot's gaze:
- Atan2: This strategy calculates the pitch and yaw angles required for the robot's head to focus on a target. It is suitable for robots whose head kinematics align with these two joints, allowing direct use (see the sketch after this list).
- GazeIK: A general-purpose solution that leverages inverse kinematics (IK) based on a virtual linkage model. For more details, refer to the implementation in Simox.
- Hemisphere: An extension of the Atan2 strategy, specifically designed for a hemisphere joint configuration. In this setup, the two degrees of freedom control the pitch and roll of the robot's head. To maintain an upright head posture, the roll is constrained to zero, and both degrees of freedom are controlled by the same value.
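As a rough illustration of the Atan2 idea (not the package's actual controller code), the yaw and pitch toward a target can be computed directly from the target position expressed in the head's base frame. The frame convention (x forward, y left, z up) and the function name are assumptions made for this sketch.

```cpp
// Rough illustration of the Atan2 strategy; not the package's controller code.
// Assumed frame convention: x forward, y left, z up; target given in the head's base frame.
#include <cmath>

struct PanTilt
{
    double yaw;    // rotation left/right toward the target
    double pitch;  // rotation up/down toward the target
};

PanTilt anglesToTarget(double x, double y, double z)
{
    PanTilt result;
    result.yaw = std::atan2(y, x);                   // horizontal angle to the target
    result.pitch = std::atan2(z, std::hypot(x, y));  // vertical angle to the target
    return result;
}
```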
Structure of this Package
On the code level, this package contains the following libraries and components:
Libraries
Core libraries:
- The common library with shared utilities and definitions.
- The gaze_controller library for gaze control logic.
- The gaze_scheduler library for dynamic gaze target prioritization.
- The gaze_targets library for defining and managing gaze targets.
- The interfaces library for interface definitions.
Generating gaze targets:
- The target_provider library for providing gaze targets.
Client API to interface with the GazeScheduler:
- The client library for client-side functionality.
Access of functionality through skills:
- The skills library for skill implementations.
Components
Core components
- The component view_memory.
- The component gaze_scheduler. It executes the gaze scheduling logic.
Gaze target providers
- The component person_target_provider. It provides gaze targets related to detected persons.
- The component handover_target_provider. It provides gaze targets for handover scenarios.
Skills
- The component view_selection_skill_provider. It implements skills for view selection.
Examples
- The component object_tracking_example. It demonstrates object tracking functionality.
- The component scheduler_example. It showcases an example of gaze scheduling.
Skills
view_selection_skill_provider
The following skills are implemented in the skills library and made available through the view_selection_skill_provider.
Basic skills
Skill Name | Parameters | Description |
---|---|---|
LookAt | (ARON, header, ...) | Allows setting a gaze target. |
SetCustomGazeTarget | (None) | Allows setting a custom gaze target. TODO: check how it differs from LookAt. |
ResetGazeTargets | (None) | Removes all gaze targets from the GazeScheduler. Only the idle target (e.g., looking ahead) will remain. |
Skills for predefined directions:
Skill Name | Parameters | Description |
---|---|---|
LookUp | (None) | Directs the robot's gaze upward. |
LookDown | (None) | Directs the robot's gaze downward. |
LookDownstraight | (None) | Directs the robot's gaze straight down. |
LookLeft | (None) | Directs the robot's gaze to the left. |
LookRight | (None) | Directs the robot's gaze to the right. |
LookAhead | (None) | Directs the robot's gaze straight ahead. |
Skills for scene exploration (looking around):
Skill Name | Parameters | Description |
---|---|---|
LookForObjects | (None) | |
LookForObjectsWithKinematicUnit | (None) | Searches for objects using the robot's kinematic unit. This skill is deprecated. |
ScanLocation | (None) | |
ScanLocationsForObject | (None) | |
Skills for focusing on specific elements in the scene:
Skill Name | Parameters | Description |
---|---|---|
LookAtArticulatedObjectFrame | (None) | Directs the robot's gaze to a specific frame of an articulated object. |
LookAtObject | (None) | Directs the robot's gaze to a specific object. |
LookAtHumanFace | (None) | Directs the robot's gaze to a human's face. |
LookAtHumanHand | (None) | Directs the robot's gaze to a human's hand. |
Usage Example
The following examples show how the client API can be used to send gaze targets to the GazeScheduler.
GazeScheduler client example
Code: component scheduler_example.
It shows how different gaze targets are scheduled and executed by the robot.
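As a purely hypothetical sketch of what such a client call might look like (the real API is provided by the client library and demonstrated by scheduler_example; all identifiers below are placeholders, not the actual interface):

```cpp
// Purely hypothetical sketch; the real client API is defined in the client library and
// demonstrated by the scheduler_example component. GazeSchedulerClient and submitTarget
// are placeholders, not the actual interface.
#include <iostream>
#include <string>

struct GazeSchedulerClient  // placeholder stand-in for the real client
{
    void submitTarget(const std::string& name, float x, float y, float z,
                      int layer, float priority, float durationSeconds)
    {
        // Stub for illustration: the real client would hand the target to the GazeScheduler.
        std::cout << "target '" << name << "' at (" << x << ", " << y << ", " << z
                  << "), layer " << layer << ", priority " << priority
                  << ", duration " << durationSeconds << " s\n";
    }
};

int main()
{
    GazeSchedulerClient client;
    // Ask the robot to look at a point of interest for five seconds, with a
    // task-driven layer and an in-layer priority of 10.
    client.submitTarget("point_of_interest", 1.2f, 0.3f, 0.9f,
                        /*layer=*/2, /*priority=*/10.0f, /*durationSeconds=*/5.0f);
}
```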
Object tracking example
It queries the object memory for specific known objects and directs the robot's gaze toward them. The gaze dynamically adjusts as the objects' positions change, ensuring continuous focus on the targets.
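The corresponding code is in the component object_tracking_example. Conceptually, the example boils down to a loop of the following shape; this is a schematic sketch only, and the memory query and client types are placeholders rather than the component's actual code.

```cpp
// Schematic sketch of the tracking loop; not the object_tracking_example component's real code.
// queryObjectPosition and GazeSchedulerClient are placeholders standing in for the object
// memory query and the client library, respectively.
#include <array>
#include <chrono>
#include <optional>
#include <string>
#include <thread>

// Placeholder: the real component queries the object memory for the object's current position.
std::optional<std::array<float, 3>> queryObjectPosition(const std::string& /*objectName*/)
{
    return std::nullopt;  // stub
}

struct GazeSchedulerClient  // placeholder stand-in for the client library
{
    void submitTarget(const std::string& /*name*/, float /*x*/, float /*y*/, float /*z*/,
                      int /*layer*/, float /*priority*/, float /*durationSeconds*/)
    {
        // Stub: the real client would hand the target to the GazeScheduler.
    }
};

void trackObject(GazeSchedulerClient& client, const std::string& objectName)
{
    while (true)
    {
        // Re-query the object's position and refresh the gaze target with a short duration,
        // so the gaze keeps adjusting as the object moves.
        if (const auto position = queryObjectPosition(objectName))
        {
            client.submitTarget(objectName, (*position)[0], (*position)[1], (*position)[2],
                                /*layer=*/2, /*priority=*/5.0f, /*durationSeconds=*/1.0f);
        }
        std::this_thread::sleep_for(std::chrono::milliseconds(100));
    }
}
```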
Scenarios
How To
(prefer the Doxygen documentation)
=> add link to the how-tos in the documentation