TY - THES
T1 - Joint attention in spoken human-robot interaction
A1 - Staudte, Maria
Y1 - 2010/08/06
N2 - Gaze during situated language production and comprehension is tightly coupled with the unfolding speech stream: speakers look at entities before mentioning them (Griffin, 2001; Meyer et al., 1998), while listeners look at objects as they are mentioned (Tanenhaus et al., 1995). Thus, a speaker's gaze to mentioned objects in a shared environment provides the listener with a cue to the speaker's focus of visual attention and potentially to an intended referent. The coordination of interlocutors' visual attention, in order to learn about the partner's goals and intentions, has been called joint attention (Moore and Dunham, 1995; Emery, 2000). By revealing the speaker's communicative intentions, such attentional cues complement spoken language, facilitating grounding and sometimes disambiguating references (Hanna and Brennan, 2007). Previous research has shown that people readily attribute intentional states to non-humans as well, such as animals, computers, or robots (Nass and Moon, 2000). Assuming that people indeed ascribe intentional states to a robot, joint attention may be a relevant component of human-robot interaction as well. The objective of this thesis was to investigate the hypothesis that people jointly attend to objects looked at by a speaking robot and that human listeners use this visual information to infer the robot's communicative intentions. Five eye-tracking experiments conducted in a spoken human-robot interaction setting provide supporting evidence for this hypothesis. In these experiments, participants' eye movements and responses were recorded while they viewed videos of a robot that described and looked at objects in a scene. The congruency and alignment of robot gaze and the spoken references were manipulated in order to establish the relevance of such gaze cues for participants' utterance comprehension.
Results suggest that people follow robot gaze to objects and infer referential intentions from it, causing both facilitation and disruption of reference resolution, depending on the match or mismatch between inferred intentions and the actual utterance. Specifically, we have shown in Experiments 1-3 that people assign attentional and intentional states to a robot, interpreting its gaze as a cue to intended referents. This interpretation determined how people grounded spoken references in the scene, thus influencing overall utterance comprehension as well as the production of verbal corrections in response to false robot utterances. In Experiments 4 and 5, we further manipulated temporal synchronization and linear alignment of robot gaze and speech and found that substantial temporal shifts of gaze relative to speech did not affect utterance comprehension, whereas the order of visual and spoken referential cues did. These results show that people interpret gaze cues in the order in which they occur and expect the retrieved referential intentions to be realized accordingly. Thus, our findings converge on the conclusion that people establish joint attention with a robot.
KW - Human-machine communication
KW - Language comprehension
KW - Multimodal system
KW - Attention
CY - Saarbrücken
PB - Universitäts- und Landesbibliothek
AD - Postfach 151141, 66041 Saarbrücken
UR - http://scidok.sulb.uni-saarland.de/volltexte/2010/3242
ER -