Using Speech Related Gestures to Aid Referential Communication in Face-to-face and Computer-Supported Collaborative Work.
Alison Newlands, Anthony Anderson, Avril Thomson, Bill Ion and Neil Dickson.
University of Strathclyde.
Department of Psychology;
Department of Design, Manufacture and Engineering Management.


Advances in computer and telecommunications technology have permitted the development of computer-based tools that enable workers in geographically distributed locations to communicate and work together. These tools can take a variety of forms, but one generic pattern is a shared workspace in the form of a shared virtual whiteboard accompanied by communication media permitting textual, graphical, audio and visual communication links. These systems are frequently referred to as ‘computer-supported collaborative work’ (CSCW) tools. The tool used in the present study is typical of these systems in that it permitted real-time shared (i.e. alternating) use of the drawing tools, plus a view of one’s interlocutor in a small window within the screen. Given such computer mediation of communication, questions naturally arise as to its efficacy: can designers successfully collaborate using such mediation? Does such mediation constrain designers’ dialogues in any way, rendering them less effective?

Some previous research has suggested that computer mediation of interaction might indeed constrain dialogues. For example, O’Conaill and Whittaker (1997) found that video-mediation results in the increased use of formal methods of turn-taking, such as using first names to indicate next speaker. The overall effect is to increase the degree of formality of the participants’ interactions. This could adversely affect the processes of grounding (i.e. achieving mutual understanding and establishing what is commonly known between participants), since the latter has been shown (see for example, Clark, 1996; Clark and Wilkes-Gibbs, 1986; Isaacs and Clark, 1987) to be a collaborative and highly interactive process. Increased formalisation of conversational contributions would potentially interfere with the natural interactivity of this process.

We therefore examined the use of these CSCW tools by pairs of participants who were either sitting side by side (face-to-face) or working remotely over a network using a ‘PictureTel 550’ video conferencing system (see Figure 1 for an illustration of the CSCW set-up). Half of the design student participants were relative novices who had had only 2 months’ previous experience of AutoCAD, whilst the other half were more experienced (with two years’ training in various CAD tools). Both groups were further subdivided into pairs who worked in the video-mediated condition and pairs who worked side by side in a ‘face-to-face’ or copresent condition. We examined both task progress and the participants’ dialogues to ascertain whether there were any effects of video-mediation or expertise.

Method: Participants worked together to transform a 2D diagram of a trolley wheel bracket into a 3D diagram, using a standard computer-aided design (CAD) tool. Two groups of university undergraduates took part in the study: ‘novices’ (2nd year engineering students, after 2 months of training with AutoCAD) and ‘experts’ (4th year engineering students with 2 years of training in a range of CAD tools). All participants were naïve users of the video conferencing system. A small group of industrial expert users of CAD also completed the task, using a ‘think aloud’ protocol, and their views on the usefulness of this way of collaborating were sought via semi-structured interviews. In the CSCW condition participants could converse as if face-to-face: they could see each other via a small video window and were provided with a duplex audio channel. The AutoCAD diagram was displayed in a second window, which could be manipulated or changed by either participant, but only while that participant had control of the ‘mouse’.

Results: Analysis of the communication and joint problem-solving activities has been undertaken, including analysis of task performance, turn-taking management, content analysis and the strategies employed during referential communication. Overall, the results indicate that the different communicative contexts (face-to-face versus CSCW) were associated with similar levels of task outcome. There were some differences in the process of communication as a function of expertise, and we are currently exploring these further. The apparent lack of an effect of communication medium on task outcomes is perhaps surprising in view of previous literature, which would indicate that turn-taking procedures could be disrupted in the CSCW context (e.g. Newlands et al., 1996; O’Conaill and Whittaker, 1997). A more in-depth analysis of the pragmatic functions of the utterances in the dialogues is therefore being undertaken to examine the process of communication in greater detail. This analysis has highlighted an important point: participants used a great deal of gesture during the task, especially during acts of reference, and these gestures needed to be incorporated into the transcriptions before the dialogues became fully comprehensible.
Acts of Reference: Acts of reference play an important role in the design task, as participants need to be able to discuss and refer to different parts of a complex diagram in order to complete the task. One effective strategy is to use gesture as a way of pointing to referents, or to illustrate what is being said verbally. In face-to-face communication hand gestures are frequently used in this deictic manner, to point to objects or places. In video-mediated and CSCW contexts participants can rarely see each other’s hand movements, but they can use the on-screen cursor in a deictic manner, to point to parts of a diagram or to draw their addressee’s attention to part of the visual display. These ‘mouse gestures’ appeared to occur frequently during the design task, but more so in some conditions than others. To determine which factors influenced this gestural behaviour, the transcriptions of the dialogues were annotated to show where hand and computer mouse gestures were used. The majority of gestures were deictic; however, a couple of iconic gestures were observed in the face-to-face and CSCW conditions. The data for the deictic gestures are given in Table 1, which shows the group mean frequency of each type of gesture (standard deviations are given in brackets).

Table 1. Mean number of Hand and Mouse Gestures in CSCW and Face-to-face interactions.

                 Hand Gestures                      Mouse Gestures
                 CSCW            Face-to-Face       CSCW             Face-to-Face
Novices          10.57 (5.94)    49.29 (13.50)      31.57 (11.99)    23.29 (12.92)
Experts           9.43 (6.29)    52.86 (18.20)      10.71 (3.73)     19.00 (9.68)



All participants used deictic hand gestures for the purpose of referential communication but, as expected, these occurred more frequently in the face-to-face condition: on average there were about five times as many hand gestures in the face-to-face context as in the CSCW context. Additionally, the frequency of use of mouse gestures varied between the novice and experienced participants. Novice users made greater use of mouse gestures than more experienced users regardless of communicative context, and this behaviour occurred even more frequently for novice users in the CSCW context. Examination of the annotated transcripts indicates that there may be some benefits for novice users of CAD in working in a CSCW environment. Some example segments of dialogue are considered below, to illustrate the variety of uses to which gestures are put. (Pauses are indicated by three dots; the duration of each gesture is marked by vertical lines at its start and end and by underlined text.)
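
The comparison of hand gestures across contexts can be checked directly against the group means in Table 1. The short Python sketch below is only an arithmetic illustration, not part of the original analysis: the dictionary layout and function names are hypothetical, and the pooled figure assumes equal numbers of pairs in the novice and expert groups, which the text does not state.

```python
# Group mean hand-gesture frequencies copied from Table 1.
# "ftf" = face-to-face, "cscw" = computer-supported condition.
HAND_MEANS = {
    "novices": {"cscw": 10.57, "ftf": 49.29},
    "experts": {"cscw": 9.43, "ftf": 52.86},
}


def ftf_to_cscw_ratio(means):
    """Return the face-to-face / CSCW ratio of mean frequency per group."""
    return {group: m["ftf"] / m["cscw"] for group, m in means.items()}


def pooled_ratio(means):
    """Ratio of the unweighted averages of the group means.

    Assumption: both groups contain the same number of pairs, so a
    simple average of the two means is a fair pooled estimate.
    """
    ftf = sum(m["ftf"] for m in means.values()) / len(means)
    cscw = sum(m["cscw"] for m in means.values()) / len(means)
    return ftf / cscw


hand_ratios = ftf_to_cscw_ratio(HAND_MEANS)
hand_pooled = pooled_ratio(HAND_MEANS)

for group, ratio in hand_ratios.items():
    print(f"{group}: {ratio:.2f} times as many hand gestures face-to-face")
print(f"pooled: {hand_pooled:.2f}")
```

Under these assumptions the ratio is roughly 4.7 for novices and 5.6 for experts, or about 5.1 when the two groups are pooled, which is consistent with the figure of about five times quoted in the text.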

The first example, from two second year students (see below), illustrates the use of mouse gestures in which particular referents on the shared diagram are ‘pointed’ to using a circular motion of the cursor. In this particular case, mouse gestures are used no fewer than seven times within one utterance to indicate particular referents within the diagram deictically.

Person A: See that hole there, see that?
Person B: Uhm yeah right
Person A: |that’s that bit| … |that’s a top view| … right so that |hole there … is that| and … |that hole there is that| … so we’ve got a thing, |that’s this| … you |take that| … and that bit |there is the third view, taken from the side|.

The second utterance by speaker A is so heavily indexical that it makes little sense when considered in isolation from the gestures and the referents. Such speech does, however, have the virtue that whilst it relies heavily on the extra-linguistic context for interpretation, it minimises the need for speakers to produce and listeners to interpret technical terminology within the design field; given that these speakers are relative beginners, this facet of mouse gestures is one sensible way of reducing collaborative effort.
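
The annotation convention used in these transcripts (vertical lines marking the start and end of each gesture) lends itself to simple automatic counting. The following Python sketch is offered only as an illustration of how gesture spans might be extracted from an annotated utterance; the function name and regular expression are hypothetical, and it assumes that vertical lines always occur in matched, non-nested pairs.

```python
import re

# Assumption: a gesture span is the text between one vertical line and
# the next, and spans are never nested or left unclosed.
GESTURE_SPAN = re.compile(r"\|([^|]*)\|")


def gesture_spans(utterance):
    """Return the stretches of speech that were accompanied by a gesture."""
    return [span.strip() for span in GESTURE_SPAN.findall(utterance)]


# Person A's second utterance from the first example, as annotated.
utterance = (
    "|that’s that bit| … |that’s a top view| … right so that "
    "|hole there … is that| and … |that hole there is that| … "
    "so we’ve got a thing, |that’s this| … you |take that| … "
    "and that bit |there is the third view, taken from the side|."
)

spans = gesture_spans(utterance)
print(len(spans))
for span in spans:
    print("-", span)
```

Applied to the utterance above, this yields seven gesture spans, matching the count given for the first example.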
In the second example, one of the more expert students uses a mouse gesture to get the attention of his interlocutor:

Person A: what’s happened to that part at the bottom, with the blue bits?
Person B: I’ve highlighted it, you can do that if you choose different views, |see these buttons at the top, they do that|

This gesture obviously serves an indexical function, but it also serves to draw the partner’s attention to some of the icons at the top of the screen, which allow the user to choose different views of the drawing.
The third example involves two novices using mouse gestures first to draw attention to the cursor position and subsequently to refer indexically to a part on the jointly-visible drawing:

Person A: |You see where my cursor is?|
Person B: Yes
Person A: right, we’ll have to extrude |that part first|
Person B: Mhmm
Person A: and then add |on this part on top|
Person B: Uh huh, yeah.

Again this dialogue relies heavily on the extra-linguistic context for interpretability and makes little sense in isolation.

Summary: Our findings suggest that the users, particularly the less experienced second year students, adapt to the situation in a sensible manner and use ‘mouse gestures’ strategically to assist the grounding of acts of reference, rather than having to rely on their inexpert knowledge of the technical jargon involved in the task.

References:


Clark, H. H. (1996). Using language. Cambridge: Cambridge University Press.
Clark, H.H., and Wilkes-Gibbs, D. (1986). Referring as a collaborative process. Cognition, 22: 1-39.
Isaacs, E.A., and Clark, H.H. (1987). References in conversation between experts and novices. Journal of Experimental Psychology: General, 116: 26-37.
Newlands, A., Anderson, A.H., and Mullin, J. (1996). Dialog structure and cooperative task performance in two CSCW environments. In J.H. Connolly and L. Pemberton (Eds.) Linguistic Concepts and Methods in CSCW (pp. 41-60). London: Springer-Verlag.
O’Conaill, B., and Whittaker, S. (1997). Characterizing, predicting, and measuring video-mediated communication: a conversational approach. In K.E. Finn, A.J. Sellen and S.B. Wilbur (Eds.) Video-Mediated Communication (pp. 107-131). NJ: Lawrence Erlbaum Associates.