Representing Meaning: Morphemic-level analysis within a holistic approach to gesture transcription.
Fey Parrill and Eve Sweetser
University of Chicago/University of California, Berkeley


Abstract of some sort.

Keywords: transcription, mental spaces, conceptual integration, meaning

1. Gestural meaning and decomposition.

Researchers from a number of different areas have begun to appreciate the relevance of the study of gesture to their disciplines. With this increase in interest and in the desire for data exchange, the difficulty of designing a system of transcription which will be amenable to the requirements of disparate methodologies becomes apparent. As with signed languages, the spatial nature of gesture results in a substantial increase in the number of features which must be represented, and a speech-gesture transcription system must address both the inadequacies of speech transcription and the complexities of transcribing motion. Transcription is, of course, intended to represent a maximum amount of information as economically as possible. This enterprise is generally carried out by devising mutually exclusive and jointly exhaustive categories of features. Decomposing gesture into a series of features, however, is problematic. Furthermore, without suggesting that economical representation is not to be sought, we would argue that in some cases this approach is counterproductive. The most obvious such case concerns the mappings between the form and motion of a gesture and the conceptual structure which underlies it, or what may be called the meaning of the gesture.

Meaning is here assumed to be a cognitive construct built up during production or interpretation (Lakoff, 1986; Lakoff & Johnson, 1980, 1999). One central mechanism employed in the construction of gestural meaning is iconic mapping. When we perceive a hand in a certain configuration and conclude that the hand represents a physical entity being talked about in the accompanying discourse, we are engaging in a cognitive process whereby features of the hand’s configuration map onto features in our mental model of a particular referent. This mapping is the result of preservation of structure: perceived resemblance comes from the abstraction of correspondences between shared features (Taub, 2001). For example, when speech about a ball is accompanied by a gesture with the speaker’s hand in the shape of a fist, features of the ball (namely the outer curve and inner solidity) are preserved in the hand shape. Our knowledge of the shape of a (prototypical) ball enables us to extract these features in order to fill out a mental model whereby the hand can represent the ball, the motion of the hand can be construed as the motion of the ball, and so on. Sarah Taub (2001) gives this topic a detailed treatment in her discussion of conceptual metaphor and iconicity in ASL. She begins by expanding Mandel's (1977) catalogue of sign-language iconicity in order to describe the ways in which the correspondences we perceive between the sign and its referent enable us to fill out a schematic mental model of the scene. Some of these constraints on iconicity are as follows.

In cases where the hands depict a physical entity, the shape of the hands may map onto the shape of a physical referent (shape-for-shape mapping). Similarly, the motion of the hands may map onto the motion of the referent (motion-for-motion mapping). Motion can also be used to represent shape (path-for-shape), as when the path of the hand traces the shape of a referent. A location in physical space can represent a location in some abstract or metaphorical space, and so on. Yet identity of content between language and gesture is always contextually embedded. How do we know that, when a person is talking about the sun and makes a gesture with his or her hand in a fist, this fist represents the sun? And in what respect is this the same phenomenon as when a fist striking a palm represents complete understanding? We take the perspective that conceptual mappings are as critical to the enterprise of transcription as information about the physical form. This paper attempts to address some of the questions which confront researchers working from this perspective. In particular, what sort of system can be created which does not attempt to reduce meaning categories to concise lists of mutually exclusive, maximally distinct features? We propose that this enterprise can profit from the use of frameworks from within the cognitive linguistics tradition, in particular the mental spaces framework (Fauconnier, 1994 [1985], 1997; Fauconnier & Sweetser, 1996) and that of blending or conceptual integration (Fauconnier & Turner, 2002). The way to describe iconic/metaphorical/deictic mappings as succinctly as possible, we claim, cannot be reduced to the physical features of the produced gestures. As with most systems, before giving a succinct description of something it is necessary to know (in principle, at least!) how to describe it as fully as possible.
And the meaningful interpretation of gestures can be expressed concisely only against a background of complex cross-space mappings. To some extent, gesture is about representing entities in mental spaces. These entities are set up and manipulated through rapid transitions between discourse spaces and iconic representations of entities in those spaces. The fact that the mental spaces framework is a spatial representation of abstract information entails that it is an invaluable resource in making mappings between these entities and their referents explicit. The advantage of using it is that it makes no claims about the "phonetic" transcription at all; any kind of medium of meaningful expression could be used to represent content against a contextual background describable in terms of mental space mappings.

In elaborating this model, let us return to the case in which the speaker is using a hand in the shape of a fist to represent a ball. In the blending model different mental spaces, or partially structured mental models (Fauconnier, 1994 [1985], 1997), act as inputs to the blended space. This framework was originally developed to account for lexical ambiguities in reference, but when discussing gesture one part of the picture is what Scott Liddell calls Real Space, or “the mental representation of the physical elements of one’s immediate physical environment” (2000, 342). That is, where gesture is concerned, the physical space becomes a resource for meaning construction. Liddell points out that a speaker’s use of his or her physical space or surroundings to represent some entity in the discourse creates a blend (Liddell calls this a grounded blend, because it involves Real Space). Real Space acts as one input: a second input is the mental space in which the ball is evoked (with its corresponding properties of roundness, internal solidity, etc.). An important element of the blend is the generic space, which contains shared features (perceived similarities) that themselves permit cross-space mappings. This example can be depicted as follows, using the conventions of the blending model:

Figure 1
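As a rough illustration, the blend in Figure 1 could also be written down as a small data structure. The Python sketch below is not part of the blending formalism: the class MentalSpace, the functions generic_space and cross_space_mappings, and the feature labels (including the "elongated" label on the arm) are hypothetical names chosen only for this example. It assumes, simplistically, that perceived similarity amounts to sharing at least one feature label.

```python
from dataclasses import dataclass


@dataclass
class MentalSpace:
    """A partially structured mental model: named elements with feature labels."""
    name: str
    elements: dict  # element name -> set of feature labels (labels are hypothetical)


def generic_space(space_a, space_b):
    """Collect the features shared by each pair of elements across two spaces."""
    shared = {}
    for elem_a, feats_a in space_a.elements.items():
        for elem_b, feats_b in space_b.elements.items():
            common = feats_a & feats_b
            if common:  # only pairs with some perceived similarity are kept
                shared[(elem_a, elem_b)] = common
    return shared


def cross_space_mappings(space_a, space_b):
    """Map an element of space_a onto an element of space_b when they share features."""
    return {elem_a: elem_b for (elem_a, elem_b) in generic_space(space_a, space_b)}


# Input 1: Real Space, containing the speaker's fist and arm.
real_space = MentalSpace("Real Space", {
    "fist": {"outer curve", "inner solidity"},
    "arm": {"elongated"},
})
# Input 2: the mental space in which the ball is evoked.
ball_space = MentalSpace("ball space", {
    "ball": {"outer curve", "inner solidity"},
})

print(cross_space_mappings(real_space, ball_space))  # {'fist': 'ball'}
```

In this sketch an element with no shared features simply receives no counterpart, so nothing has to be said about it explicitly.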

One of the benefits of this model is that it provides an explanation for which things become part of the mapping: in Real Space there will be an arm extending out of the hand, but this is not mapped into the blend. The arm has no analogue in this particular mental model of the ball and therefore is not extracted as a shared feature. The convention by which mental spaces are represented as circles is, of course, too bulky for transcription of the hundreds of gestures which can occur in even a brief sample. Even ignoring this aspect of the model, it may nonetheless seem an unwieldy system for the expression of such simple relationships. In reality, however, few gestures have such simple correspondences, and it is in describing the mappings between elements in complex discourse that this framework can be seen to be essential.

For instance, rather than representing a moving entity such as a ball, a hand may be gesturing instead "about" the ongoing speech interaction: here a "rolling" hand (circular motion in one location) might not correlate with speech about a rolling object, but mark the speech segment as merely a backgrounded part of a longer, ongoing discourse. Iconic mappings alone do not seem sufficient to set up this correspondence: speech interaction does not literally "move" from one place to another, "progress" or "occupy a space" in a larger physical progression. We may "reach" the end of a discussion without having physically moved at all. Taub (2001) lays out for ASL signs the way in which a second layer of mappings, metaphorical mappings, can be layered onto iconic mappings. In this case, we might say that the rolling hand literally represents a "rolling in place" motion, which is not moving forward in itself, although the rolling action could be a moment of delay in the traversal of a path which includes the space filled out by the rolling motion. Metaphorically, discourse structure is mapped onto traversal of a path from source to goal, and background material which does not in itself advance the argument structure is mapped onto delays along that path. If we take AN ARGUMENT IS A JOURNEY (Lakoff & Johnson, 1980) as the basic metaphor behind this example, the chart below (following the convention of Taub, 2001) elaborates on some of the key double mappings:

Figure 2

These same mappings can also be represented through the formalism of the blending model, as below. In this case, however, there are two blends. The first involves the iconic mappings which permit us to perceive the hand as representing some other moving entity, that is, the mappings from Real Space, in which the hand performs some action, to the mental model of a moving object, which here is the source domain of the metaphor. As with the previous example, the perceived similarity between the two inputs (such as the fact that there exists some entity, that it is in motion, that the motion is of a particular sort, and so on) is captured by the generic space. What results is a blend, represented here as blend A for convenience, in which the hand is perceived as a moving entity.

The second blend (blend B) incorporates the mappings between the source and target domains of the metaphor. Both act as inputs to a further blended mental space in which the movement of an object is construed as a discourse process. Thus, the moving object maps onto what we have called a discourse entity, that is, an idea or a proposition metaphorically conceived of as an object.

Figure 3


These blends can then be combined in order to represent the construal of the moving hand as a discourse entity. In this way real entities (the hands) can give us information about something as abstract as discourse.

Figure 4
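The combination of blend A and blend B can be thought of, very schematically, as the composition of two mappings. The Python sketch below is offered only as an illustration under that assumption; the helper compose and the dictionary keys are hypothetical labels paraphrasing the mappings discussed above, and the blending model itself makes no commitment to this representation.

```python
def compose(first, second):
    """Chain two mappings: follow `first`, then `second`, where a link exists."""
    return {src: second[mid] for src, mid in first.items() if mid in second}


# Blend A (iconic): Real Space elements -> source domain of the metaphor.
iconic = {
    "hand": "moving object",
    "circular motion in one location": "delay along the path",
}

# Blend B (metaphorical): source domain -> target domain (discourse).
metaphorical = {
    "moving object": "discourse entity",
    "delay along the path": "background material",
    "traversal of a path from source to goal": "discourse structure",
}

# Combined construal: Real Space elements -> discourse.
combined = compose(iconic, metaphorical)
print(combined)
# {'hand': 'discourse entity', 'circular motion in one location': 'background material'}
```

Only those source-domain elements which have an iconic counterpart in Real Space appear in the combined mapping; the path traversal as a whole, which is not itself depicted by the rolling hand, drops out.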

One clear benefit of this approach is the fact that it offers an elaborated treatment of the role of iconicity in representational gestures. While the classic McNeill gesture categories (McNeill, 1992) were never intended to be mutually exclusive, providing mappings forces the transcriber to be explicit about his or her judgements. That is, if one decides that a gesture is deictic, being obliged to transcribe mappings may prevent one from ignoring the fact that the referent is some abstract, metaphoric entity.

Blending is proposed to be a cognitive operation by which partial structure from input spaces is combined to yield emergent structure; one might therefore question the extent to which this static and metaphorical abstraction has anything to do with actual cognitive structure. That is, do our brains perform an operation anything like this in interpreting gestures or in producing a speech-gesture package? At this point any comments on this topic are going to be entirely speculative, but it does seem important at least to be explicit about the formalism, e.g. about the fact that the lines represent coactivation/binding.


2. Conceptual integration and pointing gestures.

Given that the mental spaces framework was developed to account for indirect reference, one might expect it to be efficacious in representing deictic gestures, which ostensibly pick out some referent. As a further illustration of the sort of complexity which demonstrates this framework’s usefulness, consider the following example, taken from the work of Haviland (2000). A Tzotzil speaker says, "If one gets to Palenque, the ruins are located this way," and looks and gestures southwards. The speaker and Haviland are located in a village south of Palenque, so the ruins are north of the actual gesture space. But they are south of the "transposed origo" of Palenque. A simple set of mental space mappings will express the use of the speaker's location to represent a location in Palenque - a location, we must note, which preserves the cardinal directional alignment of the local space.

Speaker's location L in Nabenchauk = a location L' in Palenque
Direction of speaker's gesture from L = direction of deictic reference from L'
The speaker goes on to say "like the (Nabenchauk) sinkhole," a local geographical feature to the south of Nabenchauk. Here we add another mapping:
Location of sinkhole = location of Palenque ruins

Figure 5
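The three mappings above can also be sketched in code. The Python fragment below is a minimal illustration, not a transcription proposal: the class Space, the function referents_of_point, and the representation of direction as a bare cardinal label are all assumptions made for the example. It records, for each mental space, which referents lie in which direction from that space's origo, and then asks what a single southward point could pick out.

```python
from dataclasses import dataclass


@dataclass
class Space:
    """A mental space with an origo and referents located by cardinal direction."""
    name: str
    origo: str
    bearings: dict  # referent -> cardinal direction from the origo


def referents_of_point(direction, spaces):
    """List every (space, referent) pair that a point in `direction` could pick out."""
    return [
        (space.name, referent)
        for space in spaces
        for referent, bearing in space.bearings.items()
        if bearing == direction
    ]


# Real Space: the speaker stands at L in Nabenchauk.
real_space = Space("Real Space", "L (Nabenchauk)", {
    "sinkhole": "south",
    "Palenque ruins": "north",
})
# Transposed space: L is mapped onto L' in Palenque, cardinal directions preserved.
palenque_space = Space("Palenque space", "L' (Palenque)", {
    "Palenque ruins": "south",
})

print(referents_of_point("south", [real_space, palenque_space]))
# [('Real Space', 'sinkhole'), ('Palenque space', 'Palenque ruins')]
```

The physical form of the point is described once, as a direction; the two candidate referents arise only from the spaces against which it is interpreted.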

Once these mappings are set up, it becomes clear that, since the mappings are emphatically not identities, a deictic gesture southwards is ambiguous: it may point towards the Palenque ruins (quite far to the speaker's north) or towards a nearby sinkhole (actually to his south).
Crucially, it now becomes simpler to describe the gestures themselves. The speaker really is making a 'simple point', correct though Haviland is in complicating our understanding of simple points. What is complex is the framework of space mappings against which the point is made. Language does this all the time: Fauconnier (1994 [1985], 1997) began constructing his theory of mental spaces precisely because of the observation that linguistic reference is pervasively achieved by such "indirect" means - speakers use a description or label appropriate to one referent, in order to refer to some counterpart of that referent in another mental space. For example, movie reviewers regularly say things like "At the end of the movie, Clark Gable leaves Vivien Leigh," meaning simply that the character Rhett Butler (played by Gable) leaves his wife Scarlett (played by Leigh). Needless to say, the phrase "Clark Gable leaves Vivien Leigh" could also refer to the end of an affair between the two actors themselves (or between two characters, in a movie about Gable and Leigh). Only context allows us to know that it is the characters played by Gable and Leigh in Gone with the Wind who are the ultimate referents of the noun phrases "Gable" and "Leigh".

Perhaps even more helpful in showing us the way to map gestures to meanings is recent work on reference in American Sign Language. ASL is a visual-gestural language, not a system of gestures accompanying language. But like co-linguistic gesture, it makes use of the physical space surrounding the gesturer to refer in a systematic manner to a multiplicity of other spaces, physical and abstract. Karen van Hoek (1996) and Scott Liddell (2000) have described ASL signers' mappings of Real Space onto past and imagined situations. Liddell (2000) describes, for example, a signer describing a Garfield cartoon who "becomes" Garfield; the signer's surroundings are then mapped onto Garfield's fictional surroundings (a virtual TV now exists in front of the signer, based on the presence of a TV which Garfield is watching in the cartoon). Within these fictional surroundings, the "cat" can point at the "TV", talk with "Jim", and so on. Liddell insightfully analyzes this sequence as "blending" the Real Space and the Garfield space, using Fauconnier and Turner's (2002) mechanisms of mental space blending.

What is meant in essence here by blending is that in some crucial way the signer, while remaining himself (a person we all know to be distinct from Garfield), "becomes" Garfield via the mapping between the two spaces. This means not only a simple referential mapping (location of signer = location of Garfield), but also continuous active construal of the signer's movements and actions and surroundings in terms of Garfield's movements, actions and surroundings. We don't stop construing the signer's gestures as his own, either; narratorial linguistic forms are produced sometimes simultaneously with body stances that are intended to portray Garfield's bodily behavior.

Once again, as with the Gable and Leigh example, reference is "simple", once you know what is construed as present to be referred to. But a physical pointing gesture by the "Garfield" signer is potentially ambiguous between a point at (for example) Garfield's TV and the desk in front of the signer. That ambiguity is not particular to the gesture in question, but follows automatically from the mental space construction underlying the exchange.


3. Discourse cohesion and mental space mappings.

One of the cases in which the mental spaces framework is most clearly useful is in maintaining coherence over the course of a complex discourse. The following example comes from an academic lecture. The speaker is talking about dynamic programming and draws a parallel with reinforcement learning:
Okay...what happens is that... as you...the way reinforcement learning works is is…you can think of reinforcement learning as dynamic programming done badly ... okay...the way reinforcement <Xworking it doesX> 1 [you just wander around idly ... okay...and then] 2 [when you bump into the reward you know what to do] from the 3 [penultimate state]. You're not gonna have a policy, you're clueless, but the penultimate state, you know what to do...and you know how to <Xdiscount itX> right? 4 [Then you're wandering around idly a-again] you might 5 [bump into that penultimate state] the state before that knows how to do and you know how to <XdiscountX>it, right?
During 1 (you just wander around idly ... okay...and then), the speaker sets up an axis moving outward from his body. This is the axis along which the process of reinforcement learning proceeds, a process which the speaker describes as wandering around idly (the analogy being to a rat in a maze). This is depicted by moving the left hand (in a spread B hand) in a series of small circles outward until the arm is fully extended. The right hand, also in a spread B, remains a few inches from the chest. The right hand returns to the chest at "do". It is also worth noting that the imagery for this gesture is in place before the speaker has determined how to encode it in language. That is, the positions for the reward and the penultimate state are already established before they have been introduced in speech. This may well be a feature of this sort of academic talk, which is relatively informal while still being well-rehearsed. This example is particularly difficult to analyze using traditional systems because of the mixture of language from the programming domain and language from the learning domain. That is, there is no "penultimate state" in a maze, although there is a location in space before the reward's location. Nor is there any "wanderer" or any notion of "idleness" (which presupposes volition) in the programming domain; instead there is something more like a circuit's particular electrical state at a given moment (that is, a given node may be active at a certain time, which can be construed as an entity's location). The conflation of these two domains can be unpacked, and some of the mappings are as follows.

Figure 6

Figure 7

A stage near the end of this gesture can be seen below.

Figure 8

At (2) the speaker moves his right hand away from his body until the arm is fully extended again, but this time in a single smooth motion. This motion acts to situate the reward at the end of the space defined by the arm's motion. The mappings here are similar, but the space between the speaker’s right and left hands now represents not the whole process of wandering, but the step between the penultimate state and the reward. The motion is smooth rather than circular because of this change in mapping. As mentioned above, one of the interesting features of this example is the fact that the penultimate state comes from the programming domain (there is no penultimate state in the maze) while the reward comes from the maze domain (one does not reward a program). Here gesture continues to be mainly about the program (although at this level of abstraction there is no real difference, so it may not be a defensible claim), but the speech combines both domains. The final frame of gesture 2 is below. It is very similar in appearance to the end of the first sequence, but the mappings are different.

Figure 9

Figure 10

Figure 11
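The contrast between gestures 1 and 2 suggests what a transcription record would need to contain. The Python sketch below is purely illustrative: the class GestureRecord, its field names, and the coarse form label "one arm fully extended" are hypothetical simplifications, not a proposed coding scheme. It shows two records whose final physical form is identical while their mappings differ.

```python
from dataclasses import dataclass


@dataclass
class GestureRecord:
    """One transcribed gesture: coarse physical form plus conceptual mappings."""
    label: str
    final_form: str   # simplified description of the final frame
    motion: str       # simplified description of the movement
    mappings: dict    # Real Space element -> element in another mental space


gesture_1 = GestureRecord(
    label="1",
    final_form="one arm fully extended",
    motion="series of small circles outward",
    mappings={"space between the hands": "whole process of wandering"},
)
gesture_2 = GestureRecord(
    label="2",
    final_form="one arm fully extended",
    motion="single smooth motion",
    mappings={"space between the hands": "step between penultimate state and reward"},
)


def same_form_different_meaning(a, b):
    """True when two gestures end in the same form but carry different mappings."""
    return a.final_form == b.final_form and a.mappings != b.mappings


print(same_form_different_meaning(gesture_1, gesture_2))  # True
```

A record limited to the final_form field would treat the two gestures as identical; it is the mappings field which distinguishes them.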

Next (3) the speaker uses both hands to create a space in front of his body. Ignoring for the moment the relation between the programming domain and the reinforcement learning domain, some of the mappings would look like this:

Figure 12

Figure 13

Figure 14

In the fourth gesture the speaker says "wandering around idly" for the second time while he waves his arms and assumes a bewildered facial expression. This time, however, he assumes character viewpoint, or creates a blend in which he "becomes" the wanderer, in the same sense as the signer becomes Garfield in the example above. In this case his Real Space body maps onto the body of the wanderer who has been established in one of the mental space inputs to this blend. The use of the phrase "wandering around idly" cues us to introduce an actual wanderer into one of our mental spaces, who can then be profiled at another level of granularity. In other words, when watching this gesture we don't wonder why the speaker is suddenly flailing about: there is an already existing notion of an entity wandering which we can access when the mappings are appropriate.

Figure 15

While this scenario is quite complex already, the speaker then complicates it further by placing the penultimate state away from his body, with his arms extended (5, pictured below). Again we see a difference in the elements of the mappings which are being profiled. In the original gestural representation, the penultimate state is located next to the body: when it is first introduced into the discourse it is viewed from the perspective of the wanderer who has suddenly stumbled upon it (3). That is, in the first case we get a combination of observer viewpoint, in the motion of the right hand (wandering is depicted as an abstract process) and in the left hand's functioning as a place holder (the penultimate state), and character viewpoint in gesture 3, where the placement of the penultimate state is next to the body. This constitutes "zooming in" on the mental space in which the penultimate state is a physical location in a maze. In the second case, however, we get character viewpoint (or what we would call a blend) between the wanderer and the speaker (wandering depicted as wandering). The penultimate state is then represented with observer viewpoint: the state is where it should be if the speaker's body is a reference point for the axis of motion, not where it should be if the speaker is the wanderer.

Some problems solved:
1. Space: what is the place of spaces which are set up to represent concepts or things? Is a notation for space creation needed? This should fall out of a system of mappings/mental space transcription.
2. Circular motion: where is the stroke in a circular motion? Is it a special type of "direction"? How should it be treated, and what does it mean?
3. Gestures (made by the same person) which have the same physical form but different meanings, i.e. where the mappings are quite different, and where doing explicit mental space mappings helps to make this difference readily apparent.
4. How do physical units (beats, strokes, …) map onto meaning units?


4. Conclusion

The more an analyst examines gesture, the more surprising it becomes that people effortlessly produce and process these complex sets of conceptual mappings between spaces. The same, of course, has been said of language - and special built-in hardware has been postulated to account for such amazing abilities. But most linguists, including those who see language as a modular system, acknowledge that linguistic signs are no more than the tip of the iceberg, prompts to the evocation of a meaning which is very far from being determined by a particular linguistic sequence. This is even more strongly true of gesture. The vast indeterminacy of iconic mappings means that one particular sequence of hand configurations could be iconic for many different possible literal physical referents; the additional possibilities brought up by cross-space mappings ensure that the referential possibilities are multiplied again. In general, access to accompanying linguistic production ensures that the mapping possibilities are appropriately constrained. And indeed, a gesture researcher who is watching a video in an unknown language can usually "understand" some of the gestures if told that the videotaped speaker is talking about, for instance, the past and the future, or the space in back of her and the space in front of her. The researcher would not, however, be able to figure out the topic on her own, or might well guess wrong, picking a literal spatial topic when a metaphorical one was actually in play.

We don't have a universally sufficient "-etic" gesture transcription system; that is, even less than for language is it possible to imagine a transcription which could substitute for the video as "data" for a researcher with other purposes than the transcriber's. Where transcribers share some purposes, they may well be able to share choices of which physical aspects of gesture should be transcribed. But they all need to transcribe meaning as well. While the mental spaces framework does not offer a quick and easy way of transcribing what is for some the most fascinating part of the enterprise, it does offer a convenient way of helping the analyst to organize her thoughts for the process.


References:

Fauconnier, G. (1994 [1985]). Mental Spaces. Cambridge: Cambridge University Press.
Fauconnier, G. & Sweetser, E. (Eds.). (1996). Spaces, Worlds, and Grammar. Chicago: University of Chicago Press.
Fauconnier, G. (1997). Mappings in Thought and Language. Cambridge: Cambridge University Press.
Fauconnier, G. & Turner, M. (2002). The Way We Think: Conceptual Blending and the Mind’s Hidden Complexities. New York: Basic Books.
Haviland, J. B. (2000). Pointing, gesture spaces, and mental maps. In D. McNeill (Ed.), Language and Gesture (pp. 13-46). Cambridge: Cambridge University Press.
Lakoff, G. & Johnson, M. (1980). Metaphors We Live By. Chicago: University of Chicago Press.
Lakoff, G. (1986). Women, Fire and Dangerous Things. Chicago: University of Chicago Press.
Lakoff, G. & Johnson, M. (1999). Philosophy in the Flesh: The Embodied Mind and Its Challenge to Western Thought. New York: Basic Books.
Liddell, S. K. (1998). Grounded blends, gestures, and conceptual shifts. Cognitive Linguistics, 9, 283-314.
Liddell, S. K. (2000). Blended spaces and deixis in sign language discourse. In D. McNeill (Ed.), Language and Gesture (pp. 331-357). Cambridge: Cambridge University Press.
Mandel, M. (1977). Iconic Devices in American Sign Language. In L. A. Friedman (Ed.), On the Other Hand (pp. 57-107). London: Academic Press.
Taub, S. F. (2001). Language From the Body: Iconicity and Metaphor in American Sign Language. Cambridge: Cambridge University Press.

Different DEGREES of indeterminacy - LeBaron and Streeck's comment that the "driving" hand gesture is recognizable crosslinguistically because it's a truly unique motor routine - UNLIKE up-down motion of hand,…
Topographical preservation with metaphor