Representing Meaning: Morphemic-level analysis within a holistic approach to gesture transcription.
Fey Parrill and Eve Sweetser
University of Chicago/University of California, Berkeley
Keywords: transcription, mental spaces, conceptual integration, meaning
1. Gestural meaning and decomposition.
Researchers from a number of different areas have begun to recognize the relevance
of the study of gesture to their disciplines. With this increase in interest
and desire for data exchange, the difficulty of designing a transcription system
that meets the requirements of disparate methodologies becomes apparent. As with
signed languages, the spatial nature of gesture substantially increases the number
of features that must be represented, but a speech-gesture transcription system
must address both the inadequacies of speech transcription and the complexities
of transcribing motion. Transcription is, of course, intended to represent a
maximum amount of information as economically as possible. This is generally
done by devising mutually exclusive categories of features. Decomposing gesture
into a series of features, however, is problematic. Without suggesting that
economical representation should not be sought, we would argue that in some
cases this approach is counterproductive. The most obvious such case concerns
the mappings between the form and motion of the gesture and the conceptual
structure that underlies it, or what may be called the meaning of a gesture.
Meaning is here assumed to be a cognitive construct built up during production
or interpretation (Lakoff, 1986; Lakoff & Johnson, 1980, 1999). One central
mechanism employed in the construction of gestural meaning is iconic mapping.
When we perceive a hand in a certain configuration and conclude that the hand
represents a physical entity being talked about in the accompanying discourse,
we are engaging in a cognitive process whereby features of the hand's configuration
map onto features in our mental model of a particular referent. This mapping
results from the preservation of structure: perceived resemblance comes from
the abstraction of correspondences between shared features (Taub, 2001). For
example, when speech about a ball is accompanied by a gesture with the speaker's
hand in the shape of a fist, features of the ball (namely the outer curve and
inner solidity) are preserved in the hand shape. Our knowledge of the shape
of a (prototypical) ball enables us to extract these features in order to fill
out a mental model whereby the hand can represent the ball, the motion of the
hand can be construed as the motion of the ball, and so on. Sarah Taub (2001)
gives this topic a detailed treatment in her discussion of conceptual metaphor
and iconicity in ASL. She begins by expanding Mandel's (1977) catalogue of sign-language
iconicity in order to describe the ways in which the correspondences we perceive
between the sign and its referent enable us to fill out a schematic mental model
of the scene. Some of these constraints on iconicity are as follows.
In cases where the hands depict a physical entity, the shape of the hands may
map onto the shape of a physical referent (shape-for-shape mapping). Similarly,
the motion of the hands may map onto the motion of the referent (motion-for-motion
mapping). Motion can also be used to represent shape (path-for-shape), as when
the path of the hand traces the shape of a referent. A location in physical
space can represent a location in some abstract or metaphorical space, and so
on. Yet identity of content between language and gesture is always contextually
embedded. When a person talking about the sun makes a gesture with his or her
hand in a fist, how do we know that this fist represents the sun? And in what
respect is this the same phenomenon as when a fist striking a palm represents
complete understanding? We take the perspective that conceptual mappings
are as critical to the enterprise of transcription as information about the
physical form. This paper attempts to address some of the questions that confront
researchers working from this perspective. In particular, what sort of system
can be created that does not attempt to reduce meaning categories to concise
lists of mutually exclusive, maximally distinct features? We propose that this
enterprise can profit from frameworks within the cognitive linguistics
tradition, in particular the mental spaces framework (Fauconnier, 1994 [1985],
1997; Fauconnier & Sweetser, 1996) and that of blending or conceptual integration
(Fauconnier & Turner, 2002). The most succinct description of iconic, metaphorical,
and deictic mappings, we claim, cannot be reduced to the physical features of
the produced gestures. As with most descriptive systems, before giving a succinct
description of something it is necessary to know (in principle, at least!) how
to describe it as fully as possible. And the meaningful interpretation of gestures
can be expressed concisely only against a background of complex cross-space
mappings. To some extent, gesture is about representing entities in mental spaces.
These entities are set up and manipulated through rapid transitions between
discourse spaces and iconic representations of entities in those spaces. Because
the mental spaces framework is a spatial representation of abstract information,
it is an invaluable resource for making the mappings between these entities and
their referents explicit. A further advantage of the framework is that it makes
no claims about the "phonetic" transcription at all; any
kind of medium of meaningful expression could be used to represent content against
a contextual background describable in terms of mental space mappings.
In elaborating this model, let us return to the case in which the speaker is
using a hand in the shape of a fist to represent a ball. In the blending model,
different mental spaces, or partially structured mental models (Fauconnier,
1994 [1985], 1997), act as inputs to the blended space. The mental spaces framework
was originally developed to account for lexical ambiguities in reference, but when
discussing gesture one part of the picture is what Scott Liddell calls Real Space,
or the mental representation of the physical elements of one's immediate physical
environment (Liddell, 2000, p. 342). That is, where gesture is concerned, the physical
space becomes a resource for meaning construction. Liddell points out that a
speaker's use of his or her physical space or surroundings to represent
some entity in the discourse creates a blend (Liddell calls this a grounded
blend, because it involves Real Space). Real Space acts as one input; a second
input is the mental space in which the ball is evoked (with its corresponding
properties of roundness, internal solidity, etc.). An important element of the
blend is the generic space, which contains shared features (perceived similarities)
that themselves permit cross-space mappings. This example can be depicted as
follows, using the conventions of the blending model:
Figure 1
One of the benefits of this model is that it explains which things become
part of the mapping: in Real Space there will be an arm to which the hand is
attached, but the arm is not mapped into the blend. The arm has no analogue
in this particular mental model of the ball and therefore is not extracted as a
shared feature. The convention by which mental spaces are represented as circles
is, of course, too bulky for transcription of the hundreds of gestures which
can occur in even a brief sample. Even setting this aspect of the model aside, it
may seem an unwieldy system for the expression of such simple relationships.
In reality, however, few gestures have such simple correspondences, and it is
in describing the mappings between elements in complex discourse that this framework
proves essential.
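To make the selective projection concrete, here is a minimal Python sketch of how a transcriber might record the ball example as data rather than as circle diagrams. It assumes (our assumption, not an existing standard) that each mental space can be listed as elements with feature sets, that the transcriber supplies the cross-space counterpart links, and that an element enters the blend only if it has a counterpart with which it shares at least one feature. The names `Space`, `Blend`, `generic_space`, and `projected` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Space:
    """A mental space: named elements, each with a set of perceived features."""
    name: str
    elements: dict[str, set[str]] = field(default_factory=dict)


@dataclass
class Blend:
    """Hypothetical transcription record for a two-input (grounded) blend."""
    real_space: Space
    other_input: Space
    # Transcriber-supplied counterpart links: Real Space element -> other input element.
    counterparts: dict[str, str]

    def generic_space(self) -> dict[str, set[str]]:
        """Shared features (perceived similarities) for each counterpart pair."""
        return {
            f"{a}/{b}": self.real_space.elements[a] & self.other_input.elements[b]
            for a, b in self.counterparts.items()
        }

    def projected(self) -> dict[str, str]:
        """Real Space elements that enter the blend, with what they represent.

        Elements without a counterpart, or sharing no feature with it, stay out.
        """
        generic = self.generic_space()
        return {a: b for a, b in self.counterparts.items() if generic[f"{a}/{b}"]}


real = Space("Real Space", {
    "fist": {"curved outer surface", "solid interior", "in motion"},
    "arm": {"elongated", "attached to the body"},
})
ball = Space("Ball", {
    "ball": {"curved outer surface", "solid interior", "in motion"},
})

fist_as_ball = Blend(real, ball, counterparts={"fist": "ball"})
print(fist_as_ball.generic_space())
print(fist_as_ball.projected())  # the arm has no counterpart, so it is not projected
```

Listing counterpart links explicitly is the point of this design choice: the connecting lines of the diagram become entries that a second transcriber can inspect and dispute.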
For instance, rather than representing a moving entity such as a ball, a hand
may be gesturing instead "about" the ongoing speech interaction: here
a "rolling" hand (circular motion in one location) might not correlate
with speech about a rolling object, but mark the speech segment as merely a
backgrounded part of a longer, ongoing discourse. Iconic mappings alone do not
seem sufficient to set up this correspondence: speech interaction does not literally
"move" from one place to another, "progress" or "occupy
a space" in a larger physical progression. We may "reach" the
end of a discussion without having physically moved at all. Taub (2001) lays
out for ASL signs the way in which a second layer of mappings, metaphorical
mappings, can be layered onto iconic mappings. In this case, we might say that
the rolling hand literally represents a "rolling in place" motion,
which is not moving forward in itself, although the rolling action could be
a moment of delay in the traversal of a path which includes the space filled
out by the rolling motion. Metaphorically, discourse structure is mapped onto
traversal of a path from source to goal, and background material which does
not in itself advance the argument structure is mapped onto delays along that
path. If we take AN ARGUMENT IS A JOURNEY (Lakoff & Johnson, 1980) as the
basic metaphor behind this example, the chart below (following the convention
of Taub, 2001) elaborates on some of the key double mappings:
Figure 2
These same mappings can also be represented through the formalism of the blending
model, as below. In this case, however, there are two blends. The first involves
the iconic mappings that permit us to perceive the hand as representing some
other moving entity, that is, the mappings from Real Space, in which the hand
performs some action, to the mental model of a moving object, which here is
the source domain of the metaphor. As with the previous example, the perceived
similarity between the two inputs (such as the fact that there exists some entity,
that it is in motion, that the motion is of a particular sort, and so on) is
captured by the generic space. What results is a blend, labeled here as
blend A for convenience, in which the hand is perceived as a moving entity.
The second blend (blend B) incorporates the mappings between the source and
target domains of the metaphor. Source and target domains act as inputs to a
further blended mental space in which the movement of an object is construed
as a discourse process. Thus, the moving object maps onto what we have called
a discourse entity, that is, an idea or a proposition metaphorically conceived
of as an object.
Figure 3
These blends can then be combined in order to represent the construal of the
moving hand as a discourse entity. In this way real entities (the hands) can
give us information about something as abstract as discourse.
Figure 4
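One way to see why the two blends can be combined is to treat each set of correspondences, very roughly, as a mapping and to chain the two, in the spirit of Taub's double-mapping charts. The Python sketch below assumes that each mapping can be written as a dictionary from elements of one space to elements of another; the labels paraphrase the rolling-hand discussion above and are illustrative, not a transcription of Figures 2-4. `compose` is a hypothetical helper.

```python
# Iconic mapping: Real Space -> source domain (a moving object on a path).
iconic = {
    "hand": "moving object",
    "circular motion in one location": "rolling in place (a delay on the path)",
}

# Metaphorical mapping, AN ARGUMENT IS A JOURNEY: source domain -> target domain.
metaphorical = {
    "moving object": "discourse entity (an idea or proposition)",
    "rolling in place (a delay on the path)": "background material that does not advance the argument",
    "traversal of a path from source to goal": "discourse structure",
}


def compose(first: dict[str, str], second: dict[str, str]) -> dict[str, str]:
    """Chain two cross-space mappings.

    Only elements whose image under the first mapping also has a counterpart
    in the second survive, a crude stand-in for selective projection.
    """
    return {a: second[b] for a, b in first.items() if b in second}


hand_as_discourse = compose(iconic, metaphorical)
for articulator, meaning in hand_as_discourse.items():
    print(f"{articulator} -> {meaning}")
```

In this sketch the source-to-goal path is not depicted by any articulator, so it drops out of the composed mapping even though it still structures the target space.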
One clear benefit of this approach is the fact that it offers an elaborated
treatment of the role of iconicity in representational gestures. While the classic
McNeill gesture categories (McNeill, 1992) were never intended to be mutually
exclusive, providing mappings forces the transcriber to be explicit about his
or her judgements. That is, if one decides that a gesture is deictic, being
obliged to transcribe mappings may prevent one from ignoring the fact that the
referent is some abstract, metaphoric entity.
Blending is proposed to be a cognitive operation by which partial structure
from input spaces is combined to yield emergent structure. One might therefore
question the extent to which this static and metaphorical abstraction has anything
to do with actual cognitive structure. That is, do our brains perform an operation
anything like this in interpreting gestures or in producing a speech-gesture
package? At this point any comments on this topic are going to be entirely speculative,
but it does seem important at least to be explicit about the formalism: for example,
the lines connecting elements across spaces represent coactivation or binding.
2. Conceptual integration and pointing gestures.
Given that the mental spaces framework was developed to account for indirect
reference, one might expect it to be effective in representing deictic gestures,
which ostensibly pick out some referent. As a further illustration of the sort
of complexity that demonstrates this framework's usefulness, consider
the following example, taken from the work of Haviland (2000). A Tzotzil speaker
says, "If one gets to Palenque, the ruins are located this way," and
looks and gestures southwards. The speaker and Haviland are located in a village
south of Palenque, so the ruins are north of the actual gesture space. But they
are south of the "transposed origo" of Palenque. A simple set of mental
space mappings will express the use of the speaker's location to represent a
location in Palenque; this location, we must note, preserves the cardinal
directional alignment of the local space.
Speaker's location L in Nabenchauk = a location L' in Palenque
Direction of speaker's gesture from L = direction of deictic reference from L'
The speaker goes on to say "like the (Nabenchauk) sinkhole," a local
geographical feature to the south of Nabenchauk. Here we add another mapping:
Location of sinkhole = location of Palenque ruins
Figure 5
Once these mappings are set up, it becomes clear that, since the mappings are
emphatically not identities, a deictic gesture southwards is ambiguous between
pointing towards the Palenque ruins (quite far to the speaker's north) and
pointing to a nearby sinkhole (actually to his south).
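The ambiguity can be made explicit by listing, for each active space, what lies in the indicated direction from that space's origo. In the minimal Python sketch below, the `DeicticSpace` class, the `readings` function, and the space labels are hypothetical; the contents of each space come from the example as described above, and the transposition is modeled simply by giving the Palenque space its own origo with the same cardinal directions.

```python
from dataclasses import dataclass, field


@dataclass
class DeicticSpace:
    """A mental space with an origo and entities located by cardinal direction."""
    name: str
    origo: str
    by_direction: dict[str, str] = field(default_factory=dict)


# Real Space: the speaker and Haviland in Nabenchauk, south of Palenque.
real_space = DeicticSpace(
    "Real Space (Nabenchauk)", "speaker's location L",
    {"south": "Nabenchauk sinkhole", "north": "Palenque ruins"},
)

# Transposed space: L maps onto L' in Palenque with cardinal alignment preserved,
# so a point south from L is read as south from L'.
palenque_space = DeicticSpace(
    "Palenque (transposed origo)", "location L' in Palenque",
    {"south": "Palenque ruins"},
)

# Counterpart link added by "like the (Nabenchauk) sinkhole".
counterparts = {"Nabenchauk sinkhole": "Palenque ruins"}


def readings(direction: str, spaces: list[DeicticSpace]) -> dict[str, str]:
    """Candidate referents of a point in `direction`, one per space that has one."""
    return {s.name: s.by_direction[direction]
            for s in spaces if direction in s.by_direction}


southward = readings("south", [real_space, palenque_space])
print(southward)
# The two readings are counterparts, not the same entity in the same space.
print(counterparts[southward["Real Space (Nabenchauk)"]]
      == southward["Palenque (transposed origo)"])
```

The gesture record itself is identical under both readings; only the space configuration distinguishes them.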
Crucially, it now becomes simpler to describe the gestures themselves. The speaker
really is making a 'simple point', correct though Haviland is in complicating
our understanding of simple points. What is complex is the framework of space
mappings against which the point is made. Language does this all the time:
Fauconnier (1994 [1985], 1997) began constructing his theory of mental spaces
precisely because of the observation that linguistic reference is pervasively
achieved by such "indirect" means: speakers use a description or label
appropriate to one referent in order to refer to some counterpart of that
referent in another mental space.
For example, movie reviewers regularly say things like "At the end of the movie,
Clark Gable leaves Vivien Leigh," meaning simply that the character Rhett Butler
(played by Gable) leaves his wife Scarlett (played by Leigh). Needless to say,
the phrase "Clark Gable leaves Vivien Leigh" could also refer to the end of an
affair between the two actors themselves (or between two characters, in a movie
about Gable and Leigh). Only context allows us to know that it is the characters
played by Gable and Leigh in Gone with the Wind who are the ultimate referents
of the noun phrases "Clark Gable" and "Vivien Leigh".
Perhaps even more helpful in showing us the way to map gestures to meanings
is recent work on reference in American Sign Language. ASL is a visual-gestural
language, not a system of gestures accompanying language. But like co-linguistic
gesture, it makes use of the physical space surrounding the gesturer to refer
in a systematic manner to a multiplicity of other spaces, physical and abstract.
Karen van Hoek (1996) and Scott Liddell (2000) have described ASL signers'
mappings of Real Space onto past and imagined situations. Liddell (2000) describes,
for example, a signer describing a Garfield cartoon who "becomes"
Garfield; the signer's surroundings are then mapped onto Garfield's fictional
surroundings (a virtual TV now exists in front of the signer, based on the presence
of a TV which Garfield is watching in the cartoon). Within these fictional surroundings,
the "cat" can point at the "TV", talk with "Jim",
and so on. Liddell insightfully analyzes this sequence as "blending"
the Real Space and the Garfield space, using Fauconnier and Turner's (2000)
mechanisms of mental space blending.
What is meant here by blending is, in essence, that the signer,
while remaining himself (a person we all know to be distinct from Garfield),
"becomes" Garfield via the mapping between the two spaces. This means
not only a simple referential mapping (location of signer = location of Garfield)
but also continuous active construal of the signer's movements, actions, and
surroundings in terms of Garfield's movements, actions, and surroundings. Nor do
we stop construing the signer's gestures as his own; narratorial
linguistic forms are sometimes produced simultaneously with body stances that
are intended to portray Garfield's bodily behavior.
Once again, as with the Gable and Leigh example, reference is "simple",
once you know what is construed as present to be referred to. But a physical
pointing gesture by the "Garfield" signer is potentially ambiguous
between a point at (for example) Garfield's TV and the desk in front of the
signer. That ambiguity is not particular to the gesture in question, but follows
automatically from the mental space construction underlying the exchange.
3. Discourse cohesion and mental space mappings.
One of the cases in which the mental spaces framework is most clearly useful
is in maintaining coherence over the space of a complex discourse. The following
example comes from an academic lecture. The speaker is talking about dynamic
programming and draws a parallel with reinforcement learning:
Okay...what happens is that... as you...the way reinforcement learning works
is is
you can think of reinforcement learning as dynamic programming done
badly ... okay...the way reinforcement <Xworking it doesX> 1 [you just
wander around idly ... okay...and then] 2 [when you bump into the reward you
know what to do] from the 3 [penultimate state]. You're not gonna have a policy,
you're clueless, but the penultimate state, you know what to do...and you know
how to <Xdiscount itX> right? 4 [Then you're wandering around idly a-again]
you might 5 [bump into that penultimate state] the state before that knows how
to do and you know how to <XdiscountX>it, right?
During 1 ("you just wander around idly ... okay...and then"), the speaker sets
up an axis moving outward from his body. This is the axis along which the process
of reinforcement learning proceeds, a process which the speaker describes as
"wandering around idly" (the analogy being to a rat in a maze). This is depicted
by moving the left hand (in a spread B handshape) in a series of small circles outward
until the arm is fully extended. The right hand, also in a spread B, remains
a few inches from the chest. The right hand returns to the chest at "do". This
example is particularly difficult to analyze using traditional systems because
of the mixture of language from the programming domain and language from the
learning domain. That is, there is no "penultimate state" in a maze,
although there is a location in space before the reward's location. Nor is there
any "wanderer" or any notion of "idleness" (which presupposes
volition) in the programming domain; instead there is something more like a
circuit's particular electrical state at a given moment (that is, a given node
may be active at a certain time, which can be construed as an entity's location).
The conflation of these two domains can be unpacked, and some of the mappings
are given below. It is also worth noting that the imagery for this gesture is
in place before the speaker has determined how to encode it in language. That
is, the positions for the reward and the penultimate state are already established
before they have been introduced in speech. This may well be a feature of this
sort of academic talk, which is relatively informal while still being well-rehearsed.
Figure 6
Figure 7
A stage near the end of this gesture can be seen below.
Figure 8
At (2) the speaker moves his right hand away from his body until the arm is
fully extended again, but this time in a single smooth motion. This motion acts
to situate the reward at the end of the space defined by the arm's motion. The
mappings here are similar, but the space between the speaker's right and
left hands now represents not the whole process of wandering, but the step between
the penultimate state and the reward. The motion is smooth rather than circular
because of this change in mapping. As mentioned above, one of the interesting
features of this example is the fact that the penultimate state comes from the
programming domain (there is no penultimate state in the maze) while the reward
comes from the maze domain (one does not reward a program). Here the gesture
continues to be mainly about the program (although at this level of abstraction
there is no real difference, so this may not be a defensible claim), but the
speech combines both domains. The final frame of gesture 2 is shown below. It is
very similar in appearance to the end of the first sequence, but the mappings
are different.
Figure 9
Figure 10
Figure 11
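The contrast between gestures 1 and 2 (near-identical final frames, different mappings) can be illustrated with two hypothetical transcription records. The Python sketch below assumes that a record pairs a coarse description of form with a set of (form element, represented element) mappings; the class names, field names, and form values are illustrative placeholders, not a proposed coding scheme.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Form:
    """Coarse physical description of a gesture's final frame (illustrative)."""
    handshape: str
    extension: str


@dataclass(frozen=True)
class GestureRecord:
    label: str
    form: Form
    # Each mapping pairs a form element with what it represents.
    mappings: frozenset[tuple[str, str]]


final_frame = Form(handshape="spread B, both hands", extension="one arm fully extended")
reward_at_end = ("end of the arm's path", "location of the reward")

gesture_1 = GestureRecord("1: wander around idly", final_frame, frozenset({
    ("space between the hands", "the whole process of wandering"),
    reward_at_end,
}))
gesture_2 = GestureRecord("2: bump into the reward", final_frame, frozenset({
    ("space between the hands", "the step from the penultimate state to the reward"),
    reward_at_end,
}))

# A form-only record cannot distinguish the two gestures...
print(gesture_1.form == gesture_2.form)
# ...but the mapping records can, and the difference can be listed directly.
for element, meaning in sorted(gesture_2.mappings - gesture_1.mappings):
    print(f"gesture 2 only: {element} -> {meaning}")
```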
Next, at (3), the speaker uses both hands to create a space in front of his body.
Ignoring for the moment the relation between the programming domain and the
reinforcement learning domain, some of the mappings would look like this:
Figure 12
Figure 13
Figure 14
In the fourth gesture the speaker says "wandering around idly" for the second
time while he waves his arms and assumes a bewildered facial expression. This
time, however, he assumes character viewpoint, or creates a blend in which he
"becomes" the wanderer, in the same sense as the signer becomes Garfield
in the example above. In this case his Real Space body maps onto the body of
the wanderer who has been established in one of the mental space inputs to this
blend. The phrase "wandering around idly" cues us to introduce an actual
wanderer into one of our mental spaces, who can then be profiled at another
level of granularity. In other words, when watching this gesture we don't wonder
why the speaker is suddenly flailing about: there is an already existing notion
of an entity wandering which we can access when the mappings are appropriate.
Figure 15
While this scenario is quite complex already, the speaker then complicates it
further by placing the penultimate state away from his body, with his arms extended
(5, pictured below). Again we see a difference in which elements of the mappings
are being profiled. In the original gestural representation, the penultimate
state is located next to the body: when it is first introduced into the discourse
it is viewed from the perspective of the wanderer who has suddenly stumbled
upon it (3). That is, in the first case we get a combination of observer viewpoint,
in the motion of the right hand (wandering is depicted as an abstract process)
and in the left hand's function as a placeholder (the penultimate state),
and character viewpoint in gesture 3, where the penultimate state is placed
next to the body. This constitutes "zooming in" on the mental
space in which the penultimate state is a physical location in a maze. In the
second case, however, we get character viewpoint (or what we would call a blend
between the wanderer and the speaker): wandering is depicted as wandering. The
penultimate state is then represented with observer viewpoint: the state is
where it should be if the speaker's body is a reference point for the axis of
motion, not where it should be if the speaker is the wanderer.
Some problems solved:
1. Space: what is the status of spaces that are set up to represent concepts
or things? Is a notation for space creation needed? This should fall out of a
system of mapping/mental space (MS) transcription.
2. Circular motion: where is the stroke in circular motion? Is it a special type
of "direction"? How should it be treated, and what does it mean?
3. Gestures (made by the same person) that have the same physical form but
different meanings, i.e. where the mappings are quite different, and where
doing explicit mental space mappings helps to make this difference readily apparent.
4. How do physical units (beats, strokes …) map onto meaning units?
4. Conclusion
The more an analyst examines gesture, the more surprising it becomes that people
effortlessly produce and process these complex sets of conceptual mappings between
spaces. The same, of course, has been said of language, and special built-in
hardware has been postulated to account for such amazing abilities. But most
linguists, including those who see language as a modular system, acknowledge
that linguistic signs are no more than the tip of the iceberg, prompts to the
evocation of a meaning which is very far from being determined by a particular
linguistic sequence. This is even more strongly true of gesture. The vast indeterminacy
of iconic mappings means that one particular sequence of hand configurations
could be iconic for many different possible literal physical referents; the
additional possibilities brought up by cross-space mappings ensure that the
referential possibilities are multiplied again. In general, access to accompanying
linguistic production ensures that the mapping possibilities are appropriately
constrained. And indeed, a gesture researcher who is watching a video in an
unknown language can usually "understand" some of the gestures if
told that the videotaped speaker is talking about, for instance, the past and
the future, or the space in back of her and the space in front of her. The researcher
would not, however, be able to figure out the topic on her own, or might well
guess wrong, picking a literal spatial topic when a metaphorical one was actually
in play.
We don't have a universally sufficient "-etic" gesture transcription
system; that is, even less than for language is it possible to imagine a transcription
which could substitute for the video as "data" for a researcher with
other purposes than the transcriber's. Where transcribers share some purposes,
they may well be able to share choices of which physical aspects of gesture
should be transcribed. But they all need to transcribe meaning as well. While
the mental spaces framework does not offer a quick and easy way of transcribing
what is for some the most fascinating part of the enterprise, it does offer
a convenient way of helping the analyst to organize her thoughts for the process.
References:
Fauconnier, G. (1994 [1985]). Mental Spaces. Cambridge: Cambridge University
Press.
Fauconnier, G. & Sweetser, E. (Eds.). (1996). Spaces, Worlds, and Grammar.
Chicago: University of Chicago Press.
Fauconnier, G. (1997). Mappings in Thought and Language. Cambridge: Cambridge
University Press.
Fauconnier, G. & Turner, M. (2002). The Way We Think: Conceptual Blending
and the Mind's Hidden Complexities. New York: Basic Books.
Haviland, J. B. (2000). Pointing, gesture spaces, and mental maps. In D. McNeill
(Ed.), Language and Gesture (pp. 13-46). Cambridge: Cambridge University Press.
Lakoff, G. & Johnson, M. (1980). Metaphors We Live By. Chicago: University
of Chicago Press.
Lakoff, G. (1986). Women, Fire and Dangerous Things. Chicago: University of
Chicago Press.
Lakoff, G. & Johnson, M. (1999). Philosophy in the Flesh: The Embodied Mind
and Its Challenge to Western Thought. New York: Basic Books.
Liddell, S. K. (1998). Grounded blends, gestures, and conceptual shifts. Cognitive
Linguistics, 9, 283-314.
Liddell, S. K. (2000). Blended spaces and deixis in sign language discourse.
In D. McNeill (Ed.), Language and Gesture (pp. 331-357). Cambridge: Cambridge
University Press.
Mandel, M. (1977). Iconic Devices in American Sign Language. In L. A. Friedman
(Ed.), On the Other Hand (pp. 57-107). London: Academic Press.
Taub, S. F. (2001). Language From the Body: Iconicity and Metaphor in American
Sign Language. Cambridge: Cambridge University Press.

Different DEGREES of indeterminacy: LeBaron and Streeck's comment that the
"driving" hand gesture is recognizable crosslinguistically because it is a
truly unique motor routine, UNLIKE up-down motion of the hand.
Topographical preservation with metaphor.