The Tocharian languages are important objects of study for two principal reasons: first, they offer a window into the spread of Buddhism beyond the better-studied confines of India and Southeast Asia; and second, they bring a new perspective to our understanding of the language of the Indo-Europeans and the migrations of subgroups of this population. The latter point concerns us in this section.
The relation of Tocharian to the other Indo-European languages is not an incidental fact of interest only to historical linguists; it also has useful pedagogical implications. This can be seen by imagining the ways in which one might approach the learning of Tocharian. On the one hand, since, on a synchronic level, the language seems to have been a relative isolate at the time of the Tocharian documents, we may study it in isolation. From that point of view we are free to organize our description of the language in any manner whatsoever. But if we do so, the language will appear a fairly complex jumble of facts, given its many nominal and verbal paradigms, and we will have little means of bringing to bear, in any coherent way, knowledge we may have gained from the study of other languages.
On the other hand, we may use knowledge of Proto-Indo-European (PIE) as an organizing principle. A reader may wonder how this could be of any benefit if he or she does not already know Proto-Indo-European. When a historical linguist speaks of "Proto-Indo-European", in the strictest sense the term simply denotes a collection of information about the common traits of a large swath of languages spread in archaic times from Iceland to India (and, as these lessons will help demonstrate, to China). From this viewpoint, some understanding of Proto-Indo-European shows how knowledge of individual languages (e.g. English, French, Spanish, German, Latin, Greek, Sanskrit) can be carried over systematically to the understanding of additional languages. Specifically, by stating in which respects Tocharian is like or unlike PIE, we may view Tocharian against the backdrop of a large number of well-studied languages. In this way, for example, a Buddhist scholar who knows Sanskrit may not only use his or her knowledge of Sanskrit Buddhist vocabulary in learning Tocharian vocabulary, but may also use Sanskrit grammatical structure as a point of comparison for understanding Tocharian grammatical structure. Certainly he or she will draw parallels between Tocharian and Sanskrit where they show obvious similarities, as anyone would when learning a new subject. But the PIE perspective adds more: one finds out how certain features of PIE still extant in Sanskrit, but not in Tocharian, may have developed through a logical sequence of steps into what one actually finds in Tocharian. Thus even the differences between Tocharian and languages like Sanskrit often become, in a certain sense, logical. Moreover, one finds that these similarities and differences are linked to a large number of other languages which the reader may go on to study in the future.
A specific example may help to illustrate the point. Consider the Tocharian verb AB āk- 'lead', whose present tense active forms we list below.
Note the seemingly haphazard alternation of the root-final consonant between -k- and -ś-. As given above, one must simply memorize which forms should have -ś-, with no apparent rhyme or reason. However, if we set beside the above forms the paradigm of the Greek root ag- 'lead', we find the following.
We see that Tocharian shows the original -k- precisely where the Greek ending shows an o-type vowel, and Tocharian shows -ś- precisely where the Greek ending shows an e-type vowel. The PIE perspective formalizes this relation, stipulating what features of PIE the Greek paradigm reflects, and how those PIE features propagated within Tocharian. If the reader does not happen to be familiar with Greek, this poses little problem; Indo-European languages are numerous, and Tocharian is related to all of them. By virtue of the fact that these lessons are in English, we can rest assured that the reader knows at least one (other) Indo-European language!
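The correspondence just described (k before an o-type vowel, ś before an e-type vowel) can be written down as a conditioning rule. The following Python sketch is a deliberately simplified model, not a derivation of the actual Tocharian forms: it assigns each present-tense person/number the PIE thematic vowel that the Greek endings reflect and palatalizes the root-final velar only before *e. The names `THEMATIC_VOWEL` and `root_final` are illustrative, and the sketch ignores the endings themselves and later Tocharian vowel changes.

```python
# Thematic vowel of each present-tense form, as reflected in Greek
# ágō, ágeis, ágei, ágomen, ágete, ágousi (1sg *-ō counted as o-type).
THEMATIC_VOWEL = {
    "1sg": "o", "2sg": "e", "3sg": "e",
    "1pl": "o", "2pl": "e", "3pl": "o",
}

def root_final(person_number: str) -> str:
    """Expected root-final consonant of Tocharian āk- 'lead' in a given form."""
    # A following PIE *e palatalizes the velar to ś; *o leaves k intact.
    return "ś" if THEMATIC_VOWEL[person_number] == "e" else "k"

for pn in THEMATIC_VOWEL:
    print(pn, "ā" + root_final(pn) + "-")
```

The point of the sketch is that a single environmental condition replaces a list of forms to be memorized.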
With this in mind, the present work is organized with attention to how Tocharian developed out of PIE. This attention to historical development does more than serve as a useful mnemonic for, e.g., when root-final -k- shifts to -ś-. A second, rather remarkable benefit derives from taking a slightly more "substantial" view of PIE, that is, from holding that PIE, or something very similar, was in fact a real language spoken by a real population in a certain area. Endowing PIE with substance in this way, and adding the postulate that (in the age before telecommunication) divergence in linguistic features corresponds to increasing geographical isolation, the historical linguist may map linguistic changes onto rather coarse-grained routes of migration. That is, the way in which a language changes sometimes tells us where its speakers have been. In this manner, where the predominantly Buddhist texts of Tocharian fail to tell us where the Tocharians came from, the structures of the language can speak for themselves and reveal parts of their history otherwise lost to the dust.
Historical linguistics emerged as a scientific discipline with the recognition that sound changes are regular. That is to say, if a sound change occurs in a speech community, that change is conditioned almost exclusively by phonetic environment, not by the particular word in which the sound happens to be found. Thus, for example, the change of ā to o that took Old English bān to Modern English bone affected not only the ā of the word bān, but every such ā throughout the language. Hence hām also became home.
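Regularity means that a sound change behaves like a function applied uniformly to every word of the lexicon. A minimal Python sketch, assuming a toy Old English lexicon and representing the change only as its first step, OE ā becoming the open ǭ of early (southern) Middle English; the later development to Modern English bone, home is omitted. The helper name `apply_change` is hypothetical; using a regular-expression pattern leaves room for conditioning environments, though this change is applied unconditionally.

```python
import re

def apply_change(word: str, target: str, result: str) -> str:
    """Apply one sound change to a word.

    Only the sound pattern matters: the rule never asks which word it is in."""
    return re.sub(target, result, word)

# Toy Old English lexicon: word -> Modern English descendant.
OLD_ENGLISH = {"bān": "bone", "hām": "home", "stān": "stone", "gāt": "goat"}

# Simplified change: OE ā > early Middle English ǭ, in every word.
middle_english = [apply_change(word, "ā", "ǭ") for word in OLD_ENGLISH]
print(middle_english)  # ['bǭn', 'hǭm', 'stǭn', 'gǭt']
```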
Given this regularity, that is, this ability to describe the "laws" of sound changes, the historical linguistic endeavor begins to resemble that of physics. In physics, the great minds discover by hook or by crook the rules by which to derive the final states of systems given knowledge of their initial states. Similarly in historical linguistics, and specifically historical phonology, one postulates an initial state -- the phonological system of Proto-Indo-European -- and applies rules of sound change to this initial state in order to arrive at the phonological systems of the daughter languages.
If one knows the sound rules, then one only needs the starting point to arrive at the end result. In historical linguistic terms, one calls the starting point (for the languages that concern us here) Proto-Indo-European. If one knows the rules that take the phonemes of PIE to, say, the phonemes of Latin, then all one needs is the starting point -- the phonological inventory of PIE -- to predict the phonological inventory of Latin. Of course, in practice the situation is a little more difficult, on the one hand because one has to work in reverse (e.g. we already know the phonological system of Latin), and on the other hand because neither the PIE phonological system nor the rules of derivation are known a priori. Thus one must hypothesize a PIE phonological system, hypothesize rules of derivation, and then change one or both in a give-and-take manner until one can cogently arrive at the phonological system of Latin with a minimum of exceptions to the rules. (This is not as ad hoc or "unscientific" as it may appear: after all, in physics one knows the outcome of experiments. Given that, one then has to reason or guess to figure out a law that explains those results. Only then may one proceed to try to predict new results using the law that explained the old results.) These rules are generally in the form of correspondences, stating that some specific PIE sound corresponds to an equally specific Latin sound (perhaps with the attendant necessity of specifying the surrounding environment in which one sound corresponds to the other). The following table shows an example of correspondences between PIE and Latin phonemes.
| PIE | Latin (initial) | Latin (medial) |
| *k | c | c |
| *g | g | g |
| *gʰ | h (f) | g (h) |
This says, for example, that the initial *gʰ- and the medial *-bʰ- of PIE *gʰabʰ- correspond respectively to the h- and -b- of Latin habēre 'have'.
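Correspondences of this kind can be treated as a lookup table keyed by the PIE sound and its position in the word. The following Python sketch assumes only two environments (word-initial and non-initial) and adds a *bʰ entry (Latin initial f, medial b, as in frāter and nebula) that the *gʰabʰ- example relies on; vowels are passed through unchanged, which is a simplification. `CORRESPONDENCES` and `to_latin` are illustrative names.

```python
# PIE sound -> (Latin reflex word-initially, Latin reflex non-initially).
# Only the primary reflexes are listed; *gʰ can also give initial f and
# medial h, as the table notes.
CORRESPONDENCES = {
    "k": ("c", "c"),
    "g": ("g", "g"),
    "gʰ": ("h", "g"),
    "bʰ": ("f", "b"),
}

def to_latin(segments: list[str]) -> str:
    """Map a segmented PIE form to its expected Latin shape."""
    out = []
    for i, seg in enumerate(segments):
        if seg in CORRESPONDENCES:
            initial, medial = CORRESPONDENCES[seg]
            out.append(initial if i == 0 else medial)
        else:
            out.append(seg)  # vowels and unlisted sounds pass through
    return "".join(out)

print(to_latin(["gʰ", "a", "bʰ"]))  # hab (cf. habēre)
```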
One may ask: where does this get us? After all, we already knew the Latin phonological system! A simple example may illustrate the point. Grimm's Law is the well-known set of rules relating the PIE stop consonants to those of Germanic. Simply applying Grimm's Law to the PIE root *gʰabʰ- leads eventually to Modern English give. Thus Latin habēre 'have' is related to Modern English give, and not to Modern English have, though the two look similar orthographically. The point: one may infer that the PIE root *gʰabʰ- originally combined semantic elements of both 'give' and 'have'. This correspondence and others among etymologically related words across the Indo-European languages allow historical linguists to reconstruct not just a Proto-Indo-European language, but a Proto-Indo-European culture based on ties of reciprocal giving. Presented as such, this hypothesis would be nothing more than vague speculation. But further investigation shows that cultural traits of just this sort remain preserved in the Greek literary heritage of Homer's Iliad and Odyssey. The impact is twofold: first, we may see these traits of Homeric culture not as an isolated idiosyncrasy, but as a traditional viewpoint maintained over the centuries; second, we gain insight into Proto-Indo-European culture itself, a culture which left no written records whatsoever, merely by careful consideration of rules relating the sounds of the languages of the daughter cultures!
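Grimm's Law can be sketched in the same spirit. One point is worth making explicit: its three sub-changes form a chain shift, so they must apply simultaneously (or in an order that avoids feeding); otherwise the b that comes from *bʰ would be caught by the rule *b > p. The Python sketch below covers only labial, dental, and plain velar stops, ignores Verner's Law and all vowel changes, and stops at a Proto-Germanic-like *gab-, which is still some distance from Modern English give. The function names are illustrative.

```python
# Grimm's Law for PIE labial, dental, and plain velar stops.
GRIMM = {
    # voiceless stops > voiceless fricatives
    "p": "f", "t": "θ", "k": "x",
    # voiced stops > voiceless stops
    "b": "p", "d": "t", "g": "k",
    # voiced aspirates > voiced stops (often reconstructed as fricatives β, ð, ɣ)
    "bʰ": "b", "dʰ": "d", "gʰ": "g",
}

def grimm(segments: list[str]) -> str:
    """Apply all of Grimm's Law in one pass.

    Each segment is looked up exactly once, so a b produced from *bʰ
    is never fed back into the b > p rule."""
    return "".join(GRIMM.get(seg, seg) for seg in segments)

def grimm_sequential(segments: list[str]) -> str:
    """Wrong ordering, for comparison: aspirates first, then voiced stops."""
    step1 = [{"bʰ": "b", "dʰ": "d", "gʰ": "g"}.get(s, s) for s in segments]
    return "".join({"b": "p", "d": "t", "g": "k"}.get(s, s) for s in step1)

print(grimm(["gʰ", "a", "bʰ"]))             # gab
print(grimm_sequential(["gʰ", "a", "bʰ"]))  # kap (the feeding error)
```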
In the early period of historical linguistic investigation, familiarity with ancient languages like Latin, Greek, Sanskrit, and Gothic, and with modern relatives such as Italian, Modern Greek, Hindi, and German, led scholars to believe that the evolution of languages followed a general principle of phonological and morphological simplification. In postulating the phonological system of PIE, scholars thus posited the most ornate of the systems they had encountered, which happened to be that of Sanskrit. Each point of stop articulation, such as the lips, then had a complete series of four phonemes: *p, *pʰ, *b, *bʰ. The same held for the dentals, palatals, and velars. Sanskrit was thus supposed to be the only language to have kept the stops unchanged, while the other languages had simply lost one or another of the consonants.
With closer inspection of the relationships among the Indo-European languages, as well as the discovery of hitherto unknown members of the family such as Hittite, whose documents were older than those of any other Indo-European language but whose consonant inventory was far simpler than that of Sanskrit, scholars finally had to abandon the equation of the Sanskrit and PIE systems. In the common understanding of the PIE consonant inventory, there are no voiceless aspirates (e.g. *pʰ), but only voiceless non-aspirates (e.g. *p), voiced non-aspirates (*b), and voiced aspirates (*bʰ).
The current reconstruction of the PIE system of stop consonants is given in the following table.
The labiovelar consonants are similar to their velar counterparts, but with a simultaneous rounding of the lips. Examples of their remnants in daughter languages are Latin quid, Hittite kwit, Old English hwæt (this initial labiovelar is still preserved in some pronunciations of Modern English what, e.g. in midwestern dialects of the United States).
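The standard reconstruction described above can be represented as a grid of five places of articulation by three manners. A minimal Python sketch, assuming the traditional notation with ḱ, ǵ, ǵʰ for the palatals and kʷ, gʷ, gʷʰ for the labiovelars (notational conventions vary between handbooks); `PIE_STOPS` and `stop` are illustrative names.

```python
MANNERS = ["voiceless", "voiced", "voiced aspirate"]

# Rows: places of articulation; columns follow MANNERS.
# No voiceless aspirates; *b is rare and possibly absent in PIE.
PIE_STOPS = {
    "labial":     ["p",  "b",  "bʰ"],
    "dental":     ["t",  "d",  "dʰ"],
    "palatal":    ["ḱ",  "ǵ",  "ǵʰ"],
    "velar":      ["k",  "g",  "gʰ"],
    "labiovelar": ["kʷ", "gʷ", "gʷʰ"],
}

def stop(place: str, manner: str) -> str:
    """Look up the reconstructed stop for a place and manner."""
    return PIE_STOPS[place][MANNERS.index(manner)]

print(stop("labiovelar", "voiceless"))  # kʷ, as behind Latin quid
print(sum(len(row) for row in PIE_STOPS.values()))  # 15 stops in all
```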
In truth, no ancient Indo-European language maintains all of the Indo-European stop consonants intact. One of the major changes that occur between the period of PIE and that of the daughter languages is the merger of series (that is, the merger of columns in the preceding table). The principal mergers are those between velars and palatals on the one hand, and between velars and labiovelars on the other. These mergers divide the IE languages into two groups, called the centum and satem groups. Specifically, the groups show mergers as follows.
The general situation is outlined in the following chart.
There are similar correspondences for the voiced non-aspirates, e.g. PIE *g, and the voiced aspirates, e.g. PIE *gʰ. The following chart provides a specific illustration of the above.
One may liken this process to a much later change within the Romance languages, whereby the c- (pronounced [k]) of Latin centum became the c- (pronounced [s] in Latin American Spanish, [θ] in much of Spain) of Spanish cien. This shows an assibilation process similar to that found in the satem group, though the Romance change occurred thousands of years later.
This merger of PIE series had the effect of giving all the ancient IE languages situated to the east a common linguistic trait, and similarly all those to the west a common trait. Specifically, the language families grouped as follows.
The centum and satem dialectal grouping thus correlated with a geographic grouping, much the way the use of y'all distinguishes Southern American English from other American English dialects.
The discovery of Tocharian, however, put an end to this neat correlation between linguistic and geographic affiliation. The Tocharian word for 'hundred' is A känt B kante, showing an initial velar k-. This squarely associates Tocharian with the other centum subfamilies. The latter lie in the west, however, while Tocharian alone among them appears in the east. Thus Tocharian's linguistic traits undercut the correlation between geography and the centum-satem division, in much the same way that y'all would fail to distinguish Southern American English from northern dialects if that pronoun suddenly were to become common in, say, Connecticut.
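The two mergers, and the 'hundred' test that places Tocharian among the centum languages, can be sketched as follows. This Python sketch is deliberately coarse: in the satem branch the palatals are shown with their Sanskrit reflexes (ś, j, h), although actual outcomes differ from language to language, and a word for 'hundred' is classified only by whether its first letter is a velar or a sibilant. All names are illustrative.

```python
def centum(stop: str) -> str:
    """Centum merger: palatals fall together with plain velars;
    labiovelars remain distinct."""
    return {"ḱ": "k", "ǵ": "g", "ǵʰ": "gʰ"}.get(stop, stop)

def satem(stop: str) -> str:
    """Satem merger: labiovelars fall together with plain velars, while the
    palatals move away from the velars (Sanskrit reflexes shown)."""
    return {"kʷ": "k", "gʷ": "g", "gʷʰ": "gʰ",
            "ḱ": "ś", "ǵ": "j", "ǵʰ": "h"}.get(stop, stop)

VELAR_LETTERS = {"k", "c", "g"}        # Latin c = [k]
SIBILANT_LETTERS = {"s", "ś", "š", "z"}

def classify_hundred(word: str) -> str:
    """Classify a language by the first consonant of its word for 'hundred'."""
    first = word[0]
    if first in VELAR_LETTERS:
        return "centum"
    if first in SIBILANT_LETTERS:
        return "satem"
    return "unclear"

# Latin, Tocharian A, Tocharian B, Sanskrit, Avestan
for word in ["centum", "känt", "kante", "śatám", "satəm"]:
    print(word, classify_hundred(word))
```

Run on the Tocharian forms, the test groups them with Latin rather than with their eastern neighbours, which is exactly the mismatch described above.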
The following text is an excerpt in Tocharian A from the Buddhist Puṇyavanta-Jātaka. This text was initially published in Sieg and Siegling's Tocharische Sprachreste, Leipzig 1921. Somewhat later George Lane published the text and translation in English in the Journal of the American Oriental Society, Vol. 67, No. 1 (Jan. -- Mar. 1947), 33-53. Though much has been learned about Tocharian in the time since Lane's translation, the article still remains a very useful starting point, providing a translation that generally remains close to the original text.
As Lane points out, the Tocharian text does not provide a previously unknown text; the Puṇyavanta-Jātaka was already known to scholarship via Sanskrit texts. This is often the case with Tocharian manuscripts. What is revealing, however, is that as Sieg et al. and Lane note, the Tocharian version turns the style of composition on its head. Rather than giving prominence to the actual exploits of the story's protagonists (as is done in the Sanskrit version), the Tocharian version instead gives overwhelming prominence to the stories told before the protagonists start their journey. The Tocharian version also downplays some of the erotic overtones found in the Mahāvastu version, thus showing a different moral perspective. Such changes hold interest for the Buddhist scholar in delineating the fluidity of Buddhist doctrine within the different cultures in which it took hold. But they also hold interest for the Tocharian scholar inasmuch as they are clues to the cultural fabric of the Tocharian peoples themselves.
A linguistic study of the text shows features of a language in transition. In particular, we note in verse 10 the phrase poñcäṃ saṃsāris, in which a noun in the genitive takes a modifier in the oblique. We will see in the course of the next few lessons how this demonstrates that the new inflectional cases being generated within the Tocharian language family are beginning to overtake the older inflectional cases inherited directly from the common parent of the Indo-European languages.
1 - kāsu ñom-klyu tsraṣiśśi śäk kälymentwaṃ sätkatär.
yärk ynāñmune nam poto tsraṣṣuneyā pukäṣ kälpnāl;
yuknāl ymāräk yäsluñcäs, kälpnāl ymāräk yātlune.
2 - tsraṣiśśi māk niṣpalntu tsraṣiśśi māk śkaṃ ṣñaṣṣeñ.
nämseñc yäsluṣ tsraṣisac, kunseñc yärkant tsraṣisac.
tsraṣiñ waste wrasaśśi, tsraṣiśśi mā praski naṣ.
3 - tämyo kāsu tsraṣṣune pukaṃ pruccamo ñi pälskaṃ.
4 - tsraṣṣuneyo tämne neṣ praṣtaṃ Siddʰārtʰes lānt se Sarvārtʰasiddʰe bodʰisattu sāmudraṃ kārp, ñemiṣiṃ praṅkā yeṣ.
5 - ñemintuyo ypic olyiyaṃ sārtʰ Jambudvipac pe yāmuräṣ, ṣpät koṃsā kñukac wraṃ kälk, ṣpät koṃsā pokenā kälk, ṣpät koṃsā lyomaṃ kälk.
6 - ṣpät koṃsā wälts pältwāyo oplāsyo wraṃ opläṣ oplā kārnmāṃ kälkoräṣ, päñ kursärwā ārṣlāsyo rarkusāṃ tkanā kälk.
7 - tmäṣ rākṣtsāśśi dvipaṃ yeṣ, tmäṣ yakṣāśśi, tmäṣ Baladvipaṃ yeṣ.
8 - tmäṣ śtwar-wäknā ārṣlāsyo rarkuñcäs iṣanäs kcäk. śtwar-wäknā speṣinäs kluṃtsäsyo sopis Sāgares lānt lāñci waṣt pāṣäntās śāwes empeles nākās āsuk kätkoräṣ, Sāgareṃ lāntäṣ cindāmaṇi wmār toriṃ kälpāt, poñcäṃ Jambudvipis ekrorñe wawik.
9 - ślak śkaṃ -- Ṣāmnernaṃ
māski kätkāläṃ ktäṅkeñc tsraṣiñ sāmuddrä,
traidʰātuk saṃsār tsraṣṣuneyo ktäṅkeñc kraṃś.
kälpnāntär toriṃ puttiśparäṃ wärṣṣältse.
mā=pärmāt tsru-yärm yātal yatsi tsraṣṣune.
10 - mā täpreṃ saṃ poñcäṃ saṃsāris kāripac sāspärtwu ālak wram naṣ kosne ālāsune.
11 - kyalte neṣ wrasaśśi sne-wāwleṣu sne-psäl klu śwātsi ṣeṣ, kalpavṛkṣäntwaṃ ārwar papyätkunt wsālu yetweyntu waṣlaṃ ṣeñc-äṃ.
12 - ālāsāp klu kropluneyā kalpavṛkṣäntu nakäntäm, kappāñ pākär tākaräm.
13 - sne-wāwleṣu sne-psäl klu naktäm, śāwaṃ wlesaṃtyo psälaśśäl pākär tākam.
14 - cami ālāsuneyis nu tsraṣṣune pratipakṣ nāṃtsu. tämyo tsraṣṣune ñi ārkiśoṣyaṃ pukaṃ pruccamo pälskaṃ.
1 "The good fame of the strong spreads in the ten directions.
Reverence, respect, obeisance, (and) honor (are) to be attained through strength from everyone.
To be conquered quickly (are) enemies. To be obtained quickly (is) prosperity.
2 Of the strong (there are) great riches; of the strong (are) also many relatives.
Enemies bow down before the strong; to the strong come honors.
The strong (are) the protection of creatures; of the strong there is no fear.
3 Therefore strength (is) good (and) in every way the best (thing) in my opinion.
4 "By means of strength thus, at an earlier time, the son of king Siddhartha, the Bodhisattva Sarvarthasiddha descended upon the ocean. He went to the island of jewels. 5 With a caravan to Jambudvipa also having been made in a ship filled with jewels, for seven days he walked up to the neck in water; for seven days with the arms he walked; for seven days in mud he walked; 6 for seven days in water with lotuses with a thousand leaves, ascending from lotus to lotus he went; five leagues he walked through a place covered by snakes. 7 Thereupon he went to the island of the Raksasas, then to the island of the Yaksas, to Baladvipa, he went. 8 Thereupon he traversed the moats covered by four sorts of snakes. Nets with four sorts of Sphatika thread guarding the royal house of king Sagara, the great, awful Nagas having traversed completely, he obtained the Cintamani-stone, the precious, from king Sagara. Of all Jambudvipa the sickness he caused to disappear. 9 And so (in samner-meter):
"The ocean difficult to cross the strong cross.
The threefold world (of) existence by strength the good cross.
The superior obtain precious Buddhahood.
Strength is not capable of performing a disgrace (even) to a small degree.
10 "There is not another thing (which has) become (lit. turned) so for the injury of the entire world as (has) sloth. 11 For formerly of men without work (there) was chaffless rice to eat. In the kalpa-trees ready prepared for them to wear were clothing and ornaments. 12 The rice of the slothful (man) (to be had) by gathering and the kalpa-trees disappeared for them. Miseries (?) were plainly before them. 13 Without work (and) without chaff the rice disappeared for them. By great labor and with chaff a store of grain was for them. 14 Indeed of this, sloth being the opposite, therefore, strength (is) in the world in my opinion altogether the best thing."
The writing system of the Tocharian languages was not a wholly new creation, but an adaptation of a pre-existing script. The Tocharian scribes adopted a certain version of the north Indian Brāhmī script. This they subsequently modified to suit the requirements of their own language. A very small number of texts, however, are found in a Manichean script.
The Brāhmī script is neither purely alphabetic nor purely syllabic. It is a system of so-called akṣaras. Each consonant has a separate, unique representation. The unmodified form, however, always represents the consonant followed by the default vowel a. Thus the symbol <p>, in the absence of other modifications, represents [pa], much as in English we write p but call it the letter 'pee'. But in English we would then have to write, say, the name Peter as p-ter, since we would pronounce 'pee' whenever we saw the symbol p.
To indicate that the consonant is followed by a different vowel, a certain sign is placed above the <p> to denote [pi], a different sign below it to denote [pu], and so on. These signs, placed around the consonants to change the value of the following vowel, are the bound forms of the vowels. Each vowel also has its own free form, generally used in word-initial position. The diphthongs ai and au have their own signs and are not written as combinations of their constituent elements.
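The akṣara principle can be modeled as a small data structure: a written unit is either a consonant with an optional vowel sign or, with no consonant, a free vowel, and a consonant written without a sign carries the inherent a. In the Python sketch below the vowel signs are represented by their transcription letters rather than by Brāhmī glyphs, and vowelless consonants are not modeled; the class `Akshara` and the function `transcribe` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional

INHERENT_VOWEL = "a"

@dataclass
class Akshara:
    """One written unit: a consonant plus an optional vowel sign or,
    when consonant is None, a free (independent) vowel."""
    consonant: Optional[str] = None
    vowel: Optional[str] = None  # None = no sign written, i.e. the inherent a

def transcribe(units: list[Akshara]) -> str:
    """Transcribe a sequence of akṣaras into Latin letters."""
    return "".join((u.consonant or "") + (u.vowel or INHERENT_VOWEL) for u in units)

print(transcribe([Akshara("p")]))         # pa: bare consonant, inherent a
print(transcribe([Akshara("p", "i")]))    # pi: vowel sign replaces a
print(transcribe([Akshara(vowel="ai")]))  # ai: free diphthong sign
```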
The Indic languages are blessed with a wealth of stop consonants; the Tocharian languages, by contrast, are impoverished in this regard. Tocharian thus had no need, in principle, for symbols for the voiced aspirates such as gʰ, dʰ, bʰ, nor for the retroflex consonants such as ṭ, ḍ, ṇ. But this is not to say that Tocharian scribes did not employ them. The scribes were, in fact, often very faithful to the sounds and spellings of the Sanskrit words they borrowed, and, as in other Buddhist traditions, the Tocharians borrowed a very large inventory of terminology directly from Sanskrit Buddhist texts. Thus the majority of Indic sounds have a graphic representation in some word or other in the Tocharian languages; but these are to be taken as the result of conservative spelling, not necessarily as aids to the native Tocharian speaker in reproducing a faithful pronunciation, much like our own English habit of keeping the long-silent gh of words like through.
Hard as it is to believe, there are in fact some sounds in Tocharian which are not found in Sanskrit. In particular, there is the voiceless dental affricate ts. The Tocharians wrote it with a ligature of the characters representing t and s, but the sound itself is a single phoneme in Tocharian. Tocharian also possesses a reduced high central vowel, transcribed ä because its representation in the Tocharian script involves placing two dots above the character for a. Take care to remember, however, that this is merely a convention of scholarly transcription; it does not represent the German sound ä of words such as Mädchen. The phonetic value was probably closest to IPA [ɨ].
One peculiar feature of Tocharian is that some vowels (generally i, u, and ä) could lose their syllabic character in open syllables. When this occurred, the Tocharian scribes combined the preceding consonant with the following consonant and wrote the non-syllabic vowel above the vowel that properly belonged to the following consonant. For example, phonemic /kuse/ was evidently pronounced with a reduced vowel as [kʷse], and this pronunciation was written <ksue>. Modern scholars generally transcribe it as kuse.
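The difference between the written order and the transcription amounts to moving one element. In the Python sketch below a written cluster is represented, as an assumption made for illustration, by a tuple of its two consonants, the non-syllabic vowel, and the full vowel of the second consonant; the function names are hypothetical.

```python
def written_form(c1: str, c2: str, reduced: str, vowel: str) -> str:
    """Spelling order: both consonants as one conjunct, then the
    non-syllabic vowel, then the full vowel of the second consonant."""
    return c1 + c2 + reduced + vowel

def transcription(c1: str, c2: str, reduced: str, vowel: str) -> str:
    """Scholarly transcription puts the reduced vowel back between the
    two consonants."""
    return c1 + reduced + c2 + vowel

parts = ("k", "s", "u", "e")
print(written_form(*parts), "->", transcription(*parts))  # ksue -> kuse
```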
Relative to a language like Sanskrit, and even to Proto-Indo-European itself, the phonological system of Tocharian is quite simple. Both Tocharian languages have almost identical phonological systems. In particular, they have the same consonant inventory.
The only voiced consonants are the resonants (liquids and nasals) and glides; all stops, affricates, and sibilants are voiceless. To what degree this classification is actually phonetic, and not merely phonemic, is difficult to say. For example, the stops of native Tocharian words are generally not written with the characters corresponding to the Sanskrit aspirates. But this is no guarantee that the stops themselves did not have some degree of aspiration, much like the p in English pot. We can only be relatively certain that the distinction between aspirate and non-aspirate was not important in Tocharian.
One important distinction is that between palatal and non-palatal consonants. As is clear in the chart, the Tocharian languages have a large palatal inventory, and the distinction between e.g. l and ly is phonemic. This alternation is largely a result of historical processes which will be discussed elsewhere in these lessons.
All consonants can be single (e.g. ṣ) or doubled (e.g. ṣṣ), possibly denoting a difference in consonant length. Compare, for example, the distinction between [n] in English pennant and [nn] in English penknife. Doubled consonants are rare, however, in Tocharian A. There is evidence that consonant doubling might not (always) denote consonant length: it appears that ll is a frequent spelling for the single palatal consonant ly. From this and other alternations, it seems likely that doubling consonants is a typical manner of denoting palatalization.
The two Tocharian languages have a common inventory of simple vowels. Their transcription and probable phonetic values are given in the chart below.
There is no certain evidence that the Tocharian languages had phonemic vowel length; all vowels appear to be phonemically short in both languages. It is important to note in this regard that the symbol ā is merely a convention of transcription: it does not denote a long vowel, but rather an open, low, central unrounded vowel.
The phonetic value of ä is also poorly understood. Some evidence points to it being a front, mid vowel, though likely very weakly articulated. ä is often found where it is not etymologically expected, being the vowel generally employed to break up difficult consonant clusters.
By the time of the documented Tocharian languages, diphthongs remain only in Tocharian B.
The diphthongs are falling diphthongs formed by the addition of one of the semivowels y or w. That is, the first vowel of the diphthong carries the syllabic content (and can therefore carry stress in a stressed syllable), while the second element forms the off-glide. The diphthongs written with the simple vowel e appear to have had the vocalism of a diphthong with nucleus a: one finds <ey> alternating with <ai>, and <eu, euw> alternating with <au>. Likewise <oi> alternates with <oy>. The diphthongs which existed in Proto-Tocharian were monophthongized in Tocharian A.
Some spellings indicate possible allophonic variation of consonants, that is, variation in the actual phonetic realization of a sound, but which nevertheless does not change the meaning of the form. Though the stops were generally voiceless in word-initial or word-final position, there is some evidence that stops were voiced in certain other environments. For example, the occasional writing of ṅ for ṅk suggests that -k was voiced to [g] in this position. It also seems stops were generally voiced between vowels, or after a consonant but before a vowel. Doubling of stop consonants evidently denotes a voiceless consonant which would otherwise have been voiced between vowels: for example, nätk- 'push' has present stem nättäk-, suggesting that the t remained voiceless between vowels. A summary of the possible allophones of the stop phonemes in various environments is given in the table below.
|#_ or _#||V_V or C_V||N_|
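The environments named in the table header above (#_ or _#, V_V or C_V, N_) can be turned into an ordered series of checks. The Python sketch below is a rough approximation under stated assumptions: it models only p, t, k, takes their voiced allophones to be b, d, g (setting aside the possible [β] for p), checks the post-nasal environment before the word-final one, and treats a doubled spelling such as the tt of nättäk- as marking a voiceless stop. `stop_allophone` is a hypothetical helper name.

```python
from typing import Optional

VOWELS = set("aāäeiou")
NASALS = set("mnñṅṃ")
VOICED = {"p": "b", "t": "d", "k": "g"}  # only these three stops are modeled

def stop_allophone(stop: str, prev: Optional[str], nxt: Optional[str],
                   geminate: bool = False) -> str:
    """Rough guess at the phonetic value of a stop from its neighbours.

    prev and nxt are the adjacent segments, or None at a word boundary."""
    if stop not in VOICED or geminate:
        return stop              # unmodeled stop, or doubled spelling: voiceless
    if prev is None:
        return stop              # #_  word-initial: voiceless
    if prev in NASALS:
        return VOICED[stop]      # N_  after a nasal (cf. ṅ written for ṅk)
    if nxt is None:
        return stop              # _#  word-final: voiceless
    if nxt in VOWELS:
        return VOICED[stop]      # V_V or C_V: voiced
    return stop

print(stop_allophone("k", None, "a"))                  # k
print(stop_allophone("t", "ä", "ä"))                   # d
print(stop_allophone("t", "ä", "ä", geminate=True))    # t, as in nättäk-
```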
The character used for w seems at times to represent the voiced bilabial fricative [β]. The v of Sanskrit, which frequently had a fricative pronunciation, is typically rendered by Tocharian scribes as v, emulating Sanskrit spelling, or as w (e.g. aviś or awiś from Skt. avīci). As the table shows, Tocharian p seems also at times to represent [β]. This might explain such spelling alternations as B cpi for more usual cwi 'his'.
Given the preceding discussion, it is possible to estimate the actual phonetic value of the Tocharian sounds. These are given in the table below. These can only be approximate at best and are certainly open to revision as the Tocharian languages become better understood.
The Tocharian letters are given in their dictionary order. This order is essentially the same as that of Sanskrit, with minor modifications.
It is not clear what pronunciation should be assigned to the Sanskrit sounds not contained in the Tocharian phonetic inventory (the lack of a clear approximation is denoted by "--" above). One possible resolution is to remain faithful to the pronunciation of Classical Sanskrit. Given, however, the changes that some Sanskrit words undergo when adopted by Tocharian speakers, it is unlikely that any but the most learned Tocharian speakers pronounced these sounds in this way. A more plausible scenario is that, though the writing may have remained faithful to the Sanskrit, the pronunciation was adapted to the available Tocharian phonological inventory.
The Sanskrit anusvāra is also employed in Tocharian writing. This is a raised dot placed over the syllable, representing in Sanskrit either a bilabial nasal (m) in word-final position, or a nasal homorganic with (having the same point of articulation as) the following consonant. As with Sanskrit, the anusvāra is transcribed as ṃ. In Tocharian, however, the majority of instances point to a single pronunciation as a dental nasal n.
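The two readings of ṃ can be contrasted in a short function: a Sanskrit-style reading in which ṃ takes the place of articulation of a following stop (and is m at the end of a word), and the simplified Tocharian reading as n. The Python sketch below assigns places of articulation from a hand-written table covering only the unaspirated stop letters; before sibilants, h, and semivowels it simply returns ṃ. It is illustrative, not a model of Sanskrit sandhi, and `read_anusvara` is a hypothetical name.

```python
from typing import Optional

# Homorganic nasal for the first letter of a following stop.
HOMORGANIC = {
    "k": "ṅ", "g": "ṅ",   # velar
    "c": "ñ", "j": "ñ",   # palatal
    "ṭ": "ṇ", "ḍ": "ṇ",   # retroflex
    "t": "n", "d": "n",   # dental
    "p": "m", "b": "m",   # labial
}

def read_anusvara(following: Optional[str], tradition: str) -> str:
    """Phonetic reading of ṃ before `following` (None = word-final)."""
    if tradition == "tocharian":
        return "n"                   # most Tocharian instances: dental n
    if following is None:
        return "m"                   # Sanskrit, word-final
    # Before a stop: homorganic nasal; elsewhere left as ṃ.
    return HOMORGANIC.get(following[0], "ṃ")

print(read_anusvara("k", "sanskrit"))   # ṅ
print(read_anusvara("bh", "sanskrit"))  # m
print(read_anusvara("k", "tocharian"))  # n
```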
The accent of words in Tocharian is complex and it is difficult to state a simple, overarching rule governing its placement. The vocalic alternations of Tocharian B lead scholars to believe that
This statement may not hold, however, for Tocharian A. It seems that if the second syllable contained a non-high vowel, the accent was retracted leftward.
Less is known about overall patterns of accent governing phrases and clauses. Particles generally lacked any sort of accent, and some evidence suggests that this may also be true for monosyllabic verb forms. It is unclear if the same is true for polysyllabic verb forms.
The phonetic development from Proto-Indo-European to the documented Tocharian languages involves various stages in which palatalization affected certain consonants. For example, when a following vowel was articulated with the tongue raised toward the hard palate, the articulation of a preceding consonant might itself shift toward the palate. When different forms of the same word involve different vowels, some palatal and some not, this can produce a corresponding alternation between palatalized and non-palatalized variants of the neighboring consonants. Such palatalizing processes occurred both during and after the Proto-Tocharian period. They left behind a system of fairly regular correspondences between consonant phonemes in the daughter languages. The following chart lists the major correspondences between such palatalized and non-palatalized single consonants and clusters.
|Type||Non-palatalized||Palatalized|
|Liquid||l||l' <ly, ll>|
|||tk||A ck B cc|
|Sibilant||st||B śc > ś(ś)|
|||A ṣt||A śś|
|||ṅk||A ñś B ñc|
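The correspondences in the chart can be treated as a lookup table. Each non-palatalized consonant or cluster maps to its palatalized counterpart, keyed by language. The sketch below is a minimal, hypothetical Python illustration (the names `PALATALIZATION` and `palatalized` are ours). It encodes only the rows listed in the chart and models none of the phonological environments that trigger palatalization. Because the chart gives the liquid row without an A/B distinction, we assume that reflex holds for both languages.

```python
from typing import Dict, Optional

# Partial chart of palatalization correspondences (only the rows listed above).
# Each non-palatalized segment maps to its palatalized reflex per language;
# "AB" marks a reflex the chart gives without distinguishing A from B.
PALATALIZATION: Dict[str, Dict[str, str]] = {
    "l": {"AB": "l'"},              # written <ly> or <ll>
    "tk": {"A": "ck", "B": "cc"},
    "st": {"B": "śc > ś(ś)"},       # B śc, later ś or śś
    "ṣt": {"A": "śś"},              # a Tocharian A correspondence
    "ṅk": {"A": "ñś", "B": "ñc"},
}


def palatalized(segment: str, language: str) -> Optional[str]:
    """Return the palatalized counterpart of `segment` in "A" or "B".

    Returns None when the chart gives no correspondence for that language.
    """
    if language not in ("A", "B"):
        raise ValueError("language must be 'A' or 'B'")
    reflexes = PALATALIZATION.get(segment, {})
    # Prefer a language-specific reflex, then fall back to a shared one.
    return reflexes.get(language, reflexes.get("AB"))
```

A table lookup keeps the data separate from the logic, so further rows of the full chart could be added without changing the function.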
We will discuss palatalization further in the context of the historical phonology of the consonants.
Tocharian maintains the three grammatical genders masculine, neuter, and feminine. As in many modern languages, grammatical gender is distinct from biological gender unless the particular noun denotes something animate. In that case the grammatical and biological genders often coincide, but not always. Grammatical gender serves instead as a marker of grammatical agreement, so that the listener knows that a noun and a modifier (adjective) belong together if both have, e.g., masculine grammatical gender. If they have different genders, the one does not modify the other. The Tocharian neuter, however, survives as a separate category only in the pronouns. This closely parallels the situation in modern Spanish. In that language nouns and adjectives show two grammatical genders (Sp. nouns el tablero m. 'the chalkboard' and la mesa f. 'the table'; adjectives bueno m. and buena f. 'good'). The pronouns, by contrast, show masculine forms (Sp. él 'he', el que 'the (man) who') and feminine forms (Sp. ella 'she', la que 'the (woman) who'). They also show a neuter (Sp. ello 'it', and lo as in lo que 'that (thing) which').
Among the nouns, the phonological changes in Tocharian led to the neuter endings converging with the masculine endings in the singular, and with the feminine endings in the plural, resulting in nouns which display a combination of the masculine and feminine endings. PIE masculine nouns of the type of Latin dolus (accusative dolum) merged in the singular with neuter nouns of the type of Latin donum (acc. donum), and also with neuter nouns of the type of Latin genus (acc. genus).
By contrast, in the plural the PIE neuter ending *-H₂ fell together in PToch. with the feminine plural suffix *-H₂-es.
Such nouns, with masculine endings in the singular and feminine endings in the plural, are said to have alternating gender.
Tocharian has kept the PIE categories of singular (one of a thing), dual (two of a thing), and plural (more than two of a thing). It has also innovated in creating the paral, which is a sort of dual used to denote naturally occurring pairs, e.g. (two) hands, (two) eyes. The forms derive from the original inherited dual of PIE, with the additional suffix *-nō. For example, A aśäṃ, B eśane 'both eyes'. Tocharian B has a further innovation called the plurative, which employs the ending -aiwenta (from the plural of PIE *oi-wo- 'one') to express 'one at a time, individually'.
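The choice among these number categories can be summarized as a small decision procedure. The sketch below is a hypothetical simplification (the function and parameter names are ours). It treats the paral as the form for naturally occurring pairs and the dual as the form for other pairs. It treats the Tocharian B plurative as the way to express 'one at a time, individually'. Restricting the plurative to groups of more than two is our assumption. The sketch does not model how consistently the texts actually observe these distinctions.

```python
def number_category(count: int, natural_pair: bool = False,
                    individually: bool = False, language: str = "B") -> str:
    """Choose a nominal number category (a simplified, hypothetical rule set).

    count: how many items are referred to
    natural_pair: True for naturally occurring pairs such as eyes or hands
    individually: True when the sense is 'one at a time, individually'
    language: "A" or "B" (only Tocharian B has the plurative)
    """
    if count < 1:
        raise ValueError("count must be at least 1")
    if language not in ("A", "B"):
        raise ValueError("language must be 'A' or 'B'")
    if count == 1:
        return "singular"
    if count == 2:
        # The paral is a dual reserved for naturally occurring pairs.
        return "paral" if natural_pair else "dual"
    # Assumption: the plurative is applied only to groups of more than two.
    if individually and language == "B":
        return "plurative"
    return "plural"
```

For instance, `number_category(2, natural_pair=True)` yields `"paral"` (as for 'both eyes'), while `number_category(2)` yields `"dual"`.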
In the earliest stages of Proto-Indo-European there appears to have been a basic distinction between animate and inanimate among the substantives. That is, substantives denoting animate beings shared one type of morphology, characterized in part by the fact that the nominative ending differed from the accusative. By contrast, substantives denoting inanimate things shared a different type of morphology, generally characterized by the fact that the nominative and accusative forms were identical.
By the time of the documented languages, however, this system had almost everywhere been restructured into a ternary system of masculine, feminine, and neuter genders. Hittite is the most notable exception. These genders were grammatical in the sense that, for example, not all nouns denoting males had masculine endings. The Latin noun nauta 'sailor' furnishes a ready example. It takes the ending -a, typical of feminine nouns, even though it generally denotes a male.
The Tocharian languages show a different restructuring. Though there are, as mentioned elsewhere, still clear remnants of the three-way gender distinction of the majority of the IE family, Tocharian noun classification shows another overarching structure. Tocharian consistently distinguishes substantival morphology on the basis of whether a noun is human or non-human. Most succinctly,
Thus 'dog', though animate, will not take the oblique singular ending -ṃ, since it is not human. This ending has not only morphological value, but semantic value as well.
Adjectives display two genders, masculine or feminine, unless they are used as substantives (see below). Substantives with masculine grammatical gender take the masculine adjective endings, feminine substantives take the feminine endings. There is a special class of nouns, however, that takes masculine adjective endings in the singular and feminine adjective endings in the plural. These substantives are those with the so-called alternating grammatical gender.
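The agreement rule for adjectives, including the alternating-gender class, amounts to a function of the noun's gender and number. The sketch below is a minimal, hypothetical Python rendering (the name `adjective_gender` is ours). It covers only singular and plural, since the text says nothing here about dual agreement.

```python
def adjective_gender(noun_gender: str, number: str) -> str:
    """Gender of the adjective endings that agree with a noun (simplified).

    noun_gender: "masculine", "feminine", or "alternating"
    number: "singular" or "plural" (the dual is not modeled here)
    """
    if number not in ("singular", "plural"):
        raise ValueError(f"unsupported number: {number!r}")
    if noun_gender in ("masculine", "feminine"):
        return noun_gender
    if noun_gender == "alternating":
        # Alternating-gender nouns agree as masculine in the singular
        # and as feminine in the plural.
        return "masculine" if number == "singular" else "feminine"
    raise ValueError(f"unknown noun gender: {noun_gender!r}")
```

Treating "alternating" as a third noun class, rather than switching the noun's gender by number, mirrors the way the nouns themselves combine masculine singular and feminine plural endings.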
An adjective may be used as a neuter substantive. This parallels the process in English whereby good becomes the good (thing), as in You have to take the good with the bad. It is even closer to how Spanish bueno becomes lo bueno. The Tocharian neuter substantive is for the most part formally indistinguishable from the masculine, a natural result of the collapse of the PIE masculine and neuter into identical forms. The only difference is that the new substantive, like any PIE neuter, has identical nominative and oblique (old accusative) forms: what was the masculine nominative form now serves in this substantive as both nominative and oblique. Such neuter substantives are found only in the singular.
Tocharian verbs, as with the majority of the Indo-European languages, distinguish three persons: first ('I, we'), second ('you/thou, you (all)'), and third ('he, she, it, they') person. The first person refers to the speaker, perhaps with companions. The second person refers to the addressee. The third person refers to participants or referents outside of the first and second person.
Tocharian verbs further distinguish three numbers: singular, dual, and plural. The singular and plural distinguish between one and more than one; the dual specifically signifies two. There is no complete verbal paradigm with dual forms for all persons. The dual is a highly restricted verbal category, usually occurring in the third person (e.g. B nesteṃ 'they two are'), and its forms have no sure connection to the dual forms of Proto-Indo-European itself. In general, therefore, Tocharian verbs distinguish singular from plural: B nesau 'I am' vs. nesem 'we are'; B nest 'thou art' vs. nescer 'you (all) are'; B nesäṃ 'he/she/it is' vs. neseṃ 'they are'.
In a concrete sense, the distinction of voice is found in the contrast of the two English sentences 'The dog bit the man' and 'The man was bitten by the dog'. In the former, the dog is the grammatical subject, and also the one doing the action of biting. In the latter, the man is now the grammatical subject, though the dog is still doing the action of biting. We say the sentence 'The dog bit the man' is active, because the grammatical subject (the dog) is also the agent (the one doing the action). The second sentence, 'The man was bitten by the dog', is passive, because the agent is not the grammatical subject; instead, the patient (the one being acted upon) is the grammatical subject. Active and passive are two types of voice.
There is a third voice relevant to many Indo-European languages, one which is neither active nor passive, but combines some connotations of both. For this reason it is known as the middle voice, or, because especially in Greek the passive and middle forms largely overlap, as the mediopassive. In order to get some idea of how this functions, consider the following English example sentences:
Sentence (1a) is clearly active: the grammatical subject, the man, is also the agent (the one doing the soaking). Sentence (1b) is clearly passive: the grammatical subject, the towel, is not the agent but the patient (the one receiving the action). In sentence (1c), however, the grammatical subject is the agent (the man is doing the soaking) and also the patient (the man is also the one getting soaked). We say that sentence (1c) is middle or mediopassive.
In the example above, sentence (1c) is mediopassive by virtue of the fact that it is reflexive. That is, by explicit use of the word himself, we have equated the agent and patient. However, this is not always necessary:
In sentence (2a) the towel is the agent and subject of the active sentence; in sentence (2b), the towel is the patient and subject of a passive sentence. And in sentence (2c) the towel is the subject, agent and patient, this time not by virtue of the insertion of a reflexive pronoun, but by something the English verb soak allows.
The sentence (2c) is not very different in sense from sentence (2b) if we delete the agentive phrase by the man. It may therefore come as no surprise to find that many languages which morphologically mark passive and middle verb forms often use the same forms for both. Classical Greek falls into this category, so that pʰérō 'I carry' is active, while pʰéromai is both middle ('I carry for myself') and passive ('I am carried'). Some middle and passive forms do nevertheless differ: lūsámenos (masc. nom. sg.) 'having freed (for himself)' is the aorist middle participle, while lūtʰeís (masc. nom. sg.) 'having been freed' is the aorist passive participle.
Some languages, furthermore, do away with a separate passive formation altogether, and simply make do with a basic distinction between active and middle. Tocharian falls into this class of languages. There is no morphological passive; when passive statements are intended, middle forms are used. This may in fact be a late development, a usage borrowed on the model of Sanskrit. In short, Tocharian has only a morphological active and mediopassive, with no separate morphological forms that are strictly passive.
Some verbs are (middle) deponents, that is, verbs which occur with exclusively middle forms. For example, all finite forms of AB trik- 'be confused' are middle: A trikatär B triketär 'is confused', where -tär is the 3rd person singular middle ending; forms with the active endings A -ṣ B -ṃ do not occur. Some such deponents occasionally have an active present participle form, e.g. trik- forms the present participle A trikant. But it must be kept in mind that, although this suffix derives from PIE *-o-nt-, which comes to be the active present participle in such languages as Greek, Latin and Sanskrit, nevertheless:
Thus it may not even be proper to denote such forms as 'active' participles. For example, the root AB pik- 'paint', related to Latin pingere with the same meaning, forms A pekant. Rather than having the connotation of the Latin present participle pingens (-ntis) '(someone) painting (at the moment)', the Tocharian form pekant has the connotation of 'someone who paints by profession', i.e. a 'painter', akin to the Latin agent noun pictor.
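The two points above can be condensed into a rule for which morphological voice a finite verb form uses. First, Tocharian has no morphological passive. Second, deponents such as trik- have only middle finite forms. The sketch below is a hypothetical simplification (the name `finite_voice` is ours). It deliberately covers finite forms only, since the participles discussed above behave differently.

```python
VOICES = ("active", "middle", "passive")


def finite_voice(intended: str, deponent: bool = False) -> str:
    """Morphological voice of a finite Tocharian verb form (simplified).

    intended: the meaning aimed at: "active", "middle", or "passive".
    deponent: True for verbs such as AB trik- 'be confused', whose
    finite forms are always middle.
    """
    if intended not in VOICES:
        raise ValueError(f"unknown voice: {intended!r}")
    if deponent:
        # Deponents have no active finite forms at all.
        return "mediopassive"
    # There is no morphological passive, so passive meaning is
    # expressed with mediopassive (middle) forms.
    return "active" if intended == "active" else "mediopassive"
```

Note that "middle" and "passive" meanings collapse onto the same output, which is exactly the point of calling the Tocharian category mediopassive.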
Tocharian distinguishes four basic moods: indicative, subjunctive, optative, and imperative. The indicative is the mood of simple fact, as in the English 'He was an athlete'. The indicative mood has both present and past tense formations in Tocharian. The subjunctive is the mood of hypothesis or supposition, as in the English 'If he were an athlete...' or 'Were he an athlete...'. Tocharian subjunctive forms do not differ based on tense; the subjunctive stem differs from the present tense stem, but nevertheless employs the same endings. Tocharian often uses the subjunctive where English would use a future tense. The optative is the mood of wish or desire, as in English 'Would he were an athlete'. It is formed by adding the typical PIE optative marker, *-i-, to the subjunctive stem. The imperative is the mood of direct command, as in English 'Be an athlete!' The imperative in Tocharian has no distinct stem, but is formed by adding the prefix p- to the preterite or subjunctive stem. Special endings exist only for the second person, in all numbers.
Tocharian distinguishes between past and present tenses only in the indicative. There are two past tenses: imperfect and preterite. The distinction is basically one of verbal aspect. The preterite denotes a simple past, with no connotation of an internal structure to the action; compare English 'he went'. The imperfect, by contrast, denotes an ongoing past, with the expressed connotation of the event having internal structure; compare English 'he was going'.
There is no separate morphological future tense in Tocharian. Though the present can be used to express the future, as in English 'I am going to the store tomorrow', the Tocharian subjunctive frequently acts as a future tense. Tocharian also employs verbal nouns to render the future tense.
One of the most fundamental distinctions in the Tocharian verbal system is that between base and causative. In linguistic terms, the notion of causative, or factitive, has to do with 'making' or 'causing' another verbal action to be done. That is, if we have a verb DO Y, then the causative of the verb denotes MAKE [(X) DO Y] or CAUSE [(X) TO DO Y]. Thus the causative construction applied to English 'paints a picture' is 'someone makes [(someone else) paint a picture]'. The underlying verb need not be eventive (e.g. DO Y), but may also be stative (e.g. BE Y). The causative construct then denotes MAKE [(X) BE Y]. Different languages realize this underlying formation differently: some through concatenation of lexical verbs, as in English; some through separate morphological formations; some through both, or neither. Consider, for example, English 'I am captain'. By the formulation just given, the basic causative rendering might be 'He makes me be captain'. English of course simplifies this to 'He makes me captain'. Thus the stative be captain has causative make captain.
Note also that English has no single way of forming this kind of causative. If we apply the same construction to the stative be strong, then we arrive at a causative make strong. But English has an alternate formation: strengthen. This of course applies only to certain adjectives, e.g. lengthen, widen, shorten, heighten; one cannot say (!)closen, but only make close. This disparity arises because rules that were once regular in the period of Old English no longer apply synchronically in Modern English. A similar situation obtains in Tocharian.
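The bracket notation MAKE [(X) DO Y] nests one predicate inside another, which can be modeled directly as a small data structure. The sketch below is a hypothetical Python illustration (the class names `Predicate` and `Causative` are ours). It only renders the abstract frame and says nothing about how a particular language expresses it morphologically.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Predicate:
    """A basic predicate such as DO Y (eventive) or BE Y (stative)."""
    verb: str        # schematically "DO" (eventive) or "BE" (stative)
    subject: str     # the one who does or is, e.g. "X"
    complement: str  # e.g. "Y"

    def render(self) -> str:
        return f"({self.subject}) {self.verb} {self.complement}"


@dataclass(frozen=True)
class Causative:
    """MAKE [(X) DO Y]: a causer brings about an embedded predicate."""
    causer: str
    caused: Predicate

    def render(self) -> str:
        return f"{self.causer} MAKE [{self.caused.render()}]"
```

For the example in the text, `Causative("he", Predicate("BE", "me", "captain")).render()` gives `"he MAKE [(me) BE captain]"`, the abstract counterpart of 'He makes me be captain'. Keeping the embedded predicate as its own object reflects the point that the causative wraps an existing verbal action rather than replacing it.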
Tocharian achieves this distinction by changes to the verbal stem, giving rise to a distinction between a causative stem and a non-causative, or base, stem for a large number of verbs. The distinction between causative and non-causative runs throughout the present, subjunctive, and preterite systems; the base and causative stems of a given verb, however, are not the same throughout these systems. For example, consider the verb AB tsälp- 'be free of suffering, pass away'.
|Present||Toch. A||Toch. B|
The preterite forms show that one method of forming the causative in Tocharian involves reduplication: A tsälp vs. śaśälpāt. The initial consonant sequence and vowel of the root (subject to phonological modifications) are repeated at the beginning of the word. Another mark of the causative form is palatalization: B tsalpa vs. tsyālpāte. The initial consonant may develop a palatal off-glide.
The other forms, however, show traces of a more common method of forming the causative in Tocharian: addition of the PIE *-sḱ- and *-s- suffixes. These suffixes of course undergo phonological changes within the Tocharian languages. Tocharian A generally employs only the *-s- suffix, which often palatalizes into -ṣ-. Tocharian B employs both suffixes: *-sḱ- palatalizes as -ṣṣ- in CLASS IX, and *-s- as -ṣ- in CLASS VIII (cf. Lesson 4, Sections 19.3 and 19.4). For its part, the PIE *-sḱ- suffix does not have a causative meaning in many of the other Indo-European languages in which it survives, but often has either an inchoative or iterative connotation.
The Tocharian situation is also not as clear-cut as one might hope. One must in fact be careful to distinguish between 'causative' as a purely formal morphological category and 'causative' as an actual semantic category. The structures mentioned above clearly show a formal morphological formation which we may call 'causative' because it does, in fact, have a true factitive meaning for many verbs. This is not true for all verbs, however: many verbs have the structural markings of the causative formation but show no apparent change in meaning from the non-causative (base) formations. In general, one may only say that
|the 'causative' has true factitive value only when the base paradigm has intransitive value.|
That is to say, there are two basic possibilities as to how the base is converted into a causative:
The prose texts of Tocharian exhibit a basic Subject-Object-Verb (SOV) word order. The constraints of meter, among other considerations, often result in divergence from this pattern in poetry. Consider the following examples of the basic SOV word order:
|'The good fame of the strong spreads in the ten directions'.|
From Tocharian B we have
|'We take up the same manner which (our) brother takes'.|
Phrasal structure generally shows the following patterns: Adjective + Noun, Genitive + Noun, Noun + Postposition, the term 'postposition' denoting a 'preposition' that comes after the noun it governs. In general, the structure within Proto-Tocharian was one of Modifier + Head, where 'head' denotes the principal element on which all others in the phrase depend.
Speaking typologically, languages with verb-final syntax tend to employ postpositions rather than prepositions. This tendency is further exhibited in Tocharian through the development of the secondary cases, whose endings apparently began as postpositions. This postpositional quality is still felt strongly enough in Tocharian that it results in "group inflection", whereby only the last element in a string of nouns, or in a string composed of a noun with adjectives, takes the secondary case ending.
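Group inflection behaves like a postposition attached to a whole phrase rather than to each word. The sketch below contrasts it with word-by-word inflection. The function names, the word labels, and the ending "-CASE" are hypothetical placeholders, not Tocharian forms. The sketch also ignores the form in which the non-final words of the group stand.

```python
from typing import List


def group_inflect(group: List[str], ending: str) -> List[str]:
    """Attach a secondary-case ending to the last word of a group only."""
    if not group:
        raise ValueError("a noun group needs at least one word")
    return group[:-1] + [group[-1] + ending]


def inflect_each(group: List[str], ending: str) -> List[str]:
    """Contrast: attach the ending to every word of the group."""
    return [word + ending for word in group]
```

With a noun and its adjective, `group_inflect(["ADJ", "NOUN"], "-CASE")` gives `["ADJ", "NOUN-CASE"]`, whereas `inflect_each` would mark both words. The same holds for a string of nouns such as `["NOUN1", "NOUN2"]`, where only `NOUN2` receives the ending.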