The University of Texas at Austin; College of Liberal Arts
Jonathan Slocum, Interim Director :: PCL 5.112, 1 University Station S5490 :: Austin, TX 78712 :: 512-471-4566

Tocharian Online

Series Introduction

Todd B. Krause and Jonathan Slocum

Tocharian denotes two closely related languages of the Indo-European family, known simply as Tocharian A and Tocharian B. Though quite similar, Tocharian A and B are now considered by most scholars to be two distinct languages, and not merely two dialects of one common language. It is still common practice, however, to use the term Tocharian to refer to both languages when no particular distinction needs to be made. The phrase "the Tocharian language" should be understood in this sense: not as implying any dialectal status of either A or B, but rather as a common shorthand for the two languages in contexts where their differences are irrelevant.

Note: this set of lessons is for systems/browsers with Unicode® support, but fonts for only the Unicode 2.0 character set (including combining diacritics). Lessons rendered in alternate character sets are available via links (Romanized and Unicode 3) in the left margin, and at the bottom of this page.

Geographic Location

The Tocharian languages were first discovered in documents unearthed in expeditions to Chinese Turkestan (East Turkestan, or Xinjiang); the sites are located along what was once the Silk Road. One primary site of Tocharian remains is Turfan. Turfan lies to the north of the Tarim Basin, a depression situated to the northeast of the Takla Makan Desert. A second major source of Tocharian documents is Kučā, a city to the west of Turfan, in the center of the northern boundary of the Takla Makan Desert. A third major site, Tumšuq, forms the extreme western boundary of Tocharian finds. This lies along the northern rim of the desert, between Kučā and Kašgʰar, a city at the desert's western extreme.

Tocharian A is found only in the region of Turfan and Qārāšahr (Karashahr), a nearby oasis located to the west roughly midway between Turfan and Kučā. Tocharian B, by contrast, is found throughout the entirety of this branch of the Silk Road, from Turfan in the east to Tumšuq in the west. Some sources therefore refer to Tocharian A as East Tocharian or Turfanian, since Turfan is at the easternmost extent of the Tocharian sites. Occasionally it is termed Agnean, referring to the Sanskrit designation Agni for Qārāšahr. In this context, Tocharian B is referred to as West Tocharian (though it is found in the east, too), or Kuchean.

Tocharian Texts

The Tocharian A and B documents all date to a period roughly between the sixth and eighth centuries AD. The materials are predominantly translations of Buddhist texts which were in common circulation in Central Asia. This of course is a double-edged sword: on the one hand, the well-known content assists in the process of decipherment; on the other hand, it provides very little information about the people who spoke the language. There are, however, some texts that are not translations of Buddhist progenitors, including monastic and business letters, caravan passes, and graffiti. These secular documents are all written in Tocharian B, leading some scholars to conclude that Tocharian A, by the time the surviving documents were written, may already have been extinct, preserved only as a liturgical language, much as Latin was preserved in Europe. The relative paucity of secular documents, however, necessarily makes such conclusions tentative.

Most of the Tocharian texts were, in origin, parts of monastic library collections, or were left in monasteries as votive offerings. Often, parts of these documents were picked up by the wind and swept out into the desert, making the number of complete Tocharian documents quite small and the manner of recovery at times haphazard. But Tocharian documents are not always found in isolation: Tocharian manuscripts from monastery libraries may lie side by side with Sanskrit manuscripts of the same era; at times texts in other languages, such as Old Persian and Uyghur, are found alongside the Tocharian texts; occasionally Gāndʰārī (a Middle Indic language) documents are found in the same areas, but they date to an earlier era. The texts themselves were predominantly written in a variant of the north Indian Brāhmī script, which was also used for the nearby Middle Iranian language Tumshuqese. There are, however, some Tocharian B documents that employ Manichean script.

The Name "Tocharian"

The name "Tocharian" (German Tocharisch) was proposed first by F. W. K. Müller in 1907, and a year later by the renowned pair of Tocharianists Sieg and Siegling. This name is now thought to be a misnomer, but nevertheless persists due to sheer inertia and the lack of a definitive replacement.

The origin of the name goes back to the discovery of an Old Turkic (Uyghur) text Maitrisimit nom bitig, a translation of the Buddhist Maitreyasamiti-Nāṭaka. The colophon of the work states (Adams, pp. 2-3):

"The sacred book Maitreya-samiti which the Bodhisattva guru ācārya Āryacandra, who was born in the country of Nagaradeśa, had composed in the Twγry languages out of the Indian language, and which the guru ācārya Prajñarakṣita, who was born in Il-bliq, translated from the Twγry language into the Turkish language."

Thus a certain Āryacandra composed the original work here referred to as Maitrisimit nom bitig. This is the same name as that of the composer of the Indic Maitreyasamiti-Nāṭaka, so that the identification of the original text appears to be solid. Apparently this work was then translated into toxrï (Twγry), and from that translated into Old Turkic (Uyghur) by Prajñarakṣita, the translator named in the colophon.

Sieg and Siegling assumed that the intermediary language, toxrï, was the language which is the subject of these lessons, and that the name toxrï is identical to Greek To'kʰaroi and Sanskrit Tukʰāra, which denote inhabitants of Bactria. In some scholarly accounts, the inhabitants of this area were known as the Indo-Scythians, and so Sieg and Siegling proposed the name Tocharisch for the language of the Indo-Scythians.

Research and newly discovered documents have proven this identification to be untenable. The language of the Indo-Scythians is now termed Bactrian, and this language has left a number of loanwords in the languages now termed Tocharian. Tocharian, however, seems not to have left any linguistic traces in Bactrian, so these two cultures were likely never in direct contact.

What is clear is that toxrï is the name of Tocharian from the point of view of the Turks. This term, however, does not appear in the Tocharian languages themselves. In Tocharian A we find ārśi as a term likely denoting the Tocharians themselves. In Tocharian B we find the adjective k[u]śiññe, derived from kuśi (also kuci), the name of a dynasty and state also known from Chinese documents. The Turkic people also mention a language küšän (also küsän) from the Tarim Basin. One bilingual document listing first Sanskrit and then Tocharian B has the pair tokʰarikaḥ : k[u]caññe iṣṭʰake, but this is not as helpful as one would hope because tokʰarika is not the name of any known people.

The Position of Tocharian in Indo-European

In terms of geography, Tocharian is the easternmost of all the ancient Indo-European languages. Tocharian also appears to be a so-called centum language, meaning that the Proto-Indo-European palatovelars became true velars. This, coupled with its geographical location, came as a shock to the historical linguistics community. The other centum languages included the Celtic, Germanic, Italic, Hellenic, and Anatolian branches of Indo-European, all located to the west of the ancient Indo-European speech community. Tocharian thus broke down the geographical interpretation of the centum-satem dialectal division, which held that the centum speakers formed a western dialect group in the original Indo-European community, and the satem speakers an eastern group.

Though certainly Indo-European in heritage, Tocharian shows a number of departures from its historical source. In particular, it is notable for having a reduced phonetic inventory in comparison to other Indo-European languages, the Tocharian stops being all voiceless. Tocharian likewise seems to have lost a number of original cases of the Indo-European noun, and then developed a new and enlarged system. These new cases tend to have what linguists call an agglutinative structure, and this, combined with the reduced stop inventory, leads many scholars to believe that Tocharian went through a long period of contact with western and central Asian language groups such as Uralic, Turkic, and Mongolian.

Acknowledgements

These lessons have benefitted from the assistance of a number of first-rate scholars, in particular Professor Douglas Q. Adams of the University of Idaho and Professor Georges-Jean Pinault of the École Pratique des Hautes Études in Paris. These scholars have graciously offered their assistance in various ways, including the selection and translation of texts and answering the author's many questions on points of Tocharian grammar. They have shown great support and enthusiasm for the project, and the lessons are far better for their generous input. The author extends to them his sincerest thanks. For various reasons, however, the author has occasionally not been able to follow their advice, much to his own peril; he is of course solely responsible for any errors of omission or commission that remain in these lessons.

Tocharian Lessons

Our plan is to present 5 readings for Tocharian A and 5 for Tocharian B. The structures of the two languages are treated simultaneously in the grammar points of each lesson. At the moment, lesson preparation is in its early stages, so changes may be made anywhere at any time. Check back here periodically...

Note: there are great disparities in capability among personal computers in contemporary use. Unfortunately, support for Unicode® and/or the repertoire of fonts installed on your personal computer cannot be detected by a web server! Accordingly, we have prepared multiple versions of each lesson; this set of lessons is for systems/browsers with Unicode support, but fonts for only the Unicode 2.0 character set (including combining diacritics). (You may switch to other versions via links below.) Lessons are still under development, hence subject to change:
  1. Buddhist Puṇyavanta-Jātaka
  2. Buddhist Puṇyavanta-Jātaka (cont'd)
  3. Excerpt from the Maitreyasamiti-nāṭaka (A255/THT888)
  4. Maitreyasamiti-nāṭaka (A255/THT888, cont'd)
  5. Maitreyasamiti-nāṭaka (A255/THT888, cont'd)

Tocharian Resources Elsewhere

Our Web Links page includes pointers to Tocharian resources elsewhere.