Tocharian denotes two closely related languages of the Indo-European family, known simply as Tocharian A and Tocharian B. Though quite similar, Tocharian A and B are now considered by most scholars to be two distinct languages, and not merely two dialects of one common language. It is still common practice, however, to use the term Tocharian to refer to both languages when no particular distinction needs to be made. The phrase "the Tocharian language" should be understood in this sense: not as implying any dialectal status of either A or B, but rather as a convenient shorthand for the two languages in contexts where their differences are irrelevant.
Note: this set of lessons is for systems/browsers lacking Unicode® support, or having less than full Unicode 2.0 font support. Lessons rendered in alternate character sets are available via links (Unicode 2 and Unicode 3) in the left margin, and at the bottom of this page.
The Tocharian languages were first discovered in documents unearthed during expeditions to Chinese Turkestan (East Turkestan, or Xinjiang); the sites are located along what was once the Silk Road. One primary site of Tocharian remains is Turfan, which lies in a depression to the north of the Tarim Basin, northeast of the Takla Makan Desert. A second major source of Tocharian documents is Kuc^a:, a city to the west of Turfan, in the center of the northern boundary of the Takla Makan Desert. A third major site, Tums^uq, marks the extreme western boundary of Tocharian finds. It lies along the northern rim of the desert, between Kuc^a: and Kas^ghar, a city at the desert's western extreme.
Tocharian A is found only in the region of Turfan and Qa:ra:s^ahr (Karashahr), an oasis to the west, roughly midway between Turfan and Kuc^a:. Tocharian B, by contrast, is found throughout this branch of the Silk Road, from Turfan in the east to Tums^uq in the west. Some sources therefore refer to Tocharian A as East Tocharian or Turfanian, since Turfan is at the easternmost extent of the Tocharian sites. Occasionally it is termed Agnean, referring to the Sanskrit designation Agni for Qa:ra:s^ahr. In this context, Tocharian B is referred to as West Tocharian (though it is found in the east, too), or Kuchean.
The Tocharian A and B documents all date to a period roughly between the sixth and eighth centuries AD. The materials are predominantly translations of Buddhist texts which were in common circulation in Central Asia. This is a double-edged sword: on the one hand, the well-known content assists in the process of decipherment; on the other hand, it provides very little information about the people who spoke the languages. There are, however, some texts that are not translations of Buddhist originals, including monastic and business letters, caravan passes, and graffiti. These secular documents are all written in Tocharian B, leading some scholars to conclude that Tocharian A, by the time the surviving documents were written, may already have been extinct as a spoken language and preserved only as a liturgical language, much as Latin was preserved in Europe. The relative paucity of secular documents, however, necessarily makes such conclusions tentative.
Most of the Tocharian texts were, in origin, parts of monastic library collections, or were left in monasteries as votive offerings. Often, parts of these documents were picked up by the wind and swept out into the desert, making the number of complete Tocharian documents quite small and the manner of recovery at times haphazard. But Tocharian documents are not always found in isolation. Tocharian manuscripts from monastery libraries may lie side by side with Sanskrit manuscripts of the same era; at times texts in other languages, such as Middle Persian and Uyghur, are found alongside the Tocharian texts; occasionally documents in Ga:ndha:ri: (a Middle Indic language) are found in the same areas, but they date to an earlier era. The texts themselves were predominantly written in a variant of the north Indian Bra:hmi: script, which was also used for the nearby Middle Iranian language Tumshuqese. There are, however, some Tocharian B documents that employ Manichean script.
The name "Tocharian" (German Tocharisch) was proposed first by F. W. K. Müller in 1907, and a year later by the renowned pair of Tocharianists Sieg and Siegling. This name is now thought to be a misnomer, but nevertheless persists due to sheer inertia and the lack of a definitive replacement.
The origin of the name goes back to the discovery of an Old Turkic (Uyghur) text, the Maitrisimit nom bitig, a translation of the Buddhist Maitreyasamiti-Na:t.aka. The colophon of the work states (Adams, pp. 2-3):
"The sacred book Maitreya-samiti which the Bodhisattva guru a:ca:rya A:ryacandra, who was born in the country of Nagarades'a, had composed in the Twg'ry language out of the Indian language, and which the guru a:ca:rya Prajn'araks.ita, who was born in Il-bliq, translated from the Twg'ry language into the Turkish language."
Thus a certain A:ryacandra composed the original of the work here referred to as Maitrisimit nom bitig. The name matches that of the composer of the Indic Maitreyasamiti-Na:t.aka, so the identification of the original text appears to be solid. Apparently this work was then translated into toxrï (Twg'ry), and from that version it was translated into Old Turkic (Uyghur) by Prajn'araks.ita, the author of the present text.
Sieg and Siegling assumed that the intermediary language, toxrï, was the language which is the subject of these lessons, and that the name toxrï is to be identified with Greek To'kharoi and Sanskrit Tukha:ra, both of which denote inhabitants of Bactria. In some scholarly accounts, the inhabitants of this area were known as the Indo-Scythians, and so Sieg and Siegling proposed the name Tocharisch for what they took to be the language of the Indo-Scythians.
Research and newly discovered documents have proven this identification to be untenable. The language of the Indo-Scythians is now termed Bactrian, and this language has left a number of loanwords in the languages now termed Tocharian. Tocharian, however, seems not to have left any linguistic traces in Bactrian, so these two cultures were likely never in direct contact.
What is clear is that toxrï is the name of Tocharian from the point of view of the Turks. This term, however, does not appear in the Tocharian languages themselves. In Tocharian A we find a:rs'i as a term likely denoting the Tocharians themselves. In Tocharian B we find the adjective k[u]s'in'n'e, derived from kus'i (also kuci), the name of a dynasty and state also known from Chinese documents. Turkic sources also mention a language küs^än (also küsän) from the Tarim Basin. One bilingual document listing first Sanskrit and then Tocharian B has the pair tokharikah. : k[u]can'n'e is.t.hake, but this is not as helpful as one would hope, because tokharika is not the name of any known people.
In terms of geography, Tocharian is the easternmost of all the ancient Indo-European languages. Tocharian also appears to be a so-called centum language, meaning that the Proto-Indo-European palatovelars merged with the plain velars rather than developing into sibilants or affricates, as they did in the satem languages. This, coupled with its geographical location, came as a shock to the historical linguistics community. The other centum languages belong to the Celtic, Germanic, Italic, Hellenic, and Anatolian branches of Indo-European, all located in the west of the ancient Indo-European speech area. Tocharian thus broke down the geographical interpretation of the centum-satem dialectal division, which held that the centum speakers formed a western dialect group in the original Indo-European community, and the satem speakers an eastern group.
Though certainly Indo-European in heritage, Tocharian shows a number of departures from its historical source. In particular, it is notable for having a reduced phonemic inventory in comparison to other Indo-European languages: the Tocharian stops are all voiceless. Tocharian likewise seems to have lost a number of the original cases of the Indo-European noun, and then to have developed a new and enlarged case system. The new cases tend to have what linguists call an agglutinative structure, in which case endings are attached as separable units to a common stem. This, combined with the reduced stop inventory, leads many scholars to believe that Tocharian went through an extended period of contact with western and central Asian language groups such as Uralic, Turkic, and Mongolic.
These lessons have benefitted from the assistance of a number of first-rate scholars, in particular Professor Douglas Q. Adams of the University of Idaho and Professor Georges-Jean Pinault of the École Pratique des Hautes Études in Paris. These scholars have graciously offered their assistance in various ways, including selecting and translating texts and answering the author's many questions on points of Tocharian grammar. They have shown great support and enthusiasm for the project, and the lessons are far better for their generous input. The author extends to them his sincerest thanks. For various reasons, however, the author has occasionally not been able to follow their advice, at his own peril; he is of course solely responsible for any errors of omission or commission that remain in these lessons.
Our plan is to present 5 readings for Tocharian A and 5 for Tocharian B. The structures of the two languages are treated simultaneously in the grammar points of each lesson. At the moment, lesson preparation is in its early stages, so changes may be made anywhere at any time. Check back here periodically...
Our Web Links page includes pointers to Tocharian resources elsewhere.