Tocharian denotes two closely related languages of the Indo-European family, known simply as Tocharian A and Tocharian B. Though quite similar, Tocharian A and B are now considered by most scholars to be two distinct languages, and not merely two dialects of one common language. It is still common practice, however, to use the term Tocharian to refer to both languages when no particular distinction needs to be made. The phrase "the Tocharian language" should be understood in this sense: not as implying any dialectal status of either A or B, but rather as a common shorthand for the two languages in contexts where their differences are irrelevant.
Note: this set of lessons is for systems/browsers with Unicode® support and fonts spanning the Unicode 3 character set relevant to (Romanized) Tocharian. Lessons rendered in alternate character sets are available via links (Romanized and Unicode 2) in the left margin, and at the bottom of this page.
The Tocharian languages were first discovered in documents unearthed in expeditions to Chinese Turkestan (East Turkestan, or Xinjiang); the sites are located along what was once the Silk Road. One primary site of Tocharian remains is Turfan. Turfan lies to the north of the Tarim Basin, a depression situated to the northeast of the Takla Makan Desert. A second major source of Tocharian documents is Kučā, a city to the west of Turfan, in the center of the northern boundary of the Takla Makan Desert. A third major site, Tumšuq, forms the extreme western boundary of Tocharian finds. This lies along the northern rim of the desert, between Kučā and Kašgʰar, a city at the desert's western extreme.
Tocharian A is found only in the region of Turfan and Qārāšahr (Karashahr), a nearby oasis located to the west roughly midway between Turfan and Kučā. Tocharian B, by contrast, is found throughout the entirety of this branch of the Silk Road, from Turfan in the east to Tumšuq in the west. Some sources therefore refer to Tocharian A as East Tocharian or Turfanian, since Turfan is at the easternmost extent of the Tocharian sites. Occasionally it is termed Agnean, referring to the Sanskrit designation Agni for Qārāšahr. In this context, Tocharian B is referred to as West Tocharian (though it is found in the east, too) or Kuchean.
The Tocharian documents all date to a period roughly between the sixth and eighth centuries AD. The materials are predominantly translations of Buddhist texts which were in common circulation in Central Asia. This of course is a double-edged sword: on the one hand, the well-known content assists in the process of decipherment; on the other hand, it provides very little information about the people who spoke the language. There are, however, some texts that are not translations of Buddhist originals, including monastic and business letters, caravan passes, and graffiti. These secular documents are all written in Tocharian B, leading some scholars to conclude that Tocharian A, by the time the surviving documents were written, may already have been an extinct language, preserved only as a liturgical language, much as Latin was in Europe. The relative paucity of such secular documents, however, necessarily makes such conclusions tentative.
Most of the Tocharian texts were originally part of monastic library collections, or were left in monasteries as votive offerings. Often, parts of these documents were picked up by the wind and swept out into the desert, making the number of complete Tocharian documents quite small and making the manner of recovery at times haphazard. The Tocharian documents are not found in isolation. Tocharian manuscripts from monastery libraries naturally lie side by side with Sanskrit manuscripts of the same era. At times texts in other languages, such as Middle Persian and Uyghur, are found alongside the Tocharian texts. Occasionally documents in Gāndʰārī, a Middle Indic language, are found in the same areas, but they date to an earlier era. The texts themselves were predominantly written in a variant of the north Indian Brāhmī script, which was also used for the nearby Middle Iranian language Tumshuqese. There are, however, some Tocharian B documents that employ Manichean script (used to spread writings of the Manichean religion, which originated in the Mesopotamian region).
The name "Tocharian" (German Tocharisch) was proposed first by F. W. K. Müller in 1907, and a year later by the renowned pair of Tocharianists Sieg and Siegling. This name is now thought to be a misnomer, but nevertheless remains due to sheer inertia and the lack of a definitive replacement.
The origin of the name goes back to the discovery of an Old Turkic (Uyghur) text Maitrisimit nom bitig, a translation of the Buddhist Sanskrit Maitreyasamiti-Nāṭaka. The colophon of the work states (Adams, pp. 2-3):
"The sacred book Maitreya-samiti which the Bodhisattva guru ācārya Āryacandra, who was born in the country of Nagaradeśa, had composed in the Twγry languages out of the Indian language, and which the guru ācārya Prajñarakṣita, who was born in Il-bliq, translated from the Twγry language into the Turkish language."
Thus a certain Āryacandra composed the original work here referred to as Maitrisimit nom bitig. This is the same name as that of the composer of the Indic Maitreyasamiti-Nāṭaka, so that the identification of the original text appears to be solid. Apparently this work was then translated into toxrï (Twγry), and from that translated by the author of the present version, Prajñarakṣita, into Old Turkic (Uyghur).
Sieg and Siegling assumed that the intermediary language, toxrï, was the language which is the subject of these lessons, and that this language toxrï is identical to Greek To'kʰaroi and Sanskrit Tukʰāra, denoting inhabitants of Bactria. In some scholarly accounts, the inhabitants of this area were known as the Indo-Scythians, and so Sieg and Siegling proposed the name Tocharisch for the language of the Indo-Scythians.
Research and newly discovered documents have proven this identification to be untenable. The language of the Indo-Scythians is now termed Bactrian, and this language has left a number of loanwords in the languages now termed Tocharian. Tocharian, however, seems not to have left any linguistic traces in Bactrian, so these two cultures were likely never in direct contact.
What is clear is that toxrï is the name of Tocharian from the point of view of the Turkic people. This term, however, does not appear in the Tocharian languages themselves. In Tocharian A we find ārśi as a term likely denoting the Tocharians themselves. In Tocharian B we find the adjective kuśiññe, derived from kuśi (also kuci), the name of a dynasty and state also known from Chinese documents. The Turkic people also mention a language küšän (also küsän) from the Tarim Basin. One bilingual document listing first Sanskrit and then Tocharian B has the pair tokʰarikaḥ : kucaññe iṣṭʰake, but this is not as helpful as one would hope, since tokʰarika (Gk. To'kʰaroi ?) is not the name of any known people.
In terms of geography, Tocharian is the easternmost of all the ancient Indo-European languages. Tocharian also appears to be a so-called centum language, meaning that the Proto-Indo-European palatovelars became true velars. This, coupled with its geographical location, came as a shock to the historical linguistics community. The other centum languages known at the time belonged to the Celtic, Germanic, Italic, and Hellenic branches of Indo-European, all located to the west of the ancient Indo-European speech community. Tocharian thus broke down the geographical interpretation of the centum-satem dialectal division, which held that the centum speakers formed a western dialect group in the original Indo-European community, and the satem speakers an eastern group.
Though certainly Indo-European in heritage, Tocharian shows a number of departures from its historical source. In particular, it is notable for having a reduced phonetic inventory in comparison to other Indo-European languages: the Tocharian stop consonants (e.g. p, t, k) are all voiceless (i.e. there are no equivalents to English b, d, g). Tocharian likewise seems to have lost a number of original cases of the Indo-European noun, and then developed a new and enlarged system. The new cases tend to have what linguists call an agglutinative structure, similar to that found in Japanese or in Turkic languages (among others). This, combined with the reduced inventory of stop consonants, leads many scholars to believe that Tocharian went through a long period of extended contact with western and central Asian language groups such as Uralic, Turkic, and Mongolian. (The above technical terms will be explained in the course of the lessons, but the reader may at any time use the Table of Contents to locate discussion of a particular topic.)
These lessons have benefitted from the assistance of a number of first-rate scholars, in particular Professor Douglas Q. Adams of the University of Idaho and Professor Georges-Jean Pinault of the École Pratique des Hautes Études in Paris. These scholars have graciously offered their assistance in various ways, including the selection and translation of texts and answering the author's many questions on points of Tocharian grammar. They have shown great support and enthusiasm for the project, and the lessons are far better for their generous input. No less important have been the many suggestions for improvement made by Dr. Stephie Nikoloudis. The author extends to all of them his sincerest thanks. For various reasons, however, the author has occasionally not been able to follow their advice, much to his own peril; he is of course solely responsible for any errors of omission or commission that remain in these lessons.
Our plan for this lesson series is to present five readings for Tocharian A and five for Tocharian B. Every reading contains a complete grammatical analysis of each word. As part of the series design, we employ only original texts; this of course implies that the grammatical structures employed in any given text may come from any aspect of the Tocharian grammatical system, whether covered in previous or subsequent lessons. Thus the reader may encounter grammatical analyses employing terminology that is not yet familiar. We urge the reader to glean whatever information is possible from the analysis and accompanying translation, but not to get bogged down in the details. As one progresses through the lessons, these points will become clearer. For the grasshopper who likes to leap about, we point to the Table of Contents where one can jump to the description of a specific grammatical topic.
Because the Tocharian languages are so similar with respect to grammar and morphology, discussing the grammar of one language in isolation from the other would prove redundant. Therefore the structures of the two languages are treated simultaneously in the grammar points of each lesson. If the reader chooses, he or she may simply ignore the details of one language while concentrating on the other.
Note: there are great disparities in capability among personal computers in contemporary use. Unfortunately, support for Unicode® and/or the repertoire of fonts installed on your personal computer cannot be detected by a web server! Accordingly, we have prepared multiple versions of each lesson; this set of lessons is for systems/browsers with Unicode support and fonts spanning the Unicode 3 character set relevant to Tocharian. (You may switch to other versions via links below.) Lessons are still under development, hence subject to change at any time: