The University of Texas at Austin; College of Liberal Arts
Hans C. Boas, Director :: PCL 5.556, 1 University Station S5490 :: Austin, TX 78712 :: 512-471-4566

Indo-European Languages

Evolution and Locale Maps

Jonathan Slocum

All living languages evolve over time, adding & losing vocabulary, morphological behavior, and syntactic structures, and changing in the ways they are pronounced by their speakers. Even without knowing how or why these evolutionary mechanisms operate, one can still get a feel for their effects; for example, they account for the differences between American and British English, and for the fact that neither Americans nor Brits can understand Beowulf at all without first being taught how to read the Old English language in which it was composed. Even the writings of Shakespeare -- much more recent than Beowulf -- can be difficult for modern English speakers to interpret. The field of study that concerns itself with language evolution is called historical linguistics.

A large number of related languages form what is called the Indo-European macrofamily. These languages all evolved from a common ancestral tongue called Proto-Indo-European (PIE), spoken ca. 6,000 years ago by a people living (by "traditional" hypothesis) somewhere in the general vicinity of the Pontic Steppe north of the Black Sea and east to the Caspian -- an area that, perhaps not accidentally, seems to coincide with the land of the ancient Scythians, from the Ukraine across far southwestern Russia to western Kazakhstan. (N.B. Many claims on this page are debated, in their details, but on the whole they seem best to fit the evidence and are accepted by most scholars; herein, we shall not bother to acknowledge the myriad debates but instead present a broad-brush picture for a general audience.)

Proto-Indo-European speakers grew in number and influence -- they are credited with the domestication of horses and the invention of the chariot, among many other innovations -- and spread east & west, north & south. But PIE died out before its speakers had any writing system: as Indo-Europeans expanded from the ancestral homeland and brought forth new generations, PIE evolved, first into disparate dialects, and then into mutually incomprehensible daughter languages. Ten "proto-language" families are identified today: using what historical linguists call the comparative method, their probable forms (and the probable form of Proto-Indo-European itself) can be reconstructed from similarities and differences among descendant languages attested in inscriptions and literary & religious texts. (Such written records began to appear about a thousand years after PIE was last spoken.) For a sketch of the evolution of PIE into its major proto-languages, see Evolution of IE Families.

The Indo-European proto-languages themselves evolved, each giving rise to its own family of languages. Each family is identified with the proto-language from which it sprang; these families are conventionally listed roughly from west to east, according to the homelands their speakers came to occupy. The ten families, linked to modern maps of their homeland areas, are:

  1. Celtic, with languages spoken in the British Isles, in Spain, and across southern Europe to central Turkey;
  2. Germanic, with languages spoken in England and throughout Scandinavia & central Europe to Crimea;
  3. Italic, with languages spoken in Italy and, later, throughout the Roman Empire including modern-day Portugal, Spain, France, and Romania;
  4. Balto-Slavic, with Baltic languages spoken in Latvia & Lithuania, and Slavic languages throughout eastern Europe plus Belarus, the Ukraine, and Russia;
  5. Balkan (exceptional, as discussed below), with languages spoken mostly in the Balkans and far western Turkey;
  6. Hellenic, spoken in Greece and the Aegean Islands and, later, in other areas conquered by Alexander (but mostly around the Mediterranean);
  7. Anatolian, with languages spoken in Anatolia, a.k.a. Asia Minor, i.e. modern Turkey;
  8. Armenian, spoken in Armenia and nearby areas including eastern Turkey;
  9. Indo-Iranian, with languages spoken from India through Pakistan and Afghanistan to Iran and Kurdish areas of Iraq and Turkey;
  10. Tocharian, spoken in the Tarim Basin of Xinjiang, in far western China.

Each table that follows presents a highly schematic sketch of the evolutionary paths leading from the family ancestor to later, attested languages -- up to the present time, in the case of families that did not entirely die out. (Anatolian and Tocharian are the only known families that are now extinct.) By highly schematic we mean, for example, that dates are very approximate: we adopt, for sheer presentation convenience, quite arbitrary ranges of 500 or 1000 years that have little to do with accurate dates even when these might be known, which is seldom. What is important is that the general picture is instructive; for details the reader is referred to the vast literature of historical linguistics, now well over 200 years in the making and brimming with hypotheses, supporting arguments, and disagreements major & minor.

In the tables that follow, columns show time ranges (mostly of 500 or 1000 years), reading left to right; successive rows display groupings of sub-families (in bold face) and the languages within them (italicized if dead), and each row, read left to right, shows not just a chronological but an evolutionary sequence (except for the Balkan languages). After each family section heading, important points related to the table that follows are briefly surveyed; for the reader's convenience, most geographic names are in modern English. Note: even where the surviving languages in a family number in the hundreds and are spoken by over a billion people (as in the case of the Indo-Iranian family), only a very few languages are selected for illustration here. For every family except Balkan, there are one or more languages for which online texts & lessons are or will be available in our Early Indo-European Online (EIEOL) series; links are provided from those languages to their series introductions.

Celtic

Proto-Celtic speakers moved generally west from the PIE homeland, probably alongside groups from the Italic branch, spreading across southern Europe into central Turkey, northern Italy, France, Spain, and eventually the British Isles. As centuries passed, their language evolved into one group of languages labelled Continental (spoken by the "Gauls" across southern Europe, who are mentioned by Julius Caesar among others) and another labelled Insular (spoken in the British Isles). Continental Celts later adopted Latin (or Greek, in the case of those in Turkey), and the Continental Celtic languages, attested from the 6th century B.C., were lost. Insular Celtic split into a Goidelic subgroup that developed in Ireland and a Brythonic subgroup that developed in England & Wales. Later, Goidelic Celts migrated to Scotland, while Brythonic Celts, under pressure from the Anglo-Saxons, returned to the Continent and settled in Brittany, at the western tip of France.

2000-1000 BC | 1000-500 BC | 500-1 BC    | 1-500 AD    | 500-1000 AD | 1000-1500 AD   | 1500-2000 AD
Proto-Celtic | Continental | Celtiberian
             | Insular     | Goidelic    | Ogham Irish | Old Irish   | Middle Irish   | Irish Gaelic
             |             |             |             |             |                | Scots Gaelic
             |             | Brythonic   |             | Old Welsh   | Middle Welsh   | Welsh
             |             |             |             | Old Cornish | Middle Cornish | Cornish
             |             |             |             | Old Breton  | Middle Breton  | Breton

Germanic

The Germanic tribes generally followed behind the Celts, but moved somewhat further north. Their language developed into three groups of tongues, labelled East, North, and West for their geographic distribution, with Runic now considered the likely ancestor of the latter two. Gothic, known chiefly from a 4th-century translation of the Bible, is the only attested East Germanic language, although Vandalic is known to have been spoken by the Vandals, who migrated across the fading Roman Empire through Spain to North Africa (see also map of the Germanic Kingdoms in 526). Most of the Goths blended into the Empire and their language was replaced by local Latin dialects; some, however, migrated east into Crimea, where their language survived to the 16th century.

Limited amounts of "Northwest Germanic" text, carved in Runic script, survive from the 1st/2nd centuries A.D.; later, the North Germanic languages developed in far northern Europe (primarily the Scandinavian countries Denmark, Sweden, and Norway, and their islands). Old Norse was the language of the Vikings, who carried it from Scandinavia to Iceland when they settled there.

West Germanic languages developed in two main groups, one ("High German") at higher elevations, in southern Germany, Switzerland, and Austria, and the other ("Low German") further north and along the coast, including the Netherlands and Belgium. Modern German evolved from the former; modern English, via Old English a.k.a. Anglo-Saxon (see the map of Angles & Saxons about 600 A.D.), from the latter. (The term "Pennsylvania Dutch" is a modern misnomer: the original speakers came from central & southern Germany, even Switzerland -- not from the Netherlands.)

2000-500 BC    | 500-1 BC | 1-500 AD | 500-1000 AD     | 1000-1500 AD       | 1500-2000 AD
Proto-Germanic | East     | Gothic   |                 | Crimean Gothic
               | Runic    | North    | Old Norse       | Old Icelandic      | Icelandic
               |          |          |                 | Old Norwegian      | Norwegian
               |          |          |                 | Old Swedish        | Swedish
               |          |          |                 | Old Danish         | Danish
               |          | West     | Old High German | Middle High German | German
               |          |          |                 |                    | Swiss German
               |          |          |                 |                    | Pennsylvania Dutch
               |          |          | Old Saxon       | Middle Low German  | Low German
               |          |          | Old English     | Middle English     | English
               |          |          | Old Dutch       | Middle Dutch       | Dutch

Italic

The Italic peoples began their descent into the Italian peninsula around the 2nd millennium B.C. Two subgroups developed from Proto-Italic -- Sabellic and Latino-Faliscan -- both attested by 7th-century B.C. inscriptions (the former in Umbrian, the latter in Faliscan). But the growing strength of the Latin speakers, culminating in the Roman Empire, led to the extinction of most competing tongues in Italy (and many elsewhere, for example Continental Celtic). With the collapse of the Empire, the provincial Vulgar Latin dialects, rather than Classical Latin, survived and in time developed into the Romance languages (see map of the European Provinces of Rome).

2000-1000 BC | 1000-500 BC     | 500-1 BC | 1-500 AD        | 500-1000 AD  | 1000-1500 AD   | 1500-2000 AD
Proto-Italic | Sabellic        | Oscan
             | Latino-Faliscan | Faliscan
             |                 | Latin    | Classical Latin | Vulgar Latin |                | Romanian
             |                 |          |                 |              | Old Italian    | Italian
             |                 |          |                 |              | Old French     | French
             |                 |          |                 |              | Old Provençal  | Provençal
             |                 |          |                 |              | Old Spanish    | Spanish
             |                 |          |                 |              | Old Portuguese | Portuguese

Balto-Slavic

While the Balto-Slavic languages of eastern Europe, and especially the Baltic languages, are attested only late, even by Indo-European standards, several of their characteristics strongly suggest that they are highly conservative (Baltic most of all) and retain features akin to those of Proto-Indo-European. No Slavic language is attested until the mid-9th century A.D. (Old Church Slavonic), and no Baltic language until the 14th century (some Old Prussian words & phrases). Old Church Slavonic and Old Prussian became extinct, but their Slavic and Baltic sibling languages survived.

2000-1000 BC       | 1000-1 BC    | 1-500 AD | 500-1000 AD | 1000-1500 AD        | 1500-2000 AD
Proto-Balto-Slavic | Proto-Baltic |          | Western     | Old Prussian
                   |              |          | Eastern     | Old Lithuanian      | Lithuanian
                   |              |          |             | Old Latvian         | Latvian
                   | Proto-Slavic |          | South       | Old Church Slavonic
                   |              |          |             | Eastern South       | Bulgarian
                   |              |          |             | Western South       | Serbian
                   |              |          | East        | Old Russian         | Russian
                   |              |          | West        | Old Polish          | Polish

Balkan

The "family" of Balkan languages (see also the old map of Macedonia, Thrace, Illyria, Moesia and Dacia) is exceptional in that there are far too few early texts to support strong hypotheses about genetic relationships among its erstwhile members. This doesn't mean there are no hypotheses -- they are, in fact, numerous! -- but it does mean that no firm conclusions can be drawn, because the evidence is paltry or absent. For example, the "traditional" hypothesis is that Illyrian is the ancestor of Albanian; but as there are no native texts in Illyrian, little can be said about it with any certainty. Nevertheless, the two appear to differ in a fundamental respect that has long marked a crucial distinction in Indo-European linguistics: the "centum" vs. "satem" split, named for the Latin and Avestan words for 'hundred'. The languages in the table below are grouped into a "family" for reasons as much geographic as linguistic, and their left-to-right chronological sequence should not be taken to suggest an evolutionary sequence.

2000-1000 BC | 1000-500 BC | 500-1 BC | 1-500 AD | 500-1000 AD | 1000-1500 AD | 1500-2000 AD
Proto-Balkan | Phrygian    | Thracian | Dacian   |             |              | Albanian

Hellenic

For all practical purposes, the Hellenic family is represented by a single language spoken in Greece and the Aegean Islands: Greek, which is attested in a number of dialects spanning more than three millennia. The oldest texts, in Mycenaean Greek, pre-date the 14th century B.C. (see map of Mycenaean Greece) and were written in the script known as Linear B. But an invasion of (illiterate?) Dorian tribes ca. 1100 B.C. was followed by the collapse of Mycenaean civilization and the loss of the art of Greek writing. A few hundred years later the Greeks adapted a Phoenician script -- adding, for the first time, letters representing vowels. This script developed into what we know as the Greek alphabet, which formed the early basis of the Etruscan & Roman alphabets among others (a more modern example being Cyrillic).

2000-1500 BC | 1500-1000 BC | 1000-500 BC   | 500-1 BC    | 1-500 AD    | 500-1500 AD  | 1500-2000 AD
Proto-Greek  | Mycenaean    | Ancient Greek | Attic Greek | Koine Greek | Middle Greek | Greek
             |              | Homeric Greek
             |              | Doric Greek

Anatolian

The Anatolian family includes the oldest attested Indo-European languages: some Hittite documents are dated as early as the 18th century B.C. It is thought to have been the first branch of Indo-European to separate from PIE, and it was also the first branch [known to us] to become extinct, being replaced by Greek ca. 2nd/1st century B.C. Buried and lost until modern times, Hittite cuneiform tablets were first unearthed in the early 20th century in north-central Turkey, and helped revolutionize Indo-European linguistics. A sister language, Luwian, was probably spoken in Homer's Troy, located southwest of the Dardanelles.

2500-2000 BC    | 2000-1500 BC | 1500-1000 BC       | 1000-500 BC | 500-1 BC | 1-1000 AD | 1000-2000 AD
Proto-Anatolian | Old Hittite  | Middle/New Hittite |             | Lydian
                | Luwian       |                    |             | Lycian

Armenian

The earliest documentary evidence concerning the Armenians is a 6th-century B.C. inscription at Behistun by the Persian king Darius I. Herodotus, writing a century later, stated that the Armenians had lived in Thrace and moved into Phrygia, from which they crossed into the [later] territory of Armenia. But although the Armenians are known to history as a people, their language was first attested a full thousand years later, in a translation of the Bible made after Mesrop, a Christian monk, invented a suitable alphabet; by that time, Classical Armenian showed strong influence from Iranian languages, especially Parthian. Other loan words, from Anatolian languages, attest to an early Armenian presence in western and central Turkey. Because of these manifold linguistic influences, evidenced for example by many isoglosses shared with Greek, it is difficult to argue for a close connection with any other particular Indo-European language family.

2000-1000 BC   | 1000-500 BC | 500-1 BC | 1-500 AD | 500-1000 AD        | 1000-1500 AD    | 1500-2000 AD
Proto-Armenian |             |          |          | Classical Armenian | Middle Armenian | Armenian

Indo-Iranian

Proto-Indo-Iranian speakers moved east & south from the PIE ancestral homeland. Then, still in prehistoric times, the Indo-Iranian family split into Indic and Iranian branches, labelled for their early literary centers (roughly speaking) in India and Iran.

Although no written Indic documents as old as the Hittite texts exist, the language of the Rigveda is thought to be well preserved from a form dating perhaps to the early 2nd millennium B.C. In particular, when Panini composed his grammar of Sanskrit ca. 400 B.C., Rigvedic was already archaic and, in many respects, no longer understood -- a situation analogous to modern English speakers' difficulty with the language of Beowulf. Even some of the poetic structures of the Rigveda were no longer recognized -- again, analogous to our modern ignorance of Old English poetic structures. Nevertheless, oral transmission of liturgy and poetry can be, and for the Rigveda is believed to have been, remarkably accurate. Accordingly, early Indic compositions can be studied with almost as much confidence as later, written texts in Pali, Prakrit, etc.

Somewhat like Rigvedic (a close descendant of Proto-Indic), Avestan (a descendant of Proto-Iranian) survived for centuries as memorized religious compositions before these were written down. The Avestan language itself, then, is of unknown but great age. Although it is still important in Zoroastrian liturgy, it has no living descendants. Two languages closely related to it, Bactrian and Old Persian, have many modern descendants, including Pashto and Farsi.

2000-1500 BC       | 1500-1000 BC  | 1000-500 BC | 500-1 BC    | 1-500 AD | 500-1000 AD | 1000-1500 AD | 1500-2000 AD
Proto-Indo-Iranian | Proto-Indic   | Rigvedic    | Sanskrit
                   |               |             | Pali        | Prakrit  | Apabhramsha | Old Hindi    | Hindi/Urdu
                   | Proto-Iranian | Avestan
                   |               | Eastern     | Bactrian    |          | Sogdian     |              | Pashto
                   |               | Western     | Old Persian |          | Pahlavi     |              | Farsi

Tocharian

Like the Anatolian language family, the Tocharian family is extinct; and, as with Anatolian, Tocharian texts were deciphered in the early 20th century, and their study has prompted major revisions of theories about early Indo-European (IE) languages. Prominent among these revisions is the recognition that Tocharian shows some fundamental affinities with the more western language families, such as Celtic, Italic, Hellenic, and especially Germanic, that distinguish it from the geographically much closer eastern families, such as Indo-Iranian or even Balto-Slavic. This does not mean that Tocharian is particularly close to any one western European family, though many individual parallels have been drawn, but only that it seems closer to them as a group than to the eastern IE languages. How western European (?) Tocharian speakers came to live in the Tarim Basin in Xinjiang, China, remains an unresolved mystery. It is noteworthy, however, that the Silk Road was established through that area around the same time Tocharian speakers seem to have arrived: the appearance of a highly mobile European people at the inception of a major Eurasian trade link might not be a coincidence.

It is by no means certain that these western European affinities demonstrate a prior western European presence: similarities sometimes arise by chance; and even if chance is ruled out, they may reflect sufficient linguistic contact, before the IE break-up, between Proto-Tocharian speakers and others destined to live in western Europe. It seems rather likely that the Tocharian peoples migrated directly east from the PIE homeland and discovered exotic trade goods awaiting further exploitation. Tocharian, itself unattested, later evolved into two separate languages, conventionally denoted Tocharian A (eastern, a.k.a. Turfanian) and Tocharian B (western, a.k.a. Kuchean), both located along the northern rim of the Tarim Basin; in the 6th-8th century A.D. texts discovered so far, A seems to have been in liturgical use only, while B was still a living vernacular. Evidence for yet a third offshoot, Tocharian C, somewhat older than the other two, has been unearthed along the southern rim of the Tarim Basin.

2000-1000 BC    | 1000-500 BC | 500-1 BC | 1-500 AD    | 500-1000 AD | 1000-1500 AD | 1500-2000 AD
Proto-Tocharian | Tocharian?  |          |             | Tocharian A
                |             |          |             | Tocharian B
                |             |          | Tocharian C