The University of Texas at Austin; College of Liberal Arts
Hans C. Boas, Director :: PCL 5.556, 1 University Station S5490 :: Austin, TX 78712 :: 512-471-4566

Indo-European Lexicon

PIE Etyma and IE Reflexes

Jonathan Slocum

Our project goal is to produce a large, heavily indexed collection of Indo-European (IE) "reflex" words having their inferred etymological origins in the reconstructed ancestral language Proto-Indo-European (PIE). By "large," we mean many tens of thousands of reflex entries (the present size already exceeds 60,000). By "heavily indexed," we mean that every unique reflex spelling in the collection is indexed alphabetically within its language and family (there are already nearly 70,000 indexed reflex spellings across almost 200 IE languages/dialects), so that one can click on browser links to see all information associated with each reflex word and its relationship to other words. There will be no database search engine, nor any need for one.

Our lexicon originally comprised nothing more than a collection of most of the main entries ("etyma") in Julius Pokorny's massive Indogermanisches etymologisches Wörterbuch (IEW), along with our own glosses of their meanings and chains of cross-references derived from IEW. Our project's next step required, among other things, human editing of massive content assembled via software from electronic sources; additional content is now being acquired from selected print and online sources. For those who are interested, an online paper outlines the nature of our early work in this project.

Work on our IE Lexicon has become our primary focus. For most of our EIEOL lesson languages, we have attempted (or will attempt) to link relevant entries in their Base-Form Dictionaries to etyma in our Pokorny Master Collection. In addition, a large and growing fraction of the PIE etyma listed in our Pokorny collection are being linked to IE Reflex Pages that list words derived from those etyma: at present, nearly 200 ancient and modern Indo-European languages/dialects are represented by reflexes, the vast majority of which may be located alphabetically via our Language Index pages. Our lower-level Semantic Field Index pages may also be linked to IE Reflex Pages.

Pokorny Master Collection

As our current set of PIE etyma, we have selected 2,222 main entries from Pokorny's IEW; these are listed in a single large table (see 3 options below) in their IEW "alphabetic" order. Each entry that corresponds to a page listing IE reflexes thereof is linked to that page. At present, over 2/3 of our PIE entries link to IE Reflex Pages; this fraction will rise as our project proceeds.

There are great disparities in character set capabilities and font repertoires among personal computers in contemporary use. Unfortunately, a web server cannot detect whether your personal computer supports Unicode or which fonts are installed on it. Accordingly, we have prepared three versions of each page, and you may select from among them based on your situation and experience.

IE Reflex Pages

Each IE reflex page shows a single PIE etymon with reflexes in IE languages/dialects. Each reflex is annotated with: part-of-speech and/or other grammatical feature(s); a short gloss which, especially for modern English reflexes, may be confined to the oldest sense; and one or more source citation(s). Again there are three versions of each etymon-with-reflexes page; each is linked, in chain-reference fashion, to nearby etyma (the previous and/or next extant reflex page in IEW order).
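The chain-reference linking described above can be illustrated with a minimal sketch. It assumes only that the etyma are held in IEW order and that some of them have reflex pages; the function name chain_links and the placeholder labels E1 through E5 are hypothetical and do not come from our collection or its software.

```python
from typing import List, Optional, Set, Tuple


def chain_links(etyma: List[str], has_page: Set[str],
                current: str) -> Tuple[Optional[str], Optional[str]]:
    """Return the nearest earlier and later etyma that have reflex pages.

    `etyma` is assumed to be in IEW order; etyma without a reflex page
    are skipped, so each link points at an extant page or is None.
    """
    i = etyma.index(current)
    # Walk backwards from the current etymon to the nearest extant page.
    prev_page = next((e for e in reversed(etyma[:i]) if e in has_page), None)
    # Walk forwards from the current etymon to the nearest extant page.
    next_page = next((e for e in etyma[i + 1:] if e in has_page), None)
    return prev_page, next_page


# Hypothetical data: five placeholder etyma, three of which have reflex pages.
iew_order = ["E1", "E2", "E3", "E4", "E5"]
with_pages = {"E1", "E3", "E5"}

print(chain_links(iew_order, with_pages, "E3"))  # ('E1', 'E5')
```

In this sketch the first and last extant pages simply have no previous or next link, which matches the "previous and/or next" wording above.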

Language Index

Our IE Language Index page lists many (though not all) individual Indo-European languages by family, from west to east; families are divided into groups, by age and/or geographic area (again, generally from west to east). For each IE "daughter language" that is represented by a sufficient number of reflex words derived from PIE etyma, a Reflex Index page will exist: each reflex index will list, in an alphabetic order suitable for the language family, all words in the language/dialect that appear on IE Reflex Pages. A word with multiple morphemes may have multiple links to IE reflex pages (e.g., the English noun werewolf 'man-wolf' derives from two PIE etyma). Also, since different words spelled the same way may derive from different PIE etyma, again there may be multiple links (e.g., the English verbs lie 'to recline' and lie 'to prevaricate' link to their different PIE etyma). And, obviously, many words in a given language (e.g. English brown, bruin, bear 'animal') may derive from a single PIE etymon.
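The many-to-many relationship between reflex words and etyma can be pictured with a small sketch of a reflex index. This is an illustration only, not a description of how our pages are generated: the function build_reflex_index, the record layout, and the etymon labels (such as "etymon-man") are hypothetical placeholders, and plain alphabetical sorting stands in for whatever ordering suits a given language family.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Each record: (language, reflex spelling, gloss, etymon label).
# The etymon labels are placeholders, not actual PIE headwords.
Record = Tuple[str, str, str, str]

RECORDS: List[Record] = [
    ("English", "werewolf", "man-wolf", "etymon-man"),
    ("English", "werewolf", "man-wolf", "etymon-wolf"),
    ("English", "lie", "to recline", "etymon-recline"),
    ("English", "lie", "to prevaricate", "etymon-deceive"),
    ("English", "brown", "", "etymon-brown"),
    ("English", "bruin", "", "etymon-brown"),
    ("English", "bear", "animal", "etymon-brown"),
]


def build_reflex_index(records: Iterable[Record]) -> Dict[str, Dict[str, List[str]]]:
    """Map language -> spelling -> sorted list of linked etyma."""
    index = defaultdict(lambda: defaultdict(set))
    for language, spelling, _gloss, etymon in records:
        index[language][spelling].add(etymon)
    # Sort spellings alphabetically within each language.
    return {
        language: {word: sorted(etyma) for word, etyma in sorted(words.items())}
        for language, words in index.items()
    }


reflex_index = build_reflex_index(RECORDS)
for word, etyma in reflex_index["English"].items():
    print(word, "->", ", ".join(etyma))
```

Here werewolf and lie each carry two links, while brown, bruin, and bear all share one, mirroring the three cases described above.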

Our one Language Index page is exclusively represented in ISO-8859-1; however, any reflex index pages that it may link to are replicated in 3 versions, just as with our Pokorny collection. That is, each reflex index page, representing derived words in a single IE language/dialect, is replicated in 3 character sets. See the Text Encodings section (below) for more information about these versions.

Semantic Field Index

Another feature of our collection is a Semantic Index to the Proto-Indo-European etyma listed in Pokorny, using a scheme developed by Carl Darling Buck (cf. A Dictionary of Selected Synonyms in the Principal Indo-European Languages, 1949). This semantic indexing scheme has been used by others and, while not perfect, seems adequate for our needs. We are in the process of making substantial additions to our lexical collection, adding "reflex" words derived from PIE etyma as listed by Pokorny; these can be reached via links on our lower-level Semantic Index pages. At present such links are still limited in number, but check back from time to time for new resources, as this work is proceeding swiftly.

Pages in our Semantic Field Index are entirely represented in ISO-8859-1; however, any IE Lexicon pages that they may link to are as usual replicated in 3 character-set versions. See the following section for more information about these versions.

Text Encodings

Owing to the requirements of the various non-Roman alphabets, we have adopted (two levels of) Unicode® to represent lexical material in scripts other than Roman (specifically, other than Latin-1), and for such material our HTML style sheet requests that browsers use one of the following Unicode-compliant fonts (listed here in alphabetical order):

What this means is that, in order to read words written in non-Roman scripts, you must employ Unicode-aware software, and it must have available a Unicode-compliant font, such as one of those listed here. (At the time of writing, these requirements are generally met, on Macintosh, only by OS X 10.2 or higher with a suitably advanced browser.) As we become aware of the existence and wide distribution of other large Unicode-compliant fonts, we will add them to our list.

For those whose browsers do not properly render Unicode characters, the editor attempts to replicate all materials using a standard Roman transliteration; obviously a transliteration is not sufficient to represent languages, such as Greek, that are not written in the Roman alphabet, but it does offer a start. Anyone wishing to read online resources in their native scripts should acquire software fully supporting the Unicode standard; some supporting software is free, while some may require payment. The Linguistics Research Center and University of Texas cannot, and do not, make vendor recommendations.
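The difference between a Unicode rendering and a Roman transliteration can be sketched as follows. This is only an illustration under stated assumptions: the tiny GREEK_TO_ROMAN table and the function names are hypothetical and do not reproduce the transliteration scheme actually used on our pages. The sketch checks whether a word can be represented in ISO-8859-1 and, if not, maps base letters to Roman equivalents while keeping accents.

```python
import unicodedata

# Hypothetical, deliberately incomplete table covering a few Greek letters.
GREEK_TO_ROMAN = {"α": "a", "γ": "g", "λ": "l", "ο": "o", "σ": "s", "ς": "s"}


def fits_latin1(text: str) -> bool:
    """Return True if every character can be encoded in ISO-8859-1."""
    try:
        text.encode("iso-8859-1")
        return True
    except UnicodeEncodeError:
        return False


def transliterate(text: str) -> str:
    """Replace mapped Greek letters with Roman ones, preserving accents."""
    # Decompose so that accents become separate combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    mapped = "".join(GREEK_TO_ROMAN.get(ch, ch) for ch in decomposed)
    # Recompose so that, e.g., o + acute becomes a single Latin-1 character.
    return unicodedata.normalize("NFC", mapped)


word = "λόγος"
print(fits_latin1(word))                  # False
print(fits_latin1(transliterate(word)))   # True
```

Letters missing from the table pass through unchanged, so a fuller table would be needed before the output of such a sketch could be served as an ISO-8859-1 page.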