N.B. The email exchange below may have been edited, e.g. to remove content not essential to the main point(s) or to standardize English spelling/grammar.
Wow, thanks for doing that work. 75K is enough in principle. :-)
I agree that a typical publication could never hold all the data. On the other hand, as a scientist I am irked that these guys' conclusions are not at all replicable, as it were. Correspondence sets between languages, I think, are the most compelling kind of linguistic facts. I have been turning over in my mind a database structure that would work for historical linguistics -- something that would give access to the data, as well as to the abstractions involved in historical research.
That would lower the entry barrier for guys like me, who are coming into it relatively late in the game. I also have a (not-so-mild) concern that the reconstructions might turn out differently depending on what languages were considered.
I don't suppose your reflex index is amenable to being converted into a single large table?
Re: your database structure -- Good thinking! I hope to hear more about this as you progress.
Re: your concern that the reconstructions might turn out differently depending on what languages were considered -- In part true, but less so, I think, than you imagine. For example, an early heavy reliance on Sanskrit as the "oldest, least innovative" IE language skewed reconstructions of PIE in certain directions. However, the discovery and eventual accommodation of Hittite (which proved the reality of laryngeals, if not their detailed sound patterns), followed later by reconstructions that added the contributions of typology (Glottalic Theory) and by the recognition that Sanskrit was more, not less, innovative, seems to have firmed things up. If you haven't read [the English translation of] T.V. Gamkrelidze and V.V. Ivanov's Indo-European and the Indo-Europeans (1995, 2 vols.), by all means do so!
But as to Glottalic Theory -- whether or not it's valid -- its reconstructions map one-to-one onto the more traditional ones. So computer software can transform one into the other and back without loss, which technically makes them notational variants. Yet GT does imply sound patterns that seem more reasonable, typologically speaking: it doesn't require PIE to have been odd.
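To make the "notational variant" point concrete, here is a minimal sketch, assuming one common simplified presentation of the Gamkrelidze-Ivanov stop correspondences (traditional plain voiced stops as glottalized, traditional voiced aspirates and voiceless stops as series with non-distinctive aspiration) and only three places of articulation. The ASCII-style symbols, the `convert` helper, and the sample segment list are hypothetical illustrations, not a standard encoding.

```python
# Simplified traditional -> glottalic correspondences for the PIE stops
# (an assumed presentation; "[h]" marks aspiration treated as non-distinctive,
# and the apostrophe marks a glottalized stop).
TRADITIONAL_TO_GLOTTALIC = {
    "p": "p[h]", "t": "t[h]", "k": "k[h]",
    "b": "p'", "d": "t'", "g": "k'",
    "bh": "b[h]", "dh": "d[h]", "gh": "g[h]",
}

# Because the mapping is one-to-one, it can be inverted without loss.
GLOTTALIC_TO_TRADITIONAL = {v: k for k, v in TRADITIONAL_TO_GLOTTALIC.items()}


def convert(segments, table):
    """Map each segment through `table`; segments not in it (vowels,
    resonants, laryngeals) pass through unchanged."""
    return [table.get(seg, seg) for seg in segments]


# Hypothetical traditional form, already split into segments.
traditional = ["d", "e", "bh", "o", "g"]
glottalic = convert(traditional, TRADITIONAL_TO_GLOTTALIC)
print(glottalic)  # ["t'", 'e', 'b[h]', 'o', "k'"]

# Converting back recovers the original exactly.
back = convert(glottalic, GLOTTALIC_TO_TRADITIONAL)
assert back == traditional
```

The round trip succeeds only because no two traditional stops share a glottalic value; if the table were many-to-one, the inverse dictionary would silently drop entries and the two notations would no longer be interchangeable.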
Re: our reflex index being amenable to conversion into a single large table -- Yes, it (they) could be. But I don't know that the results would be what you're looking for. For example, where should ð and þ be sorted -- should they "line up" with TH? Where would an Old English G be sorted compared to G for Gothic and OHG (to say nothing of modern English)? And what about OE's letter C? I cite fewer than a handful of examples, among very many, to illustrate the point that Roman-alphabet spelling is not exactly transparent. Those early linguists worked really hard to make sense of it all.
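The sorting question can be illustrated with a minimal sketch. It assumes a single merged table of headwords and shows that a naive sort (by Unicode code point) puts ð and þ after z, whereas a custom key that folds both letters to "th" lines them up with TH. The `FOLD` table, `fold_key` function, and sample Old English-style spellings are hypothetical; folding to "th" is one possible convention, not a recommendation.

```python
# Hypothetical fold table: treat ð and þ as if spelled "th" when sorting.
FOLD = {"ð": "th", "þ": "th"}


def fold_key(word):
    """Sort key that lowercases the word and replaces ð/þ with 'th'
    (an assumed convention; the stored spelling is left unchanged)."""
    return "".join(FOLD.get(ch, ch) for ch in word.lower())


words = ["tunge", "wulf", "þing", "ðu"]

# Default sort compares code points: ð (U+00F0) and þ (U+00FE) come after z.
print(sorted(words))                # ['tunge', 'wulf', 'ðu', 'þing']

# Folded sort lines ð/þ up with "th".
print(sorted(words, key=fold_key))  # ['þing', 'ðu', 'tunge', 'wulf']
```

A per-character fold like this cannot settle the Old English C and G cases, since the sound each letter spelled depended on its phonetic context (OE c could stand for /k/ or /tʃ/, for instance); lining those up with Gothic or OHG spellings would require linguistic judgment about each form, not just a lookup table.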
And, by the way, they all considered a very wide range of IE languages, not just a handful. Let's see... Grimm's article in the Reader cites 21 languages in the Germanic, Baltic, Slavic, Hellenic, Italic, and Indic families; in Verner's article, the number rises to 30 languages in the Iranian, Slavic, Italic, Germanic, Hellenic, Baltic, and Indic families (plus Lapp loan words). And these, of course, are only the languages cited in two articles -- not all the languages they studied. They worked without knowing that Hittite existed, and yet their reconstructions are mostly notational variants of the latest ones...? I wouldn't sell these guys short!
Regards, J. S.