The hard-core Indo-Europeanist may be interested in the TITUS Indo-European Resources project in Frankfurt (eventually in many languages, but currently only German and Spanish).
Okay, in 1786 Sir William Jones announced to the Asiatick Society of
Calcutta that Sanskrit had to be related to Greek and Latin, touching
off what would come to be known as the Neogrammarian move from
philology (the comparison of texts) to what we now consider
linguistics.
If you were to see a whole huge raft of cognates like the following,
you might come to the same conclusion (Avestan, the language of the
Zoroastrian texts, is an ancient Iranian language closely related to
the ancestor of Persian):
Sanskrit  Avestan  Greek     Latin   Gothic      English
pita      -        pater     pater   fadar       father
padam     -        poda      pedem   fotu        foot
bhratar   -        phrater   frater  brothar     brother
bharami   barami   phero     fero    baira       bear
jivah     jivo     -         wiwos   qius        quick ('living')
sanah     hano     henee     senex   sinista     senile
virah     viro     -         wir     wair        were(wolf) ('man')
-         -        tris      tres    thri        three
-         -        deka      decem   taihun      ten
-         satem    he-katon  centum  hund(rath)  hundred
Now, "cognates" means "a pair or set of words descended from a common
ancestor", not just words that happen to look like each other -- e.g.
"coffee" is not a cognate of kaffe, kahawa, cafe, etc.; that's an
instance of lots of borrowing of the same word by various languages.
What we're talking about here are historically related words. When we
know we've got cognates, we can talk about reconstruction.
Reconstruction revolves around the notion that sound change is
mechanical and exceptionless. If a proto-/p/ becomes /f/ in a
daughter language, it does so in regular fashion (that's the
heuristic you have to use). If there are exceptions, there must be
some other conditioning factor. Using this assumption, we can
conclude that some common ancestor produced Sanskrit /bh/, Avestan
/b/, Greek /ph/ (which is NOT /f/, it's aspirated /p/ at the stage
we're talking about), Latin /f/, and Germanic /b/. Now the question
is, what was that common ancestor?
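To make the idea of a regular correspondence concrete, here is a minimal Python sketch that lines up the first sound of each word in four of the cognate sets from the table and groups the sets that show the same pattern. The forms are the ASCII transliterations used in this answer; Avestan is left out because the table gives no Avestan form for most of these rows. The names (COGNATES, initial_segment, correspondence_sets) are made up for illustration, and treating "bh" and "ph" as single segments is an assumption about the transliteration, not a general rule.

```python
from collections import defaultdict

# Forms copied from the cognate table in this answer (ASCII transliteration).
COGNATES = {
    "father":  {"Sanskrit": "pita",    "Greek": "pater",   "Latin": "pater",  "Gothic": "fadar"},
    "foot":    {"Sanskrit": "padam",   "Greek": "poda",    "Latin": "pedem",  "Gothic": "fotu"},
    "brother": {"Sanskrit": "bhratar", "Greek": "phrater", "Latin": "frater", "Gothic": "brothar"},
    "bear":    {"Sanskrit": "bharami", "Greek": "phero",   "Latin": "fero",   "Gothic": "baira"},
}

LANGUAGES = ("Sanskrit", "Greek", "Latin", "Gothic")

# Assumption: these two-letter spellings each stand for one aspirated stop.
DIGRAPHS = ("bh", "ph")


def initial_segment(word):
    """Return the first sound of a word, treating known digraphs as one unit."""
    for digraph in DIGRAPHS:
        if word.startswith(digraph):
            return digraph
    return word[0]


def correspondence_sets(cognates):
    """Group glosses by the tuple of initial sounds they show across LANGUAGES."""
    sets = defaultdict(list)
    for gloss, forms in cognates.items():
        pattern = tuple(initial_segment(forms[lang]) for lang in LANGUAGES)
        sets[pattern].append(gloss)
    return dict(sets)


if __name__ == "__main__":
    for pattern, glosses in correspondence_sets(COGNATES).items():
        labelled = " : ".join(f"{lang} {seg}" for lang, seg in zip(LANGUAGES, pattern))
        print(f"{labelled}  <- {', '.join(glosses)}")
```

Each pattern that recurs across several unrelated meanings is a correspondence set, and it is sets like these, not one-off resemblances, that reconstruction works from. Comparing spellings is of course only a crude stand-in for comparing sounds.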
The way we decide what segment must have been there in the proto-
language involves things we know independently about how sounds
behave, based partly on how sounds alternate synchronically in
languages (i.e. rules that operate to change one sound to another in
different contexts during a single stage of a language), partly on
what we know about acoustics and articulation of speech sounds (which
tells us what directionality is more or less likely), and partly on
experience. Pure gold for the historical linguist is ATTESTED
(written) ancient forms.
For instance, we know that the modern Romance languages (French,
Italian, Spanish, Portuguese, Romansch, Rumanian, etc.) are descended
from Latin. And we have lots of attested Latin to work with -- so we
have clear, unambiguous examples of how some sound changes have
worked. Likewise in other language families where ancient texts are
preserved (e.g. ancient religious texts in Semitic). So we have
some real-life models on which to build our guesses.
So anyway, you reconstruct Proto-Indo-Iranian, and Proto-Germanic,
and Proto-Balto-Slavic, and Proto-Celtic, and ultimately you have a
pretty good idea of what -- on the basis of very rigorous analysis --
must have been the forms of certain words/roots in
Proto-Indo-European, before it split up. Now, this method does NOT
yield reliable results further back than about 10,000 years, because
beyond that, too much change has occurred for there to be any
recognizable remnants (that we can be sure about anyway) in attested
languages. (Pace Greenberg et al., who get lots of popular press.)
One real triumph of this method of reconstruction was the Laryngeal
Hypothesis: it was known that there were some troublesome places in
Indo-European where the sound changes seemed not to be behaving in
their usual regular way; things were happening to vowels and
sometimes consonants that couldn't be easily explained based on what
we saw in the attested languages. Ferdinand de Saussure in the late
19th century said that there had to be a set of segments in the
proto-language that had not survived in any of the daughter languages
-- he was fairly conservative about claiming what they must have
been (the name "laryngeals" and the now-usual count of three came
from later scholars), but he pointed out the precise locations where
they must have occurred. Many years later, when a
bunch of texts in Turkey were finally decoded and we knew we were
looking at the ancient Anatolian language Hittite, the oldest
attested Indo-European language -- voila: there were the laryngeals,
exactly where Saussure had predicted they must be just on the basis
of careful reconstruction.
There are other wrinkles, like you can do internal reconstruction
under some circumstances, and there are things other than sounds that
point to common ancestry (morphology, syntax, etc.). And semantic
change is a really neat thing to trace, though much slipperier than
sound change. But the general answer to your question is, we know
what we know about Proto-Indo-European because of the Comparative
Method, which arose in the 19th century and gives us a rigorous way
to compare sounds in daughter languages and determine what the
antecedent sounds must have been.
Oh, and the PIE reconstructions for the above words are (always
preceded by a star to show they're unattested, followed by a hyphen
if they're roots that get suffixed, and with hedges if a vowel or
something is uncertain -- consonants are much easier to reconstruct
than vowels -- oh yes and @ stands for schwa here):
*p@ter- father
*ped- foot
*bhrater- brother
*bher- carry
*gwei- live
*sen- old
*wi-ro- man (derived from *wei@- vital force)
*trei- three
*dekm- ten
*dkm-tom- hundred (derived from *dekm- ten)
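As a rough check on what "mechanical and exceptionless" means in practice, the Python sketch below applies one small table of word-initial changes to some of the reconstructed roots and compares the result with the Gothic forms from the cognate table. The p > f and bh > b entries follow the discussion earlier in this answer; t > th and d > t are added here as assumptions for illustration. The names (PIE_TO_GERMANIC, germanic_initial, check_predictions) are hypothetical, the stars are dropped from the roots, and only the first sound is checked.

```python
# Word-initial changes from PIE to Germanic.
# p > f and bh > b follow the discussion in this answer;
# t > th and d > t are assumptions added for illustration.
PIE_TO_GERMANIC = {"bh": "b", "p": "f", "t": "th", "d": "t"}

# gloss -> (PIE root from the list above, Gothic form from the cognate table)
PAIRS = {
    "father":  ("p@ter-",   "fadar"),
    "foot":    ("ped-",     "fotu"),
    "brother": ("bhrater-", "brothar"),
    "bear":    ("bher-",    "baira"),
    "three":   ("trei-",    "thri"),
    "ten":     ("dekm-",    "taihun"),
}


def germanic_initial(root):
    """Return the predicted Germanic initial, or None if no rule covers the root."""
    # Try longer spellings first so that "bh" is not misread as a plain stop.
    for source in sorted(PIE_TO_GERMANIC, key=len, reverse=True):
        if root.startswith(source):
            return PIE_TO_GERMANIC[source]
    return None


def check_predictions(pairs):
    """Map each gloss to True if the Gothic form starts with the predicted sound."""
    results = {}
    for gloss, (root, gothic) in pairs.items():
        predicted = germanic_initial(root)
        results[gloss] = predicted is not None and gothic.startswith(predicted)
    return results


if __name__ == "__main__":
    for gloss, ok in check_predictions(PAIRS).items():
        print(f"{gloss}: {'matches' if ok else 'does not match'}")
```

The point of the sketch is that the same table is applied to every root with no per-word adjustments. A root that failed the check would signal either a wrong reconstruction or a conditioning factor the table does not capture.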
Emile Benveniste, Indo-European Language and Society (London 1973). [Contains cultural as well as linguistic material.]
Carl D. Buck, A Dictionary of Selected Synonyms in the Principal Indo-European Languages (Chicago 1949). [A wonderful old reference work. Lists and discusses synonyms and cognates for a variety of ideas (arranged topically) in over 30 Indo-European languages. Now available in an affordable paperback reprint edition.]
N. E. Collinge, The Laws of Indo-European (Amsterdam 1985). [Catalogs real and alleged sound changes in IE families and languages. Fairly technical.]
Antoine Meillet (trans. S. N. Rosenberg), The Indo-European Dialects (Huntsville 1967). [This and the two following works are by one of the great masters of the field, but are still relatively clear and accessible.]
----- (trans. Gordon B. Ford, Jr.), The Comparative Method in Historical Linguistics (Paris 1967).
-----, Introduction a l'etude comparative des langues indo-europeennes (Paris 1937).
Holger Pedersen, The Discovery of Language (Bloomington 1959). [Includes historical perspective on how these discoveries were made.]
Andrew Sihler, New Comparative Grammar of Greek and Latin (New York 1995).
Oswald Szemerenyi, Comparative-Historical Linguistics: Indo-European and Finno-Ugric (Amsterdam 1993).
Calvert Watkins (ed.), The American Heritage Dictionary of Indo-European Roots (Boston 1985). [Note the extensive introductory essay. Much of the same material can be found in the first and third editions of AHD.]
Werner Winter (ed.), Evidence for Laryngeals (The Hague 1965). [Evidence from the various IE languages bearing on Saussure's laryngeal theory cited above. Highly technical.]