Longer wordlists for long-range linguistic comparison: principles, problems, perspectives.

DLCE Talk

Date: Apr 25, 2018
Time: 03:00 PM - 04:30 PM (Local Time Germany)
Speaker: George Starostin
Russian State University for the Humanities / Russian Presidential Academy, Moscow
Location: MPI SHH Jena
Room: Villa V14
Host: Department of Linguistic and Cultural Evolution
Contact: schueck@shh.mpg.de

Despite occasional skepticism concerning the role of "wordlist linguistics" in language classification and evaluation of hypotheses of language relationship, over the past several decades it has proven to be an extremely useful tool for historical linguists. It may even be argued that the use of fixed Swadesh-type wordlists that provide a unified standard for language comparison across the world is the best way to get traditional experts and computational specialists to join forces in working out optimal historical models for language evolution and dispersal.

Nevertheless, while lexicostatistical methods have so far been extremely helpful in working out specific taxonomic details for previously well-established families (ranging from relatively young ones, such as Turkic, to prehistoric phyla such as Indo-European, Uralic, Austronesian, etc.), their role in testing controversial and challenging hypotheses of much more distant language relationships, reaching into the Neolithic period and beyond, remains limited and uncertain.

On the one hand, it might seem that the role of fixed wordlists in establishing whether, for instance, Turkic, Mongolic, and Tungusic languages really go back to Proto-Altaic, or whether the hypothetical Proto-Altaic, together with Proto-Indo-European and Proto-Uralic, really goes back to "Proto-Nostratic", should be even greater than their role in working out the specifics for language families whose reality has been proven beyond reasonable doubt through the application of the Comparative Method. On the other hand, as we go deeper into the past and face the obstacle of ever-increasing cognate loss and semantic change (in addition to phonetic change), it becomes clear that the appropriate size and composition of fixed wordlists for such situations remain very much open to debate.

In my talk, I will discuss both the advantages and the disadvantages of using short lexicostatistical wordlists (from the standard 100-item Swadesh list to abridged 50- or 35-item variants), as opposed to huge etymological corpora, for testing such hypotheses as Altaic or Nostratic. I will then talk about ongoing research on a larger fixed wordlist (approximately 400 items) that could be used as a test base for long-range hypotheses, provided the comparison also incorporates recognition of potential semantic shifts (at least of the "trivial", i.e., typologically common, variety). The combination of a semantically well-defined fixed wordlist with the use of low-level reconstructions, where "optimal candidates" for specific meanings are chosen based on a simple set of rules, promises to introduce a new degree of formality and clarity to comparisons that have, up to now, been all but impossible to verify and evaluate on a formal basis.