Glottobank is an international research consortium established to document and understand the world’s linguistic diversity. Glottobank team members are pursuing this goal on two fronts. First, we have established five global databases documenting variation in language structure (Grambank), lexicon (Lexibank), paradigm systems (Parabank), phonetic changes (Phonobank), and numerals (Numeralbank). In doing so, we seek to develop new methods in language documentation, compile data on the world’s languages and make this data accessible and useful. Second, we are developing methods to use this data to make inferences about human prehistory, relationships between languages and processes of language change.
Grambank is a database of structural (typological) features of language. It consists of 200 logically independent features (most of them binary) spanning all subdomains of morphosyntax. The Grambank feature questionnaire has been filled in, based on reference grammars, for over 500 languages, with the aim of eventually reaching as many as 3,000 languages. The database can be used to investigate deep language prehistory, the geographical distribution of features, language universals and the functional interaction of structural features.
Lexibank is a public database and repository for lexical data from the languages of the world. Lexibank currently contains lexemes and cognate judgments from about 2,500 languages spanning Africa, Europe, Asia, the Pacific, and the Americas. The database will be used to refine cognate judgments, infer language relationships, construct language phylogenies, test hypotheses about deep language history, investigate factors that affect the mode and tempo of language evolution, model sound change, and facilitate quantitative comparisons with other types of linguistic data. The initial focus of Lexibank will be on compiling basic or core vocabulary, but the database will ultimately be expanded to cover the full range of the lexicon in all of the world’s languages.
Parabank is a large database of selected paradigmatic structures found in the world’s languages. It focuses on the patterning of formal similarities and identities (syncretisms) between cells in these paradigms: compare English "I" vs. "me", where the subject and object forms differ, with "you" vs. "you", where they are identical. Parabank is motivated by the observation that different languages and language families show markedly different patterns of syncretism, and that at least some of these patterns are stable through time. Arranging the data as matrices also adds analytic power, because every cell can be compared with every other cell, yielding a large number of values (n(n-1)/2 pairwise comparisons for a paradigm of n cells).
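The cell-by-cell comparison can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that a paradigm is stored as a mapping from hypothetical cell labels (person/number, grammatical role) to word forms, and that two cells count as syncretic when their forms are identical strings. This is not Parabank's actual coding scheme, and real syncretism judgments need more care than string identity. The toy data are English subject and object pronouns.

```python
from itertools import combinations

# Hypothetical toy paradigm: English free pronouns, subject vs. object forms.
# The cell labels and the exact-string identity test are illustrative
# assumptions, not Parabank's actual coding scheme.
ENGLISH_PRONOUNS = {
    ("1sg", "subj"): "I",
    ("1sg", "obj"): "me",
    ("2sg", "subj"): "you",
    ("2sg", "obj"): "you",
    ("3sg.m", "subj"): "he",
    ("3sg.m", "obj"): "him",
    ("1pl", "subj"): "we",
    ("1pl", "obj"): "us",
    ("2pl", "subj"): "you",
    ("2pl", "obj"): "you",
    ("3pl", "subj"): "they",
    ("3pl", "obj"): "them",
}


def pairwise_syncretisms(paradigm):
    """Compare every cell with every other cell.

    Returns a dict mapping each unordered pair of cells (in sorted order)
    to True when both cells share the same form (a syncretism), else False.
    """
    return {
        (a, b): paradigm[a] == paradigm[b]
        for a, b in combinations(sorted(paradigm), 2)
    }


comparisons = pairwise_syncretisms(ENGLISH_PRONOUNS)
n = len(ENGLISH_PRONOUNS)
assert len(comparisons) == n * (n - 1) // 2  # 12 cells -> 66 pairs
syncretic = [pair for pair, same in comparisons.items() if same]
print(len(syncretic))  # 6: every pair among the four "you" cells
```

With 12 cells the sketch makes 66 pairwise comparisons, of which 6 are syncretic; because the number of comparisons grows quadratically with the number of cells, even modest paradigms yield many values to analyse.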
Because the paradigms we explore are ubiquitous across the world’s languages, our working hypothesis is that paradigmatic syncretisms can provide a significant signal about linguistic relationships in deep time. The database is designed to allow linguistic typologists and evolutionary biologists to explore morphosyntactic features systematically. Parabank will also be an important resource for identifying and quantifying some of the key mechanisms by which the design space of language evolves. Initially, the database will assemble paradigms of free pronouns, verb agreement, and a subset of kin terms; subsequent plans include demonstratives, interrogatives, indefinite pronouns, negative pronouns, numeral systems, and other promising linguistic subsystems with paradigmatic structure.
Parabank will be led by Nick Evans, Simon Greenhill and Kyla Quinn, all based at the Australian Research Council Centre of Excellence for the Dynamics of Language (CoEDL) at the Australian National University (ANU), and welcomes the participation of any interested researcher. Funding will come primarily from CoEDL.
Phonobank aims to establish a cross-linguistic comparative database of sound patterns, sound correspondences, and sound shifts. Our starting point is a collection of multiple phonetic alignments of cognate sets from individual language families. All sounds are linked to a cross-linguistic phonetic alphabet that provides distinctive features and segment descriptions. The ultimate goals of the database are to support the computational comparison of word forms across languages and to serve as a basis for improving methods of computer-assisted cognate detection, sound reconstruction, and the construction of linguistic phylogenies from sound correspondences.
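How sound correspondences can be read off multiple alignments is sketched below in Python. The sketch assumes a hypothetical storage format in which each aligned cognate set maps a language name to an equal-length list of segments, with "-" marking a gap. The languages LangA and LangB and their forms are invented for illustration and do not come from Phonobank. Counting the segment pairs that fill the same alignment column surfaces recurring correspondences such as t : s.

```python
from collections import Counter

# Hypothetical toy data: each cognate set is already aligned, so every
# language's segment list has the same length; "-" marks a gap.
# Language names and forms are invented for illustration only.
ALIGNED_COGNATE_SETS = [
    {"LangA": ["t", "a", "n", "a"], "LangB": ["s", "a", "n", "-"]},
    {"LangA": ["t", "i", "k", "a"], "LangB": ["s", "i", "k", "-"]},
    {"LangA": ["p", "a", "t", "u"], "LangB": ["p", "a", "s", "u"]},
]


def count_correspondences(cognate_sets, lang_x, lang_y):
    """Count how often each segment pair (x, y) occupies the same alignment column."""
    counts = Counter()
    for alignment in cognate_sets:
        row_x, row_y = alignment[lang_x], alignment[lang_y]
        if len(row_x) != len(row_y):
            raise ValueError("rows of an alignment must have equal length")
        # Each column of the alignment contributes one (x, y) pair.
        counts.update(zip(row_x, row_y))
    return counts


corr = count_correspondences(ALIGNED_COGNATE_SETS, "LangA", "LangB")
print(corr[("t", "s")])  # 3
print(corr[("a", "-")])  # 2
```

Pairs with high counts are candidates for regular correspondences; a fuller treatment would also handle alignments with more than two languages and separate conditioned from unconditioned correspondences.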
Based on data from the long-running project "Numeral Systems of the World's Languages" led by Eugene Chan, Numeralbank presents the numeral systems of about 4,200 of the world's languages as a computer-readable database. From 2006 to May 2015, the project was supported and supervised by the former Department of Linguistics, led by Prof. Bernard Comrie, at the Max Planck Institute for Evolutionary Anthropology (MPI-EVA) in Leipzig, Germany. Since the beginning of June 2015, the project has been hosted by the Department of Linguistic and Cultural Evolution, led by Prof. Russell Gray, at the Max Planck Institute for the Science of Human History (MPI-SHH) in Jena, Germany. The computer scientist Hans-Jörg Bibiko, formerly at the MPI-EVA and now at the MPI-SHH, converted these data and now supervises the Numeralbank database.