While typologists have traditionally relied primarily on grammatical descriptions, corpora have increasingly come into focus as a primary data type for cross-linguistic research (Levshina et al., 2017; Koplenig et al., 2017). In my talk, I will present some of my recent work on the TAM systems of seven Oceanic languages of Melanesia. In this project, we work both with existing corpora from language documentation and with comparable corpora that we have created using storyboards. I will present both the tagset we have designed to enrich our corpora and our elicitation stimuli. I will highlight the following aspects of our research for which corpus-based comparative investigations have so far been crucial:
1. Syntactically complex but canonical structures are prone to being excluded from grammatical descriptions, yet they may be highly relevant for cross-linguistic research. So far, we have investigated two such cases:
(a) Syntactically complex timitive structures in Vanuatu languages (von Prince et al., in progress).
(b) Syntactically complex expressions of possibility in West Ambrym languages and Saliba-Logea (von Prince & Margetts, 2017).
2. The role of counterfactuality in the TAM systems of Oceanic languages: There has been much confusion around TAM terminology, both generally and in Oceania in particular. As a result, grammatical descriptions are not a sufficient basis for reliably identifying the semantic categories encoded by specific markers. In particular, the counterfactual domain has not been systematically distinguished from other irrealis domains. A corpus-based approach is therefore indispensable for testing our hypothesis that Oceanic languages distinguish the counterfactual from other modal domains more systematically than previous analyses suggest.
3. Especially when comparing closely related languages, it is important to separate speaker-based variation from language-based variation. Corpora with speaker metadata help to control for this.
Koplenig, Alexander, Meyer, Peter, Wolfer, Sascha, & Müller-Spitzer, Carolin. 2017. The statistical trade-off between word order and word structure – large-scale evidence for the principle of least effort. PLoS ONE, 12(3).
Levshina, Natalia, Verkerk, Annemarie, & Moran, Steven. 2017. Comparative corpus linguistics: new perspectives and applications. Workshop proposal for the 51st Meeting of the SLE.
von Prince, Kilu, & Margetts, Anna. 2017 (July). Expressing possibility in Saliba-Logea and Daakaka. Talk given at APLL 9 in Paris.
von Prince, Kilu, Krajinović, Ana, Guérin, Valérie, & Franjieh, Michael. In progress. It would not be good…: Canonical apprehensive structures in Vanuatu languages.