Chapter: Black box approaches to genealogical classification and their shortcomings

Prokić, Jelena; Moran, Steven

doi:10.1515/9783110305258.429

Author(s)

Prokić, Jelena

Moran, Steven

Contributor(s)

Saxena, Anju (editor)

Borin, Lars (editor)

Collection

European Research Council (ERC); EU collection

Language

English

Abstract

In the past 20 years, the application of quantitative methods in historical linguistics has received considerable attention. Traditional historical linguistics relies on the comparative method to determine the genealogical relatedness of languages. More recent quantitative approaches attempt to automate this process, either by developing computational tools that complement the comparative method (Steiner et al. 2010) or by applying fully automated methods that use very little or no linguistic knowledge, e.g. the Levenshtein approach. The Levenshtein method has been used extensively in dialectometry to measure the distances between dialects (Kessler 1995; Heeringa 2004; Nerbonne 1996). It has also frequently been used to analyze the relatedness of languages, including Indo-European (Serva and Petroni 2008; Blanchard et al. 2010), Austronesian (Petroni and Serva 2008), and a very large sample of 3002 languages (Holman 2010). In this paper we compare the performance of the Levenshtein distance with that of n-gram models and a zipping approach by applying all three methods to the same set of language data.
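For readers unfamiliar with the three families of measures named in the abstract, the following minimal sketch may help. It is an illustration only and does not reproduce the chapter's implementation: the length normalization, the use of bigram Dice overlap as the n-gram measure, and the use of zlib with the normalized compression distance as the "zipping" measure are all assumptions, as are the function names and the example word pair.

```python
import zlib
from collections import Counter


def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def normalized_levenshtein(a: str, b: str) -> float:
    """Edit distance divided by the longer length (one common normalization; assumed)."""
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0


def ngram_dice_distance(a: str, b: str, n: int = 2) -> float:
    """1 minus the Dice coefficient over character n-gram multisets (assumed variant)."""
    grams_a = Counter(a[i:i + n] for i in range(len(a) - n + 1))
    grams_b = Counter(b[i:i + n] for i in range(len(b) - n + 1))
    total = sum(grams_a.values()) + sum(grams_b.values())
    if total == 0:
        return 0.0
    shared = sum((grams_a & grams_b).values())
    return 1.0 - 2.0 * shared / total


def compression_distance(x: str, y: str) -> float:
    """Normalized compression distance using zlib (assumed stand-in for 'zipping')."""
    bx, by = x.encode("utf-8"), y.encode("utf-8")
    cx, cy = len(zlib.compress(bx)), len(zlib.compress(by))
    cxy = len(zlib.compress(bx + by))
    return (cxy - min(cx, cy)) / max(cx, cy)


# Hypothetical example pair (not taken from the chapter's data).
print(levenshtein("night", "nacht"))
print(normalized_levenshtein("night", "nacht"))
print(ngram_dice_distance("night", "nacht"))
```

The compression-based measure is only meaningful on longer inputs, such as concatenated word lists, because the compressor's fixed overhead dominates on single short words.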