Absolute Dating and the Romance Problems on the Bouckaert/Atkinson Model

Submitted on September 11, 2012 – 6:34 pm | 24 Comments
As noted at the end of the previous GeoCurrents post, Bouckaert et al.’s dating of the Romani split off the rest of the Indic tree at 1500 BCE (3,500 years ago) is a gross miscalculation. Linguistic evidence involving the grammatical gender system of Romani shows that the split must have occurred some 2,500 years later. But there is a larger issue here concerning the dates for the various splits on the Indo-European family tree. The dating procedure employed by Bouckaert et al. is based on two essential assumptions: that the rate of loss (or gain) of cognates is steady, and that certain key splits—the ones they have chosen as data—have been incontestably dated through historical records. But both of these assumptions are blatantly wrong.

The idea that replacements in the core vocabulary of any language happen at a regular pace was first explored in the 1950s by Morris Swadesh. Swadesh thought that such temporal consistency would allow the dating of language divergence, and he called the resulting study glottochronology. Parallels are often drawn between glottochronology and an absolute dating technique used in evolutionary biology that is known as the “molecular clock”. However, the latter technique was not established until the 1960s. Instead, the idea of a constant rate of lexical change must have been inspired by earlier work on radioactive decay. Swadesh compiled a list of lexical items that he assumed to be resistant to borrowing; the original list had 200 words, but the most widely used version today has only 100 items. Bouckaert’s team, however, went back to the original 200-item list.

According to Swadesh, the rate of change in that most conservative part of the lexicon is about 14% per millennium. However, unlike sub-atomic changes or mutations in genes, which happen at random, changes in language are often precipitated by extra-linguistic, social factors. As a result, lexical replacements in the core vocabulary, as elsewhere, can happen in waves, and the rate of replacement may therefore differ radically from one language to another. For example, Bergsland & Vogt (1962) convincingly demonstrated a “rate of change” in conservative Icelandic of around 4% per millennium, as compared to the 20% rate found in closely related Norwegian. Russian linguist Sergei Starostin, however, showed that if loanwords are eliminated from the calculations, the rate of change for Norwegian comes down to the expected rate of 5–6 “native” replacements per millennium. Yet even when it comes to the core vocabulary (the Swadesh list of 100 words assumed to be the least likely to be replaced), borrowing remains significant: it has been shown that English has borrowed 31 items in the 100-word Swadesh list, Turkish 22, French 27, and Albanian a whopping 41 (see McMahon 2010). While separating the husk of loanwords from the grain of “native” vocabulary is crucial to making glottochronology feasible, the procedure is never easy and is sometimes all but impossible, especially for languages whose history is known imperfectly (computational methods for identifying loanwords are discussed in an earlier GeoCurrents post).
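To make the sensitivity to the assumed rate concrete, here is a minimal Python sketch of the textbook glottochronological formula, t = ln(c) / (2 ln(r)), where c is the proportion of list items two languages still share and r is the proportion each language retains per millennium. The 70% shared proportion is a hypothetical figure chosen purely for illustration, and the function name is my own; the loss rates are the ones cited above. This is the simple strict-clock calculation, not the procedure used by Bouckaert et al.

```python
import math


def divergence_time_millennia(shared_fraction, retention_per_millennium):
    """Strict-clock estimate of time since two languages split.

    Implements t = ln(c) / (2 * ln(r)): c is the fraction of list items
    still shared, r is the fraction each language retains per millennium.
    The factor 2 reflects that both lineages change independently.
    """
    if not 0 < shared_fraction <= 1:
        raise ValueError("shared_fraction must be in (0, 1]")
    if not 0 < retention_per_millennium < 1:
        raise ValueError("retention_per_millennium must be in (0, 1)")
    return math.log(shared_fraction) / (2 * math.log(retention_per_millennium))


# Hypothetical language pair sharing 70% of the list (illustrative only).
SHARED = 0.70

# Loss rates per millennium mentioned in the text.
for label, loss in [
    ("Icelandic-like rate, 4% loss", 0.04),
    ("Swadesh's rate, 14% loss", 0.14),
    ("Norwegian-like rate, 20% loss", 0.20),
]:
    t = divergence_time_millennia(SHARED, 1 - loss)
    print(f"{label}: about {t:.1f} millennia since the split")
```

With the same shared proportion, the estimated age of the split ranges from under one millennium to more than four, depending only on which rate is assumed.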

Bouckaert et al. follow current lexicostatistical norms in attempting to alleviate the problems associated with inconstant lexical change by adopting a calibration technique based on known dates.* Instead of calculating the dates of all divergence points on a language tree based on some presupposed constant rate of change (e.g. 5% or 14% per millennium) from the present day backward in time, such studies peg some of the splits to clearly dated historical events. Having thus put a time scale on the family tree, they calculate the dates of other splits in relation to those already known.
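The logic of calibration can be sketched in a few lines of Python. This is a deliberately simplified strict-clock toy, not the Bayesian phylogenetic procedure that Bouckaert et al. actually use: it infers a retention rate from one split whose date is taken as known, then uses that rate to date a second split. All shared-cognate proportions are hypothetical, the function names are my own, and the two candidate calibration ages (about 1.74 millennia, i.e. 270 CE counted back from 2012, versus a later split at 1.0 millennia) are assumptions chosen only to show how an error in a calibration point propagates.

```python
import math


def retention_from_calibration(shared_fraction, known_split_millennia):
    """Infer retention per millennium r from a split of known age.

    Inverts t = ln(c) / (2 * ln(r)) to give r = c ** (1 / (2 * t)).
    """
    return shared_fraction ** (1.0 / (2.0 * known_split_millennia))


def date_split(shared_fraction, retention):
    """Date another split using the calibrated retention rate."""
    return math.log(shared_fraction) / (2.0 * math.log(retention))


# Hypothetical proportions of shared list items (illustrative only).
CALIBRATION_SHARED = 0.60  # pair whose split date is assumed known
TARGET_SHARED = 0.45       # pair whose split date we want to estimate

# Two candidate ages for the same calibration split, in millennia.
for calibration_age in (1.74, 1.0):
    r = retention_from_calibration(CALIBRATION_SHARED, calibration_age)
    t = date_split(TARGET_SHARED, r)
    print(f"calibration at {calibration_age} millennia -> "
          f"retention {r:.3f}, target split at {t:.2f} millennia")
```

In this toy model every inferred date scales in direct proportion to the calibration age, so a calibration point that is too early makes every date derived from it too early by the same factor. A relaxed-clock model with many calibration points will not behave this simply, but the sketch shows why the choice of calibration dates matters.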

Establishing such calibration points, however, generally proves to be more complicated than it sounds. As it turns out, Bouckaert et al. provide erroneous dates for numerous divergence points. We have already pointed out the miscalculation of the date of the Romani split by some 2,500 years. Another wildly misplaced divergence point concerns the split of the Insular Celtic languages into the Goidelic and Brittonic branches (the former including Irish and Scottish Gaelic, and the latter including Welsh, Cornish, and Breton). The trees given in the authors’ Supplementary Materials (Fig. S1 and S2) indicate this split as occurring about 900 BCE (2,900 years ago); strangely enough, the map frame for 692 BCE from the animation on the authors’ website does not yet show this split, although the front of the Celtic advance is shown as already firmly established in the southern British Isles. Putting aside this inconsistency between the different multimedia presentations of the study’s results (one of many, as discussed in an earlier GeoCurrents post), this date is too early as well; according to the archeological record, the ancestors of the various Celtic groups were living at the time in Austria and Bavaria. It would have helped if the authors had included Continental Celtic varieties such as Gaulish in their sample. As mentioned in an earlier GeoCurrents post, this omission throws off the dates of the various divergence points inside the Celtic cluster by centuries.

Some of the dating errors are apparently due to assuming incorrect calibration points. A prime example is the alleged separation of Romanian from the remaining Romance languages, placed by the authors in 270 CE, when Dacia ceased to be part of the Roman Empire. (Once again, the various presentations of their results do not coincide: the trees in the Supplementary Materials indicate that the split occurred about 2,000 years ago, and their animated map frame for 18 CE indicates the split via the thick green lines beginning to diverge in the area of Rome.) However, as pointed out by LanguageHat reader Etienne, the 270 CE date

“is not an established datum. The issue as to whether the Romanian language directly stems from the Latin of the Roman province of Dacia or from the Latin of a later group of migrants (whose point of origin must have been South of the Danube) has been fiercely debated over several generations. If the latter scenario (a later migration to Dacia) is true, then of course the date of separation between Romanian and its Romance sisters must be later, indeed perhaps much later, than 270 CE.”

One argument in favor of the latter scenario is that the Romans occupied Dacia for only about 170 years before abandoning it; this does not look like a long enough period for Latin to have taken hold so firmly. In this respect, Romania can be compared to Britain, which was occupied by Rome for much longer but did not ultimately become Romance-speaking. Because the first preserved Romanian text dates only from the 16th century, we do not have direct evidence as to its early stages. But indirect evidence in support of the later-arrival theory can be pieced together based on Romanian dialects. As Etienne further explains (in personal communication), “the core problem relates to Romanian dialects: they are much too homogeneous”, as compared, for example, to the dialects of northern France: across northern France mutual intelligibility between non-adjacent dialects was until recently practically nil, whereas in Romania mutual intelligibility across dialects is far greater. Given that northern France and Romania are about the same size, one would expect the dialect differentiation in the two areas to be roughly comparable (or perhaps one would anticipate finding even more homogeneity in northern France, which has no large mountain chain like the Carpathians). Of course, the surprising uniformity of Romanian dialects is easily explainable if the separation of Romanian postdates rather than predates the separation of French from the original Latin.

It is also significant that the Roman Province of Dacia corresponded to less than half the territory of present-day Romania: for example, the Bucharest region, whose dialect is the basis of standard Romanian, never belonged to the Roman Empire. If Romanian goes back to the Latin of Roman Dacia, as the 270 CE date of the separation presupposes, it must have spread outside of the Empire into lands that were not part of Dacia. But if this is the case, we would expect to find greater dialectal differentiation in the “old Romanian territories” compared to the lands in which the language spread later. Sadly for Bouckaert et al., this expectation runs contrary to fact, as not only are Romanian dialects very homogeneous, they are equally homogeneous across all parts of the country. Such basic facts of Romanian dialectology, however, make perfect sense if Romanian spread over present-day Romania and Moldova long after the fall of the Roman Empire from some other place, most likely from south of the Danube. As is the case for the early split of Romani, the incorrect classification of Slavic languages, and the incorrect depiction of a closer tie of Frisian to Flemish and Dutch than to English (discussed in an earlier GeoCurrents post), Bouckaert et al.’s false assumption regarding Romanian probably derives from the fact that although it is clearly a Romance language, Romanian nonetheless has a heavy Slavic influence in the lexis, including items in the Swadesh 200-word list, such as trăi ‘to live’ (from the Slavic trajati ‘to last, continue’), lovi ‘to hunt’ (from the Slavic loviti ‘to hunt, chase’), and zăpadă ‘snow’ (from the Slavic zapadati ‘to fall’).

Quentin D. Atkinson, one of the authors of Bouckaert et al., claimed in a podcast interview with Science’s Isabelle Boni that they “were able to find 14 different known divergence times on the tree, which we used to calibrate the rate of change”. Given the fallacy of the one concerning Romanian, the others must be regarded with skepticism, and as a result they will be examined in later posts.

More generally, Bouckaert et al. appear to make an assumption—false, as I argue below—that the split of the Romance branch into individual languages coincides with the fall of the Western Roman Empire. In the case of Romanian, this date is probably too early, but in the case of the split of the Italo-Western Romance branch (after Sardinian and Romanian split off) into individual languages, the date is probably too late. Vulgar Latin, the actual speech of the common people during the late Roman Empire, exhibited pronounced differences over both time and space. While mentioned by the ancients, Vulgar Latin was never transcribed or described in detail. Nonetheless, it is attested in Late Latin texts, especially those that condemn linguistic “errors” in spoken Latin, such as Appendix Probi, a work written in the third or fourth century CE. Moreover, some literary works written in a lower register of Latin—especially the dialogues of the comedies of Plautus and Terence, many of whose characters were slaves—provide a glimpse into the world of Vulgar Latin in the classical period.

Another important source of evidence about Vulgar Latin comes from inscriptions, such as the ones discovered in Pompeii, which were buried in volcanic ash in 79 CE, nearly 400 years before the fall of the Roman Empire. In those inscriptions, we find that the Latin ending -t, found on verbs in the third person singular, is often left out. It is assumed that its variable presence in spelling means that in the spoken Latin of Pompeii, the sound had already been dropped. As it happens, this change corresponds to the history of the Italian language: where Latin had venit for ‘he/she comes’, Italian today has viene. However, this change, well-attested though it is in Pompeii, did not affect all Romance languages simultaneously. To this day, some varieties of Sardinian still keep a final -t in these and other verbs. French, in some ways the most innovative Romance language, kept the -t in vient (as the spelling indicates), pronounced well into the 16th century. Indeed, it is still pronounced under some circumstances when a following word begins with a vowel (thus, the -t is pronounced in Vient-il? ‘Is he/she coming?’). Finally, in the earliest known texts in a Romance language from the Iberian peninsula (short poems in Arabic script) we find a final -t (spelled with the Arabic letter dad) on third-person verbs. Thus, we have good evidence that the loss of the third person-marking -t that took place in the Latin of Italy had not spread to the Latin of Sardinia, Gaul (today’s France) or the Iberian Peninsula (Spain and Portugal) before the fall of the Empire. This clearly indicates that the differentiation of Latin into the various Romance languages had begun before the fall of the Western Roman Empire.

One of the main forces behind the diversification of Vulgar Latin was probably the Roman army, as conquering soldiers were the first ones to bring Latin to the far corners of the growing Empire. Joseph B. Solodow in Latin Alive (p. 36) recounts a story from the historian Tacitus, describing

“a vivid scene during a military campaign. Two brothers belonging to a Germanic tribe stand on opposite banks of a river and spiritedly debate the proper stance to be taken towards the Romans… And when [Tacitus] mentions that the debate was conducted mostly in Latin and explains that the brothers had learned the language through military service with the Roman army, he indicates one way by which familiarity with Latin spread among native people.”

The Romans encouraged Latin-based education in the provinces, especially for the children of the elites; “Britain, Gaul, Spain, and north Africa were soon producing distinguished orators and writers, teachers and scholars” (Solodow, p. 37). Latin was viewed as the cement holding the empire together, “a peaceful bond” (Augustine, On the City of God, 19.7). As Latin gained ground throughout the empire, first in its cities and then in the countryside, a significant number of the conquered people became bilingual. But such massive learning of Latin as a second language resulted in non-native speakers introducing patterns and constructions from their native tongues. Children who grew up in such linguistically mixed communities incorporated some of the non-Latin patterns into their otherwise Latin speech. Such substratum influences are a well-known vehicle of language change, although individual instances are often difficult to prove (see McMahon 1994, pp. 221-222 for further discussion and examples). Changes were also happening in variants of Vulgar Latin for other reasons, and gradually a plethora of dialects emerged.

It should also be noted that Bouckaert et al.’s graphic representation of language divergence through a family tree superimposed on the animated map is misleading as well. The authors state in the Supplementary Materials that their “phylogeographic model allows [them] to infer the location of ancestral language divergence events corresponding to the root and internal nodes of the Indo-European family tree” (p. 20). However, the resulting maps are bizarre, as they depict a given language group splitting in a particular place and then the daughter languages moving into new areas. For example, in the case of Romanian splitting off the Romance tree, the “location of ancestral language divergence events” is in the area of Rome, creating the impression that Romanian diverged from Latin in the center of the Empire, with speakers of (proto-) Romanian subsequently moving eastwards. Numerous other divergence events are mapped in equally unlikely locales.

In a typical situation, as a language expands geographically it simultaneously diverges linguistically, due both to drift and to contact with different neighboring languages. The diversification of English into British, Scottish, American, Australian, and other varieties worldwide (not always easily mutually intelligible) illustrates this process perfectly, as does the expansion of Indo-European languages in general from their relatively small homeland area. In other cases, however, languages (or dialects) can diversify without spatial expansion. A perfect example involves the diversity of Modern Irish dialects, which largely developed due to the shrinking of Irish-speaking communities and the lack, since the mid-17th century, of a formal authority (e.g. language academy) or a social body responsible for “managing the linguistic garden, killing the weeds of linguistic creativity” (as James McCloskey beautifully described it in his recent presentation on “The Dialect Geography of Irish Nonfinite Clauses” at UC-Berkeley). As a result, significant variation has developed over the past several centuries among dialects spoken from Donegal in the northwest to County Waterford in the south. The differences among dialects concern not only pronunciation but also grammatical structures; mutual intelligibility is now so low that McCloskey was accused of “speaking French” when he used his native Donegal dialect in southern Ireland!



*Unlike glottochronology, which assumes a constant rate of change for basic vocabulary items and attempts to estimate the dates of divergence from a common proto-language, lexicostatistics makes use of the comparative method (though without reconstructing a proto-language), has a wider range of applications, and need not rely on the assumption of a constant rate of change. A leading practitioner of lexicostatistics was Isidore Dyen—my former landlord, of all things!—who used these techniques to classify Austronesian and Indo-European languages.



Bergsland, Knut & Hans Vogt (1962) On the validity of glottochronology. Current Anthropology 3: 115–153.

McMahon, April (1994) Understanding Language Change. Cambridge University Press.

McMahon, April (2010) “Computational Models and Language Contact”. In Raymond Hickey (ed.) The Handbook of Language Contact (Blackwell Handbooks in Linguistics). Pp. 128–147. Wiley-Blackwell.

Solodow, Joseph B. (2010) Latin Alive: The Survival of Latin in English and the Romance Languages. Cambridge University Press.



  • C. M.

    Just pointing out that “a lovi” in Romanian means “to hit”, “to strike”, and not “to hunt”. That’s not to say that Romanian doesn’t indeed have many Slavic loanwords in its core vocabulary.

    • Asya Pereltsvaig

      Thank you for the correction, C.M.! Indeed, “a lovi” = “to hit”. A typo on my part, as “to hunt” and “to hit” are listed immediately next to each other on the Swadesh 200-word list. And of course “a lovi” does come from the Slavic verb “loviti” which means “to hunt, chase, or catch” (which only reinforced my typing the wrong thing).

      By the way, meaning change is not that odd: the English verb “catch” too comes from the Norman-French verb meaning “to chase” (cf. the French “chasser”) and ultimately from the Latin verb meaning “to take, hold”. In fact, “catch” and “chase” form a doublet, with the initial k-ch/sh alternation, not unlike “candle” and “chandelier” (and for the same etymological reasons), which I discussed in a recent post:

      Altogether, “hunt”, “chase”, “take”, “hold”, “catch”, and “hit” seem to form a cluster of meanings, with verbs cross-linguistically easily switching meanings within this cluster.

  • Paul

    Ms Pereltsvaig,

    I would start by saying that I am not a linguist, just somebody who studied the history of Romania. Indeed
    the geographical origins of the Romanians is still a controversial issue but in
    order to have a better understanding of the issue we should also look at other
    aspects, not only at the linguistic ones. I just have a few comments.

    Regarding the separation
    of Romanian from the remaining Romance languages in 270 AD and the short timespan of the Roman colonization of Dacia.
    Though not a linguist it seems ridiculous to me that somebody would try to give
    a specific year for the birth or ‘separation’ of a language. Romanian historians
    argue that the Romanization process began before the Roman conquest as such
    (106 AD) and continued after the Aurelian retreat from Dacia (271 AD). Dacia
    was in contact with the Roman Empire before it was conquered and parts of it
    remained under Roman control after the retreat of the Roman administration.

    The comparison of Dacia with Britain
    and the north of France seems simplistic to me. All three regions have their
    own specificity and historical developments so I’m not that convinced that we
    should draw universal conclusions based on one case or the other. Britain
    suffered the Anglo-Saxon invasion and that element imposed itself linguistically.
    Dacia, though traversed by various migratory peoples, didn’t suffer the same experience.

    “Given that northern France and
    Romania are about the same size, one would expect the dialect differentiation
    in the two areas to be roughly comparable”

    “the Roman Province of Dacia corresponded to less
    than half the territory of present-day Romania”

    Indeed, Romanian is rather homogenous but
    I don’t see why the case of northern France should be some sort of model. First,
    the map you provided is inexact, at least in what concerns the south-eastern
    border of the province. The border of Roman Dacia was further east. Then, though
    most of the territory of the Romanian
    Plain was not included in the province, the area lay under the control of the Roman
    empire; e.g. Roman roads traversed it

    Moreover, the Romanian spoken inside the
    former border of the Roman province of Dacia, preserved more Latin substance
    than in the rest of the country. See some examples here

    For Romanians, Transylvania is in a way the cradle of their nation. Here lay the
    political center of Dacia, and this is the region more intensely colonized by
    the Romans. The Romanian states outside the Carpathian arch (Walachia and Moldova) were founded in the 14th
    century and benefited from political leaders and a population influx from
    What is now
    the east and south east of Romania were populated later, in the Middle Ages. Dobruja,
    the region by the sea was populated by Romanians after it was acquired from the
    Ottomans in 1878. The Baragan plain was almost empty until it was populated by
    the Communist regime in the 1950s.

    This is I
    think the explanation why Romanian is rather homogenous, large areas of the
    country (and in what is now the independent state of Moldova) were populated
    rather late, in the Middle Ages or even in the 20th century.

    “for example, the Bucharest region,
    whose dialect is the basis of standard Romanian, never belonged to the Roman
    Empire”

    The basis
    of standard Romanian is the Wallachian subdialect not Bucharest per se; by European standards Bucharest
    is a rather new city in this area so don’t give it too much of a linguistic
    importance. It started to matter for the rest of what is now Romania only from the 18-19th
    centuries onwards.

    • Asya Pereltsvaig

      Thank you for your comments, Paul! You are absolutely right that language divergence is not a momentary but rather a gradual process. Therefore, historical dates are merely convenient time-posts. In some cases, one can relate a specific case of linguistic divergence to a historical event and in others not. Romanian divergence is the first kind of case.

      The comparison with northern France is just one example. Northern Italy with its plethora of local dialects and even languages is another example. All in all, given the size of Romania’s territory, we expect more diversity in the language than we actually observe.

      “Britain suffered the Anglo-Saxon invasion and that element imposed itself linguistically. Dacia, though traversed by various migratory peoples, didn’t suffer the same experience.” This is very true, but why?

      While it is true that the late arrival of a Latin-based language in many parts of Romania may explain some of its homogeneity, those territories for the most part were not empty prior to their Romanization. Why aren’t we seeing more substratum influences on different dialects of Romanian?

      • Paul

        Actually a lot of those territories were rather empty. The flat areas between the Carpathian mountains, the Black Sea and the Danube (southern Moldova, eastern Wallachia) were a highway for the various groups migrating or attacking from the east (Mongols, Tatars etc) so they were not the best places for settlement in late Antiquity/early Middle Ages.

        They were settled later, after the Romanians managed to build their states, and even then they were often pillaged. Dobruja was indeed settled by the XIX century but there were migrations and population exchanges (the Turks/Tatars went to the Ottoman empire, Bulgarians to Bulgaria, Romanians settled in).
        Archaeological remains and a few historical data show that the first polities of this population speaking a Romance language seem to have formed in the mountain valleys and in the depressions on both sides of the Carpathians. It was just safer to live close to the mountains, protected by the woods than in flatter areas.

        Now, this offers only part of the story; I wish I had all the answers. I don’t: there are many unanswered questions, such as the one you put. Unfortunately there is an almost 1,000-year gap in the history of this Romance language speaking people.

        • Asya Pereltsvaig

          Thank you for adding this information, Paul! It is indeed an important 1,000 year gap that we need to fill out as best we can.

    • Messi Veteq

      I am sorry to contradict, but the consequences of the article just prove that the Transylvanian cradle of the
      Romanians is rather a myth… The real cradle must have been south of
      the Danube (present day Bulgaria). In fact nothing proves the
      Transylvanian theory (neither linguistic, nor written, nor architectural,
      nor genetic, nor historical proof), only the Romanian historians’
      opinion (from the 19th century).

      Every other theory says that it was rather inhabited by Huns, Avars,
      Hungarians, and Slavic settlements, and that after the invasion of the Kingdom of
      Hungary by the Mongols in 1241 (half of its population was annihilated), the
      Romanians began to immigrate into Transylvania.

      Poland and Croatia-Hungary was their main target, not Wallachia – where -
      at this period were nearly almost nothing to pillage by the Mongolian

  • Chris

    Correct me if I’ve misunderstood something here, but I cannot see how the following quote from p.7 of their supplementary is consistent with your claim that:

    “The dating procedure employed by Bouckaert et al. is based on two essential assumptions: that the rate of loss (or gain) of cognates is steady…”


    “Since languages may not evolve at the same rate at every location through
    time, we compare the fit of a strict clock model (which assumes a constant
    rate of cognate replacement) to a relaxed clock model (46) that allows for rate
    heterogeneity among lineages. To calculate the transition probability of going
    to xk from the parent of xk in time t under an uncorrelated relaxed clock model,
    P_relaxed(xk | x(k); t). The relaxed clock accommodates rate heterogeneity among
    branches with a rate distribution P(r). By relaxing the clock assumption in
    this way, we can accommodate variation in rate of cognate replacement through
    time, estimating the degree to which rates vary from the data itself. Figure S1
    summarizes how inferred rates of cognate replacement vary across branches in
    the tree.”

    • Asya Pereltsvaig

      As far as I understand, they *relax* the clock assumption, but do not abandon it entirely.

      • Chris

        But what’s the objection to a relaxed clock? Are you saying it’s unreasonable to assume that, in general, the fewer cognates shared between two languages, the less closely they are likely to be related? As long as you understand this as a probabilistic statement (which they do) it seems absurd to object to it. Overall, my impression of your and Martin’s discussion of this paper is that, while in many respects it’s very good, you overplay the extent to which Atkinson et al. consider their tree to be the definitive fact of the matter. All they claim about it is that it’s the most likely tree, given their input data. And they’re entirely willing in principle to correct and expand the input data.

        What would be really helpful to someone like me is some informed commentary on whether their method is actually any good *on its own terms*. You criticize their tree with respect to Romani etc., but you give no evidence to suggest that correcting these errors would substantially alter their basic result.

        • Asya Pereltsvaig

          Thank you for your comments, Chris.

          Several points need to be made here:

          1) If a model provides the same result regardless of the input, it’s unfalsifiable and therefore rather not scientific, wouldn’t you agree?

          2) In general, the assumption that the fewer cognates shared between two languages, the less closely they are likely to be related, understood in probabilistic terms, is correct. However, this only works if true cognates are identified. A number of errors (vis-a-vis what is known independently about IE languages), such as the Polish-Belorussian, Romani, Romanian, etc., indicate that borrowings have not been correctly put aside.

          3) We will have an additional post going in detail into what we see as incorrect assumptions about language spread and change soon, which will essentially provide the criticisms of their model “on its own terms” — stay tuned!

          • Chris

            Thanks Asya. I will just say regarding (1) that demonstrating that their model produces the same result *regardless* of the input is exactly what you haven’t done! You have shown that their input is erroneous in several respects. You haven’t shown whether correcting this input would alter their basic finding or not. And you haven’t shown whether or not *any* amount of changes to the input would alter their basic finding, and if so, to what extent. But this is exactly what I would like to see. I’m not in any way blaming you for not doing this, because I think it’s very hard to do, and anyway, I think Atkinson et al. should have gone some way to doing it themselves – it would have made their paper much more credible. But I do think you need to be a little more careful about how devastating you claim specific failings in their data are for their overall project.

          • Asya Pereltsvaig

            Regarding (1), I thought you were suggesting that their model would give the same result regardless of input. Such arguments have been put forward *in their defense* elsewhere in this debate, but the argument is devastating to their model rather than supportive, which is what I wanted to point out.

            And I agree that they could do more replication & verification themselves. The problem with others trying to replicate their results is that they do not define cognates (and elsewhere in the discussion it turned out that they talk about “look-alikes”, not true cognates). Whether one takes the 100-word or the 200-word Swadesh list would probably make a difference too.

            So I don’t believe that, as others have suggested, with different input their model would produce the same result, but I don’t see what justifies the choices about the input that they’ve made and hence the result that the model produced.

  • Emil Perhinschi

    “the core problem relates to Romanian dialects: they are much too homogeneous”

    Of course they are: the territories in the East and South of Rumania were repopulated after the Mongol invasion, starting in the 14th century, with colonists arriving from Transylvania. This is pretty well documented in sources from the 14th and 15th centuries, and there are later documents about settlers being given land in the sparsely populated areas in the East. Not all the colonists spoke a Romance language, and some areas were found to be still inhabited when the Mongols/Tartars were driven out, but the bulk came from Transylvania.

    • Asya Pereltsvaig

      Dear Emil,
      thank you for your comment. While the facts that you cite are correct, they don’t entirely explain the linguistic situation (see my response to Chris below).

  • Carl Edlund Anderson

    While I would agree that there are problematic assumptions and methodology in the Bouckaert et al. article, with regard to Celtic languages in Britain there has been for some time, and I think remains, a lot of debate amongst Celticists over when and how these first appeared there. Of course, assuming Celtic languages originated in, and then spread to Britain from, the region associated with the Hallstatt culture would largely rule out the possibility of a Goidelic-Brittonic split as early as 900 BCE. On the other hand, if Celtic languages had an older development associated with the Atlantic Bronze Age complex, then the possibility of a tongue ancestral to later Goidelic and Brittonic being spoken in Britain at 900 BCE would be perfectly acceptable. That’s not at all the same thing as positing a _split_ between Goidelic and Brittonic _at_ or around 900 BCE, of course … but I think we might look for other reasons to question such a date for a split beyond an objection that Celtic was not yet spoken in Britain at that date (which might or might not be the case).

    • Asya Pereltsvaig

      Dear Carl,
      thank you for your comment. As far as I know, there are no linguistic reasons to postulate such an early split between Goidelic and Brittonic (besides the work of Atkinson and colleagues themselves). As a matter of fact, all the support for the earlier split (and for early arrival of Celts to Britain) that I’ve ever seen comes from related research, which naturally supports… itself! If you know of any archeological evidence for Celtic presence in Britain in that early era, contradicting the Hallstatt theory, I’d love to hear about that.

      • Carl Edlund Anderson

        I think the main exponents of “pre-Hallstatt” Celtic at present are Barry Cunliffe and John Koch, who would also specifically link the emergence of Celtic with elite/trade networks in Atlantic Europe in the Bronze Age. I do not think this is a consensus or even majority view, but it is at least a hypothesis that is being seriously debated amongst Celticists. Nor is it an entirely new idea; proposals for “Bronze Age British Celts” go back to at least John Rhys in the late 19th century, though of course scholarship has moved on a lot since then, and the idea lost favor with the rise of the “Celtic Hallstatt and/or La Tene” hypotheses. Nevertheless, reasonable (if still not widely accepted) proposals for “Bronze Age Celticization” in the British Isles reappeared in the ’60s and ’70s (e.g. Myles Dillon) and the ’80s (e.g. Gearoid MacEoin, John Koch); there is a useful summary of all this in Waddell, J., “The Question of the Celticization of Ireland”, Emania, 9 (1991), 5-16, where Waddell himself suggests Late Bronze Age horizons for Celtic in Britain. (The same issue of Emania also has articles from John Koch and JP Mallory, though I don’t remember the details of their pieces offhand.) Some summary of more recent work can be found in Henderson’s The Atlantic Iron Age (2007). Cunliffe and Koch’s Celtic from the West book focuses on such ideas, but includes some pieces criticizing them as well, I think.

        Given that David Anthony would put the “split” for a Pre-Celtic speech community from the wider Late PIE community c. 3000 BC, I think this would allow for the emergence of a Proto-Celtic speech community by 1200 BC (whether associated with Atlantic Bronze Age cultures, including those of Britain cf. Cunliffe and Koch, or with pre-Hallstatt Urnfield cultures along more traditional lines). Still, in any case … as you say, even accepting (quite provisionally!) a Celtic speech community in Britain by the Late Bronze Age, there is still probably no good reason to assume a split between Goidelic and Brittonic at such an early date! As far as I am aware, the earliest available evidence about Primitive Irish and Roman-era Brittonic suggests that even at their relatively late dates (in the early centuries AD) the Goidelic and Brittonic groups were not so very different. Thurneysen (Grammar of Old Irish, 1946) assumed that Goidelic and Brittonic would have been mutually comprehensible as late as the 1st century BC, and I am not sure that much has happened since that leads Celticists to assume much differently now. So Bouckaert et al.’s date of c. 900 BC for a Goidelic-Brittonic split does seem difficult to reconcile with branches of Celtic that still seem surprisingly close even 1000-1300 years after that (when the earliest surviving written evidence with a bearing on these languages has started to appear).

        • Asya Pereltsvaig

          Dear Carl: Thank you for the detailed comment and the references. This is fascinating, especially as I’ve recently been to a conference on Celtic linguistics. What I wonder is how much traction one can get from comparing Celtic branches (or any other, for that matter) and saying that they are still “close enough” when written evidence begins. I’ve heard a similar argument made about Latin, Ancient Greek, Sanskrit, etc. (that the split of PIE couldn’t have been as early as Bouckaert et al. propose, as these ancient languages are “surprisingly close”), but what do we really know about the rate of (grammatical) change? Is grammar a better way to construct a “molecular clock” for languages? Your input on this matter would be greatly appreciated!

  • Gippo

    Just pointing out that Plautus and Terence’s comedies are earlier than the classical period of Latin.

    • Asya Pereltsvaig

      Good point, thanks for the correction! My point is that these comedies are in some form of earlier Latin, but dialogues by slave characters reveal a more colloquial register, which then led to Vulgar Latin. I guess I misused the term “Classical Latin” to refer to a certain register rather than a time period (but I did say “in the classical period” so you are absolutely right to correct me!).

  • Dragos

    Please also note the words “torna, torna, fratre” uttered by a soldier recruited in the Balkans during the campaigns of emperor Maurice (late 6th century). It’s also remarkable that Romanian has a Latin-inherited Christian vocabulary, which suggests a later development in the Roman Balkan provinces – see Johannes Kramer, Bemerkungen zu den christlichen Erbwörtern des Rumänischen und zur Frage der Urheimat der Balkanromanen in Zeitschrift für Balkanologie 34/1 (1998), pp. 15-22.

    • Asya Pereltsvaig

      Thanks for the reference, Dragos!

  • Evan (PolGeoNow)

    Does this article get a lot of traffic for the keywords “dating and romance”, found verbatim in the title? :-p