The Linguistic Geography of the Wikipedia

Submitted on April 18, 2011 at 6:36 pm
One of the highlights of the Association of American Geographers meeting last week in Seattle was the annual Geography Bowl. Student teams competed to answer all manner of geographical questions, including a few that were devilishly difficult. The most impressive answer may have come in the final round, when the two remaining teams were asked to list the five top languages, after English, used in Wikipedia articles. The Middle Atlantic team buzzed in almost immediately, and one of its members confidently and correctly recited, “German, French, Polish, Italian, and Spanish.”

Both the lack of Chinese and the presence of Polish seemed extraordinary, prompting me to query the team after the contest. The response referenced the well-known cultural pride of the Poles, as well as the fact that Polish has roughly 40 million speakers, a considerable number.

An article in the “meta-wiki” provides detailed information on the use of the 281 languages in which Wikipedia articles have been written. The table posted lists the top fourteen of these languages, with their respective number of articles (in rounded figures). As one can see, Chinese is represented here, coming in twelfth place, between Swedish and Catalan. Such a showing is hardly impressive, however, considering that more than a billion people speak Mandarin Chinese, whereas only around 10 million speak Swedish and 11.5 million Catalan. But the showing of the top Wikipedia language, English, is not impressive either. To demonstrate relative Wiki language standings, I calculated the number of articles per 1,000 total* speakers for each of the top fourteen Wikipedia languages. By this measure, English is far surpassed by a number of other languages. Considering that most Swedish, Dutch, and Norwegian Wikipedia users are fully fluent in English, the quantity of articles appearing in their native languages is impressive indeed. (Admittedly, articles in languages other than English are often translated from an English original.)
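
For readers who want to reproduce the per-speaker calculation, here is a minimal Python sketch. It uses only rounded figures quoted elsewhere in this post (for Waray-Waray and Hausa), not the values from the table. The function name `articles_per_thousand` and the `figures` dictionary are hypothetical names chosen for illustration, and the speaker counts are taken as quoted, so they may not match the total-speaker figures behind the table.

```python
def articles_per_thousand(articles: int, speakers: int) -> float:
    """Return the number of Wikipedia articles per 1,000 speakers."""
    if speakers <= 0:
        raise ValueError("speakers must be a positive number")
    return articles / speakers * 1000


# Rounded figures as quoted in this post: (article count, speakers).
# These are illustrative inputs, not the data behind the original table.
figures = {
    "Waray-Waray": (102_000, 3_500_000),
    "Hausa": (263, 43_000_000),
}

# Rank the languages from most to fewest articles per 1,000 speakers.
ranking = sorted(
    ((name, articles_per_thousand(articles, speakers))
     for name, (articles, speakers) in figures.items()),
    key=lambda pair: pair[1],
    reverse=True,
)

for name, rate in ranking:
    print(f"{name}: {rate:.3f} articles per 1,000 speakers")
```

On these figures, Waray-Waray comes out at roughly 29 articles per 1,000 speakers, while Hausa comes out at well under 0.01. The gap illustrates how widely the per-speaker measure can vary.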

Overall, European languages dominate the Wikipedia list. A number of major non-European languages rank relatively high (Vietnamese coming in 17th place, Korean 21st, Indonesian 22nd, and Arabic 25th), but they are still surpassed by European languages with far fewer speakers. Several important Asian languages, moreover, rank very low: Bengali, for example, with more than 230 million speakers, is outranked by Luxembourgish, Welsh, and Icelandic, none of which even approaches one million speakers. Sub-Saharan African languages are least represented. Swahili ranks a respectable 75th, with more than 21,000 articles, but Hausa, a major language spoken by 43 million people, ranks 245th, with only 263 articles. By this metric, Hausa is bested by such obscure tongues as Norfolk and Nauruan, and even by long-deceased Gothic.

Another notable feature of the list is the relatively large number of articles written in non-national European languages, many of which are often regarded as mere dialects. In Spain alone, Asturian is used for more than 14,000 articles, Aragonese for more than 25,000, and Galician for more than 70,000. Local linguistic pride, along with regionalism and sub-state nationalism, is no doubt responsible for such elevated numbers. Such processes are largely but not entirely limited to Europe. In the Philippines, the obscure tongue of Waray-Waray (3.5 million speakers) has an amazingly large Wikipedia presence, its 102,000 articles far overshadowing the 51,000 written in the national language, Tagalog (Filipino).

A final oddity is the relatively high ranking of artificial languages. Almost as many Wikipedia articles are written in Esperanto as in Arabic, and the constructed language of Volapük bests Hebrew, Hindi, Thai, and Greek. Ido, with an estimated 100-1200 speakers, boasts more than 21,000 articles, while Interlingua has more than 5,000, Novial more than 2,500, Interlingue (“Occidental”) almost 2,000, and Lojban over 1,000. So-called dead languages are also reasonably well represented, with Latin being used for more than 52,000 articles, Old English (Anglo-Saxon) for 2,600, and Pali for 2,300. Artificial languages from fictional societies, however, do not make the list, even though such tongues as Na’vi and Klingon have plenty of aficionados. The explanation comes in a footnote: “The Klingon language edition of the Wikipedia is no longer hosted by Wikimedia and is now hosted by Wikia as Klingon Wiki.”

* As opposed to native speakers.


  • Keith Macgowan

    Another interesting statistic on the same Wikipedia page is the number of edits, which might indicate which languages are actually being used. As Wikipedia points out (at the somewhat out-of-date page http://meta.wikimedia.org/wiki/List_of_Wikipedias_by_edits_per_article):

    “A ranking of Wikipedias by article count alone presents an incomplete picture of the overall quality of any particular Wikipedia edition, since article counts can be inflated by using bots to generate Wikipedia articles from statistical data and databases, such as towns of the world et cetera.”

    The Waray-Waray/Tagalog comparison perhaps starts to make a bit more sense when the other numbers are factored in. Despite having twice as many articles, Waray-Waray has fewer total edits, users, active users, and admins than Tagalog. It sounds as though the numbers are being distorted a bit by a small number of very determined people (possibly with the assistance of a bot or two).

    That said, I wonder if the disproportionate numbers for some languages might partially reflect the lack of other outlets in those languages. In other words, could you end up with low Arabic and Chinese numbers because the Arabic and Chinese worlds are so big that they have dozens of tools for educational writing, whereas Wikipedia might have a near monopoly on the interested volunteers who speak Galician or Haitian Creole?

  • http://GeoCurrents.info Martin W. Lewis

    Many thanks to Keith Macgowan for the insightful comments, which generally seem spot-on. I doubt, however, whether there is anything comparable to the Wikipedia in Chinese or Arabic, although my knowledge here is quite limited. It is interesting to note that only 46% of Chinese Wikipedia users are from mainland China, whereas 22% are from North America (from the English-language Wiki article on the Chinese Wikipedia). Wikipedia also has an article on the “blocking of Wikipedia in China,” which is no doubt a significant factor.

  • Marc

    Reflecting on the number of English articles per 1000 speakers, I imagine that at ~3.6 million, English comes closer than the others to something of an effective soft limit on the number of topics worth a dedicated article.

    Additionally, specialists with the detailed knowledge that such topics require, and with an interest in Wikipedia, would be harder to find than generalists writing in a less widely spoken language, and they would probably be less inclined to write an article relevant to only a narrow segment of the public. Finally, such detail might be more conveniently included in a parent-topic article.

    This would have the effect of pushing up the figures of less spoken languages in the articles per 1000 speakers ranking.

  • Jim Wilson

    My guess is that there would be a greater correlation between the number of Wikipedia articles and the number of internet users than between the number of Wikipedia articles and the number of speakers. There are a great many Bengali speakers, but how many of them have computers? On the other hand, how many Esperanto speakers don’t have a computer?

  • http://GeoCurrents.info Martin W. Lewis

    Good point by Jim Wilson. But although internet usage rates among Bengali speakers are very low, the total population is so large that many do have internet access. Internet World Stats puts the total number of internet users in Bangladesh at almost one million in 2010 (for a penetration rate of 0.6%). I have not been able to find figures for the Indian state of West Bengal, but the penetration rate there is almost certainly much higher; Calcutta may be a desperately poor city, but it is still noted for its educational institutions and the quality of its intellectual life. But educated people there are all fluent in English, and hence would generally use the English-language Wiki. I am still surprised, however, that pride in the Bengali language has not resulted in a larger Bengali Wikipedia.

  • Ryan Lord

    The “Random Article” button is a quick and dirty way of seeing the relative quality of a certain language’s Wikipedia. Go to the English version and hit the button a couple of dozen times to get a feel for what a well-loved Wikipedia looks like. Now go to war.wikipedia.org (Waray-Waray) and hit the “Bisan ano nga pakli” button, which is positioned in more or less the same place as the Random Article button on en.wikipedia.org. You will notice that basically every single article is almost identical: each just gives the name of a town and the province (county, state, etc.) and country it belongs to.

    So, for example, the Chinese Wikipedia is small, but at least every page shows the intervention of a human hand.

    You can also look at the bottom of the page, where the references usually are. If they are in English, that probably indicates a Google-translated page; if they are in the native language, you’re probably looking at something that someone has taken the time to compose.

  • Bishnu Prasad Gautam

    1. Not giving all education in the mother tongue makes slaves of people at a subtle pace.*
    2. Mother’s milk for the infant, education in the mother tongue: the effect reaches creation at the speed of light.*
    3. For researchers to submit their theses in the mother tongue is a birthright and a duty.*
    4. What mother’s milk is to a baby, the mother tongue is to education.
    5. Research scholars should be given the option to submit their theses in their mother tongue.
    ______________________________________________________________________
    * Translated from the Nepali language

  • Himalayabookhouse10

    For all education not to be in the mother tongue is to become a slave at a subtle pace.

    Mother’s milk for the infant, education in the mother tongue: the effect reaches creation at the speed of light.

  • http://blog.zolnai.ca/ Andrew Zolnai
    • http://www.pereltsvaig.com Asya Pereltsvaig

      thanks, Andrew

  • http://www.facebook.com/bishnuprasad.gautam.7 Bishnu Prasad Gautam

    Education in the Mother Tongue

    Mother’s milk for the infant, education in the mother tongue,

    the effect reaches creation at the speed of light. The lines above have been vigorously promoted for the past five years by Mr. Bishnu Gautam, a Nepali-speaking bookseller from Shillong, Meghalaya. They can be seen everywhere on the books, invoices, letter pads, and book catalogues he has published. Written in Nepali, English, Khasi, and Bengali, these lines convey the message that the power of the mother tongue helps protect creation and rapidly advance this beautiful, many-colored garden of the world.

    The first language a person learns to communicate in after birth is that person’s mother tongue. Preserving and developing mother tongues is also necessary to maintain the world’s colorful diversity of knowledge, thought, and imagination. Education given in the mother tongue develops the language concerned, its script, and the community’s customs and culture, and it increases motivation and awareness in society. If a language disappears, the culture of its people disappears as well. Language makes the greatest contribution to cultural prosperity. Expression in the mother tongue is the most complete and natural. If mother tongues do not thrive, the widely used link languages of the world will themselves become hollow. It is because of thousands of small mother tongues that the world’s link languages have become rich and flourishing. If mother tongues vanish because people are absorbed in the languages of commerce, the world of knowledge and science will become like a desolate desert. Education must therefore be in the mother tongue. The mother tongue is the treasury of a person’s original knowledge, skill, and creativity. To neglect the protection of such an important treasure and, for momentary gain, to immerse oneself only in the languages of commerce is a betrayal of society’s future and a violation of children’s rights.

    The renowned writer Rabindranath Tagore said, “To receive education in the mother tongue is a person’s birthright. Just as we were born from our mother’s womb, so too the mother tongue is our womb. Both of these mothers are always living and indispensable to us.” He understood the importance of the mother tongue and tried to make others understand it. The renowned statesman Nelson Mandela said: “If you speak to a man in a language he understands, it goes to his mind, but if you speak in his language it goes to his heart.” In other words, what is said in a language someone merely understands reaches only the mind; what is said in the mother tongue reaches the heart. Mandela thus made clear the swift effect of the mother tongue.

    According to a United Nations study, about 5,300 mother tongues not used in commerce are currently endangered. If future education policy is made while ignoring the importance of the mother tongue in education, not only social and national existence but the very existence of humankind could be put at risk. The issue has had a place at the United Nations since 1999. What is needed now is its proper implementation around the world.