Wikipedia

Geographical Fun with Wikipedia’s Lists of Lists

For those who like country-based lists and rankings, the Wikipedia is hard to beat. It even has a “lists of lists,” linking to such articles as a “list of countries by birth rate,” “list of the most charitable countries,” and “list of countries by cement production.” Another Wikipedia meta-list, “top international rankings by country,” gives information on some of the categories in which particular countries rank at the highest—or lowest—position. Or, as the article itself puts it, the list provides “global-scale lists of countries with rankings, sorted by country that is placed top or bottom in the respective ranking.”

The Wikipedia’s “top international rankings” article seeks to “distinguish notable statistical indicators from trivial ones.” Whether it does so is for the reader to decide, but I am not convinced of the significance of China’s top position in spinach production or of its “best performance at women’s badminton Uber Cup.” China, one quickly learns from the list, occupies the topmost position in many categories, which is exactly what one might expect, given its population. Other countries make limited appearances, but dozens are classified at the top, or bottom, of some indicator. Chad: lowest ease of doing business. Fiji: best performance at Rugby World Cup Sevens (men). Peru: top producer of silver, quinoa, and maca.

Maca? The quirky list does have some surprises. At any rate, Peru produces 4,589 tons of it a year. Lepidium meyenii, or maca, is, the Wikipedia tells us, “an herbaceous plant … grown for its fleshy hypocotyl, which is used as a root vegetable and a medicinal herb.” It is unusual indeed to find a starchy “root” that is consumed both for calories and for medicinal effects, but that does seem to be the case. Maca use is problematic, as it “contains glucosinolates, which can cause goiters when high consumption is combined with a diet low in iodine.” The author of the Wikipedia article, however, is impressed by its potential:

 A recent review states “Randomized clinical trials have shown that maca has favorable effects on energy and mood, may decrease anxiety and improve sexual desire. Maca has also been shown to improve sperm production, sperm motility, and semen volume.”

The “list of top international rankings by country” is not always accurate, sometimes failing even to square with the sources that it links to. Syria gets top billing for fennel production (might that be regarded as trivial?), yet when one clicks on the link, one finds Syria in sixth place, with a total fennel yield one-quarter that of India.

South Korea gets first-place rankings in several significant categories: best student performance in reading, largest ship builder, most intensive broadband coverage. It also makes the top slot in some quirky categories: “battle of the year,” an annual international b-boying series, and “World Cyber Games,” which, the Wikipedia tells us, “is an international competitive video-gaming [e-sports] event operated by South Korean company World Cyber Games Inc.” Such a standing is not surprising: South Korea takes electronic games very seriously. StarCraft Brood War, after all, is a professional spectator sport in the country.

 

The Linguistic Geography of the Wikipedia

One of the highlights of the Association of American Geographers meeting last week in Seattle was the annual Geography Bowl. Student teams competed to answer all manner of geographical questions, including a few that were devilishly difficult. The most impressive answer may have come in the final round, when the two remaining teams were asked to list the five top languages, after English, used in Wikipedia articles. The Middle Atlantic team buzzed in almost immediately, and one of its members confidently and correctly recited, “German, French, Polish, Italian, and Spanish.”

Both the lack of Chinese and the presence of Polish seemed extraordinary, prompting me to query the team after the contest. The response referenced the well-known cultural pride of the Poles, as well as the fact that Polish has roughly 40 million speakers, a considerable number.

An article in the “meta-wiki” provides detailed information on the use of the 281 languages in which Wikipedia articles have been written. The table posted lists the top fourteen of these languages, with their respective number of articles (in rounded figures). As one can see, Chinese is represented here, coming in twelfth place, between Swedish and Catalan. Such a showing is hardly impressive, however, considering the fact that more than a billion people speak Mandarin Chinese, whereas only around 10 million speak Swedish and 11.5 million Catalan. But neither is the showing of the top Wikipedia language, English. To demonstrate relative Wiki language standings, I calculated the number of articles per 1,000 total* speakers for each of the top fourteen Wikipedia languages. Here English is far surpassed by a number of other languages. Considering the fact that most Swedish, Dutch, and Norwegian Wikipedia users are fully fluent in English, the quantity of articles appearing in their native languages is impressive indeed. (Admittedly, articles in languages other than English are often translated from an English original.)
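The per-speaker measure described above is simple arithmetic, and a minimal Python sketch may make it concrete. The function name and data layout here are illustrative assumptions, not the procedure actually used for the table. The only figures included are ones quoted later in this post (Waray-Waray: 102,000 articles, 3.5 million speakers; Hausa: 263 articles, 43 million speakers), and those speaker counts are taken as given rather than verified as total-speaker figures.

```python
def articles_per_thousand_speakers(articles: int, speakers: int) -> float:
    """Return the number of Wikipedia articles per 1,000 speakers."""
    if speakers <= 0:
        raise ValueError("speakers must be a positive number")
    return articles / speakers * 1000


# (articles, speakers) pairs, using only figures quoted in this post.
figures = {
    "Waray-Waray": (102_000, 3_500_000),
    "Hausa": (263, 43_000_000),
}

# Compute the rate for each language.
rates = {
    name: articles_per_thousand_speakers(articles, speakers)
    for name, (articles, speakers) in figures.items()
}

# Print the languages from the highest rate to the lowest.
for name, rate in sorted(rates.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {rate:.3f} articles per 1,000 speakers")
```

On these figures, Waray-Waray comes out at roughly 29 articles per 1,000 speakers and Hausa at well under one-hundredth of an article, a gap of more than three orders of magnitude.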

Overall, European languages dominate the Wikipedia list. A number of major non-European languages rank relatively high (Vietnamese coming in 17th place, Korean 21st, Indonesian 22nd, and Arabic 25th), but they are still surpassed by European languages with far fewer speakers. Several important Asian languages, moreover, rank very low: Bengali, for example, with more than 230 million speakers, is outranked by Luxembourgish, Welsh, and Icelandic, none of which even approaches one million speakers. Sub-Saharan African languages are least represented. Swahili ranks a respectable 75th, with more than 21,000 articles, but Hausa, a major language spoken by 43 million people, ranks 245th, with only 263 articles. By this metric, Hausa is bested by such obscure tongues as Norfolk and Nauruan, and even by long-deceased Gothic.

Another notable feature of the list is the relatively large number of articles written in non-national European languages, many of which are often regarded as mere dialects. In Spain alone, Asturian is used for more than 14,000 articles, Aragonese for more than 25,000, and Galician for more than 70,000. Local linguistic pride, along with regionalism and sub-state nationalism, is no doubt responsible for such elevated numbers. Such processes are largely but not entirely limited to Europe. In the Philippines, the obscure tongue of Waray-Waray (3.5 million speakers) has an amazingly large Wikipedia presence, its 102,000 articles far overshadowing the 51,000 written in the national language Tagalog (Filipino).

A final oddity is the relatively high ranking of artificial languages. Almost as many Wikipedia articles are written in Esperanto as in Arabic, and the constructed language of Volapük bests Hebrew, Hindi, Thai, and Greek. Ido, with an estimated 100-1200 speakers, boasts more than 21,000 articles, while Interlingua has more than 5,000, Novial more than 2,500, Interlingue (“Occidental”) almost 2,000, and Lojban over 1,000. So-called dead languages are also reasonably well represented, with Latin being used for more than 52,000 articles, Old English (Anglo-Saxon) for 2,600, and Pali for 2,300. Artificial languages from fictional societies, however, do not make the list, even though such tongues as Na’vi and Klingon have plenty of aficionados. The explanation comes in a footnote: “The Klingon language edition of the Wikipedia is no longer hosted by Wikimedia and is now hosted by Wikia as Klingon Wiki.”

* As opposed to native speakers.