Wikipedia was founded with the aim of making knowledge freely available around the world — but right now, it’s mostly making it available in English. The English Wikipedia is the largest edition by far, with 5.5 million articles, and only 15 of the 301 editions have more than a million articles. The quality of those articles can vary drastically, with vital content often missing entirely. Two hundred and six editions are missing an article on the emotional state of happiness, and just under half are missing an article on Homo sapiens.
Available as a beta feature, the content translation tool lets editors generate a preview of a new article based on an automated translation from another edition. Used correctly, the tool can save valuable time for editors building out understaffed editions — but when it goes wrong, the results can be disastrous. One global administrator pointed to a particularly atrocious translation from English to Portuguese: “village pump,” the English Wikipedia’s name for its community discussion forum, came out as “bomb the village” when run through machine translation. (The Portuguese “bomba” can mean either “pump” or “bomb.”)
“People take Google Translate to be flawless,” said the administrator, who asked to be referred to by their Wikipedia username, Vermont. “Obviously it isn’t. It isn’t meant to be a replacement for knowing the language.”
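The failure mode is easy to demonstrate outside Wikipedia. Here is a minimal Python sketch that sends the phrase through an open-source neural translation model via the Hugging Face transformers library; the specific model and its “>>pt<<” language tag are assumptions chosen for illustration, not the content translation tool’s actual pipeline.

```python
# A minimal sketch of reproducing this kind of mistranslation with an
# open-source neural translation model. The model name and the ">>pt<<"
# target-language tag are assumptions about one publicly available
# English-to-Romance model; this is not the content translation tool's
# actual pipeline.
from transformers import pipeline

# Load a general-purpose neural machine translation model.
translator = pipeline("translation", model="Helsinki-NLP/opus-mt-en-ROMANCE")

# "Village pump" is Wikipedia jargon for a community discussion page. A
# general-purpose model is unlikely to know that sense, so it tends to
# translate the words rather than the meaning.
result = translator(">>pt<< Please leave a note at the village pump.")
print(result[0]["translation_text"])
```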
Those shoddy machine translations have become such a problem that some editions have created special admin rules just to stamp them out. The English Wikipedia community adopted a temporary “speedy deletion” criterion solely to allow administrators to delete “any page created by the content translation tool prior to 27 July 2016,” so long as the page history contains no version that isn’t machine-translated. This “exceptional circumstances” criterion is labeled “X2. Pages created by the content translation tool.”
The Wikimedia Foundation, which administers Wikipedia, defended the tool when reached for comment, emphasizing that it is just one tool among many. “The content translation tool provides critical support to our editors,” a representative said, “and its impact extends even beyond Wikipedia in addressing the broader, internet-wide challenge of the lack of local language content online.”
That may be surprising if you’ve seen headlines in recent years about AI reaching “parity” with human translators. But those stories usually refer to narrow, specialized tests of machine translation’s abilities, and when the software is actually deployed in the wild, the limitations of artificial intelligence become clear. As Douglas Hofstadter, professor of cognition at Indiana University Bloomington, spelled out in an influential article on the topic, AI translation is shallow: it produces text with surface-level fluency, but it usually misses the deeper meaning of words and sentences. AI systems learn to translate by studying statistical patterns in large bodies of training data, which means they’re blind to nuances of language that appear only infrequently, and they lack the common sense of human translators.
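Hofstadter’s point can be made concrete with a toy sketch. The Python below, built on a handful of invented sentence pairs and naive word-by-word decoding (simplifying assumptions; real systems train neural networks on far larger corpora), picks each word’s most frequent partner in a tiny parallel corpus and promptly mangles the idiom from earlier in this piece.

```python
# A toy illustration of the statistical idea: count which target-language
# words co-occur with each source word in a parallel corpus, then pick the
# most frequent pairing. The five invented sentence pairs and the
# word-by-word decoding are deliberate simplifications; real systems train
# neural networks on vastly larger corpora.
from collections import Counter, defaultdict

# Tiny invented English/Portuguese parallel corpus.
corpus = [
    ("the pump broke", "a bomba quebrou"),
    ("the pump failed", "a bomba falhou"),
    ("a new pump", "uma bomba nova"),
    ("the village grew", "a aldeia cresceu"),
    ("a small village", "uma aldeia pequena"),
]

# Count co-occurrences: pair every source word with every target word that
# appears in the same sentence.
cooc = defaultdict(Counter)
for src, tgt in corpus:
    for s in src.split():
        for t in tgt.split():
            cooc[s][t] += 1

def translate_word(word):
    # Choose the target word most often seen alongside this source word.
    return cooc[word].most_common(1)[0][0] if word in cooc else word

# The phrase "village pump" never appears in the corpus, so the idiom is
# translated word by word: "aldeia bomba." Since "bomba" means both "pump"
# and "bomb" in Portuguese, the ambiguity behind the real mix-up is already
# visible at this toy scale.
print(" ".join(translate_word(w) for w in "village pump".split()))
```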