Language is a complex thing. People spend their entire careers studying other languages and translating from one tongue to another, because translation is never as simple as plugging words into a search engine or dictionary and seeing what comes out. That’s true of widely spoken languages, and even more true of lesser-used and less widely understood tongues, like, say … Scots. But that didn’t stop one intrepid American internet user from defining Scots for the whole internet.
Oh yes indeed. Buckle up.
First, let’s get our bearings. For those who might not know, Scots is “one of three native languages spoken in Scotland today, the other two being English and Scottish Gaelic. Scots is the collective name for Scottish dialects known also as ‘Doric’, ‘Lallans’ and ‘Scotch’ or by more local names such as ‘Buchan’, ‘Dundonian’, ‘Glesca’ or ‘Shetland.'” That’s according to the Scots Language Centre, which I assume is a reliable source here. We’ll get to why I’m cautious in a second.
Scots, as you can tell, is a very complex language with lots of dialects and variations, and it’s not as well known, widely studied, or widely taught as languages like, say, Spanish. But still, the internet being what it is, there are tools and references available for those interested in Scots. And there are supposed to be websites in the language … websites like Wikipedia, where the content and translations are written by speakers of that language.
Which brings us to the very weird case of Scots Wikipedia: a wiki with tens of thousands of entries in what’s supposed to be the Scots language … and they were nearly all written by one very prolific and very American person. A person … who does not speak Scots. The strangeness was first discovered earlier this week by a user on Reddit, who explained their suspicion and discovery in a post that’s now gone viral. User Ultach on r/Scotland wrote:
The Scots language version of Wikipedia is legendarily bad. People embroiled in linguistic debates about Scots often use it as evidence that Scots isn’t a language, and if it was an accurate representation, they’d probably be right. It uses almost no Scots vocabulary, what little it does use is usually incorrect, and the grammar always conforms to standard English, not Scots.
Ultach wanted to see who was making these bad entries on the Scots Wiki and uncovered something that’s pretty amazing.
I checked the edit history to see if anyone had ever tried to correct it, but it had only ever been edited by one person. Out of curiosity I clicked on their user page, and found that they had created and edited tens of thousands of other articles, and this on a Wiki with only 60,000 or so articles total! Every page they’d created was the same. Identical to the English version of the article but with some modified spelling here and there, and if you were really lucky maybe one Scots word thrown into the middle of it.
Now, like Ultach, we’re not going to expose this Wiki editor, or shame them. They seem to have been genuinely trying to do something, but the way they went about it wasn’t correct. And that’s because language is so much more complex than one-to-one translations. This user didn’t understand that Scots has its own grammar or how certain words translate, and just running English through a bad online Scots dictionary isn’t going to cut it.
As internet linguist Gretchen McCulloch explained in an excellent thread, this is not how it works.
I don’t speak Scots, but even “with vocabulary changes” translation is never a perfect 1-to-1 correspondence
For example, the Reddit post cites “an aw” which apparently can sometimes be translated “also” but not always
Also, you may be missing things :) https://t.co/vmEWEkPDnE
— Gretchen McCulloch (@GretchenAMcC) August 25, 2020
So this is all kind of wild and weird, but it’s not hurting anyone, right? Well, actually, it is.
Because we live in an age of internet and AI, all sorts of algorithms, programs, and bots use things like Wikipedia entries supposedly written in a language to learn that language. When bad language examples get integrated into these systems, the errors spread in something very like the software sense of “viral,” because once a program learns something wrong, it’s hard to erase it.
Why is it a problem that natural language processing tools often just throw in Wikipedia data without checking it by someone who actually knows the language?
Well, https://t.co/yxTeVVUgaH
— Gretchen McCulloch (@GretchenAMcC) August 25, 2020
Especially here, where Scots is a lesser-used language and is, as we can see from this whole debacle, poorly understood outside of Scotland, this kind of thing is really harmful. Not just in a programming or AI sense, but for the real people out there struggling to have this language more broadly recognized as worthy of study and respect. People for whom it is part of their culture and heritage.
I’ll let the Redditor Ultach explain:
This is going to sound incredibly hyperbolic and hysterical but I think this person has possibly done more damage to the Scots language than anyone else in history. They engaged in cultural vandalism on a hitherto unprecedented scale. Wikipedia is one of the most visited websites in the world. Potentially tens of millions of people now think that Scots is a horribly mangled rendering of English rather than being a language or dialect of its own, all because they were exposed to a mangled rendering of English being called Scots by this person and by this person alone. They wrote such a massive volume of this pretend Scots that anyone writing in genuine Scots would have their work drowned out by rubbish. Or, even worse, edited to be more in line with said rubbish.
It will very likely take a long time for Scots Wikipedia to be fixed, if indeed that happens. It took just one very dedicated person to break it, but it may take many more to undo the damage and translate things correctly. I hope they do. But in this case, if we’re taking the high road and this user took the low road, they certainly got to Scots Wikipedia before us.
(Via: Gretchen McCulloch/Twitter, image: Pixar)
Published: Aug 26, 2020 12:16 pm