Monday, September 6, 2010

Google is a Form of Male Hegemony

An e-mail doing the rounds claims that when you put this text into Google Translate -

I drive a car
I don't know how to drive
I wash the car
I wash the floor
I wash the kitchen

and ask it to translate into Hebrew, you get this response -
אני נוהג במכונית
אני לא יודעת איך לנהוג
אני שוטף את האוטו
אני שוטפת את הרצפה
אני שוטפת את המטבח

The problem, for those of you who are Hebraically challenged, is that the driver of the car and whoever washes it are male, while the non-driver, the floor-washer and the kitchen cleaner are all female.


You could say that Google are male chauvinist pigs (along with being male chauvinist capitalist hegemons, I expect), but the thing about Google Translate is that it's supposed to be automatic. There's no little man behind the screen dashing off translations.

The explanation, I expect, is as follows. Google Translate has been built as a product of the massive book-scanning project the company is engaged in, which has allowed them to create awesomely powerful text-comparison algorithms based on very large numbers of translated works. This little experiment seems to indicate that in those books the verb inflections used for male vs female activities - in books in all languages, mind you - do indeed follow a gender-weighted statistical pattern.

It may really be a fact that more men wash cars, and more women wash floors.


Samsung said...

I have noticed that inconsistency before and never liked how it handled genders, but I never thought about how Translate actually gets its translations. Your explanation, which is also on the About page, makes sense. It has a "suggest a better translation" feature you can use if you mouse over a translated piece of text, so maybe if that gets used often enough you would see the default gender on all words changed to masculine.

RK said...

How would it detect a pattern across all languages when most languages don't mark gender in sentences like those? You might be able to figure it out from the immediate context ("I don't know how to drive," Abigail complained.) but that's often hard for humans, much less algorithms.

In fact, the explanation seems to be more complicated than the one you gave. If you type "I don't know how to drive" without a period, you get "ani lo yoda'at eikh linhog," but adding the period changes the "yoda'at" to "yodea."

Even more strangely, adding a period after "I wash the car" changes "shotef" to "rochetz" and "haoto" to "hamekhonit." Adding a period to "I wash the floor" changes "shotefet" to "lishtof." Bizarre.

Yaacov said...

RK - the genius of what they're doing is that the algorithm works on real, high-quality translations done by experts who have translated books. Thus, when someone translates books by Israelis into English or Japanese, the algorithm sees patterns; when someone translates books from Japanese or English into Hebrew, likewise. So figuring out the gender was done by professional translators, not the algorithm; the algorithm crunches numbers, and sees how often "washing the floor" - in any original language - was "shotefet" in Hebrew (or started as "shotefet", then became "washing the floor" in English or Japanese).
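
The counting idea can be sketched roughly as follows. This is only a toy illustration in Python, not Google's actual method: the list of aligned phrase pairs, the counts in it, and the function names build_table and most_likely are all invented for the example.

```python
from collections import Counter, defaultdict

# Hypothetical toy data: each pair is (English phrase, Hebrew verb form chosen
# by a human translator). The counts are invented purely for illustration.
aligned_pairs = [
    ("wash the floor", "shotefet"),
    ("wash the floor", "shotefet"),
    ("wash the floor", "shotef"),
    ("wash the car", "shotef"),
    ("wash the car", "shotef"),
    ("wash the car", "shotefet"),
]


def build_table(pairs):
    """Count how often each source phrase was rendered as each target form."""
    table = defaultdict(Counter)
    for source, target in pairs:
        table[source][target] += 1
    return table


def most_likely(table, source):
    """Return the most frequent target form for a phrase, or None if unseen."""
    counts = table.get(source)
    if not counts:
        return None
    return counts.most_common(1)[0][0]


table = build_table(aligned_pairs)
print(most_likely(table, "wash the floor"))  # the feminine form wins on this toy data
print(most_likely(table, "wash the car"))    # the masculine form wins on this toy data
```

With nothing in the English sentence to mark gender, a system like this simply falls back on whichever form the human translators used most often for that phrase.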

RK said...

Ah, of course, that was stupid of me. Though I still don't understand why adding the period should make such a big difference.

Anonymous said...

And here, based on my recent experience of Google Translate improving a callous e-mail from the German Foreign Office by rendering "Transgression" as "Attack" (about the four, or rather five, people murdered recently), I would assume that, human nature being what it is, newspapers will have translations done by Google and then have a human eye and keyboard rove over them.

Would the word chosen by Google in the above example, i.e. "attack", have roused any suspicion in me that Google might have embellished or PC-ed it, making it acceptable to polite society?

Given the pressure I'd probably have to work under, given the general belief that the computer knows best, and given that I'd probably know well only the language it was translated into, I very much doubt it.

BTW, even without Google Translate, the translations done by our best radio station quite often tended to shift the original speaker (whom I may hear for a bit in the background) toward what the interviewer wants to hear. (A recent example: calling the attack by the Marmara thugs an attack was too much, so it was "taught" to the public as a provocation, which made it easy to imply that the Israelis "overreacted".)