Wednesday, January 06, 2010

Fun with Google Translate, MidEast way - who killed whom?

Inputting "Israeli kills Palestinian" in Arabic returns "Israeli killed by Palestinians"


While we all acknowledge that online translation websites are far from accurate, they generally do a good job giving an approximate rendition of the meanings of the original text.
And with its 51 supported languages, Google Translate has positioned itself as a leader in the online translator market.

Google Translate claims to take a "different approach" in creating its translation algorithms. One notable difference is the option to "contribute a different translation", offered at the bottom of each machine-generated translation page.

Last week, something surfaced that I first took for an amusing quirk. Enter the text "اسرائيلي يقتل فلسطيني" -- which is Arabic for "Israeli kills Palestinian".
It's a simple enough sentence. Subject, verb in the present tense, object. Surely Google Translate can handle that?

Well, if you hit "Translate", you get...

"Israeli killed by Palestinians".

The subject and object have been inverted; or to be more precise, the verb has shifted from the active to the passive form. And the one Palestinian is now many.
Pretty confusing to a potential reader who cannot decipher the original.

Funny quirk, I figured. At the suggestion of friends, I tried a few other objects (and others did, too).

Entering, in Arabic, "اسرائيلي يقتل هندي" - "Israeli kills (an) Indian" (or "American", or "Brit", or "dog") - well, it's the poor Israeli who dies.

Apparently he can only take on the Brazilian fellow: "اسرائيلي يقتل برازيلي" ("Israeli kills Brazilian") is translated (almost) properly, with the verb shifting to the past tense ("Israeli killed Brazilian").
Not very flattering for Brazilians, eh! :)

Change the subject of the sentence, however, so that the killer is of any other nationality, and the phrase is translated properly. Funny, huh?

I did notice something else though. Dropping the object (so just subject+verb) gives very different translations. I tried "Palestinian kills" and "Israeli kills" in Arabic and this is what I got:



"Israeli killed", but "Palestinian kills".

Is that a joke of some sort? Some Google developer with an acute sense of humour and geopolitics? A coordinated user manipulation, the GT equivalent of a Google bomb? If enough people decide to "Contribute a better translation" for the same sentence, GT will presumably end up adopting the new sentence as correct.


I nevertheless decided to give Google Translate the benefit of the doubt. After all, I thought, it draws on existing texts to work out its translations, and English-language media seldom use a sentence as straightforward as "Israeli kills Palestinian". When such an event occurs - as it often does - the subject is usually "the IDF" or "Tsahal" or something...

So I tried a different pair of languages: Arabic and Hebrew. If, as I figure, the change was made by Israelis - I can imagine them, a class of Israeli kids who decided to pull a harmless joke on Google by submitting the same "better" translation dozens of times - then they're likely to have done it to this pair of languages, right?




Translating "Palestinian kills Israeli" (فلسطيني يقتل اسرائيلي) into Hebrew, you get "הורג פלסטיני ישראלי" - which is close to the original. (Running the result in reverse leads back to the original text.)

But here's the fun bit: translate 'Israeli kills Palestinian' (اسرائيلي يقتل فلسطيني) and you get the unexpected but linguistically very correct (ישראלים שנהרגו בידי פלסטינים): "Israelis who were killed at the hands of Palestinians". Both subject and object now in plural, verb in passive form, with a few extra intermediary particles and words.


This cannot be a translation mistake. Here is another instance: until a few weeks ago, attempting to translate "long live Palestine" (تحيا فلسطين) from Arabic to Hebrew gave (זמן ישראל לחיות), which reads approximately as "it's time for Israel to live!". Same thing: this cannot be a programming mistake.
(It has now been corrected.)


Rather than the product of a deficient algorithm, this result - and the linguistic precision of the erroneous wording - is assuredly human-caused. Which points to a major deficiency in Google Translate: it trusts people.

The obvious fix for this problem is to have user-submitted "corrections" checked by a human translator. This might however turn out to be very costly and time-consuming... But until then, Google Translate will remain open to user manipulation. That's the trade-off to make.
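
To make that trade-off concrete, here is a toy sketch in Python. It is not how Google Translate works internally (I have no inside knowledge of that); the class name, the vote threshold and the review flag are all made up for illustration. It only shows why a system that adopts a suggestion after enough identical submissions can be gamed, and how a human check would block that at the cost of extra work.

```python
from collections import Counter, defaultdict


class ToyTranslator:
    """Toy model only: every name and rule here is a hypothetical stand-in."""

    def __init__(self, machine_output, adopt_after=3, require_review=False):
        self.machine_output = dict(machine_output)  # source -> machine translation
        self.adopt_after = adopt_after              # assumed vote threshold
        self.require_review = require_review        # assumed human-review switch
        self.votes = defaultdict(Counter)           # source -> suggestion counts
        self.approved = defaultdict(set)            # source -> reviewer-approved suggestions

    def suggest(self, source, suggestion):
        """Record one user-submitted 'better translation'."""
        self.votes[source][suggestion] += 1

    def approve(self, source, suggestion):
        """Record that a human reviewer accepted this suggestion."""
        self.approved[source].add(suggestion)

    def translate(self, source):
        # Walk suggestions from most to least submitted; stop below the threshold.
        for suggestion, count in self.votes[source].most_common():
            if count < self.adopt_after:
                break
            if not self.require_review or suggestion in self.approved[source]:
                return suggestion
        return self.machine_output[source]


SOURCE = "اسرائيلي يقتل فلسطيني"
machine = {SOURCE: "Israeli kills Palestinian"}

trusting = ToyTranslator(machine)
reviewed = ToyTranslator(machine, require_review=True)
for _ in range(3):
    trusting.suggest(SOURCE, "Israeli killed by Palestinians")
    reviewed.suggest(SOURCE, "Israeli killed by Palestinians")

print(trusting.translate(SOURCE))  # the repeated suggestion wins
print(reviewed.translate(SOURCE))  # the machine output stands until someone approves
```

In this sketch three clicks are enough to flip the trusting version, while the reviewed version ignores them until approve() is called, which is exactly the costly human step.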

13 comments:

SiSi said...

That's crazy!!! I wouldn't have believed it until I tried it myself..

Khaled said...

it got corrected.... As you've already said the contributions play a major factor....

Mo-ha-med said...

Sisi - I know, it's hilarious

Khaled - I'm following posts written by others and some people still get weird results!

worriedlebanese said...

excellent!
it seems to have become a game now
for اسرئيلي يقتل فلسطيني
I got Palestinian kills Israeli corporal!

and for فلسطيني قتل اسرئيلي
Palestinian Israeli who was killed

Mo-ha-med said...

Hehehe, this is very good! Thanks for sharing, WL!

Chris Winter said...

I don't think google translate is the only problem out there. If you look at the hyperbolic and panic-inspiring translations of some Arabic speeches in Western (and I guess Israeli) media, one should be very afraid.

I really hate the human tendency to demonize anything we don't immediately understand. As a Dutchman, I feel the need to apologize for some of what my countrymen do in this regard.

Having said that, *most* humans seem to suffer from these dysfunctions.

Mo-ha-med said...

Hello again, WinterTime!
(cool photo gallery, btw).

That's the main reason why I never use the videos from Memri: they CHEAT. their translations and subtitles are deliberately erroneous.

Well, you Dutch people do have a few fuckups. But hey, we all do. Don't worry about it. :-P

Anonymous said...

Ah, it's true even with fusha case endings! Well, if you put the Israeli first, إسرائيلي يقتل فلسطينيا you get "Israeli kills Palestinian" at least, but if you put يقتل first, "يقتل إسرائيلي فلسطينيا" it's still the Israeli who dies! Wonder who wrote the scripts and how much time they must have had on their hands to think of this...cos the translate tool has got much better in the past year or so!

Mo-ha-med said...

Still on, huh? Funny.
Well, it probably took a few people clicking away at their 'better translation', but I don't think it was that tiresome... strikes me as easier than a google bomb, for instance..

worriedlebanese said...

If you want to see funny google translations, you should have seen the موقع مدينة بيروت الرسمي (the official website of the city of Beirut).
They removed most of these pages yesterday after a local shaming campaign, but here's a couple of lines from the old pages.

"Beirut city falls on the eastern beach from the Mediterranean Sea, borders it a west the sea, and south its suburbs and the region of its immortality an extension to a hunt and its neighborhood, Wsh
And Beirut falls in a moderate region that is distinguished by the quality of the weather and straightness in the climate and a beauty in the view, and mentions some of the sources by that Beirut name derived from (Pyrite)
And when Beirut is said in the Ottoman time, but he the meek Beirut is meant by it inside its fence and while is but from regions that enter today Beirut scope".
Poetry I'm telling u. google translate poetry

Mo-ha-med said...

Hahahahaha.... بيروت تقع في.. became "Beirut falls"? Oh man, this is good. :)
As for "the region of its immortality an extension to a hunt and its neighborhood" - I don't have the slightest idea what that could mean..
Hmm. Let's try to reverse translate it... :)
Thanks for the laugh!

empathyplease said...

I just tried translating English -> Arabic --> English. I started with the phrases:

1) Israeli kills Palestinian,
and
2) Palestinian kills Israeli

both translated correctly.

It appears there is no bias there as of now.

Mo-ha-med said...

Hahaha.. must've gotten corrected. Thanks for keeping us posted!