On Machine Translation — Have No Fear!
April 19, 2015, by admin
The number of translation requests Google Translate gets per day corresponds to 1 billion books per day.
Sveta Kelman, Google Translate Program Manager
We have 35 people on our team.
Arne Mauser, Google Translate
The main thing is to make a good start. I have a blog on livejournal.com that I updated regularly for almost 7 years until recently, when I decided to switch languages from Russian to American English, from “dictatorship” to “democracy”, from literature to technology. And here it is: my first blog post in English. But how can I make a good start if my English leaves something to be desired? A good solution would be to rely on content rather than style. Here comes another problem: this is a localization blog, so, as far as content goes, the choices are very limited. Indeed, the first topic may seem overly technical: Machine Translation. So technical and so dull! If I were able to choose, I would probably opt for catchy titles like “To Promoters of J. Steinbeck’s Legacy at Cannery Row (Monterey, CA): Did You Read His Book At All???”, or “Why I Now Fully Understand W. Faulkner’s Pain: A Long Year Spent in the South”. But I am now studying localization, not literature, and tools and books are almost opposite notions, in spite of the same word length and the “oo” inside, so Machine Translation it is.

To make this topic slightly less boring and give it an emotional charge, I will mention that I definitely have a love-hate relationship with MT. Another thing is that I am not going to use any sources here other than living people whom I talked to. So, no written sources, no quotes from books, no long academic-style sentences: only the oral testimonies of those who really know what they are talking about and have infinite passion for it, in my humble interpretation. And my own thoughts, of course.

MT, an imminent threat to translators

Every time I bring up the MT topic, my friends who do not have a linguistic background emphatically assert that “MT will soon replace translators, no doubt!” When I was a translator, this terrified me so much that I never used Google Translate, hated it, was afraid of it and tried to denounce it as something phony and unprofessional.
Now that my career as a translator is over, I am finally able to evaluate Google Translate's performance without bias. All I can say is that it is still far from perfect for the EN-RU language pair, but it can be useful in many cases. On top of that, I finally spoke to a person who put a decisive end to any doubts and fears related to Google Translate. Sveta Kelman, born in Leningrad, now Program Manager for Google Translate, says that their aim certainly has nothing to do with taking the bread out of translators’ mouths; rather, it is to help those who do not speak a foreign language at all. This point of view is well supported by Google’s acquisition of the Quest Visual start-up, maker of the Word Lens app, which uses a smartphone camera to translate any text that it sees.

https://www.youtube.com/watch?v=h2OfQdYrHRs

So I now think Google Translate is more of a benefit than an evil for mankind.

MT, an imminent threat to secrecy

A very common concern about Google Translate is that any text submitted for translation can be disclosed to security agencies. However, this disclosure is limited to US governmental agencies responsible for homeland security and, unless a company is plotting something threatening, it will hardly attract any attention. Just avoid using MT if you are a terrorist. Or, rather, make sure that you use it, and we will all feel safer. If a company thinks that it is safe to use email, it can try Google Translate just as well; the confidentiality risks are almost identical.

Are machines smarter or stupider than humans?

Mike Dillinger, head of MT at LinkedIn, formally agrees with Sveta, saying that MT will never replace humans because “machines are stupider than humans.” This is a very controversial topic that could be expanded to thousands of pages, but I will only mention that too many airplane accidents happen precisely because humans are less reliable, alert, fast or skillful than flight computers. Human error is by far the most common reason for any accident, incident or tragedy involving the interaction of “stupid” machines and “smart” people. But I still agree with both of them that software will most likely never replace translators, unless there is a breakthrough in neuroscience that reveals the exact principles of how our brain works and how it actually learns a language.

David Rubin Sr., a former manager at Cisco, was the first person who explained to me, using a very simple example, what machine learning is. He compared an MT engine to a child of two or three who is starting to speak. His parents give him a lot of examples of language usage, i.e. its contextual usage, just as we do when we train MT engines by giving them a lot of examples. But then an interesting thing occurs that lends support to Mr. Dillinger’s statement about the superiority of a human brain over a robot: the child gains a “linguistic intuition” that allows him to produce new and unique sentences, whereas stupid machines would never go beyond the predetermined set of phrases to be used. This is the characteristic of the brain that is called creativity and that machines will never master, unless, as mentioned above, scientists resolve the mystery and decompose creativity into concrete elements. This paragraph can actually be reduced to a single line by Goethe: “Existence divided by human reason leaves a remainder” (I give credit here to David Rubin Jr., who found the English translation of this sentence for me).

Different languages, different conventions

One last thing I want to touch on is a common problem with Google Translate: it has only WRITTEN context, whereas, again, it is designed primarily to help people understand each other, i.e. it desperately needs examples of spoken language usage. Here I refer to Arne Mauser, a Google software engineer of German descent, who says that German translations made by GT have nothing to do with the way that people speak. From my own experience, I can also confirm that spoken and written Russian are two completely different languages. A conversation in Russia is an intellectual game: people play with set expressions, allusions, quotes, language units, phraseology, etc. to create their own original “mix” thereof. If you are Russian and do not know this art of creative self-expression, no one will ever listen to you; the more creative, the better. Besides, spoken Russian is nothing but jokes: we juggle them, retort to them, and, again, it is an extremely sophisticated game that can actually be very tiring, and it is far from being reflected in the GT translations.

I will conclude with an illustrative example of the dramatic disparities that exist between languages due to different conventions, environment and language structure itself (and which the Google Translate team would obviously love to recreate in their translations). In M. Kundera’s famous novel The Unbearable Lightness of Being, one of the main characters, Sabina, first has a Czech lover and then starts dating a Frenchman. What disappoints her a little in her new love story is that, with the Czech guy, she was getting a different “affectionate name” every single day (he invented hundreds of sweet little names for her), while the French guy was not that creative at all, calling her “ma chérie” or something along those lines 100 times a day. Finally, she became bored and left him for reasons that were well in line with her unmet linguistic expectations.
This is a remarkable example that illustrates a fundamental difference between analytic and inflected languages, as well as the importance of taking national traits, conventions and customs into account. So far, Google cannot reflect these differences in its translations.
Sources: Arne Mauser, David Rubin Jr., David Rubin Sr., J. W. Goethe, Mike Dillinger, Milan Kundera & Sveta Kelman. Note: if you did not like this post because it is not professional enough, check out my second post about MT that I wrote under peer pressure.