Google Does Machine Translation

Neat, Google is finally getting around to using their in-house statistical machine translation stuff, instead of Systran’s old rule-based system.

Been expecting this for a while, ever since they bought Franz Och from us here at ISI a year or two ago.

Hehe, it’s funny to read layperson commentary on Slashdot:

There are five levels of machine translation:
1) word substitution.
2) phrase substitution.
3) cohesive paragraphs and idioms.
4) light literature, magazine articles, and business.
5) classical literature, law, and diplomacy.

Each level requires at least an order of magnitude more computing power than the previous one. Babel fish is on level two and systran is on three. Google is positioning themselves to be between levels four and five.

Oh man, this commenter is seriously confused, in so many ways =). I like how he (one can assume the commenter is male, as he is commenting on Slashdot) transitions from algorithmic criteria in 1-3 to genre criteria in 4-5. And then there’s the bit about “law/diplomacy” being harder than “magazine articles” (untrue from a statistical perspective, as there is so much more bilingual text available in those domains, created by the U.N. or the Canadian Parliament, for example). And then there’s the hubris to assume 4 & 5 are even POSSIBLE given the current level of technology.

Sigh…

</rant>