Automatic language translation has come a long way, thanks to neural networks—computer algorithms that take inspiration from the human brain. But training such networks requires an enormous amount of data: millions of sentence-by-sentence translations to demonstrate how a human would do it. Now, two new papers show that neural networks can learn to translate with no parallel texts—a surprising advance that could make documents in many languages more accessible.
“Imagine that you give one person lots of Chinese books and lots of Arabic books—none of them overlapping—and the person has to learn to translate Chinese to Arabic. That seems impossible, right?” says the first author of one study, Mikel Artetxe, a computer scientist at the University of the Basque Country (UPV) in San Sebastián, Spain. “But we show that a computer can do that.”
Most machine learning—in which neural networks and other computer algorithms learn from experience—is “supervised.” A computer makes a guess, receives the right answer, and adjusts its process accordingly. That works well when teaching a computer to translate between, say, English and French, because many documents exist in both languages. It doesn’t work so well for rare languages, or for popular ones without many parallel texts.
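The guess–check–adjust loop of supervised learning can be sketched in a few lines of code. This is a minimal, illustrative example in Python (not code from either paper, and far simpler than a real translation network): a single parameter `w` learns a numeric mapping from example input–output pairs, the analogue of a model learning from parallel texts. All names here are hypothetical.

```python
def train(pairs, lr=0.01, epochs=200):
    """Learn a one-parameter model y = w * x from (input, answer) pairs."""
    w = 0.0  # the model's initial guess for its single parameter
    for _ in range(epochs):
        for x, y in pairs:
            guess = w * x        # the computer makes a guess
            error = guess - y    # receives the right answer and compares
            w -= lr * error * x  # adjusts its process (a gradient step)
    return w

# "Parallel data": inputs paired with the correct outputs (here, y = 3x).
pairs = [(1, 3), (2, 6), (3, 9)]
learned_w = train(pairs)
print(round(learned_w, 2))  # converges toward 3.0
```

The key point the article makes is that this recipe needs those paired examples; without the right answers to compare against, the adjustment step has nothing to push toward, which is why translating without parallel texts seemed impossible.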