Whether you’re logging on from the US, Brazil, Borneo, or France, Facebook can translate virtually any written content published on its platform into the local language using automated machine translation (MT). In fact, Facebook provides around 20 billion translations every day for its News Feed alone. However, these systems typically use English as an intermediary step: translating from Chinese to French actually goes Chinese to English to French. This is done because data sets of translations to and from English are massive and widely available, but putting English in the middle reduces overall translation accuracy while making the entire process more complex and cumbersome than it needs to be. That’s why Facebook AI has developed a new MT model that can translate directly between two languages in both directions (Chinese to French and French to Chinese) without ever using English as a crutch, and which outperforms the English-centric model by 10 points on the BLEU metric.

“The major challenge is really, how do we take the translation systems we have, and then actually meet the demand of people around the world,” Angela Fan, a research associate at Facebook AI, told Engadget. “So you are translating into all of the languages and across all of the directions that people actually want. For example, there’s plenty of regions in the world where people speak multiple languages, none of which are English, but the existing translation systems rely heavily on English-only data.” Of the billions of posts published daily in 160 languages on Facebook’s platform, two-thirds are in a language other than English, she noted.

Facebook claims that the model, dubbed M2M-100, is the first multilingual machine translation (MMT) model that can directly translate back and forth between any pair out of a set of 100 languages. In all, Facebook AI has constructed an enormous data set consisting of 7.5 billion sentences for 100 languages. Using that, the research team trained a universal translation model with more than 15 billion parameters “that captures information from related languages and reflects a more diverse script of languages and morphology,” according to a Facebook blog post Monday.

To do this, Facebook had to collect a whole slew of publicly available data from around the world using a variety of novel techniques. “A lot of this is really building upon work that we’ve done for many years at research at Facebook, which are like all of the different Lego pieces that we kind of put together to build the system today,” Fan explained.

To start, the team employed CommonCrawl, which maintains an open repository of web crawl data, to collect text examples from around the web. Then they set about identifying the language that text is in using FastText, a text classification system Facebook developed and open sourced a few years back. “It basically looks at some text and it tries to decide what language it’s written in,” Fan said. “So we partition a bunch of texts from the web into all of these different languages and then our goal is to identify sentences that would be translation.”

“Traditionally, people use human translators to create translation data,” she continued. “This is difficult at scale because it’s hard, for example, to find someone who speaks English and Tamil, but it’s even harder to find someone who speaks French and Tamil together, because non-English translation is still an area that needs improvement.”

To mine that necessary data at scale, Fan’s team relied heavily on the LASER system.
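
The mining idea Fan describes, partitioning web text by language and then looking for sentences that appear to be translations of each other, can be illustrated with a small Python sketch. This is not Facebook's pipeline: the language tags are assumed to be already assigned (the step the article attributes to FastText), the three-dimensional vectors are made-up stand-ins for the sentence embeddings a multilingual encoder such as LASER would produce, and the helper names and the 0.95 threshold are hypothetical choices for the illustration.

```python
import math
from collections import defaultdict

# Toy corpus: (language tag, sentence, embedding).
# ASSUMPTION: tags and vectors are invented for illustration. In a real
# pipeline a language identifier would supply the tags and a multilingual
# sentence encoder would supply the vectors.
TOY_CORPUS = [
    ("fr", "Le chat dort.", (0.90, 0.10, 0.00)),
    ("fr", "Il pleut aujourd'hui.", (0.10, 0.90, 0.10)),
    ("fr", "J'aime le café.", (0.00, 0.10, 0.95)),
    ("zh", "猫在睡觉。", (0.88, 0.15, 0.05)),
    ("zh", "今天下雨。", (0.12, 0.85, 0.10)),
]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def partition_by_language(corpus):
    """Group (sentence, embedding) pairs under their language tag."""
    buckets = defaultdict(list)
    for lang, text, vec in corpus:
        buckets[lang].append((text, vec))
    return buckets


def mine_pairs(source, target, threshold=0.95):
    """For each source sentence, keep its closest target sentence
    if the similarity clears the (hypothetical) threshold."""
    pairs = []
    for s_text, s_vec in source:
        best_text, best_score = None, -1.0
        for t_text, t_vec in target:
            score = cosine(s_vec, t_vec)
            if score > best_score:
                best_text, best_score = t_text, score
        if best_text is not None and best_score >= threshold:
            pairs.append((s_text, best_text, best_score))
    return pairs


buckets = partition_by_language(TOY_CORPUS)
# French and Chinese are paired directly; no English text is involved.
pairs = mine_pairs(buckets["fr"], buckets["zh"])
for fr_sentence, zh_sentence, score in pairs:
    print(f"{score:.3f}  {fr_sentence}  <->  {zh_sentence}")
```

In this toy run the two French sentences with a close Chinese counterpart are paired, and the third is dropped because nothing in the Chinese bucket is similar enough. The exhaustive nested loop is only workable at toy scale; mining billions of sentences would need an approximate nearest-neighbor index, and a plain similarity cutoff is a simplification of whatever scoring rule a production system uses.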