Helping machines translate: An interview with Dr Bill Byrne


Translating isn't easy for machines - but the FAUST project is using the community to try and help them get better at it. Bill Byrne explains how.

By: Gareth Mitchell (Guest), Dr Bill Byrne (University of Cambridge)



Image: A sign poorly translated from French into English. Caption: Beware of the tractor - and the dodgy translation. Creative Commons image by Paul Appleyard under CC-BY-NC-SA licence. Copyright BBC.

Read

 

 

Dr Bill Byrne

This is the first demo that we've set up for our project, so the name of the project is FAUST. I won't go through the acronym, we needed to come up with an acronym, but we have a mix of university and commercial researchers, and our French colleagues at Softissimo have set up an interactive environment, a place called labs.reverso.net, that’s the website that anybody can use. And the idea is all of the research translation systems are hooked into that, and the goal is we want people to look at the output and tell us which ones they like and offer corrections and suggestions.

Gareth Mitchell

And you're on the front page essentially now, so is this where we’re able to try out some translation and then give our feedback?

Dr Bill Byrne

Right, so the website that I'm going to is called labs.reverso.net, and it’s hosted by Softissimo, who is our French commercial partner, and if you go to this page you'll see a window and the first step is to enter or paste the text which you wish to translate. So I have typed in the text ‘I am here to do an interview with the BBC programme Click’. So I've typed in the text to be translated, and then I can select the translation direction, say English into Spanish, and then I click translate. And then I see four different translations. One from the Reverso engines, one from Language Weaver, our other commercial partner, their engines, and then one from Cambridge University, and one from TALP in Barcelona. And these are all different, and I have no idea…

Gareth Mitchell

Yeah, how’s your Spanish?

Dr Bill Byrne

My Spanish is abysmal.

Gareth Mitchell

This is why we have machine translation, but, I mean I'm not a Spanish speaker by any means but there are four translations of that sentence ‘I am here to do an interview with the BBC programme Click’.  And on first analysis there are some differences between those translations.  The one at the top is definitely very different to the other three.

Dr Bill Byrne

Yes, and you can see some small interesting things.  So, for technical matters, Click is a proper name, and we don’t, we would want to distinguish it say from clicking a mouse, and so here, the one commercial engine did recognise that, this is the Language Weaver engine, did recognise that it should be translated as a proper noun, so it keeps the upper case C, but it translates, it changes the spelling.  So it adopts the Spanish spelling, whereas we would have preferred it to keep an English spelling.  So let’s say that I want to correct this, and I’ll say programme Click, so I have here the original English and the Spanish, and I will change the spelling on click to read Click and then add a comment which is proper name.

Gareth Mitchell

And what you're doing here is typing into the feedback box.

Dr Bill Byrne

That’s right, yeah, so this is a very exciting thing for us.  This is the motivation for the entire, well, for nearly, for this first stage of the project.  So what we want to do is get the translations in front of people and have them offer corrections and suggestions, this is what we want.  We want to get human expertise inside the system.  So I've indicated that it’s a proper name.  I could also type in my email address; this is completely voluntary, people can stay anonymous if they wish.  But this way, if we have frequent users who identify themselves, we can learn to trust them.  So part of the problem is how do you know when people are providing good feedback?  If people come in at random, perhaps they're disgruntled, perhaps they don’t know much about Spanish, but if we have people who consistently offer good advice, we can learn to trust and incorporate their suggestions.

Gareth Mitchell

So it’s a bit like on auction websites, you can rate the reliability of different sellers, so you know who to buy from?

Dr Bill Byrne

Exactly, just like that, yes.  And so I've clicked it and it’s been added to the dictionary, you know, in theory, and this has then gone onto a very large database hosted at the French company’s website, and so the people inside the project get access to this.

Gareth Mitchell

Right, and so it means that the people who are running this particular translation engine, they’ve seen your correction, and so one assumes once it’s worked into the system, that from now on when somebody types in Click with a capital C, it will know it’s a proper noun.

Dr Bill Byrne

Exactly, exactly, and hopefully it’ll look a little bit at context, so that’ll be BBC programme and it will know in that context that Click with a capital C should be translated in this way, as opposed to say click at the start of a sentence, which would be an instruction to click a mouse or something like that.

Gareth Mitchell

It’s all about the context.

Dr Bill Byrne

Oh yes, indeed, context is the, is what drives the entire process.

Gareth Mitchell

So what this is picking up on is the way that these different machine translators work.  Can you just briefly talk through what goes on within one of these translation systems?

Dr Bill Byrne

The current dominant approach is to develop statistical systems, and what this means is the systems learn from very large collections of example translations. So for instance in European languages, the European Commission does all its business in the languages of the member states. So, if you have a document produced in French, it’s translated into English, Spanish, Czech, all of the languages that are in the European Union. And we use these example translations to learn how new words and phrases and sentences should be translated.

Gareth Mitchell

And it does it all statistically then, so you mean if you give one of these systems enough documents where it’s able to look at say the English version and the French version, you do that for enough documents, then it sets up a lexicon and that allows it to then become more accurate with translation?

Dr Bill Byrne

That’s correct.  The lexicon is one of the things it learns, so it needs to learn how words are translated to other words.  There’s this notion of alignment, and the idea is an alignment indicates sort of translation equivalents across languages.  So you can think that for instance a book is a translation of another book, but that doesn’t mean that every sentence inside the book is a direct translation of every other sentence.  So it’s sort of a scale problem, you align things at a very coarse level, and then you move down and say okay I've got one book that’s a translation of another book, and then I look in there for paragraphs that are translations of other paragraphs and then sentences that are translations of other sentences.  And this is the alignment process.  It’s sort of how we take all this example parallel data and bring it into the system, and this is done statistically, so there are models that say what’s the probability that this sentence goes to that sentence, and then given that, what’s the probability that say a sequence of words in one sentence translates to a sequence of words in another sentence.
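As an illustration of the word-level step described above, here is a minimal sketch of how a translation lexicon might be learned from sentence pairs that are already aligned. It uses IBM Model 1 trained by expectation-maximisation, a standard textbook technique; the interview does not say which models the FAUST systems use, so the choice of algorithm, the toy corpus and the names train_lexicon and best_translation are assumptions made purely for illustration.

```python
from collections import defaultdict


def train_lexicon(pairs, iterations=20):
    """Estimate t[(target_word, source_word)] with IBM Model 1 EM.

    pairs: list of (source_tokens, target_tokens) sentence pairs that are
    already aligned at the sentence level.
    """
    target_vocab = {w for _, tgt in pairs for w in tgt}
    uniform = 1.0 / len(target_vocab)
    # Before training, every target word is equally likely for any source word.
    t = defaultdict(lambda: uniform)

    for _ in range(iterations):
        count = defaultdict(float)  # expected (target, source) pairings
        total = defaultdict(float)  # expected pairings per source word
        # E-step: share each target word among the source words in its sentence.
        for src, tgt in pairs:
            for tw in tgt:
                norm = sum(t[(tw, sw)] for sw in src)
                for sw in src:
                    frac = t[(tw, sw)] / norm
                    count[(tw, sw)] += frac
                    total[sw] += frac
        # M-step: renormalise expected counts into probabilities.
        # Pairs that never co-occur default to probability zero.
        t = defaultdict(
            float, {(tw, sw): c / total[sw] for (tw, sw), c in count.items()}
        )
    return t


def best_translation(t, source_word, target_vocab):
    """Return the target word with the highest learned probability."""
    return max(target_vocab, key=lambda tw: t[(tw, source_word)])


# Toy German-English corpus (hypothetical data, lower-cased and tokenised).
corpus = [
    (["das", "haus"], ["the", "house"]),
    (["das", "buch"], ["the", "book"]),
    (["ein", "buch"], ["a", "book"]),
]
lexicon = train_lexicon(corpus)
english = ["the", "house", "book", "a"]
for word in ["das", "haus", "buch", "ein"]:
    print(word, "->", best_translation(lexicon, word, english))
```

No dictionary is supplied: "das" appears with both "house" and "book", but only "the" appears every time "das" does, so the probabilities sort themselves out from co-occurrence alone. Real systems work with millions of sentence pairs and with phrases rather than single words.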

Gareth Mitchell

And by their very nature, languages have all kinds of anomalies that even the best statistical system is not going to pick up, and that’s where this all important interactive human element comes from with your work.

Dr Bill Byrne

Well, that’s right, yeah. There are certainly anomalies between different languages, and this task is easier when translating between similar languages of course, so translating between European languages, say Romance languages, is easier than say translating from Chinese into English, or English into Chinese, which is even harder. But the idea behind getting people in there is that it’s very hard to judge the quality of translation output. It’s not something that you can do automatically. Some tasks can be measured automatically, trying to come up with something, I mean let’s say you're an investment banker, you can simply look at your portfolio to see how well you're doing. If you have a trading strategy and you’ve done well, you have more money than otherwise. But as we can see, in this example here, we have four different translations, and it really requires human judgement to tell us which one is the best, and why it’s the best. And so rather than pay professional translation services who could provide this, the idea is that we want to go out to the users who want something translated, so these are people who are interested in the task, they have something in one language, they want to get it into another language, and we want their judgement on how good a job we've done. And hopefully by having interested judges we’ll get more feedback and more valuable feedback.

Gareth Mitchell

If you're translating into a language that somebody doesn’t speak, how are they then able to rate the success of the particular translation tool they’ve been using?

Dr Bill Byrne

Probably they wouldn’t be the ideal judges for that. What we’re hoping for is that we’ll get a large number of people who ask for translations into a language that they're fluent in, and so they’ll be able to judge. They would then be able to judge the fluency of the output. So there’s two things we worry about in translation: a classic way of describing translation quality is in terms of fluency and adequacy. So for instance adequacy would maintain all the information in the source language text; fluency would mean a fluent translation. You could imagine a fluent sentence that omitted certain things in the original text. So someone who’s fluent in the target language can hopefully judge the fluency of the translation, and hopefully they would have some idea about adequacy.

So in other words they could look at the source text that’s in front of them and if it was say a language written in roman text, you would at least be able to look to see that certain words and phrases were brought across.  But the idea is that by having these systems in many languages, we’ll be able to get lots of different combinations, so there will be some translations from Spanish into English, from English into Spanish, we hope to have both populations of Spanish people and English people looking at the translations to provide judgement.  But if you were to think of going in the other direction, let’s say you wrote an article in English and you wish to publish it in German, you could allow readers in Germany, or German readers, to offer judgement on it.  In other words you wouldn’t have to do the translation yourself; you could put it out there and then, say, allow people to go via this website to give their feedback.

Gareth Mitchell

And what about the reliability of your participants, is there a risk that people could game the system?

Dr Bill Byrne

Absolutely and it’s one of the major challenges in these sort of crowd sourcing or community based systems.  We have as one of our major aims to develop ways to avoid this.  The major problem is not so much malicious behaviour as people who offer unqualified judgements, you know, not all feedback is useful, and the systems have some knowledge about when they're doing well versus when they're doing badly.  For instance if they put out a long sentence they may say these five words are being generated with very high confidence and we can trust those words, and so we might then guide the users to give us feedback in areas where the system is doing poorly.  And/or if a person gives feedback in a high confidence region that disagrees with the system, we may decide well perhaps we’ll take a closer look at that user, and maybe wouldn’t trust them so much.

But one of the ideas behind this dictionary system, which I showed where it allows users to put in corrections, is it goes into not quite a forum but it’s a collaborative dictionary, so other people can look at the entries provided by people, and so there’ll be some checking.  So hopefully we’ll be able to learn who is reliable, and also sort of learn the signature, so we’ll look at user behaviour a little bit to determine what are cues for when a person is providing good feedback.
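The two ideas described above, steering users towards low-confidence words and treating disagreement in high-confidence regions as a warning sign about the user, could be sketched roughly as follows. This is not the FAUST implementation: the class name UserTrust, the function words_needing_feedback, the 0.9 threshold and the smoothed agreement ratio are all assumptions invented for the example.

```python
from dataclasses import dataclass

HIGH_CONFIDENCE = 0.9  # assumed threshold, chosen only for illustration


@dataclass
class UserTrust:
    """Running tally of how often a user agrees with confident output."""

    agreements: int = 0
    disagreements: int = 0

    def record(self, system_confidence: float, user_changed_it: bool) -> None:
        # Where the system itself is unsure, a correction says little about
        # the user's reliability, so it is not counted as evidence.
        if system_confidence < HIGH_CONFIDENCE:
            return
        if user_changed_it:
            self.disagreements += 1
        else:
            self.agreements += 1

    @property
    def score(self) -> float:
        # Smoothed ratio: a brand-new user starts at 0.5, neither trusted
        # nor distrusted.
        return (self.agreements + 1) / (self.agreements + self.disagreements + 2)


def words_needing_feedback(words, confidences, threshold=HIGH_CONFIDENCE):
    """Pick the words the system is least sure of, to show to the user."""
    return [w for w, c in zip(words, confidences) if c < threshold]


user = UserTrust()
user.record(system_confidence=0.95, user_changed_it=True)   # counts against
user.record(system_confidence=0.50, user_changed_it=True)   # ignored
user.record(system_confidence=0.95, user_changed_it=False)  # counts in favour
print(user.score)
print(words_needing_feedback(["programa", "Clic"], [0.97, 0.41]))
```

A low score here would only prompt a closer look at the user, as in the interview; a high-confidence system output can still be wrong, so the score is a hint rather than a verdict.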

Gareth Mitchell

I guess you can also work out things like what browser somebody is using, so the language settings on that browser.

Dr Bill Byrne

Exactly, so if someone has their default settings to French, we say okay we’ll trust their French more than their English say, or say we would just trust their French.
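Browsers normally report the user's language settings in the Accept-Language request header, so the idea above could be sketched as below. Whether the project reads this header at all is an assumption; the function names parse_accept_language and primary_language are hypothetical.

```python
def parse_accept_language(header):
    """Parse an Accept-Language value into (language, weight) pairs.

    Example header: "fr-FR,fr;q=0.9,en;q=0.5". Entries without an explicit
    q value have weight 1.0. The result is sorted by weight, highest first.
    """
    prefs = []
    for part in header.split(","):
        part = part.strip()
        if not part:
            continue
        lang, _, params = part.partition(";")
        weight = 1.0
        params = params.strip()
        if params.startswith("q="):
            try:
                weight = float(params[2:])
            except ValueError:
                weight = 0.0  # malformed weight: treat as least preferred
        prefs.append((lang.strip().lower(), weight))
    # sorted() is stable, so equal weights keep their order in the header.
    return sorted(prefs, key=lambda p: p[1], reverse=True)


def primary_language(header):
    """Return the base language the browser prefers most, or None."""
    prefs = parse_accept_language(header)
    return prefs[0][0].split("-")[0] if prefs else None


print(primary_language("fr-FR,fr;q=0.9,en;q=0.5"))
```

A site could then give more weight to that user's judgements about French output than about English output. The header reflects how the browser is configured, not how fluent the person is, so it is weak evidence at best.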

Gareth Mitchell

And this is an ongoing project, and you're fairly early on in this project.

Dr Bill Byrne

Fairly early on, so it’s the first out of three years, just.  So what we have now is we have this mechanism pointing at the webpage, we have this mechanism set up where we can put our systems in front of people and now we’re starting to collect the data, and hopefully within the next year or so we’ll make our first attempts at having the systems respond to the corrections and suggestions that users give.  Ideally it should be instantaneous, because if you make a suggestion you want to see the system respond to you, you want it truly to be interactive, and hopefully by doing this, if people like it, they will give more data, so the more data we get, the better things work.

Gareth Mitchell

And do you have a way of evaluating the evaluation system?  Do you know how well this kind of crowd sourced interactive feedback mechanism is working?

Dr Bill Byrne

We have strategies at the moment, but this is a research problem, so building translation systems is a research problem. Evaluating them is a research problem, it’s not a straightforward thing to evaluate a translation system, and it’s not a straightforward thing to evaluate the evaluation of a translation system. So this is a large international effort, not just our project, there are many people working on these problems, and they have competitive evaluations, where people come up with different schemes to evaluate translation quality, and they rank the schemes side by side. They have a set of translations and they have two schemes to rank them, and they put them in front of a user and say which one of these schemes did the best? And so it’s constantly about going back to people and using them to tell us whether or not our automatic techniques are doing well.
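One simple way to compare two automatic evaluation schemes against people is to count how often each scheme orders a pair of translations the same way human judges do. The sketch below is a variant of the procedure described above, not the protocol of any particular evaluation campaign; the function name pairwise_agreement and all of the scores are made up for the example.

```python
from itertools import combinations


def pairwise_agreement(metric_scores, human_scores):
    """Fraction of translation pairs ordered the same way by metric and humans.

    Both arguments are lists of scores for the same translations, in the
    same order. Pairs where the humans expressed no preference are skipped.
    """
    agree = 0
    total = 0
    for i, j in combinations(range(len(human_scores)), 2):
        human_diff = human_scores[i] - human_scores[j]
        metric_diff = metric_scores[i] - metric_scores[j]
        if human_diff == 0:
            continue
        total += 1
        # Same sign means the metric prefers the translation humans prefer.
        if human_diff * metric_diff > 0:
            agree += 1
    return agree / total if total else 0.0


# Hypothetical human ratings for four translations of one sentence,
# and the scores two hypothetical automatic schemes gave them.
human = [4, 2, 3, 1]
scheme_a = [0.8, 0.3, 0.5, 0.1]
scheme_b = [0.2, 0.9, 0.4, 0.6]

print("scheme A:", pairwise_agreement(scheme_a, human))
print("scheme B:", pairwise_agreement(scheme_b, human))
```

The scheme that agrees with people more often would be ranked higher, which is the sense in which the human judgements remain the final reference.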

Gareth Mitchell

The big question then, how good is it going to get?

Dr Bill Byrne

Well, it’s a very… we don’t know. Translation is a human behaviour, and so what we’re attempting to do is mimic human behaviour, and it’s very sophisticated human behaviour. So for some tasks we hope it’ll be quite good. So we hope that it would translate say driving instructions quite well, we hope it would translate some news material quite well. So when the domain is highly structured and is written in such a way to convey information, so like newspaper texts are meant to be read by someone who doesn’t know what they're about or has minimal knowledge. So when the tasks are relatively straightforward, they are already capable of translating say web pages from online vendors into other languages, and people make buying decisions based on what they see, so this is good. Whether or not you would want to, if you’ve written, if you’ve been slaving for ten years over a novel, whether you would want to trust that to a translation device is unclear.

Gareth Mitchell

It sounds then that it really, it’s down to what the user wants at the end of the day.

Dr Bill Byrne

It is down to what the user wants, and these statistical systems are completely driven by human behaviour.  So they work very well in applications and, application scenarios where there are many people already doing translations.  And so what we’re hoping is by tracking their behaviour that the systems will continue to improve.

Gareth Mitchell

So the bigger picture is opening up the internet to all of us, you know, regardless of what language we speak, will tools like this help in that, do you think?

Dr Bill Byrne

Well, we hope so. It’s a little hard to say because no one knows how and in what ways these will be adopted. This particular website, the Reverso website, is very popular with language users in Europe, so it’s a very frequently visited site with a great deal of traffic, and much of the feedback we get is from people who are trying to learn a second language, often students but language teachers as well, so we hope so.

Gareth Mitchell

Time will tell, I suppose.

Dr Bill Byrne

Time will tell, yes, yes.

(16’41”)

 

 

 

Bill Byrne shows Gareth Mitchell how user feedback can help shape the quality of machine translations for everyone.

This is an extended version of the interview broadcast on Click on BBC World Service Radio, 19th July 2011.
