Monday, 10 May 2004

Translating

I was thinking on the drive home tonight again about how I really wish that I could speak French. This probably came from the fact that I saw Seducing Doctor Lewis on the weekend (excellent movie, go see it). So then I got to thinking....

How would you get a program to recognize what is correct French or English, or to translate between them? It can be very hard to program all the rules and exceptions, and even then it's not going to be perfect. So I was thinking about how my email filter finds spam. It works from stats on what has been recognized as spam before, and makes a guess about each new email. Could we make something like this for languages? Start by feeding in simple words, then short sentences, and then longer ones. Let the computer make a guess as to what each one is, and make corrections as you go. I'm sure that I'm not the first to think of this, and it's probably the subject of a bunch of people's master's or PhD theses, if this isn't how programs like Google's language tools already work.
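
To make the spam-filter comparison a bit more concrete, here is a minimal sketch of the "guess from stats" idea applied to telling English from French. It is a toy naive Bayes classifier over three-letter chunks, which is roughly how simple spam filters score words. The class name, the method names, and the training sentences are all made up for illustration, and this is not a claim about how any real translation tool works.

```python
import math
from collections import Counter, defaultdict


class LanguageGuesser:
    """Toy naive Bayes classifier over character trigrams (illustrative only)."""

    def __init__(self):
        self.counts = defaultdict(Counter)  # language -> trigram counts
        self.totals = Counter()             # language -> total trigrams seen

    @staticmethod
    def trigrams(text):
        # Pad so that word starts and ends also produce trigrams.
        padded = f"  {text.lower()} "
        return [padded[i:i + 3] for i in range(len(padded) - 2)]

    def train(self, language, text):
        grams = self.trigrams(text)
        self.counts[language].update(grams)
        self.totals[language] += len(grams)

    def guess(self, text):
        # Number of distinct trigrams seen in training, used for smoothing.
        vocab = len({g for c in self.counts.values() for g in c})
        best, best_score = None, -math.inf
        for language, counts in self.counts.items():
            score = 0.0
            for gram in self.trigrams(text):
                # Add-one smoothing so unseen trigrams do not zero out a language.
                score += math.log((counts[gram] + 1) / (self.totals[language] + vocab))
            if score > best_score:
                best, best_score = language, score
        return best


# Hypothetical training data: a handful of hand-written sentences per language.
guesser = LanguageGuesser()
for sentence in ["the cat is on the table", "I would like a cup of tea",
                 "where is the train station"]:
    guesser.train("english", sentence)
for sentence in ["le chat est sur la table", "je voudrais une tasse de thé",
                 "où est la gare"]:
    guesser.train("french", sentence)

print(guesser.guess("the dog is in the house"))
print(guesser.guess("le chien est dans la maison"))
```

"Making corrections as you go" would just mean calling `train` again with the right label whenever the guess is wrong, the same way you mark a missed spam message.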

And there's lots of translated literature that we can feed through the program to "teach it". I'm sure that this solution is way too simple and that I don't fully understand the problem, but I can dream, can't I? That's how we got here. ;-)
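
As a rough sketch of what "teaching it" from translated text could look like, the snippet below counts which words keep showing up together across matching sentence pairs and picks the strongest partner for each word (scored with a Dice coefficient). The function name and the four sentence pairs are invented for illustration. A real system would need far more data and would have to deal with word order, word forms, and phrases, none of which this handles.

```python
from collections import Counter
from itertools import product


def learn_word_pairs(sentence_pairs):
    """Guess a translation for each source word from co-occurrence counts."""
    src_counts, tgt_counts, both = Counter(), Counter(), Counter()
    for src, tgt in sentence_pairs:
        src_words = set(src.lower().split())
        tgt_words = set(tgt.lower().split())
        src_counts.update(src_words)
        tgt_counts.update(tgt_words)
        # Count every (source word, target word) pair seen in the same sentence pair.
        both.update(product(src_words, tgt_words))

    best = {}
    for (s, t), n in both.items():
        # Dice coefficient: 1.0 means the two words always appear together.
        score = 2 * n / (src_counts[s] + tgt_counts[t])
        if score > best.get(s, ("", 0.0))[1]:
            best[s] = (t, score)
    return {s: t for s, (t, _) in best.items()}


# Hypothetical "translated literature": four tiny sentence pairs.
pairs = [
    ("the cat", "le chat"),
    ("the dog", "le chien"),
    ("a cat", "un chat"),
    ("a dog", "un chien"),
]
table = learn_word_pairs(pairs)
print(table)
```
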
Listening to: Chicane - Don't Give Up


9 comments:

  1. The one huge gigantic problem is that computers don't understand *context*. It will take some serious AI before that's possible.

  2. I agree with Ryan. Language is too artsy to ever be understood by computers. It's like asking a computer to describe what a painting means. There are many ways to say one thing, and there is one way to say many things. Forget computers getting language right. I just wish that humans could get it right.
    For shizzle, my nizzle.

  3. What we really need is a sort of syntax (I hesitate to call it a language because humans likely wouldn't read or write it) that has no context problems. Each word (combination of symbols or letters) would be unique to a specific context and there would be no ambiguity.
    Then you could take text in that language and translate it into any other language on Earth.
    So if an original document was in English, it would have to be disambiguated into this language I'm talking about and then it could be translated into any other language without problems *automatically* with a computer.
    Another problem with languages, however, is that some words involve customs specific to a culture and so don't have a literal translation in another language. Japanese is a language that has a lot of cultural words in it. When you learn Japanese you also have to learn the Japanese culture.

  4. Having an in-between language is what I was thinking about, even if I didn't express it well. Sort of like Java bytecode for languages. It would scale better too, since each language would only need a translator to and from the in-between one. ;-)

  5. Three words: Revert to Latin.
    PS.
    German would be a nice compromise.
    Both languages are more formal about what role each word has in a sentence, which helps a heck of a lot in defining context.

  6. JP, I love your Latin idea!
    I am always so amazed, when singing Latin songs with my choir, at the translations between the English and Latin. The Latin will use so few words! Granted, most words are a bit longer ...
    Here, I counted the words from one song:
    59 English words to 35 Latin words in the song "Tantum Ergo" (Down in Adoration Falling). That's about 1.7:1, so nearly (but not quite) a 2:1 ratio ... crazy!

  7. Just because a language uses fewer words does not mean it isn't ambiguous. In fact it may be more ambiguous than English.

  8. Ryan: perhaps. I don't really understand Latin well enough to know if this is the case. I do find it interesting that the language can express the same idea as English, but with fewer words.

  9. Yeah, that is interesting. In the case of Bible passages and hymns, it could just be a result of the fact that the original language is probably Latin and the English translation is a "kludge" meant to express the same ideas while still fitting the rhythm of the song. It would be interesting to see if an English-to-Latin translation uses fewer or more words, or suffers from the same problem.
    After studying even basic first-year linguistics, the origins of languages, and how they morph and come about, I think it's fair to say that no language on Earth is free from ambiguity or overlap in any given context ... it depends how important the words are to the culture. Languages are free-forming and constantly moving targets determined by the people that use them.
    Like how Eskimos have 7 different words for snow, depending on the exact type. We just call all seven of them "snow", or modify the word snow with another word (e.g. "powder snow" as opposed to "packed snow"). In which case we're using two words and being no less ambiguous.
    Fascinating subject though.
