Whether you're simply converting all-caps text to lower case to spare delicate ears, or trying to make everyone sound like the Swedish Chef, simple translation software has been a popular theme of computer environments for years. More sophisticated solutions exist as well. This article looks at the technical issues involved in translating chat messages in Second Life.
While there's certainly plenty of industrial-strength translation software out there, my first notion was to use a simple command-line one. A command-line app that requires little setup is easy to configure and check, and likewise easy to incorporate into another program. I picked Linguaphile (see Resources). While it'd certainly be possible to incorporate one of the web-based translation services, or one of the heavier-duty translation packages, Linguaphile has the key advantage that it is very simple to configure, allowing me to focus on the Second Life viewer code, rather than on translation software. It also has the key advantage over some services that it is free software, available for immediate download, and its license is permissive.
Linguaphile had no installation or build process; it's just a bundle of files and a perl script which uses them. If you run it from the directory the archive is unpacked in, it just works. So the build took a total of about 0 seconds. This was a definite strength.
The Second Life documentation for the chat system is incomplete, as of this writing. The code is mostly found in the llviewermessage.cpp file, which handles messages coming in from the simulator, and in llchatbar.cpp, which handles outgoing messages.
Listing 1. Processing incoming messages (llviewermessage.cpp)
void process_chat_from_simulator(LLMessageSystem *msg, void **user_data)
{
[...]
char mesg[DB_CHAT_MSG_BUF_SIZE];
[...]
And sure enough, there's the message, which gets unpacked into a small buffer. For starters, just to confirm this, I modified the program to smash the message into all lowercase; it's easy to see whether a message has been changed, and I always worry about going deaf from all the people online who shout habitually. This is presented as a self-contained code block because nothing outside this block ever needs to see any local variables used along the way:
Listing 2. Smashing case
msg->getStringFast(_PREHASH_ChatData, _PREHASH_Message,
                   DB_CHAT_MSG_BUF_SIZE, mesg);
{
    char *s;
    /* the cast keeps tolower() well-defined for bytes above 127 */
    for (s = mesg; *s; ++s)
        *s = tolower((unsigned char) *s);
}
The getStringFast call is from the existing code; I include it so you can see where the modification goes. After a quick rebuild (well, not so quick; I see why they use a distributed build system), I got to test this out. Sure enough, I logged in on the rebuilt client, and said something; my client, receiving the message back from the server, translated it to lower case. That test confirms that this is the right place to perform translations. Next up is the task of running a subprocess; unfortunately, this is not a task you can easily perform cross-platform. On UNIX® or UNIX-like systems (such as Linux® and Mac OS X), it's very simple. First, let's just look at the minimalist call to an external program.
Listing 3. Smashing case the hard way
{
    int rfd[2], wfd[2];
    int pid;
    pipe(wfd); // to child
    pipe(rfd); // from child
    if ((pid = fork())) {
        /* close child's side of file descriptors in parent */
        close(wfd[0]);
        close(rfd[1]);
        /* send message, close stream */
        write(wfd[1], mesg, strlen(mesg));
        close(wfd[1]);
        /* read response, close stream */
        read(rfd[0], mesg, DB_CHAT_MSG_BUF_SIZE);
        close(rfd[0]);
        /* wait for child process */
        waitpid(pid, NULL, 0);
    } else {
        /* close parent's side of file descriptors in child */
        close(wfd[1]);
        close(rfd[0]);
        /* close existing standard input and output */
        close(0);
        close(1);
        /* reassign pipe to standard input and output */
        dup2(wfd[0], 0);
        dup2(rfd[1], 1);
        execl("/bin/tr", "tr", "A-Z", "a-z", (char *) NULL);
        /* reached only if execl fails; don't fall back into viewer code */
        _exit(127);
    }
}
This may seem a little obfuscated, but it makes perfect sense once you know what it does. Each call to pipe() makes a pair of file descriptors; data written to fd[1] can be read from fd[0]. Our program needs to do two things: first, it needs to write a message to the external utility program; second, it needs to read a response back. That requires a pair of pipes. I named them "wfd" (write file descriptor) and "rfd" (read file descriptor) respectively. In the parent program (which receives the child's process ID as the return from fork), the unused halves of the pipes are closed, the message is written, and the write descriptor is closed (this makes the child detect EOF after it reads the data sent). Meanwhile, the child closes the other halves of the pipes, then uses close and dup2 to map its ends onto standard input and standard output. The child then executes an external program; in this case, "tr".
The tr utility reads until it reaches EOF (getting the message we sent), sends the data back converted, and exits, closing the file descriptor. The parent process reads everything available up through EOF back into the message buffer, calls waitpid() to reap the now-deceased child process, and continues.
This is just a proof of concept, but now, any program you can call that converts input to output can be used instead of tr. For instance, the aforementioned translating program.
It would seem totally trivial to patch in Linguaphile. It was close, but there were a couple of problems. One obvious problem is that the original code, for smashing case, simply assumes that the output message is the same length as the input message. This is not one bug, but two! The first bug (and I hope it's the one that leapt out at you immediately) is that a message which becomes longer could overrun the message buffer, allowing a carefully crafted message to smash the stack and potentially execute dangerous code. But what happens in the case where the translation is shorter? In early testing, using German to English, the German word "mich" got translated to "mech". Why? Linguaphile's translation ("me") was being written over a buffer that still contained "mich", and nothing terminated the string after the new text; the result was "me" followed by the leftover "ch", or "mech".
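The "mech" effect is easy to reproduce without any subprocess at all. The following is a minimal sketch; the helper name overwrite is made up for illustration, and memcpy stands in for what read() does to the buffer (it stores the bytes it received and nothing more).

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: store n bytes of reply at the start of buf, the
 * way read() would, and optionally terminate the string afterwards. */
static void overwrite(char *buf, const char *reply, size_t n, int terminate)
{
    memcpy(buf, reply, n);      /* writes exactly n bytes, no '\0' */
    if (terminate)
        buf[n] = '\0';          /* the step the first version skipped */
}
```

Starting from a buffer holding "mich", calling overwrite(buf, "me", 2, 0) leaves the old tail in place, while passing 1 as the last argument cuts the string off after the new text.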
The corrected read code is simple enough:
Listing 4. Terminating the buffer
len = read(rfd[0], mesg, DB_CHAT_MSG_BUF_SIZE - 1);
if (len < 0)
    len = 0; /* read failed; don't index the buffer with -1 */
mesg[len] = '\0';
The "-1" is to avoid smashing another object with the terminating null byte; it would also work to just make the buffer a byte longer.
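A single read() is enough for short chat lines, but it is allowed to return fewer bytes than the child will eventually send. A slightly more defensive variant loops until EOF. This is a sketch under that assumption, not the code the viewer patch used; the name read_all is hypothetical.

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Read from fd until EOF or until size - 1 bytes are stored, then
 * terminate the buffer. Returns the number of bytes stored, or -1 on
 * a read error (the buffer is still terminated). */
static long read_all(int fd, char *buf, size_t size)
{
    size_t used = 0;

    if (size == 0)
        return -1;
    while (used < size - 1) {
        ssize_t n = read(fd, buf + used, size - 1 - used);
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted by a signal; retry */
            buf[used] = '\0';
            return -1;
        }
        if (n == 0)
            break;              /* EOF: the writer closed its end */
        used += (size_t) n;
    }
    buf[used] = '\0';
    return (long) used;
}
```

Because the terminator is always written at index used, a reply that is shorter than the previous contents of the buffer can no longer leave a stale tail behind, and a reply that is too long is truncated rather than overrunning the buffer.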
The code to call linguaphile is simplicity itself:
Listing 5. Calling linguaphile
chdir("/home/seebs/ic/linguaphile-0.2");
execl("/usr/bin/perl", "perl", "linguaphile.pl", "-q", (char *) NULL);
The chdir call is used so that linguaphile can find its distribution package materials; without these, it won't run. The "-q" option suppresses the initial message identifying which languages are being translated. In testing, I just set the default source language in the code. For a production environment, you'd obviously want to specify those as arguments, and provide some way to change the translation layer.
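One way to make the translation layer swappable is to pass the command in as an argument vector instead of hard-coding it. The sketch below assumes a hypothetical helper named filter_through; it is the same pipe, fork, and exec pattern as Listing 3, with the return values checked. It uses execvp so the program is looked up on the PATH, which is a choice made here for illustration.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run argv[0] with mesg on its standard input and
 * replace mesg with up to size - 1 bytes of its output. Returns 0 on
 * success and -1 on failure; after a failure the contents of mesg are
 * not meaningful, so the caller should keep its own copy. */
static int filter_through(char *const argv[], char *mesg, size_t size)
{
    int wfd[2], rfd[2], status, ok = 1;
    size_t len, used = 0;
    ssize_t n;
    pid_t pid;

    if (size == 0 || pipe(wfd) < 0)
        return -1;
    if (pipe(rfd) < 0) {
        close(wfd[0]); close(wfd[1]);
        return -1;
    }
    pid = fork();
    if (pid < 0) {
        close(wfd[0]); close(wfd[1]);
        close(rfd[0]); close(rfd[1]);
        return -1;
    }
    if (pid == 0) {
        /* child: wire the pipes to stdin and stdout, then exec */
        dup2(wfd[0], 0);
        dup2(rfd[1], 1);
        close(wfd[0]); close(wfd[1]);
        close(rfd[0]); close(rfd[1]);
        execvp(argv[0], argv);
        _exit(127);             /* reached only if exec failed */
    }
    /* parent: send the message, then close so the child sees EOF */
    close(wfd[0]);
    close(rfd[1]);
    len = strlen(mesg);
    if (write(wfd[1], mesg, len) != (ssize_t) len)
        ok = 0;
    close(wfd[1]);
    /* collect the reply until EOF or until the buffer is full */
    while (used < size - 1 &&
           (n = read(rfd[0], mesg + used, size - 1 - used)) > 0)
        used += (size_t) n;
    mesg[used] = '\0';
    close(rfd[0]);
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return (ok && WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}
```

With a command vector of "tr", "A-Z", "a-z" this behaves like Listing 3; substituting the perl invocation from Listing 5 (after the chdir) would call Linguaphile instead. A real implementation would also need to deal with SIGPIPE, which the parent can receive if the child exits before reading its input.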
Linguaphile's a fairly simple word-for-word translation program; it just passes unknown words through without alteration or marking. However, it did the thing that was most useful for getting this in place; it worked as a command-line app out of the box. If you desperately need to talk to someone whose language you don't know, an application like this could be enough to get you stumbling through.
One of the frustrations in pitching open source is the difficulty of convincing people not yet familiar with it that it matters. I just took a major application consisting of thousands of lines of code, and got a proof of concept of a plug-in translator architecture working. It's pretty rough, but if I desperately needed to talk to someone in a foreign language, this might be enough to get a few concepts across. It also changes the nature of the problem: now that a way to translate incoming messages has been established, improving the translation is simple and nicely modular. The external program can be any program, although you'd want a responsive one; a web-based service might be problematic.
The version presented so far is far from production code. It runs only on UNIX platforms (Linux, OS X, FreeBSD; not Windows®). It's woefully inefficient, spawning an external application for every line of text it processes. There's no runtime user configuration, and it doesn't provide for any translation of outgoing text. It's a proof of concept.
Each of these issues can be addressed. The spawning overhead is harder to fix than you might think. If the child process is kept running between messages, the default behavior of handling input and output data in blocks makes it easy for the viewer to enter a deadlock waiting for Linguaphile to send data back, while Linguaphile is waiting for more data from the viewer. The hard part is that there's no trivial way for the viewer to know when the child is done sending data, without waiting to see if more data come along. Inserting an arbitrary delay is hardly ideal, either. A real solution to this would require, at the minimum, the addition of some kind of sentinel value (probably a newline) to the message, and then checking for the sentinel value while reading data from the pipe. Another solution would be to prefix incoming data with its length. All of these are more complicated; the big appeal of the original version is that the read system call will always return once it hits EOF, which is generated automatically when the child program completes and terminates.
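The viewer's half of the newline-sentinel idea might look like the following. This is only a sketch with a hypothetical function name, read_line. It also assumes the program on the other end of the pipe writes and flushes exactly one line of output per line of input, which is not something established here for Linguaphile.

```c
#include <stddef.h>
#include <unistd.h>

/* Read bytes from fd up to a '\n' sentinel and store the line, without
 * the newline, in buf. Bytes that do not fit are consumed but dropped.
 * Returns the stored length, or -1 if EOF or an error arrives before
 * the sentinel is seen. */
static long read_line(int fd, char *buf, size_t size)
{
    size_t used = 0;
    char c;

    if (size == 0)
        return -1;
    for (;;) {
        ssize_t n = read(fd, &c, 1);
        if (n <= 0) {           /* EOF or error before the sentinel */
            buf[used] = '\0';
            return -1;
        }
        if (c == '\n')
            break;              /* sentinel found: the reply is complete */
        if (used < size - 1)
            buf[used++] = c;
    }
    buf[used] = '\0';
    return (long) used;
}
```

Reading one byte at a time costs a system call per character, but it never consumes data belonging to the next reply, which keeps the sketch short. A buffered reader would be the obvious refinement.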
Runtime user configuration might be a little easier to deal with. Since the translation program is already messing with incoming requests, it makes sense to add it to outgoing requests, and along the way add a feature of intercepting IRC-style "/" messages. By convention, assume that you always want to type and read in the same language. So, all you need is a pair of language settings; one for the language people will be talking to you in, one for the language you'll be talking in. I nominate /speak and /hear, which were (as of this writing) not already in use as commands in the existing emote system.
The code I am about to present is ugly in a few ways. It works, but it's not a clean solution; the focus here was on minimal intrusion into the Second Life code, not on an elegant design. The goal was to add two-way translation to the Second Life chat system; that is, translate outgoing material into a language, and incoming material from it. The interface I designed was to add two "emotes", /speak and /hear. The usage works like this: "/speak" is the language you type and read in, "/hear" is the language that other people send to you, or receive from you. If you "/speak en" and "/hear de", you write "book" and the other party hears "buch"; if they say "buch", you hear "book".
My solution to keep this isolated was a single function with static local variables to hold the language type. The function is a close relative of the code previously presented in llviewermessage.cpp:
Listing 6. The translation function
void translate_in_place(const char *newhear, const char *newspeak, int out, char *mesg)
{
    static char hear[3] = "de", speak[3] = "en";
    int rfd[2], wfd[2];
    int pid;
    /* update the stored language codes (first two letters only) */
    if (newspeak)
        strncpy(speak, newspeak, 2);
    if (newhear)
        strncpy(hear, newhear, 2);
    llinfos << "speaking " << speak << ", hearing " << hear << "." << llendl;
    /* a call with no message only sets the languages */
    if (!mesg) {
        return;
    }
    pipe(wfd); // to child
    pipe(rfd); // from child
    [...]
        chdir("/home/seebs/ic/linguaphile-0.2");
        execl("/usr/bin/perl", "perl", "linguaphile.pl",
              "-s", out ? speak : hear,
              "-d", out ? hear : speak,
              "-q", (char *) NULL);
    }
}
The most significant change to the actual translation logic is the addition of the -s (source) and -d (destination) flags to linguaphile. These are set based on the current values for the hear and speak variables, and whether the message is "outgoing". The rather awkward calling sequence lets you specify languages with one call, or a message to translate with another. It's ugly, but it doesn't pollute the global namespace. The assumption that all language codes are two letters is actually wrong, but many of them were, and it was good enough for testing. There's no real error-checking here; it might make sense in production to add return codes to translate_in_place and some kind of diagnostic output for the user. The essential algorithm is clearer without it, though. The code which calls this to set these variables goes in llchatbar.cpp, in the LLChatBar::sendChat() function:
Listing 7. Handling the /speak and /hear emotes
std::string utf8text = wstring_to_utf8str(text);
/* Check for translation hack */
if (!strncmp(utf8text.c_str(), "/speak ", 7)) {
    /* everything after "/speak " is taken as the language code */
    translate_in_place(NULL, utf8text.c_str() + 7, 0, NULL);
    utf8text.erase();
} else if (!strncmp(utf8text.c_str(), "/hear ", 6)) {
    translate_in_place(utf8text.c_str() + 6, NULL, 0, NULL);
    utf8text.erase();
}
This code is ugly beyond words, but, as long as useful arguments are given to it, it does what is intended; it modifies the "hear" and "speak" values in the translation function. Finally, a call to the translator goes later in this function. The utf8_revised_text variable holds the text as modified by the viewer's native "emote" system which detects strings like "/smoke" and turns them into visible actions.
Listing 8. Translating outgoing text
utf8_revised_text = utf8str_trim(utf8_revised_text);
{
    /* copy into a fixed buffer, always leaving room for the terminator */
    char mesg[DB_CHAT_MSG_BUF_SIZE + 1];
    strncpy(mesg, utf8_revised_text.c_str(), DB_CHAT_MSG_BUF_SIZE);
    mesg[DB_CHAT_MSG_BUF_SIZE] = '\0';
    translate_in_place(NULL, NULL, 1, mesg);
    utf8_revised_text.assign(mesg);
}
With this code in place, the system does what you expect; on the default settings, if you use a word that Linguaphile can translate, the word is translated. Other words are left alone. You might not realize it's working right away, as your translated words are translated back when the game echoes your statements! I did find a corner case, though: with the default English/German translation, "table" gets translated to "tisch", but "tisch" doesn't get translated back to "table".
The code's still rough; it needs error checking to indicate what languages are known or unknown, and of course, it wouldn't hurt to buff the translation engine. However, the essential goal, of creating something that could let you talk to someone who doesn't speak your native language, without any effort on their part, has been met. Don't use this in production; if you give it an invalid language code, it may well simply eliminate all incoming or outgoing text, or otherwise act up. There's no real handling of translation problems, and there's no error-checking. This is a proof of concept; don't rely on it in real life. (Or second life, for that matter.)
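A first step toward that error checking could be to reject arguments that are not even shaped like a language code before they reach the translator. The following sketch assumes codes are two or three lowercase ASCII letters, in the style of ISO 639; the function name valid_lang_code is hypothetical, and the check says nothing about which languages Linguaphile actually knows.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical shape check: returns 1 if code is two or three lowercase
 * ASCII letters, 0 otherwise. It does not consult the translator. */
static int valid_lang_code(const char *code)
{
    size_t i, len;

    if (!code)
        return 0;
    len = strlen(code);
    if (len < 2 || len > 3)
        return 0;
    for (i = 0; i < len; i++)
        if (code[i] < 'a' || code[i] > 'z')
            return 0;
    return 1;
}
```

The /speak and /hear handlers could call this on their argument and keep the previous setting when it returns 0, instead of silently storing whatever the first two characters happen to be.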
Looking around the system and modifying it has been enlightening. The Second Life implementation seems to be fairly well organized, but not perfectly. For instance, the logic to handle users using "/me" in messages to indicate actions ("/me says hi" turns into "Yourname says hi") occurs in more than one place. Still, it's consistent and fairly easy to find. Names are well-chosen and code doesn't generally try to poke around in class internals.
That speaks to a good development model, and one well suited to an open source release. Modules are modular; that's a good thing. It's fairly easy to find the right place to make a given change, and the system provides a broad range of helpful tools, such as the llinfos stream. Years of playing around with other people's code have left me generally skeptical, but in this case, the experience was rewarding and fairly fun. Compared to other projects I've seen, the code is quite good; I find myself wondering if one of the biggest objections to open source isn't just that some companies don't feel their code would withstand careful scrutiny.
Learn
- Read more Second Life articles on developerWorks.
- The Second Life Open Source portal is the place for information about the open source Second Life client.
- The Second Life open source client build instructions for Linux; note that these are stored on a wiki!
- The SCons home page has a fair amount of information, and even a little evangelism.
- Linguaphile is a simple translation program.
- Wikipedia reveals that JFK never actually claimed to be a jelly doughnut, funny though it would have been.
- In the developerWorks Linux zone, find more resources for Linux developers.
- Stay current with developerWorks technical events and Webcasts.
Get products and technologies
- Order the SEK for Linux, a two-DVD set containing the latest IBM trial software for Linux from DB2®, Lotus®, Rational®, Tivoli®, and WebSphere®.
- With IBM trial software, available for download directly from developerWorks, build your next development project on Linux.
Discuss
- Check out developerWorks blogs and get involved in the developerWorks community.