February 16, 2010

Translations and locale #2

Did you know that there is no generic tool to extract translation strings from non-source files (anything not written in a programming language, like .desktop files)? Neither did I, until I tried to find one.

Yes, I know about Intltool, but it is tied so tightly to Automake that it is pretty cumbersome to use without it. The README has a short how-to on using Intltool outside autotools, but the solution simply cries out for something better.
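
To make the problem concrete, here is a minimal Python sketch of the kind of extraction in question: pulling the user-visible strings out of a .desktop file and printing them as PO template records. It is only an illustration under stated assumptions, not how Intltool or any other existing tool works. The function names and the list of translatable keys (Name, GenericName, Comment) are assumptions made for this example.

```python
# Keys whose values are usually shown to the user.
# This list is an assumption for the sketch; extend it as needed.
TRANSLATABLE_KEYS = ("Name", "GenericName", "Comment")

def extract_desktop_strings(text):
    """Return (line_number, key, value) for source strings in a .desktop file."""
    found = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        # Skip blanks, comments and group headers such as [Desktop Entry].
        if not line or line.startswith("#") or line.startswith("["):
            continue
        key, sep, value = line.partition("=")
        key = key.strip()
        value = value.strip()
        # Localized entries such as Name[de]=... are existing translations,
        # not source strings, so they never match the plain key names.
        if sep and key in TRANSLATABLE_KEYS and value:
            found.append((lineno, key, value))
    return found

def to_pot(entries, filename):
    """Format extracted entries as minimal PO template records."""
    records = []
    for lineno, key, value in entries:
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        records.append('#: %s:%d\nmsgid "%s"\nmsgstr ""\n' % (filename, lineno, escaped))
    return "\n".join(records)
```

Calling to_pot(extract_desktop_strings(text), "editor.desktop") on the contents of a file gives text that the usual gettext tools can merge. The sketch skips the PO header and does not merge translations back into the .desktop file, which is the harder half of the job.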

Now, given that Intltool is mostly used by autotools users, I checked what people who don't use autotools have to say about this problem, primarily focusing on KDE, as they are using CMake.

Yes, they have a solution and it is not Intltool, but after seeing it, Intltool seems to me like the best choice I can get. Besides a full Python framework dedicated to this problem, they get help from a cron script that nightly (or at some other time of day or week) walks over the repository, merging and updating translations in .desktop files and, presumably, documentation and other such files.

A hundred heads are smarter than one, so why does their solution look so cumbersome to me? I mean, why couldn't someone make something the rest of the world can use, no matter whether make, CMake, SCons or Jam is used? Clearly, the translation problem is not as simple as it looks.

A clear example was, and is, the bad state of the general translation tools we had before. Not counting GUI frontends (like KBabel), which are mostly copies of each other in different toolkits, the only solution in the form of a web frontend (or, for God's sake, anything smarter than just calling xgettext in the background) was the Pootle project, famous for its slowness and memory usage.

Now you know why I (and presumably others) see the Transifex project as a gift from the gods. Add to that the free service they offer on transifex.net for your project and we, users and developers, could not be happier.

But Transifex is still a frontend... if we are going to find the root of this problem, maybe we should start with the gettext tools. Hell, even xgettext (part of the gettext toolchain) differs between the GNU and OpenSolaris versions (presumably the same goes for other Solaris releases). Sun's version of xgettext is so dumb you can't even specify an extraction keyword, so you must use 'gettext' instead of, e.g., the '_' tag we are used to.

On the other hand, GNU xgettext is not perfect either: it tries to be smart where it should be dumb. GNU xgettext can recognize the programming language of the source it scans and extract translation strings in the way that best suits that language. But sometimes you want something different.

Recently I tried to add a translation feature to the theming engine in edelib, which includes some Scheme code, inspired by the translation tags found in GIMP. Basically, I wanted it to look like this:

 (display _"This text will be translated")


but the GNU people envisioned it in a more Lispy form:

 (display (_ "This text will be translated"))


so there is no way to force xgettext to see the first example as valid Scheme source code without some hackery with sed and shell.
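
The hackery in question amounts to rewriting the short tag into the Lispy form before xgettext sees the file. A rough Python sketch of that preprocessing step follows. It is an illustration rather than the code actually used in edelib, and the function name lispify is made up for this example.

```python
import re

# Matches _"..." where the string may contain backslash-escaped characters.
# The lookbehind keeps identifiers that merely end in an underscore intact.
TAG = re.compile(r'(?<![\w"])_("(?:[^"\\]|\\.)*")')

def lispify(source):
    """Rewrite every _"text" tag into the (_ "text") call form."""
    return TAG.sub(r'(_ \1)', source)

# The example from this post:
print(lispify('(display _"This text will be translated")'))
```

The rewritten source can then be handed to GNU xgettext, for example with --language=Scheme --keyword=_ (assuming '_' has to be named explicitly as a keyword for Scheme). The rewrite adds no newlines, so the line numbers in the resulting .po file still point at the right places in the original file. The regular expression knows nothing about Scheme comments or nested string literals, so a tag appearing inside either would be rewritten too.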

Add to that the fact that Intltool (which uses xgettext) does the same job as xgettext (extracting strings from source code), but in its own specific way, and you simply lose any desire to do anything related to translation in your project.
Tags: EDE