Monday, November 17, 2008

Fresh import

A fresh import of the translations has been uploaded to the server. By the way, I made the following changes to the service:
  • After several long outages in September, I had to adopt additional measures to prevent the server from going down with every peak of usage. I am proud to announce that open-tran has been running continuously for 58 days without a single outage (see uptime statistics), despite the growing number of visits.
  • Frisian and Galician versions of the pages were updated.
  • Kannada was added to the list of supported languages.
  • Yves Savourel contributed a Java example for the use of open-tran's API. He also suggested introducing a suggest3 method for limiting the number of returned records.

Saturday, August 30, 2008

New Layout: Consistency Matters

I've just uploaded the prototype of the new layout to the server. You can see it by visiting the following URL: http://open-tran.eu/front.shtml.

I'd like this page to become the new front page of open-tran.eu. It is based on this set of ideas presented in the Google Redux. You may call it shameless plagiarism, but (as far as I know) the design has never been realized. I haven't asked the author yet, but I hope that he will not object.

I had the following objectives while implementing the page:
  1. Due to the new compare functionality, the page needs to have an unlimited width, so that it can be scrolled horizontally without breaking the layout.
  2. The new front page has to be very simple - just the search box - like... you know... Google, or something.
  3. The contents of the current front pages will be moved to "help" (but I haven't done that yet).
  4. And finally... I like blue :)
I need your help again. I had to introduce some new phrases and am not 100% sure that I translated them well, so I'd like you to verify whether they are correct. Obviously, I hope for more feedback and suggestions.

Known Issues
I'm not an HTML guru - it took me a lot of time to learn it well enough to bring the page to the current state. Any help will be appreciated. Furthermore, I might have made some fundamental mistakes. If you have better ideas on how to improve the site, just let me know.
  1. You cannot select the mode yet. The search box doesn't work either. I am working on the form and the accompanying JavaScript.
  2. For some reason, the search box border has proper padding in Opera and IE, but not in Firefox. I've got no idea why.
  3. Internet Explorer doesn't render the blue box at the top and has problems with transparent backgrounds, so the language choice looks stupid.
  4. In some setups, the list of language choices renders incorrectly - the longest line (Biełaruskaja) breaks.
  5. I was not able to position all the elements exactly where I wanted. The arrow (↔) is not vertically aligned, the drop-down lists are not in the right positions, and the submit button should probably be lower.
  6. The logo is terrible, and there is no real favicon.
I will be mailing those who translated the page, but I know that some of them read this blog, so beware ;) If you have other questions or ideas, post a comment or send an e-mail to the mailing list.

Saturday, August 9, 2008

Articles in various languages

I need your advice again. In an effort to further reduce the database and provide more accurate results, I am trying to exclude various types of "words" (lexemes or tokens, to be precise). First, I removed all format specifiers, like %s. Next, I decided to reduce the number of words that don't carry any meaning. An obvious example from the English language is the article "the". Looking up "the" in open-tran won't display any results, because the engine considers this phrase empty (no words). However, I speak only English and German. Together with my girlfriend and Wikipedia, we prepared the following list of articles in several European languages:
Code  Language    List of ignored lexemes
de    German      das, dem, den, der, deren, des, dessen, die, ein, eine, einem, einen
en    English     a, an, the
es    Spanish     el, la, las, los, una, uno, unas, unos
fr    French      la, le, les, un, une
it    Italian     i, il, lo, gli, la, le, un, uno, una
pl    Polish      by
pt    Portuguese  o, os, a, as, um, uns, uma, umas
I am planning to add Dutch articles (de, een, het) in the future, too. As you can see, my table covers only 7 languages and open-tran supports more than 90. So I hope that maybe you could help me assemble similar lists for the remaining languages.
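The filtering described above can be sketched roughly as follows. This is only an illustration, not the engine's actual code: the function name, the word-splitting rule and the simplified pattern for format specifiers are assumptions, while the lists of ignored lexemes are copied from the table.

```python
import re

# Ignored lexemes per language code, copied from the table above.
IGNORED = {
    "de": {"das", "dem", "den", "der", "deren", "des", "dessen",
           "die", "ein", "eine", "einem", "einen"},
    "en": {"a", "an", "the"},
    "es": {"el", "la", "las", "los", "una", "uno", "unas", "unos"},
    "fr": {"la", "le", "les", "un", "une"},
    "it": {"i", "il", "lo", "gli", "la", "le", "un", "uno", "una"},
    "pl": {"by"},
    "pt": {"o", "os", "a", "as", "um", "uns", "uma", "umas"},
}

# Assumption: a simplified pattern that only covers specifiers like %s or %d.
FORMAT_SPEC = re.compile(r"%[a-zA-Z]")


def significant_words(phrase, lang):
    """Return the lower-cased words of phrase that are not ignored for lang."""
    without_specs = FORMAT_SPEC.sub(" ", phrase)
    words = re.findall(r"\w+", without_specs.lower())
    ignored = IGNORED.get(lang, set())  # unknown languages: nothing is ignored
    return [word for word in words if word not in ignored]
```

With these lists, a phrase that consists only of ignored lexemes (such as "the" in English) yields an empty list, which matches the "empty phrase" behaviour described above.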

I am aware that suffixes may be used as articles in some languages (e.g. Romanian). If you have any idea on how to tackle this issue without integrating expensive, language-specific dictionaries, let me know.

If you have other ideas on how to improve the accuracy and/or limit the number of records stored in the database, I will appreciate your feedback. Leave a comment here, or send an e-mail to open-tran@googlegroups.com. Thanks!

Wednesday, July 9, 2008

Compare

I added new functionality to the service. Now you can compare the translations of phrases between the projects. Right now, the only way to see the comparison table is to enter an appropriate URL (or use the API). Here is an example URL: If you follow it, you will see the table with suggestions grouped by the projects, so that you can see the differences between their translations.
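Grouping suggestions by project could look roughly like the sketch below. The input shape (a list of project/translation pairs) and the function name are assumptions made for illustration; the real service and its API may organize the data differently.

```python
from collections import defaultdict


def group_by_project(suggestions):
    """Group (project, translation) pairs into {project: [translations]}.

    Hypothetical helper: duplicates within one project are listed only once.
    """
    table = defaultdict(list)
    for project, translation in suggestions:
        if translation not in table[project]:
            table[project].append(translation)
    return dict(table)
```

Each key of the resulting dictionary corresponds to one column of a comparison table, so differences between the projects' translations become visible side by side.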

Do you have any idea how I could put it on the page? Any suggestions for improvements?

Sunday, July 6, 2008

Module names and fuzzy translations

I have just finished the implementation, testing and processing of the newest release and am currently deploying it on the server. Below is a screenshot that shows the new features: module names and fuzzy translation indication.
screenshot
I am writing about it today, but you won't be able to see any of the results before tomorrow, when the databases are copied to the server (I've got an ADSL link at home and will need around 10 hours to send 3GB over it). I will try to restart the service before leaving for work.
UPDATE: The new version has already been deployed.

Sunday, June 1, 2008

Off-line mode

The off-line mode is ready for download. Here is what you need to take advantage of it:
  1. Download the latest release of the scripts from this location.
  2. Create a .open-tran subfolder in your home directory.
  3. Download nine-en.db (85MB) and place it in the .open-tran folder.
  4. Download a database for your language. To do this, you need to replace en with your language code (e.g.: nine-pt_br.db for Brazilian Portuguese or nine-es.db for Spanish) and place it under .open-tran.
  5. Extract the scripts and enjoy :)
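To check that the files from the steps above ended up in the right place, something like the following sketch can be used. The file naming (nine-<code>.db inside .open-tran in the home directory) comes from the steps above; the function names are made up for this example and are not part of the released scripts.

```python
from pathlib import Path


def database_paths(lang, home=None):
    """Return the expected database files: English plus the one for lang."""
    base = Path(home) if home is not None else Path.home()
    folder = base / ".open-tran"
    names = ["nine-en.db"]
    if lang != "en":
        names.append(f"nine-{lang}.db")
    return [folder / name for name in names]


def missing_databases(lang, home=None):
    """Return the expected database files that do not exist yet."""
    return [path for path in database_paths(lang, home) if not path.is_file()]
```

For example, missing_databases("pt_br") returns an empty list once both nine-en.db and nine-pt_br.db are in place.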
You will find 2 executable scripts in the tarball:
  1. suggest.py is a central part of the web service. You can use it with other Python scripts (the TranDB class has the methods described in our developers' corner). But at the same time, it is a command-line tool for retrieving suggestions. You should run it like this: suggest.py "do you really want to quit" pt_br.
  2. open-tran.py is a GTK tool that opens a po file and provides suggestions for its messages. It is a very old tool that I wrote more than a year ago. I decided not to develop it any further once I learned that PyGTK is single-threaded. Yesterday I tweaked it to support the off-line mode, but I'm not going to maintain it - it was always meant to provide an example.
It is too late today; I will try to write more in the coming days. Anyway, if you have any problems with it, if something crashes, if you need some improvements or don't understand how it works, or if you need assistance, send an e-mail to open-tran@googlegroups.com.

Friday, May 16, 2008

Languages and cultures

It turns out that there is a lot of inconsistency between the projects when it comes to the naming conventions of languages and cultures. So I thought that maybe someone could help me. Right now I am making the following assumptions:
  1. For any language code ab: ab_AB is the same as ab
  2. fy_NL is the same as fy
  3. ga_IE is the same as ga
  4. hy_AM is the same as hy
  5. nb_NO is the same as nb
  6. nn_NO is the same as nn
  7. nds_DE is the same as nds
  8. sv_SE is the same as sv
  9. ur_PK is the same as ur
However, I was not able to determine whether the following language codes are really the same:
  • bn_in and bn
  • gu_in and gu
  • pa_in and pa
  • no and (nb or nn)
  • nds_NFE and nds
I suppose they are the same, but I wouldn't like to offend anybody, so I put those on hold. If my reasoning is wrong, please let me know, so that I can fix it.
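Expressed as code, the assumptions above amount to roughly the following sketch. The function name is made up for this example, and the codes that are still on hold are deliberately left unchanged.

```python
# Explicit equivalences from assumptions 2-9 above.
SAME_AS = {
    "fy_NL": "fy", "ga_IE": "ga", "hy_AM": "hy", "nb_NO": "nb",
    "nn_NO": "nn", "nds_DE": "nds", "sv_SE": "sv", "ur_PK": "ur",
}


def normalize(code):
    """Collapse a language/culture code to a bare language code where allowed."""
    if code in SAME_AS:
        return SAME_AS[code]
    lang, sep, culture = code.partition("_")
    # Assumption 1: ab_AB is the same as ab (e.g. pl_PL becomes pl).
    if sep and culture == lang.upper():
        return lang
    # Everything else (pt_BR, bn_in, no, nds_NFE, ...) is kept as it is.
    return code
```

Keeping the undecided codes untouched means that nothing gets merged until somebody confirms that the languages really are the same.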

Thursday, May 15, 2008

More updates

I'm going nuts with KDE's anonymous svn access. It is sooooo slow. I am trying to upload the latest version of translations and once in a while it just stalls... I have to regularly kill the subversion client, then run cleanup. And it seems that they are throttling the bandwidth to about 10K/s. Anyway, I managed to download and run the first step of import for openSUSE and XFCE, so you can expect them in the near future with the updated translations for KDE (this time KDE4), GNOME and Mozilla. I have also automated the updates, so I should be able to update the database more frequently.

Furthermore, I have created a discussion group for anybody interested in open-tran. If you have any suggestions or questions, you can post an e-mail to open-tran@googlegroups.com. You don't have to be a subscriber to post messages.

And by the way, I've got two questions for you:
  1. Have you noticed any performance improvement after the last upgrade?
  2. Would you download the database for off-line use (~100MB)?

Sunday, May 11, 2008

Implementation

Eventually, every plan needs to be executed. I am very proud to be able to let you know that my plan has finally been implemented and the new version of open-tran has been deployed on the server. I hope that the response times will be better now. I am now working on the rest of my plan: soon you should be able to download the database for off-line use and some new translation sources should be added. Keep your fingers crossed!

BTW: I updated the Python snippet, which included the wrong URL for accessing the service via XML-RPC. Only the Python implementation adds /RPC2 automatically to the server's address. Please visit our developers' corner for more details.
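For clients that don't do this on their own, the behaviour can be mimicked with a small helper like the one below. It is only a sketch: the function name is made up, and it simply mirrors what Python's XML-RPC client (xmlrpc.client.ServerProxy, formerly xmlrpclib) does when the address has no path.

```python
from urllib.parse import urlsplit, urlunsplit


def rpc_endpoint(url):
    """Return url with /RPC2 as its path when url has no path component."""
    parts = urlsplit(url)
    if parts.path:
        # An explicit path was given, so leave the address untouched.
        return url
    return urlunsplit((parts.scheme, parts.netloc, "/RPC2",
                       parts.query, parts.fragment))
```

So rpc_endpoint("http://open-tran.eu") gives the address with /RPC2 appended, while an address that already contains a path is returned unchanged.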

Sunday, May 4, 2008

Plans

I thought that you might be interested in the plans I have regarding this service and the directions of its development. Recently I started digging in the code and (since every test takes some time) I decided to give you an outlook on what I am working on right now. So here we go:
  1. Monitoring. Sometimes things just don't work. And open-tran.eu may go down once in a while for unknown reasons. A few weeks ago I decided to start monitoring the website in order to provide constant service. There were a few bogus outages, but it already proved useful in April, when the site went down. In the right pane of this blog you will find a button that leads to public uptime statistics for the site:
    Website Monitoring by ServiceUptime.com
  2. Better performance. I decided to reorganize the database a little bit and make it more compact. Right now I am working with a set of 3 languages on my local computer and the results look promising. If everything goes right, I should upgrade the service in the coming week.
  3. Off-line mode. The new database schema will allow downloading an arbitrary set of languages for off-line use. One language pair will require approximately 100MB of space and will provide exactly the same functionality (but using a desktop app). I will provide the databases and a set of console tools for updating the database and accessing the translations. The complete documentation will also be posted on the website.
  4. openSUSE translations. The openSUSE translations were brought to my attention by their maintainers. Adding them to the database looks like a piece of cake, so this is going to be the first thing I will do as soon as the upgrade is complete.
There is one small disadvantage of the new release. It won't be possible to trace the translation back to its source, because this information is dropped in order to make the database more compact. You will still be able to see which project uses this translation and how many times, but that will be all. If you have any comments or other ideas, just post a comment or send me an e-mail.

Wednesday, January 16, 2008

New Design

Finally, I have found some time to redesign the site. Some features have been disabled for now and some links don't work correctly. Nevertheless, I think that the new version is much better. You don't have to encode source and destination languages in the domain name anymore; you can choose both languages from drop-down lists. Your choice of locale is now going to persist, regardless of source and destination languages. Let me know what you think about the new layout.