Specialized tools for translating and publishing OER are one possible use of an API for publishing to open education repositories. Repositories may have general-purpose editors for creating content, but they aren’t likely to have great facilities for translating it.
Carl Scheffler and I spent some time in the Geneva airport investigating whether Google Translator Toolkit could be the translation editor of choice for Connexions modules. Translator Toolkit has to be convinced and helped along, because it was designed for HTML (web pages) rather than for the structured XML format of Connexions modules. It just might be possible, though, and advice and comments would be most welcome.
The workflow would be just a bit more complicated than the normal route for translation and would look something like this:
- Find a module that you want to translate on Connexions and record its ID. Let's say the module is Electric Circuits - Grade 10, http://cnx.org/content/m32830/latest/. Then the ID is “m32830”.
- Open Google Translator Toolkit and select a URL something like this: http://www.coolhelperservice.org/cnxtranslate/m32830. This would fetch the module in a format that Google Translate can use well.
- Translate it using the Translator Toolkit.
- Save the file to your laptop.
- Go to something like http://www.coolhelperservice.org/cnxpublish and upload the saved file. Fill out a bit of information and then push a button to sign the license and publish it to Connexions.
Does that workflow seem reasonable? Is there a better workflow that you can think of and suggest?
Here are some technical details for those who are interested. Those who aren't can safely stop here and still give feedback on the process from a translator's perspective.
Google Translator Toolkit doesn't work with XML formats. But Connexions does produce an HTML format for modules that can be converted back into Connexions XML (CNXML) without any loss. So the “coolhelperservice” needs to retrieve the module, format it in HTML for Translator Toolkit, and then do the opposite transform (HTML → CNXML) on the way back into Connexions.
To get the HTML for the body of a module from Connexions, you append “/body” to the module URL. The module metadata (title and such) is available by appending “/metadata” to the module URL. So with just the module ID, the “coolhelperservice” can put together a nice package of HTML for the translator to use, and still be able to reconstruct the XML to publish the translated version.
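As a rough sketch of that URL bookkeeping, here is how a helper might pull the module ID out of a module URL and build the body and metadata URLs. The function names are made up for illustration, and the sketch assumes the suffixes attach directly after “latest” without a doubled slash; none of this is an existing Connexions client.

```python
import re
from urllib.parse import urlparse

# Base taken from the example module URL in this post.
CNX_BASE = "http://cnx.org/content"


def module_id_from_url(url: str) -> str:
    """Pull an ID such as 'm32830' out of a Connexions module URL."""
    path = urlparse(url).path
    match = re.search(r"/content/(m\d+)(?:/|$)", path)
    if match is None:
        raise ValueError(f"no module ID found in {url!r}")
    return match.group(1)


def module_urls(module_id: str, version: str = "latest") -> dict:
    """Build the module, body, and metadata URLs for one module.

    Assumption: '/body' and '/metadata' follow the version segment directly.
    """
    base = f"{CNX_BASE}/{module_id}/{version}"
    return {
        "module": base + "/",
        "body": base + "/body",
        "metadata": base + "/metadata",
    }


if __name__ == "__main__":
    mid = module_id_from_url("http://cnx.org/content/m32830/latest/")
    for name, url in module_urls(mid).items():
        print(name, url)
```

The helper service would then fetch the body and metadata URLs and stitch the results into one HTML page for the translator.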
One tricky bit is that Google Translator Toolkit makes a mess of the mathematics that comes in from Connexions, so the math has to be protected somehow. Carl and I experimented with a few ideas for how to do that, and Toolkit didn't cooperate with most of them, but Carl came up with the idea of putting all the math into an HTML id attribute. Amazingly, that worked. It comes out all escaped, but that is good enough. (Toolkit won't keep around a random attribute, so “id” was the way to go.) Carl is pretty sure that there is a web service that will take a snippet of MathML and give back an image. He is going to investigate that further. So in principle, you can stuff the math into the id of an image element (so it doesn't get lost) and point the image at this service so that it renders the math. The translator won't be able to translate words that were inside the math, but Carl had previously looked around and found that such words aren't very common, so this might just be good enough.
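To make the id trick concrete, here is a minimal sketch of the round trip: each MathML element is swapped for an image tag whose id carries the escaped markup, and the swap is undone after translation. The “cnxmath:” prefix and the function names are my own assumptions for illustration, the regular expressions only handle simple, non-nested math elements, and the sketch leaves out the image URL because the rendering service hasn't been pinned down yet.

```python
import html
import re

# Matches <math>...</math> with an optional namespace prefix such as m:math.
MATH_RE = re.compile(r"<((?:\w+:)?math)\b.*?</\1>", re.DOTALL)

# Hypothetical marker so restored placeholders are not confused with real images.
PREFIX = "cnxmath:"
PLACEHOLDER_RE = re.compile(r'<img id="' + PREFIX + r'([^"]*)"\s*/?>')


def protect_math(doc: str) -> str:
    """Replace each math element with an <img> whose id holds the escaped MathML."""
    def stash(match):
        escaped = html.escape(match.group(0), quote=True)
        return f'<img id="{PREFIX}{escaped}"/>'
    return MATH_RE.sub(stash, doc)


def restore_math(doc: str) -> str:
    """Turn the placeholder images back into the original MathML."""
    def unstash(match):
        return html.unescape(match.group(1))
    return PLACEHOLDER_RE.sub(unstash, doc)


if __name__ == "__main__":
    page = "<p>Ohm: <m:math><m:mi>V</m:mi></m:math> holds.</p>"
    safe = protect_math(page)
    print(safe)
    print(restore_math(safe) == page)
```

Whether Toolkit hands the id back exactly as it went in is something we only checked by hand, so a real service should verify that every placeholder survives before restoring the math.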
At the end, the “coolhelperservice” will use a publishing API (SWORD v2) to publish the translation back to Connexions. Implementing that API is part of my fellowship work, so it is coming later this year. There will have to be a bit of license signing back at Connexions, but the “coolhelperservice” can make that smoother too.
I think something like this could work. What do you think? And did we miss some clever idea or service that could be of help? Actually, I am sure we did, since this was a two-hour experiment. So send help, advice, etc. Carl will keep investigating, and maybe we will have some screenshots to clarify all this for a future post.