
Developer Area/Language Pack Generation: Difference between revisions

From Mahara Wiki

(4 intermediate revisions by 2 users not shown)
Line 1: Line 1:
Two scripts in the mahara-langpacks directory of the [https://git.mahara.org/scripts/mahara-scripts mahara-scripts repository] keep the Mahara translations up to date:
Two main scripts in the <code>mahara-langpacks</code> directory of the [https://git.mahara.org/scripts/mahara-scripts mahara-scripts repository] push new English language strings from Mahara into Launchpad, and then pull non-English translations out of Launchpad, to publish them on langpacks.mahara.org.
# update-pot.sh polls the Mahara code for changes to strings, and pushes those changes to the mahara-lang project in launchpad
 
# <code>update-pot.sh</code> polls the Mahara code's <code>htdocs/lang/en.utf8</code> directory for changes to strings, converts Mahara's lang PHP files into a <code>.pot</code> file, and pushes this update file into a Bazaar branch in the <code>mahara-lang</code> project on Launchpad.
# Launchpad periodically imports the English-language .pot file from the Bazaar branch, and uses it to populate its web-based translation interface for all the other languages.
# Launchpad periodically exports the latest translation data for all languages, into a separate <code>.po</code> file for each language, and publishes these onto another Bazaar branch in the <code>mahara-lang</code> project.
# langpacks.sh polls the mahara-lang repositories on launchpad, and generates official mahara language pack tarballs at http://langpacks.mahara.org.


mahara-scripts also has a debian/ directory, which creates a package called custom-site-mahara-langpacks_x.y_all.deb. This package installs the above scripts and their dependencies, and sets them up to run on cron.
<code>mahara-scripts</code> also has a <code>debian/</code> directory, which creates a package called <code>custom-site-mahara-langpacks_x.y_all.deb</code>. This package installs <code>update-pot.sh</code> and <code>langpacks.sh</code> and their dependencies, and sets them up to run on cron. (Currently the scripts are installed and run on the same Catalyst IT servers that host langpacks.mahara.org itself.)
 
The following is a general summary of what these scripts are trying to do.
 
==Generation of .pot files for Launchpad==
 
* The <code>update-pot.sh</code> script runs once a day as the maharabot user (at 7:52AM NZDT)
* Checks current branches at https://git.mahara.org/mahara/mahara.git for updates to English language files.
* If there have been changes, runs a php script called <code>php-po.php</code> to generate a single mahara.pot file (for each branch) from the <code>lang/en.utf8</code> directories in the mahara HEAD commit.
* On the <code>master</code> branch, it may also create po files for existing translations, when there have been changes to Mahara strings that don't need to be translated (e.g. typos).
* Pushes updated pot and po files to <code>lp:~mahara-lang/mahara-lang/<branch></code>, where Launchpad will import them into its web-based translation interface.
 
The package takes care of most of the necessary dependencies except for the '''maharabot user's ssh key''', needed for the bzr push to Launchpad.  That still needs to be installed manually on the server.
 
==Launchpad's side of things==
 
Because Launchpad requires .po or .mo files for its translation interface, and Mahara itself doesn't directly use either of those formats, we use a proxy project to translate Mahara. This project is called "mahara-lang" (aka Mahara Translations). It doesn't have releases like the normal Mahara project, but it does have a separate series for each Mahara series. We basically turn Mahara's PHP lang string files into a POT file which is the only content of this "project", and then let Launchpad's translation interface work with that.
 
===Launchpad import===
 
The Launchpad translation interface lets you configure a Bazaar import branch for each series. It expects the import branch to contain one or more English-language PO or POT files. We push to this branch from <code>update-pot.sh</code>, and Launchpad checks it periodically, notices our updates, and stores them on its servers, where it uses them to inform its web-based translation interface.


The following is a general summary of what these scripts are trying to do.  Obviously, to fix bugs or find out what's really going on, you need to read them.
===Human translators===


==Generation of .pot files for launchpad==
Human translators go to the Launchpad web interface, and translate the strings for a particular Mahara series and language. Launchpad saves these changes internally.


* The update-pot.sh script runs once a day as the maharabot user (at 7:52AM NZDT)
===Launchpad export===
* Checks current branches at https://git.mahara.org/mahara/mahara.git for updates
* If there have been changes, runs a php script called php-po.php to generate a single mahara.pot file (for each branch) from the en.utf8 directories in the mahara working tree
* On the master branch, it may also create po files for existing translations, when there have been changes to Mahara strings that don't need to be translated (e.g. typos)
* Pushes updated pot and po files to lp:~mahara-lang/mahara-lang/<branch>, where they will be imported into launchpad ready for translation


The package takes care of most of the necessary dependencies except for the '''maharabot user's ssh key''', needed for the bzr push to launchpad.  That still needs to be installed manually on the server.
Launchpad also lets you configure an export branch for each series. Once a day, Launchpad automatically takes its internally stored translation data for all the languages on a series, converts it into a separate PO file for each language, and commits those files into the export branch.


==Generation of language packs==


* The langpacks.sh script runs once per hour as the maharabot user
* The <code>langpacks.sh</code> script runs once per hour as the maharabot user
* Checks all the languages in the [https://git.mahara.org/scripts/mahara-scripts/blobs/master/mahara-langpacks/language-repos.txt language-repos.txt] file for newly translated strings.  This file is pulled straight from git.mahara.org each time the script runs, so to add a new language, you only need to commit this file, you do not need to update the package or copy the file to the server.
* Checks all the languages in the language-repos.txt file for newly translated strings.
* For each language and each current mahara branch, there is a single .po file exported by launchpad*
** The script is hard-coded to check for the latest version of this file in the [https://git.mahara.org/scripts/mahara-scripts/blob/master/mahara-langpacks/language-repos.txt mahara-scripts git repo], or to use a local version on the server with it. If a local version exists, it takes precedence.
* For each Mahara series, Launchpad will have an export branch in Bazaar, and each branch will contain a single .po file exported by Launchpad as described above.
* The last commit id for each language/branch is stored in the file /var/lib/mahara-langpacks/tarballs/mahara-langpacks.last (in the script's working directory).  If you ever need to force regeneration of a particular language pack, you probably need to hack that file to remove the language and/or branch.
* The script po-php.pl converts the .po file into the directory tree of php and html files required by Mahara
Line 44: Line 64:
  mkdir ~/code
  cd ~/code
  git clone git@git.mahara.org:scripts/mahara-scripts.git
  git clone https://git.mahara.org/scripts/mahara-scripts.git


* Get the po file of the language from Launchpad
Line 57: Line 77:
  tar -czf <language code>-master.tar.gz <language code>.utf8


==Installation these language packs update scripts==
==Installation of these scripts==
Early, the two scripts have been installed in the 'chatter' server. Currently, the two scripts have been deployed in the twin servers: learn-docus-web1 and learn-docus-web2.
 
There are some notes:
The scripts were initially written to run on one server, but more recently the langpacks.mahara.org site has been moved to a cluster of two web servers, each of which has a running copy of the scripts. This poses some challenges:


* The script "update-pot.sh" needs to run on ONE server (learn-docus-web1).
* The script <code>update-pot.sh</code> needs to run on ONE server (currently, server 1 in the cluster)


* The script "langpacks.sh" should run on 2 servers but at different times in order to avoid overloading. The script will store its data in "$DATA" directory (defined in the file /etc/mahara-langpacks.conf). This directory should not be shared between two servers.
* The script <code>langpacks.sh</code> should run on each server, but at different times in order to avoid overloading. The script will store its data in the "$DATA" directory (defined in the file <code>/etc/mahara-langpacks.conf</code>). This directory should '''not''' be shared between two servers.


* User maharabot on both servers must be created and his SSH keys must be updated on Launchpad.net  


* bzr must be installed and configured on 2 servers
* The Bazaar client must be installed and configured on each server (<code>apt-get install bzr</code>)
 
==Git-based translation branches==
Before we started using the Launchpad translation interface in 2010, we stored all the translations in PHP files in Git. The plan was to phase all translation branches over to Launchpad, but as of 2016 a couple of them still remain in Git, most prominently the Czech translation.
 
Fortunately, all the code for handling translations in Git is still present in the scripts mentioned above. The repo list file <code>language-repos.txt</code> indicates whether each language is stored in Launchpad, or the URL of the git repository it should come from.
 
* <code>update-pot.sh</code> doesn't actually do anything for Git-based translations right now. Prior to our Launchpad switchover, it used to generate the POT files and publish them to langpacks.mahara.org/pot/, where translators could download and use them in their translation tools. Now, if a translator wants to use the POT files directly, they need to fetch them from the Bazaar branch, like so: <code>bzr branch lp:mahara-lang/16.04</code>
 
* <code>langpacks.pl</code> knows whether each language should be handled by Launchpad or Git, as specified in <code>language-repos.txt</code>. In the repo, it looks for branches named after each supported Git series (master, 15.10_STABLE, 15.04_STABLE, etc).
** Within each branch, it looks for a PO file (i.e. <code><lang>.po</code>) and uses that the same as it would a PO file from Launchpad.
** If it doesn't find a PO file, it looks for a <code>lang/<lang>.utf8</code> directory, and tries to pull translation PHP strings from there. So this means that Git translations, unlike Launchpad, can use PHP files directly. The PHP files get packaged up into the langpack without a PO conversion step.
 
Note that the scripts (at present) only ''read'' from Git, not ''write'' to it. So if the repository where it's stored allows anonymous Git read access, everything should be good to go.
 
==Combining Launchpad-based and non-Launchpad translations==
 
This section is theoretical.
 
=== Launchpad and offline translations===
Some of our translators would prefer to use offline POT-based translation tools rather than Launchpad's translation interface (which is admittedly a little clunky). Here are some ideas about how we might allow them to do that and combine this with the translations in Launchpad.
 
Currently, the stuff we're importing into Launchpad is actually what Launchpad considers "templates" rather than translations. This means that it only reads in an English-language lang file, and uses that to create a list of strings for other languages to translate. See https://help.launchpad.net/Translations/YourProject/ImportingTemplates
 
Offline translations can also be imported, by the methods described on this page: https://help.launchpad.net/Translations/YourProject/ImportingTranslations
 
* Uploading a tarball that contains the .po file for a language (but you can only do this for the trunk branch, so it's not very useful)
* Commit the language's po file into the relevant import branch in Bazaar, and then use the "One-off import" command under the branch's synchronization settings page
* Or if you're going to have a regular offline translator, you might set it to automatically import translations (although Launchpad warns this might overwrite translations created via Launchpad).
 
=== Launchpad and Git translations===
 
If we had a situation where there were some contributors using Git for a language, and others using Launchpad, then we might be able to rig up something like this:


1. Have <code>update-pot.sh</code> pull from the Git repository and push into the Launchpad import branch
2. Set Launchpad to regularly import translations (and not just templates) from the import branch
3. Have <code>langpacks.pl</code> export the generated PHP files into the Git repository.


<nowiki>*</nowiki> There is still a lot of code for the old git.mahara.org language repos that can go away if/when the last gitorious translations finally die.
You'd need to give some careful thought about how to avoid a circular over-write sequence, though.

Latest revision as of 18:09, 4 July 2018

Two main scripts in the mahara-langpacks directory of the mahara-scripts repository push new English language strings from Mahara into Launchpad, and then pull non-English translations out of Launchpad, to publish them on langpacks.mahara.org.

  1. update-pot.sh polls the Mahara code's htdocs/lang/en.utf8 directory for changes to strings, converts Mahara's lang PHP files into a .pot file, and pushes this update file into a Bazaar branch in the mahara-lang project on Launchpad.
  2. Launchpad periodically imports the English-language .pot file from the Bazaar branch, and uses it to populate its web-based translation interface for all the other languages.
  3. Launchpad periodically exports the latest translation data for all languages, into a separate .po file for each language, and publishes these onto another Bazaar branch in the mahara-lang project.
  4. langpacks.sh polls the mahara-lang repositories on Launchpad, and generates official Mahara language pack tarballs at http://langpacks.mahara.org.

mahara-scripts also has a debian/ directory, which creates a package called custom-site-mahara-langpacks_x.y_all.deb. This package installs update-pot.sh and langpacks.sh and their dependencies, and sets them up to run on cron. (Currently the scripts are installed and run on the same Catalyst IT servers that host langpacks.mahara.org itself.)

The following is a general summary of what these scripts are trying to do.

Generation of .pot files for Launchpad

  • The update-pot.sh script runs once a day as the maharabot user (at 7:52AM NZDT)
  • Checks current branches at https://git.mahara.org/mahara/mahara.git for updates to English language files.
  • If there have been changes, runs a php script called php-po.php to generate a single mahara.pot file (for each branch) from the lang/en.utf8 directories in the mahara HEAD commit.
  • On the master branch, it may also create po files for existing translations, when there have been changes to Mahara strings that don't need to be translated (e.g. typos).
  • Pushes updated pot and po files to lp:~mahara-lang/mahara-lang/<branch>, where Launchpad will import them into its web-based translation interface.

The package takes care of most of the necessary dependencies except for the maharabot user's ssh key, needed for the bzr push to Launchpad. That still needs to be installed manually on the server.
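
The change check described in the list above can be sketched roughly as follows. This is a minimal illustration, not the code from update-pot.sh: the function name english_strings_changed and the placeholder commit ids OLD and NEW are assumptions made for the example.

```shell
#!/bin/sh
# Hypothetical helper, not taken from update-pot.sh.
# Reads newline-separated file paths on stdin and succeeds (exit status 0)
# if at least one of them lies inside a lang/en.utf8 directory.
english_strings_changed() {
    grep -Eq '(^|/)lang/en\.utf8/'
}

# Example use inside a Mahara checkout (OLD and NEW are placeholder commit ids):
#   if git diff --name-only "$OLD" "$NEW" | english_strings_changed; then
#       echo "English strings changed, mahara.pot should be regenerated"
#   fi
```

Matching on the path rather than on a fixed file list means that lang/en.utf8 directories anywhere in the tree are picked up, not only the one under htdocs.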

Launchpad's side of things

Because Launchpad requires .po or .mo files for its translation interface, and Mahara itself doesn't directly use either of those formats, we use a proxy project to translate Mahara. This project is called "mahara-lang" (aka Mahara Translations). It doesn't have releases like the normal Mahara project, but it does have a separate series for each Mahara series. We basically turn Mahara's PHP lang string files into a POT file which is the only content of this "project", and then let Launchpad's translation interface work with that.

Launchpad import

The Launchpad translation interface lets you configure a Bazaar import branch for each series. It expects the import branch to contain one or more English-language PO or POT files. We push to this branch from update-pot.sh, and Launchpad checks it periodically, notices our updates, and stores them on its servers, where it uses them to inform its web-based translation interface.

Human translators

Human translators go to the Launchpad web interface, and translate the strings for a particular Mahara series and language. Launchpad saves these changes internally.

Launchpad export

Launchpad also lets you configure an export branch for each series. Once a day, Launchpad automatically takes its internally stored translation data for all the languages on a series, converts it into a separate PO file for each language, and commits those files into the export branch.

Generation of language packs

  • The langpacks.sh script runs once per hour as the maharabot user
  • Checks all the languages in the language-repos.txt file for newly translated strings.
    • The script is hard-coded to check for the latest version of this file in the mahara-scripts git repo, or to use a local version on the server with it. If a local version exists, it takes precedence.
  • For each Mahara series, Launchpad will have an export branch in Bazaar, and each branch will contain a single .po file exported by Launchpad as described above.
  • The last commit id for each language/branch is stored in the file /var/lib/mahara-langpacks/tarballs/mahara-langpacks.last (in the script's working directory). If you ever need to force regeneration of a particular language pack, you probably need to hack that file to remove the language and/or branch.
  • The script po-php.pl converts the .po file into the directory tree of php and html files required by Mahara
  • Tarballs of these directories are put into the document root of the http://langpacks.mahara.org site, and index.html, status.html files are generated
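
The "last commit id" bookkeeping described above could look something like the sketch below. The real layout of mahara-langpacks.last is not documented here, so the example ASSUMES one whitespace-separated "<language> <branch> <commit id>" entry per line; the function names needs_rebuild and forget_entry are also made up for illustration. Check the actual file before editing it by hand.

```shell
#!/bin/sh
# ASSUMPTION: the state file holds one "<lang> <branch> <commit-id>" entry
# per line. The real mahara-langpacks.last file may be laid out differently.

# usage: needs_rebuild STATEFILE LANG BRANCH CURRENT_COMMIT
# Succeeds if no commit is recorded for LANG/BRANCH, or if the recorded
# commit differs from CURRENT_COMMIT.
needs_rebuild() {
    state=$1 lang=$2 branch=$3 current=$4
    [ -f "$state" ] || return 0
    recorded=$(awk -v l="$lang" -v b="$branch" \
        '$1 == l && $2 == b { print $3; exit }' "$state")
    [ "$recorded" != "$current" ]
}

# usage: forget_entry STATEFILE LANG BRANCH
# Drops the entry for LANG/BRANCH so that the next run regenerates it.
forget_entry() {
    state=$1 lang=$2 branch=$3
    tmp=$(mktemp) || return 1
    awk -v l="$lang" -v b="$branch" '!($1 == l && $2 == b)' "$state" > "$tmp" \
        && cat "$tmp" > "$state" \
        && rm -f "$tmp"
}
```

Under that assumption, forcing a rebuild of one language pack amounts to calling forget_entry for the language and branch and waiting for the next hourly run.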

Manually update language packs

If you are in charge of Mahara translation management, you can manually update the language packs on langpacks.mahara.org. This is useful when the langpack scripts cannot be run on the server.

  • You should have permission to access the langpacks.mahara.org server.

Here are the instructions

  • Update environment variables in /etc/mahara-langpacks.conf
  • Run the script update-pot.sh
  • Run the script langpacks.sh
  • Copy the directory mahara-langpacks to the server
    • Use 'scp' to copy the langpacks directory from your local machine to a temporary directory on the server.
    • Use 'sudo -u maharabot cp -ar ...' to copy to the langpacks directory.

Manually create a tarball of a language for testing

You can generate the language pack for a particular language in order to test it. Here are the instructions:

  • Clone the mahara-scripts repository
mkdir ~/code
cd ~/code
git clone https://git.mahara.org/scripts/mahara-scripts.git
  • Get the po file of the language from Launchpad
  • Run the script po-php.pl
cd mahara-scripts/mahara-langpacks
perl po-php.pl /path/to/po/files/<po file> /path/to/langpacks/<language code>.utf8 <language code>.utf8
  • Build the tarball
cd /path/to/langpacks
tar -czf <language code>-master.tar.gz <language code>.utf8
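
The final tar step can be wrapped in a small function so that it is easy to repeat for several languages. This is only a sketch: the function name build_langpack_tarball is made up, and it assumes the <language code>.utf8 directory has already been generated by po-php.pl.

```shell
#!/bin/sh
# Hypothetical wrapper around the tar step; not part of mahara-scripts.
# usage: build_langpack_tarball LANGCODE LANGPACKS_DIR
build_langpack_tarball() {
    code=$1
    dir=$2
    if [ ! -d "$dir/$code.utf8" ]; then
        echo "missing directory: $dir/$code.utf8" >&2
        return 1
    fi
    # -C keeps the archive paths relative, so the tarball unpacks as
    # <language code>.utf8/ regardless of where it was built.
    tar -czf "$dir/$code-master.tar.gz" -C "$dir" "$code.utf8"
}

# Example: build_langpack_tarball fr /path/to/langpacks
```

Using -C instead of cd leaves the caller's working directory untouched, which matters if the function is called in a loop.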

Installation of these scripts

The scripts were initially written to run on one server, but more recently the langpacks.mahara.org site has been moved to a cluster of two web servers, each of which has a running copy of the scripts. This poses some challenges:

  • The script update-pot.sh needs to run on ONE server (currently, server 1 in the cluster)
  • The script langpacks.sh should run on each server, but at different times in order to avoid overloading. The script will store its data in the "$DATA" directory (defined in the file /etc/mahara-langpacks.conf). This directory should not be shared between two servers.
  • The maharabot user must exist on both servers, and its SSH keys must be registered on Launchpad.net
  • The Bazaar client must be installed and configured on each server (apt-get install bzr)
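
When setting up a new server, a quick check for missing tools can save a failed cron run. The sketch below is illustrative only: missing_commands is a made-up helper, and the list of commands in the usage comment is inferred from the scripts named on this page (bzr for Launchpad, git for Mahara, perl for po-php.pl, php for php-po.php) rather than taken from the Debian package's dependency list.

```shell
#!/bin/sh
# Hypothetical pre-flight helper; not part of mahara-scripts.
# Prints the name of every command given as an argument that is not on PATH.
missing_commands() {
    for cmd in "$@"; do
        command -v "$cmd" >/dev/null 2>&1 || echo "$cmd"
    done
}

# Example, run as the maharabot user:
#   missing=$(missing_commands bzr git perl php)
#   [ -z "$missing" ] || echo "not installed: $missing"
```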

Git-based translation branches

Before we started using the Launchpad translation interface in 2010, we stored all the translations in PHP files in Git. The plan was to phase all translation branches over to Launchpad, but as of 2016 a couple of them still remain in Git, most prominently the Czech translation.

Fortunately, all the code for handling translations in Git is still present in the scripts mentioned above. The repo list file language-repos.txt indicates whether each language is stored in Launchpad, or the URL of the git repository it should come from.

  • update-pot.sh doesn't actually do anything for Git-based translations right now. Prior to our Launchpad switchover, it used to generate the POT files and publish them to langpacks.mahara.org/pot/, where translators could download and use them in their translation tools. Now, if a translator wants to use the POT files directly, they need to fetch them from the Bazaar branch, like so: bzr branch lp:mahara-lang/16.04
  • langpacks.pl knows whether each language should be handled by Launchpad or Git, as specified in language-repos.txt. In the repo, it looks for branches named after each supported Git series (master, 15.10_STABLE, 15.04_STABLE, etc).
    • Within each branch, it looks for a PO file (i.e. <lang>.po) and uses that the same as it would a PO file from Launchpad.
    • If it doesn't find a PO file, it looks for a lang/<lang>.utf8 directory, and tries to pull translation PHP strings from there. So this means that Git translations, unlike Launchpad, can use PHP files directly. The PHP files get packaged up into the langpack without a PO conversion step.

Note that the scripts (at present) only read from Git, not write to it. So if the repository where a translation is stored allows anonymous Git read access, everything should be good to go.
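
The lookup order described above for a Git-based translation (PO file first, then a lang/<lang>.utf8 directory) can be expressed as a short sketch. The function name translation_source and its output words are made up for illustration and are not taken from the scripts themselves.

```shell
#!/bin/sh
# Hypothetical helper illustrating the lookup order; not from the scripts.
# usage: translation_source CHECKOUT_DIR LANGCODE
# Prints "po" if CHECKOUT_DIR/<lang>.po exists, otherwise "php" if
# CHECKOUT_DIR/lang/<lang>.utf8 is a directory, otherwise "none".
translation_source() {
    dir=$1
    code=$2
    if [ -f "$dir/$code.po" ]; then
        echo po
    elif [ -d "$dir/lang/$code.utf8" ]; then
        echo php
    else
        echo none
    fi
}

# Example: translation_source /path/to/checkout cs
```

Checking for the PO file first mirrors the text: PHP files are only used directly when no PO file is present on the branch.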

Combining Launchpad-based and non-Launchpad translations

This section is theoretical.

Launchpad and offline translations

Some of our translators would prefer to use offline POT-based translation tools rather than Launchpad's translation interface (which is admittedly a little clunky). Here are some ideas about how we might allow them to do that and combine this with the translations in Launchpad.

Currently, the stuff we're importing into Launchpad is actually what Launchpad considers "templates" rather than translations. This means that it only reads in an English-language lang file, and uses that to create a list of strings for other languages to translate. See https://help.launchpad.net/Translations/YourProject/ImportingTemplates

Offline translations can also be imported, by the methods described on this page: https://help.launchpad.net/Translations/YourProject/ImportingTranslations

  • Uploading a tarball that contains the .po file for a language (but you can only do this for the trunk branch, so it's not very useful)
  • Commit the language's po file into the relevant import branch in Bazaar, and then use the "One-off import" command under the branch's synchronization settings page
  • Or if you're going to have a regular offline translator, you might set it to automatically import translations (although Launchpad warns this might overwrite translations created via Launchpad).

Launchpad and Git translations

If we had a situation where there were some contributors using Git for a language, and others using Launchpad, then we might be able to rig up something like this:

  1. Have update-pot.sh pull from the Git repository and push into the Launchpad import branch
  2. Set Launchpad to regularly import translations (and not just templates) from the import branch
  3. Have langpacks.pl export the generated PHP files into the Git repository.

You'd need to give some careful thought about how to avoid a circular over-write sequence, though.