Developer Area/Language Pack Generation

From Mahara Wiki
Revision as of 18:04, 19 April 2017 by Cecilia (talk | contribs)

Two main scripts in the mahara-langpacks directory of the mahara-scripts repository push new English language strings from Mahara into Launchpad, and then pull non-English translations out of Launchpad to publish them as language packs. The process has four steps:

  1. The first script polls the Mahara code's htdocs/lang/en.utf8 directory for changes to strings, converts Mahara's lang PHP files into a .pot file, and pushes this updated file into a Bazaar branch in the mahara-lang project on Launchpad.
  2. Launchpad periodically imports the English-language .pot file from the Bazaar branch, and uses it to populate its web-based translation interface for all the other languages.
  3. Launchpad periodically exports the latest translation data for all languages, into a separate .po file for each language, and publishes these onto another Bazaar branch in the mahara-lang project.
  4. The second script polls the mahara-lang repositories on Launchpad, and generates the official Mahara language pack tarballs.

mahara-scripts also has a debian/ directory, which builds a package called custom-site-mahara-langpacks_x.y_all.deb. This package installs both scripts and their dependencies, and sets them up to run from cron. (Currently the scripts are installed and run on the same Catalyst IT servers that host the language pack site itself.)

The following is a general summary of what these scripts are trying to do.

Generation of .pot files for Launchpad

  • The script runs once a day as the maharabot user (at 7:52AM NZDT)
  • Checks the current Mahara branches for updates to the English language files.
  • If there have been changes, runs a PHP script called php-po.php to generate a single mahara.pot file (for each branch) from the lang/en.utf8 directories in the Mahara HEAD commit.
  • On the master branch, it may also create po files for existing translations, when there have been changes to Mahara strings that don't need to be translated (e.g. typos).
  • Pushes updated pot and po files to lp:~mahara-lang/mahara-lang/<branch>, where Launchpad will import them into its web-based translation interface.
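The "only regenerate when something changed" check in the steps above can be sketched roughly as follows. This is a minimal illustration, not the actual script: the function name branch_changed, the state directory, and the one-file-per-branch layout are all assumptions made for the example.

```shell
#!/bin/sh
# Hypothetical helper (not part of mahara-scripts): decide whether a branch
# needs a fresh mahara.pot by comparing commit ids between runs.
# The state directory layout (<STATE_DIR>/<BRANCH>.last) is an assumption.

# branch_changed STATE_DIR BRANCH CURRENT_COMMIT
# Returns 0 and records CURRENT_COMMIT if it differs from the commit seen on
# the previous run; returns 1 if nothing has changed.
branch_changed() {
    local state_file="$1/$2.last"
    local current="$3"
    local previous=""

    if [ -f "$state_file" ]; then
        previous=$(cat "$state_file")
    fi
    if [ "$current" = "$previous" ]; then
        return 1
    fi
    mkdir -p "$1" || return 1
    printf '%s\n' "$current" > "$state_file"
    return 0
}

# Possible use from a daily cron job (paths are placeholders):
#   commit=$(git -C /path/to/mahara rev-parse origin/master)
#   if branch_changed /path/to/state master "$commit"; then
#       ... regenerate mahara.pot and push it to Launchpad ...
#   fi
```

Recording the commit only after a change is detected keeps the check cheap on days when nothing in lang/en.utf8 has moved.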

The package takes care of most of the necessary dependencies except for the maharabot user's ssh key, needed for the bzr push to Launchpad. That still needs to be installed manually on the server.

Launchpad's side of things

Because Launchpad requires .po or .mo files for its translation interface, and Mahara itself doesn't directly use either of those formats, we use a proxy project to translate Mahara. This project is called "mahara-lang" (aka Mahara Translations). It doesn't have releases like the normal Mahara project, but it does have a separate series for each Mahara series. We basically turn Mahara's PHP lang string files into a POT file, which is the only content of this "project", and then let Launchpad's translation interface work with that.

Launchpad import

The Launchpad translation interface lets you configure a Bazaar import branch for each series. It expects the import branch to contain one or more English-language PO or POT files. Our .pot generation script pushes to this branch, and Launchpad checks it periodically, notices our updates, and stores them on its servers, where it uses them to populate its web-based translation interface.

Human translators

Human translators go to the Launchpad web interface, and translate the strings for a particular Mahara series and language. Launchpad saves these changes internally.

Launchpad export

Launchpad also lets you configure an export branch for each series. Once a day, Launchpad automatically takes its internally stored translation data for all the languages on a series, converts it into a separate PO file for each language, and commits those files into the export branch.

Generation of language packs

  • The script runs once per hour as the maharabot user
  • Checks all the languages in the language-repos.txt file for newly translated strings.
    • The script is hard-coded to fetch the latest version of this file from the mahara-scripts git repo, or to use a local copy on the server. If a local copy exists, it takes precedence.
  • For each Mahara series, Launchpad will have an export branch in Bazaar, and each branch will contain a single .po file per language, exported by Launchpad as described above.
  • The last commit id for each language/branch is stored in the file /var/lib/mahara-langpacks/tarballs/mahara-langpacks.last (in the script's working directory). If you ever need to force regeneration of a particular language pack, you probably need to hack that file to remove the language and/or branch.
  • The script converts the .po file into the directory tree of PHP and HTML files required by Mahara.
  • Tarballs of these directories are put into the document root of the site, and the index.html and status.html files are generated.
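Forcing regeneration by editing mahara-langpacks.last, as mentioned above, could look something like the sketch below. The real format of that file is not documented here, so this example assumes one whitespace-separated "language branch commit" entry per line; check the actual file before adapting it. The function name forget_langpack is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: drop entries from the "last commit" file so that the
# next run treats the language pack as new.
# ASSUMPTION: each line looks like "<language> <branch> <commit id>".

# forget_langpack LAST_FILE LANGUAGE [BRANCH]
# Removes the entry for LANGUAGE on BRANCH, or every entry for LANGUAGE when
# BRANCH is omitted.
forget_langpack() {
    local file="$1" lang="$2" branch="${3:-}"
    local tmp status

    tmp=$(mktemp) || return 1
    # Appending "" forces string comparison, so branch "16.10" never
    # matches "16.1" numerically.
    awk -v lang="$lang" -v branch="$branch" '
        !(($1 "") == (lang "") && (branch == "" || ($2 "") == (branch ""))) { print }
    ' "$file" > "$tmp"
    status=$?
    if [ "$status" -eq 0 ]; then
        # Overwrite in place to keep the original owner and permissions.
        cat "$tmp" > "$file"
        status=$?
    fi
    rm -f "$tmp"
    return "$status"
}

# Possible use (run as the maharabot user, after taking a backup):
#   forget_langpack /var/lib/mahara-langpacks/tarballs/mahara-langpacks.last fr.utf8 master
```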

Manually update language packs

If you are in charge of Mahara translation management, you can manually update language packs on [1]. This is useful when the langpack scripts cannot be run on the server.

  • You should have permission to access the server.

Here are the instructions

  • Update environment variables in /etc/mahara-langpacks.conf
  • Run the script
  • Run the script
  • Copy the directory mahara-langpacks to the server
    • Use 'scp' to copy the langpacks directory from your local machine to a temporary directory on the server.
    • Use 'sudo -u maharabot cp -ar ...' to copy to the langpacks directory.

Manually create a tar ball of a language for testing

You can generate the language pack of a particular language to test. Here are the instructions

mkdir ~/code
cd ~/code
git clone [email protected]:scripts/mahara-scripts.git
  • Get the po file of the language from Launchpad
  • Run the script
cd mahara-scripts/mahara-langpacks
<script> /path/to/po/files/<po file> /path/to/langpacks/<language code>.utf8 <language code>.utf8
  • Build the tar ball
cd /path/to/langpacks
tar -czf <language code>-master.tar.gz <language code>.utf8
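The tarball step can be wrapped in a small function that also lists the archive contents, which makes it easy to confirm the pack looks right before testing it in Mahara. This is only a sketch: build_langpack is a hypothetical name, and it assumes the converted files already exist in a "language code".utf8 directory as in the commands above.

```shell
#!/bin/sh
# Hypothetical helper: build <language code>-master.tar.gz from an existing
# <language code>.utf8 directory and print what ended up inside it.

# build_langpack LANGPACKS_DIR LANGUAGE_CODE
build_langpack() {
    local dir="$1" code="$2"
    local tarball="${code}-master.tar.gz"

    if [ ! -d "$dir/${code}.utf8" ]; then
        echo "missing directory: $dir/${code}.utf8" >&2
        return 1
    fi
    # Run tar from inside LANGPACKS_DIR so that archive paths start with
    # "<language code>.utf8/", matching the manual commands.
    (cd "$dir" && tar -czf "$tarball" "${code}.utf8") || return 1
    # List the archive contents as a quick sanity check.
    tar -tzf "$dir/$tarball"
}

# Possible use:
#   build_langpack /path/to/langpacks fr
```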

Installation of these scripts

The scripts were initially written to run on one server, but more recently the site has been moved to a cluster of two web servers, each of which has a running copy of the scripts. This poses some challenges:

  • The .pot generation script needs to run on only ONE server (currently, server 1 in the cluster).
  • The language pack generation script should run on each server, but at different times in order to avoid overloading. The script stores its data in the "$DATA" directory (defined in the file /etc/mahara-langpacks.conf). This directory should not be shared between the two servers.
  • The maharabot user must be created on both servers, and its SSH keys must be registered on Launchpad.
  • The Bazaar client must be installed and configured on each server (apt-get install bzr).
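One way to catch a "$DATA" directory that has accidentally been shared between the two servers is to have each server leave a marker naming itself and refuse to continue if it finds someone else's. Nothing like this exists in the scripts as described; the function name claim_data_dir and the marker file .owner-host are assumptions made for this sketch.

```shell
#!/bin/sh
# Hypothetical safeguard (not in mahara-scripts): detect a data directory
# that is shared between hosts by recording the first host that uses it.

# claim_data_dir DATA_DIR
# Returns 0 if DATA_DIR is unclaimed or already claimed by this host,
# 1 if another host's marker is found there.
claim_data_dir() {
    local data="$1"
    local marker="$data/.owner-host"
    local host owner

    host=$(uname -n)
    mkdir -p "$data" || return 1
    if [ -f "$marker" ]; then
        owner=$(cat "$marker")
        if [ "$owner" = "$host" ]; then
            return 0
        fi
        echo "$data is already used by $owner; refusing to run on $host" >&2
        return 1
    fi
    printf '%s\n' "$host" > "$marker"
}

# Possible use at the top of a cron job, after DATA has been set from
# /etc/mahara-langpacks.conf:
#   claim_data_dir "$DATA" || exit 1
```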

Git-based translation branches

Before we started using the Launchpad translation interface in 2010, we stored all the translations in PHP files in Git. The plan was to phase all translation branches over to Launchpad, but as of 2016 a couple of them still remain in Git, most prominently the Czech translation.

Fortunately, all the code for handling translations in Git is still present in the scripts mentioned above. The repo list file language-repos.txt indicates whether each language is stored in Launchpad, or the URL of the git repository it should come from.

  • The .pot generation script doesn't actually do anything for Git-based translations right now. Prior to our Launchpad switchover, it used to generate the POT files and publish them where translators could download and use them in their translation tools. Now, if a translator wants to use the POT files directly, they need to fetch them from the Bazaar branch, like so: bzr branch lp:mahara-lang/16.04
  • The language pack generation script knows whether each language should be handled by Launchpad or Git, as specified in language-repos.txt. In the Git repo, it looks for branches named after each supported Git series (master, 15.10_STABLE, 15.04_STABLE, etc).
    • Within each branch, it looks for a PO file (i.e. <lang>.po) and uses it the same way as it would a PO file from Launchpad.
    • If it doesn't find a PO file, it looks for a lang/<lang>.utf8 directory, and tries to pull translation PHP strings from there. So this means that Git translations, unlike Launchpad, can use PHP files directly. The PHP files get packaged up into the langpack without a PO conversion step.
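The fallback order described above (PO file first, then the PHP lang directory) can be sketched as a small shell function. This is an illustration rather than the script's actual code: the name translation_source is hypothetical, and it assumes the PO file sits at the top level of the checked-out branch.

```shell
#!/bin/sh
# Hypothetical sketch of the lookup order for a Git-based translation.
# ASSUMPTION: <lang>.po lives in the root of the branch checkout.

# translation_source CHECKOUT_DIR LANG
# Prints "po <path>" if a PO file exists, otherwise "php <path>" if a
# lang/<lang>.utf8 directory exists. Returns 1 if neither is found.
translation_source() {
    local dir="$1" lang="$2"

    if [ -f "$dir/$lang.po" ]; then
        # Handled exactly like a PO file exported by Launchpad.
        echo "po $dir/$lang.po"
    elif [ -d "$dir/lang/$lang.utf8" ]; then
        # PHP strings are packaged directly, with no PO conversion step.
        echo "php $dir/lang/$lang.utf8"
    else
        return 1
    fi
}

# Possible use:
#   translation_source /path/to/checkout cs
```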

Note that the scripts (at present) only read from Git, not write to it. So if the repository where it's stored allows anonymous Git read access, everything should be good to go.

Combining Launchpad-based and non-Launchpad translations

This section is theoretical.

Launchpad and offline translations

Some of our translators would prefer to use offline POT-based translation tools rather than Launchpad's translation interface (which is admittedly a little clunky). Here are some ideas about how we might allow them to do that and combine this with the translations in Launchpad.

Currently, the stuff we're importing into Launchpad is actually what Launchpad considers "templates" rather than translations. This means that it only reads in an English-language lang file, and uses that to create a list of strings for other languages to translate.

Offline translations can also be imported, by the methods described on this page:

  • Uploading a tarball that contains the .po file for a language (but you can only do this for the trunk branch, so it's not very useful)
  • Commit the language's po file into the relevant import branch in Bazaar, and then use the "One-off import" command under the branch's synchronization settings page
  • Or if you're going to have a regular offline translator, you might set it to automatically import translations (although Launchpad warns this might overwrite translations created via Launchpad).

Launchpad and Git translations

If we had a situation where there were some contributors using Git for a language, and others using Launchpad, then we might be able to rig up something like this:

  1. Have the .pot generation script pull from the Git repository and push into the Launchpad import branch.
  2. Set Launchpad to regularly import translations (and not just templates) from the import branch.
  3. Have the language pack generation script export the generated PHP files into the Git repository.

You'd need to give some careful thought about how to avoid a circular over-write sequence, though.