Developer Area/Language Pack Generation
Latest revision as of 18:09, 4 July 2018
Two main scripts in the mahara-langpacks directory of the mahara-scripts repository push new English language strings from Mahara into Launchpad, and then pull non-English translations out of Launchpad, to publish them on langpacks.mahara.org.
1. update-pot.sh polls the Mahara code's htdocs/lang/en.utf8 directory for changes to strings, converts Mahara's lang PHP files into a .pot file, and pushes this updated file into a Bazaar branch in the mahara-lang project on Launchpad.
2. Launchpad periodically imports the English-language .pot file from the Bazaar branch, and uses it to populate its web-based translation interface for all the other languages.
3. Launchpad periodically exports the latest translation data for all languages, into a separate .po file for each language, and publishes these onto another Bazaar branch in the mahara-lang project.
4. langpacks.sh polls the mahara-lang repositories on Launchpad, and generates official Mahara language pack tarballs at http://langpacks.mahara.org.
mahara-scripts also has a debian/ directory, which creates a package called custom-site-mahara-langpacks_x.y_all.deb. This package installs update-pot.sh and langpacks.sh and their dependencies, and sets them up to run on cron. (Currently the scripts are installed and run on the same Catalyst IT servers that host langpacks.mahara.org itself.)
The following is a general summary of what these scripts are trying to do.
Generation of .pot files for Launchpad
- The update-pot.sh script runs once a day as the maharabot user (at 7:52AM NZDT).
- Checks current branches at https://git.mahara.org/mahara/mahara.git for updates to English language files.
- If there have been changes, runs a PHP script called php-po.php to generate a single mahara.pot file (for each branch) from the lang/en.utf8 directories in the Mahara HEAD commit.
- On the master branch, it may also create po files for existing translations, when there have been changes to Mahara strings that don't need to be translated (e.g. typos).
- Pushes updated pot and po files to lp:~mahara-lang/mahara-lang/<branch>, where Launchpad will import them into its web-based translation interface.
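As a rough illustration of the conversion step, the sketch below turns one Mahara-style lang string (a key and its English text) into a gettext template entry. It is not taken from php-po.php: the function names po_escape and pot_entry, and the choice of carrying the string key in msgctxt, are assumptions made for this example, and the real script may lay out its entries differently. Multi-line strings are not handled.

```shell
#!/bin/sh
# Hypothetical helpers, not part of mahara-scripts: print one gettext
# template (.pot) entry for a single English lang string.
# ASSUMPTION: the lang string key is carried in msgctxt.

# Escape backslashes and double quotes so the text is a valid PO string.
po_escape() {
    printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

pot_entry() {
    key=$1
    english=$2
    printf 'msgctxt "%s"\n' "$(po_escape "$key")"
    printf 'msgid "%s"\n' "$(po_escape "$english")"
    # msgstr stays empty in a template; translators fill it in per language.
    printf 'msgstr ""\n'
}

# Example call with a made-up key and string.
pot_entry 'sitename' 'Site "name"'
```

The empty msgstr is what makes this a template entry: Launchpad shows the msgid to translators and stores their translation separately for each language.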
The package takes care of most of the necessary dependencies except for the maharabot user's ssh key, needed for the bzr push to Launchpad. That still needs to be installed manually on the server.
Launchpad's side of things
Because Launchpad requires .po or .mo files for its translation interface, and Mahara itself doesn't directly use either of those formats, we use a proxy project to translate Mahara. This project is called "mahara-lang" (aka Mahara Translations). It doesn't have releases like the normal Mahara project, but it does have a separate series for each Mahara series. We basically turn Mahara's PHP lang string files into a POT file which is the only content of this "project", and then let Launchpad's translation interface work with that.
Launchpad import
The Launchpad translation interface lets you configure a Bazaar import branch for each series. It expects the import branch to contain one or more English-language PO or POT files. We push to this branch from update-pot.sh, and Launchpad checks it periodically, notices our updates, and stores them on its servers, where it uses them to populate its web-based translation interface.
Human translators
Human translators go to the Launchpad web interface, and translate the strings for a particular Mahara series and language. Launchpad saves these changes internally.
Launchpad export
Launchpad also lets you configure an export branch for each series. Once a day, Launchpad automatically takes its internally stored translation data for all the languages on a series, converts it into a separate PO file for each language, and commits those files into the export branch.
Generation of language packs
- The langpacks.sh script runs once per hour as the maharabot user.
- Checks all the languages in the language-repos.txt file for newly translated strings.
  - The script is hard-coded to check for the latest version of this file in the mahara-scripts git repo, or to use a local version on the server with it. If a local version exists, it takes precedence.
- For each Mahara series, Launchpad will have an export branch in Bazaar, and each branch will contain a single .po file exported by Launchpad as described above.
- The last commit id for each language/branch is stored in the file /var/lib/mahara-langpacks/tarballs/mahara-langpacks.last (in the script's working directory). If you ever need to force regeneration of a particular language pack, you probably need to hack that file to remove the language and/or branch.
- The script po-php.pl converts the .po file into the directory tree of php and html files required by Mahara
- Tarballs of these directories are put into the document root of the http://langpacks.mahara.org site, and index.html, status.html files are generated
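Forcing a language pack to be regenerated means removing its record from the mahara-langpacks.last file. The layout of that file is not described on this page, so the sketch below works under a loudly stated assumption: one record per line, beginning with the language code, a space, the branch name, and a space. The function name forget_last_commit and the sample records are made up for the example. Check the real file before adapting it.

```shell
#!/bin/sh
# Hypothetical sketch: print the "last commit" file without the record for
# one language/branch, so that pack would be rebuilt on the next run.
# ASSUMPTION: one record per line, starting with "<lang> <branch> ".

forget_last_commit() {
    file=$1
    lang=$2
    branch=$3
    while IFS= read -r line; do
        case "$line" in
            "$lang $branch "*) ;;               # drop the matching record
            *) printf '%s\n' "$line" ;;         # keep everything else
        esac
    done < "$file"
}

# Demonstration on a throwaway file rather than the live one.
demo=$(mktemp)
printf '%s\n' 'de master abc123' 'fr master def456' 'fr 15.10_STABLE 789abc' > "$demo"
forget_last_commit "$demo" fr master
rm -f "$demo"
```

Because the function only prints the filtered records, the live file is never modified in place. Redirect the output to a temporary file, compare it with the original, and keep a backup before replacing anything.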
Manually update language packs
If you are in charge of Mahara translation management, you can manually update the language packs on the langpacks.mahara.org server. This is useful when the langpack scripts cannot be run on the server itself.
- You should have permission to access the langpacks.mahara.org server.
Here are the instructions:
- Update environment variables in /etc/mahara-langpacks.conf
- Run the script update-pot.sh
- Run the script langpacks.sh
- Copy the directory mahara-langpacks to the server
  - Use 'scp' to copy the langpacks directory from your local machine to a temporary directory on the server.
  - Use 'sudo -u maharabot cp -ar ...' to copy to the langpacks directory.
Manually create a tar ball of a language for testing
You can generate the language pack of a particular language to test. Here are the instructions
- Get the language scripts from mahara-scripts repository
    mkdir ~/code
    cd ~/code
    git clone https://git.mahara.org/scripts/mahara-scripts.git
- Get the po file of the language from Launchpad
- Run the script po-php.pl
    cd mahara-scripts/mahara-langpacks
    perl po-php.pl /path/to/po/files/<po file> /path/to/langpacks/<language code>.utf8 <language code>.utf8
- Build the tar ball
    cd /path/to/langpacks
    tar -czf <language code>-master.tar.gz <language code>.utf8
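Before installing a test tarball, it can be worth listing its contents to confirm that everything sits under the single <language code>.utf8 directory. The sketch below is an illustration only: the function name unexpected_entries and the throwaway fr.utf8 directory with its dummy file are made up for the example. It relies on tar -tzf, which lists the entries of a gzip-compressed archive.

```shell
#!/bin/sh
# Hypothetical check, not part of mahara-scripts: print every archive entry
# that is NOT under the expected top-level directory. No output means the
# layout looks right.
unexpected_entries() {
    tarball=$1
    topdir=$2
    tar -tzf "$tarball" | while IFS= read -r entry; do
        case "$entry" in
            "$topdir"|"$topdir"/*) ;;                      # expected location
            *) printf 'unexpected entry: %s\n' "$entry" ;;
        esac
    done
}

# Demonstration with a dummy pack built in a temporary directory.
work=$(mktemp -d)
mkdir -p "$work/fr.utf8"
echo '<?php' > "$work/fr.utf8/example.php"
(cd "$work" && tar -czf fr-master.tar.gz fr.utf8)
unexpected_entries "$work/fr-master.tar.gz" fr.utf8
rm -rf "$work"
```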
Installation of these scripts
The scripts were initially written to run on one server, but more recently the langpacks.mahara.org site has been moved to a cluster of two web servers, each of which has a running copy of the scripts. This poses some challenges:
- The script update-pot.sh needs to run on ONE server (currently, server 1 in the cluster).
- The script langpacks.sh should run on each server, but at different times in order to avoid overloading. The script will store its data in the "$DATA" directory (defined in the file /etc/mahara-langpacks.conf). This directory should not be shared between the two servers.
- The maharabot user must be created on both servers, and its SSH keys must be updated on Launchpad.net.
- The Bazaar client must be installed and configured on each server (apt-get install bzr).
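When setting up a new server, a quick check that the command-line tools mentioned on this page are on the PATH can save a failed cron run. The sketch below is only an illustration: the function name missing_tools is made up, and the tool list (bzr, git, php, perl, tar) is inferred from the commands and script types mentioned here, not from the package's actual dependency list.

```shell
#!/bin/sh
# Hypothetical helper, not part of mahara-scripts: print the name of each
# listed command that cannot be found on the PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# ASSUMPTION: this list is a guess based on the tools named on this page.
missing_tools bzr git php perl tar
```

An empty result means every listed tool was found. Run it as the maharabot user, since that is the account the cron jobs use.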
Git-based translation branches
Before we started using the Launchpad translation interface in 2010, we stored all the translations in PHP files in Git. The plan was to phase all translation branches over to Launchpad, but as of 2016 a couple of them still remain in Git, most prominently the Czech translation.
Fortunately, all the code for handling translations in Git is still present in the scripts mentioned above. The repo list file language-repos.txt indicates whether each language is stored in Launchpad, or the URL of the git repository it should come from.
- update-pot.sh doesn't actually do anything for Git-based translations right now. Prior to our Launchpad switchover, it used to generate the POT files and publish them to langpacks.mahara.org/pot/, where translators could download and use them in their translation tools. Now, if a translator wants to use the POT files directly, they need to fetch them from the Bazaar branch, like so: bzr branch lp:mahara-lang/16.04
- langpacks.pl knows whether each language should be handled by Launchpad or Git, as specified in language-repos.txt. In the repo, it looks for branches named after each supported Git series (master, 15.10_STABLE, 15.04_STABLE, etc).
  - Within each branch, it looks for a PO file (i.e. <lang>.po) and uses that the same as it would a PO file from Launchpad.
  - If it doesn't find a PO file, it looks for a lang/<lang>.utf8 directory, and tries to pull translation PHP strings from there. So this means that Git translations, unlike Launchpad, can use PHP files directly. The PHP files get packaged up into the langpack without a PO conversion step.
Note that the scripts (at present) only read from Git, not write to it. So if the repository where it's stored allows anonymous Git read access, everything should be good to go.
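The lookup order described in the list above (PO file first, PHP directory as the fallback) can be sketched as a small shell function. This is not the code used by the langpack scripts: the function name translation_source is made up, and the sketch assumes that <lang>.po sits at the top level of the checked-out branch, which this page does not state explicitly.

```shell
#!/bin/sh
# Hypothetical sketch of the lookup order for a Git-based translation.
# Prints "po" if <lang>.po exists, "php" if only lang/<lang>.utf8 exists,
# and "none" if neither is present.
# ASSUMPTION: the PO file is at the top level of the checkout.
translation_source() {
    checkout=$1
    lang=$2
    if [ -f "$checkout/$lang.po" ]; then
        echo po
    elif [ -d "$checkout/lang/$lang.utf8" ]; then
        echo php
    else
        echo none
    fi
}

# Demonstration with a throwaway checkout that only has PHP files.
demo=$(mktemp -d)
mkdir -p "$demo/lang/cs.utf8"
translation_source "$demo" cs
rm -rf "$demo"
```

Checking for the PO file first means a repository that contains both forms is treated the same way as a Launchpad export, and the PHP directory is only used when no PO file exists.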
Combining Launchpad-based and non-Launchpad translations
This section is theoretical.
Launchpad and offline translations
Some of our translators would prefer to use offline POT-based translation tools rather than Launchpad's translation interface (which is admittedly a little clunky). Here are some ideas about how we might allow them to do that and combine this with the translations in Launchpad.
Currently, the stuff we're importing into Launchpad is actually what Launchpad considers "templates" rather than translations. This means that it only reads in an English-language lang file, and uses that to create a list of strings for other languages to translate. See https://help.launchpad.net/Translations/YourProject/ImportingTemplates
Offline translations can also be imported, by the methods described on this page: https://help.launchpad.net/Translations/YourProject/ImportingTranslations
- Upload a tarball that contains the .po file for a language (but you can only do this for the trunk branch, so it's not very useful).
- Commit the language's po file into the relevant import branch in Bazaar, and then use the "One-off import" command under the branch's synchronization settings page.
- Or, if you're going to have a regular offline translator, set the branch to automatically import translations (although Launchpad warns this might overwrite translations created via Launchpad).
Launchpad and Git translations
If we had a situation where there were some contributors using Git for a language, and others using Launchpad, then we might be able to rig up something like this:
1. Have update-pot.sh pull from the Git repository and push into the Launchpad import branch
2. Set Launchpad to regularly import translations (and not just templates) from the import branch
3. Have langpacks.pl export the generated PHP files into the Git repository.
You'd need to give some careful thought about how to avoid a circular over-write sequence, though.