Developer Area/Cron API

From Mahara Wiki

Revision as of 11:20, 27 March 2014 by Aaronw (talk | contribs) (→‎Core cron tasks)

Mahara provides a Cron API to allow for scheduled tasks. It uses an internal table of schedules to determine how frequently the tasks run, and it uses a lock to prevent two instances of the same task from running at the same time.

Basically, we reimplemented crond. ;) The main reason is to make it easier for Mahara plugins to register new cron tasks that run at different intervals or times of day, without the system administrator having to manually configure a new cronjob for each one.

How it works

Here's the underlying architecture of the Mahara cron job.

htdocs/lib/cron.php, the One and Only cron script

All of Mahara's cron tasks are handled by one script, htdocs/lib/cron.php. The system administrator is expected to schedule this script to run once per minute, either from the command line or through an HTTP request.

It then checks a series of internal tables in Mahara to see which internal cron tasks need to be executed, and carries them out.

Note that the cron script sets the pagetop constant 'CRON', which allows for scripts to detect that they're being executed by the cron and to behave accordingly. For instance, some checks for a current logged-in user are ignored when CRON is defined.
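
As a rough illustration of that check, here is a minimal sketch. The function example_requires_login() is hypothetical and not part of Mahara, and CRON is defined by hand only so the snippet runs on its own. In a real installation the cron script defines the constant before the rest of Mahara is loaded.

```php
<?php
// Hypothetical helper (not a Mahara function): decides whether code should
// insist on a logged-in user. Under cron there is no browser session.
function example_requires_login() {
    return !defined('CRON');
}

// Defined here only to make the sketch self-contained; normally the cron
// script itself defines this constant.
define('CRON', 1);

var_dump(example_requires_login()); // bool(false)
```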

The cron tables

The individual cron tasks that Mahara should execute are stored in a series of tables in Mahara's database.

Cron tasks pertaining to Mahara core (as opposed to a plugin) are stored in the aptly named cron table. Its most important field is cron.callfunction, which holds the name of a PHP function that will be called to carry out that task. Most of the other fields store scheduling information about how often it should run.

Each Mahara plugin type has its own separate cron table: blocktype_cron, artefact_cron, etc. These tables are much the same as the core cron table, except they additionally indicate the name of the plugin the task belongs to. The callfunction in these tables should be a static method of the plugin's main class.

Each cron table also has a nextrun column, which stores a timestamp representing the next time the task should run. This is what the cron script actually uses to decide whether to execute a particular task, and it gets updated at the end of each successful task execution. If you want to force a task to run the next time cron.php runs, set its nextrun value to NULL (the database null, not the string "NULL") or to a timestamp in the past.
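
The selection rule described above could be sketched roughly as follows. This is not the code from cron.php: example_due_jobs() is a made-up name, the records are plain objects rather than database rows, nextrun is treated as a Unix timestamp, and the exact comparison used by the real script is an assumption.

```php
<?php
/**
 * Illustrative only: returns the callfunctions of the jobs that are due,
 * i.e. whose nextrun is unset (NULL) or not in the future.
 */
function example_due_jobs(array $jobs, $now) {
    $due = array();
    foreach ($jobs as $job) {
        if ($job->nextrun === null || $job->nextrun <= $now) {
            $due[] = $job->callfunction;
        }
    }
    return $due;
}

// Made-up sample records; real ones come from the cron tables.
$now = 1395919200;
$jobs = array(
    (object) array('callfunction' => 'task_forced',  'nextrun' => null),
    (object) array('callfunction' => 'task_overdue', 'nextrun' => $now - 60),
    (object) array('callfunction' => 'task_later',   'nextrun' => $now + 3600),
);

print_r(example_due_jobs($jobs, $now)); // task_forced and task_overdue
```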

Cron locks

In order to avoid concurrency problems, Mahara uses a system of cron locks to prevent multiple copies of the same cron task from running at the same time.

The system is quite simple. Before executing a task, Mahara looks for a lock record for that cron, in the database. Specifically, it checks for a config record called "_cron_lock_core_{$callfunction}" (for a core cron task) or "_cron_lock_{$plugintype}{$pluginname}_{$callfunction}" (for a plugin cron task). If it finds this, it knows that another copy of the cron task already claimed the lock and is executing, and so it skips that task and doesn't execute it. On the other hand, if it doesn't find a lock present, it sets the lock itself and begins executing the task. When it has finished the task, it deletes the lock record.

But what happens if the cron task crashes before it can delete the lock record? Well, the lock record is a config record with a particular name, but every config also has a value. In this case, the cron script sets the value to be the time the lock was claimed. Using that, we can tell how long a particular task has been running. Each time the cron job finds a lock already present, it checks the timestamp stored in its value, and if it's more than 24 hours old it assumes the lock belonged to a task that crashed, so it clears it and begins executing the task again.
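
Here is a minimal sketch of that claim, skip, and expire logic. It assumes an in-memory array in place of Mahara's config storage, and the function names are invented for the example. It also ignores the question of making the check-and-set atomic, which a real database-backed lock has to deal with.

```php
<?php
// Stand-in for the config storage; the real lock record lives in the database.
$example_config = array();

/**
 * Tries to claim the lock for a task. Returns true if the caller may run it.
 * A lock older than $maxage seconds is treated as left over from a crash.
 */
function example_claim_lock(array &$config, $lockname, $now, $maxage = 86400) {
    if (isset($config[$lockname])) {
        if ($now - $config[$lockname] <= $maxage) {
            return false; // another copy still holds the lock, so skip the task
        }
        unset($config[$lockname]); // stale lock: clear it and carry on
    }
    $config[$lockname] = $now; // the value records when the lock was claimed
    return true;
}

function example_release_lock(array &$config, $lockname) {
    unset($config[$lockname]);
}

$lock = '_cron_lock_core_example_task';
$t0 = 1395919200;
$first  = example_claim_lock($example_config, $lock, $t0);         // lock is free
$second = example_claim_lock($example_config, $lock, $t0 + 60);    // still held
$stale  = example_claim_lock($example_config, $lock, $t0 + 86401); // over 24 hours old
example_release_lock($example_config, $lock);

var_dump($first, $second, $stale); // bool(true) bool(false) bool(true)
```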

How to set up new cron tasks

Plugin cron tasks

It's quite easy to schedule cron tasks for a plugin. In the plugin's lib.php file, you simply add a public static function get_cron() to the plugin's "Plugin" subclass. For instance, for the cron task that updates the RSS feeds in external feed blocks, we added a get_cron() method to the class PluginBlocktypeExternalfeed, in the file htdocs/blocktype/externalfeed/lib.php.

The get_cron() method should take no arguments, and should return an array with one stdClass object for each cron task. Each of these objects should have a callfunction field, as well as any scheduling fields needed (minute, hour, day, month, dayofweek). Any scheduling fields left out will default to '*'. The "callfunction" should be the name of a public static method of the plugin's main class, which can be executed with no required parameters.

Example:

// in htdocs/blocktype/externalfeed/lib.php
class PluginBlocktypeExternalfeed {

   // ... lots of other methods also in this class

   /**
    * get_cron() tells Mahara which tasks to schedule
    */
   public static function get_cron() {
       $refresh = new stdClass();
       $refresh->callfunction = 'refresh_feeds';
       $refresh->hour = '*';
       $refresh->minute = '0';

       $cleanup = new stdClass();
       $cleanup->callfunction = 'cleanup_feeds';
       $cleanup->hour = '3';
       $cleanup->minute = '30';

       return array($refresh, $cleanup);
   }

   /**
    * Gets invoked by the cron script at :00 every hour.
    */
   public static function refresh_feeds() {
       // do stuff
   }

   /**
    * Gets invoked by the cron script at 3:30 am each day
    */
   public static function cleanup_feeds() {
       // do stuff
   }
}
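
To illustrate the rule that omitted scheduling fields default to '*', here is a small sketch of how a task object returned by get_cron() might be filled in. The helper example_fill_schedule_defaults() is hypothetical; how Mahara actually applies the defaults when it installs the task is not shown here.

```php
<?php
/**
 * Illustrative only: returns a copy of the task with '*' filled in for
 * every scheduling field that was left unset.
 */
function example_fill_schedule_defaults(stdClass $task) {
    $filled = clone $task;
    foreach (array('minute', 'hour', 'day', 'month', 'dayofweek') as $field) {
        if (!isset($filled->$field)) {
            $filled->$field = '*';
        }
    }
    return $filled;
}

$cleanup = new stdClass();
$cleanup->callfunction = 'cleanup_feeds';
$cleanup->hour = '3';
$cleanup->minute = '30';

$filled = example_fill_schedule_defaults($cleanup);
echo "{$filled->minute} {$filled->hour} {$filled->day} {$filled->month} {$filled->dayofweek}\n";
// prints "30 3 * * *"
```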

And lastly, in order to make sure that existing installations of the plugin get upgraded to include the cron task, increment the plugin's version number in its version.php file.

Core cron tasks

The system for adding core cron tasks is not as graceful.

First, you write your callfunction. This should be a function in the global scope of a file included by the cron script. Most of the existing ones are in mahara.php, although it may be appropriate to place them in a different file.

Then, to schedule your cron task for new installations, you add a record to the $cronjobs array in the method core_install_firstcoredata_defaults() in htdocs/lib/upgrade.php. A few cron tasks schedule their times using rand(). These are cron tasks that "phone home" back to mahara.org to check for updates, etc. They're scheduled randomly in order to distribute the load on our servers.

Example:

// in lib/upgrade.php
function core_install_firstcoredata_defaults() {
    // ...lots of other code too

    // install the cronjobs...
    $cronjobs = array(
        'rebuild_artefact_parent_cache_dirty'       => array('*', '*', '*', '*', '*'),
        'rebuild_artefact_parent_cache_complete'    => array('0', '4', '*', '*', '*'),
        'activity_process_queue'                    => array('*/5', '*', '*', '*', '*'),
        'cron_send_registration_data'               => array(rand(0, 59), rand(0, 23), '*', '*', rand(0, 6)),
        'export_cleanup_old_exports'                => array('0', '3,15', '*', '*', '*'),
        // etc...
        'cron_institution_data_weekly'              => array('55', '23', '*', '*', '6'),
    );

    // ... and more stuff after
}

Next, to schedule your task for sites that are being upgraded, you add an insert_record() call in the main htdocs/lib/db/upgrade.php file, to insert the proper record directly into the "cron" table.

// in lib/db/upgrade.php:
if ($oldversion < 2012062902) {

    // Insert cron job to save institution data
    $cron = new stdClass();
    $cron->callfunction = 'cron_institution_data_weekly';
    $cron->minute       = 55;
    $cron->hour         = 23;
    $cron->day          = '*';
    $cron->month        = '*';
    $cron->dayofweek    = 6;
    insert_record('cron', $cron);
}

And of course, you should increment the version number in htdocs/lib/version.php.

What goes in the callfunction

Each Mahara task has exactly one callfunction that gets invoked directly by the cron script. If you're writing a core cron task, it needs to be a function in the global scope of a file loaded by cron.php. (Most Mahara core callfunctions are defined in htdocs/lib/mahara.php, but it may be appropriate to place one elsewhere.) If you're writing a plugin cron task, it needs to be a public static method of the plugin's "Plugin" subclass.

Each callfunction is invoked by cron.php with no arguments, and its return value is ignored. The only thing the cron task must do is not die or error out. If it doesn't execute to completion, its cron lock won't be cleared, and the cron script won't run it again for 24 hours. So, it's best to make a callfunction robust against failure, and use log_warn() rather than log_error() when there are problems.
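
One way to follow that advice is to catch failures per item and log them as warnings, so that the task always runs to completion. The sketch below makes several assumptions: example_cron_task() and example_process_item() are hypothetical, the log_warn() stub exists only so the snippet runs outside Mahara, and the function returns a value purely so the result can be inspected (cron ignores it).

```php
<?php
// Stub so the sketch runs outside Mahara; inside Mahara, log_warn() already exists.
if (!function_exists('log_warn')) {
    function log_warn($message) {
        error_log('[WRN] ' . $message);
    }
}

// Hypothetical unit of work that fails for one particular item.
function example_process_item($item) {
    if ($item === 'bad') {
        throw new Exception("could not process '$item'");
    }
    return strtoupper($item);
}

/**
 * Hypothetical core callfunction: takes no arguments and never lets one
 * failing item abort the whole task.
 */
function example_cron_task() {
    $done = array();
    foreach (array('one', 'bad', 'two') as $item) {
        try {
            $done[] = example_process_item($item);
        }
        catch (Exception $e) {
            // Warn and keep going, so the task finishes and its lock is released.
            log_warn('example_cron_task: ' . $e->getMessage());
        }
    }
    return $done; // returned only for demonstration
}

print_r(example_cron_task());
```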

Output from a cron task should be sent using the log methods, log_info(), log_debug(), and log_warn(), rather than using normal PHP output methods like echo. When you use the log methods, Mahara will insert a hash number before each line, making it easier to interpret the output from multiple runs of the cron, in the cron log (if the system administrator has set up the script's output to be stored in a log).

Scheduling parameters

The Mahara cron scheduling parameters are inspired by the standard Unix crontab format, but limited to:

  • Integers
  • Asterisk "*" which matches any value
  • Hyphen "-" which defines ranges
  • Slash "/" which defines a step within a range (for example, "*/5")
  • Comma "," which lists multiple values

The scheduling fields you can place these in are:

  • minute: 0-59
  • hour: 0-23
  • day: Day of the month, 1-31 (depending on the month)
  • month: 1-12
  • dayofweek: 0-6 (0 is Sunday)

Look up one of the many great crontab format tutorials on the Internet for a thorough explanation of how to use these.
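
For a feel of how the syntax elements listed above combine, here is a minimal sketch of matching one scheduling field against a value. It is not Mahara's parser: example_field_matches() is an invented name, the $min and $max arguments are an assumption about how "*" would be expanded, and the function does no validation of malformed fields.

```php
<?php
/**
 * Illustrative only: does $value satisfy one scheduling field?
 * Handles integers, "*", ranges "a-b", steps "/n", and comma lists.
 * $min and $max give the field's full range, used to expand "*".
 */
function example_field_matches($field, $value, $min, $max) {
    foreach (explode(',', (string) $field) as $part) {
        $step = 1;
        if (strpos($part, '/') !== false) {
            list($part, $step) = explode('/', $part, 2);
            $step = max(1, (int) $step);
        }
        if ($part === '*') {
            $from = $min;
            $to = $max;
        }
        else if (strpos($part, '-') !== false) {
            list($from, $to) = array_map('intval', explode('-', $part, 2));
        }
        else {
            $from = $to = (int) $part;
        }
        if ($value >= $from && $value <= $to && ($value - $from) % $step === 0) {
            return true;
        }
    }
    return false;
}

var_dump(example_field_matches('*/5', 10, 0, 59));  // bool(true): every fifth minute
var_dump(example_field_matches('3,15', 15, 0, 23)); // bool(true): 3 am or 3 pm
var_dump(example_field_matches('1-5', 6, 0, 6));    // bool(false): Saturday is outside Mon-Fri
```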