Developer Area/File uploads API

From Mahara Wiki
Revision as of 18:50, 25 August 2016 by Aaronw (talk | contribs) (upload_manager)

This page discusses how to use Mahara's APIs for uploading files, storing them on the server, and retrieving them for later use.

Basic principles

Dataroot, not webroot

The most basic thing to understand about Mahara is that you should never store uploaded files in the Mahara code directory itself. This is for two main reasons. First, the uploaded files may get clobbered during a later upgrade. But more importantly, it's insecure because it is prone to creating a remote code execution vulnerability. For instance, if you stored file uploads into htdocs/artefact/file/uploads, then an attacker might upload a malicious PHP file, calculate that it will be stored to htdocs/artefact/file/uploads/myscript.php, and access it in their browser by a URL such as http://www.example.com/mahara/artefact/file/uploads/myscript.php.

To avoid this whole category of vulnerabilities, Mahara instead stores files under a separate dataroot directory, which should not be directly accessible by URL. The path of the dataroot directory is specified in the config.php file, as $cfg->dataroot.

To avoid interfering with other parts of Mahara that are storing files in dataroot, you should either store your files through an existing Mahara file storage API (which will determine the storage location for you), or create your own new directory under dataroot just for whatever new class of files you're creating. If you're only uploading files temporarily, though (a CSV file, for example), you probably don't need to store anything in dataroot, and can process the PHP upload temp file directly.
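
As an illustration of keeping your own files inside dataroot, here is a minimal sketch in plain PHP. It is not Mahara code: the helper name sketch_path_under_dataroot and the example paths are hypothetical. It builds a storage path from a relative name and refuses anything containing a ".." segment, so a caller can't be tricked into writing back into the code directory.

```php
<?php
// Hypothetical helper, not part of Mahara.
// Joins $relative onto $dataroot, or returns null if the relative path
// is empty or tries to climb out of dataroot with "..".
function sketch_path_under_dataroot(string $dataroot, string $relative): ?string {
    $parts = [];
    foreach (explode('/', str_replace('\\', '/', $relative)) as $segment) {
        if ($segment === '' || $segment === '.') {
            continue; // ignore empty and "current directory" segments
        }
        if ($segment === '..') {
            return null; // reject traversal outright
        }
        $parts[] = $segment;
    }
    if (count($parts) === 0) {
        return null;
    }
    return rtrim($dataroot, '/') . '/' . implode('/', $parts);
}
```

In a real plugin, $dataroot would come from the configured dataroot setting, and the relative part would start with a directory reserved for your plugin.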

Validation

Uploaded files must be validated in a few different ways:

  1. ClamAV virus scanning (if enabled)
  2. Upload limits on filesize (which can be specified in Mahara, or via php.ini)
  3. File storage quota for the user, group, or institution that owns the file.
  4. For files that will be served directly via the web server (such as images, video, audio, PDFs, fonts, etc) it's also important for security reasons to validate that the file actually contains the type of content it claims to.

The easiest way to handle this validation properly is to use one of Mahara's existing file management APIs.
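
To make two of these checks concrete, here is a minimal sketch in plain PHP of an upload-limit check and a content check. It is for illustration only and is not how Mahara implements them: both function names are hypothetical, the size limit is a parameter you supply, and the signature list covers only a handful of formats.

```php
<?php
// Hypothetical helpers, not part of Mahara.

// Check one $_FILES-style entry for PHP upload errors and a size limit.
// Returns an error message, or null if the entry passes.
function sketch_check_upload_entry(array $entry, int $maxbytes): ?string {
    $code = $entry['error'] ?? UPLOAD_ERR_NO_FILE;
    if ($code === UPLOAD_ERR_INI_SIZE || $code === UPLOAD_ERR_FORM_SIZE) {
        return 'The file exceeds the upload size limit';
    }
    if ($code !== UPLOAD_ERR_OK) {
        return 'The upload failed with PHP error code ' . $code;
    }
    if (($entry['size'] ?? 0) > $maxbytes) {
        return 'The file is larger than ' . $maxbytes . ' bytes';
    }
    return null;
}

// Guess a content type from the first bytes of a file, ignoring whatever
// name or type the browser claimed. Returns null for unknown content.
function sketch_sniff_type(string $head): ?string {
    $signatures = [
        'image/png'       => "\x89PNG\r\n\x1a\n",
        'image/jpeg'      => "\xFF\xD8\xFF",
        'image/gif'       => "GIF8",
        'application/pdf' => "%PDF-",
    ];
    foreach ($signatures as $type => $magic) {
        if (strncmp($head, $magic, strlen($magic)) === 0) {
            return $type;
        }
    }
    return null;
}
```

A caller would compare the sniffed type with the type the file claims to be, and reject the upload when they disagree.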

File upload APIs

Pieforms

There are three Pieforms elements for handling file uploads.

Pieform "filebrowser" element

The best option, where applicable, is to use the Pieform "filebrowser" element. This allows the user to upload a file, or to select a file they've already uploaded into their Content -> Files area. On the downside, it's rather complex, so it can be tricky to use, and it relies on a lot of JavaScript, which can cause subtle bugs if you load it dynamically (say, in a modal window). But there are plenty of examples of its usage in the Mahara codebase to choose from. See htdocs/artefact/blog/post.php for one of the simpler examples, where it's used for journal entry attachments.

In general usage, what you do is add a "filebrowser" element to your pieform. Then, in your pieform submit method, you simply check for the value of the filebrowser element, and it will give you the ID of the selected (or uploaded) file artefact. You don't need to write any code to handle saving the file; this is all handled automagically by the file browser when the form is submitted.

It's recommended to use this element whenever you want users to upload a file that will belong to that specific user and be accessible to them as an artefact (or as a group or institution file artefact). This element is not usable by logged-out users (who don't have their own artefacts), and it's generally not applicable for file uploads that will only be processed temporarily, like Leap2a imports or CSV files uploaded by admins. It also can't be used to process files that have been uploaded outside of the web browser, such as files coming from web services or cURL requests.

Pieform "files" element

A simpler option than the filebrowser, the "files" Pieform element creates a dynamically-expandable list of file upload buttons. It is gradually being replaced with the filebrowser in most places, but is still in use in some areas of core, such as comment attachments and resume attachments. For an example of its usage look at the comment artefact's form validation and submission methods add_feedback_form_validate and add_feedback_form_submit in htdocs/artefact/comment/lib.php.

This element will return an array, where each element is a key to an element in the PHP $_FILES superglobal, representing one of the uploaded files. It does none of the necessary Mahara validation or file storage, so you'll need to use one of the APIs below for that.
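
A minimal sketch of walking such an array in plain PHP follows. The function name and the sample key names are hypothetical. $keys stands for the value returned by the element, and $files stands for $_FILES, passed in as a parameter so the sketch can run outside a web request. It only collects the entries that PHP reports as successfully uploaded; Mahara validation and storage still have to happen afterwards.

```php
<?php
// Hypothetical helper, not part of Mahara.
// Returns the successfully uploaded entries, indexed by their $_FILES key.
function sketch_list_uploads(array $keys, array $files): array {
    $uploads = [];
    foreach ($keys as $key) {
        if (!isset($files[$key])) {
            continue; // the key does not point at an uploaded file
        }
        $entry = $files[$key];
        if ($entry['error'] !== UPLOAD_ERR_OK) {
            continue; // empty slot or failed upload
        }
        $uploads[$key] = [
            // basename() drops any directory part the browser may have sent
            'name'     => basename($entry['name']),
            'tmp_name' => $entry['tmp_name'],
        ];
    }
    return $uploads;
}
```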

Pieform "file" element

There is also a very simple Pieform element called "file", which only creates an <input type="file"> and does no file handling at all. See htdocs/admin/groups/uploadcsv.php for an example of its use.

This element simply returns the value of the entry in the PHP superglobal $_FILES representing the uploaded file. It does no Mahara validation or file storage, so if you use this you'll need to manually handle file validation and processing, using one of the APIs below.
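
For a temporary upload such as a CSV file, processing can happen straight from the PHP temp file. The following is a minimal sketch in plain PHP, assuming the first line of the file is a header row; the function name is hypothetical and it is not taken from the Mahara codebase.

```php
<?php
// Hypothetical helper, not part of Mahara.
// Reads a CSV file into an array of rows keyed by the header line.
function sketch_read_csv(string $path): array {
    $handle = fopen($path, 'r');
    if ($handle === false) {
        throw new RuntimeException('Could not open ' . $path);
    }
    $rows = [];
    $header = null;
    // Length 0 means "no line length limit"; separator, enclosure and
    // escape characters are passed explicitly.
    while (($fields = fgetcsv($handle, 0, ',', '"', '\\')) !== false) {
        if ($fields === [null]) {
            continue; // blank line
        }
        if ($header === null) {
            $header = $fields; // first non-blank line is the header
            continue;
        }
        if (count($fields) === count($header)) {
            $rows[] = array_combine($header, $fields);
        }
    }
    fclose($handle);
    return $rows;
}
```

In a form submit handler, $path would be the tmp_name of the $_FILES entry, read only after the file has passed validation.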

ArtefactTypeFile::save_uploaded_file

Whenever a user uploads a file that goes into their "Content -> Files" storage area, that's a file artefact. If you use the filebrowser Pieform element, a File artefact will have been created for you and you won't have to worry about this. But if you're using a different file upload method, you can use this static method to save the uploaded file into a File artefact.

Unfortunately, this method relies on a poorly-documented $data parameter. The $data object fills in many fields of metadata for the file artefact (it's ultimately passed to the constructor for the artefact's object). The best example of its usage is probably in the Pieform filebrowser element, in the pieform_element_filebrowser_upload function in htdocs/lib/form/elements/filebrowser.php.

Example code:

$data = new stdClass();

// The folder to put the file in. This refers to a "folder" artefact
// in the "Content->Files" area, not to a physical directory on the
// disk. This value should either be the ID of the folder artefact,
// or NULL to store the file in the home directory of the user/group
$data->parent = $parentfolderid;

// Who owns the folder. The ID of a user OR a group, or the name
// of an institution. (Or 'mahara' for site files.)
$data->owner = $ownerid;
$data->group = $groupid;
$data->institution = $institutionname;

// The title for the new file artefact. To avoid name conflicts, it's
// best to use ArtefactTypeFileBase::get_new_file_title to make sure
// you've got a unique name.
$originalname = $_FILES[$inputname]['name'];
$originalname = $originalname ? basename($originalname) : get_string('file', 'artefact.file');
$data->title = ArtefactTypeFileBase::get_new_file_title(
    $originalname,
    $parentfolderid,
    $data->owner,
    $data->group,
    $data->institution
);

try {
    $artefactid = ArtefactTypeFile::save_uploaded_file(
        $inputname, // The name of the <input type="file"> element the file came from
        $data,
        $inputindex, // (optional) If you're using an array of file elements, the index of the file to process
        $resized // (optional) If you've processed the file on the server-side, set this to TRUE to tell Mahara to re-check the file's size
    );
    
    // $artefactid has the ID of the new artefact
}
catch (QuotaExceededException $e) {
    // The file was too big for the user/group/institution's file storage quota.
}
catch (UploadException $e) {
    // There was some other problem uploading the file.
}

upload_manager

If you need to save a file that is not a file artefact, the upload_manager class is probably the best option. This is the class that ArtefactTypeFile uses to move the physical files around, and it's the class used for most of the admin pages that need to store non-artefact files (such as skin fonts).

The preprocess_file method of this class will run the file through ClamAV and validate it for upload limits. (It does not check any user/group/institution file storage quotas.)

This class assumes that you are processing files that have been uploaded by the user and are present in the PHP $_FILES superglobal, and it's built to fit into the lifecycle of Pieforms validation and submission (typically along with a Pieform "file" or "files" element). If you're processing files that have come from somewhere else, like web services or a zip archive, you'll need to go even more basic.

For a simple example of it in action, look at the code for uploading skin fonts in htdocs/admin/site/font/add.php.

Example code:

function myform_validate(Pieform $form, $values) {
    $um = new upload_manager(
        $inputname, // Name of the <input type="file"> form element
        $handlecollisions, // (Optional, default FALSE) Rename upload instead of replacing existing file with same name
                           // Use with care, because the upload_manager won't tell you if it changed the upload's name!
        $inputindex, // (Optional, default NULL) If you've used an array of file inputs, which index to handle
        $optional // (Optional, default TRUE) Set to FALSE if the file upload is NOT a required field.
    );

    $error = $um->preprocess_file();
    if ($error) {
        $form->set_error($inputname, $error);
    }
}

function myform_submit(Pieform $form, $values) {
    $um = new upload_manager(
        $inputname,
        $handlecollisions,
        $inputindex,
        $optional
    );

    $error = $um->save_file(
        $directory, // The directory under $cfg->dataroot to store the file in. Will be created if necessary.
        $filename // The name to save the file as (if you use $handlecollisions this may be changed without notice!)
    );

    if ($error) {
        // Handle the error during upload.
    }
}
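
Because upload_manager may rename the upload silently when $handlecollisions is set, one alternative is to choose a free name yourself and pass that as the filename, so you always know what the file ended up being called. The following is a minimal sketch of that idea in plain PHP; the helper name and the "_1", "_2" suffix scheme are hypothetical and are not Mahara's behaviour.

```php
<?php
// Hypothetical helper, not part of Mahara.
// Returns $filename if it is free in $directory, otherwise the first
// free variant of the form name_1.ext, name_2.ext, and so on.
function sketch_unique_filename(string $directory, string $filename): string {
    $filename = basename($filename); // never trust a path in the name
    $info = pathinfo($filename);
    $base = $info['filename'];
    $ext = isset($info['extension']) ? '.' . $info['extension'] : '';

    $candidate = $filename;
    $counter = 1;
    while (file_exists(rtrim($directory, '/') . '/' . $candidate)) {
        $candidate = $base . '_' . $counter . $ext;
        $counter++;
    }
    return $candidate;
}
```

Note that checking and then saving is not atomic: two simultaneous uploads could still pick the same name, so this only reduces collisions rather than ruling them out.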

Handling files not uploaded through the browser

All of the above file upload APIs rely on the PHP $_FILES superglobal, which is how PHP gives you access to files that were uploaded through the HTTP request that triggered the current script's execution.

Sometimes, though, you need to store files from other sources. For instance, if you've expanded a ZIP archive, or if you've used cURL to fetch a remote file from another server, then these files will not be present in the $_FILES superglobal.

Mahara does not have very many library functions to handle this scenario. The best example of how this is handled in Mahara is the file import plugin, htdocs/import/file/lib.php. If you need a temporary staging directory, for instance to download a file into before processing, you can either create your own temp directory somewhere under dataroot (which is what the PluginImportFile and PluginArtefactFile plugins do) or use $CFG->unziptempdir, if it has been defined. You will need to do your own cleanup of the temp directory afterwards (perhaps via a cron job, to guard against your script crashing) and your own checking for name collisions (or use a function such as get_random_key to ensure a unique directory or filename).

Past versions of Mahara included $CFG->pathtozip and $CFG->pathtounzip configuration parameters, for using the command-line zip utilities for zipping and unzipping. However, these have been removed from Mahara core, and the preferred method is now to use the PHP ZipArchive class.

Example code:

// Get temp directory
// Build the full path of the temp directory under dataroot
$tempdir = get_config('dataroot') . 'myplugin/tmp.' . get_random_key();
$status = check_dir_exists($tempdir, true, true);
if (!$status) {
    // Couldn't create the temp directory
}

// ... get my file from somewhere. Maybe Curl? Maybe a ZIP?

// Validate the file
// $origfilepath = the absolute path of the file, on the physical filesystem
$error = mahara_clam_scan_file($origfilepath);
if ($error) {
    // ClamAV thinks this file has a virus!
    // handle the error, delete the file, and die.
    die();
}

// Save the file somewhere in the dataroot, and set its permissions to be correct.
rename($origfilepath, $finalfilepath);
chmod($finalfilepath, get_config('filepermissions'));

// Clean up after yourself! Delete your temp directory and its files.
// rmdirr (in htdocs/lib/file.php) will recursively delete a directory and its contents.
rmdirr($tempdir);
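
For the ZIP case, here is a minimal sketch of unpacking an archive into a staging directory with PHP's ZipArchive class. The helper name is hypothetical and the code is not taken from Mahara. As a defensive measure it skips entries with absolute paths or ".." segments, on the assumption that archive contents are untrusted; each extracted file would still need to be virus-scanned as shown above.

```php
<?php
// Hypothetical helper, not part of Mahara. Requires PHP's zip extension.
// Extracts the safe-looking entries of $zipfile into $destdir and
// returns the list of entry names that were extracted.
function sketch_extract_zip(string $zipfile, string $destdir): array {
    $zip = new ZipArchive();
    if ($zip->open($zipfile) !== true) {
        throw new RuntimeException('Could not open ' . $zipfile);
    }

    $safe = [];
    for ($i = 0; $i < $zip->numFiles; $i++) {
        $name = $zip->getNameIndex($i);
        if ($name === false) {
            continue;
        }
        $segments = explode('/', str_replace('\\', '/', $name));
        // Skip absolute paths and anything that climbs upwards.
        if ($segments[0] === '' || in_array('..', $segments, true)) {
            continue;
        }
        $safe[] = $name;
    }

    $ok = empty($safe) || $zip->extractTo($destdir, $safe);
    $zip->close();
    if (!$ok) {
        throw new RuntimeException('Could not extract ' . $zipfile);
    }
    return $safe;
}
```

The destination would normally be a freshly created temp directory like the one in the example above, removed again once processing has finished.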

Creating an artefact from a non-browser-uploaded file

If you need to save these files into artefacts, you can do that with the ArtefactTypeFile::save_file method, which is similar to ArtefactTypeFile::save_uploaded_file but uses the file's pathname instead of the $_FILES superglobal. This method is used by the "file" import plugin (which handles Leap2a imports). It checks the file owner's quota, but does not run a ClamAV scan, so you'll still need to do that manually.

// Get temp directory
// Build the full path of the temp directory under dataroot
$tempdir = get_config('dataroot') . 'myplugin/tmp.' . get_random_key();
$status = check_dir_exists($tempdir, true, true);
if (!$status) {
    // Couldn't create the temp directory
}

// ... get my file from somewhere. Maybe Curl? Maybe a ZIP?

// Validate the file
// $pathtofile = the absolute path of the file, on the physical filesystem
$error = mahara_clam_scan_file($pathtofile);
if ($error) {
    // ClamAV thinks this file has a virus!
    // handle the error, delete the file, and die.
    die();
}

// Don't need to worry about the file's final location or permissions. File artefact will take care of that.
// rename($origfilepath, $finalfilepath);
// chmod($finalfilepath, get_config('filepermissions'));

// Save the file as an artefact.
$data = new stdClass();
$data->title = basename($origfilepath); // Title of the file artefact
$data->parent = $parentid; // ID of the folder artefact to place it in, or null for home
$data->owner = $ownerid; // ID of user who owns it
$data->group = $groupid; // *OR* ID of group that owns it
$data->institution = $instname; // *OR* name of institution that owns it

$artefactid = false; // Stays false if save_file throws before returning an ID
try {
    $artefactid = ArtefactTypeFile::save_file(
        $origfilepath,
        $data,
        $user, // (Deprecated) the User object of the file's owner (if owned by a user)
        $outsidedataroot // (Optional, default false) If true, $origfilepath is an absolute path to the file. If false, $origfilepath is assumed to be relative to dataroot.
    );
}
catch (QuotaExceededException $e) {
    // quota exceeded
}
catch (Exception $e) {
    // Some other problem with the upload
}

if ($artefactid === false) {
    // We errored out while trying to create the file artefact.
}

// Clean up after yourself! Delete your temp directory and its files.
// rmdirr (in htdocs/lib/file.php) will recursively delete a directory and its contents.
rmdirr($tempdir);