Developer Area/Specifications in Development/Sitemaps

Note: This feature has been developed, so this page should be moved from "Specifications in Development" to just plain documentation. This feature can be enabled from the "General" section of the admin settings page. It generates a machine-readable XML sitemap of the publicly accessible content in your Mahara site. The URL for the sitemap is YOURSITE/download.php?type=sitemap

This feature was funded by the New Zealand Ministry of Education and implemented by Catalyst IT for Mahara 1.5.

This page describes what Mahara needs in order to let users expose their content (views and artefacts) to search engines such as Google and Digital NZ.

Indexable content

Mahara will generate sitemaps to make it easier for search engines to index:

  • public views (user and group ones)
  • forum posts in public groups
  • site pages (not views)

Sitemaps will be generated once a day

Full sitemaps of indexable content will be generated once a day on cron. They will live in the dataroot directory.

These sitemaps will be in the standard format and we will group them in a sitemap index. The sitemaps will be gzipped and there will be one sitemap per day unless the uncompressed sitemap would be larger than 10 MB, in which case it will be broken up into multiple files.
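
The splitting and indexing described above can be sketched as follows. This is a minimal Python illustration rather than Mahara's actual implementation (which is written in PHP); the function names are hypothetical, and only the 10 MB limit, the gzip step and the sitemap index come from this specification.

```python
import gzip
from xml.sax.saxutils import escape

MAX_UNCOMPRESSED_BYTES = 10 * 1024 * 1024  # 10 MB limit from the text above

HEADER = ('<?xml version="1.0" encoding="UTF-8"?>\n'
          '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n')
FOOTER = '</urlset>\n'


def url_entry(loc, lastmod):
    """Render one <url> element; the location is XML-escaped."""
    return ('  <url>\n'
            f'    <loc>{escape(loc)}</loc>\n'
            f'    <lastmod>{lastmod}</lastmod>\n'
            '  </url>\n')


def split_into_sitemaps(entries, max_bytes=MAX_UNCOMPRESSED_BYTES):
    """Group (loc, lastmod) pairs into sitemap documents of at most max_bytes."""
    overhead = len((HEADER + FOOTER).encode('utf-8'))
    documents, current, size = [], [], overhead
    for loc, lastmod in entries:
        chunk = url_entry(loc, lastmod)
        chunk_size = len(chunk.encode('utf-8'))
        # Start a new file when adding this entry would exceed the limit.
        if current and size + chunk_size > max_bytes:
            documents.append(HEADER + ''.join(current) + FOOTER)
            current, size = [], overhead
        current.append(chunk)
        size += chunk_size
    if current:
        documents.append(HEADER + ''.join(current) + FOOTER)
    return documents


def build_index(sitemap_urls, lastmod):
    """Render a sitemap index that points at each (gzipped) sitemap file."""
    body = ''.join(
        '  <sitemap>\n'
        f'    <loc>{escape(url)}</loc>\n'
        f'    <lastmod>{lastmod}</lastmod>\n'
        '  </sitemap>\n'
        for url in sitemap_urls)
    return ('<?xml version="1.0" encoding="UTF-8"?>\n'
            '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
            + body + '</sitemapindex>\n')


def compress(document):
    """Gzip one sitemap document for storage in the dataroot."""
    return gzip.compress(document.encode('utf-8'))
```

The size check is done on the uncompressed UTF-8 bytes, since the limit in this specification applies before gzipping.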

The URLs that the sitemap will use are the view URLs or the artefact landing pages.

Each sitemap will contain the new content that was made indexable since the last sitemap. In other words, content is selected by the time it was shared with the public (see "Keeping track of when views are made public") or, for forum posts in public groups, by the last create/modify time.

Once a month, a new "comprehensive" sitemap will be created. It will include all indexable content on the site. Once that sitemap has been created, all older sitemaps (which are included in this one) will be deleted from the dataroot.
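
One way to picture the monthly rollover is the sketch below. It assumes a hypothetical file naming scheme (`sitemap_*.xml.gz`) and a dedicated directory inside the dataroot; neither is defined by this specification. The comprehensive file is written first, so older files are only deleted once their replacement exists.

```python
import gzip
from pathlib import Path


def write_comprehensive_sitemap(sitemap_dir, document, stamp):
    """Write the full sitemap, then delete the older files it supersedes.

    The sitemap_*.xml.gz naming pattern is a hypothetical convention.
    """
    sitemap_dir = Path(sitemap_dir)
    target = sitemap_dir / f'sitemap_full_{stamp}.xml.gz'
    target.write_bytes(gzip.compress(document.encode('utf-8')))

    removed = []
    for old in sorted(sitemap_dir.glob('sitemap_*.xml.gz')):
        if old != target:  # keep the file we just wrote
            old.unlink()
            removed.append(old.name)
    return target, removed
```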

New settings for enabling/disabling public views

Site administrators can already enable/disable the ability to make views public at the site level. A similar setting will be available at the institution level.

When public views are globally disabled, institutional admins will see a grayed out checkbox and will not be able to enable them. On the other hand, when public views have been enabled for the whole site, institutional admins will be able to turn them off for their institution.

Public views will be enabled by default at the site and institution level.

Disabling public views at the institution level will not impact the ability of users to make group views and artefacts public. If an institution wants to disable that, they will need to disable the ability for their users to create groups.

Also, if a user is a member of more than one institution, they will be able to make views and artefacts public as long as at least one of their institutions has public views enabled.

Keeping track of when views are made public

Mahara does not currently keep track of when a view was made public.

We will add an extra "ctime" column to the appropriate database table and leave a NULL value in it for pre-existing public views.
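
As an illustration of how the new column could drive the daily sitemaps, the sketch below selects the views made public since the previous run. The record layout and function name are hypothetical. It also assumes that views with a NULL "ctime" are skipped by the daily sitemaps and only picked up by the monthly comprehensive one, which this specification does not state explicitly.

```python
from datetime import datetime


def views_for_incremental_sitemap(public_views, last_sitemap_time):
    """Return public views whose 'made public' time is newer than the last sitemap.

    Views with ctime None (pre-existing public views) are skipped here; this
    sketch assumes they are covered by the monthly comprehensive sitemap.
    """
    return [
        view for view in public_views
        if view['ctime'] is not None and view['ctime'] > last_sitemap_time
    ]
```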

Refactoring the view access permission checks

Because it will now be easier for admins to turn ON and OFF their users' ability to make things public, the existing permission checks within Mahara will be changed to "fail fast" and verify things in this order:

  1. check whether public views are enabled site-wide
  2. check whether or not the view has been made public by the author
  3. (user views only) check whether public views are enabled in at least one of the institutions that the user belongs to

Access will only be granted to the public if all three of these conditions are satisfied.
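
The order of checks above can be sketched as follows. The `View` class, its fields and the settings dictionary are hypothetical stand-ins, not Mahara's data model. Institutions missing from the dictionary are treated as enabled, matching the default described earlier. A user view whose owner has no institution listed is denied in this sketch; how Mahara should treat that case is an assumption left open here.

```python
from dataclasses import dataclass, field


@dataclass
class View:
    is_public: bool                      # the author has shared it publicly
    is_user_view: bool = True            # False for group views
    owner_institutions: list = field(default_factory=list)


def public_can_access(view, site_public_enabled, institution_public_enabled):
    """Fail-fast public access check, in the order given in the list above."""
    # 1. Public views must be enabled site-wide.
    if not site_public_enabled:
        return False
    # 2. The author must have made the view public.
    if not view.is_public:
        return False
    # 3. User views only: at least one of the owner's institutions must allow it.
    if view.is_user_view:
        return any(institution_public_enabled.get(name, True)
                   for name in view.owner_institutions)
    return True
```

Group views skip the third check, which reflects the rule that disabling public views for an institution does not affect group views.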

Support for basic metadata

Mahara already has some basic metadata for views, such as the author's name, the title of the work and the license.

This will be included by adding extra meta tags in the page header.

For example, a view might look like this:

 <head>
   <title>My first view</title>
   <meta name="mahara:author" value="John Smith">
   <meta name="mahara:license" value="CC BY-SA">
 </head>

Support for extra metadata

If extra metadata has been set on views then it will also be included in the relevant entries:

 <head>
   <title>My first view</title>
   <meta name="mahara:author" value="John Smith">
   <meta name="mahara:license" value="CC BY-SA">
   <meta name="mahara:learningarea" value="English">
   <meta name="mahara:learningareastrand" value="Speaking">
   <meta name="mahara:learningareasubstrand" value="Public Speaking">
 </head>
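
A header like the ones above could be produced from a simple mapping of metadata fields. The sketch below is only an illustration: the function name is hypothetical, and the `mahara:` prefix and `value` attribute are copied from the examples in this specification. Values are HTML-escaped so that user-supplied metadata cannot break out of the attribute.

```python
from html import escape


def render_head(title, metadata):
    """Render a <head> block with one mahara:* meta tag per metadata field."""
    lines = ['<head>', f'  <title>{escape(title)}</title>']
    for key, value in metadata.items():
        lines.append(
            f'  <meta name="mahara:{escape(key)}" value="{escape(value)}">')
    lines.append('</head>')
    return '\n'.join(lines)


print(render_head('My first view', {
    'author': 'John Smith',
    'license': 'CC BY-SA',
    'learningarea': 'English',
}))
```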

Identifying content suitable for Digital NZ

Since Digital NZ has specific guidelines about what content is suitable for their search engine, we suggest that Mahara sites wanting to have their content harvested by Digital NZ include an extra metadata field (i.e. a "yes/no" drop down) allowing end users to choose which pages will be indexed by Digital NZ.

That extra metadata field should provide explanations (as contextual help) to help users decide whether or not their content is suitable.

The value of that field will then be included in the regular sitemaps as an extension, like this:

 <url>
   <loc>http://www.example.com/mahara/view/view.php?id=42</loc>
   <lastmod>2011-04-01</lastmod>
   <mahara:digitalnz>yes</mahara:digitalnz>
 </url>

If extra metadata has not been implemented in Mahara, an alternative is to let users tag their views / artefacts with the "digitalnz" tag and then add a hook in the sitemap generator to include the above XML node.
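
Both approaches can be sketched with one helper that builds a sitemap entry. An extension element such as `mahara:digitalnz` needs an XML namespace declaration to be well-formed; the namespace URI used below is a hypothetical placeholder, as this specification does not define one. The function name and parameters are likewise assumptions for illustration.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = 'http://www.sitemaps.org/schemas/sitemap/0.9'
# Hypothetical placeholder: the specification does not define this URI.
MAHARA_NS = 'http://example.com/mahara/sitemap-extension'

ET.register_namespace('', SITEMAP_NS)
ET.register_namespace('mahara', MAHARA_NS)


def build_url_element(loc, lastmod, tags=(), digitalnz=None):
    """Build a <url> entry, adding <mahara:digitalnz> when the view opts in.

    digitalnz is the explicit yes/no metadata field if it exists; when it is
    None, the sketch falls back to looking for a "digitalnz" tag.
    """
    url = ET.Element(f'{{{SITEMAP_NS}}}url')
    ET.SubElement(url, f'{{{SITEMAP_NS}}}loc').text = loc
    ET.SubElement(url, f'{{{SITEMAP_NS}}}lastmod').text = lastmod
    if digitalnz is None:
        digitalnz = 'digitalnz' in tags
    if digitalnz:
        ET.SubElement(url, f'{{{MAHARA_NS}}}digitalnz').text = 'yes'
    return url


entry = build_url_element(
    'http://www.example.com/mahara/view/view.php?id=42',
    '2011-04-01',
    tags=['digitalnz'],
)
print(ET.tostring(entry, encoding='unicode'))
```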