Difference between revisions of "Proposals/Done/Sitemaps"

From Mahara Wiki

m (Anitsirk moved page Developer Area/Specifications in Development/Done/Sitemaps to Proposals/Done/Sitemaps: Shorter navigation, not always technical)
 
Latest revision as of 17:43, 11 July 2020

Note: This feature has been developed, so this page should be moved from "Specifications in Development" to just plain documentation. This feature can be enabled from the "General" section of the admin settings page. It generates a machine-readable XML sitemap of the publicly accessible content in your Mahara site. The URL for the sitemap is YOURSITE/download.php?type=sitemap

This feature was funded by the New Zealand Ministry of Education and implemented by Catalyst IT for Mahara 1.5.

This is a description of what would be needed for Mahara to be able to allow users to export their content (views and artefacts) to search engines such as Google and Digital NZ.

Indexable content

Mahara will generate sitemaps to make it easier for search engines to index:

  • public views (user and group ones)
  • forum posts in public groups
  • site pages (not views) [struck out in this revision]

Sitemaps will be generated once a day

Full sitemaps of indexable content will be generated once a day on cron. They will live in the dataroot directory.

These sitemaps will be in the standard format and we will group them in a sitemap index. The sitemaps will be gzipped and there will be one sitemap per day unless the uncompressed sitemap would be larger than 10 MB, in which case it will be broken up into multiple files.
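
As a rough illustration of the splitting rule above, here is a minimal Python sketch (not Mahara's actual implementation, which is written in PHP). The function names and the way entries are passed in are assumptions made for this example; only the 10 MB uncompressed limit, the gzip compression and the sitemap index come from the description above.

```python
import gzip
from xml.sax.saxutils import escape

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_BYTES = 10 * 1024 * 1024  # uncompressed size limit described above

HEADER = f'<?xml version="1.0" encoding="UTF-8"?>\n<urlset xmlns="{SITEMAP_NS}">\n'
FOOTER = "</urlset>\n"


def url_entry(loc, lastmod):
    """Render one <url> element (hypothetical helper)."""
    return (
        "  <url>\n"
        f"    <loc>{escape(loc)}</loc>\n"
        f"    <lastmod>{lastmod}</lastmod>\n"
        "  </url>\n"
    )


def split_sitemaps(entries, max_bytes=MAX_BYTES):
    """Group (loc, lastmod) pairs into sitemap documents that each stay
    within max_bytes when uncompressed."""
    overhead = len((HEADER + FOOTER).encode("utf-8"))
    documents, current, size = [], [], overhead
    for loc, lastmod in entries:
        chunk = url_entry(loc, lastmod)
        chunk_size = len(chunk.encode("utf-8"))
        # Start a new file when adding this entry would exceed the limit.
        if current and size + chunk_size > max_bytes:
            documents.append(HEADER + "".join(current) + FOOTER)
            current, size = [], overhead
        current.append(chunk)
        size += chunk_size
    if current:
        documents.append(HEADER + "".join(current) + FOOTER)
    return documents


def sitemap_index(file_urls):
    """Render a sitemap index that points at each generated sitemap file."""
    items = "".join(
        f"  <sitemap>\n    <loc>{escape(url)}</loc>\n  </sitemap>\n"
        for url in file_urls
    )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        f'<sitemapindex xmlns="{SITEMAP_NS}">\n{items}</sitemapindex>\n'
    )


def compress(document):
    """Gzip a sitemap document before it is written to the dataroot."""
    return gzip.compress(document.encode("utf-8"))
```

The size check is done on the uncompressed XML because that is what the limit applies to; compression only happens once a file's contents are final.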

The URLs that the sitemap will use are the view URLs or the artefact landing pages.

Each sitemap will contain the new content that was made indexable since the last sitemap. In other words, it will contain the views that were shared with the public since then (see "Keeping track of when views are made public") and the forum posts in public groups that were created or modified since then.

Once a month, a new "comprehensive" sitemap will be created. It will include all indexable content on the site. Once that sitemap has been created, all older sitemaps (which are included in this one) will be deleted from the dataroot.

New settings for enabling/disabling public views

Site administrators can already enable/disable the ability to make views public at the site level. A similar setting will be available at the institution level.

When public views are globally disabled, institutional admins will see a grayed out checkbox and will not be able to enable them. On the other hand, when public views have been enabled for the whole site, institutional admins will be able to turn them off for their institution.

Public views will be enabled by default at the site and institution level.

Disabling public views at the institution level will not impact the ability of users to make group views and artefacts public. If an institution wants to disable that, they will need to disable the ability for their users to create groups.

Also, if a user is a member of more than one institution, he or she will be able to make views and artefacts public as long as one of their institutions has it enabled.

Keeping track of when views are made public

Mahara is not currently keeping track of when a view was made public.

We will be adding an extra "ctime" column to the appropriate database table and leave a NULL value in there for pre-existing public views.
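
The effect of the new column can be sketched with an in-memory SQLite database in Python. This is only an illustration: the table name `view_access`, its other columns and the helper function are assumptions made for the example, not the actual Mahara schema. The point it shows is that rows that existed before the column was added keep a NULL "ctime" and are therefore skipped by the daily "made public since" query.

```python
import sqlite3


def create_legacy_table(conn):
    """Hypothetical table as it might look before the change."""
    conn.execute("CREATE TABLE view_access (view_id INTEGER, accesstype TEXT)")
    conn.execute("INSERT INTO view_access VALUES (1, 'public')")


def add_ctime_column(conn):
    """Add the new column; pre-existing rows are left with NULL."""
    conn.execute("ALTER TABLE view_access ADD COLUMN ctime TEXT")


def made_public_since(conn, since):
    """Return the ids of views made public at or after the given ISO timestamp.
    Rows with a NULL ctime (pre-existing public views) never match."""
    rows = conn.execute(
        "SELECT view_id FROM view_access "
        "WHERE accesstype = 'public' AND ctime IS NOT NULL AND ctime >= ? "
        "ORDER BY view_id",
        (since,),
    )
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
create_legacy_table(conn)
add_ctime_column(conn)
# A view made public after the upgrade records when that happened.
conn.execute("INSERT INTO view_access VALUES (2, 'public', '2011-04-01T09:30:00')")
recent = made_public_since(conn, "2011-04-01T00:00:00")
```

Pre-existing public views would still appear in the monthly comprehensive sitemap, since that one lists all indexable content regardless of when it was shared.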

Refactoring the view access permission checks

Because it will now be easier for admins to turn ON and OFF their users' ability to make things public, the existing permission checks within Mahara will be changed to "fail fast" and verify things in this order:

  1. check whether public views are enabled site-wide
  2. check whether or not the view has been made public by the author
  3. (user views only) check whether public views are enabled in at least one of the institutions that the user belongs to

Access will only be granted to the public if all three of these conditions are satisfied.
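
The check order above can be sketched in Python as follows. The class and attribute names are made up for this example and do not correspond to Mahara's PHP code; a user with no institution membership is assumed to be represented by a default institution entry.

```python
from dataclasses import dataclass, field


@dataclass
class Institution:
    name: str
    public_views_enabled: bool = True


@dataclass
class View:
    is_public: bool                      # set by the author
    is_group_view: bool = False
    owner_institutions: list = field(default_factory=list)


def public_can_access(view, site_public_views_enabled):
    """Fail fast: return False as soon as one condition is not met."""
    # 1. Site-wide setting.
    if not site_public_views_enabled:
        return False
    # 2. The author must have made the view public.
    if not view.is_public:
        return False
    # 3. User views only: at least one of the owner's institutions must allow it.
    if not view.is_group_view:
        return any(inst.public_views_enabled for inst in view.owner_institutions)
    return True
```

For example, a user view whose owner belongs to two institutions stays publicly accessible as long as one of them has public views enabled, while a group view skips the third check entirely.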

Support for basic metadata

Mahara already has some basic metadata for views. For example, author's name, title of the work, license, etc.

This will be included by adding extra meta tags in the page header.

For example, a view might look like this:

 <head>
   <title>My first view</title>
   <meta name="mahara:author" content="John Smith">
   <meta name="mahara:license" content="CC BY-SA">
 </head>

Support for extra metadata

If extra metadata has been set on views then it will also be included in the relevant entries:

 <head>
   <title>My first view</title>
   <meta name="mahara:author" content="John Smith">
   <meta name="mahara:license" content="CC BY-SA">
   <meta name="mahara:learningarea" content="English">
   <meta name="mahara:learningareastrand" content="Speaking">
   <meta name="mahara:learningareasubstrand" content="Public Speaking">
 </head>
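
A minimal Python sketch of how such tags could be produced from a view's metadata is shown here. The function name and the dictionary input are assumptions for illustration; the sketch emits the standard HTML `content` attribute and escapes values so that user-supplied text cannot break out of the attribute.

```python
from html import escape


def render_meta_tags(metadata, prefix="mahara"):
    """Render one <meta> tag per metadata field (hypothetical helper).
    Keys and values are HTML-escaped, including quotes."""
    lines = []
    for key, value in metadata.items():
        lines.append(
            f'<meta name="{escape(prefix)}:{escape(key)}" content="{escape(value)}">'
        )
    return "\n".join(lines)


tags = render_meta_tags({
    "author": "John Smith",
    "license": "CC BY-SA",
    "learningarea": "English",
})
```

Basic and extra metadata can go through the same function, since both end up as `mahara:`-prefixed name/value pairs.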

Identifying content suitable for Digital NZ

Since Digital NZ has specific guidelines about what content is suitable for their search engine, we suggest that Mahara sites wanting to have their content harvested by Digital NZ include an extra metadata field (e.g. a "yes/no" drop-down) allowing end users to choose which pages will be indexed by Digital NZ.

That extra metadata field should provide explanations (as contextual help) to help users decide whether or not their content is suitable.

It will then be included in the regular sitemaps as an extension like this:

 <url>
   <loc>http://www.example.com/mahara/view/view.php?id=42</loc>
   <lastmod>2011-04-01</lastmod>
   <mahara:digitalnz>yes</mahara:digitalnz>
 </url>

If extra metadata has not been implemented in Mahara, an alternative is to let users tag their views / artefacts with the "digitalnz" tag and then add a hook in the sitemap generator that includes the above XML node.
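
The tag-based alternative could look roughly like the following Python sketch, using the standard library's `xml.etree.ElementTree`. The namespace URI bound to the `mahara` prefix is a placeholder invented for this example (the proposal does not define one), and the shape of the `pages` input is likewise an assumption.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
# Hypothetical namespace URI for the "mahara" extension prefix.
MAHARA_NS = "http://www.example.com/schemas/mahara-sitemap"

ET.register_namespace("", SITEMAP_NS)
ET.register_namespace("mahara", MAHARA_NS)


def build_urlset(pages):
    """Build a sitemap document; pages tagged "digitalnz" get the extension node."""
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for page in pages:
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = page["loc"]
        ET.SubElement(url, f"{{{SITEMAP_NS}}}lastmod").text = page["lastmod"]
        # The "hook": only pages carrying the tag are marked for Digital NZ.
        if "digitalnz" in page.get("tags", ()):
            ET.SubElement(url, f"{{{MAHARA_NS}}}digitalnz").text = "yes"
    return ET.tostring(urlset, encoding="unicode")


sitemap_xml = build_urlset([
    {
        "loc": "http://www.example.com/mahara/view/view.php?id=42",
        "lastmod": "2011-04-01",
        "tags": ["digitalnz"],
    },
    {
        "loc": "http://www.example.com/mahara/view/view.php?id=43",
        "lastmod": "2011-04-01",
        "tags": [],
    },
])
```

Declaring the extension in its own XML namespace keeps the file valid for consumers that only understand the standard sitemap elements, since they can ignore nodes from namespaces they do not recognise.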