Proposals/Done/Sitemaps
Latest revision as of 17:43, 11 July 2020
Note: This feature has been developed, so this page should be moved from "Specifications in Development" to just plain documentation. This feature can be enabled from the "General" section of the admin settings page. It generates a machine-readable XML sitemap of the publicly accessible content in your Mahara site. The URL for the sitemap is YOURSITE/download.php?type=sitemap
This feature was funded by the New Zealand Ministry of Education and implemented by Catalyst IT for Mahara 1.5.
This is a description of what would be needed for Mahara to be able to allow users to export their content (views and artefacts) to search engines such as Google and Digital NZ.
Indexable content
Mahara will generate sitemaps to make it easier for search engines to index:
- public views (user and group ones)
- forum posts in public groups
- site pages (not views) (struck through in the latest revision)
Sitemaps will be generated once a day
Full sitemaps of indexable content will be generated once a day on cron. They will live in the dataroot directory.
These sitemaps will be in the standard format and we will group them in a sitemap index. The sitemaps will be gzipped and there will be one sitemap per day unless the uncompressed sitemap would be larger than 10 MB, in which case it will be broken up into multiple files.
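The splitting rule above can be sketched as follows. This is a minimal illustration in Python, not Mahara's implementation: the function names (`split_sitemaps`, `sitemap_index`, `compress`) are hypothetical, and only the 10 MB limit, the gzip step and the sitemap index come from the text.

```python
import gzip
from xml.sax.saxutils import escape

MAX_BYTES = 10 * 1024 * 1024  # 10 MB uncompressed limit from the proposal

HEADER = ('<?xml version="1.0" encoding="UTF-8"?>\n'
          '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n')
FOOTER = '</urlset>\n'


def url_entry(loc, lastmod):
    """Render one <url> element in the standard sitemap format."""
    return f"  <url><loc>{escape(loc)}</loc><lastmod>{lastmod}</lastmod></url>\n"


def split_sitemaps(entries, max_bytes=MAX_BYTES):
    """Group (loc, lastmod) pairs into sitemap documents of at most max_bytes."""
    overhead = len(HEADER.encode("utf-8")) + len(FOOTER.encode("utf-8"))
    documents, current, size = [], [], overhead
    for loc, lastmod in entries:
        chunk = url_entry(loc, lastmod)
        chunk_size = len(chunk.encode("utf-8"))
        # Start a new file when adding this entry would exceed the limit.
        if current and size + chunk_size > max_bytes:
            documents.append(HEADER + "".join(current) + FOOTER)
            current, size = [], overhead
        current.append(chunk)
        size += chunk_size
    if current:
        documents.append(HEADER + "".join(current) + FOOTER)
    return documents


def compress(document):
    """Gzip one sitemap document for storage in the dataroot."""
    return gzip.compress(document.encode("utf-8"))


def sitemap_index(sitemap_urls):
    """Render a sitemap index that groups the individual sitemap files."""
    items = "".join(
        f"  <sitemap><loc>{escape(url)}</loc></sitemap>\n" for url in sitemap_urls
    )
    return ('<?xml version="1.0" encoding="UTF-8"?>\n'
            '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
            + items + '</sitemapindex>\n')
```

The size check runs against the uncompressed bytes, since the limit in the text applies before gzipping.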
The URLs that the sitemap will use are the view URLs or the artefact landing pages.
Each sitemap will contain the new content that was made indexable since the last sitemap. In other words, content is selected by the time it was shared with the public (see "Keeping track of when views are made public") or, for forum posts in public groups, by the last create/modify time.
Once a month, a new "comprehensive" sitemap will be created. It will include all indexable content on the site. Once that sitemap has been created, all older sitemaps (which are included in this one) will be deleted from the dataroot.
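The difference between the daily and the monthly sitemap can be sketched as below. The names `select_entries` and `is_comprehensive_day` are hypothetical, and running the comprehensive build on the first day of the month is an assumption; the proposal only says "once a month".

```python
from datetime import datetime


def select_entries(items, last_run, comprehensive=False):
    """Pick the URLs for one sitemap.

    items: dicts with 'url' and 'indexable_since' (a datetime, or None
    when the time the content became public is unknown).
    """
    if comprehensive:
        # Monthly sitemap: all indexable content, regardless of timestamps.
        return [item["url"] for item in items]
    # Daily sitemap: only content made indexable since the last sitemap.
    return [
        item["url"]
        for item in items
        if item["indexable_since"] is not None
        and item["indexable_since"] > last_run
    ]


def is_comprehensive_day(today):
    # Assumption: the monthly build happens on the first day of the month.
    return today.day == 1
```

In this sketch, items without a known timestamp only appear in the comprehensive sitemap.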
New settings for enabling/disabling public views
Site administrators can already enable/disable the ability to make views public at the site level. A similar setting will be available at the institution level.
When public views are globally disabled, institutional admins will see a grayed out checkbox and will not be able to enable them. On the other hand, when public views have been enabled for the whole site, institutional admins will be able to turn them off for their institution.
Public views will be enabled by default at the site and institution level.
Disabling public views at the institution level will not impact the ability of users to make group views and artefacts public. If an institution wants to disable that, they will need to disable the ability for their users to create groups.
Also, if a user is a member of more than one institution, he or she will be able to make views and artefacts public as long as one of their institutions has it enabled.
Keeping track of when views are made public
Mahara is not currently keeping track of when a view was made public.
We will add an extra "ctime" column to the appropriate database table and leave a NULL value in it for pre-existing public views.
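As a rough illustration of the effect of that change, the sketch below uses an in-memory SQLite database. The table name `view_access` and its columns are assumptions made for the example; the proposal only refers to "the appropriate database table".

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical access table, with one view that was already public.
conn.execute("CREATE TABLE view_access (view INTEGER, accesstype TEXT)")
conn.execute("INSERT INTO view_access VALUES (1, 'public')")

# Add the new column; rows that existed before the change keep NULL.
conn.execute("ALTER TABLE view_access ADD COLUMN ctime TEXT")

# A view made public after the change records when that happened.
conn.execute(
    "INSERT INTO view_access VALUES (2, 'public', '2011-04-01 09:30:00')"
)

rows = conn.execute(
    "SELECT view, ctime FROM view_access ORDER BY view"
).fetchall()
```

Here `rows` holds one pair per view, with `None` as the ctime of the pre-existing public view.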
Refactoring the view access permission checks
Because it will now be easier for admins to turn ON and OFF their users' ability to make things public, the existing permission checks within Mahara will be changed to "fail fast" and verify things in this order:
1. check whether public views are enabled site-wide
2. check whether or not the view has been made public by the author
3. (user views only) check whether public views are enabled in at least one of the institutions that the user belongs to
Access will only be granted to the public if all three of these conditions are satisfied.
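The ordering above can be sketched as a single function that returns as soon as one check fails. The function name and parameters are hypothetical, and the handling of a user with no institutions is an assumption, since the proposal does not cover that case.

```python
def public_can_view(site_public_enabled, view_is_public, owner_institutions=None):
    """Fail-fast check of whether the public may see a view.

    owner_institutions: None for group views; for user views, a list of
    booleans, one per institution the owner belongs to, each True when
    public views are enabled in that institution.
    """
    # 1. Site-wide setting.
    if not site_public_enabled:
        return False
    # 2. The author must have made the view public.
    if not view_is_public:
        return False
    # 3. User views only: at least one institution must allow public views.
    #    Assumption: an owner with no institutions is treated as not allowed.
    if owner_institutions is not None and not any(owner_institutions):
        return False
    return True
```

For example, `public_can_view(True, True, [False, True])` grants access because one of the owner's institutions has public views enabled, while `public_can_view(False, True, [True])` fails on the first check.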
Support for basic metadata
Mahara already has some basic metadata for views. For example, author's name, title of the work, license, etc.
This will be included by adding extra meta tags in the page header.
For example, a view might look like this:
    <head>
      <title>My first view</title>
      <meta name="mahara:author" value="John Smith">
      <meta name="mahara:license" value="CC BY-SA">
    </head>
Support for extra metadata
If extra metadata has been set on views then it will also be included in the relevant entries:
    <head>
      <title>My first view</title>
      <meta name="mahara:author" value="John Smith">
      <meta name="mahara:license" value="CC BY-SA">
      <meta name="mahara:learningarea" value="English">
      <meta name="mahara:learningareastrand" value="Speaking">
      <meta name="mahara:learningareasubstrand" value="Public Speaking">
    </head>
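One way to produce such a header from a view's metadata is sketched below. The helper `render_head` is hypothetical; it follows the `mahara:` prefix and the `value` attribute used in the examples above, and escapes values so that user-supplied text cannot break the markup.

```python
from html import escape


def render_head(title, metadata):
    """Render a <head> block with one mahara: meta tag per metadata field."""
    lines = ["<head>", f"<title>{escape(title)}</title>"]
    for name, value in metadata.items():
        # escape() also escapes quotes by default, so attribute values stay valid.
        lines.append(
            f'<meta name="mahara:{escape(name)}" value="{escape(value)}">'
        )
    lines.append("</head>")
    return "\n".join(lines)
```

Basic and extra metadata can be passed in the same dictionary, for example `render_head("My first view", {"author": "John Smith", "learningarea": "English"})`.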
Identifying content suitable for Digital NZ
Since Digital NZ has specific guidelines about what content is suitable for their search engine, we suggest that Mahara sites wanting to have their content harvested by Digital NZ include an extra metadata field (e.g. a "yes/no" drop-down) allowing end users to choose which pages will be indexed by Digital NZ.
That extra metadata field should provide explanations (as contextual help) to help users decide whether or not their content is suitable.
It will then be included in the regular sitemaps as an extension like this:
    <url>
      <loc>http://www.example.com/mahara/view/view.php?id=42</loc>
      <lastmod>2011-04-01</lastmod>
      <mahara:digitalnz>yes</mahara:digitalnz>
    </url>
An alternative to this if extra metadata has not been implemented in Mahara is to let users tag their views / artefacts with the "digitalnz" tag and then add a hook in the sitemap generator to include the above XML node.
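The tag-based alternative could look roughly like this. The function `url_node` and its `tags` parameter are hypothetical and stand in for whatever hook the sitemap generator would expose; only the "digitalnz" tag and the XML node come from the text.

```python
from xml.sax.saxutils import escape


def url_node(loc, lastmod, tags=()):
    """Render one sitemap <url> entry, adding the Digital NZ extension
    node when the view or artefact carries the "digitalnz" tag."""
    parts = [
        "<url>",
        f"  <loc>{escape(loc)}</loc>",
        f"  <lastmod>{lastmod}</lastmod>",
    ]
    if "digitalnz" in tags:
        parts.append("  <mahara:digitalnz>yes</mahara:digitalnz>")
    parts.append("</url>")
    return "\n".join(parts)
```

For the extension to be valid XML, the enclosing urlset element would also need to declare the `mahara` namespace prefix. The proposal does not specify its URI, so it is left out here.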