Difference between revisions of "Developer Area/Specifications in Development/Search 2.0"

Latest revision as of 16:23, 25 June 2013

This is a proposal for improving the search experience in Mahara. It covers searching for users, groups and pages (views).

There are a few elements in here which aren't directly related to search but that are designed to provide more metadata and therefore improve the quality of searches.

Development

Elasticsearch: installation

Elasticsearch plugin

Full text search

The main goal of this work is to add a universal search box to the home page which will search in:

  • forums (title & description)
  • forum threads (contents)
  • views (title, description, tags)
  • artefacts (tags, content)
  • users (firstname, lastname, email)
  • groups (name and description)

Note that searches will not pick up the content of blocks (e.g. Google Apps block, external feed block, etc.).

Search box

The universal search box will be implemented as a dashboard block. Users can remove it, and it is also visible to logged-out users.

By default, it will appear above the "popular pages showcase" block, once that block is in core.

The types of results you will get depend on the search plugin you have enabled. For example, there will be no full text search in the content of artefacts if using the internal plugin.

Search result page

The search result page will use facets so that users can easily hide results in the categories they don't want to see (e.g. hiding the user account and group matches).
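As an illustration of the facet behaviour described above, here is a minimal sketch in Python (not Mahara's actual PHP code). The result shape, the `type` key and the category names are assumptions made for this example only.

```python
from collections import Counter
from typing import Dict, Iterable, List


def facet_counts(results: Iterable[Dict[str, str]]) -> Counter:
    """Count how many results fall into each category (facet)."""
    return Counter(result["type"] for result in results)


def hide_categories(results: Iterable[Dict[str, str]],
                    hidden: Iterable[str]) -> List[Dict[str, str]]:
    """Return only the results whose category the user has not hidden."""
    hidden_set = set(hidden)
    return [result for result in results if result["type"] not in hidden_set]


# Hypothetical search results; the "type" values are illustrative only.
sample_results = [
    {"type": "user", "title": "Alice Example"},
    {"type": "group", "title": "Portfolio club"},
    {"type": "view", "title": "My portfolio"},
    {"type": "view", "title": "Course reflection"},
]

visible = hide_categories(sample_results, hidden=["user", "group"])
```

The counts could be shown next to each facet so that users know how many results they would hide.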

Clicking on a result will take you as close as possible to the matching content. For example, if an artefact matches one of the search terms, then clicking on that result will take you to the artefact page, if one exists (the artefact page has a link back to the containing view).

New parameter on paginated artefact pages

For artefact pages that are displayed using the pager (e.g. blog, blogpost, feedback, plans), we'll need to add a new query string parameter to tell the pager to display a particular page. This will allow us to link to the right page from the search result page.
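The idea can be sketched as follows, in Python rather than Mahara's PHP. The parameter name `offset`, the page size and the example URL are assumptions for illustration; the specification does not name the parameter.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def page_offset(item_index: int, page_size: int) -> int:
    """Return the offset of the first item on the page that contains item_index."""
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    if item_index < 0:
        raise ValueError("item_index must not be negative")
    return (item_index // page_size) * page_size


def link_to_item(base_url: str, item_index: int, page_size: int,
                 param: str = "offset") -> str:
    """Build a URL that opens the pager on the page holding the given item.

    `param` is a hypothetical query string parameter name.
    """
    parts = urlsplit(base_url)
    # Drop any existing value of the pager parameter, keep everything else.
    query = [(key, value) for key, value in parse_qsl(parts.query) if key != param]
    query.append((param, str(page_offset(item_index, page_size))))
    return urlunsplit(parts._replace(query=urlencode(query)))
```

For example, with 10 items per page, the item at zero-based position 23 sits on the page that starts at offset 20, so the search result would link to that page.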

Access control

Mahara will only show results that a user is allowed to see. This means, for example, that the logged-out page will only show content from public views and groups.

Otherwise, we will implement the access control checks in one of the following two ways:

  1. We will use the existing SQL queries to get a list of all of the view IDs that the current user has access to. Then we will query elasticsearch with a restriction on these view IDs. Should the SQL turn out to be too slow, we will cache that list for a certain period of time with the understanding that some content might be missing from the results or that some of the results returned may no longer be accessible.
  2. We will mirror the access control table inside the elasticsearch data structures and make sure that anything in Mahara that changes the database table gets refactored to use a function which updates both the database and the search server. The downside of this approach is that we will duplicate both the data and the access control "query" logic.
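The first approach could look roughly like the sketch below, written in Python rather than Mahara's PHP. The `fetch` callable stands in for the existing SQL queries, and the field names `view_id` and `content` are hypothetical. The query body uses the `bool`/`filter` form of the Elasticsearch query DSL found in current releases, which may differ from the syntax of the version targeted by this specification.

```python
import time
from typing import Callable, Dict, Iterable, List, Tuple


class ViewIdCache:
    """Cache the list of accessible view IDs per user for a limited time."""

    def __init__(self, fetch: Callable[[int], Iterable[int]], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch      # stands in for the existing SQL access queries
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[int, Tuple[float, List[int]]] = {}

    def get(self, user_id: int) -> List[int]:
        now = self._clock()
        entry = self._entries.get(user_id)
        if entry is not None and now - entry[0] < self._ttl:
            # May be stale: new views can be missing, revoked ones still listed.
            return entry[1]
        view_ids = sorted(self._fetch(user_id))
        self._entries[user_id] = (now, view_ids)
        return view_ids


def build_query(terms: str, view_ids: Iterable[int]) -> dict:
    """Build a search body restricted to the views the user may see."""
    return {
        "query": {
            "bool": {
                "must": {"match": {"content": terms}},              # hypothetical field
                "filter": {"terms": {"view_id": list(view_ids)}},   # hypothetical field
            }
        }
    }
```

The time-to-live makes the trade-off from the first approach explicit: a longer value means fewer SQL queries but a longer window in which results can be out of date.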


Plugin architecture

We are not sure about the state of the solr search plugin, but we're told it no longer works (see bug #680710), so we'll be using elasticsearch instead:

  • documentation on how to set up Elasticsearch will be provided
  • Mahara will perform some sanity checks on your Elasticsearch configuration to make sure everything is set up correctly (and display appropriate warnings if it's not)
  • some debugging information will be available, along with a manual "reindex" button

The elasticsearch plugin will inherit from internal search and may in the future override methods that it can implement faster.

In order to get the universal search working, we'll add new methods to the internal search plugin which will only be implemented in elasticsearch. Search plugin writers may choose to implement them in their own plugins, but internal search is not going to have an implementation for them.
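The inheritance relationship described above can be sketched like this, in Python rather than Mahara's PHP. The class names, method names and the `backend` callable are all hypothetical and only illustrate the shape of the design.

```python
from typing import Callable, List


class InternalSearch:
    """Stand-in for the internal search plugin (names are hypothetical)."""

    def __init__(self, usernames: List[str]) -> None:
        self._usernames = usernames

    def search_user(self, query: str) -> List[str]:
        # Existing search: a simple case-insensitive substring match.
        needle = query.lower()
        return [name for name in self._usernames if needle in name.lower()]

    def search_all(self, query: str) -> List[str]:
        # New universal search method: declared here, but not implemented.
        raise NotImplementedError("universal search needs another search plugin")


class ElasticsearchSearch(InternalSearch):
    """Inherits the existing searches and implements only the new method."""

    def __init__(self, usernames: List[str],
                 backend: Callable[[str], List[str]]) -> None:
        super().__init__(usernames)
        self._backend = backend  # stands in for a call to the search server

    def search_all(self, query: str) -> List[str]:
        return self._backend(query)
```

With this layout, a site using the elasticsearch plugin keeps every existing search unchanged, and the plugin can later override individual methods where the search server is faster.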

Hooks

Thanks to the existing search plugins, we already have a few hooks in Mahara that can be used to trigger the reindexing of some of the content.

We will be adding more hooks, for example to detect changes in the contents of artefacts.
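A hook that triggers reindexing could work along these lines. This is a Python sketch under assumptions, not Mahara's event code: the event name `artefact_content_changed` and the payload keys are invented for the example.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, Dict, List, Set, Tuple

Payload = Dict[str, object]


class HookRegistry:
    """Minimal event hook registry: handlers run when an event is triggered."""

    def __init__(self) -> None:
        self._handlers: DefaultDict[str, List[Callable[[Payload], None]]] = defaultdict(list)

    def register(self, event: str, handler: Callable[[Payload], None]) -> None:
        self._handlers[event].append(handler)

    def trigger(self, event: str, payload: Payload) -> None:
        # .get() avoids creating empty entries for events nobody listens to.
        for handler in self._handlers.get(event, ()):
            handler(payload)


class ReindexQueue:
    """Collects items whose search index entry is out of date."""

    def __init__(self) -> None:
        self.pending: Set[Tuple[object, object]] = set()

    def mark_dirty(self, payload: Payload) -> None:
        self.pending.add((payload["type"], payload["id"]))


hooks = HookRegistry()
queue = ReindexQueue()
# Hypothetical event name for "the contents of an artefact changed".
hooks.register("artefact_content_changed", queue.mark_dirty)
hooks.trigger("artefact_content_changed", {"type": "artefact", "id": 42})
```

Using a set for the pending items means that several changes to the same artefact result in a single reindexing job.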

Search improvements

Other things that we could improve later but that are out of scope at the moment:

  • making sure there's only ever one search box per page (usability issue)
  • migrating all slow searches to elasticsearch (while retaining the implementation in internal)
  • search inside the contents of uploaded files (e.g. PDF, ODF, Word documents, text files)