Developer Area/Setting up Elasticsearch

From Mahara Wiki

Mahara ships with an optional plugin to use an Elasticsearch search server to provide internal indexing and searching. This page describes how to install Elasticsearch on a developer machine, and configure your Mahara site to use it.

0. Dependencies

1. Elasticsearch requires Java to run.

2. These instructions assume this is a developer environment, where Elasticsearch and Mahara's web server are running on the same machine. For a production environment, you may want to move Elasticsearch to a different server, and to add some access control to it. Those tasks are outside the scope of this page.

3. The Elasticsearch plugin relies on database triggers to keep its contents up to date, so the database account used by Mahara must be able to create and delete triggers.
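The Java dependency can be checked from a terminal before going any further. This is a minimal sketch: the function names have_command and check_java are made up for illustration, and it only confirms that a java program is on your PATH, not that its version suits your Elasticsearch release.

```shell
# Return success if the named program can be found on PATH.
have_command() {
    command -v "$1" >/dev/null 2>&1
}

# Print the installed Java version, or a hint if Java is missing.
check_java() {
    if have_command java; then
        # java -version writes to stderr, so redirect it to stdout.
        java -version 2>&1 | head -n 1
    else
        echo "java not found: install a Java runtime before starting Elasticsearch" >&2
        return 1
    fi
}
```

After pasting the functions into a shell, run check_java to see the result.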

1. Install Elasticsearch

There are some packaged installations of Elasticsearch available, but for development purposes they aren't necessary. Elasticsearch is available as a completely self-contained ZIP file.

1. Download the Elasticsearch ZIP file from www.elastic.co. If you're unsure which version of Elasticsearch to install, try the latest. Mahara uses a simple subset of the Elasticsearch functionality, which should work with a wide range of Elasticsearch versions. If the latest version doesn't work, look in htdocs/lib/elastica/README.markdown to see which version of Elasticsearch is compatible with the Elastica PHP library currently bundled with Mahara.

2. Extract the Elasticsearch ZIP file into a convenient directory. Call it elasticsearch.

3. In your elasticsearch directory, find the file config/elasticsearch.yml and open it in a text editor. Add these configuration lines to the bottom (replace YOURNAME with your name). They prevent your server from automatically clustering with Elasticsearch servers that other developers might be running on your network.

cluster.name: YOURNAME-dev-elasticsearch
discovery.zen.ping.multicast.enabled: false

Note: on Elasticsearch 5.5.3 the discovery.zen.ping.multicast.enabled line had to be left out, because the server failed with an error saying that the setting doesn't exist.

4. Once that's done, open a terminal and cd into your elasticsearch directory.

5. In the terminal, run this command: bin/elasticsearch

You should see your terminal fill up with Elasticsearch log messages, indicating that the server has started running. Elasticsearch will now continue to run until you close this terminal window or press Control-C to stop it.
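Step 3 and the check after step 5 can also be done from a second terminal. The sketch below is one possible way, not part of Mahara or Elasticsearch: the function names append_cluster_name and wait_for_elasticsearch are hypothetical, and it assumes the server listens on its default address, http://localhost:9200/. It only adds the cluster.name line, since the multicast setting is rejected by some versions.

```shell
# Append a developer-specific cluster name to an elasticsearch.yml file.
# $1: path to elasticsearch.yml, $2: your name (used in the cluster name)
append_cluster_name() {
    config_file=$1
    dev_name=$2
    # Do nothing if an active (uncommented) cluster.name line already exists.
    if grep -q '^cluster\.name:' "$config_file" 2>/dev/null; then
        echo "cluster.name already set in $config_file"
        return 0
    fi
    printf '\ncluster.name: %s-dev-elasticsearch\n' "$dev_name" >> "$config_file"
}

# Poll the default HTTP port once per second until the server answers.
# $1: maximum number of attempts
wait_for_elasticsearch() {
    attempts=$1
    while [ "$attempts" -gt 0 ]; do
        # curl exits non-zero while the connection is still refused.
        if curl -s -o /dev/null 'http://localhost:9200/'; then
            echo "Elasticsearch is answering on port 9200"
            return 0
        fi
        attempts=$((attempts - 1))
        sleep 1
    done
    echo "no answer from http://localhost:9200/" >&2
    return 1
}
```

For example, run append_cluster_name config/elasticsearch.yml YOURNAME before starting the server, then wait_for_elasticsearch 30 from another terminal once bin/elasticsearch is running.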

2. Configure Mahara

1. While your Elasticsearch server is running, log in to your Mahara site (which is running on a web server on the same machine as your Elasticsearch server).

2. Go to "Administration -> Extensions -> search -> elasticsearch"

3. This should bring up the "Plugin administration: search: elasticsearch" page, which lets you configure the Elasticsearch search plugin for Mahara. You'll notice that many of the settings are grayed out and can only be changed by editing your config.php file. That's okay! The default settings will work perfectly with an Elasticsearch server also on its default settings, running on the same machine.

4. Scroll down, and select all the artefact types that you want Elasticsearch to index. (Probably all of them.)

5. Click Save

6. Go to "Administration -> Configure site -> Site options"

7. Open the "Search settings" subsection, and set "Search plugin" to "Elasticsearch".

8. Press the "Update site options" button. You may notice a longer-than-normal "Loading" time after pressing this. That's because Mahara runs an initial indexing of the site when you first activate the Elasticsearch plugin. Once you see the "Site options have been updated" message, you should be able to use the search field at the top of the page and get search results based on the current content of your site.
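If you would like to confirm from the command line that the initial indexing put something into Elasticsearch, the _count endpoint reports how many documents an index holds. This is a rough sketch under two assumptions: the index is named mahara (as in the troubleshooting commands on this page) and the server is on localhost:9200. The function names are made up for illustration.

```shell
# Read an Elasticsearch _count response on stdin and print just the number.
extract_count() {
    sed -n 's/.*"count"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p'
}

# Ask the local server how many documents are in the "mahara" index.
indexed_documents() {
    curl -s 'http://localhost:9200/mahara/_count' | extract_count
}
```

Running indexed_documents should print a number greater than zero once the cron has pushed the queued records to the server.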

3. Updating data in Elasticsearch

Mahara uses triggers, a queue table, and a cron job to keep the data in the Elasticsearch server up to date.

0. When you first enable Elasticsearch, the plugin runs a database query that puts a record into the search_elasticsearch_queue table for every artefact, view, and user in your database.

1. Henceforth, every time you insert, update, or delete a record of the types indexed by Elasticsearch, the triggers insert a record into the search_elasticsearch_queue database table.

2. The "search.elasticsearch.cron" cron function, which is scheduled to run every 5 minutes, pulls records from this table and pushes them into the Elasticsearch server using Elasticsearch's REST API.

3. The number of records sent to Elasticsearch in a single cron run is limited by the "cronlimit" setting in the Elasticsearch plugin settings page.

So you can update the data in Elasticsearch by simply running the cron. There is also a standalone cron script in the Elasticsearch plugin directory, which you can use if you want to run the Elasticsearch cron function more frequently: htdocs/search/elasticsearch/cron.php
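While developing, it can be handy to drain the queue faster than the 5-minute schedule allows. The sketch below repeats a command a fixed number of times with a pause in between. The helper name repeat_command is hypothetical, and running the standalone script with the php command-line program from the top of your Mahara checkout is an assumption; check how your site expects that script to be invoked.

```shell
# Run a command a fixed number of times, sleeping between runs.
# $1: number of runs, $2: seconds to sleep between runs, rest: the command
repeat_command() {
    runs=$1
    pause=$2
    shift 2
    i=0
    while [ "$i" -lt "$runs" ]; do
        # Keep going even if one run fails, but report it.
        "$@" || echo "command failed on run $((i + 1))" >&2
        i=$((i + 1))
        # Sleep only between runs, not after the last one.
        if [ "$i" -lt "$runs" ]; then
            sleep "$pause"
        fi
    done
    return 0
}

# Example (assumed invocation): ten runs, one minute apart.
# repeat_command 10 60 php htdocs/search/elasticsearch/cron.php
```

Each run still sends at most "cronlimit" records, so a large queue needs several runs to empty.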

4. Resetting Elasticsearch

If your Elasticsearch index falls out of sync due to problems with your server, or if you just want to reset it, use the "Reset ALL Indexes" button on the Elasticsearch plugin settings page. This deletes and recreates the database triggers, destroys and recreates the Elasticsearch index, clears out the search_elasticsearch_queue table, re-fills it by directly querying your database, and sends one starting set of records to the Elasticsearch server.

5. Troubleshooting Elasticsearch

As well as looking at the log output from the Elasticsearch program in your terminal, you can query Elasticsearch directly, using the same REST API that Mahara uses. See the Elasticsearch website for extensive documentation: https://www.elastic.co/guide/en/elasticsearch/reference/current/_exploring_your_cluster.html

If you click 'Reset' to reset all indexes on the 'Plugin administration: search: Elasticsearch' page and a message about a cron job appears in red, run this query in the Postgres database (it deletes every row of the config table whose field name starts with an underscore): delete from config where field like '\_%';

Some commands that might be helpful to start with:

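1. Status of your index (putting ?pretty=true on the end tells Elasticsearch to format the response to make it more human-readable). Note that the _status endpoint was removed in Elasticsearch 2.0; on newer versions, _stats is the closest replacement.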

curl -XGET 'http://localhost:9200/mahara/_status?pretty=true'

2. Delete the index:

curl -XDELETE 'http://localhost:9200/mahara?pretty=true'

3. Delete all indexes in the cluster:

curl -XDELETE 'http://localhost:9200/_all?pretty=true'

4. A basic search of the index:

curl -XGET 'http://localhost:9200/mahara/_search?q=admin&pretty=true'
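For repeated searches, a small wrapper saves retyping the URL and takes care of encoding search terms that contain spaces. This is only a sketch: es_url and es_search are hypothetical helper names, and the ES_BASE and ES_INDEX defaults simply mirror the server address and index name used in the commands above.

```shell
# Server address and index name; both default to the values used above.
ES_BASE=${ES_BASE:-http://localhost:9200}
ES_INDEX=${ES_INDEX:-mahara}

# Print the URL for an index-level endpoint, e.g. "es_url _search".
es_url() {
    printf '%s/%s/%s\n' "$ES_BASE" "$ES_INDEX" "$1"
}

# Search the index for the given terms. With -G, curl sends a GET request
# and appends the --data-urlencode values to the URL as a query string.
es_search() {
    curl -s -G "$(es_url _search)" \
        --data-urlencode "q=$*" \
        --data-urlencode "pretty=true"
}
```

For example, es_search admin is equivalent to the basic search command above, and es_search site admin searches for both words without any manual escaping.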