Gold/Elasticsearch

From Mahara Wiki

Latest revision as of 14:13, 14 June 2021

tl;dr

Upgrading has turned out to be... involved. We're creating a new Elasticsearch 7 search plugin.

Where are we?

The state of play

Upgrading the existing Elasticsearch search plugin has turned out to be far more involved than previously anticipated. Because ES6 and ES7 differ in how they work, trying to massage the old code to fit the way ES now expects data to be formatted causes issues that cascade throughout the system, revealing more and more places that need to be touched. This leaves me with the feeling that we are likely to miss things, which risks the work appearing shoddy.

SotA

Elasticsearch has been moving towards an increasingly simplified structure for ingesting and managing data. Things are currently quite 'flat' when it comes to the data being stored. Although the data structure has changed from version to version, it has been trending towards a less complicated system. Because of this trend, it is still desirable to stick with Elasticsearch.

Where to from here?

The current plan is to leave the existing Elasticsearch search plugin in place and create a new Elasticsearch 7 plugin.

This has multiple advantages:

  • Sites that are unable/unwilling to move from their existing ES server can continue with that.
  • We don't need to work with existing code to try and bend it into shape for the new way things are done.
  • We can take a "clean canvas" approach and not be hobbled by previous decisions.
  • I get to build a plugin from scratch. << I am quite pleased that this is a thing :)

The plan

  • Investigate what Elasticsearch-PHP gives us. ES7 is fairly straightforward to use. Do we still even need the library?
  • Start a clean plugin (done)
  • Add config form support (done)
  • Get data into ES7
    • Data is being queued for indexing
    • Clicking Reset walks over the queued data and assembles it for submission to ES7, both as a Bulk transaction and as Individual transactions.
    • Data is being submitted and indexed in ES7.
      • artefact (done)
      • block_instance (done)
      • collection (done, I think)
      • event_log (done, I think)
      • group (done)
      • interaction_forum_post (done)
      • interaction_instance (done)
      • usr (done)
      • view (done)
  • Get data out of ES7
    • In progress
      • Get the top level search to return results
      • Ensure results take into account whether the user has access to see them
  • Add reporting support
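
To illustrate the Bulk step in the plan, here is a minimal Python sketch (the plugin itself will be PHP) of how queued records could be serialised into the newline-delimited JSON body that the ES7 _bulk endpoint expects. The function name, the example index name and the assumption that each queued record carries an "id" key are hypothetical, not taken from the plugin.

```python
import json
from typing import Any, Iterable, Mapping


def build_bulk_body(index: str, records: Iterable[Mapping[str, Any]]) -> str:
    """Serialise queued records into an Elasticsearch _bulk request body.

    Each record becomes two NDJSON lines: an action/metadata line naming the
    target index and document id, followed by the document source itself.
    """
    lines = []
    for record in records:
        doc = dict(record)
        # Assumption: every queued record has an "id" key used as the ES _id.
        doc_id = str(doc.pop("id"))
        # ES7 is typeless, so the action line only needs _index and _id.
        lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))
        lines.append(json.dumps(doc))
    # The bulk API requires every line, including the last, to end in "\n".
    return "".join(line + "\n" for line in lines)


body = build_bulk_body("mahara_usr", [{"id": 1, "name": "Ann"}, {"id": 2, "name": "Bob"}])
print(body, end="")
```

The Individual transaction path would instead send one PUT /<index>/_doc/<id> request per record; bulk trades many small requests for one larger one.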

The Elasticsearch-PHP library

Just reading the Overview has already made it clear that we should stick with it. The key points are that it is a low-level client and that it adds "cluster state sniffing, round-robin requests, and so on". We would have had to implement that last part ourselves if we didn't use the library. Because it is a low-level client, my concern that some things may be abstracted away has been alleviated.
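
For a sense of what we would otherwise have to write, here is a deliberately simplified Python sketch of round-robin node selection with dead-node skipping. The class and method names are hypothetical and are not the Elasticsearch-PHP API; the real client also handles sniffing the cluster state, retries and resurrecting dead nodes.

```python
from typing import List, Sequence, Set


class RoundRobinHosts:
    """Hypothetical illustration: cycle through ES node URLs, skipping dead ones."""

    def __init__(self, hosts: Sequence[str]) -> None:
        if not hosts:
            raise ValueError("at least one host is required")
        self._hosts: List[str] = list(hosts)
        self._dead: Set[str] = set()
        self._pos = 0

    def mark_dead(self, host: str) -> None:
        self._dead.add(host)

    def mark_alive(self, host: str) -> None:
        self._dead.discard(host)

    def next_host(self) -> str:
        # Visit each host at most once per call, skipping any marked dead.
        for _ in range(len(self._hosts)):
            host = self._hosts[self._pos]
            self._pos = (self._pos + 1) % len(self._hosts)
            if host not in self._dead:
                return host
        raise RuntimeError("no live hosts available")


rr = RoundRobinHosts(["http://a:9200", "http://b:9200", "http://c:9200"])
print([rr.next_host() for _ in range(4)])
```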

ElasticsearchDSL

The ElasticsearchDSL library is looking good for querying Elasticsearch as well. It hasn't been touched in just over a year, and the issue queue is short with no real show-stoppers in it from what I can see.
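
Whatever builds it, the end product is a Query DSL body. The Python sketch below shows the kind of body a top-level search with access checks might produce: a bool query whose "must" clause matches the search term and whose "filter" clause restricts hits to documents the user may see. The field names (title, description, access_public, access_users, access_groups) and the function name are hypothetical assumptions, not the plugin's actual mapping.

```python
from typing import Any, Dict, List


def build_search_query(term: str, user_id: int, group_ids: List[int],
                       size: int = 10, offset: int = 0) -> Dict[str, Any]:
    """Hypothetical top-level search body with an access filter."""
    # A document is visible if ANY of these access clauses matches.
    access_clauses: List[Dict[str, Any]] = [
        {"term": {"access_public": True}},
        {"term": {"access_users": user_id}},
    ]
    if group_ids:
        access_clauses.append({"terms": {"access_groups": group_ids}})
    return {
        "from": offset,
        "size": size,
        "query": {
            "bool": {
                # Scored full-text match on the (assumed) title and description fields.
                "must": [{"multi_match": {"query": term,
                                          "fields": ["title^2", "description"]}}],
                # Filter context: restricts hits without affecting relevance scores.
                "filter": [{"bool": {"should": access_clauses,
                                     "minimum_should_match": 1}}],
            }
        },
    }


print(build_search_query("portfolio", user_id=7, group_ids=[3, 4]))
```

Putting the access check in filter context keeps it out of scoring and lets Elasticsearch cache it.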