Gold/Elasticsearch: Difference between revisions
From Mahara Wiki
Latest revision as of 10:39, 25 August 2021
tl;dr;
Upgrading has turned out to be... involved. We're creating a new Elasticsearch 7 search plugin.
Where are we?
The state of play
Upgrading the existing Elasticsearch search plugin has turned out to be far more involved than previously anticipated. The differences between how ES6 and ES7 work mean that massaging the old code to fit the data format ES now expects causes issues to cascade throughout the system, revealing ever more places that need to be touched. This leaves me with the feeling that we are likely to miss things, which opens up the potential for the work to appear shoddy.
SotA
Elasticsearch has been moving towards a simpler structure for ingesting and managing data. Things are currently quite 'flat' when it comes to the data being stored. Although the data structure changes from version to version, it has been trending towards a less complicated system. Given this trend, it is still desirable to stick with Elasticsearch.
Where to from here?
The current plan is to leave the existing Elasticsearch search plugin in place and create a new Elasticsearch 7 plugin.
This has multiple advantages:
- Sites that are unable/unwilling to move from their existing ES server can continue with that.
- We don't need to work with existing code to try and bend it into shape for the new way things are done.
- We can take a "clean canvas" approach and not be hobbled by previous decisions.
- I get to build a plugin from scratch. << I am quite pleased that this is a thing :)
The plan
- Investigate what Elasticsearch-PHP gives us. ES7 is fairly straightforward to use. Do we still even need the library?
- Start a clean plugin (done)
- Add config form support (done)
- Get data into ES7
  - Data is being queued for indexing
  - Clicking Reset is walking over queued data and assembling it for submission to ES7 as a Bulk transaction and as Individual transactions.
  - Data is being submitted and indexed in ES7.
    - artefact (done)
    - block_instance (done)
    - collection (done, I think)
    - event_log (done, I think)
    - group (done)
    - interaction_forum_post (done)
    - interaction_instance (done)
    - usr (done)
    - view (done)
- Get data out of ES7
  - In progress
    - Get the top level search to return results
      - This is somewhat working now. The results are at a point where I need to take into account the user doing the search. Figuring out ACL now.
    - Ensure results take into account whether the user has access to see them
- Add reporting support
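To make the "Bulk transaction" step above a little more concrete, here is a minimal sketch of how queued records could be assembled into an ES7 Bulk API body. It is written in Python purely for illustration and is not the plugin's code: the index name `mahara`, the record fields, and the `<type><id>` document ID scheme are all assumptions. What it does rely on is the ES7 bulk format itself: newline-delimited JSON with one action line followed by one source line per document, no `_type` in the action, and a trailing newline.

```python
import json


def build_bulk_body(index_name, records):
    """Assemble an ES7 Bulk API body (NDJSON) from queued records.

    Each record is a plain dict with at least "type" and "id" keys.
    These field names are hypothetical, not the plugin's real schema.
    """
    lines = []
    for record in records:
        # Hypothetical ID scheme: main type plus database id, e.g. "usr1".
        doc_id = f"{record['type']}{record['id']}"
        # ES7 indices are typeless, so the action carries only _index and _id.
        action = {"index": {"_index": index_name, "_id": doc_id}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(record, sort_keys=True))
    # The Bulk API requires every line, including the last, to end in "\n".
    return "".join(line + "\n" for line in lines)


if __name__ == "__main__":
    queued = [
        {"type": "usr", "id": 1, "username": "admin"},
        {"type": "view", "id": 7, "title": "My portfolio"},
    ]
    print(build_bulk_body("mahara", queued), end="")
```

The resulting string would be sent as the body of a `POST /_bulk` request with the `Content-Type: application/x-ndjson` header. Submitting records individually instead would mean one index request per document.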
The Elasticsearch-PHP library
Just reading the Overview has already made it clear that we should stick with it. Key points are that it is a low-level client and adds "cluster state sniffing, round-robin requests, and so on". The latter is something we would have had to implement ourselves if we didn't use the library. With it being a low-level client, my concern that some things may be abstracted away has been alleviated.
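For a sense of what "round-robin requests" means in practice, the following is a rough sketch of the kind of host rotation we would otherwise have to write ourselves. It is not how Elasticsearch-PHP implements it; the class name and host URLs are made up for the example, and it ignores sniffing and dead-node handling entirely.

```python
from itertools import cycle


class RoundRobinHosts:
    """Hand out cluster hosts in a fixed rotating order (illustrative only)."""

    def __init__(self, hosts):
        hosts = list(hosts)
        if not hosts:
            raise ValueError("at least one host is required")
        self._hosts = cycle(hosts)

    def next_host(self):
        # Each call returns the next host, wrapping around at the end.
        return next(self._hosts)


if __name__ == "__main__":
    # Hypothetical node addresses.
    pool = RoundRobinHosts(["http://es1:9200", "http://es2:9200"])
    for _ in range(3):
        print(pool.next_host())
```

A real client also has to notice nodes going away and coming back, which is where the library's cluster state sniffing comes in.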
ElasticsearchDSL
The ElasticsearchDSL library is also looking good for querying Elasticsearch. It has not been touched in just over a year, and the issue queue is short with no real blockers in it from what I can see.
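As an illustration of the kind of query body a DSL builder would end up producing for the ACL work mentioned in the plan, here is a sketch of a full-text search wrapped in an access filter. It is plain Python building the raw JSON structure, not ElasticsearchDSL itself, and every field name (`title`, `description`, `owner`, `access.public`, `access.groups`) is an assumption rather than the plugin's actual mapping. The `bool`, `multi_match`, `term` and `terms` clauses are standard Elasticsearch query DSL.

```python
import json


def build_search_body(term, user_id, group_ids):
    """Build a search body that only matches documents the user may see.

    All field names are hypothetical placeholders for the real mapping.
    """
    # A document is visible if it is public, owned by the user,
    # or shared with one of the user's groups.
    access_clauses = [
        {"term": {"access.public": True}},
        {"term": {"owner": user_id}},
    ]
    if group_ids:
        access_clauses.append({"terms": {"access.groups": sorted(group_ids)}})

    return {
        "query": {
            "bool": {
                # Scored full-text match on the search term.
                "must": [
                    {"multi_match": {"query": term, "fields": ["title", "description"]}}
                ],
                # Non-scoring access check; at least one clause must match.
                "filter": [
                    {"bool": {"should": access_clauses, "minimum_should_match": 1}}
                ],
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(build_search_body("portfolio", 42, [3, 1]), indent=2))
```

Putting the access check in `filter` rather than `must` keeps it out of relevance scoring and lets Elasticsearch cache it, which seems a sensible default for a per-user visibility check.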