Developer Area/Import//Export/Import: Implementation Plan

From Mahara Wiki

Revision as of 17:55, 9 May 2011 by Brettwilkins

Note: This document describes how we're planning to import LEAP2A data. It's not final yet! - N

PluginImportLeap gets access to the absolute path of a leap2a.xml file. leap2a.xml files are allowed to use relative URLs to refer to other resources, so as long as those resources have been unpacked into the same directory, they will resolve correctly. Also: zipping up this content is NOT part of the spec - as per, things can be packaged in more than one way.
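A minimal Python sketch of resolving such a relative URL against the directory that holds leap2a.xml is below. The function name `resolve_leap_resource` is hypothetical, and the choices to ignore absolute URLs and to refuse paths that climb out of the unpacked directory are assumptions of this sketch rather than part of the plan.

```python
import os
from urllib.parse import unquote, urlparse


def resolve_leap_resource(leap_xml_path, href):
    """Map an href found in leap2a.xml to a file in the unpacked directory.

    Returns None when the href is not a local relative reference.
    """
    base_dir = os.path.dirname(os.path.abspath(leap_xml_path))
    parsed = urlparse(href)
    if parsed.scheme or parsed.netloc:
        # Absolute URL: not something that was unpacked alongside leap2a.xml.
        return None
    candidate = os.path.normpath(os.path.join(base_dir, unquote(parsed.path)))
    # Assumption: refuse references that escape the unpacked directory.
    if os.path.commonpath([base_dir, candidate]) != base_dir:
        return None
    return candidate
```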

Note: I'm not sure where validate_import_data fits into this. It's only used during send_content_ready, so any code we write to do importing might not even use it.



Given the file path, the document is parsed by an XML DOM parser. We load the whole document into memory for ease of processing. Hopefully this won't cause too many memory problems, but we should probably raise the memory limit while processing such a file, and make sure the plugin cleans up after itself.

We can check some things about the document to see how well formed it is:

  • is it an Atom feed? Does it declare xmlns:leap? xmlns:portfolio?
  • does it have an ?
  • does it have a list of s?
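The parse-and-check step might look something like this with Python's `xml.dom.minidom`, a DOM parser that loads the whole document into memory. The function name and the wording of the problems are hypothetical. This sketch checks only the Atom root element, the two namespace declarations and (as an assumption) the presence of Atom `<entry>` elements.

```python
from xml.dom import minidom

ATOM_NS = "http://www.w3.org/2005/Atom"


def check_leap2a_document(xml_text):
    """Parse the whole document into a DOM and list basic structural problems."""
    dom = minidom.parseString(xml_text)
    root = dom.documentElement
    problems = []
    # Is it an Atom feed?
    if root.namespaceURI != ATOM_NS or root.localName != "feed":
        problems.append("root element is not an Atom <feed>")
    # minidom keeps namespace declarations as xmlns:* attributes on the element.
    for prefix in ("leap", "portfolio"):
        if not root.hasAttribute("xmlns:" + prefix):
            problems.append("missing xmlns:" + prefix + " declaration")
    # Assumption: the "list of ..." check means Atom <entry> elements.
    if not root.getElementsByTagNameNS(ATOM_NS, "entry"):
        problems.append("feed contains no entries")
    return dom, problems
```

Calling `dom.unlink()` when the import is finished breaks the DOM's internal references so it can be garbage-collected, which is one way for the plugin to clean up after itself.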

first pass - get scores from plugins for each entry

The object of this pass is to find out all the possible ways that each entry can be converted into data in Mahara, and also find out how "good" each possibility is.

An artefact plugin could provide several different possible "strategies" for importing a given entry, all with different scores depending on how well the entry maps to the strategy.

Artefact plugins will need to keep a list of the strategies they can apply, independently of any entries. E.g., the file plugin may have "import as image" and "import as file" strategies. Each strategy may need some extra fields associated with it, such as an English-language description of what the strategy does. This will be necessary for the interactive import.
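One way to model a plugin's entry-independent strategy list is sketched below in Python. `ImportStrategy`, `FilePluginStrategies` and the strategy names are hypothetical illustrations of the "import as image" / "import as file" example, not existing Mahara code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ImportStrategy:
    """One way an artefact plugin can turn a LEAP2A entry into Mahara data."""
    name: str          # machine name recorded in the strategy listing
    description: str   # English description shown during interactive import


class FilePluginStrategies:
    """Hypothetical strategy registry for the file plugin, kept apart from any entry."""
    STRATEGIES = (
        ImportStrategy("import_as_image", "Import this entry as an image"),
        ImportStrategy("import_as_file", "Import this entry as a file"),
    )

    @classmethod
    def get(cls, name):
        for strategy in cls.STRATEGIES:
            if strategy.name == name:
                return strategy
        raise KeyError(name)
```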

In this step, entries are passed to artefact plugins one at a time. The plugins examine each entry and return a list of strategies that could apply to it, the score they want to give each strategy, and what other entries would be required to implement the strategy. Higher scores mean that the artefact plugin thinks the strategy will work better for the entry.

 strategy_listing = []
 for each entries as entry_id => entry
    for each artefact plugin as artefact_plugin
       // pass entry to artefact plugin - also pass DOM object so
       // artefact plugin can do XPath to work out what else is available
       // artefact plugin returns a list of (input_strategy, score, other_required_entries) tuples
       push strategy_listing[entry_id],
          (artefact_plugin, input_strategy, score, other_required_entries) for each tuple returned
    end for
 end for

The strategy listing is in the form:

      entryid => [
          (artefact_plugin, input_strategy, score, other_required_entries),
          (artefact_plugin, input_strategy, score, other_required_entries),
          ...
      ],
      entryid => [
          (artefact_plugin, input_strategy, score, other_required_entries),
          (artefact_plugin, input_strategy, score, other_required_entries),
          ...
      ]
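The first pass can be sketched in Python as follows. `ExamplePlugin`, its `strategies_for_entry` method and the dictionary-shaped entries are hypothetical stand-ins; a real plugin would inspect the LEAP2A entry (and the rest of the DOM via XPath) rather than a `type` key. Plugins are recorded by name to keep the tuples simple.

```python
from collections import defaultdict


class ExamplePlugin:
    """Hypothetical artefact plugin used only to illustrate the first pass."""
    name = "example"

    def strategies_for_entry(self, entry_id, entry, dom):
        # Returns (input_strategy, score, other_required_entries) tuples.
        if entry.get("type") == "image":
            return [("import_as_image", 100, []), ("import_as_file", 50, [])]
        return [("import_as_file", 50, [])]


def build_strategy_listing(entries, plugins, dom=None):
    """First pass: ask every plugin about every entry and collect the answers."""
    strategy_listing = defaultdict(list)
    for entry_id, entry in entries.items():
        for plugin in plugins:
            for input_strategy, score, required in plugin.strategies_for_entry(entry_id, entry, dom):
                strategy_listing[entry_id].append(
                    (plugin.name, input_strategy, score, tuple(required)))
    return dict(strategy_listing)
```

Entries for which no plugin offers a strategy simply have no key in the result.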


At this point, we now know all the possible things we could do with the export.

Note: Scores would be influenced by things like whether the entry conflicts with information the user already has. Profile fields are one example. If the user already has an introduction, we would want importing the entry as an introduction to score lower than importing it as a file. This is because:

  1. When importing into new user accounts, there won't be any conflicts, so the entry will import as profile information just fine.
  2. When doing an interactive import into an existing account, the field will show up with two strategies, with the default being to import as a file. We might even be able to introduce the concept of a 'conflict resolution default', so the user could say that by default the information in the import overwrites their current information.
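A hypothetical profile plugin could implement this conflict-aware scoring roughly as follows. The scores (100, 50, 25) are arbitrary, and `overwrite_by_default` is an assumed flag standing in for the 'conflict resolution default' idea; only the relative ordering matters.

```python
def score_introduction_entry(user_has_introduction, overwrite_by_default=False):
    """Hypothetical (input_strategy, score) pairs for an 'introduction' entry."""
    as_introduction = 100
    as_file = 50
    if user_has_introduction and not overwrite_by_default:
        # Conflict with existing data: drop below 'import as file' so the
        # default choice does not overwrite the user's current introduction.
        as_introduction = 25
    return [("import_as_introduction", as_introduction), ("import_as_file", as_file)]


def default_strategy(scored):
    """The highest-scoring strategy is the one offered as the default."""
    return max(scored, key=lambda pair: pair[1])[0]
```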

converting possibilities into a plan

At this point, we have to translate the possibilities we have been given into a plan for importing each entry. This can be done either interactively or non-interactively (automated import). With the interactive method, we present the possibilities to the user and ask them to help turn them into a plan. With the non-interactive method, we try to do this automatically.

non-interactive import

With the non-interactive import, we use the scores to plot a plan of attack. We sort the strategy listing we got previously from highest to lowest score, then take the strategies one by one, adding each entry plus its associated strategy data to the load mapping. After adding an entry, we add it and any other entries required by its strategy to an 'unusable' list. Before taking any later strategy, we check this list: if the strategy's entry, or any entry it requires, is already on the 'unusable' list, we skip it.

We should also examine each entry's strategies to determine whether more than one strategy shares the highest score. If so, this should be recorded for debugging purposes (this is not shown in the pseudocode).

 strategy_listing = [
          entryid => [(artefact_plugin, input_strategy, score, other_required_entries), ...],
          entryid => [(artefact_plugin, input_strategy, score, other_required_entries), ...]
 ]
 unusable_list = []
 load_mapping = []

 // flatten into (entry_id, strategy_data) pairs so that the strategies
 // for all entries are sorted together, highest score first
 candidates = all (entry_id, strategy_data) pairs in strategy_listing
 sort { b->strategy_data->score <=> a->strategy_data->score } candidates

 for each candidates as (entry_id, strategy_data)
    next if entry_id in unusable_list
    next if any of strategy_data->other_required_entries in unusable_list
    load_mapping[entry_id] = strategy_data
    push unusable_list, entry_id
    push unusable_list, all from strategy_data->other_required_entries
 end for

This means that while some entries may have strategies, they may be consumed by strategies for other entries that were higher scoring and required them. This is by design.
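The greedy non-interactive planner can be sketched in Python like this, including the tie recording mentioned above. It flattens the listing into (entry, strategy) pairs before sorting so that strategies for all entries are ranked together; the function name and the `(load_mapping, ties)` return shape are assumptions of this sketch.

```python
def plan_non_interactive(strategy_listing):
    """Greedy planner: the highest-scoring strategies claim their entries first.

    strategy_listing maps entry_id -> list of
    (artefact_plugin, input_strategy, score, other_required_entries) tuples.
    """
    candidates = [
        (entry_id, strategy)
        for entry_id, strategies in strategy_listing.items()
        for strategy in strategies
    ]
    # Highest score first; Python's sort stays stable with reverse=True.
    candidates.sort(key=lambda pair: pair[1][2], reverse=True)

    load_mapping = {}
    unusable = set()
    for entry_id, strategy in candidates:
        required = strategy[3]
        if entry_id in unusable or any(r in unusable for r in required):
            continue  # already planned, or consumed by a higher-scoring strategy
        load_mapping[entry_id] = strategy
        unusable.add(entry_id)
        unusable.update(required)

    # Record entries where more than one strategy shares the highest score.
    ties = []
    for entry_id, strategies in strategy_listing.items():
        if not strategies:
            continue
        top = max(s[2] for s in strategies)
        if sum(1 for s in strategies if s[2] == top) > 1:
            ties.append(entry_id)
    return load_mapping, ties
```

Because the sort is stable, strategies with equal scores are tried in the order the plugins reported them; whether that is the right tie-breaking rule is an open question, which is another reason to record ties.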

Interactive Import

With interactive import, we need to present a UI to the user to allow them to choose exactly how data is imported. We basically need to present the strategy listing, and allow the user to pick a strategy for each entry.

There are complications. Firstly, there may be many entries. Secondly, using some strategies for certain entries means you can't import other entries using their own strategies (this happens when a strategy requires more than one entry). Thirdly, the user probably doesn't want to do much tweaking, and will expect the defaults to be sensible. Fourthly, data from the import may conflict with data the user account already has (e.g. the introduction and other profile fields).

We can use the scoring system again to present reasonable defaults. Through clever UI design and JavaScript enhancement, we can hopefully do tricks like:

  • Hide entries that are being consumed by other chosen strategies (while telling the user how they can get access to the strategies for those entries)
  • Break up the list into things that want user input vs. things that have no choices and are ready to be imported
  • Allow the user to choose that they don't want to import an entry
  • Provide an easy way for the user to say "Done, start importing right now" without holding them up to make a choice they might not understand
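The first three UI tricks come down to a small bookkeeping function, sketched here in Python. The name `partition_for_ui` and the `chosen` mapping (entry id to the picked strategy tuple, or `None` when the user opts not to import that entry) are hypothetical.

```python
def partition_for_ui(strategy_listing, chosen):
    """Split entries into groups for a hypothetical interactive import screen.

    strategy_listing: entry_id -> list of (plugin, strategy, score, other_required_entries)
    chosen: entry_id -> picked strategy tuple, or None to skip that entry
    Returns (consumed, needs_input, ready, skipped) lists of entry ids.
    """
    # Entries consumed by the chosen strategies of other entries are hidden.
    consumed = set()
    for entry_id, strategy in chosen.items():
        if strategy is not None:
            consumed.update(r for r in strategy[3] if r != entry_id)

    needs_input, ready, skipped = [], [], []
    for entry_id, strategies in strategy_listing.items():
        if entry_id in consumed:
            continue
        if entry_id in chosen and chosen[entry_id] is None:
            skipped.append(entry_id)       # user chose not to import it
        elif len(strategies) > 1:
            needs_input.append(entry_id)   # there is a real choice to make
        elif strategies:
            ready.append(entry_id)         # only one way to import it
    return sorted(consumed), needs_input, ready, skipped
```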

second pass: load all entries into Mahara as per the load mapping

Now that we have a completed load mapping, we can use it to load all the entries. We go through the load mapping, passing each strategy and the entry elements it requires to the appropriate artefact plugin, which uses them to create and store the appropriate artefact(s).

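 for each load_mapping as entry_id => strategy_data
    // pass the entry, strategy_data->input_strategy and the entries in
    // strategy_data->other_required_entries to strategy_data->artefact_plugin,
    // which creates and stores the artefact(s)
 end for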


Destroy all objects created in RAM.
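A sketch of the second pass in Python, with the cleanup step in a `finally` block so the DOM is released even if one import fails. `second_pass`, `import_entry` and `RecordingPlugin` are hypothetical; the plan does not define the plugin method that receives the strategy.

```python
from xml.dom import minidom


def second_pass(load_mapping, entries, plugins, dom):
    """Hand each planned entry to the plugin that proposed its strategy.

    load_mapping: entry_id -> (plugin_name, input_strategy, score, other_required_entries)
    plugins: plugin_name -> object with a hypothetical import_entry() method
    """
    created = []
    try:
        for entry_id, (plugin_name, strategy, _score, required) in load_mapping.items():
            others = [entries[r] for r in required]
            created.extend(plugins[plugin_name].import_entry(entries[entry_id], strategy, others))
    finally:
        # "Destroy all objects created in RAM", even if an import raised.
        dom.unlink()
    return created


class RecordingPlugin:
    """Stand-in plugin that just records what it was asked to import."""

    def import_entry(self, entry, strategy, others):
        return [(strategy, entry, len(others))]
```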

Other considerations

Reporting of errors: we probably want to report things like tied (same-score) strategies when doing a non-interactive import.