How the importer works (wip)

Calendar Importer Concepts

(I did not name these.)

This document gives a high-level view of the parts of PlaceCal involved in the calendar importer.

Models

Calendar Model

Holds the user's import configuration and any feedback we need to give them. It also holds the strategy option (how event locations are assigned) and related partner information.

Event Model

Holds the event information that is shown on the front end. It is tightly coupled to Calendar.

The worker

Importer

The code that wraps a single run of an import job. It is triggered by an ActiveJob task or manually through rake tasks. The Calendar model also uses it to check whether a user-supplied source URL is something we can import from. If something goes wrong during a job run, the Importer detects it and tries to put the Calendar into an error state so the user gets some feedback.

Event Resolver

Each incoming event has to be carefully constructed according to the calendar's strategy. Some events (from ICS feeds) have recurrence rules, so one 'source event' may map to many 'database event records': we store each repeat of an event as its own record in the database.
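To illustrate the one-to-many mapping, here is a minimal Ruby sketch assuming a simple "repeat weekly N times" rule. `SourceEvent`, `EventRecord` and `expand_weekly` are hypothetical names, and the real importer reads full ICS recurrence rules rather than this hand-rolled one.

```ruby
# Hypothetical stand-in for an incoming ICS event that repeats weekly.
SourceEvent = Struct.new(:uid, :summary, :dtstart, :dtend, :weekly_count)

# Hypothetical stand-in for one row in the events table.
EventRecord = Struct.new(:uid, :summary, :dtstart, :dtend)

WEEK = 7 * 24 * 60 * 60 # seconds in a week

# Expand one weekly-repeating source event into one record per occurrence.
# Every record keeps the source UID so a later import can find its siblings.
def expand_weekly(source)
  duration = source.dtend - source.dtstart
  Array.new(source.weekly_count) do |i|
    start = source.dtstart + i * WEEK
    EventRecord.new(source.uid, source.summary, start, start + duration)
  end
end

source = SourceEvent.new("abc@example.org", "Knit club",
                         Time.utc(2024, 1, 1, 18), Time.utc(2024, 1, 1, 20), 3)
records = expand_weekly(source)
records.map { |r| r.dtstart.strftime("%F") }
# => ["2024-01-01", "2024-01-08", "2024-01-15"]
```

Keeping the UID on every record is what lets the save step later scope all rows that belong to one source event.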

Parser

The Parser has two jobs: given a calendar source URL, find the appropriate parser for that domain; then download and extract the event data from the remote end. Once it has the data, it uses it to create IMPORTER events (not application model Events).
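A minimal Ruby sketch of the first job, choosing a parser by the source URL's domain. The parser classes and host lists are illustrative assumptions, not PlaceCal's actual parser list. A lookup like this could also serve as the "can this URL be imported?" check that the Calendar model needs.

```ruby
require "uri"

# Hypothetical base class: each parser lists the hosts it knows how to handle.
class BaseParser
  HOSTS = [].freeze

  def self.handles?(uri)
    host = uri.host.to_s
    self::HOSTS.any? { |h| host == h || host.end_with?(".#{h}") }
  end
end

class EventbriteParser < BaseParser
  HOSTS = ["eventbrite.com", "eventbrite.co.uk"].freeze
end

class MeetupParser < BaseParser
  HOSTS = ["meetup.com"].freeze
end

# Fallback: any URL whose path ends in .ics is treated as a plain ICS feed.
class IcsParser < BaseParser
  def self.handles?(uri)
    uri.path.to_s.end_with?(".ics")
  end
end

PARSERS = [EventbriteParser, MeetupParser, IcsParser].freeze

# Return the first parser class that claims the URL, or nil if none does.
def parser_for(url)
  uri = URI.parse(url)
  return nil unless uri.is_a?(URI::HTTP) # URI::HTTPS is a subclass of URI::HTTP
  PARSERS.find { |parser| parser.handles?(uri) }
rescue URI::InvalidURIError
  nil
end

# "Can this URL be imported?" reduces to "did any parser claim it?"
def importable?(url)
  !parser_for(url).nil?
end

parser_for("https://www.meetup.com/some-group/") # => MeetupParser
importable?("https://example.org/about")          # => false
```

Order matters in `PARSERS`: specific domains are checked before the generic `.ics` fallback.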

Event

The object that takes the disparate data coming in from the parser and presents it in a uniform manner to the EventResolver.

Tasks

Importer tasks live in the /lib/tasks/events.rake file and perform these basic actions:

  • Scan the Calendar table for calendars that are ready for import and push them into the background worker queue for processing.
  • Scan the Calendar table for calendars and run the import job for each one immediately, in process.
  • Run an import job on one calendar.
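The first task queues calendars that are ready for import; the second runs the job straight away, in process. Below is a rough Ruby sketch of that split. The readiness rule (never imported, or last imported over an hour ago), the `Calendar` struct and the plain `Thread::Queue` standing in for the ActiveJob queue are all assumptions for illustration.

```ruby
# Hypothetical stand-in for rows in the Calendar table.
Calendar = Struct.new(:id, :last_import_at)

IMPORT_INTERVAL = 60 * 60 # assumed: a calendar is due again after one hour

# Assumed readiness rule: never imported, or imported at least an interval ago.
def ready_for_import?(calendar, now)
  calendar.last_import_at.nil? || calendar.last_import_at <= now - IMPORT_INTERVAL
end

# Task 1: push the IDs of ready calendars onto the background worker queue.
def enqueue_ready(calendars, queue, now: Time.now)
  calendars.select { |c| ready_for_import?(c, now) }.each { |c| queue << c.id }
end

# Task 2: run the import job immediately, in this process, for each calendar.
def import_all_now(calendars, &job)
  calendars.each { |c| job.call(c.id) }
end

# Task 3 is just a single call: job.call(calendar_id).

now = Time.utc(2024, 1, 1, 12)
calendars = [Calendar.new(1, nil),
             Calendar.new(2, now - 30 * 60),     # imported 30 minutes ago
             Calendar.new(3, now - 2 * 60 * 60)] # imported 2 hours ago
queue = Thread::Queue.new
enqueue_ready(calendars, queue, now: now)
queue.size # => 2 (calendars 1 and 3)
```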

Basic algorithm

Starting in CalendarImporter:

  • find calendar
  • flag calendar as being in the worker
  • hand over calendar to CalendarImporterTask
  • catch any exceptions and mark calendar as being in an error state
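The CalendarImporter steps amount to a find, a flag, a hand-off and a rescue. A minimal Ruby sketch follows. The `Calendar` struct, its `state` values, the injected `calendars` hash and the task class are hypothetical stand-ins for the real model and CalendarImporterTask. The `critical_error` name is borrowed from the older notes below.

```ruby
# Hypothetical stand-in for the Calendar model's bookkeeping fields.
Calendar = Struct.new(:id, :state, :critical_error, keyword_init: true)

class CalendarImporter
  # calendars:  responds to #fetch(id), standing in for Calendar.find
  # task_class: its instances respond to #call (the CalendarImporterTask role)
  def initialize(calendars, task_class)
    @calendars = calendars
    @task_class = task_class
  end

  def run(calendar_id)
    calendar = @calendars.fetch(calendar_id) # find calendar
    calendar.state = :in_worker              # flag calendar as in worker
    @task_class.new(calendar).call           # hand over to the task
    calendar.state = :idle
    calendar.critical_error = nil
  rescue StandardError => e
    return unless calendar                   # lookup itself failed: nothing to flag
    calendar.state = :error                  # mark calendar as errored...
    calendar.critical_error = e.message      # ...with feedback for the user
  end
end

FailingTask = Struct.new(:calendar) do
  def call
    raise "source URL returned 404"
  end
end

calendars = { 1 => Calendar.new(id: 1, state: :idle) }
CalendarImporter.new(calendars, FailingTask).run(1)
calendars[1].state # => :error
```

Injecting the lookup and the task class keeps the error handling testable without a database.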

CalendarImporterTask:

  • find parser for calendar source
  • use parser to download raw event data
  • turn raw event data into EventResolver objects
  • go through each event
    • filter out if not relevant to us (is private, is in past)
    • determine location for calendar strategy
    • save event
  • remove any stale events from calendar
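The filter, save and stale-removal steps can be sketched in Ruby as follows. The parser is anything with an `#events` method, and the calendar's stored events are a plain array. "Stale" is assumed to mean an upcoming stored event whose UID no longer appears in the feed, in line with the older notes below. Location handling is left out, and all names here are hypothetical.

```ruby
require "set"

# Hypothetical shapes for an incoming importer event and a stored event row.
ImportedEvent = Struct.new(:uid, :start_time, :private, keyword_init: true)
StoredEvent = Struct.new(:uid, :start_time)

class CalendarImporterTask
  # parser: responds to #events; store: Array of StoredEvent for this calendar
  def initialize(parser, store, now: Time.now)
    @parser = parser
    @store = store
    @now = now
  end

  def call
    # Filter out events that are not relevant to us: private or in the past.
    relevant = @parser.events.reject { |e| e.private || e.start_time < @now }
    relevant.each { |e| save(e) }

    # Remove stale events: upcoming rows whose UID was not in this import.
    seen = relevant.map(&:uid).to_set
    @store.reject! { |row| row.start_time >= @now && !seen.include?(row.uid) }
    @store
  end

  private

  # Add a row unless one already exists for this UID and start time.
  def save(event)
    exists = @store.any? { |row| row.uid == event.uid && row.start_time == event.start_time }
    @store << StoredEvent.new(event.uid, event.start_time) unless exists
  end
end
```

Past rows are left untouched, so only upcoming events are made to track the feed.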

EventResolver:

  • determine location from strategy
    • event
    • event_override
    • place
    • room_number
    • no_location
    • online_only
  • save event
    • scope events with the same UID
    • find any events with non-matching start_times and end_times
      • destroy
    • find any event that has the same start and end times
    • OR make a new event
    • update event with new information
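Here is a Ruby sketch of both EventResolver steps, written against plain arrays instead of ActiveRecord. The meaning given to each strategy is an assumption inferred from the strategy names and the older notes (`event_override` in particular is a guess), and `Row`, `Incoming`, `resolve_location` and `save_event` are hypothetical names.

```ruby
# Hypothetical event row and incoming event; not PlaceCal's schema.
Row = Struct.new(:uid, :start_time, :end_time, :summary, :place)
Incoming = Struct.new(:uid, :summary, :location)

# Assumed semantics for each calendar strategy.
def resolve_location(strategy, event, calendar_place)
  case strategy
  when :place, :room_number then calendar_place          # calendar's preset place
  when :event then event.location                        # whatever the event says
  when :event_override then event.location || calendar_place # guess
  when :no_location, :online_only then nil
  else raise ArgumentError, "unknown strategy: #{strategy}"
  end
end

# occurrences: [start_time, end_time] pairs produced for this source event.
def save_event(rows, event, occurrences, place)
  scoped = rows.select { |r| r.uid == event.uid }        # scope by UID
  stale = scoped.reject { |r| occurrences.include?([r.start_time, r.end_time]) }
  rows.reject! { |r| stale.any? { |s| s.equal?(r) } }    # destroy non-matching times

  occurrences.each do |start_time, end_time|
    # Find a row with the same times, OR make a new one.
    row = scoped.find { |r| r.start_time == start_time && r.end_time == end_time }
    unless row
      row = Row.new(event.uid, start_time, end_time)
      rows << row
    end
    row.summary = event.summary                          # update with new information
    row.place = place
  end
  rows
end
```

Matching on exact start and end times means a rescheduled occurrence is replaced by a new row rather than edited in place.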

Calendar data extraction

Parsers

Parsers are chosen based on the source URL, so different URLs (domains) can be handled. A parser downloads the remote source payload, taking care of API keys and the response format. It then takes the resulting payload (JSON, XML, ICS, etc.) and instantiates the correct Event object (the importer Event, not the model).
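A minimal Ruby sketch of one parser's lifecycle: add an API key, download, then turn the payload into importer Event objects. The API, its `key` query parameter and its `events`/`id`/`name`/`start` fields are made up for illustration. The `fetch` callable stands in for the HTTP request so the example runs offline.

```ruby
require "json"
require "uri"

# Importer-side event (not the application model); fields are hypothetical.
ImporterEvent = Struct.new(:uid, :summary, :start_time, keyword_init: true)

class JsonApiParser
  # fetch: callable taking a URL string and returning the response body
  def initialize(url, api_key:, fetch:)
    @url = url
    @api_key = api_key
    @fetch = fetch
  end

  # Append the API key to any existing query string, then download.
  def download
    uri = URI.parse(@url)
    params = URI.decode_www_form(uri.query.to_s) << ["key", @api_key]
    uri.query = URI.encode_www_form(params)
    @fetch.call(uri.to_s)
  end

  # Decode the JSON payload and build one importer event per entry.
  def events
    JSON.parse(download).fetch("events").map do |raw|
      ImporterEvent.new(uid: raw["id"], summary: raw["name"], start_time: raw["start"])
    end
  end
end

body = '{"events":[{"id":"e1","name":"Coffee morning","start":"2024-05-01T10:00:00Z"}]}'
parser = JsonApiParser.new("https://api.example.org/v1/events?org=42",
                           api_key: "secret", fetch: ->(_url) { body })
parser.events.first.summary # => "Coffee morning"
```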

Events

Events exist to ingest raw event data from the Parser and present it in a uniform manner (see e.g. Events::Base#attributes). They also contain common event-parsing code, such as the HTML/Markdown sanitizer.
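A hypothetical Ruby re-creation of the pattern: each source-specific event class maps its own raw fields onto the same method names, and a shared `#attributes` builds one uniform hash from them. The class names, field names and the whitespace-only `clean` helper are assumptions. `clean` is only a stand-in and does not do what the real HTML/Markdown sanitizer does.

```ruby
module Events
  class Base
    def initialize(raw)
      @raw = raw # source-specific Hash, as handed over by a parser
    end

    # The uniform view handed on to the EventResolver.
    def attributes
      { uid: uid, summary: summary, description: description, start_time: start_time }
    end

    private

    # Shared helper: collapse runs of whitespace and trim the ends.
    def clean(text)
      text.to_s.gsub(/\s+/, " ").strip
    end
  end

  # Raw hash keyed like ICS properties.
  class IcsEvent < Base
    def uid;         @raw["UID"];                end
    def summary;     clean(@raw["SUMMARY"]);     end
    def description; clean(@raw["DESCRIPTION"]); end
    def start_time;  @raw["DTSTART"];            end
  end

  # Raw hash shaped like a made-up JSON API.
  class JsonApiEvent < Base
    def uid;         @raw["id"].to_s;                        end
    def summary;     clean(@raw["name"]);                    end
    def description; clean(@raw.dig("description", "text")); end
    def start_time;  @raw["start"];                          end
  end
end

ics = Events::IcsEvent.new({ "UID" => "x1", "SUMMARY" => "  Book   club ",
                             "DESCRIPTION" => "Bring\na book", "DTSTART" => "20240501T180000Z" })
ics.attributes[:summary] # => "Book club"
```

Because every subclass answers the same four methods, the code downstream of `#attributes` never needs to know which source an event came from.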

Older Notes:

The notes below are out of date; the importer has since been refactored out of calendar.rb.

Here is a pull request for a new importer that shows which files are added or affected when adding a new type of calendar.

(This covers the “import_events” function onwards. It's written as pseudocode and could probably be turned into a flowchart eventually.)

import_events

  • Grab a list of events into parsed_events
  • EXIT if parsed_events doesn’t contain any events
  • FOR each event:
    • Grab a list of occurrences we have seen within the given time period (a list of datetime objects? needs verification)
    • SKIP if the event data is set as private, or there are no occurrences in the list
    • Add the event data UID to a local list of event UIDs
    • Set the Partner of the event data to the Calendar’s Partner
    • IF the event’s location is either “set to a single preset location defined by the calendar” or “set to a single preset location, with the address field holding a room number”
      • Set the place_id of the event data to the Calendar’s place_id field
    • Otherwise, do (some complicated logic) to give us either a Partner or an Address for the location, and add the relation to the event data
    • Update notices with the error notice (if any) returned from either creating or updating an Event record for the event data we’ve constructed
  • Delete any events that start from today onwards if they are not in the list of event UIDs
  • Reload and update the Calendar's list of notices, last event checksum value and last import datetime, and unset critical_error (critical_error seems to mean “there is an error in the importer”)

create or update events

  • Takes the event data
  • Grab all current and future instances of the event matching the event data GID
  • If it is a recurring event, delete all the upcoming events whose start and end times are not in any of the newly scraped events