The importer fetches data from Sisu using the export APIs (see the API documentation linked below for examples). Of the export API parameters, the importer uses only `since` (modification ordinal) and `limit`. The fetched data is processed in mankelis and saved to a PostgreSQL db. New data is fetched from Sisu once an hour.
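As a rough sketch, a single export API call could look like the one below. The endpoint path is a made-up placeholder (the real endpoints are listed in the API docs linked further down); only the `since` and `limit` query parameters reflect what the importer actually sends:

```bash
# placeholder endpoint; only since/limit match the importer's real usage
curl "https://sis-helsinki.funidata.fi/kori/api/<export-endpoint>?since=0&limit=1000"
```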
For a step-by-step overview of how the importer works, see the following READMEs.
Connect to the db: `docker exec -it sis-importer-db psql -U dev -h importer-db -d importer-db`. The password is `dev`.
- Document state: the `documentState` field, present in all data, defines whether the data should be used or not. The importer mainly ignores any data other than `ACTIVE` (`DRAFT` and `DELETED` should be ignored in most cases).
- Snapshot vs. regular data: for snapshot data, `snapshotDateTime` marks the start of validity for the snapshot, and the end of validity is defined by the `snapshotDateTime` of a possible later snapshot. This means that to find the correct instance of a given object, one needs the most recent non-future snapshot date time with an active document state. For regular data, the equivalent is the greatest modification ordinal (= the newest version of the object) with an active document state. With snapshot data the db contains multiple rows (versions) of the object, whereas with regular data only the latest version is present. See the query sketch below.
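As an illustration, resolving the effective snapshot rows could look roughly like the query below. The table and column names are hypothetical placeholders, not the importer's actual schema:

```bash
# sketch only: table/column names are placeholders, not the real schema
docker exec -it sis-importer-db psql -U dev -h importer-db -d importer-db -c "
  SELECT DISTINCT ON (id) *
  FROM some_snapshot_table
  WHERE document_state = 'ACTIVE'
    AND snapshot_date_time <= now()
  ORDER BY id, snapshot_date_time DESC;
"
```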
- Manage production and staging environments: use the scripts `run_scaled.sh` and `wipe_ordinals.sh`, located on the server, to manage the importer. It is important that the mankelis are scaled properly in production.
- `https://importer.cs.helsinki.fi/exploder/reset/:table?token=` deletes a single table and triggers a fetch. See the tables here and the token from the server.
- `https://importer.cs.helsinki.fi/exploder/force_update?token=` triggers a fetch for all tables (examples below).
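For example, assuming `persons` is one of the tables (a placeholder here) and `<token>` is the token from the server:

```bash
# delete a single table and trigger a fetch for it (placeholder table name)
curl "https://importer.cs.helsinki.fi/exploder/reset/persons?token=<token>"

# trigger a fetch for all tables
curl "https://importer.cs.helsinki.fi/exploder/force_update?token=<token>"
```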
- To add new fields to be fetched from Sisu: modify the message handlers. Remember to add any new columns to the importer-mankeli models.
- To fetch a new model from Sisu: create a new message handler and service. Finally, add the new service to index.js and test locally that importing works.
- Debug data coming from a specific channel:
  - First, comment out all other channels in the serviceIds folder in this file.
  - Then, go to importer-mankeli/src/debug. Add a custom debug handler into the customHandlers folder: see the existing custom handlers for how to do it.
  - Add your handler to the array in the index.js file in the debug folder.
  - Run with `npm start`. It will log to the console. If you want, make it write the relevant logs to a file for a much better experience (see the sketch below) - and please push that code to the repo too.
  - Remember to remove any sensitive identifiers from your matcher before pushing code. Also remove the handler from the array, but you can leave the file in the repository if it may be useful in the future.
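One simple way to get the logs into a file, as suggested above (the file name is just an example):

```bash
# keep the console output and write a copy to a file for easier inspection
npm start 2>&1 | tee mankeli-debug.log
```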
- ORI (student data) https://sis-helsinki.funidata.fi/ori/docs/index.html
- KORI (course data) https://sis-helsinki.funidata.fi/kori/docs/index.html
- ILMO (course enrolments) https://sis-helsinki.funidata.fi/ilmo/docs/index.html
- OSUVA (study plans) https://sis-helsinki.funidata.fi/osuva/docs/index.html
- ARTO (assessments) https://sis-helsinki.funidata.fi/arto/docs/index.html
- See the initial setup in https://version.helsinki.fi/toska/dokumentaatio/-/blob/master/guides/how_to_sis-importer_locally.md - it contains too much secret data to be included here.
`npm start` will start the application scaled to 3 mankelis.

Can't wait? Populate the db with `./populate-db.sh`. The script downloads a daily dump. Go to the importer and run the backup script if you need the edgest of the edge.
- To start the db, adminer, and db-api: `./run.sh db up`
- Shutting down: `./run.sh down`
- Clearing everything: `./run.sh morning`

Importer uses a docker network called `importer_network` - you can get it by running `./run.sh db up`.
Add the following to your non-importer project's docker-compose.yaml to include it in the importer network! It may require further config in the individual project.
docker-compose.yaml:

```yaml
networks:
  default:
    external:
      name: importer_network
```
Now accessing importer-db from the other project will work. If that is too much work, you can change the default network of the importer similarly to the other projects. In that case, please do not commit the change here.
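A quick sanity check (not part of the official setup) to confirm the network exists and see which containers are attached to it:

```bash
# inspect the shared network before wiring another project into it
docker network inspect importer_network
```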
The importer is idempotent. You can delete data, run it multiple times, whatever - after running, it will always generate the same db.
It keeps the status of individual data in redis (see reset-ordinals). You can connect to the redis instance while it is running with

`$ docker exec -it <redis-container> redis-cli`

By setting `LATEST_XXXX_ORDINAL` to 0 (e.g. `set LATEST_ENROLMENT_ORDINAL "0"`) you can refresh data, or similarly skip data by setting the value high enough. What a good number is depends on the data.

You can use reset-ordinals.sh as an example; it resets all of the ordinals to 0.
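For example, to re-fetch all enrolment data from the beginning on the next cycle (`<redis-container>` is whatever your redis container is named):

```bash
# reset a single ordinal; the next run re-imports enrolments from scratch
docker exec <redis-container> redis-cli set LATEST_ENROLMENT_ORDINAL "0"
```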
Currently only importer-db-api has some tests. If you want to run them locally, you can run these commands:

- Start the services: `docker compose -f docker-compose.ci.yml -f docker-compose.test.yml up -d`
- Run the tests: `docker exec -it sis-importer-db-api npm test`. If you want to run the tests in watch mode, add `-- --watchAll` to the end of the command (full command below); this will automatically rerun the tests when you make changes to the code and save them.
- Stop the services: `docker compose -f docker-compose.ci.yml -f docker-compose.test.yml down`
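Putting the pieces above together, the watch-mode run looks like this:

```bash
# rerun the db-api tests automatically whenever you save code changes
docker exec -it sis-importer-db-api npm test -- --watchAll
```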
If one wants to increase the speed of the importer when developing, set the flag `SONIC` to `1` here.
Adminer is available at http://localhost:5051/?pgsql=importer-db&username=dev&db=importer-db&ns=public
All three services (api, mankeli, and db-api) go through individual staging and production GitHub Actions workflows, defined in .github/workflows. Docker images are tagged as `production` and `staging` and are pulled automatically into the production and staging environments.