Prepare to run studies against EMIS #8

inglesp · 2021-01-04T16:33:31Z

In order to run studies against EMIS with the same workflow that we use with TPP, we need to make changes to several parts of the system. This is my initial braindump of what we need to do and I'm sure I've missed things. Please edit to add your notes, and then let's discuss in a call on Tuesday or Wednesday this week.

job-server (@ghickman)
- users need to be able to select which backend they run jobs against
  - initially, we want to hide the ability to run against EMIS by default, and have specific opt-ins for supported repos. Longer term we will need to bake-in the concept of which backends are required or able to run a given study definition (see below). Initial implementation could just be a database flag which is false by default and we manually enable for specific repos.
- /status/ will need to show information about multiple backends
- Think about error reporting / sentry / etc
- questions:
  - should a user be able to run a job against multiple backends? (@sebbacon's best placed to know what users will expect)
    - my (Seb's) view is that UI support would be via an extension of the usual "run" mechanism; "run all" would select everything as it currently does, but there are now N * M tickboxes where N is backends and M is actions. Users could uncheck a specific backend if they wanted (actions would be grouped visually by backend in the UI); if they want to completely exclude a backend from the UI they'd do this by editing their project.yaml (see below). "Supported backends" would show somewhere obvious on a workspace header.
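
A minimal sketch of the "flag which is false by default" idea, in plain Python rather than the real job-server models. The `Repo` class, the `emis_enabled` field and the lowercase backend names are all hypothetical and only here for illustration:

```python
from dataclasses import dataclass

# Backends every repo can use without any opt-in (assumed to be just TPP).
DEFAULT_BACKENDS = ("tpp",)


@dataclass
class Repo:
    url: str
    # Hypothetical opt-in flag: off by default, switched on by hand
    # for the specific repos we support on EMIS.
    emis_enabled: bool = False


def selectable_backends(repo):
    """Return the backends a user may pick when running jobs for this repo."""
    backends = list(DEFAULT_BACKENDS)
    if repo.emis_enabled:
        backends.append("emis")
    return backends
```

With this shape, the UI only needs to ask for `selectable_backends(repo)` and can hide the EMIS option whenever the list does not contain it.
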
job-runner (@evansd)
- possibly no code changes?
- longer term we will need to bake in the concept of which backends are required or able to run a given study definition. This will probably be on a per-column basis; you might be able to extract patient ages in EMIS but not SGSS status, for example. Users should also be able to define which backends are included or excluded in their project.yaml, e.g. to skip the TPP backend completely (because they just don't need it, so the run is faster)
- we'll need to flesh out playbooks/EMIS.md
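
To make the per-column idea concrete, here is a rough sketch of how the set of usable backends could be worked out. The capability table is made up from the example above (ages everywhere, SGSS status only in TPP), and the `include` / `exclude` arguments stand in for whatever we end up reading from project.yaml; none of these names exist yet:

```python
ALL_BACKENDS = {"tpp", "emis"}

# Illustrative only: which backends can supply which column.
COLUMN_SUPPORT = {
    "age": {"tpp", "emis"},
    "sgss_status": {"tpp"},
}


def backends_for_study(columns, include=None, exclude=()):
    """Return the sorted backends able to supply every requested column.

    `include` restricts the candidates (None means "all known backends");
    `exclude` removes backends the user has opted out of.
    """
    candidates = set(ALL_BACKENDS if include is None else include) - set(exclude)
    return sorted(
        backend
        for backend in candidates
        # Unknown columns are treated as supported nowhere.
        if all(backend in COLUMN_SUPPORT.get(col, set()) for col in columns)
    )
```

A study asking for both `age` and `sgss_status` would resolve to TPP only, while one asking for `age` alone with TPP excluded would resolve to EMIS only.
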
EMIS infrastructure (@bloodearnest)
- user management (lower priority)
  - set up unix groups for level 2/3/4 access
  - create user accounts
    - level 2 access should be limited to a handful of engineers with NHSE contracts
    - level 3 access should be limited to researchers with appropriate NHSE contracts
    - level 4 access can be wider
      - what are the requirements here?
  - set up directory structure necessary for high/medium privacy outputs
- ensure that we can pull repos from GH
- ensure that we can push output to GH
- ensure users can install opensafely cli tool and os-release script (see docs.opensafely.org)
- harden and buildout software installations
  - consider asking for an Ubuntu container within which we have root?
  - job runner as a service - systemd?
  - scripted installation - at least the basics we can build on, supplemented by installation narrative if needed as stopgap
  - backups if necessary
  - log rotation, disk space monitoring, root cron emails setup, etc, if we have root access. If not, conversation with EMIS about what they've set up
- work out how to support viewing, editing, publishing outputs
  - For viewing outputs, a web browser should be fine (pdfs, svgs, html and text)
  - For diffing outputs, command line git may be sufficient but a visual tool would be ideal: GitHub Desktop or, at a push, gitweb?
  - For redacting outputs, a text editor is needed. We could consider mandating VS Code for simplicity, but worth canvassing
  - For publishing outputs, we may need to install GitHub Desktop, although again we might be able to mandate command line git (if we provide adequate documentation)
  - I think it boils down to either (a) provisioning a Windows review server with access to L4 data or (b) providing a web browser and expecting command line tool usage. And I think (a) is probably unavoidable
- questions:
  - can we be responsible for creating user accounts?
cohort-extractor (@inglesp)
- questions:
  - do we need to pass EMIS_ORGANISATION_HASH in as an environment variable, or can it be hard-coded?
  - how do we support studies that use backends with different coding systems?
    - it would be wonderful if we could use SNOMED in TPP
  - how can we support more features from the TPP backend (eg date expressions) in EMIS?
  - is there a half-way house for things we're willing to implement now while we await refactor?


sebbacon · 2021-01-05T13:59:40Z

Discussion so far:

The simplest thing that works is to provide the value of the BACKEND environment variable to actions.

As this gives people a gun which they can aim at their feet, this should come with best practice guidance:

- How to write a study definition with conditionals
- How to structure your pipeline so most of its actions are indifferent to which backend they are run on (this might be "write a normalising action as the second pipeline step", for example)
- How to test against different backends
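
As a sketch of what a "normalising action" could look like: it reads BACKEND once and maps backend-specific column names onto a common set, so later actions don't need to care. The column names and the lowercase `tpp` / `emis` values are assumptions for illustration, not what the backends actually emit:

```python
import os

# Hypothetical per-backend renames onto a shared column vocabulary.
COLUMN_RENAMES = {
    "tpp": {},
    "emis": {"patient_age": "age"},
}


def normalise_row(row, backend=None):
    """Rename backend-specific keys in `row` to the shared names.

    Falls back to the BACKEND environment variable when no backend is
    passed; raises KeyError if it is unset or names an unknown backend.
    """
    if backend is None:
        backend = os.environ["BACKEND"]
    renames = COLUMN_RENAMES[backend]
    return {renames.get(key, key): value for key, value in row.items()}
```

Taking `backend` as an optional argument means the same function can be tested locally against each backend without touching the environment.
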

ACTION: write up some example project.yaml files and consider the implications for implementation, particularly regarding complicating dependency resolution. You would want to take a tree of dependencies spanning multiple backends and project it into a single tree as soon as possible, then inspect that.

ACTION: We should also add a top-level run_on (or similar) key to project.yaml which can default to tpp (for backwards compatibility).
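
A rough sketch of the backwards-compatible default, operating on an already-parsed project.yaml (a plain dict). The key name `run_on` is still up for discussion, and accepting either a single string or a list is an assumption on my part:

```python
def backends_to_run_on(project):
    """Return the list of backends named by the top-level `run_on` key.

    Projects without the key keep today's behaviour and run on TPP only.
    """
    value = project.get("run_on", "tpp")
    if isinstance(value, str):
        # Allow the shorthand `run_on: emis` as well as a list.
        value = [value]
    return list(value)
```

Existing projects need no edits under this scheme, since a missing key resolves to `["tpp"]`.
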

ACTION: We need to consider that publishing outputs will often result in name collisions; the os-release script should be updated to namespace with the backend.

ACTION: Also consider including the backend in job-runner output paths; will "backends exist" help as a concept for people to understand when doing local runs? There is also an argument for including the backend in filenames to help analysts when they are manually combining and comparing outputs from different backends.
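
To compare the two namespacing options side by side, here is a small sketch; both helper names are made up, and neither reflects how os-release or job-runner lay out files today:

```python
from pathlib import PurePosixPath


def namespace_by_directory(path, backend):
    """Put the file in a per-backend subdirectory next to where it was."""
    p = PurePosixPath(path)
    return str(p.parent / backend / p.name)


def namespace_by_filename(path, backend):
    """Keep the directory, but add the backend to the file name itself."""
    p = PurePosixPath(path)
    return str(p.with_name(f"{p.stem}_{backend}{p.suffix}"))
```

The directory variant avoids collisions while leaving file names untouched; the filename variant keeps the backend visible even after an analyst copies files out of their original folders.
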

ACTION: wireframe the resulting UI to help users visualise what they've requested when multiple backends are involved

ACTION: sysadmin side should be scripts in opensafely/sysadmin repo and sufficient accompanying playbook documentation to make it easy for us to set it up again

ACTION: consider RDP server & ssh authentication in EMIS environment; test Github Desktop in that environment

sebbacon changed the title ~~Preprare to run studies against EMIS~~ Prepare to run studies against EMIS Jan 5, 2021
