Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Merge pull request #61 from manics/june2024-notes
June 2024 notes Keynote and Session 1
Showing 7 changed files with 247 additions and 83 deletions.
17 changes: 6 additions & 11 deletions
docs/events/wg_workshops/2024-06-05-june-meeting/discussion-data-access-pysyft.md
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,17 +1,12 @@ | ||
# Facilitating researcher access to data with PySyft | ||
# Breakout: Facilitating researcher access to data with PySyft | ||
|
||
**Leads**: Dave Buckley (OpenMined) | ||
|
||
## Proposal | ||
## Notes | ||
|
||
### Summary | ||
Dave provided a quick overview of OpenMined as an organisation and their flagship open source product, PySyft | ||
|
||
[OpenMined](https://openmined.org/) will present their [PySyft notebook](https://github.com/OpenMined/PySyft) which allows to "Perform data science on data that remains in someone else's server" | ||
[OpenMined](https://openmined.org) | ||
[PySyft](https://github.com/OpenMined/pysyft) | ||
|
||
### Preparation | ||
|
||
No required preparation beyond an open mind! | ||
|
||
### Target audience | ||
|
||
No specific target audience in mind - anybody interested! | ||
The aim is to facilitate "remote data science" cf. OpenSAFELY, by providing a framework for such services |
58 changes: 42 additions & 16 deletions
docs/events/wg_workshops/2024-06-05-june-meeting/discussion-what-words-to-use.md
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,19 +1,45 @@ | ||
# New research: language to use when explaining SDEs and TREs to the public | ||
# Breakout: New research: language to use when explaining SDEs and TREs to the public | ||
|
||
**Leads**: Emma Morgan (Understanding Patient Data) | ||
|
||
## Proposal | ||
|
||
### Summary | ||
|
||
[Understanding Patient Data](https://understandingpatientdata.org.uk/)(UPD) has recently published their [final report](https://understandingpatientdata.org.uk/what-words-use) on the What Words To Use project with Research Works, which focused on exploring the best language to use when explaining Secure Data Environments and Trusted Research Environments to the public. | ||
|
||
During the event UPD will make a 20 minutes presentation on the project and its results, followed by an open discussion with the community. | ||
|
||
### Preparation | ||
|
||
No required preparation beyond an open mind! | ||
|
||
### Target audience | ||
|
||
No specific target audience in mind - anybody interested! | ||
[Understanding Patient Data](https://understandingpatientdata.org.uk/)(UPD) has recently published their [final report](https://understandingpatientdata.org.uk/what-words-use) on the _What Words To Use_ project with Research Works, which focused on exploring the best language to use when explaining Secure Data Environments and Trusted Research Environments to the public. | ||
|
||
## Notes | ||
|
||
- Emma took us through the UPD project: how to explain TREs and related terms to the public, and how to generate some explainer materials. | ||
- Part 1: Rapid evidence review | ||
- Patients supportive of direction to data access through TREs | ||
- limited evidence on specific aspects of TREs | ||
- Commercial use of data sometimes controversial | ||
- Comms around TREs: explaining TREs is hard | ||
- Lack of consistency in terms used (SDE, TRE, etc.). The variety of names is confusing and needs to be resolved | ||
- 5 Safes useful as conceptual basis | ||
- Benefits of data use key | ||
- Don't assume prior knowledge | ||
- Part 2: workshops | ||
- 7? workshops, 6 participants each, tried to provide a good demographic mix (age, ethnicity, gender, digital exclusion) | ||
- People care about: Is the data identifiable? Who has access? Reassurance that the data is safe. What the data is being used for, and for what purpose/benefit. | ||
- Some consensus in preferences over the use of certain terms/language. | ||
- Part 3: Explainer materials/draft resource: different tiered 'levels' of information for different levels of interest | ||
- 2x workshops | ||
- Interviews with domain expertise to fact-check | ||
- 1st level: Concise description of TRE/SDEs | ||
- 2nd level: Animation being prepared w/story board and voiceover | ||
- 3rd level: more detailed info on specific terms (e.g. 5 Safes) | ||
|
||
## Discussion | ||
|
||
- How might you use this information/resource? | ||
- Honest broker service in NI, will flag this report with team who are leading on some work on public transparency (funding from UKRI). Liked the way the materials are adaptable for own use | ||
- Works in HDR Global, lower and middle income countries, lots of interest in TREs there. Work could be useful across these different regions, approach could be taken and tested across different regions. | ||
- RDS released a TRE explainer, will tweak it to reflect some of the findings from this work (overuse of the term 'de-identified'). | ||
- Concerns about methodology, findings or resources that would limit you adopting them? | ||
- What do you think about the balance between transparency and accessibility? | ||
- What other topics related to TREs would benefit from PPIE? | ||
|
||
## Summary | ||
|
||
Presentation then discussion with positive feedback. | ||
There will be an animation that can be voiced over by different TREs with their specifics, accents... | ||
|
||
Concerns about resources: trying to make something for everyone but there will always be gaps |
35 changes: 35 additions & 0 deletions
docs/events/wg_workshops/2024-06-05-june-meeting/keynote-crick-tre.md
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,35 @@ | ||
# Keynote: Crick's TRE: history and approach | ||
|
||
<iframe width="560" height="315" src="https://www.youtube.com/embed/1FqVEP0OVlY?si=9OoPOnnTe90sAvv6" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen></iframe> | ||
|
||
## Q&A | ||
|
||
- How cloud agnostic is it? | ||
- Azure, AWS and Google Cloud, but not beyond that (limited to what Snowflake supports) | ||
- How much of what we saw is in operation? Everything, working for a year and a half | ||
- focus of the process on legal and agreement aspects via forms | ||
- Can you explain the metadata side of the platform, e.g. what software is being used? | ||
- Snowflake offers standard data model for all objects in all accounts | ||
- Projects are created in Snowflake sub-accounts | ||
- Data brought out of Snowflake, collated, and replicated as one outside of Snowflake | ||
- You have a lot of integration across providers: IBM, Microsoft and AWS. How have you overcome interoperability challenges? | ||
- Lots of these integrations are at a data level, which is easier than a functionality level for those tools | ||
- Do you ever act as data processors on behalf of data providers or is this effectively a data hosting service (where the data comes processed by the data providers/controllers)? | ||
- Both. If you are a data processor in your own right you could bring your own "bedroom" and retain control of it | ||
- Any idea of cost? | ||
- https://www.snowflake.com/en/data-cloud/pricing-options/ | ||
- How are you managing audit and compliance across the three cloud platforms? | ||
- Snowflake does it! Objects are created by Snowflake on the three clouds, ensuring compliance | ||
- Am I right in thinking you only host consented data? | ||
- How do users see what they are spending in the platform, or how many of their credits they have used? | ||
- Automatic threshold detector for every processing compute; a message is sent depending on the threshold (e.g. 75%) | ||
- Who are the Roles (Loader, Processor, etc.) assigned to: study team members, central services, IG specialists? | ||
- They are assigned by the accountable owners (by authorising emails or actions on ServiceNow) of each collaborating partner or the board for the collaboration account. So a human can have many roles if approved by their organisation or the project. | ||
- Can you provide a quick overview of Snowflake and the core features it provides? | ||
- Their website is well documented: I would try these links.... | ||
- https://docs.snowflake.com/en/user-guide/organizations | ||
- https://docs.snowflake.com/en/user-guide/admin-account-identifier | ||
- https://docs.snowflake.com/en/guides-overview-sharing#label-about-direct-share | ||
- https://other-docs.snowflake.com/en/collaboration/provider-listings-auto-fulfillment | ||
- https://other-docs.snowflake.com/en/collaboration/collaboration-listings-about | ||
- https://docs.snowflake.com/en/sql-reference/commands-user-role |
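
The credit-threshold alerting mentioned in the Q&A could look roughly like the sketch below. This is an illustration only, not the Crick or Snowflake implementation: the function names, the set of alert levels, and the message format are all assumptions, and only the 75% level comes from the notes.

```python
from typing import Optional

# Hypothetical alert levels; the notes only mention 75% as an example.
ALERT_THRESHOLDS = (0.5, 0.75, 0.9)


def highest_threshold_crossed(used: float, budget: float) -> Optional[float]:
    """Return the highest alert threshold reached by current usage, or None."""
    if budget <= 0:
        raise ValueError("budget must be positive")
    fraction = used / budget
    crossed = [t for t in ALERT_THRESHOLDS if fraction >= t]
    return max(crossed) if crossed else None


def usage_message(project: str, used: float, budget: float) -> Optional[str]:
    """Build a notification message if a threshold has been crossed."""
    threshold = highest_threshold_crossed(used, budget)
    if threshold is None:
        return None
    return (
        f"{project}: {threshold:.0%} of credit budget reached "
        f"({used:g} of {budget:g} credits used)"
    )
```

A real deployment would presumably read usage from the platform's own metering and send the message by email or a ticketing system; both are left out here.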
45 changes: 31 additions & 14 deletions
docs/events/wg_workshops/2024-06-05-june-meeting/workshop-data-processing-tools.md
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,26 +1,43 @@ | ||
# Data processing tools | ||
|
||
Workshop | ||
# Workshop: Data processing tools | ||
|
||
**Leads**: James Friel (University of Dundee), Aida Sanchez (UCL) | ||
|
||
## Proposal | ||
Discussion around data processing, de-identification, and cohort building. | ||
|
||
## Required preparation | ||
|
||
A general understanding of data anonymisation. | ||
The ICO anonymisation guidance & the ADF (anonymisation decision making framework) may be of interest as a grounding in this. | ||
|
||
## Target audience | ||
|
||
People who work in data de-identification and data providers for TREs | ||
|
||
### Prompts | ||
## Prompts | ||
|
||
- Risk appetite to deposit data in a TRE - What level on de-identification is comfortable for use within a TRE? e.g truncation, pseudo-anonymization | ||
- Risk appetite to deposit data in a TRE - What level of de-identification is comfortable for use within a TRE? e.g truncation, pseudo-anonymization | ||
- What do current data processing pipelines look like? And are there pain points in the process? | ||
- What De-identification tools are being used? What has worked? What hasn't? | ||
- What de-identification tools are being used? What has worked? What hasn't? | ||
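
To make the first prompt concrete, here is a minimal sketch of the two techniques it names, truncation and pseudonymisation. It is not one of the tools discussed in the workshop: the function names and the choice of HMAC-SHA256 are assumptions for illustration, and whether either step is sufficient depends on the data and the risk assessment.

```python
import hashlib
import hmac


def truncate_postcode(postcode: str) -> str:
    """Keep only the outward code (the part before the space) of a UK postcode."""
    return postcode.strip().upper().split(" ")[0]


def truncate_date_to_year(iso_date: str) -> str:
    """Reduce an ISO date (YYYY-MM-DD) to its year."""
    return iso_date[:4]


def pseudonymise(identifier: str, secret_key: bytes) -> str:
    """Replace an identifier with a keyed hash (HMAC-SHA256).

    The same identifier and key always give the same pseudonym, so records
    can still be linked, but the key must be held securely by the data provider.
    """
    return hmac.new(secret_key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()
```

Truncation reduces precision in the value itself, while the keyed hash hides the value but keeps it linkable across tables that use the same key.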
|
||
### Summary | ||
## Notes | ||
|
||
Intro discussion around data processing, de-identification , and cohort building. | ||
CPRD Clinical Practice Research Datalink | ||
|
||
### Preparation | ||
- https://www.cprd.com/cprd-tre-features-guide-users | ||
|
||
A general understanding of data anonymisation. | ||
The ICO anonymisation guidance & the ADF (anonymisation decision making framework) may be of interest as a grounding in this. | ||
Canon: have non-open-source tools (DICOM, FHIR, CSV, Free Text, 'omics, Pathology) | ||
|
||
- Only available via agreement with https://research.eu.medical.canon/ | ||
|
||
NetCDF, ArcGIS Enterprise, 100+ TB data, Spark to process data | ||
|
||
- Provide data to federated TREs | ||
|
||
Plans for using OpenShift. Possible batch schedulers: | ||
|
||
- https://www.coreweave.com/blog/sunk-slurm-on-kubernetes-implementations | ||
- https://kueue.sigs.k8s.io/ | ||
|
||
### Target audience | ||
## Summary | ||
|
||
People who work in data de-identification and data providers for TRES | ||
General discussion of approaches and tools used |