Commit

Merge pull request #28 from uk-tre/8-september-report
Initial report commit
Davsarper authored Jan 30, 2024
2 parents 8b00a7e + f8ee83f commit 212369f
Showing 17 changed files with 1,214 additions and 139 deletions.
6 changes: 6 additions & 0 deletions docs/conf.py
@@ -42,6 +42,12 @@
}

# -- Link checker configuration

linkcheck_ignore = [
    # GitHub CI linkchecker seems to be blocked
    r"https://www.turing.ac.uk/.*"
]

# These pages use in-page JavaScript anchors which aren't seen by the link checker
linkcheck_anchors_ignore_for_url = [
    r"https://www\.swansea\.ac\.uk/the-university/location/",
@@ -19,7 +19,6 @@ _Chair: Will Crocombe (RISG Consulting)_
- 3 - weak pseudo
- 4 - public
- Dropping down tiers, things become easier. Turing paper on this - Sheffield used this as the basis of their system for assessing risk.
- https://zenodo.org/record/7754459
- [Alan Turing Institute paper](https://arxiv.org/pdf/1908.08737.pdf)
- Importance of agreed risk classification with federation, and agreement on risk appetite
- [NIST RMF](https://csrc.nist.gov/projects/risk-management/about-rmf)
@@ -0,0 +1,75 @@
# Current state of the art re data linkage/federation/AI&ML&LLM across infrastructures: federation, governance, safe output methods

## Overview

### Summary

Issues about federation of datasets were discussed, including identifying different datasets across multiple systems, how to collect identifiable information robustly, and how we can link up different approaches across the 4 nations effectively.

There was further discussion on how to effectively check ML models within TREs.

In the case of governance, it was suggested that a project working across multiple TREs should have one singular governance process.

### Next steps

- Create a 'panel' focused on a specific type of data/research (e.g. health, crime, financial) that can oversee specific research projects within these fields

## Raw notes

### Data Linkage

#### How do you go about the NHS Number?

- Uses NHS Standard NF5, after 3 they went to manual to track through the system.
- Issues with health and non-health data

#### Names such as Dave / David can cause problems.

- Linksmart is a solution for this.
- Collecting Crime Data
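
The Dave / David problem above can be illustrated with a minimal sketch. It assumes a small, hypothetical nickname table and an arbitrary scoring rule (0.5 for an initial that merely agrees with the first letter); the function names are illustrative only and this is not a description of how Linksmart or any specific linkage service works.

```python
from difflib import SequenceMatcher

# Hypothetical nickname table for illustration only; a real linkage
# service would rely on a much larger, curated list.
NICKNAMES = {"dave": "david", "deb": "debra", "debbie": "debra"}


def canonical_forename(name: str) -> str:
    """Lower-case the name, drop a trailing full stop, and expand known nicknames."""
    cleaned = name.strip().lower().rstrip(".")
    return NICKNAMES.get(cleaned, cleaned)


def forename_similarity(a: str, b: str) -> float:
    """Return a score in [0, 1] for how likely two forenames refer to the same name."""
    a_c, b_c = canonical_forename(a), canonical_forename(b)
    if not a_c or not b_c:
        return 0.0  # missing forename: no evidence either way
    if len(a_c) == 1 or len(b_c) == 1:
        # A bare initial can only be compared with the first letter,
        # so agreement is weak evidence (arbitrary score of 0.5).
        return 0.5 if a_c[0] == b_c[0] else 0.0
    return SequenceMatcher(None, a_c, b_c).ratio()
```

With this sketch, "Dave" and "David" score as identical, while the initial "D." scores the same against both "David" and "Debra", which is why initials alone cannot separate the two.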

#### Scotland's Approach

- a national ID number

### Federation between datasets

- Identifying with confidence across TREs is important
- Problem: linking health data with other data is difficult, because records have to be matched on addresses and names
- Separation functions
- Person has all the identifying information, but they do not have the data
- Communication between TREs needs specific criteria; Scotland has 5 TREs
- Having more than two, and introducing a central one is a possibility
- Issues with identifying A-B data sets across multiple systems
- Seeding Death Data -- David and Debra Smith: D. Smith & D. Smith causes gender incompatibility issues
- National Drug Treatment Data -- At source they only collected initials 'D.S.', Gender and MM/YYYY of DOB. Deidentifying can cause linking problems. Education to non-education where they don't have their common 'number' -- how confident can we be that Participant A is the same participant in another TRE? If you're not sharing names & addresses
- Bringing in NHS data and also pseudonymising it -- how can you work with it without a key?
- Once you have a data linkage -- bringing the different data types into a data set (TRE). E.g. linking mental health data and shopping data: if you anonymise that and have their own key -- they can do it anonymously for external sources
- Education data between England, Scotland and Wales might use different notations
- Residential Data can be used as a key
- 'E-child' trying to link the NHS with the Department of Education
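
One common way to link records across TREs without sharing names and addresses is a keyed hash over normalised identifiers. The sketch below assumes that a trusted party (the "separation functions" role above) holds a secret key and issues tokens to each TRE; the function names, the choice of HMAC-SHA256 and the field order are illustrative assumptions, not a method described in the session.

```python
import hashlib
import hmac
import unicodedata


def normalise(value: str) -> str:
    """Fold accents, case and whitespace so trivial differences do not break the match."""
    decomposed = unicodedata.normalize("NFKD", value)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    return " ".join(ascii_only.lower().split())


def link_token(secret_key: bytes, *identifiers: str) -> str:
    """Return a keyed pseudonym (hex HMAC-SHA256) for a tuple of identifiers.

    Only a holder of secret_key can compute tokens, so a TRE that receives
    tokens cannot regenerate them from guessed names.
    """
    message = "|".join(normalise(v) for v in identifiers).encode("utf-8")
    return hmac.new(secret_key, message, hashlib.sha256).hexdigest()
```

Two TREs that receive tokens produced with the same key can join on the token alone. The sketch also reproduces the problem in the notes: if the source only holds initials and MM/YYYY of birth, `link_token(key, "D", "Smith", "05/1970")` is the same for David and Debra Smith, so an extra field such as gender is needed to tell them apart.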

### AI & ML

- People confuse the terms AI & ML with 'statistical modelling'
- Based on risk factors you can determine 70% precision pre-diabetic chance
- Accessing 'clinical like data' with similar terminology to mimic clinic systems
- AI -- Offline AI: you can have an offline machine learning model -- yes
- Would multiple AIs learn the same thing on same data sets? -- no
- You can make it work with a shared API though (Stroke Prediction)
- APRs -- 8-9 expensive centre
- Different type of interpretation of ML, ML data on health 'takes your job', ML data on other scenarios might be socially acceptable
- Pattern finding models are popular and precise, this is lacking in statistical modeling
- Ultimately, it is not understood why ML on medical data gives the results it does
- Checking models is problematic and difficult: uncertain results and unknown model contents raise questions about the model's authenticity

### Governance

- The approval process is repeated a lot; committees do not talk to each other and each acts as a separate entity
- Cannot start work unless approved
- For a project between TREs, each TRE has its own approval process. Ideally a multi-TRE project would require a single approval process, with that decision accepted by the other TREs

#### What would a solution to this problem look like?

- Current state of the art is the overarching question -- needs a TRE panel to decide what is state of the art
- Single 'panel' on a specialty (e.g. health, crime) who deal with specific projects, additionally members of the national TRE supervision
@@ -0,0 +1,44 @@
# Cloud vs on-prem TREs: costs, constraints, pros & cons

## Overview

### Summary

The main decision drivers are security and cost.
Cloud is more flexible for projects with different funding sources and does not require an expensive data centre for research institutions but does not offer the highest levels of security.

A potential solution is a hybrid model where you get a cloud-like infrastructure on an on-prem compute.

Cloud provision via Jisc (as opposed to direct with the cloud provider) can be cheaper and it also handles SSO: https://www.jisc.ac.uk/forms/uk-access-management-federation-sign-up#
Resources: Google RADLab: https://cloud.google.com/blog/topics/public-sector/googles-new-rad-lab-solution-helps-spin-cloud-projects-quickly-and-compliantly

### Next steps

- Develop a roadmap plan for a hybrid, cloud-agnostic model

## Raw Notes

- Compute capacity/data centres for advanced ML projects are expensive for research institutions
- Credits make it easier to use cloud for projects with different funding sources
- Could a good solution be a hybrid model where you get a cloud-like infrastructure on an on-prem compute?
  - So could be completely disconnected from internet for high security
  - Google have set something like this up at Sanger
- Factors determining on-prem vs cloud
  - security
  - cost
- Cloud provision via Jisc (as opposed to direct with the cloud provider) can be cheaper and it also handles SSO: https://www.jisc.ac.uk/forms/uk-access-management-federation-sign-up#
- Resources: Google RADLab: https://cloud.google.com/blog/topics/public-sector/googles-new-rad-lab-solution-helps-spin-cloud-projects-quickly-and-compliantly

### Roadmap plan

#### Questions

- What would a solution to this problem look like?
- What resources would be needed (people, time, funds, infrastructure etc.)?
- How can this community support you in getting them?
- What working groups/orgs are already working on this, if any? How can we collaborate with them effectively?

#### Notes

- hybrid model (see above)
- Solution that is cloud-agnostic and could also run on on-prem hardware
@@ -0,0 +1,88 @@
# Governance of the UK TRE Community

## Overview

### Summary

The discussion centred on the purpose and governance of the community, trying to strike a balance between acting purely as conveners and still providing enough content and direction not to be an “empty” place.

Unique selling point (USP) of UK-TRE: Diversity of audience, and pragmatism: people that are doing something.
Danger of just listening is you don’t share your existing knowledge of what will/won’t work.

Should we put out position statements? Say things if you don’t like something? The community should reach a point where what we say is respected.
More powerful than individual submissions.

What should UK-TRE do?
Be careful not to become just a bureaucratic institution that has some funding, people, writes reports.

Maybe a network that feeds up to DARE/HDR/ADR?
USP would be it’s practical, diverse, not duplicative, ideal audience for people at top to bounce ideas off.
Proper focus groups would be much more expensive.

Some funding for the community to organise meetings like this is needed.

### Next steps

- Secure funding for person time for the community
- Establish a steering group for the community

## Raw Notes

- UK-TRE: Aims, purposes, should it take on a political/advocacy role?
- NHS: already have their plans for Governance
- but looking promising so far
- Datapact: Part of Data saves lives policy
- Not policy, but saying how NHS will treat your data
- Don't want to force too much information on public: they'll think you're trying to hide something
- Public engagement: not just telling them what will happen, instead enable citizens to make policy decisions
- Interest in academia about what to do, waiting for NHS to give guidance
- Should UK-TRE lead, not just follow the NHS?
- Lead, provide input
- TREs are for much more than just healthcare data which NHS focusses on
- Unique selling point (USP) of UK-TRE: Diversity of audience, and pragmatism: people that are doing something
- Danger of just listening is you don't share your existing knowledge of what will/won't work
- Should we put out position statements? Say things if you don't like something? The community should reach a point where what we say is respected. More powerful than individual submissions.
- Industry groups such as ABPI, BIO
- Provide inputs, write reports, represent a community and a voice
- Organisations need to sign up to show support
- Sign-up to UK-TRE? Or to position statements created by UK-TRE?
- E.g. IET (engineering professional institution) members can say what they're interested in on their profile. IET may respond to a Government consultation by asking members for input, and collating responses.
- Working groups/focus areas
- Needs resource/funding
- Does UKRI have something?
- Beyond UKRI, commercial?
- GA4GH:
- multiple levels of slices of funding
- 100s of organisations across 80 countries
- What should UK-TRE do?
- Be careful not to become just a bureaucratic institution that has some funding, people, writes reports.
- Balance
- Maybe a network that feeds up to DARE/HDR/ADR?
- USP would be it's practical, diverse, not duplicative, ideal audience for people at top to bounce ideas off
- Proper focus groups would be much more expensive
- Some funding for community to organise meetings like this

### Roadmap plan

#### Questions

- What would a solution to this problem look like?
  - Ensure meetings remain attractive, not too officious
  - Lightning talks good, reduces duplication
  - Networking opportunities
  - Long lunch
  - People willing to invest time to travel
  - "Stir people up and let them go"
  - Beach! 🏖️
  - No different from what we've got now
  - More recognisable branding
  - A home? What does "home" mean?
  - A formal recognisable figurehead
- What resources would be needed (people, time, funds, infrastructure etc.)?
  - Funding for someone to be a formal chair of UK-TRE
  - Neutral funding for someone to run community, not funded directly by a single institution
  - Maybe multiple people? E.g. coordinator, chair, community manager (junior/senior?), technical?
  - Elected chairs to propose direction/funding? Probably too much.
  - Instead have a steering committee
- How can this community support you in getting them?
- What working groups/orgs are already working on this, if any? How can we collaborate with them effectively?
@@ -0,0 +1,20 @@
# Addressing data harmonisation between different datasets: do TREs have a role?

## Raw notes

### Handwritten notes

Transcribed by CMWG team

Data+Analysis=Timely Processing

- Harmonized/OMOPed
- TRE governance barriers
- Reliability-validated?
- TRE role:cross project share
- OMOPing data sources & adding TRE-specific terms into main repositories
- Mapping tools
- TREs can delegate (CoConnect)
- Discovery
- Feasibility
- Clinical input
@@ -0,0 +1,75 @@
# Multi-TRE analysis: challenges, governance requirements, federation

## Overview

### Summary

For multi-TRE analysis to work, there needs to be trust between TREs.
This first relies on a shared understanding of what exactly a TRE is and needs to do, and will thrive when SOPs, accreditations and governance methodologies are shared. This would also benefit from shared understandings and language around architecture, sensitivity tiering and more.

The more different TREs there are, the more risk of variability and bad practice, which can affect an entire system of federation.
Public trust is a large concern, and concerted effort will need to be made to ensure the public buys into any federated system of TREs.

### Next steps

Next steps focused on the short, medium and long term:

- **Short term**: Define what a TRE is with a PEST framework, a TRE Maturity Model, common language for sensitivity Tiers
- **Medium term**: Review architectures for working between TREs, identify key roles and responsibilities in a federated landscape
- **Long term**: Focus on PPIE and public perception, how data is held and managed

## Raw notes

Take-home message: it's not about the technology. In fact, the more TREs that are technically enabled, the greater the risk that some are not fit for purpose in real operation and not trusted for federation. Process and Responsibility => Trust

### Roadmap and Next Steps

**Short term**: understanding what we have

- Define what a TRE is, with respect to **multiple** TREs, within a PEST framework that highlights issues that are not just technical: for example, the diversity of TRE models, the business models of TREs, and where risk, responsibility and accountability lie. It should include certifiable PROCESS as a core pillar (shared SOPs). Multi-TREs require new processes.
- Define a TRE Maturity Model that builds on above to develop a more objective model of TRUST, RISK and RESPONSIBILITY for inter-TRE data exchange. Could be used to assess, compare, and facilitate trust between TREs.
- A common language scale for the ‘tiers’ of TREs suitable for different levels of inter-TRE sensitivity.
- Identify and clarify PEST bottlenecks with examples

**Medium Term**: shifting to newer ways

- Review different architectures and processes for working between TREs
- What would be just enough with what we already have (e.g. Five Safes RO-Crate as m-TRE middleware using current processes)
- What m-TRE processes would we need to introduce
- The role of trusted intermediaries (brokers, federated analytics services) to take on risk and responsibility and reposition the Data Sharing Agreements, e.g. for global identity services linking identities and records, who takes responsibility?

**Long Term**: radical shift

- PPIE education outside the self-selecting PPIE bubble to counter mistrust of government and conspiracy theories
- Expectation that data is owned by the NHS?
- Rethink of data holdings and services from Data Warehouses to Data Fabrics.

### Notes

- What is a TRE ?
- Are they always repositories for single datasets, popup TRE?
- Not always - many of the environments have multiple users and projects on top of the core dataset, through project-based access through VMs/virtual desktops.
- There is also a requirement for high performance computing for some datatypes (GPU for AI/imaging, workflows etc)
- Do we need federation? Can we avoid multiple TREs?
- Governance requirements vary between data classes - you may need TREs to meet each governance requirements.
- But each TRE is expensive to run, especially assurance, governance, data egress control.
- How do TREs know they can trust each other?
- When workflows have to be shared between environments, it is easier to share between those with similar accreditations, e.g. ISO 27001.
- Federating TREs requires interoperability at the process level, shared SOPs etc.
- There will be many TREs built from the technological parts - but if there are poorly run ones, they will damage the whole 'brand' and impact on all TRE operators. More TREs, more risk.
- A 'maturity model' could be used to assess, compare, and facilitate trust between TREs.
- Legal obligations on individual TRE providers act as a strong constraint on data sharing; but a common list of questions might help.
- Can we develop a new brokered distributed/federated analytics model?
- We need a new model to allow this.
- TRE-FX type solutions need to be driven by TREs.
- Need a common language scale for the 'tiers' of TREs suitable for different levels of sensitivity.
- People need to query across datasets - there are few cases where you can answer the research question without linking identities and records.
- But a global identity connecting service would be a huge responsibility.
- How do we carry the public with us?
- Estonia have an opt-out system for health records, opt-in for genomics data; but when public confidence drops, opt-outs increase.
- Public perception of risk is a problem.
- In COVID, people were happy to share data.
- Even trust in NHS is not universal now...
- Education outside the TRE 'bubble' to counteract conspiracy theories etc.
- Do trusted data fabrics offer a different view?
- Networks of secure data services based on Enterprise data models.