Commit

Apply suggestions from code review
Co-authored-by: Davsarper <[email protected]>
manics and Davsarper authored Jan 25, 2024
1 parent 2961b78 commit 9552177
Showing 12 changed files with 148 additions and 190 deletions.
Original file line number Diff line number Diff line change
@@ -18,19 +18,19 @@ Issues about federation of datasets were discussed, including identifying differ

There was further discussion on how to effectively check ML models within TREs.

In the case of governance, it was suggested that a project working across multiple TREs should have one singular goevrnance process.
In the case of governance, it was suggested that a project working across multiple TREs should have one singular governance process.

### Next steps

- Create a 'panel' focused on specific type of data/reseearch (e.g. health, crime, financial) who can oversee specific research projects within these fields
- Create a 'panel' focused on specific type of data/research (e.g. health, crime, financial) who can oversee specific research projects within these fields

## Raw notes

### Data Linkage

#### How do you go about the NHS Number?

- Uses stanards NHS Standard NF5, after 3 they went to manual to track through the system.
- Uses NHS Standard NF5, after 3 they went to manual to track through the system.
- Issues with health and non-health data

#### Names such as Dave / David can cause problems.
@@ -44,18 +44,18 @@ In the case of governance, it was suggested that a project working across multip

### Federation between datasets

- Identifiying with confidence across TREs is important
- Identifying with confidence across TREs is important
- Problem: Linking health with something else is problematic to match up and link it with addresses and names
- Separation functions
- Person has all the identifying informations, but they do not have the data
- Person has all the identifying information, but they do not have the data
- TREs communications between each other need specific criteria, Scotland has 5 TREs
- Having more than two, and introducing a central one is a possibility
- Issues with identifying A-B data sets across multiple systems
- Seeding Death Data -- David and Debra Smith: D. Smith & D. Smith causes gender incompatibility issues
- National Drug Treatment Data -- At source they only collected initials 'D.S.', Gender and MM/YYYY of DOB. Deidentifying can cause linking problems. Education to non-education where they don't have their common 'number' -- how confident can we be that Participant A is the same participant in another TRE? If you're not sharing names & addresses
- Bringing in NHS data and audo pseudo anonymise it -- how can you work with it without a key?
- Bringing in NHS data and also pseudo anonymise it -- how can you work with it without a key?
- Once you got a data linkage -- bringing the different data types into a data set (TRE). E.g. Linking mental health data and shopping data, if you anonymise that and have their own key -- they can do it anonymously for external sources
- Education data between England, Scotland and Wales might use different notiations
- Education data between England, Scotland and Wales might use different notations
- Residential Data can be used as a key
- 'E-child' trying to link the NHS with the Department of Education

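The "separation function" and the question of working with pseudonymised NHS data without a key can be illustrated with keyed hashing. This is a minimal sketch, assuming an HMAC-SHA256 scheme whose secret is held only by the separation function. The key, the helper names `pseudonymise` and `weak_key`, and the NHS number below are hypothetical and not taken from the notes. Two TREs that receive records pseudonymised with the same key can link them without ever seeing the identifier, whereas linking on an initial and surname alone (as with "D. Smith") conflates different people.

```python
import hashlib
import hmac

# Hypothetical secret held only by the separation function (a trusted third party).
# The TREs never see it, so they cannot recompute or reverse the pseudonyms.
LINKAGE_KEY = b"example-secret-held-by-separation-function"


def pseudonymise(nhs_number: str, key: bytes = LINKAGE_KEY) -> str:
    """Return a stable keyed pseudonym (HMAC-SHA256 hex digest) for an NHS number."""
    canonical = nhs_number.replace(" ", "")  # "123 456 7890" -> "1234567890"
    return hmac.new(key, canonical.encode("ascii"), hashlib.sha256).hexdigest()


def weak_key(forename: str, surname: str) -> str:
    """Initial plus surname, as when a source only records 'D. Smith'."""
    return f"{forename[:1].upper()} {surname.upper()}"


# Fictional NHS number, formatted differently by two sources.
# Each TRE receives only the pseudonym plus its own data.
tre_a_row = {"pid": pseudonymise("123 456 7890"), "source": "health"}
tre_b_row = {"pid": pseudonymise("1234567890"), "source": "death registrations"}
same_person = tre_a_row["pid"] == tre_b_row["pid"]

# Weak identifiers collide: David Smith and Debra Smith share one key.
collision = weak_key("David", "Smith") == weak_key("Debra", "Smith")

print(same_person, collision)  # prints: True True
```

Because the TREs hold only the hex digests, they cannot recompute a pseudonym for a known NHS number without the key; withholding the key is what keeps the identifying information with the separation function rather than with the data holders.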
@@ -68,7 +68,7 @@ In the case of governance, it was suggested that a project working across multip
- Would multiple AIs learn the same thing on same data sets? -- no
- You can make it work with a shared API though (Stroke Predicition)
- APRs -- 8-9 expensive centre
- different type of interperetation of ML, ML data on health 'takes your job', ML data on other scenarions might be socially acceptable
- Different type of interpretation of ML, ML data on health 'takes your job', ML data on other scenarios might be socially acceptable
- Pattern finding models are popular and precise, this is lacking in statistical modeling
- At the end of the day, medical data ML is not understood why it gives that result
- Checking models are problematic and difficult, unsure results and unsure contents of the model begs the question of the model's authenticity
Original file line number Diff line number Diff line change
@@ -13,7 +13,7 @@
### Summary

The main decision drivers are security and cost.
Cloud is more flexible for projects with different funding sources and does not requires an expensive data centre for research institutions but does not offer the highest levels of security.
Cloud is more flexible for projects with different funding sources and does not require an expensive data centre for research institutions but does not offer the highest levels of security.

A potential solution is a hybrid model where you get a cloud-like infrastructure on an on-prem compute.

@@ -26,7 +26,7 @@ Resources: Google RADLab: https://cloud.google.com/blog/topics/public-sector/goo

## Raw Notes

- Compute capacity/ data centres for advanced ML projects is expensive for research institusions
- Compute capacity/ data centres for advanced ML projects is expensive for research institutions
- Credits make it easier to use cloud for projects with different funding sources
- Could a good solution be a hybrid model where you get a cloud-like infrastructure on an on-prem compute
- So could be completely disconnected from internet for high security
@@ -37,15 +37,6 @@ Resources: Google RADLab: https://cloud.google.com/blog/topics/public-sector/goo
- Cloud provision via Jisc (as oppose to direct with the cloud provider) can be cheaper and it also handles SSO: https://www.jisc.ac.uk/forms/uk-access-management-federation-sign-up#
- Resources: Google RADLab: https://cloud.google.com/blog/topics/public-sector/googles-new-rad-lab-solution-helps-spin-cloud-projects-quickly-and-compliantly

![image](https://hackmd.io/_uploads/B1pxknNHT.png)

![20231003_191704](https://hackmd.io/_uploads/HynV9oNHa.jpg)

![image](https://hackmd.io/_uploads/rktMJh4rT.png)

![image](https://hackmd.io/_uploads/SJQpRjEr6.png)

![image](https://hackmd.io/_uploads/Byfu5s4Ba.png)

## Roadmap plan

Original file line number Diff line number Diff line change
@@ -38,7 +38,7 @@ Some funding for the community to organise meetings like this is needed.

## Raw Notes

- UK-TRE: Aims, purposes, should it take on a polictical/advocacy role?
- UK-TRE: Aims, purposes, should it take on a political/advocacy role?
- NHS: already have their plans for Governance
- but looking promising so far
- Datapact: Part of Data saves lives policy
@@ -71,7 +71,6 @@ Some funding for the community to organise meetings like this is needed.
- USP would be it's practical, diverse, not duplicative, ideal audience for people at top to bounce ideas off
- Proper focus groups would be much more expensive
- Some funding for community to organise meetings like this
-

## Roadmap plan

Original file line number Diff line number Diff line change
@@ -18,8 +18,7 @@

### Handwritten notes

![image](https://hackmd.io/_uploads/SyGfhoNra.png)
Transcription of the above by David:
Transcripted by CMWG team

Data+Analysis=Timely Processing

@@ -29,7 +28,7 @@ Data+Analysis=Timely Processing
- TRE role:cross project share
- DMOPin data sources & adding TRE Specific terms into main repositories
- Mapping tools
- TREs can delegate (Coconnect
- TREs can delegate (CoConnect)
- Discovery
- Feasability)
- Feasability
- Clinical input
Original file line number Diff line number Diff line change
@@ -10,7 +10,7 @@

### Summary

For multi-TRE amalysis to work, there needs to be trust between TREs.
For multi-TRE analysis to work, there needs to be trust between TREs.
This first relies on a shared understanding of what exactly a TRE is and needs to do, and will thrive when SOPs, accreditations and governance methodologies are shared. This would also benefit from shared understandings and laguage around architecture, sensitivity tiering and more.

The more different TREs there are, the more risk of variability and bad practice, which can affect an entire system of federation.
@@ -21,7 +21,7 @@ Public trust is a large concern, and concerted effort will need to be made to en
Next steps focused on the short, medium and long term:

- **Short term**: Define what a TRE is with a PEST framework, a TRE Maturity Model, common language for sensitivity Tiers
- **Medium term**: Review archiectures for working between TREs, identify key roles and resonsibilties in a federated landscape
- **Medium term**: Review archiectures for working between TREs, identify key roles and responsibilties in a federated landscape
- **Long term**: Focus on PPIE and public perception, how data is held and managed

## Raw notes
@@ -31,49 +31,48 @@ Take home message: Its not about the Technology. In fact the more TREs technical
- Roadmap and Next Steps

**Short term**: understanding what we have
-- define what is a TRE, wrt to **multiple** TREs within a PEST framework that highlights issues that are not just technical, for example includes the diversity of TRE models, the business models of TREs, where risk, responsibility and accountability lay, and includes certifiable PROCESS as a core pillar (shared SOPs). Multi-TREs require new Processes.
-- define a TRE Maturity Model that builds on above to develop a more objective model of TRUST, RISK and RESPONSIBILITY for inter-TRE data exchange. Could be used to assess, compare, and facilitate trust between TREs.
-- a common language scale for the ‘tiers’ of TREs suitable for different levels of inter-TRE sensitivity.
-- identify and clarify PEST bottlenecks with examples
- Define what is a TRE, wrt to **multiple** TREs within a PEST framework that highlights issues that are not just technical, for example includes the diversity of TRE models, the business models of TREs, where risk, responsibility and accountability lay, and includes certifiable PROCESS as a core pillar (shared SOPs). Multi-TREs require new Processes.
- Define a TRE Maturity Model that builds on above to develop a more objective model of TRUST, RISK and RESPONSIBILITY for inter-TRE data exchange. Could be used to assess, compare, and facilitate trust between TREs.
- A common language scale for the ‘tiers’ of TREs suitable for different levels of inter-TRE sensitivity.
- Identify and clarify PEST bottlenecks with examples

**Medium Term**: shifting to newer ways
-- review different architectures and processes for working between TREs
-- what would be just enough with what we already have (e.g. 5SROCrate as m-TRE middleware using current processes)
-- what m-TRE processes would we need to introduce
-- the role of trusted intermediaries (brokers, federated analytics services) to take on risk and responsibility and reposition the Data Sharing Agreements.
–- e.g global identity services linking identities and records, who takes responsibility?
- Review different architectures and processes for working between TREs
- What would be just enough with what we already have (e.g. 5SROCrate as m-TRE middleware using current processes)
- What m-TRE processes would we need to introduce
- The role of trusted intermediaries (brokers, federated analytics services) to take on risk and responsibility and reposition the Data Sharing Agreements. e.g global identity services linking identities and records, who takes responsibility?

**Long Term**: radical shift
-- PPIE education outside the PPIE self selecting bubble to counter mistrust of government and conspiracy theory
-- Expectation that data is owned by the NHS?
-- rethink of data holdings and services from Data Warehouses to Data Fabrics.
- PPIE education outside the PPIE self selecting bubble to counter mistrust of government and conspiracy theory
- Expectation that data is owned by the NHS?
- Rethink of data holdings and services from Data Warehouses to Data Fabrics.

**Notes**

- What is a TRE ?
-- Are they always repositories for single datasets, popup TRE?
-- Not always - many of the environments have multiple users and projects on top of the core dataset, through project-based access through VMs/virtual desktops.
-- There is also a requirement for high performance computing for some datatypes (GPU for AI/imaging, workflows etc)
- Are they always repositories for single datasets, popup TRE?
- Not always - many of the environments have multiple users and projects on top of the core dataset, through project-based access through VMs/virtual desktops.
- There is also a requirement for high performance computing for some datatypes (GPU for AI/imaging, workflows etc)
- Do we need federation? Can we avoid multiple TREs?
-- Governance requirements vary between data classes - you may need TREs to meet each governance requirements.
-- But each TRE is expensive to run, especially assurance, governance, data egress control.
- Governance requirements vary between data classes - you may need TREs to meet each governance requirements.
- But each TRE is expensive to run, especially assurance, governance, data egress control.
- How do TREs know they can trust each other?
-- When workflows have to be shared between environments, it is easier to share between those with similar accreditations - e.g ISO27001.
-- Federating TREs requires interoperability at the process level, shared SOPs etc.
-- There will be many TREs built from the technological parts - but if there are poorly run ones, they will damage the whole 'brand' and impact on all TRE operators. More TREs, more risk.
-- A 'maturity model' could be used to assess, compare, and facilitate trust between TREs.
-- Legal obligations on indvidual TRE providers act as a strong constraint on data sharing; but a common list of questions might help.
- When workflows have to be shared between environments, it is easier to share between those with similar accreditations - e.g ISO27001.
- Federating TREs requires interoperability at the process level, shared SOPs etc.
- There will be many TREs built from the technological parts - but if there are poorly run ones, they will damage the whole 'brand' and impact on all TRE operators. More TREs, more risk.
- A 'maturity model' could be used to assess, compare, and facilitate trust between TREs.
- Legal obligations on indvidual TRE providers act as a strong constraint on data sharing; but a common list of questions might help.
- Can we develop a new brokered distributed/federated anaytics model?
-- We need a new model to allow this.
-- TRE-FX type solutions need to be driven by TREs.
-- Need a common language scale for the 'tiers' of TREs suitable for different levels of sensitivity.
-- People need to query across datasets - there are few cases where you can answer the research question without linking identities and records.
-- But a global identity connecting service would be a huge responsibility.
- We need a new model to allow this.
- TRE-FX type solutions need to be driven by TREs.
- Need a common language scale for the 'tiers' of TREs suitable for different levels of sensitivity.
- People need to query across datasets - there are few cases where you can answer the research question without linking identities and records.
- But a global identity connecting service would be a huge responsibility.
- How do we carry the public with us?
-- Estonia have an opt-out system for health records, opt-in for genomics data; but when public confidence drops, opt-outs increase.
-- Public perception of risk is a problem.
-- In COVID, people were happy to share data.
-- Even trust in NHS is not universal now...
-- Education outside the TRE 'bubble' to counteract conspiracy theories etc.
- Estonia have an opt-out system for health records, opt-in for genomics data; but when public confidence drops, opt-outs increase.
- Public perception of risk is a problem.
- In COVID, people were happy to share data.
- Even trust in NHS is not universal now...
- Education outside the TRE 'bubble' to counteract conspiracy theories etc.
- Do trusted data fabrics offer a different view?
-- Networks of secure data services based on Enterprise data models.
- Networks of secure data services based on Enterprise data models.
Original file line number Diff line number Diff line change
@@ -21,7 +21,7 @@ This discussion made evident the multiplicity of current efforts and the difficu
The discussion identified many potential areas of work and collaboration, the use of NHS data held in SDEs via specific TREs can expand the utility of this data but requires a lot of coordination, not only between independent TREs but also across regions and institutions.
Challenges arise on how to make this coordination and alignment effective, reconcile different interests (commercial and public) and ensure public and clinicians' trust.

The role of HDRUK and the UK TRE Community is seeing as a positive influence.
The role of HDRUK and the UK TRE Community is seen as a positive influence.

### Next steps

@@ -89,17 +89,17 @@ The role of HDRUK and the UK TRE Community is seeing as a positive influence.
Potential areas

1. Building upon successes; learnings of TREs for SDEs
2. Remit of NHS and other data soruces for SDEs
2. Remit of NHS and other data sources for SDEs
3. What it means to continue to manage Independent TREs (and to build solutions for TREs)
4. Addressing data silos as well as cultural components to evolve TREs/SDEs
5. Data exhange opportunities and the potential for increased NHS data in existing TREs
6. Guiding principles for integration across TREs
7. National (English) GP data ccess oportunity
8. Cohesive strategies/approaches with SEDs launches
9. POtential roles for other stakeholders and access to open source assets
7. National (English) GP data access oportunity
8. Cohesive strategies/approaches with SDEs launches
9. Potential roles for other stakeholders and access to open source assets
10. Aligning to standards/protocols for data access, designing commercial & sustainability cases
11. Outstanding questions for data controllers vs data providers
12. SDEs: variety of build, buy supplier decissions underway
12. SDEs: variety of build, buy supplier decisions underway
13. SDE potential limitations: reconciling differences across regions based on decentralized protocols (eg. neutral grant for data); governance challenges-risk of redundancies, inefficiencies, slow decision making; need for coordinated mechanism for cross-SDF initiatives
14. Big question about incentives: collaborative vs competitive mindest. Clarity on who is tackling what, and how to coordinate effectively. difficulty introducing several new entitites at once, at pace
15. Dual focus on local stakeholder engagement/approval alongside central policy development. Specific focus on touchpoints (eg. de-ID then re-ID) and how to manage at scale- a la section 251
@@ -123,7 +123,7 @@ Roadmap
Workpackages/What would be helpful:

1. Clear vision/value story on why TRE+SNSDE add/evolves
2. One apger on key protocols, ways of working + frameworks to strenghten consistency of messaging
2. One pager on key protocols, ways of working + frameworks to strengthen consistency of messaging
3. Alignment of related data programmes (eg R+D vs FDP)
4. Community of practice + shared assets/lessons/insights so SDEs build on TRE success to date
5. User (eg. researcher) assets: needs, goals, decisions, pain points, requirements
@@ -137,8 +137,3 @@ D. SNSDEs
E. Local researchers
F. Common entities/stakeholders in health data space

![image](https://hackmd.io/_uploads/HyoIi2ES6.png)

![image](https://hackmd.io/_uploads/ryacinVBT.png)

![image](https://hackmd.io/_uploads/rkvFohEHa.png)