Commit

Add internal links
woodthom2 committed Jun 21, 2024
1 parent 3b2a603 commit 7a08df0
Showing 27 changed files with 73 additions and 51 deletions.
2 changes: 1 addition & 1 deletion content/en/_index.md
@@ -173,7 +173,7 @@ blocks:
copy: Really useful! Would have been a great tool and saved me a lot of time when I was trying to externally validate my risk prediction model in two cohorts.
author:
name: Researcher at Kings College London
title: on using Harmony the first time
title: on using Harmony the first times
image: /images/testimonial-user.svg

- block: feature-2
@@ -9,21 +9,21 @@ aliases: "/blog/back-to-the-future-retrospectively-harmonizing-questionnaire-dat

Now more than ever, the international research community are keen to determine whether their findings replicate across different contexts. For instance, if a researcher discovers a potentially important association between two variables, they may wish to see whether this association is present in other populations (e.g. different countries, or different generations). In an ideal world, this would be achieved by conducting follow-up studies that are harmonized by design. In other words, the exact same methodologies and measures would be used in a new sample, in order to determine whether the findings can be replicated. Such direct replication is often challenging however, with research funders often preferring novel lines of inquiry.

As an alternative to direct replication, researchers may choose to reach out to others in the field who either have access to, or are in the process of collecting, comparable data. Indeed many researchers, particularly those in the life and social sciences, routinely make use of large, ongoing studies that collect a variety of data for multiple purposes (e.g. longitudinal population studies). In practice however, much of our research is designed and carried out in silos – with different research groups tackling similar research questions using widely different designs and measures. Even if a researcher is successful in identifying data that are similar to their original work, minor differences in the design or measures may limit the comparability. What are researchers to do in such situations?
As an alternative to direct replication, researchers may choose to reach out to others in the field who either have access to, or are in the process of collecting, comparable data. Indeed many researchers, particularly those in the life and social sciences, routinely make use of large, ongoing studies that collect a variety of data for multiple purposes (e.g. [longitudinal](/item-harmonisation/harmony-a-free-ai-tool-to-merge-longitudinal-studies) population studies). In practice however, much of our research is designed and carried out in silos – with different research groups tackling similar research questions using widely different designs and measures. Even if a researcher is successful in identifying data that are similar to their original work, minor differences in the design or measures may limit the comparability. What are researchers to do in such situations?

One increasingly popular option is retrospective harmonization. This involves taking existing data from two or more disparate sources, and transforming the data in some way in order to make it directly comparable across sources. Let’s look at a simple, hypothetical example. Say a researcher wants to examine the relationship between level of education and depression, and whether this varies across two datasets, each from a different country. In dataset A, participants were asked to report their highest qualification out of a list of 10 options ranging from “no formal education” to “doctoral education”, whereas in dataset B there was a simple question that asked participants whether they completed a Bachelor’s degree (yes/no). The 10-option question in dataset A could be recoded to match the variable in dataset B, by collapsing all of the categories above and below Bachelor’s level. In many cases, retrospective harmonization can be applied on an ad-hoc basis, using simple, logical recoding strategies such as this.
One increasingly popular option is retrospective harmonization. This involves taking existing data from two or more disparate sources, and transforming the data in some way in order to make it directly comparable across sources. Let’s look at a simple, hypothetical example. Say a researcher wants to examine the relationship between level of [education](/data-harmonisation-in-education) and [depression](/harmonisation-validation/promis-depression-subscale), and whether this varies across two datasets, each from a different country. In dataset A, participants were asked to report their highest qualification out of a list of 10 options ranging from “no formal education” to “doctoral education”, whereas in dataset B there was a simple question that asked participants whether they completed a Bachelor’s degree (yes/no). The 10-option question in dataset A could be recoded to match the variable in dataset B, by collapsing all of the categories above and below Bachelor’s level. In many cases, retrospective harmonization can be applied on an ad-hoc basis, using simple, logical recoding strategies such as this.
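
The recoding described in this paragraph can be sketched in a few lines of Python. This is only an illustration: the source names the two endpoints of the 10-option scale and the Bachelor's level, so the intermediate category labels, the `completed_bachelors` helper, and the sample responses below are all hypothetical.

```python
# Hypothetical 10-option scale for dataset A, ordered from lowest to highest.
# Only the endpoints and the Bachelor's level come from the text; the other
# labels are assumptions made for illustration.
EDUCATION_LEVELS_A = [
    "no formal education",
    "primary",
    "lower secondary",
    "upper secondary",
    "vocational certificate",
    "diploma",
    "foundation degree",
    "bachelor's degree",
    "master's degree",
    "doctoral education",
]

BACHELORS_RANK = EDUCATION_LEVELS_A.index("bachelor's degree")


def completed_bachelors(level: str) -> bool:
    """Collapse a dataset A category into dataset B's yes/no variable."""
    return EDUCATION_LEVELS_A.index(level) >= BACHELORS_RANK


# Made-up responses from dataset A, recoded to match dataset B.
dataset_a = ["primary", "bachelor's degree", "doctoral education", "diploma"]
harmonized = [completed_bachelors(level) for level in dataset_a]
print(harmonized)  # [False, True, True, False]
```

Keeping the scale as an ordered list means the cut-off is defined once, so the same rule is applied to every participant.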

However, not all constructs can be measured with such simple, categorical questions. Take the above outcome variable (depression) for instance. Depression is a complex, heterogeneous experience, characterized by a multitude of symptoms that can be experienced to various degrees and in different combinations. In large-scale surveys, depression is typically measured with standardized questionnaires – participants are asked to report on a range of symptoms, their responses are assigned numerical values, and these are summed to form a “total depression score” for each individual. Although this remains the most viable and plausible strategy for measuring something as complex as depression, there is no “gold standard” questionnaire that is universally adopted by researchers. Instead, there are well over 200 established depression scales. In a [recent review](https://www.closer.ac.uk/wp-content/uploads/210715-Harmonisation-measurement-properties-mental-health-measures-british-cohorts.pdf) (McElroy et al., 2020), we noted that the content of these questionnaires can differ markedly, e.g. different symptoms are assessed, or different response options are used.

How can researchers harmonize such complex measures? One option would be to standardize scores within each data set, thus transforming everyone’s raw score to a rank ordering within their given sample. Although straightforward, this approach has a number of weaknesses. First and foremost, you are assuming that both questionnaires are measuring the same underlying construct, and are measuring it equally well. Second, by standardizing a measure within a [cohort](/item-harmonisation/harmony-a-free-ai-tool-to-merge-cohort-studies), you are removing all information about the mean and standard deviation, making it impossible to compare the average level of a construct across datasets.
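
The second weakness can be seen in a small sketch. It assumes z-scores as the standardization method (one common choice, not necessarily the one the authors have in mind) and uses made-up scores for two hypothetical cohorts whose average levels differ by 10 points.

```python
from statistics import mean, pstdev


def standardize(scores):
    """Convert raw scores to z-scores within a single sample."""
    m = mean(scores)
    sd = pstdev(scores)
    return [(s - m) / sd for s in scores]


# Made-up total scores: cohort B scores 10 points higher on average.
cohort_a = [2, 4, 6, 8, 10]
cohort_b = [12, 14, 16, 18, 20]

z_a = standardize(cohort_a)
z_b = standardize(cohort_b)

# After standardization the two cohorts are indistinguishable, so the
# 10-point difference in average level can no longer be recovered.
same = all(abs(a - b) < 1e-12 for a, b in zip(z_a, z_b))
print(mean(cohort_b) - mean(cohort_a), same)
```

Within each cohort the ordering of participants is preserved, which is why the approach is attractive, but any between-cohort comparison of levels is lost.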

An alternative approach is to apply retrospective harmonization at the item-level. Although questionnaires can differ considerably on the number and nature of questions asked, there is often considerable overlap at the [semantic](https://harmonydata.ac.uk/semantic-text-matching-with-deep-learning-transformer-models)/content level. Let’s return to our earlier example of depression. Although there are many different questionnaires that can be used to assess this experience, they often ask the same types of questions. Below is an example of content overlap in two of the most common measures of psychological [distress](https://harmonydata.ac.uk/how-far-can-we-go-with-harmony-testing-on-kufungisisa-a-cultural-concept-of-distress-from-zimbabwe) used in children, the Revised Children’s Anxiety and Depression Scale (RCADS), and the Mood and Feelings Questionnaire (MFQ).
An alternative approach is to apply retrospective harmonization at the item-level. Although questionnaires can differ considerably on the number and nature of questions asked, there is often considerable overlap at the [semantic](https://harmonydata.ac.uk/semantic-text-matching-with-deep-learning-transformer-models)/content level. Let’s return to our earlier example of depression. Although there are many different questionnaires that can be used to assess this experience, they often ask the same types of questions. Below is an example of content overlap in two of the most common measures of psychological [distress](https://harmonydata.ac.uk/how-far-can-we-go-with-harmony-testing-on-kufungisisa-a-cultural-concept-of-distress-from-zimbabwe) used in children, the Revised Children’s [Anxiety](/harmonisation-validation/patient-reported-outcome-measure-information-system-promis-anxiety-subscale) and Depression Scale (RCADS), and the Mood and Feelings Questionnaire (MFQ).

{{< image src="images/blog/blog-pic-1.png" alt="img" >}}

By identifying, recoding, and testing the equivalence of subsets of items from different questionnaires (for guidelines see our previous report), researchers can derive harmonized sub-scales that are directly comparable across studies. Our group has previously used this approach to study trends in mental health across different generations (Gondek et al., 2021), and examine how socio-economic deprivation impacted adolescent mental health across different [cohorts](/item-harmonisation/harmony-a-free-ai-tool-for-cross-cohort-research) (McElroy et al., 2022).
By identifying, recoding, and testing the equivalence of subsets of [items](/item-harmonisation/harmony-a-free-ai-tool-for-longitudinal-study-in-psychology) from different questionnaires (for guidelines see our previous report), researchers can derive harmonized sub-scales that are directly comparable across studies. Our group has previously used this approach to study trends in mental health across different generations (Gondek et al., 2021), and examine how socio-economic deprivation impacted adolescent mental health across different [cohorts](/item-harmonisation/harmony-a-free-ai-tool-for-cross-cohort-research) (McElroy et al., 2022).

One of the main challenges to retrospectively harmonizing questionnaire data is identifying the specific items that are comparable across the measures. In the above example, we used expert opinion to match candidate items based on their content, and used psychometric tests to determine how plausible it was to assume that matched items were directly comparable. Although our results were promising, this process was time-consuming, and the reliance on expert opinion introduces an element of human [bias](https://fastdatascience.com/how-can-we-eliminate-bias-from-ai-algorithms-the-pen-testing-manifesto) – i.e. different experts may disagree on which items match. As such, we are currently working on a [project](https://fastdatascience.com/starting-a-data-science-project) supported by Wellcome, in which we aim to develop an online tool, ‘Harmony’, that uses machine learning to help researchers match items from different questionnaires based on their underlying meaning. Our overall aim is to streamline and add consistency and replicability to the harmonization process. We plan to test the utility of this tool by using it to harmonize measures of mental health and social connectedness across two cohorts of young people from the UK and Brazil.
One of the main challenges to retrospectively harmonizing questionnaire data is identifying the specific items that are comparable across the measures. In the above example, we used expert opinion to match candidate items based on their content, and used psychometric tests to determine how plausible it was to assume that matched items were directly comparable. Although our results were promising, this process was time-consuming, and the reliance on expert opinion introduces an element of human [bias](https://fastdatascience.com/how-can-we-eliminate-bias-from-ai-algorithms-the-pen-testing-manifesto) – i.e. different experts may disagree on which items match. As such, we are currently working on a [project](https://fastdatascience.com/starting-a-data-science-project) supported by [Wellcome](/radio-podcast-about-wellcome-data-prize), in which we aim to develop an online tool, ‘Harmony’, that uses machine learning to help researchers match items from different questionnaires based on their underlying meaning. Our overall aim is to streamline and add consistency and replicability to the harmonization process. We plan to test the utility of this tool by using it to harmonize measures of mental health and social connectedness across two cohorts of young people from the UK and Brazil.

Follow this blog for updates on our Harmony project!

8 changes: 4 additions & 4 deletions content/en/blog/clinical-trial-research-data-harmonisation.md
@@ -15,14 +15,14 @@ draft: true
# Clinical trial and research data harmonization principles


In the realm of healthcare, clinical trials serve as the bedrock of evidence-based medicine, guiding decisions that affect patient care and public health policies. However, the effectiveness of these trials hinges on the quality and compatibility of the data they generate. The harmonization of clinical research data emerges as a pivotal endeavor in ensuring the integrity and interpretability of trial outcomes. In this blog post, we delve into the principles of clinical data harmonization, exploring its significance in clinical trials and elucidating the strategies.
In the realm of healthcare, clinical trials serve as the bedrock of evidence-based medicine, guiding decisions that affect patient care and public health policies. However, the effectiveness of these trials hinges on the quality and compatibility of the data they generate. The harmonization of clinical research data emerges as a pivotal endeavor in ensuring the integrity and interpretability of trial [outcomes](/harmonisation-validation/medical-outcomes-survey-sleep-quality-subscale). In this blog post, we delve into the principles of clinical data harmonization, exploring its significance in clinical trials and elucidating the strategies.




## Introduction

Clinical trials and research data harmonization principles are indispensable for fostering consistency and coherence in the realm of medical investigations. Through the standardization of data formats and models, such as CDISC's SDTM and ADaM, the integration of diverse datasets from multiple sites becomes seamless, promoting collective analysis. The establishment of common data elements (CDEs) and adherence to metadata standards mitigate discrepancies in data interpretation, enabling cross-study comparisons and meta-analyses. Collaborative initiatives, such as the OHDSI consortium, exemplify the power of collective resources and expertise in harmonizing and analyzing large-scale healthcare data across institutions. Ethical and regulatory considerations remain integral, ensuring data harmonization practices align with privacy regulations and ethical standards. Furthermore, longitudinal data harmonization efforts align data collection methods, time points, and variables, enabling meaningful analysis of trends and treatment effectiveness over time. The application of data harmonization principles in clinical trials and research not only streamlines data management but also upholds ethical standards and facilitates comprehensive analyses crucial for advancing medical knowledge and improving patient outcomes.
Clinical trials and research data harmonization principles are indispensable for fostering consistency and coherence in the realm of medical investigations. Through the standardization of data formats and models, such as CDISC's SDTM and ADaM, the integration of diverse datasets from multiple sites becomes seamless, promoting collective analysis. The establishment of common data elements (CDEs) and adherence to metadata standards mitigate discrepancies in data interpretation, enabling cross-study comparisons and meta-analyses. Collaborative initiatives, such as the OHDSI consortium, exemplify the power of collective resources and expertise in harmonizing and analyzing large-scale healthcare data across institutions. Ethical and regulatory considerations remain integral, ensuring data harmonization practices align with privacy regulations and ethical standards. Furthermore, [longitudinal](/item-harmonisation/harmony-a-free-ai-tool-to-merge-longitudinal-studies) data harmonization efforts align data collection methods, time points, and variables, enabling meaningful analysis of trends and treatment effectiveness over time. The application of data harmonization principles in clinical trials and research not only streamlines data management but also upholds ethical standards and facilitates comprehensive analyses crucial for advancing medical knowledge and improving patient outcomes.


## Understanding Clinical Data Harmonization
@@ -35,7 +35,7 @@ Clinical data harmonization entails the process of integrating, standardizing, a
Clinical data harmonization plays a pivotal role in advancing clinical research by addressing key challenges associated with data variability and heterogeneity. The importance of clinical data harmonization can be highlighted through various crucial aspects:

1. **Facilitating Interoperability:**
- Clinical trials often involve collaboration among multiple institutions, each employing different data collection systems and formats. Harmonizing data ensures seamless interoperability, enabling efficient integration of diverse datasets. This fosters collaboration, enhances the exchange of information, and reduces obstacles to data sharing, ultimately contributing to a more interconnected research environment.
- Clinical trials often involve collaboration among multiple institutions, each employing different data collection systems and formats. Harmonizing data ensures seamless interoperability, enabling efficient integration of diverse datasets. This fosters collaboration, enhances the exchange of information, and reduces obstacles to data sharing, ultimately [contributing](/contributing-to-harmony) to a more interconnected research environment.

2. **Ensuring Consistency:**
- Standardizing data elements and definitions across trials is essential for ensuring consistency in measurements and assessments. Consistent data facilitates accurate comparisons between studies, allowing researchers to draw meaningful conclusions and make informed decisions. Without harmonization, variations in data definitions and measurement units could lead to misinterpretations and compromises in the reliability of research outcomes.
@@ -71,7 +71,7 @@ The Clinical Data Harmonization Playbook, developed by the Center for Data to He
- **Comprehensive Documentation:** The use of data dictionaries and other documentation tools ensures comprehensive coverage of data elements, mappings, and transformations. This documentation is essential for maintaining data quality and reproducibility.

3. **Data Governance and Quality Assurance:**
- **Governance Frameworks:** Robust governance frameworks contribute to responsible and ethical data practices. Oversight of data harmonization processes helps address regulatory compliance and minimizes risks related to privacy, security, and data integrity.
- **Governance [Frameworks](/data-harmonisation-tools-frameworks):** Robust governance frameworks contribute to responsible and ethical data practices. Oversight of data harmonization processes helps address regulatory compliance and minimizes risks related to privacy, security, and data integrity.

- **Quality Assurance Checks:** Regular quality assurance checks and audits are essential for upholding data quality standards. These measures reduce the likelihood of errors, biases, and discrepancies, enhancing the reliability of harmonized datasets.
