Commit
1 parent 388ac1e, commit a58a9da
Showing 8 changed files with 454 additions and 43 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,11 @@ | ||
--- | ||
title: Contact Us | ||
--- | ||
|
||
{{< htmlcode >}} | ||
|
||
|
||
<iframe src="https://docs.google.com/forms/d/e/1FAIpQLSe3CoLKMb3nzy7KIpebn2xvkd3CBNMLCK_dB0CWUhQY-QP5vA/viewform?embedded=true" width="640" height="784" frameborder="0" marginheight="0" marginwidth="0">Loading…</iframe> | ||
|
||
|
||
{{</ htmlcode >}} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,47 @@ | ||
--- | ||
title: Fair data | ||
--- | ||
|
||
## What are the FAIR Data Principles? | ||
|
||
The [FAIR principles](https://www.go-fair.org/fair-principles/) are a set of guidelines for making data more **findable**, **accessible**, **interoperable**, and **reusable**. They are intended to help organisations and individuals maximise the value of their data by making it easier to find, access, and use. | ||
|
||
The FAIR principles were published in *Scientific Data* in 2016 to address the difficulties in reproducing scientific research. Funding organisations, publishers, and governmental agencies increasingly require data management plans for data generated in research. The aim is that if you find a scientific paper and want to reproduce the study, you should be able to do so with minimal friction. | ||
|
||
![img](/assets/img/fair.png) | ||
|
||
## How is Harmony following FAIR principles? | ||
|
||
### Harmony’s data is Findable | ||
|
||
The Harmony project is registered with the [Open Science Framework](https://osf.io/bct6k/). With the exception of copyrighted protocols such as the Beck Anxiety Inventory, all datasets (protocols) used in the development and testing of Harmony are available in our public [GitHub repository](https://github.com/harmonydata/harmony). Datasets not provided in raw form are retrieved by a [shell script](https://github.com/harmonydata/harmony/blob/main/data/raw_pdf/download_raw_pdfs.sh) which downloads the documents from the web. The evaluation set from McElroy et al. is provided in the GitHub repository. | ||
|
||
- F1. (Meta)data are assigned a globally unique and persistent identifier – The unique identifier for the Harmony project is https://osf.io/bct6k/ with the Open Science Framework. The OSF profile links to Harmony’s [GitHub page](https://github.com/harmonydata). The GitHub repository contains a folder of [hard coded questionnaires](https://github.com/harmonydata/harmony/tree/main/front_end/hard_coded_questionnaires) where each questionnaire is a CSV file whose name serves as its unique ID. For raw PDF questionnaires available on the internet, a shell script is supplied which downloads each data file to an exact filename, which serves as a unique identifier. | ||
- F2. Data are described with rich metadata – The OSF profile contains all relevant metadata on the project. The spreadsheet [Final harmonised item tool EM.xlsx](https://github.com/harmonydata/harmony/blob/main/data/Final%20harmonised%20item%20tool%20EM.xlsx) in the repository has a descriptions tab. | ||
- F3. Metadata clearly and explicitly include the identifier of the data they describe – the OSF profile links to GitHub, and the GitHub URL is the unique identifier of the GitHub repository. All references to a questionnaire use its file name in a consistent format, such as **GHQ 12 English**. | ||
- F4. (Meta)data are registered or indexed in a searchable resource – the OSF profile is searchable. In the GitHub repository, the shell script downloads the files into a folder, and a further script extracts all data into a searchable txt format. | ||
|
||
### Harmony’s data is Accessible | ||
|
||
Since our dataset is publicly accessible in the GitHub repository, once a user has cloned (downloaded) the repository and run the shell script, all documents will be on their computer. | ||
|
||
- A1. (Meta)data are retrievable by their identifier using a standardised communications protocol – Harmony can be downloaded by cloning the GitHub repository. The script to download any extra questionnaires not supplied with Harmony is included in the GitHub repository. | ||
- A2. Metadata are accessible, even when the data are no longer available – since the unique ID of Harmony is the OSF profile, if Harmony were to be hosted elsewhere the OSF profile would remain with the relevant metadata. The list of protocols for testing is included in the shell script. All protocols without open-access restrictions are included in [this folder](https://github.com/harmonydata/harmony/tree/main/front_end/hard_coded_questionnaires). | ||
|
||
### Harmony’s data is Interoperable | ||
|
||
Data is downloaded in PDF format and the library Apache Tika is used to convert to raw text format. There are no interoperability issues with raw text format. | ||
|
||
- I1. (Meta)data use a formal, accessible, shared, and broadly applicable language for knowledge representation. – all questionnaires are converted to a single structure where the text of a question is in one column. | ||
- I2. (Meta)data use vocabularies that follow FAIR principles. – You can read the data schema [here](https://github.com/harmonydata/harmony/blob/main/README.md#data-schema). | ||
- I3. (Meta)data include qualified references to other (meta)data – this item is not applicable since no datasets build on other datasets. | ||
|
||
### Harmony’s data is Reusable | ||
|
||
Harmony is released under the [MIT License](https://github.com/harmonydata/harmony/blob/main/LICENSE), which allows commercial use, modification, distribution, and private use of the tool and data. | ||
|
||
- R1. (Meta)data are richly described with a plurality of accurate and relevant attributes – our [project GitHub page](https://github.com/harmonydata) has information about the project, while the source repository has a [LICENSE](https://github.com/harmonydata/harmony/blob/main/LICENSE) and a [README.md](https://github.com/harmonydata/harmony/blob/main/README.md) containing all relevant information about the project and its reusability. | ||
|
||
## References | ||
|
||
Wilkinson, Mark D., et al. “The FAIR Guiding Principles for scientific data management and stewardship.” *Scientific Data* 3.1 (2016): 1-9. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,97 @@ | ||
--- | ||
title: Frequently Asked Questions | ||
--- | ||
|
||
## What is harmonisation? | ||
|
||
Harmonisation means taking variables from different studies and manipulating them to make them comparable. | ||
|
||
For example, suppose we have datasets on depression from different countries. Depression is typically measured using a questionnaire, so how can we harmonise two depression questionnaires? Typically this is a manual process: we would look at the content and find common elements between the questionnaires. | ||
|
||
For an example of a pre-existing harmonisation tool, please see: | ||
|
||
McElroy, E., Villadsen, A., Patalay, P., Goodman, A., Richards, M., Northstone, K., Fearon, P., Tibber, M., Gondek, D., & Ploubidis, G.B. (2020). [Harmonisation and Measurement Properties of Mental Health Measures in Six British Cohorts](https://www.closer.ac.uk/wp-content/uploads/210715-Harmonisation-measurement-properties-mental-health-measures-british-cohorts.pdf). London, UK: CLOSER. | ||
|
||
## What does Harmony do? | ||
|
||
Harmony is a tool that helps researchers automate the process of harmonisation using [natural language processing](https://fastdatascience.com/what-is-nlp/). | ||
|
||
## How do I cite Harmony? | ||
|
||
If you would like to cite the tool alone, you can cite: | ||
|
||
Wood, T.A., McElroy, E., Moltrecht, B., Ploubidis, G.B., Scopel Hoffmann, M., Harmony [Computer software], Version 1.0, accessed at https://harmonydata.ac.uk/app. Ulster University (2022) | ||
|
||
A BibTeX entry for LaTeX users is | ||
|
||
``` | ||
@unpublished{harmony, | ||
AUTHOR = {Wood, T.A. and McElroy, E. and Moltrecht, B. and Ploubidis, G.B. and Scopel Hoffmann, M.}, | ||
TITLE = {Harmony (Computer software), Version 1.0}, | ||
YEAR = {2022}, | ||
Note = {To appear}, | ||
url = {https://harmonydata.ac.uk/app} | ||
} | ||
``` | ||
|
||
You can also cite the wider Harmony project, which is registered with the [Open Science Framework](https://osf.io/bct6k/): | ||
|
||
McElroy, E., Moltrecht, B., Scopel Hoffmann, M., Wood, T. A., & Ploubidis, G. (2023, January 6). Harmony – A global platform for contextual harmonisation, translation and cooperation in mental health research. Retrieved from osf.io/bct6k | ||
|
||
``` | ||
@misc{McElroy_Moltrecht_ScopelHoffmann_Wood_Ploubidis_2023, | ||
title={Harmony - A global platform for contextual harmonisation, translation and cooperation in mental health research}, | ||
url={https://osf.io/bct6k/}, | ||
publisher={OSF}, | ||
author={McElroy, Eoin and Moltrecht, Bettina and Scopel Hoffmann, Mauricio and Wood, Thomas A and Ploubidis, George}, | ||
year={2023}, | ||
month={Jan} | ||
} | ||
``` | ||
|
||
## Does Harmony store my data? | ||
|
||
If you upload a questionnaire or instrument, Harmony does not store or save it. You can read more on our [Privacy Policy page](https://harmonydata.ac.uk/privacy-policy/). | ||
|
||
## How does Harmony work? | ||
|
||
Harmony passes the text of each questionnaire item through a neural network called Sentence-BERT, in order to convert it into a vector. The similarity of two texts is then measured as the similarity between their vectors. Two identical texts have a similarity of 100% while two completely different texts have a similarity of 0%. You can read more in this [technical blog post](https://harmonydata.ac.uk/how-does-harmony-work/) and you can even download and run Harmony’s [source code](https://github.com/harmonydata/harmony). | ||
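
The vector comparison described above can be illustrated with a minimal sketch. The three-dimensional vectors below are made-up stand-ins for real Sentence-BERT embeddings (which have hundreds of dimensions), and the function name `cosine_similarity` is chosen for illustration only; it is not taken from Harmony's source code.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical toy embeddings, invented for this example only.
feel_sad = [0.9, 0.1, 0.2]
feel_depressed = [0.8, 0.2, 0.3]
like_football = [0.1, 0.9, 0.1]

# Items with similar meaning should score higher than unrelated items.
print(cosine_similarity(feel_sad, feel_depressed))
print(cosine_similarity(feel_sad, like_football))
```

In this toy data the first pair points in nearly the same direction and therefore scores much higher than the second pair.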
|
||
## How reliable is Harmony? | ||
|
||
Harmony was able to reconstruct the matches of the questionnaire harmonisation tool developed by McElroy et al in 2020 with the following AUC scores: childhood **84%**, adulthood **80%**. Harmony was able to match the questions of the English and Portuguese GAD-7 instruments with AUC **100%** and the Portuguese CBCL and SDQ with AUC **89%**. You can read more in [this blog post](https://harmonydata.ac.uk/measuring-the-performance-of-nlp-algorithms/). | ||
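
For readers unfamiliar with AUC, the following sketch shows the standard pairwise definition of ROC AUC: the probability that a randomly chosen true match receives a higher score than a randomly chosen non-match. The scores and labels are invented for illustration, and this is not necessarily how the figures above were computed.

```python
def auc(scores, labels):
    """ROC AUC via pairwise comparison of positives (label 1) and negatives (label 0)."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # the true match is ranked above the non-match
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(positives) * len(negatives))


# Hypothetical similarity scores for five item pairs, and whether a human
# harmonisation marked each pair as a match (1) or not (0).
scores = [0.92, 0.75, 0.40, 0.81, 0.10]
labels = [1, 1, 0, 0, 0]

print(auc(scores, labels))
```

An AUC of 1.0 means every true match outscored every non-match, while 0.5 is what random scoring would achieve.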
|
||
## What do the numbers mean? | ||
|
||
The numbers are the cosine similarity of document vectors. The cosine similarity of two vectors can range from -1 to 1 based on the angle between the two vectors being compared. We have converted these to percentages. We have also used a preprocessing stage to convert positive sentences to negative and vice-versa (e.g. _I feel anxious_ -> _I do not feel anxious_). If the match between two sentences improves once this preprocessing has been applied, then the items are assigned a negative similarity. | ||
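
The sign rule can be sketched as follows. This is a simplified illustration under two assumptions that the text above does not confirm: that the reported magnitude is the larger of the two similarities, and that percentages are rounded to whole numbers. The function name `signed_percentage` is hypothetical.

```python
def signed_percentage(sim_original, sim_negated):
    """Turn two cosine similarities into one signed percentage.

    sim_original: similarity between item A and item B as written.
    sim_negated:  similarity between item A and the negated form of item B.
    """
    if sim_negated > sim_original:
        # The match improved after negation, so the items are treated as
        # opposites and the score is reported as negative (assumed magnitude).
        return -round(sim_negated * 100)
    return round(sim_original * 100)


# Hypothetical similarities for "I feel anxious" vs "I feel calm":
# negating one item makes the pair look more alike, so the score is negative.
print(signed_percentage(0.40, 0.85))
# Hypothetical similarities for "I feel sad" vs "I feel depressed".
print(signed_percentage(0.85, 0.40))
```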
|
||
## Does Harmony give p-values? | ||
|
||
At this time Harmony does not give p-values. Harmony matches vectors using a cosine score and p-values are not applicable in this context. | ||
|
||
## How should I report the numbers from Harmony in my paper? | ||
|
||
Items were matched on content using the online tool [Harmony](https://harmonydata.ac.uk/), which matches items by converting text to vectors using a transformer neural network ([Reimers & Gurevych, 2019](https://arxiv.org/abs/1908.10084)). Harmony produces a cosine score ranging from -1 to 1, with values closer to 1 indicating a closer match. | ||
|
||
## How does Harmony compare to human harmonisation? | ||
|
||
Imagine that you, as a human, are trying to match items in a questionnaire. You might decide that “I feel depressed” and “I feel sad” are similar. If you had to place them on the surface of a sphere, you might place them close to each other, whereas you would place unrelated concepts far apart. | ||
|
||
We can represent any concept as a vector of length 1, pointing to the surface of a sphere. Concepts that are similar have vectors close together. The cosine score of two vectors that are close together is close to 1. | ||
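
To make the sphere picture concrete, here is a small sketch that scales vectors to length 1 and measures the angle between them. The vectors and helper names are hypothetical and chosen only for illustration.

```python
import math


def normalise(v):
    """Scale a non-zero vector to length 1 so that it points to the unit sphere."""
    length = math.sqrt(sum(x * x for x in v))
    return [x / length for x in v]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def angle_degrees(u, v):
    """Angle between two vectors; for unit vectors the cosine is just the dot product."""
    cosine = dot(normalise(u), normalise(v))
    cosine = max(-1.0, min(1.0, cosine))  # guard against rounding slightly outside [-1, 1]
    return math.degrees(math.acos(cosine))


# Hypothetical concept vectors: the first two are close together on the sphere.
print(angle_degrees([0.9, 0.1, 0.2], [0.8, 0.2, 0.3]))
print(angle_degrees([1.0, 0.0, 0.0], [0.0, 1.0, 0.0]))
```

A small angle corresponds to a cosine score near 1, and a right angle corresponds to a cosine score of 0.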
|
||
![img](/assets/img/sphere.svg) | ||
|
||
## Who made Harmony? | ||
|
||
The Python code of Harmony was written by [Thomas Wood](https://freelancedatascientist.net/) (Fast Data Science) in collaboration with Eoin McElroy, Bettina Moltrecht, George Ploubidis, and Mauricio Scopel Hoffmann. | ||
|
||
## Does Harmony comply with FAIR data principles? | ||
|
||
We have developed Harmony as an open-source and open science initiative, paying attention to the [FAIR Guiding Principles for scientific data management and stewardship](https://www.go-fair.org/fair-principles/) (**F**indability, **A**ccessibility, **I**nteroperability, and **R**euse of digital assets). You can read more on our [FAIR data page](https://harmonydata.ac.uk/fair-data/). | ||
|
||
## What do other researchers say about Harmony? | ||
|
||
We recently ran a user-testing session at UCL’s Centre for Longitudinal Studies with psychology researchers from several universities. After the session, one postdoctoral researcher said: | ||
|
||
![img](/assets/img/quote.png) | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,4 @@ | ||
--- | ||
title: Our team | ||
layout: team | ||
--- |