Request for More Information on Hardcoded References in Preprocessing #1

Open
ShaunFChen opened this issue Jul 2, 2023 · 1 comment


ShaunFChen commented Jul 2, 2023

Hello,

I have been exploring your NeuralCVD repository for our study and appreciate the considerable effort put into this tool. We believe it has the potential to make a significant contribution to our research. However, I have run into some difficulties during the preprocessing step.

The tool appears to have hardcoded references to files under:

path = "/data/analysis/ag-reils/steinfej/code/umbrella/pre/ukbb"
data_path = "/data/analysis/ag-reils/ag-reils-shared/cardioRS/data"

in the subfolder named mapping, as well as to:

codes_gp_records = pd.read_feather(f"{data_path}/1_decoded/codes_gp_diagnoses_210119.feather").drop("level", axis=1)
codes_hospital_records = pd.read_feather(f"{data_path}/1_decoded/codes_hes_diagnoses_210120.feather")

These files are not among the outputs of "0_decode_ukbb.ipynb".
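To illustrate what would help on our side, here is a minimal sketch of how the data directory could be made configurable and the missing inputs reported up front. The environment variable name `NEURALCVD_DATA_PATH` and both helper functions are hypothetical and not part of NeuralCVD; the file names are taken from the `read_feather` calls quoted above.

```python
import os
from pathlib import Path

# Hypothetical environment variable name; NeuralCVD does not define it.
DATA_PATH_ENV = "NEURALCVD_DATA_PATH"

# File names copied from the read_feather calls quoted in this issue.
EXPECTED_FILES = [
    "1_decoded/codes_gp_diagnoses_210119.feather",
    "1_decoded/codes_hes_diagnoses_210120.feather",
]


def resolve_data_path(default="/data/analysis/ag-reils/ag-reils-shared/cardioRS/data"):
    """Return the data directory, preferring the environment variable if it is set."""
    return Path(os.environ.get(DATA_PATH_ENV, default))


def missing_files(data_path, expected=EXPECTED_FILES):
    """List the expected files that do not exist under data_path."""
    return [name for name in expected if not (Path(data_path) / name).is_file()]


if __name__ == "__main__":
    data_path = resolve_data_path()
    for name in missing_files(data_path):
        print(f"missing: {data_path / name}")
```

With something along these lines, users outside the original cluster could point the notebooks at their own copy of the data and see immediately which inputs are still absent.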

I understand that the tool uses the UK Biobank codings, and I am able to obtain those. However, the source of several other datasets is not clear to me: atc, phecodes, snomed_cor_list, and athena_vocabulary_covid. I am having difficulty confirming that these data and their format are consistent with what the tool requires. To run the tool correctly and ensure the validity of our results, it is crucial that we have the same version and format of these specific datasets. Unfortunately, the current resources do not provide enough detail to reproduce this setup accurately.

I therefore kindly ask you to share these referenced data directly, if that is possible and compliant with the applicable data-sharing restrictions.

However, if direct access is not feasible due to any constraints, could you please provide further information on how to obtain or generate these datasets? This ideally includes the specific versions of these datasets, the expected formats, and any preprocessing steps required for compatibility with NeuralCVD.
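If sharing the files themselves is not possible, even a size and checksum per file would let us confirm that we have obtained identical versions. A minimal sketch, assuming SHA-256 is acceptable; the helper name `file_fingerprint` is hypothetical and not part of NeuralCVD:

```python
import hashlib


def file_fingerprint(path, chunk_size=1 << 20):
    """Return (size_in_bytes, sha256_hex) for the file at path."""
    digest = hashlib.sha256()
    size = 0
    with open(path, "rb") as handle:
        # Read in chunks so large files do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
            size += len(chunk)
    return size, digest.hexdigest()
```

A checksum only confirms byte-identical files, so the column names and types of each table would still be needed to verify the expected format.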

Your assistance will greatly aid us in overcoming this roadblock, and will facilitate the effective use of this tool in our research.

Thank you for your time and for your invaluable contributions to the field.

Best regards,
Shaun

@DhanushB2000

Thank you for developing such useful code for exploring the UK Biobank data.
I have also been looking into this comprehensive code to familiarise myself with working with UKBB data.
It would be great if you could share the files used by the code (such as those mentioned in the previous comment).
Your help in sharing those files would be much appreciated.

Regards,
Dhanush
