Clarify why some compounds have multiple replicates #85

Open
ChenyuWang-Monica opened this issue Dec 2, 2023 · 2 comments

Comments

@ChenyuWang-Monica

While counting the replicates of each compound in the COMPOUND plates, I ran into a few questions:

  1. The top ten compounds have >6000 replicates. Among them are DMSO, the empty well (JCP2022_999999), and 8 positive controls. However, when I compare the InChIKey of the 8 positive controls with those given in https://github.com/jump-cellpainting/JUMP-Target/tree/master#positive-control-compounds, one of them disagrees: JCP2022_025848 (GJFCONYVAUNLKB-UHFFFAOYSA-N) has 8127 replicates but is not listed as a positive control; dexamethasone (UREBDLICKHMUKA-CXSFZGCWSA-N) listed as a positive control doesn't appear in the metadata compound.csv.gz.

  2. The 11th-ranked compound JCP2022_033954 has 1594 replicates. Is it also a positive control, or does it serve some other purpose?

  3. There are many compounds with multiple replicates (for example, more than 10 but fewer than 60). Why do they have so many more replicates than the typical case mentioned in the paper (i.e. about 5)?

Thanks!

@shntnu shntnu added the cpg0016 label Dec 8, 2023
@shntnu shntnu changed the title from "Compounds with multiple replicates" to "Clarify why some compounds have multiple replicates" Dec 8, 2023
@niranjchandrasekaran
Contributor

Hi @ChenyuWang-Monica, my answers are below.

> The top ten compounds have >6000 replicates. Among them are DMSO, the empty well (JCP2022_999999), and 8 positive controls. However, when I compare the InChIKey of the 8 positive controls with those given in https://github.com/jump-cellpainting/JUMP-Target/tree/master#positive-control-compounds, one of them disagrees: JCP2022_025848 (GJFCONYVAUNLKB-UHFFFAOYSA-N) has 8127 replicates but is not listed as a positive control; dexamethasone (UREBDLICKHMUKA-CXSFZGCWSA-N) listed as a positive control doesn't appear in the metadata compound.csv.gz.

We have had some trouble matching InChIKeys between what we previously released in the JUMP-Target repo and what we released in this repo, but I can confirm that JCP2022_025848 is dexamethasone. The mapping between JCP2022 IDs and compound names is below.

| Metadata_JCP2022 | Metadata_InChIKey | poscon_pert_iname | JUMP_Target_InChIKey |
| --- | --- | --- | --- |
| JCP2022_085227 | SRVFFFJZQVENJC-UHFFFAOYSA-N | aloxistatin | SRVFFFJZQVENJC-IHRRRGAJSA-N |
| JCP2022_037716 | IVUGFMLRJOCGAS-UHFFFAOYSA-N | AMG900 | IVUGFMLRJOCGAS-UHFFFAOYSA-N |
| JCP2022_025848 | GJFCONYVAUNLKB-UHFFFAOYSA-N | dexamethasone | UREBDLICKHMUKA-CXSFZGCWSA-N |
| JCP2022_046054 | KPBNHDGDUADAGP-UHFFFAOYSA-N | FK-866 | KPBNHDGDUADAGP-VAWYXSNFSA-N |
| JCP2022_035095 | IHLVSLOZUHKNMQ-UHFFFAOYSA-N | LY2109761 | IHLVSLOZUHKNMQ-UHFFFAOYSA-N |
| JCP2022_064022 | OINGHOPGNMYCAB-UHFFFAOYSA-N | NVS-PAK1-1 | OINGHOPGNMYCAB-INIZCTEOSA-N |
| JCP2022_050797 | LOUPRKONTZGTKE-UHFFFAOYSA-N | quinidine | LOUPRKONTZGTKE-LHHVKLHASA-N |
| JCP2022_012818 | CQKBSRPVZZLCJE-UHFFFAOYSA-N | TC-S-7004 | CQKBSRPVZZLCJE-UHFFFAOYSA-N |
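
For anyone reconciling the two key sets, here is a minimal sketch of one possible check. It relies on the fact that the first 14-character block of an InChIKey hashes the molecular connectivity, while the second block covers stereochemistry and other layers. The helper names `connectivity_block` and `same_skeleton` are made up for illustration, and treating a first-block match as "same compound" is an assumption, not something this repo prescribes.

```python
# Rows copied from the mapping above:
# (JCP2022 ID, compound name, InChIKey in this repo, InChIKey in JUMP-Target).
POSCONS = [
    ("JCP2022_085227", "aloxistatin", "SRVFFFJZQVENJC-UHFFFAOYSA-N", "SRVFFFJZQVENJC-IHRRRGAJSA-N"),
    ("JCP2022_037716", "AMG900", "IVUGFMLRJOCGAS-UHFFFAOYSA-N", "IVUGFMLRJOCGAS-UHFFFAOYSA-N"),
    ("JCP2022_025848", "dexamethasone", "GJFCONYVAUNLKB-UHFFFAOYSA-N", "UREBDLICKHMUKA-CXSFZGCWSA-N"),
    ("JCP2022_046054", "FK-866", "KPBNHDGDUADAGP-UHFFFAOYSA-N", "KPBNHDGDUADAGP-VAWYXSNFSA-N"),
    ("JCP2022_035095", "LY2109761", "IHLVSLOZUHKNMQ-UHFFFAOYSA-N", "IHLVSLOZUHKNMQ-UHFFFAOYSA-N"),
    ("JCP2022_064022", "NVS-PAK1-1", "OINGHOPGNMYCAB-UHFFFAOYSA-N", "OINGHOPGNMYCAB-INIZCTEOSA-N"),
    ("JCP2022_050797", "quinidine", "LOUPRKONTZGTKE-UHFFFAOYSA-N", "LOUPRKONTZGTKE-LHHVKLHASA-N"),
    ("JCP2022_012818", "TC-S-7004", "CQKBSRPVZZLCJE-UHFFFAOYSA-N", "CQKBSRPVZZLCJE-UHFFFAOYSA-N"),
]


def connectivity_block(inchikey: str) -> str:
    """Return the first (14-character) block of an InChIKey."""
    return inchikey.split("-")[0]


def same_skeleton(key_a: str, key_b: str) -> bool:
    """True when two InChIKeys agree on their first block (hypothetical helper)."""
    return connectivity_block(key_a) == connectivity_block(key_b)


# Compounds whose two keys are fully identical.
identical = [name for _, name, repo_key, target_key in POSCONS if repo_key == target_key]

# Compounds whose two keys disagree even on the first block.
mismatched = [
    name
    for _, name, repo_key, target_key in POSCONS
    if not same_skeleton(repo_key, target_key)
]

print("identical keys:", identical)
print("first block differs:", mismatched)
```

On these eight rows, three keys are identical, four differ only after the first block, and dexamethasone is the only entry whose first block differs, so it cannot be matched this way and needs the explicit mapping.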

> The 11th-ranked compound JCP2022_033954 has 1594 replicates. Is it also a positive control, or does it serve some other purpose?

Thanks for bringing this to our attention. I believe this is a metadata issue. Most of these wells come from a single source (source_9) and all the wells are in columns 1, 24, 25 or 48. @shntnu you had noticed the number of replicates in #30 (comment), but I don't know whether we flagged this as a metadata error or not.
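
A check like the one described above could be sketched as follows. This is only an illustration: the column names `Metadata_Source`, `Metadata_Well`, and `Metadata_JCP2022` are assumed to match the well-level metadata, the helper names are hypothetical, and the rows are made-up toy data rather than real wells.

```python
import re
from collections import Counter


def well_column(well: str) -> int:
    """Return the numeric column of a well position such as 'A01' or 'AF48'."""
    match = re.fullmatch(r"([A-Za-z]+)(\d+)", well)
    if match is None:
        raise ValueError(f"unrecognised well position: {well!r}")
    return int(match.group(2))


def locate(rows, jcp_id):
    """Tally the wells of one compound by source and by plate column."""
    by_source = Counter()
    by_column = Counter()
    for row in rows:
        if row["Metadata_JCP2022"] != jcp_id:
            continue
        by_source[row["Metadata_Source"]] += 1
        by_column[well_column(row["Metadata_Well"])] += 1
    return by_source, by_column


# Toy rows, invented for illustration only (column names are assumptions).
rows = [
    {"Metadata_Source": "source_9", "Metadata_Well": "A01", "Metadata_JCP2022": "JCP2022_033954"},
    {"Metadata_Source": "source_9", "Metadata_Well": "B24", "Metadata_JCP2022": "JCP2022_033954"},
    {"Metadata_Source": "source_9", "Metadata_Well": "AF48", "Metadata_JCP2022": "JCP2022_033954"},
    {"Metadata_Source": "source_3", "Metadata_Well": "C25", "Metadata_JCP2022": "JCP2022_033954"},
    {"Metadata_Source": "source_9", "Metadata_Well": "D10", "Metadata_JCP2022": "JCP2022_999999"},
]

by_source, by_column = locate(rows, "JCP2022_033954")
print(by_source.most_common())  # sources ranked by number of wells
print(sorted(by_column))        # plate columns in which the compound appears
```

If the wells of a compound cluster in a handful of edge or mid-plate columns and mostly in one source, as reported here, that pattern looks more like a plate-layout annotation than a treatment.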

> There are many compounds with multiple replicates (for example, more than 10 but fewer than 60). Why do they have so many more replicates than the typical case mentioned in the paper (i.e. about 5)?

Most compounds should have five replicates, but there are exceptions. I have listed some of them below.

  • Source 7 did not exchange compounds with the other sources, but some of its compounds overlap with theirs. These compounds will have more than 5 replicates.
  • Around 2000 compounds are shared between wave 1 and wave 2 sources (you can find more information about the two waves of sources in the manuscript). These compounds will also have more than 5 replicates.
  • One of the sources needed multiple replicates of the compounds it nominated. When these compounds were exchanged with other sources, we ended up with more than 5 replicates of them.
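
To see which compounds fall into these exceptions, one could count wells per JCP2022 ID and flag anything above five. The sketch below is a rough illustration under stated assumptions: one list entry per well, hypothetical helper names, and made-up IDs `JCP2022_000001` and `JCP2022_000002` standing in for ordinary compounds.

```python
from collections import Counter

EXPECTED_REPLICATES = 5  # the typical replicate count mentioned above


def replicate_counts(jcp_ids):
    """Count how many wells each JCP2022 ID occupies (one entry per well)."""
    return Counter(jcp_ids)


def over_replicated(counts, expected=EXPECTED_REPLICATES, exclude=()):
    """Return compounds with more than `expected` wells, skipping known controls."""
    return {
        jcp_id: n
        for jcp_id, n in counts.items()
        if n > expected and jcp_id not in exclude
    }


# Toy well list, invented for illustration only.
wells = (
    ["JCP2022_999999"] * 8    # empty wells
    + ["JCP2022_085227"] * 7  # aloxistatin, a positive control
    + ["JCP2022_000001"] * 5  # made-up compound with the usual five replicates
    + ["JCP2022_000002"] * 12 # made-up compound shared across sources
)

controls = {"JCP2022_999999", "JCP2022_085227"}
counts = replicate_counts(wells)
flagged = over_replicated(counts, exclude=controls)
print(flagged)
```

Excluding the controls first keeps the flagged set focused on treatment compounds, which is where the overlap between sources matters.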

@shntnu shntnu added the faq Document this issue in an FAQ label Dec 19, 2023
@shntnu
Contributor

shntnu commented Dec 19, 2023

> Thanks for bringing this to our attention. I believe this is a metadata issue. Most of these wells come from a single source (source_9) and all the wells are in columns 1, 24, 25 or 48. @shntnu you had noticed the number of replicates in #30 (comment), but I don't know whether we flagged this as a metadata error or not.

Indeed, I'm not sure why this was the case. I'll follow up in that internal issue and loop back here.

@shntnu shntnu mentioned this issue Dec 19, 2023