dynamic generation of SMILES for PTMs #199

Merged
merged 8 commits into from
Aug 8, 2024

Conversation

boopthesnoot (Contributor)

You can find the description and examples in docs/nbs/tutorial_smiles.ipynb

GeorgWa (Collaborator) commented Jul 17, 2024

I would suggest integrating the data into modification.tsv and amino_acid.yaml:
https://github.com/MannLabs/alphabase/blob/737f25c79c34de5182140543bf887ab61d7e53d5/alphabase/constants/const_files/amino_acid.yaml

That way we don't need a new constants folder or double bookkeeping of modification names. I think we can add two keys for each entry: sum/composition and smiles.

In modification.tsv you can just add a smiles column.

GeorgWa (Collaborator) commented Jul 17, 2024

@boopthesnoot Have a look at #200.
I created an extra and a decorator for rdkit.
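
For readers unfamiliar with the pattern, here is a minimal sketch of what a decorator for an optional dependency can look like. It is not the code from #200: the decorator name `requires_module`, the extra name `smiles`, and the example function `count_atoms` are all hypothetical. The check runs when the function is called, so importing the package never needs rdkit.

```python
import functools
import importlib.util


def requires_module(module_name: str, extra: str):
    """Hypothetical decorator: raise a helpful error if an optional module is missing."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # Checked at call time, not at import time.
            if importlib.util.find_spec(module_name) is None:
                raise ImportError(
                    f"{func.__name__} requires '{module_name}'. "
                    f"Install it with: pip install 'alphabase[{extra}]'"
                )
            return func(*args, **kwargs)

        return wrapper

    return decorator


# The extra name "smiles" is an assumption made for this illustration.
@requires_module("rdkit", extra="smiles")
def count_atoms(smiles: str) -> int:
    # Imported lazily; this line is only reached when rdkit is installed.
    from rdkit import Chem

    return Chem.MolFromSmiles(smiles).GetNumAtoms()
```

With this shape, users without rdkit can still import everything else and only see an error when they call an rdkit-backed function.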

boopthesnoot (Contributor, Author)

@GeorgWa But the SMILES in modification.tsv will be a mess: some of them will be AAs with PTMs, some will be terminal modifications only, without the AA, and we would still have to store which is which somewhere.
By adding a key for each of the AAs in amino_acid.yaml we'll still have double bookkeeping of the atomic composition, because we can infer it from SMILES. Ofc that would mean having an rdkit dependency for the whole package x)

GeorgWa (Collaborator) commented Jul 17, 2024

We could resolve this by looking up the localizer, e.g. @Any N-Term. Alternatively, we could introduce a second column, location = {'N','C','AA'}, and use dynamic or fixed SMILES depending on its value.

In alphabase the modification names like Dimethyl@K are the primary keys across all applications. I think this primary key should only be defined once. Furthermore, the master record in modification.tsv is updated automatically from unimod if more modifications are added. This way everything will stay in sync.
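
As a rough sketch of the localizer lookup, assuming names follow the `name@site` pattern and that terminal sites end in "N-term" or "C-term" (the helper `mod_location` is hypothetical and not part of alphabase):

```python
def mod_location(mod_name: str) -> str:
    """Return 'N', 'C' or 'AA' for a name such as 'Dimethyl@K' (illustrative helper)."""
    # Split on the last '@' so the part after it is the localizer.
    _, sep, site = mod_name.rpartition("@")
    if not sep or not site:
        raise ValueError(f"no '@' localizer in {mod_name!r}")
    site = site.lower()
    # Assumption: only the suffix of the localizer is inspected.
    if site.endswith("n-term"):
        return "N"
    if site.endswith("c-term"):
        return "C"
    return "AA"


print(mod_location("Dimethyl@K"))
print(mod_location("Acetyl@Any N-term"))
```

Deriving the location from the primary key keeps a single source of truth; an explicit location column would trade that for a simpler lookup. Localizers that combine a specific residue with a terminus would need a rule of their own.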

jalew188 (Collaborator) commented Jul 17, 2024

Yes, I think we should use only one PTM and AA definition file to avoid ambiguity in the future.

jalew188 (Collaborator)

We should use aa.tsv instead of aa.yaml for AAs, similar to modification.tsv

GeorgWa (Collaborator) left a comment

With numpy-style docstrings and without import-level actions, I think we are good to merge 👍🏻
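
A small sketch of both points, reading "import-level actions" as work that runs when the module is imported. The function names and the placeholder table are hypothetical, not taken from the PR. The table is built on first use and cached, and each function carries a numpy-style docstring.

```python
import functools

LOAD_LOG = []  # only here to make the lazy behaviour observable


@functools.lru_cache(maxsize=None)
def get_smiles_table() -> dict:
    """Load the SMILES lookup table on first use.

    Returns
    -------
    dict
        Mapping of modification name to SMILES string.
    """
    LOAD_LOG.append("loaded")
    # Placeholder for the expensive step, e.g. reading a TSV file.
    return {"placeholder_mod": "placeholder_smiles"}


def lookup_smiles(mod_name: str) -> str:
    """Look up the SMILES string of a modification.

    Parameters
    ----------
    mod_name : str
        Modification name used as key in the table.

    Returns
    -------
    str
        The SMILES string stored for ``mod_name``.

    Raises
    ------
    KeyError
        If ``mod_name`` is not in the table.
    """
    return get_smiles_table()[mod_name]
```

Nothing is loaded when the module is imported; the first call to `lookup_smiles` triggers the load and later calls reuse the cached table.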

jalew188 (Collaborator) left a comment

LGTM

GeorgWa (Collaborator) commented Aug 5, 2024

I just noticed that the dtype of the unimod column in modification.tsv changed to float. Can we change it back?

mschwoer (Contributor) left a comment

LGTM! This PR definitely puts smiles in people's faces ;-)

boopthesnoot merged commit 8962685 into development Aug 8, 2024
2 checks passed
boopthesnoot deleted the smiles branch August 8, 2024 14:21
boopthesnoot restored the smiles branch September 23, 2024 12:44
4 participants