BINDetect not giving out error when the motif file is "deformed" #248

Open
johannesnicolaus opened this issue Dec 16, 2023 · 2 comments

@johannesnicolaus

This might be a continuation of issue #78. When I tried to run BINDetect using a "pfm" motif file created by gimmemotifs, I ran into a problem.

The pfm file looks something like:

>GM.5.0.Sox.0001
0.7213  0.0793  0.1103  0.0891
0.9259  0.0072  0.0062  0.0607
0.0048  0.9203  0.0077  0.0672
0.9859  0.0030  0.0030  0.0081
0.9778  0.0043  0.0128  0.0051
0.1484  0.0050  0.0168  0.8299
>GM.5.0.Homeodomain.0001
0.8870  0.0000  0.0178  0.0951
0.1156  0.2033  0.6629  0.0181
0.0017  0.7452  0.0809  0.1722
0.0011  0.0003  0.0003  0.9983
0.0026  0.0141  0.9721  0.0111
0.0000  0.0189  0.0054  0.9758
0.0006  0.9983  0.0006  0.0006
0.9170  0.0140  0.0046  0.0644
0.2228  0.2421  0.3300  0.2051
0.3621  0.1054  0.2208  0.3116
0.5727  0.0104  0.1741  0.2428

For example, I have 1796 motifs in the pfm file, but I got the following warning:

2023-12-16 10:23:46 (1569572) [INFO]	Reading motifs from file
2023-12-16 10:23:47 (1569572) [INFO]	- Read 5531 motifs
2023-12-16 10:23:47 (1569572) [WARNING]	The motif output names (as given by --naming) are not unique.
2023-12-16 10:23:47 (1569572) [WARNING]	The following names occur more than once: ['_']
2023-12-16 10:23:47 (1569572) [WARNING]	These motifs will be renamed with '_1', '_2' etc. To prevent this renaming, please make the names of the input --motifs unique
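
A quick way to double-check the expected motif count independently of BINDetect is to count the ">" header lines in the file. The snippet below is only a sketch: the function name count_motif_headers and the two-motif example text are made up for illustration, and it assumes every motif starts with a ">" header line as in the excerpt above.

```python
from collections import Counter


def count_motif_headers(lines):
    """Count how often each motif name appears in '>' header lines."""
    names = Counter()
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            names[line[1:].strip()] += 1
    return names


# Tiny stand-in for the real file; in practice pass an open file handle.
example = """>GM.5.0.Sox.0001
0.7213  0.0793  0.1103  0.0891
>GM.5.0.Homeodomain.0001
0.8870  0.0000  0.0178  0.0951
""".splitlines()

names = count_motif_headers(example)
total_headers = sum(names.values())
duplicates = [name for name, count in names.items() if count > 1]
print(total_headers, "headers,", len(duplicates), "duplicated names")
```

If the header count matches the number of motifs you expect and no names are duplicated, the extra "_" motifs are being introduced while the file is read rather than by the file itself.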

The result directories were named like this:

__1     __1413  __1829  __2243  __2659  __3073  __3489  __541  __957

or

GM.5.0.Sox.0001_GM.5.0.Sox.0001
GM.5.0.Sox.0002_GM.5.0.Sox.0002
GM.5.0.Sox.0003_GM.5.0.Sox.0003
GM.5.0.Sox.0004_GM.5.0.Sox.0004
GM.5.0.Sox.0005_GM.5.0.Sox.0005
GM.5.0.Sox.0006_GM.5.0.Sox.0006
GM.5.0.Sox.0007_GM.5.0.Sox.0007
GM.5.0.Sox.0008_GM.5.0.Sox.0008
GM.5.0.Sox.0009_GM.5.0.Sox.0009

This pfm file may not be a standard pfm file, but it would be nice if BINDetect raised an error when the motif file is not in a standard format.

My current workaround is to run chen2meme on the file, since it appears to be a Chen-format motif file. With the converted file, BINDetect seems to work fine.

@msbentsen
Member

Hi @johannesnicolaus

Thank you for this issue. It does indeed look related to #78. There seems to be a bug in how biopython reads these files, which creates additional "empty" motifs named "_". We have now changed the code to parse the file manually and check the length, and it will write an error if a deformed motif is found:
[image]

The code is not thoroughly tested yet, but you can have a look already by installing the version directly from the dev branch as:
pip install git+https://github.com/loosolab/TOBIAS@dev

After testing, the functionality will be included in the next version of TOBIAS. Hope that helps 🙏
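
For readers who want to validate a file before running BINDetect, the idea of parsing manually and checking the length can be sketched roughly as follows. This is not the TOBIAS implementation: the function name parse_pfm, the n_columns parameter, and the error messages are hypothetical, and it assumes each motif is a ">" header followed by rows of four whitespace-separated numbers, as in the excerpt in the original post.

```python
def parse_pfm(lines, n_columns=4):
    """Parse '>'-headed motif blocks and raise ValueError on deformed input."""
    motifs = {}
    current = None
    for lineno, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            current = line[1:].strip()
            if not current:
                raise ValueError(f"line {lineno}: empty motif name")
            if current in motifs:
                raise ValueError(f"line {lineno}: duplicate motif name {current!r}")
            motifs[current] = []
            continue
        if current is None:
            raise ValueError(f"line {lineno}: matrix row before any '>' header")
        fields = line.split()
        if len(fields) != n_columns:
            raise ValueError(
                f"line {lineno}: expected {n_columns} values in motif "
                f"{current!r}, found {len(fields)}"
            )
        try:
            row = [float(value) for value in fields]
        except ValueError:
            raise ValueError(
                f"line {lineno}: non-numeric value in motif {current!r}"
            ) from None
        motifs[current].append(row)
    # A header with no rows is the kind of "empty" motif we want to reject.
    empty = [name for name, rows in motifs.items() if not rows]
    if empty:
        raise ValueError(f"motifs without any rows: {empty}")
    return motifs


good = """>GM.5.0.Sox.0001
0.7213  0.0793  0.1103  0.0891
0.9259  0.0072  0.0062  0.0607
>GM.5.0.Homeodomain.0001
0.8870  0.0000  0.0178  0.0951
"""
motifs = parse_pfm(good.splitlines())
print({name: len(rows) for name, rows in motifs.items()})
```

Failing on the first malformed line, with its line number, means a deformed file is reported before any results are written instead of silently producing "_"-named motifs.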

@johannesnicolaus
Author

Perfect, thanks so much!
