Significantly sped up filtering and minor bug fixes #51

Merged
merged 63 commits into from
Oct 10, 2024
63 commits
c8fa14d
minor example update
maylinnp Sep 16, 2024
4d1e69e
updated installation instructions
maylinnp Sep 16, 2024
ace5b46
added mock import for matplotlib so all modules and methods show
maylinnp Sep 16, 2024
8f272df
removed default debug level for cli
maylinnp Sep 16, 2024
4f03310
removed default writing of log file
maylinnp Sep 16, 2024
678a94f
updated log file change for debug mode
maylinnp Sep 16, 2024
77074e5
updated remaining md to rst and some instruction changes
maylinnp Sep 16, 2024
78de167
updated installation instructions
maylinnp Sep 16, 2024
0f4e124
updated how the scripts are referred to (removing .py and paths)
maylinnp Sep 16, 2024
689ada9
updated how the scripts are referred to (removing .py and paths)
maylinnp Sep 16, 2024
e72b519
added instructions on how to finalize db write manually
maylinnp Sep 16, 2024
3d789bb
updated readme
maylinnp Sep 16, 2024
08f4919
corrected minor sentence structure error
maylinnp Sep 16, 2024
11c22e7
fix spelling error in test dir
maylinnp Sep 16, 2024
293fb64
update what python versions ringtail will work with
maylinnp Sep 16, 2024
12561a8
no longer count failed files in the final processed file talley
maylinnp Sep 16, 2024
d5fdfb2
bug fix for adding interaction while writing vina results
maylinnp Sep 17, 2024
d261acd
added readthedocs link
maylinnp Sep 17, 2024
fd5b83d
writes bitvector table and use it for interaction filtering, no other…
maylinnp Sep 20, 2024
7bf8d45
updated rdkit method to use bitvector table
maylinnp Sep 21, 2024
5fc22c1
updated rdkit method to use bitvector table
maylinnp Sep 21, 2024
438029f
added two more indices and renamed the existing ones
maylinnp Sep 23, 2024
10cac12
annotated what methods need refactoring
maylinnp Sep 23, 2024
846edc1
removed variables for indexing columns on the fly, as it is not used …
maylinnp Sep 23, 2024
cec03f6
made filtering_window local variable to only method using it
maylinnp Sep 23, 2024
6870764
writes logger warning if an unrecognized interaction count filter is …
maylinnp Sep 23, 2024
54688d5
made separate method for formatting filters for db query use
maylinnp Sep 23, 2024
857a976
started major rewrite of the filtering query for interactions and ene…
maylinnp Sep 24, 2024
42561b0
only require ligand operator if specifying ligand substruct
maylinnp Sep 24, 2024
ce9d67f
rewrote unclustered query
maylinnp Sep 25, 2024
1102a37
rewritten entire filtering query constructor including for clustering
maylinnp Sep 25, 2024
d3c88d2
fix so max_miss works in storageman
maylinnp Sep 25, 2024
57a2e98
updated filter dict to compare to since defaults for ligand filters c…
maylinnp Sep 25, 2024
07501b4
enumerating interaction combinations now work
maylinnp Sep 25, 2024
1b28cb1
removed nonsensical filter value
maylinnp Sep 25, 2024
329861b
remove bitvector table again
maylinnp Sep 26, 2024
f02e067
switched position of two tests
maylinnp Sep 26, 2024
a0b2a4f
fixed table ref for mfpt clustering
maylinnp Sep 26, 2024
3276f4a
made similar ligand output context managed by storageman
maylinnp Sep 26, 2024
4b4f3df
fixed bug with ligand filtering
maylinnp Sep 26, 2024
8fbd6b0
added test for enumerated interaction combinations
maylinnp Sep 26, 2024
99028a0
defaulting ligand operator to 'OR'
maylinnp Sep 26, 2024
3dc4581
cast ligand filter values to string while writing log
maylinnp Sep 26, 2024
93af3f0
added ligand filters to pytest
maylinnp Sep 26, 2024
8596bf5
added bug fix for ligand filter keywords
maylinnp Sep 27, 2024
73094b7
fixed test bug in compared dict
maylinnp Sep 27, 2024
6c23b86
removed debug mode from tests
maylinnp Sep 27, 2024
c14e867
fixed ligand substruct bug and handling of ligand filters
maylinnp Sep 27, 2024
a2e4117
ligand operator set to default OR until changed in code
maylinnp Sep 27, 2024
07aeb6d
changed order ligand operator appears in
maylinnp Sep 27, 2024
a66e684
fixed bug with dropping bookmark
maylinnp Sep 27, 2024
7d91542
fixed drop bookmark bug, removed some TODOs and fixed a bug in a query
maylinnp Sep 27, 2024
cc6c436
removed warning for nonunion bookmark if enumerate_intearction_combs …
maylinnp Sep 30, 2024
d361932
updated doc strings and removed todos, redundant print statements, etc
maylinnp Oct 3, 2024
3ac0f4b
added two semicolons and removed a todo
maylinnp Oct 4, 2024
e6e44cf
added note about visidata and chemicalite bookmarks
maylinnp Oct 4, 2024
6ab9009
fixed bug with ligand_substruct_pos and updated docs
maylinnp Oct 4, 2024
3149391
updated doc for ligand_max_atoms
maylinnp Oct 4, 2024
ce24f8b
bug fix: max number of heavy atoms uses correct chemicalite method
maylinnp Oct 4, 2024
f858b54
updated docs with bug fix
maylinnp Oct 4, 2024
5bc9e83
updated doc string for ligand filters
maylinnp Oct 7, 2024
e756591
added create indices to database update method
maylinnp Oct 7, 2024
386b838
updated code version references to 2.1.0 (db version is stil 2.0.0)
maylinnp Oct 10, 2024
2 changes: 1 addition & 1 deletion MANIFEST.in
@@ -1,4 +1,4 @@
include README.md
include LICENSE
include docs/*
include tests/*
include test/*
231 changes: 132 additions & 99 deletions README.md

Large diffs are not rendered by default.

27 changes: 20 additions & 7 deletions docs/source/api.rst
@@ -106,6 +106,18 @@ By default (for DLGs), Ringtail will store the best-scored (lowest energy) bindi
    rtc.add_results_from_files( file_path = "path2",
                                max_poses = 5)

Iteratively appending to a database
------------------------------------
When results are added to the database, there is a final step where some tables are indexed and some database properties are saved. If you are adding data iteratively, e.g., through a for-loop that adds some number of files at a time, it is time-consuming (and unnecessary) to do this on every iteration. Instead, you can pass the keyword ``finalize=False`` and run the finalization method separately at the end:

.. code-block:: python

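    from pathlib import Path

    # add each results folder, deferring indexing until the end
    for folder in Path("path_with_many_folders").iterdir():
        rtc.add_results_from_files(file_path=str(folder),
                                   finalize=False)

    # index tables and save database properties once
    rtc.finalize_write()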

Filtering
**********

@@ -157,16 +169,17 @@ The ``max_miss`` keywords allows the user to filter by given interactions exclud

Ligand filters
===============
Several filters pertaining to the SMARTS structure of the ligand can be used. For example, the ``ligand_substruct_pos`` keyword may be used to filter for a specific ligand substructure (specified with a SMARTS string) to be placed within some distance of a given cartesian coordinate. The format for this option is ``"<SMARTS pattern: str>" <index of atom in SMARTS: int> <cutoff distance: float> <target x coord: float> <target y coord: float> <target z coord: float>``.
ligand_name: Specify ligand name(s). Will combine name filters with 'OR'.
ligand_substruct: SMARTS pattern(s) for substructure matching.
ligand_substruct_pos: SMARTS pattern(s) for substructure matching, e.g., ['[Oh]C', 0, 1.2, -5.5, 10.0, 15.5] -> [smart_string, index_of_positioned_atom, cutoff_distance, x, y, z].
ligand_max_atoms: Maximum number of heavy atoms a ligand may have.
ligand_operator: Logical join operator for multiple SMARTS.
Several filters pertaining to the SMARTS structure of the ligand can be used. For example, ligands can be filtered for the presence of certain substructures specified by their SMARTS strings using ``ligand_substruct``, or for ligand names containing a specific phrase using ``ligand_name``. The ligand name search returns any ligand whose name contains the specified phrase; it does not look for exact matches only. Use the keyword ``ligand_operator`` to determine whether the ligand filters should be combined with ``OR`` (default) or with ``AND``. ``ligand_max_atoms`` can be used to specify the maximum number of heavy atoms a ligand may have.

.. code-block:: python

    rtc.filter(ligand_substruct=["[Oh]C", "C=O"], ligand_name="cool_ligand", ligand_operator="AND", ligand_max_atoms=5)

The ``ligand_substruct_pos`` option may be used to filter for a specific ligand substructure placed within some distance of a given cartesian coordinate. With the API, the format for this option is a list of six elements: ``["<SMARTS pattern: str>", <index of atom in SMARTS: int>, <cutoff distance: float>, <target x coord: float>, <target y coord: float>, <target z coord: float>]``. If searching for more than one ``ligand_substruct_pos``, make the value a list of lists.

.. code-block:: python

    rtc.filter(ligand_substruct=["[Oh]C"], ligand_substruct_pos=["[Oh]C", 0, 1.2, -5.5, 10.0, 15.5])
    rtc.filter(ligand_name="_1", ligand_substruct_pos=[["C=O", 1, 10, 102, 106, 154], ["[C][Oh]", 1, 10, 102, 106, 154]])
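
To make the positional filter more concrete, the following is a minimal sketch of the geometric test it describes. It is not Ringtail's implementation, which evaluates the filter inside the database query. The helper ``within_cutoff`` and the atom coordinates are hypothetical; only the six-element filter layout is taken from the text above.

```python
import math


def within_cutoff(atom_xyz, target_xyz, cutoff):
    """Return True if the atom lies within `cutoff` of the target point."""
    return math.dist(atom_xyz, target_xyz) <= cutoff


# Six-element filter as documented: SMARTS, atom index, cutoff, x, y, z
smarts, atom_index, cutoff, x, y, z = ["[Oh]C", 0, 1.2, -5.5, 10.0, 15.5]

# Hypothetical coordinates for the matched atom (atom_index within the SMARTS match)
near_atom = (-5.0, 10.5, 15.0)
far_atom = (0.0, 0.0, 0.0)

print(within_cutoff(near_atom, (x, y, z), cutoff))
print(within_cutoff(far_atom, (x, y, z), cutoff))
```

A pose would pass this filter when the chosen atom of at least one substructure match satisfies the distance test.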


Clustering
20 changes: 16 additions & 4 deletions docs/source/changes.rst
@@ -3,7 +3,18 @@
Changes in Ringtail
######################

Changes in 2.0.0: fully developed API
Changes in 2.1.0: enhanced filtering speed
******************************************
Enhancements to the code base
==============================
* The queries produced to filter the database have been completely rewritten, reducing filtering time by at least a factor of 10 compared to 1.1.0. Extra indices were added to three of the tables to support the faster filtering.
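
As a rough illustration of why extra indices help, the sketch below uses Python's built-in ``sqlite3`` module to compare the query plan for a lookup before and after an index is created. The table ``demo_interactions`` and index ``demo_idx_interaction`` are made up for this example and are not the tables or indices Ringtail creates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical table, loosely modeled on an interaction lookup table
cur.execute(
    "CREATE TABLE demo_interactions (Pose_ID INTEGER, interaction_id INTEGER)"
)
cur.executemany(
    "INSERT INTO demo_interactions VALUES (?, ?)",
    [(pose, pose % 50) for pose in range(10_000)],
)

query = "SELECT Pose_ID FROM demo_interactions WHERE interaction_id = 7"


def plan(sql):
    """Return SQLite's query plan for `sql` as a single string."""
    rows = cur.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)


plan_before = plan(query)  # no index yet: SQLite scans the whole table
cur.execute(
    "CREATE INDEX demo_idx_interaction ON demo_interactions (interaction_id)"
)
plan_after = plan(query)  # the plan now searches via the new index

print(plan_before)
print(plan_after)
```

The exact wording of the plan varies between SQLite versions, but the second plan should name the index rather than report a full scan.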

Bug fixes
===========
* The keywords `--ligand_name`, `--ligand_substruct`, and `--ligand_substruct_pos` behaved ambiguously: if they were invoked more than once, only the last filter value was used (as opposed to concatenating the values). They now work when supplying multiple values to one keyword, as well as one or more values to two or more keywords. Further, `ligand_substruct_pos` now takes its input as one string (`"[C][Oh] 1 1.5 -20 42 -7.1"`) as opposed to one string and five numbers (`"[C][Oh]" 1 1.5 -20 42 -7.1`).
* `--ligand_max_atoms` counted all atoms in the ligand, including hydrogens. With the bug fix, it counts only heavy atoms (not hydrogens).
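
The one-string form of `ligand_substruct_pos` maps directly onto the six-element list used by the API. A minimal sketch of that conversion is shown here; the function `parse_substruct_pos` is hypothetical and only illustrates the format, it is not Ringtail's own parser.

```python
def parse_substruct_pos(value):
    """Split a one-string filter into [smarts, index, cutoff, x, y, z]."""
    parts = value.split()
    if len(parts) != 6:
        raise ValueError(f"expected 6 fields, got {len(parts)}: {value!r}")
    smarts, index, *numbers = parts
    # SMARTS stays a string, the atom index is an int, the rest are floats
    return [smarts, int(index)] + [float(n) for n in numbers]


print(parse_substruct_pos("[C][Oh] 1 1.5 -20 42 -7.1"))
```

Splitting on whitespace is safe here because a SMARTS pattern does not contain spaces.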

Changes in 2.x: fully developed API
***************************************

Changes in keywords used for the command line tool
@@ -24,18 +35,19 @@ Enhancements to the codebase
==============================
* Fully developed API can use python for scripting exclusively (see :ref:`API <api>` page for full description)
* Can add docking results directly without using file system (for vina only as output comes as a string).
* The Ringtail log is now written to a logging file in addition to STDOUT
* The Ringtail log is now written to a logging file in addition to STDOUT if log level is set to "DEBUG".

Changes to code behavior
=========================
* Interaction tables: one new table has been added (`Interactions`) which references the interaction id from `Interaction_indices`, while the table `Interaction_bitvectors` has been discontinued.
* A new method to update an existing database 1.1.0 (or 1.0.0) to 2.0.0 is included. However, if the existing database was created with the duplicate handling option, there is a chance of inconsistent behavior of anything involving interactions as the Pose_ID was not used as an explicit foreign key in db v1.0.0 and v1.1.0 (see Bug fixes below).
* A new method to update an existing database 1.1.0 (or 1.0.0) to 2.0 is included. However, if the existing database was created with the duplicate handling option, there is a chance of inconsistent behavior of anything involving interactions as the Pose_ID was not used as an explicit foreign key in db v1.0.0 and v1.1.0 (see Bug fixes below).

Bug fixes
===========
* The option `duplicate_handling` could previously only be applied during database creation and produced inconsistent table behavior. The option can now be applied any time results are added to a database, and will create internally consistent tables. **Please note: if you created tables in the past while invoking the keyword `duplicate_handling`, you may have errors in the "Interaction_bitvectors" table (<2.0.0). These errors cannot be recovered, and we recommend you re-make the database with Ringtail 2.0.0.**
* The option `duplicate_handling` could previously only be applied during database creation and produced inconsistent table behavior. The option can now be applied any time results are added to a database, and will create internally consistent tables. **Please note: if you created tables in the past while invoking the keyword `duplicate_handling`, you may have errors in the "Interaction_bitvectors" table (<2.0). These errors cannot be recovered, and we recommend you re-make the database with Ringtail 2.0.**
* Writing SDFs from filtering bookmarks: will check that the bookmark exists and has data before writing, and will now produce SDFs for any existing bookmarks. If the bookmark results from a filtering where `max_miss` < 0 it will note if the non-union bookmark is used, and if the base name for such bookmarks is provided it will default to the `basename_union` bookmark for writing the SDFs.
* Output from filtering using `max_miss` and `output_all_poses=False` (default) now produces the expected behavior of outputting only one pose per ligand. Filtering for interactions with `max_miss` allows any given pose for a ligand to miss `max_miss` interactions and still be considered to pass the filter. Previously, in the resulting `union` bookmark and `output_log` text file, some ligands would appear with more than one pose although `output_all_poses` was `False` (and thus the expectation would be one pose per ligand). This gave the wrong count for how many ligands passed a filter, as some were counted more than once.
* The keywords `--ligand_name`, `--ligand_substruct`, and `--ligand_substruct_pos` behaved ambiguously: if they were invoked more than once, only the last filter value was used (as opposed to concatenating the values). They now work when supplying multiple values to one keyword, as well as one or more values to two or more keywords. Further, `ligand_substruct_pos` now takes its input as one string (`"[C][Oh] 1 1.5 -20 42 -7.1"`) as opposed to one string and five numbers (`"[C][Oh]" 1 1.5 -20 42 -7.1`).
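
The `max_miss` behavior described above can be sketched in plain Python. This is an illustration under stated assumptions, not the database query Ringtail builds: the functions `passes` and `one_pose_per_ligand`, the interaction labels, and the assumption that poses arrive pre-sorted by score are all hypothetical.

```python
def passes(pose_interactions, required, max_miss=0):
    """True if the pose misses at most `max_miss` of the required interactions."""
    missed = len(set(required) - set(pose_interactions))
    return missed <= max_miss


def one_pose_per_ligand(poses, required, max_miss=0):
    """Keep the first passing pose per ligand (poses assumed pre-sorted by score)."""
    kept = {}
    for ligand, pose_id, interactions in poses:
        if ligand not in kept and passes(interactions, required, max_miss):
            kept[ligand] = pose_id
    return kept


# Made-up interaction labels, for illustration only
required = {"A:ARG:123:HB", "A:VAL:279:VdW", "A:LYS:162:VdW"}
poses = [
    ("lig1", 1, {"A:ARG:123:HB", "A:VAL:279:VdW"}),  # misses one
    ("lig1", 2, {"A:ARG:123:HB", "A:LYS:162:VdW"}),  # misses one
    ("lig2", 3, {"A:ARG:123:HB"}),                   # misses two
]

print(one_pose_per_ligand(poses, required, max_miss=1))
```

Both poses of `lig1` pass with `max_miss=1`, but only one is kept, so the ligand is counted once.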

Changes in 1.1.0: enhanced database performance
***********************************************