tomgoddard
Contributor

Added a new option, --merge-a3m, to colabfold_search; it takes the value 0 or 1. For multi-sequence input, colabfold_search currently combines the unpaired and paired alignments into a single a3m file, with the paired sequences concatenated. That combined file is inconvenient to parse. With "--merge-a3m 0", the unpaired and paired .a3m files created by "--unpack 1" are left unmerged, and the separate files are easier to work with. The use case that motivated this option was converting colabfold_search output into Boltz .csv format MSA files.
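To illustrate the motivating use case, here is a minimal sketch of reading one unmerged .a3m file and writing its sequences as CSV. It is not part of this pull request. The function names `parse_a3m` and `a3m_to_csv` are hypothetical, and the `key`/`sequence` column names and the use of -1 as the key for unpaired rows are assumptions about what Boltz expects, so check them against the Boltz documentation before relying on them.

```python
import csv
import io


def parse_a3m(text):
    """Return a list of (header, sequence) tuples from a3m-formatted text."""
    records = []
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and "#" comment/metadata lines.
        if not line or line.startswith("#"):
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def a3m_to_csv(text, key=-1):
    """Write every sequence as one CSV row.

    ASSUMPTION: the "key"/"sequence" columns and key=-1 for unpaired rows
    are illustrative, not confirmed by this pull request.
    """
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["key", "sequence"])
    for _header, sequence in parse_a3m(text):
        writer.writerow([key, sequence])
    return out.getvalue()
```

Sequences are passed through unchanged, including any lowercase insertion characters. Paired alignments would need a shared key per row across chains, which this sketch does not attempt.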

The default is "--merge-a3m 1" (merge), which maintains the previous colabfold_search default of merging.

This pull request also fixes a small bug: when merging was done, some but not all of the unpaired .a3m files were left in the results directory. This appears to have been unintentional. The first unpaired file, 0.a3m, was not preserved because the merge step used the same file name pattern, {job_number}.a3m, and inadvertently overwrote it. So for a two-sequence MSA the results directory would contain 1.a3m (the unpaired alignment for sequence 1) and the merged file job_name.a3m. With this pull request, all unpaired files are removed when "--merge-a3m 1" is used, so the results directory contains only job_name.a3m. Because merging is the default, this changes the default behavior.
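The file name collision can be reproduced with a toy simulation. This is a sketch of the behavior described above, not the colabfold_search code: `simulate_merge` and its parameters are hypothetical, and the step that renames the merged file to the job name is an assumption about how job_name.a3m comes to exist.

```python
import tempfile
from pathlib import Path


def simulate_merge(results_dir, n_sequences, job_number, job_name, remove_unpaired):
    """Toy model of the merge step; returns the sorted file names left behind."""
    results_dir = Path(results_dir)

    # Unpaired alignments, one per input sequence: 0.a3m, 1.a3m, ...
    unpaired = [results_dir / f"{i}.a3m" for i in range(n_sequences)]
    for i, path in enumerate(unpaired):
        path.write_text(f">unpaired_{i}\n")

    # The merged output is written as {job_number}.a3m, which collides
    # with the first unpaired file when job_number is 0.
    merged_tmp = results_dir / f"{job_number}.a3m"
    merged_tmp.write_text(">merged\n")

    # ASSUMPTION: the merged file is then renamed to the job name.
    merged_tmp.rename(results_dir / f"{job_name}.a3m")

    if remove_unpaired:
        # Behavior after this pull request: drop every unpaired file.
        for path in unpaired:
            path.unlink(missing_ok=True)

    return sorted(p.name for p in results_dir.iterdir())


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as old_dir:
        print(simulate_merge(old_dir, 2, 0, "job_name", remove_unpaired=False))
    with tempfile.TemporaryDirectory() as new_dir:
        print(simulate_merge(new_dir, 2, 0, "job_name", remove_unpaired=True))
```

In the first run 0.a3m is lost and 1.a3m is left over, matching the inconsistent state described above. In the second run only the merged file remains.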
