Add --merge-a3m option to colabfold_search. #769
Added a new option, --merge-a3m, with values 0 or 1, to colabfold_search. For multi-sequence input, the current colabfold_search combines the unpaired and paired alignments into a single a3m file, with the paired sequences concatenated. This merged a3m file is inconvenient to parse. With "--merge-a3m 0", the unpaired and paired .a3m files created by "--unpack 1" are not merged, and the separate files are easier to work with. The use case that motivated this option was converting colabfold_search output into Boltz .csv MSA files.
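As a rough illustration of why separate files are easier to consume, here is a minimal A3M reader in Python. It is a sketch, not code from this PR: `read_a3m` and `aligned_columns` are hypothetical helper names. It assumes that headers start with `>`, that `#` lines (such as the complex header in a merged file) can be skipped, and that lowercase letters are insertions relative to the query. The Boltz .csv conversion itself is not shown.

```python
from typing import List, Optional, Tuple


def read_a3m(text: str) -> List[Tuple[str, str]]:
    """Parse A3M text into (header, sequence) pairs.

    Hypothetical helper. Assumes headers start with '>', a sequence may
    span several lines, and '#' lines (e.g. a complex header) are skipped.
    """
    records: List[Tuple[str, str]] = []
    header: Optional[str] = None
    chunks: List[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and '#' metadata lines
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:]
            chunks = []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def aligned_columns(seq: str) -> str:
    """Drop A3M insertions (lowercase letters) so each row matches the query length."""
    return "".join(c for c in seq if not c.islower())
```

For example, `read_a3m(">101\nMKVL\n>hit1\nM-aKV\n")` yields two records, and `aligned_columns("M-aKV")` gives `"M-KV"`, which has the same length as the query `MKVL`.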
The default, "--merge-a3m 1", merges the files, preserving the previous colabfold_search behavior.
This pull request also fixes a small bug: when merging, colabfold_search left behind some, but not all, of the unpaired .a3m files. This appears to be unintentional. The first unpaired file, 0.a3m, is not preserved because the merge step writes to the same file name, {job_number}.a3m, and overwrites it. So for a two-sequence input, the results directory would contain 1.a3m (the unpaired alignment for the second sequence) and the merged file job_name.a3m. With this pull request, all unpaired files are removed when "--merge-a3m 1" is used, so the results directory contains only job_name.a3m. Because merging is done by default, this changes the default output.
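The cleanup behavior described above can be sketched as follows. This is a hypothetical helper, not the PR's actual code. It assumes the unpaired files are named by query index (`0.a3m`, `1.a3m`, ...) and that the merged file is `{job_name}.a3m`, which is always kept.

```python
import tempfile
from pathlib import Path
from typing import List


def remove_unpaired_a3m(result_dir: Path, num_sequences: int, job_name: str) -> List[Path]:
    """Delete per-sequence unpaired files 0.a3m .. (n-1).a3m after merging.

    Hypothetical helper: the file naming is an assumption, and the merged
    file {job_name}.a3m is never deleted even if its name collides.
    """
    keep = result_dir / f"{job_name}.a3m"
    removed: List[Path] = []
    for i in range(num_sequences):
        path = result_dir / f"{i}.a3m"
        if path != keep and path.exists():
            path.unlink()
            removed.append(path)
    return removed


if __name__ == "__main__":
    # Demo on a throwaway directory with a two-sequence job named "job".
    with tempfile.TemporaryDirectory() as d:
        out = Path(d)
        for name in ["0.a3m", "1.a3m", "job.a3m"]:
            (out / name).write_text(">x\nA\n")
        remove_unpaired_a3m(out, 2, "job")
        print(sorted(p.name for p in out.iterdir()))
```

Checking every index, rather than only the files known to survive, also covers the case where 0.a3m was already overwritten or never written.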