The goal of this project is to automate the incorporation of 3D structural information into Rfam seed alignments using the following workflow:
- The Rfam-PDB mapping file is used to find out which PDB files need to be added to which seed alignments.
- The 3D structural annotations are downloaded from the RNA 3D Hub database, which regularly annotates all RNA 3D structures using FR3D.
- The PDB sequences and secondary structures in dot-bracket notation are iteratively added to the Rfam seed alignments using the cmalign Infernal program.
- The PDB accessions are replaced with RNAcentral identifiers in the final alignments.
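
The first step of the workflow can be illustrated with a minimal sketch. It is not the code used by add_3d.py. It assumes the mapping file is tab-separated and that its first three columns are the Rfam accession, the PDB id, and the chain; that layout, the function name `group_chains_by_family`, and the placeholder PDB ids below are assumptions made for illustration only.

```python
from collections import defaultdict


def group_chains_by_family(rows):
    """Group PDB chains by Rfam accession.

    Assumes each row is tab-separated and starts with:
    rfam_acc, pdb_id, chain (a hypothetical layout for this sketch).
    """
    families = defaultdict(set)
    for row in rows:
        row = row.strip()
        # Skip blank lines and comment lines.
        if not row or row.startswith("#"):
            continue
        fields = row.split("\t")
        if len(fields) < 3:
            continue  # Ignore rows that do not have the three expected columns.
        rfam_acc, pdb_id, chain = fields[:3]
        # Build identifiers in the PDBID_CHAIN style, e.g. 2QUS_B.
        families[rfam_acc].add(f"{pdb_id.upper()}_{chain}")
    return {acc: sorted(chains) for acc, chains in families.items()}


if __name__ == "__main__":
    # Placeholder rows; the PDB ids are made up for the example.
    example_rows = [
        "RF00162\t1abc\tA",
        "RF00162\t1abc\tB",
        "RF00507\t2xyz\tA",
    ]
    for acc, chains in group_chains_by_family(example_rows).items():
        print(acc, ", ".join(chains))
```

Using a set per family means a chain listed more than once in the mapping is only added to a seed alignment once.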
- Download the repository or use git clone.
- Start an interactive session using Docker:
    docker-compose run rfam
  Alternatively, follow instructions in the Dockerfile to install locally.
- To update one or more Rfam families:
    add_3d.py RF00162
    add_3d.py RF00162 RF00507
  Use --nocache to force recomputing the output and download the latest PDB-Rfam and PDB-RNAcentral mapping files.
- To update all families:
    add_3d.py all --nocache
- To get FR3D secondary structure for a PDB id:
    fr3d_2d.py 2QUS_B
    >2QUS_B
    GGGAGCCCUGUCACCGGAUGUGCUUUCCGGUCUGAUGAGUCCGUGAGGACAAAACAGGGCUCCCGAAUU
    .((((((((((.((((((.....{.))))))(....).((((...}))))...))))))))))......
The updated seed alignments with the added 3D structures will be in the output folder (see precomputed results).
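
To make the dot-bracket notation in the fr3d_2d.py output concrete, here is a minimal sketch that converts such a string into a list of base pairs. It is not part of this repository: the function name `dot_bracket_to_pairs` is hypothetical, and the sketch assumes that each bracket type (`()`, `[]`, `{}`, `<>`) is nested independently, so a second bracket type such as `{ }` can mark a pair that crosses the `( )` helices.

```python
# Closing bracket -> matching opening bracket.
CLOSE_TO_OPEN = {")": "(", "]": "[", "}": "{", ">": "<"}

SEQUENCE = "GGGAGCCCUGUCACCGGAUGUGCUUUCCGGUCUGAUGAGUCCGUGAGGACAAAACAGGGCUCCCGAAUU"
STRUCTURE = ".((((((((((.((((((.....{.))))))(....).((((...}))))...))))))))))......"


def dot_bracket_to_pairs(structure):
    """Return sorted 0-based (i, j) base pairs encoded in a dot-bracket string."""
    # One stack per bracket type, so different types may cross each other.
    stacks = {opener: [] for opener in CLOSE_TO_OPEN.values()}
    pairs = []
    for index, char in enumerate(structure):
        if char in stacks:
            stacks[char].append(index)
        elif char in CLOSE_TO_OPEN:
            stack = stacks[CLOSE_TO_OPEN[char]]
            if not stack:
                raise ValueError(f"unmatched '{char}' at position {index}")
            pairs.append((stack.pop(), index))
        elif char != ".":
            raise ValueError(f"unexpected character '{char}' at position {index}")
    unclosed = [i for stack in stacks.values() for i in stack]
    if unclosed:
        raise ValueError(f"unclosed bracket(s) at positions {sorted(unclosed)}")
    return sorted(pairs)


if __name__ == "__main__":
    # Print each pair with 1-based positions and the paired nucleotides.
    for i, j in dot_bracket_to_pairs(STRUCTURE):
        print(f"{i + 1}-{j + 1}: {SEQUENCE[i]}-{SEQUENCE[j]}")
```

Keeping a separate stack per bracket type is what lets the single `{ }` pair in the 2QUS_B example be parsed even though it crosses the helices written with parentheses.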
You can manually add mappings between Rfam accessions and PDB ids to pdb_full_region_curated.txt. This step is needed to analyse PDB sequences that do not match Rfam covariance models automatically, which can happen when a PDB sequence is much shorter than the corresponding Rfam model and therefore gets a bit score below the Rfam threshold.
Please feel free to raise an issue to report any problems with the code or the data.
We would like to thank Sri Devan Appasamy and Craig Zirbel for developing an RNA 3D Hub API to provide FR3D annotations for RNA 3D structures.