A pipeline for adding RNA 3D structures to Rfam seed alignments

Rfam/rfam-3d-seed-alignments

Rfam 3D Seed Alignments

The goal of this project is to automate the incorporation of 3D structural information into the Rfam seed alignments using the following workflow:

  • The Rfam-PDB mapping file is used to determine which PDB structures need to be added to which seed alignments.

  • The 3D structural annotations are downloaded from the RNA 3D Hub database, which regularly annotates all RNA 3D structures using FR3D.

  • The PDB sequences and their secondary structures in dot-bracket notation are iteratively added to the Rfam seed alignments using the Infernal cmalign program.

  • The PDB accessions are replaced with RNAcentral identifiers in the final alignments.
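
The last step of the workflow can be illustrated with a minimal sketch. It assumes the alignment is in Stockholm format and that a plain dictionary maps the PDB-based sequence names to RNAcentral identifiers. The function name `rename_sequences`, the dictionary, and the identifiers used below are hypothetical placeholders, not the pipeline's actual code or data.

```python
def rename_sequences(stockholm_text, id_map):
    """Rename sequence identifiers in a Stockholm-format alignment.

    id_map is a hypothetical dict mapping old names (e.g. PDB-based)
    to new names (e.g. RNAcentral identifiers).
    """
    out = []
    for line in stockholm_text.splitlines():
        fields = line.split()
        # Sequence lines have exactly two whitespace-separated fields
        # (name and aligned sequence) and are not markup ("#...") or
        # the alignment terminator ("//").
        if len(fields) == 2 and not line.startswith(("#", "//")):
            name, aligned = fields
            name = id_map.get(name, name)  # leave unmapped names as-is
            line = f"{name} {aligned}"
        out.append(line)
    return "\n".join(out)


if __name__ == "__main__":
    # Toy alignment; the RNAcentral identifier is a made-up placeholder.
    alignment = "\n".join([
        "# STOCKHOLM 1.0",
        "2QUS_B GGGA",
        "seq1 G-GA",
        "#=GC SS_cons (..)",
        "//",
    ])
    print(rename_sequences(alignment, {"2QUS_B": "URS0000000001_9606"}))
```

The sketch collapses the padding between name and sequence to a single space, which remains valid Stockholm but does not preserve column alignment for human readers. Per-sequence markup lines (`#=GS`, `#=GR`) are left untouched here and would need the same renaming in a real alignment.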

Installation

  • Download the repository or use git clone.

  • Start an interactive session using Docker:

    docker-compose run rfam
    

Alternatively, follow instructions in the Dockerfile to install locally.

Usage

  • To update one or more Rfam families:

    add_3d.py RF00162
    add_3d.py RF00162 RF00507
    

    Use --nocache to force the output to be recomputed and to download the latest PDB-Rfam and PDB-RNAcentral mapping files.

  • To update all families:

    add_3d.py all --nocache
    
  • To get FR3D secondary structure for a PDB id:

    fr3d_2d.py 2QUS_B
    >2QUS_B
    GGGAGCCCUGUCACCGGAUGUGCUUUCCGGUCUGAUGAGUCCGUGAGGACAAAACAGGGCUCCCGAAUU
    .((((((((((.((((((.....{.))))))(....).((((...}))))...))))))))))......
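
To work with this output programmatically, the dot-bracket line can be converted into a list of base pairs. The following is a minimal sketch and not part of the repository: the function name `dot_bracket_to_pairs` is hypothetical, and it assumes that each bracket type ((), [], {}, <>) is matched independently, which is the usual convention for representing pseudoknots such as the {...} pair in the example above.

```python
def dot_bracket_to_pairs(structure):
    """Return sorted 0-based (i, j) base pairs from a dot-bracket string."""
    closers = {")": "(", "]": "[", "}": "{", ">": "<"}
    # One stack per bracket type so that pseudoknot brackets can cross.
    stacks = {opener: [] for opener in closers.values()}
    pairs = []
    for pos, char in enumerate(structure):
        if char in stacks:
            stacks[char].append(pos)
        elif char in closers:
            stack = stacks[closers[char]]
            if not stack:
                raise ValueError(f"unmatched {char!r} at position {pos}")
            pairs.append((stack.pop(), pos))
        # Any other character (e.g. '.') is treated as unpaired.
    for opener, stack in stacks.items():
        if stack:
            raise ValueError(f"unmatched {opener!r} at position {stack[-1]}")
    return sorted(pairs)


if __name__ == "__main__":
    print(dot_bracket_to_pairs("((..))"))   # nested pairs
    print(dot_bracket_to_pairs("(.{.).}"))  # crossing (pseudoknot) pair
```

Applied to the structure line printed by fr3d_2d.py, this would return the nested pairs together with the crossing {...} pair, and it raises ValueError if the brackets are unbalanced.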
    

The updated seed alignments, with the 3D structures added, are written to the output folder (see precomputed results).

Manually curated Rfam-PDB mapping file

Mappings between Rfam accessions and PDB ids can be added manually to pdb_full_region_curated.txt. This step is needed to analyse PDB sequences that are not matched to Rfam covariance models automatically, for example when a PDB sequence receives a bit score below the Rfam threshold because it is much shorter than the corresponding Rfam model.

Feedback

Please feel free to raise an issue to report any problems with the code or the data.

Acknowledgements

We would like to thank Sri Devan Appasamy and Craig Zirbel for developing an RNA 3D Hub API to provide FR3D annotations for RNA 3D structures.