ISMB 2014 Poster: Systematic detection of internal symmetry in proteins
Symmetry is a common and significant feature of protein structures. Symmetry has been found to be important for understanding protein evolution, DNA binding, allosteric regulation, cooperativity, and folding. We have compiled a census of internal symmetry, conducted using the novel CE-Symm algorithm. We find that internal symmetry is present in at least 18% of superfamilies. To elucidate the relationship between symmetry and protein function, the census is analyzed with respect to structural classification, enzyme activity, and ligand binding. The CE-Symm algorithm was benchmarked against a manually curated set of ~1000 domains.

Myers-Turnbull, D., Bliven, S. E., Rose, P. W., Aziz, Z. K., Youkharibache, P., Bourne, P. E., & Prlić, A. (2014). Systematic Detection of Internal Symmetry in Proteins Using CE-Symm. Journal of Molecular Biology, 426(11), 2255–2268. PMID: 24681267

This poster was presented at the 22nd Annual International Conference on Intelligent Systems for Molecular Biology (2014).
Case Studies

Glyoxalase I is a dimer in both C. acetobutylicum and E. coli. However, the arrangement of the chains is quite different. Detecting the C2 internal symmetry in each chain reveals that the two active sites are each composed of two structural repeats, giving an overall dihedral symmetry. A monomer from 1,2-dihydroxy-naphthalene dioxygenase also contains four copies of the repeat in the same arrangement.

ABC transporters are responsible for transporting a wide range of metabolites across the cell membrane. The Vitamin B12 transporter BtuCD is a heterodimer with C2 quaternary symmetry. It binds a single BtuF subunit on the periplasmic face. BtuF has two-fold pseudosymmetry, which allows it to bind each of the two BtuC subunits, and it induces a slight asymmetry in the overall conformation.

Systematic detection of internal symmetry in proteins

Spencer E. Bliven (1,8,*), Douglas Myers-Turnbull (2), Peter W. Rose (3,7), Zaid K. Aziz (4), Philippe Youkharibache (6), Philip E. Bourne (5,7), Andreas Prlić (3,7,**)

(1) Bioinformatics and Systems Biology Program, (2) Department of Computer Science and Engineering, (3) San Diego Supercomputer Center, (4) Department of Chemistry and Biochemistry, and (5) Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego. (6) InPharmatics Corporation. (7) RCSB PDB. (8) Intramural Research Program of the National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health.
* firstname.lastname@example.org

CE-Symm

The CE-Symm algorithm has been created to detect internal symmetry in proteins. It is available as a stand-alone command line tool, as part of the BioJava software library (ref. 14), and as a web server (see Availability). CE-Symm first identifies structurally similar regions within the protein structure. It then refines this alignment to improve the correspondence between structural repeats.

1. Identify structurally similar regions
The CE-Symm algorithm starts by identifying a non-trivial structural alignment between a protein and itself using Combinatorial Extension (CE; ref. 10). This uses the dynamic programming and progressive refinement of CE, but with two modifications.
1. A strong penalty term is added to self-aligned residues to prevent the trivial 0° rotation from dominating.
2. The alignment matrix is duplicated in the manner of Uliel et al. (ref. 11) to account for the circular permutation which is introduced when comparing a symmetric protein against a rotated copy of itself.

Figure: (Left) Fibroblast growth factor 1 [3JUT], colored to show internal symmetry. (Right) Dot plot showing equivalent residues within the protein. Red lines correspond to a 120° clockwise rotation of the protein around the 3-fold axis, and cyan to the 240° rotation. After duplicating the matrix, each alignment forms a sequential diagonal line which can be fully detected by CE. Gray shading indicates regions near the diagonal which are penalized by the scoring function.

The RCSB PDB is supported by the National Science Foundation [NSF DBI 0829586]; National Institute of General Medical Sciences; Office of Science, Department of Energy; National Library of Medicine; National Cancer Institute; National Institute of Neurological Disorders and Stroke; and the National Institute of Diabetes & Digestive & Kidney Diseases. The RCSB PDB is a member of the wwPDB. This research was supported by the Intramural Research Program of the National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health.

References

1. Lee, J. & Blaber, M. PNAS 108, 126–130 (2011).
2. Monod, J. et al. J Mol Biol 12, 88–118 (1965).
3. Juo, Z. S. et al. J Mol Biol 261, 239–254 (1996).
4. Goodsell, D. S. & Olson, A. J. Annu Rev Biophys Biomol Struct 29, 105–153 (2000).
5. Gosavi, S. et al. J Mol Biol 357, 986–996 (2006).
6. Fortenberry, C. et al. J Am Chem Soc 133, 18026–18029 (2011).
7. Murray, K. B. et al. J Mol Biol 316, 341–363 (2002).
8. Kim, C. et al. BMC Bioinformatics 11, 303 (2010).
9. Guerler, A. et al. J Chem Inf Model 49, 2147–2151 (2009).
10. Shindyalov, I. N. & Bourne, P. E. Protein Eng 11, 739–747 (1998).
11. Uliel, S. et al. Bioinformatics 15, 930–936 (1999).
12. Abraham, A.-L. et al. J Mol Biol 394, 522–534 (2009).
13. Zhang, Y. & Skolnick, J. Proteins 57(4), 702–710 (2004).
14. Prlić, A. et al. Bioinformatics 28(20), 2693–2695 (2012).
15. Neuwald, A. F. Nucleic Acids Res 33(11), 3614–3628 (2005).
16. Zuccola, H. J., Filman, D. J., Coen, D. M. & Hogle, J. M. Mol Cell 5(2), 267–278 (2000).

Quaternary Structure Symmetry

Quaternary symmetry consists of multiple identical polypeptide chains arranged in a symmetric fashion. Such symmetry is extremely common in proteins, occurring in approximately 80% of structures in the Protein Data Bank (PDB).
Detecting quaternary symmetry relies on accurate assignment of the correct biological assembly for each protein. The PDB now annotates protein structures with their quaternary symmetry (Peter Rose et al., in preparation). For quaternary symmetry, only the subunits in the biological assembly are considered. The subunits may surround either a crystallographic axis (for crystal structures) or a non-crystallographic axis. However, because the equivalent chains are identical, a one-to-one relationship exists between atoms in each symmetry unit.

Internal Symmetry

Proteins can also have internal symmetry, when a single chain contains two or more equivalent structural repeats. The repeats generally will differ in the exact sequence, but have substantially similar structures. Internal symmetry is sometimes styled as pseudosymmetry to reflect that the equivalence between repeats is generally at the level of residues or secondary structure elements rather than precise coordinates, as with quaternary symmetry.

Figure labels (example structures and their symmetry):
GTP cyclohydrolase I [1A8R]: D5
Rhinovirus 2 [3DPR]: Icosahedral
Hemoglobin [4HHB]: C2 (but pseudo D2)
AmtB Ammonia Channel [1U7G]: C3

Symmetry & Function

Both quaternary and internal symmetry are linked to a wide range of protein functions.

Ligand Binding

Ligands often bind near the axis of symmetry. Of symmetric domains with ligands, 63% have the ligand within 5 Å of the axis of symmetry; in 37% it is within 1 Å. Symmetric proteins often bind symmetric ligands, such as metals. DNA binding proteins often utilize symmetry. Many transcription factors are symmetric dimers and recognize palindromic sequences. The TATA binding protein (right) is an internally symmetric monomer which has evolved to recognize a non-palindromic sequence (ref. 3).

Allosteric Regulation

Cooperativity can arise from coordinated movements in symmetric subunits (ref. 2). This mechanism holds for both quaternary symmetry (e.g. in hemoglobin) and for internally symmetric proteins (ref. 4).

Protein Folding
Internal symmetry can smooth the folding landscape and reduce folding time (ref. 5). Internal repeats can fold quasi-independently. Misfolding of one repeat can trigger degradation of the whole protein, unlike in quaternary symmetric complexes.

Experimental Tools

Aid the computational design of large proteins (ref. 6).
Improve search for distant homologs (ref. 15).

CE-Symm Availability

Web server: source.rcsb.org/jfatcatserver/symmetry.jsp
Download & Source code: github.com/rcsb/symmetry (LGPL)

Figure: Screenshot of the CE-Symm interface, showing a two-fold axis of EPSP synthase [1G6S].

Figure label: TATA Binding

Evolution

Internal symmetry can arise from quaternary symmetry by gene duplication or fusion. Thus, in addition to the many functional implications of symmetry, identifying protein symmetry can provide information about the evolutionary history of a protein. Such fission and fusion events often preserve the overall structure and function of the active complex (ref. 1).

Many proteins with higher order symmetry appear to have undergone several duplication events. For instance, DNA clamps are composed of 12 structural repeats arranged in a ring. Pairs of these repeats form domains with the processivity fold, which can also be found in non-ring conformations in some species (ref. 16). Six such domains form a complete ring, but they are fused together into either two (bacteria) or three (eukaryotes, archaea, and viruses) chains.

Figure labels:
Dimeric bacterial clamp: DNA polymerase III beta subunit from E. coli [1MMI]
Trimeric eukaryotic clamp: proliferating cell nuclear antigen from humans [1VYM]
Trimeric clamp, colored to show the 12 structural repeats
Single domain, as viewed from the center of the ring
Diagram labels: 12-mer, 6-mer, Eukaryotic Trimer, Bacterial Dimer

Benchmark

A benchmark of 1007 proteins from different SCOP superfamilies was created by manually inspecting each for internal symmetry. Structures with fewer than 4 secondary structure elements were omitted from the benchmark.
24% of the superfamilies were found to have internal symmetry or large structural repeats.

Symmetry        Order   Number of Superfamilies   % of benchmark
Asymmetric      -       766                       76.1%
Rotational      2       166                       16.5%
Rotational      3       10                        1.0%
Rotational      4       2                         0.2%
Rotational      5       3                         0.3%
Rotational      6       9                         0.9%
Rotational      7       9                         0.9%
Rotational      8       21                        2.1%
Dihedral        2       2                         0.2%
Dihedral        4       1                         0.1%
Helical         2       9                         0.9%
Helical         3       2                         0.2%
Non-integral    -       2                         0.2%
Superhelical    -       2                         0.2%
Translational   -       3                         0.3%

Figure: Comparison of CE-Symm and SymD (ref. 8) performance. Dots represent default thresholds for determining symmetry. AUC = 0.95, 0.87.

Census

CE-Symm was run on every domain in SCOPe 2.03. The census is available at source.rcsb.org/jfatcatserver/scopResults.jsp.

Percentage of SCOP superfamilies with internal symmetry, as detected by CE-Symm:

SCOP class     Number of Superfamilies   % symmetric
α              507                       18.5%
β              354                       24.6%
α/β            244                       16.8%
α+β            551                       14.3%
multi-domain   66                        4.5%
membrane       109                       23.8%
All classes    1831                      18.0%

Figure: Percentage of internal symmetry detected by CE-Symm in domains annotated with Enzyme Commission numbers.

Figure labels (case studies):
Glyoxalase I from Clostridium acetobutylicum [3HDP] (Nickel; Dimer)
Glyoxalase I from E. coli [1F9Z] (Nickel; Dimer)
1,2-dihydroxy-naphthalene dioxygenase from Pseudomonas sp. strain C18 [2EHZ] (Iron; Octamer)
Pseudo D2 symmetry of the complex, colored to show the four repeats.
Vitamin B12 transporter BtuCD from E. coli, in complex with periplasmic-binding protein BtuF (pink) [PDB:4FI3].
Ferredoxin-like [d2j5aa1]: C2
Beta-trefoil [3JUT]: C3
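Illustrative sketch of the self-alignment step

The two modifications to CE described in the CE-Symm section (a penalty on self-aligned residues and duplication of the alignment matrix) can be illustrated with a toy example. This is only a sketch under strong simplifying assumptions and is not the CE-Symm implementation: it compares residue letters instead of 3D structure, it scores gapless diagonals instead of running CE's dynamic programming and refinement, and the function names, the penalty value, and the penalty window are all hypothetical choices made for the example.

```python
def similarity_matrix(seq, penalty_window=2, penalty=-10.0):
    """Toy self-similarity matrix: +1 for identical letters, -1 otherwise.

    Assumption for this sketch: cells within `penalty_window` of the main
    diagonal (measured cyclically) receive a strong penalty, so the trivial
    0-degree self-alignment cannot dominate.
    """
    n = len(seq)
    matrix = []
    for i in range(n):
        row = []
        for j in range(n):
            score = 1.0 if seq[i] == seq[j] else -1.0
            # Cyclic distance of cell (i, j) from the main diagonal.
            dist = min((j - i) % n, (i - j) % n)
            if dist <= penalty_window:
                score = penalty
            row.append(score)
        matrix.append(row)
    return matrix


def duplicate_columns(matrix):
    """Turn an n x n matrix into n x 2n by appending each row to itself.

    After duplication, an alignment that wraps around the end of the chain
    (a circular permutation) becomes one unbroken diagonal.
    """
    return [row + row for row in matrix]


def best_offset(seq, penalty_window=2):
    """Return (offset, score) of the best-scoring gapless diagonal."""
    n = len(seq)
    dup = duplicate_columns(similarity_matrix(seq, penalty_window))
    best_k, best_score = None, float("-inf")
    for k in range(1, n):
        # Residue i is paired with residue (i + k) mod n; in the duplicated
        # matrix this is the single contiguous diagonal j = i + k.
        score = sum(dup[i][i + k] for i in range(n))
        if score > best_score:
            best_k, best_score = k, score
    return best_k, best_score


def rotation_angle(offset, length):
    """Rotation angle in degrees implied by a residue offset."""
    return 360.0 * offset / length


if __name__ == "__main__":
    toy = "ABCDE" * 3  # hypothetical chain made of three identical repeats
    k, score = best_offset(toy)
    print(k, score, rotation_angle(k, len(toy)))
```

For the toy chain of three identical five-residue repeats, the best diagonal lies at an offset of 5 residues, which corresponds to a 120° rotation, in line with the 3-fold dot plot described for the beta-trefoil. Offsets of 1 and 2 residues fall in the penalized band and score poorly, which is the role the gray shaded region plays in that plot.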