Alternative to Homo-oligomerisation: The Creation of Local Symmetry in Proteins by Internal Amplification

  • Published on
    25-Oct-2016


Anne-Laure Abraham1,2, Joël Pothier1 and Eduardo P.C. Rocha1,2

1Atelier de BioInformatique, Université Pierre et Marie Curie-Paris 06, F-75005 Paris, France
2Microbial Evolutionary Genomics, Institut Pasteur, CNRS, URA2171, F-75015 Paris, France

J. Mol. Biol. (2009) 394, 522–534. doi:10.1016/j.jmb.2009.09.031

Received 23 February 2009; received in revised form 11 September 2009; accepted 15 September 2009. Available online 19 September 2009.

Edited by M. Sternberg

The biologically active state of many proteins requires their prior homo-oligomerisation. Such complexes are typically symmetrical, a feature that has been proposed to increase their stability and facilitate the evolution of allosteric regulation. We wished to examine the possibility that similar structures and properties could arise from genetic amplifications leading to internal symmetrical repeats. For this, we identified internal structural repeats in a nonredundant Protein Data Bank subset. While testing if repeats in proteins tend to be symmetrical, we found that about half of the large internal repeats are symmetrical, most frequently around a rotation axis of 180°. These repeats were most likely created by genetic amplification processes because they show significant sequence similarity. Symmetrical repeats tend to have a fixed number of copies corresponding to their rotational symmetry order, that is, two for 180° rotation axis, whereas asymmetrical repeats are in longer proteins and show copy number variability. When possible, we confirmed that proteins with symmetrical repeats folding as an n-mer have homologues lacking the repeat with a higher oligomerisation number corresponding to the rotation symmetry order of the repeat. Phylogenetic analyses of these protein families suggest that typically, but not always, symmetrical repeats arise in one single event from proteins that are homo-oligomers. These results suggest that oligomerisation and amplification of internal sequences can interplay in evolutionary terms because they result in functional analogues when the latter exhibit rotational symmetry.

© 2009 Elsevier Ltd. All rights reserved.

Keywords: evolution; oligomerisation; genetics; symmetry; repeats

Corresponding author. Atelier de Bioinformatique, Université Paris 6, Boîte courrier 1202, 4 place Jussieu, 75252 Paris cedex 05, France. E-mail address: annela@abi.snv.jussieu.fr.
Abbreviation used: ssb, single-stranded DNA binding.
0022-2836/$ - see front matter

Introduction

Most proteins are biologically active only in the form of oligomers containing some sort of symmetry.1 Such symmetrical structures often result from the homomeric association of elements that are not themselves symmetrical.2 While the reasons for this pervasive symmetry remain speculative, several hypotheses have been proposed. First, the symmetrical state could be the lowest-energy state and thus provides more stability.3 Second, symmetry provides a simple way of building oligomers with a defined number of elements and therefore avoids aggregation.1 It has also been proposed that the folding of symmetrical structures faces fewer kinetic barriers.4 Furthermore, simulations on randomly docked complexes show that the lowest energy typically corresponds to symmetrical interactions.5

Escherichia coli proteins show an average oligomerisation state of 4 and only a minority of proteins is found in monomeric form. In general, the single most frequent complex state of a protein might be a dimer, most frequently a homodimer with a one-symmetry rotation axis (60–70% of all known complexes).6 Within the remaining complexes, there are yet 20% of homomeric interactions. Homotetramers are less frequent than homodimers (15–20%), while homotrimers, homohexamers and homo-octamers are even rarer.1,7 A minor fraction of proteins is found in the form of very long polymers or higher-order oligomers. Hence, associations among proteins leading to symmetrical structures are thought to play key roles in biological systems.

Transient or assortative interactions between proteins require the existence of independent molecules. Hence, a protein or a complex that participates in different complexes is expected to be coded by independent genes to allow for such modularity. But why are monomers not replaced by longer molecules for the vast number of proteins that establish long stable interactions within a single homomeric complex? Several evolutionary hypotheses have been put forward to answer this question: (1) Nature shows abundant examples where building by accumulation of construction bricks is adaptive.8 (2) Assuming a constant error rate and a faulty protein elimination mechanism, it could be more efficient to construct multiple small subunits than larger ones.9 However, it is unclear if the removal of mistranslated proteins is quick enough to prevent the establishment of a misfolded complex, knowing that such associations often lead to negative dominant phenotypes.10 (3) The possibility of associating and dissociating subunits creates a potential for function enhancement and regulation. (4) It has been argued that oligomerised proteins are more evolutionarily constrained and thus subject to more stringent selection.2 However, among functionally equivalent objects, evolution often favours elements that can easily evolve to adapt over elements that do not tolerate mutations.11 This is because the purge of deleterious mutations involves the elimination of individuals carrying them from the population. This leads to higher genetic load, and it is therefore deleterious in most circumstances.

Close repeats occur spontaneously at high rates in both eukaryotes and prokaryotes and may result in duplication of structural domains.12,13 Amplifications can also arise by exon shuffling in eukaryotes,14 where proteins are indeed three times more likely to contain internal repeats.12 Around 14% of proteins have been found to have long internal repeats.12 These repeats have important evolutionary roles and are present even in small bacterial genomes.15–18 Accordingly, recent works have shown relatively high frequencies of domain gain, loss and duplication in proteins.19–21 Yet, there has been little work on a direct consequence of such events: that homo-oligomers could be replaced by symmetrical internal structures created by intragenic partial duplications.

We conducted a study to test this idea. First, we identified internal repeats within a nonredundant data bank of protein structures. We then classed them as symmetrical or asymmetrical. It must be emphasised at this stage that what we call a symmetrical repeat is a set of structural elements in a given protein (i.e., copies of a repeat) that can be superimposed with a low resulting RMSD after a given symmetry operation. Many of these structural repeats cannot be strictly symmetrical, since in general they do not have strictly identical sequences. They should thus be called pseudo-symmetrical. Yet, for simplicity, we put together symmetrical and pseudo-symmetrical repeats under the same term. We classed repeat-containing proteins according to their structural features, to separate α-rich proteins, very repetitive proteins and the group of other proteins more likely to show features resembling that of homo-oligomers. We then used the latter set to search for homologous proteins with different multiplicities of the repeat, that is, proteins with just one copy of the repeated motif, proteins with two copies of the repeat and proteins with higher number of copies. Naturally, proteins with just one copy of the motif do not have a repeat. We found that proteins with one single copy of the repeat tended to have a doubled state of homo-oligomerisation relative to proteins with two copies of the repeat. We then analysed the evolution of the families of proteins with elements containing and lacking the symmetrical repeat in a phylogenetic framework.

Results

Proteins contain long symmetrical repeats

We searched for structural repeats longer than 50 residues among 8657 protein structures of the Astral data bank. We focused on long repeats because these have extremely low likelihoods of arising by random assembling of residues. Repeats were identified with Swelfe,22 which uses dynamic programming to find optimal repeated substructures while weighting matches according to the frequency of angles in the Protein Data Bank (PDB). This allows downplaying the role of very frequent angles involved in archetypical secondary structural elements such as α-helices or β-sheets. We found 172 proteins containing long structural repeats. They correspond to 2% of the data set (cf. Supplementary Table 1). We included in our analysis proteins of the Astral data bank with less than 50% sequence identity among themselves.23 This avoids making multiple hits among closely related structures. We kept entire proteins, and not only domains, for our analyses. If some families of folds were overrepresented in our data set, this could inflate or play down the number of repeats. To check for this effect, we identified the presence of internal repeats in the "one structure per family" data set of Astral. We found 3% of proteins containing long structural repeats. This is close to the ratio found with the "less than 50% identity" data set in Astral. Among the 172 proteins containing repeats, there are 103 different folds. This clearly shows that our results are not dominated by one or a few folds being overrepresented in the data set.

Since we used very stringent length and similarity criteria to identify internal repeats, we investigated if we would have found more than 172 proteins with more typical significance thresholds. The default values of Swelfe, score >250 and relative RMSD <0.5 (see Materials and Methods), are estimated to conservatively result in a p value of 10⁻³.22,24 Using these parameters, we found internal structural repeats in 1900 structures, that is, in 22% of the set. We wish to test a very specific hypothesis and make a proof of principle. Hence, in the remaining analyses we preferred the use of the smaller but very reliable data set of long repeats, even if this represents only a small sample of the overall number of repeats.

We calculated the superimposition angle of the two copies of the internal repeats (see Materials and Methods and Supplementary Table 1). The histogram of rotation angles shows that rotations of 180° vastly outnumber all others (Fig. 1). There are 61 repeats with a rotation angle between 170° and 180°, indicating that 35% of long structural repeats have a 2-fold symmetry axis (C2). The 2-fold symmetry axis is pervasive among homodimers, which are the most abundant homo-oligomers. Hence, this large group of repeats is especially interesting to study in the framework of our hypothesis that sequence amplifications provide the opportunity to generate symmetrical structures analogous to that of homo-oligomers. We also found smaller peaks at rotation angles of 120°, 90°, and 60°, which correspond to 3-, 4- and 6-fold symmetry (Fig. 1). While the number of proteins is low, the number of repeats with these rotational angles is higher than expected if distribution was uniform in the range 0–160° (p=0.03, p=0.06 for angles 170°, Wilcoxon tests). There are 89 proteins containing pairs of repeats (51% of the set) with verified rotation angles of 180°, 120°, 90° or 60°, showing that many of the long internal repeats are symmetrical under an axis of rotation. This is in agreement with our hypothesis that internal amplifications can give rise to symmetrical elements sharing structural resemblances with homo-oligomers, especially homodimers.

Fig. 1. Histogram of rotation angles allowing the superimposition of the two copies of the repeat for the 172 proteins containing repeats. Repeats with a 180° angle are very numerous and correspond to a 2-fold symmetry. There are also some small peaks at 60°, 90° and 120° that might correspond to 6-, 4- and 3-fold rotation symmetry.

Classifying proteins according to their structure

Fig. 2. Number of proteins with repeats that are symmetrical (2- to 6-fold) (inner circle in bold) or not (outer circle) in the three categories of repeats.

We clustered proteins with internal repeats into three groups: α-helix-rich proteins, very repetitive proteins and other proteins (Fig. 2, Supplementary Table 1, see Materials and Methods for details). Since we suspected that these groups of proteins unveiled essentially different biological histories, we analysed them separately. By construction, the first group contains proteins with more than 85% of α angles in the range 40–65°, which correspond to the angles found in α-helices. We compared the similarity score between the pairs of copies of the repeats in this group with those of the other two groups (Fig. 3): Scores corresponding to α-helix-rich repeats are significantly lower than the others (p=2.4×10⁻⁹, Wilcoxon one-sided test). Moreover, this group of proteins often showed negative sequence similarity scores between the two copies of the repeat. This means that the amino acid sequences of the two copies of the structural repeat are so dissimilar that they cannot be aligned meaningfully.
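As an illustration of the two numerical criteria described in this part of the text, the following minimal sketch derives a rotation angle from a 3×3 rotation matrix, maps it to a candidate symmetry order, and applies the 85% threshold on α angles in the 40–65° range. It is not the authors' procedure (that is described in Materials and Methods): the function names, the 10° tolerance window and the trace-based angle formula are assumptions chosen for illustration, and the text states that candidate symmetries were also checked visually.

```python
import math

def rotation_angle_deg(rotation):
    """Rotation angle (degrees) of a 3x3 rotation matrix, from its trace."""
    trace = rotation[0][0] + rotation[1][1] + rotation[2][2]
    # Clamp to [-1, 1] to absorb rounding error before acos.
    cosine = max(-1.0, min(1.0, (trace - 1.0) / 2.0))
    return math.degrees(math.acos(cosine))

def symmetry_order(angle_deg, tol=10.0):
    """Candidate n-fold order for the peaks named in the text (180, 120, 90, 60).

    The tolerance is a hypothetical choice; the text only states the
    170-180 degree window for 2-fold symmetry.
    """
    for n in (2, 3, 4, 6):
        if abs(angle_deg - 360.0 / n) <= tol:
            return n
    return None

def is_helix_rich(alpha_angles_deg, lo=40.0, hi=65.0, threshold=0.85):
    """True if more than 85% of the alpha angles fall in the helical range."""
    if not alpha_angles_deg:
        return False
    in_range = sum(1 for a in alpha_angles_deg if lo <= a <= hi)
    return in_range / len(alpha_angles_deg) > threshold

# A half-turn about the z axis, as expected between the copies of a C2 repeat.
C2_ABOUT_Z = [[-1.0, 0.0, 0.0], [0.0, -1.0, 0.0], [0.0, 0.0, 1.0]]
print(rotation_angle_deg(C2_ABOUT_Z), symmetry_order(175.0), symmetry_order(162.0))
```

With this tolerance, an angle of 175° is assigned to 2-fold symmetry, whereas an angle of 162° falls outside every window, which is one way of expressing "nearly symmetrical".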
This suggests that these structural elements are either very distant homologues whose sequences are saturated with changes or structural analogues resulting from convergent evolution.25 α-Helices are very abundant because they result from a large variety of protein sequences and are thus particularly prone to convergent evolution. This class contains diverse functions. For example, the PDB entry 1cii corresponds to a protein of the colicin family. This ion-channel-forming protein kills bacterial cells by co-opting their active transport pathways and forming voltage-gated ion-conducting channels across the plasma membrane of the bacteria. The domain made up of two helices (160 amino acids long) that are nearly symmetrically repeated (rotation angle of 162°) enables the molecule to span the periplasmic space and contact simultaneously the outer and plasma membrane.26 Other α-rich proteins include the botulinum neurotoxin type B (PDB entry 1s0e, with a symmetry angle of 175°),27 which is a very potent toxin to humans and causes paralysis, and a bacterioferritin (1nf4, with a symmetry angle of 176°), which is able to store two iron ions.28

The second group of proteins with internal repeats contains very repetitive proteins. They were identified by visual inspection of structures containing at least six copies of the repeat. Repetitive proteins containing many α-helices are in the first group (α-helix-rich proteins). Some of these very repetitive proteins bind other proteins. For example, the Groucho protein (1gxr)29 is a transcriptional co-repressor that interacts with DNA-bound transcription factors and histones. It contains a seven-bladed β-propeller WD40 repeat domain. The β-propeller is also found on a surface layer protein of Methanosarcina (1l0q).30 Tropomodulin (1io0)31 is a protein that blocks the elongation and depolymerisation of actin–tropomyosin. The domain that binds actin contains a leucine-rich repeat.
Ankyrins (1n11)32 link membrane proteins to the spectrin–actin cytoskeleton. The N-terminal domain binds the membrane and consists of 24 ANK repeats. Very repetitive proteins have probably not arisen by one single duplication event and, as expected, they rarely show 2-fold rotational symmetry (among the 16 proteins, only 3 have a 2-fold symmetry).

The remaining 99 proteins were put together in the third group, the majority of which contains structural repeats whose copies have significant sequence similarity. They have thus most likely arisen by internal amplifications of genetic material and we shall focus on their analysis. Among these 99 proteins we found 83 different folds, showing that this group contains very little redundancy in this respect. Around half (47) of these proteins contain repeats with 2-, 3- or 5-fold rotational symmetry (no 4- or 6-fold was observed in this group). These symmetries were visually checked in order to remove spurious symmetrical proteins: Some repeats are superimposed by a 60°, 90° or 120° rotation angle but do not correspond to 6-, 4- or 3-fold symmetry.

Fig. 3. Sequence similarity corresponding to structural repeats. The difference between the two groups is significant (Wilcoxon one-sided test, p=2.4×10⁻⁹).

We have tested if the repeats of this group correspond to duplications of Pfam domains, because these domains often correspond to functional units. Most repeats correspond approximately to the duplication of one Pfam domain; that is, each copy of the repeat overlaps along at least 70% of its length with one single domain (same or very similar name) (61 out of 99; Supplementary Table 2). In some cases, one single Pfam domain contains two copies of a repeat (20), and in others, a repeat copy contains two Pfam domains (2). The remaining 16 cases include proteins without domains or proteins with Pfam domains that do not overlap the repeats.
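The 70% overlap criterion used above to match a repeat copy to a Pfam domain can be sketched as follows. This is only an illustration under stated assumptions: intervals are taken as inclusive residue positions, the function names are hypothetical, and the example coordinates are invented. The category counts are those reported in the text.

```python
def overlap_fraction(repeat, domain):
    """Fraction of the repeat copy's length covered by the domain.

    Both arguments are (start, end) residue positions, inclusive.
    """
    r_start, r_end = repeat
    d_start, d_end = domain
    shared = min(r_end, d_end) - max(r_start, d_start) + 1
    return max(0, shared) / (r_end - r_start + 1)

def matches_domain(repeat, domain, min_fraction=0.70):
    """True if the copy overlaps the domain along at least 70% of its length."""
    return overlap_fraction(repeat, domain) >= min_fraction

# Counts reported in the text for the 99 proteins of the third group.
CATEGORY_COUNTS = {
    "each copy matches one domain": 61,
    "one domain contains two copies": 20,
    "one copy contains two domains": 2,
    "no domain, or no overlapping domain": 16,
}

# Invented coordinates: a 100-residue copy, 80 residues of which lie in the domain.
print(overlap_fraction((10, 109), (30, 150)), sum(CATEGORY_COUNTS.values()))
```

The four categories add up to the 99 proteins of the group, and the 36 proteins outside the first category are consistent with the statement that about a third of the repeats would be missed by an analysis of Pfam domain duplications alone.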
There is thus a significant overlap between the structural repeats in this group and those of the Pfam domains, both among symmetrical and nonsymmetrical repeats. However, about a third of the repeats could not have been found by the analysis of Pfam domain duplications. Hence, our approach provides a different perspective on domain repeats from the ones previously published on Pfam domains. Contrary to the latter, it also allows the inclusion of information about the structural relative positioning of the different copies of a repeat within the protein.

No specific functional bias in proteins with symmetrical repeats

Is there any function overrepresented among proteins with repeats? To answer this question, we extracted the GO terms from Gene Ontology33 of proteins with and without repeats. Using a previously published method, we then identified over- and underrepresented terms.34 We only found one overrepresented term: Calcium-dependent phospholipid binding (GO term 5544) is overrepresented among proteins containing nonsymmetrical repeats. No term is over- or underrepresented in proteins with symmetrical repeats (at p<0.05). This suggests that symmetrical repeats are not strongly associated with particular cellular components, biological processes or molecular functions. Naturally, the low number of proteins precludes the statistically meaningful identification of weak associations.

Many enzymes in the PDB are composed of several monomers: There are 7242 enzymes with known dimeric, tetrameric, hexameric or octameric quaternary structure, according to the PQS (Protein Quaternary Structure) data bank, among the 22,912 enzymes of the PDB (about 31%). Sometimes the association or dissociation of these monomers can regulate enzyme activity. In this case, a duplication overruling the necessity of oligomerisation and creating an enzyme in one single peptide chain could be selectively deleterious.
We tested whether enzymes were less frequent among proteins containing repeats (see Supplementary Table 3) or, within these, among proteins containing symmetrical repeats. EC numbers of enzymes were taken from the Enzyme Structures Database of PDBsum (http://www.ebi.ac.uk/thornton-srv/databases/pdbsum/data/pdb_EC). We assumed that structures with EC numbers were enzymes and that the remaining were not enzymes. Enzymes are underrepresented among proteins containing repeats (p<0.005, χ² test). However, this difference is due to proteins in the group of highly repetitive proteins (p=0.02) and to some extent to the group of α-helix-rich proteins (p=0.06). We found no significant bias among the third group of proteins (p=0.2). Within the third group we could not find any significant difference in terms of the proportion of enzymes found in proteins with symmetrical and nonsymmetrical repeats. Hence, there is no evidence for an over- or underrepresentation of symmetrical repeats among enzymes, possibly because many proteins with symmetrical repeats are also found in the form of complexes, as shown below.

Homologous proteins with different repeat copy number

If proteins with internal symmetrical repeats can work as analogues of homo-oligomers, one would expect to find variants of these proteins with different number of copies of the repeat and making complexes with accordingly different number of units. We started by testing the first hypothesis, that is, that homologues of these proteins exist with different number of copies of the repeat. For the two groups of highly repetitive proteins and α-helix-rich repeat proteins, the existence of different repeat multiplicity is largely expected by the repetitive nature of the former and by the high frequency of α-helices in the latter. We thus conservatively did the analysis on the third group of 99 proteins. Few of the sequences of naturally existing proteins are present in the sequence data banks and even fewer have a known structure.
Since most of the structural repeats of this group show significant sequence similarity, we could search for homologues both at the structural level, in the PDB Cluster50 set, and at the sequence level, in the TrEMBL data bank. The use of the latter enlarged very significantly the data set size and alleviated the problem associated with the very large number of PDB structures that only cover partially the protein sequence. We searched for homologues that aligned well along all the protein, except in the repeat regions. We confirmed that many proteins have homologues with fewer copies of the repeat both among proteins with symmetrical and nonsymmetrical repeats (Table 1). This shows that functional forms of proteins exist with varying number of copies. This is consistent with our hypothesis that proteins with symmetrical repeats might be functional analogues of homo-oligomers. To better establish this point, we investigated the differences between proteins with symmetrical and nonsymmetrical repeats in terms of their functions and their quaternary structure in relation to that of proteins with fewer copies of the repeat.

Differences between symmetrical and nonsymmetrical proteins

By looking for proteins with fewer or more copies, we found some very significant differences between proteins with symmetrical and nonsymmetrical repeats. We observed very few cases of further amplifications among repeats with rotational symmetry (4 out of 47) and a significantly larger number among the others (14 out of 52, p<0.02, χ² test). This is expected from our hypothesis: such further amplifications in proteins with rotational symmetrical repeats are likely to result in disruption of the symmetry and thus be counterselected.
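The comparison of further amplifications (4 of 47 proteins with symmetrical repeats versus 14 of 52 proteins with nonsymmetrical repeats) can be approximately reproduced with a Pearson χ² test on the 2×2 table. The sketch below is an assumption about how such a test may be computed: it uses no continuity correction and a closed-form tail probability for one degree of freedom, and the text does not say which variant of the test was used.

```python
import math

def chi2_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]].

    Returns the statistic and its p value for 1 degree of freedom,
    using P(X > x) = erfc(sqrt(x / 2)).
    """
    n = a + b + c + d
    statistic = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value

# Rows: symmetrical, nonsymmetrical. Columns: further amplification, none.
statistic, p_value = chi2_2x2(4, 47 - 4, 14, 52 - 14)
print(round(statistic, 2), round(p_value, 3))
```

Under these assumptions the p value falls below 0.02, in line with the value quoted in the text.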
On the other hand, proteins with nonsymmetrical repeats are expected to be less constrained in this respect.

Some proteins are found with varying numbers of repeat copies (Table 1): 14 proteins have homologues with at least two other multiplicities (one copy and/or more than two copies). Among these, only 2 correspond to proteins with symmetrical repeats, which is much less than expected by chance (p=0.0087, Fisher exact test). About a fourth (12 out of 52) of the proteins with nonsymmetrical repeats have at least two other repeat multiplicities.

We then checked if proteins with symmetrical and nonsymmetrical repeats were of different lengths. We did not find any significant difference by looking at the length of the structures (Supplementary Fig. 1). However, many PDB structures are partial and do not correspond to the entire protein. We therefore compared the length of the protein coding sequences in our TrEMBL-based sequence data bank and found that coding sequences of proteins with symmetrical repeats are significantly shorter than that of the others (see Supplementary Fig. 1, Wilcoxon one-sided test, p=2.5×10⁻⁵). We then calculated the number of copies of the repeats in coding sequences and found that it was strictly superior to two for 10 proteins with nonsymmetrical repeats and for only 2 proteins with symmetrical repeats. In short, proteins with nonsymmetrical repeats tend to be longer and have a higher variability in repeat copy-number.

Table 1. Number of homologous proteins with lower or higher multiplicity in sequence database and in the Cluster50 PDB subset

                                    Homologous proteins
                          Query   Different        Lower            Higher           High number of
                          total   multiplicity(a)  multiplicity(b)  multiplicity(c)  multiplicity(d)
Symmetrical proteins      47      26               23               4                2
Nonsymmetrical proteins   52      31               28               14               12

Some proteins have both homologues with lower multiplicity and homologues with higher multiplicity.
(a) At least one homologous protein exists with a different number of copies of the repeat.
(b) At least one homologous protein exists with a lower number of copies of the repeat.
(c) At least one homologous protein exists with a higher number of copies of the repeat.
(d) At least two homologous proteins exist with different numbers of copies of the repeat.

Proteins with symmetrical repeats have homologues with fewer copies and higher oligomerisation state

We then tested if proteins with one single copy of the repeat had an oligomerisation state double to that of homologous proteins with two symmetrical copies of the repeat. We searched the PQS database (ftp://ftp.ebi.ac.uk/pub/databases/msd/pqs/) and the primary literature for the quaternary structure of PDB structures with symmetrical repeats having a counterpart with one single copy of the repeat. We found 11 pairs of homologous proteins with known quaternary structure in which one protein contains one copy and the other contains two copies of the repeat (Supplementary Table 4). Six pairs accurately matched our predictions (see Fig. 4 for two examples). Within the five remaining pairs, three are made up of partial structures and therefore they cannot be used as clear-cut examples. In one of the two remaining pairs both proteins are monomeric. This is the only example in clear disagreement with our prediction. The protein with only one copy of the repeat (2bv2) is a betagamma-crystallin of Ciona intestinalis, which split from the vertebrate lineage before the evolution of the lens.35 2bv2 is very similar to domains of vertebrate betagamma-crystallin, but it is composed of only one monomeric domain. The vertebrate betagamma-crystallins may have evolved from a single-domain protein. In the last pair, the two repeats cover two-thirds of the overall structure (1w25)36 and correspond to a response regulator receiver domain that is a well-conserved domain of the two-component signal transduction system. Probably these pairs of homologues have different functions and cannot be seen as functional homologues.

Among the six pairs of proteins consistent with our predictions (see Figs. 4 and 5 and Supplementary Fig. 2), one protein (1gtt) with two copies of the repeat is a monomer, whereas the protein without a repeat is a homodimer. In two other cases (1aln and 1ddz), the former is a homodimer and the latter is a homotetramer. The last three families of proteins (2gvh, 1h9m and 1knc) are homotrimers when they have repeats, and the corresponding homologues without repeats are hexamers. These results are in exact agreement with our hypothesis that symmetrical amplifications play a role in the evolution of the oligomerisation state of proteins.

Fig. 4. Example of two proteins with a 2-fold symmetry and their homologues lacking the repeat. (a) Superimposition of 1ddz and 1i6o. (b) 1i6o, β-carbonic anhydrase from E. coli (four chains, in blue, slate, forest and lime green). (c) 1ddz, β-carbonic anhydrase from P. purpureum (two chains, in yellow and orange). (d) Superimposition of 1h9m and 1fr3. (e) 1fr3, molybdate–tungstate binding protein from S. ovata (six chains, in blue, slate, forest, lime green, cyan and green-cyan). (f) 1h9m, molybdate-binding protein from A. vinelandii (three chains, in yellow, yellow-orange and orange).

Fig. 5. Quaternary structure of pairs of homologues with two and one copy of 180° symmetrical repeats.

Phylogenetic analysis of protein families

We analysed the evolution of the six protein families for which we found homologues with and without the symmetrical repeat, that is, proteins with two copies or one copy of the structural repeat. For each pair we searched for the presence of homologues with one or two copies of the repeat in 849 genomes (57 archaea, 746 bacteria and 46 eukaryota) (see Materials and Methods). We only used complete genomes to ensure that lack of homologues does not result from incomplete sequencing but only from absence from the genome. For two protein families (families of 1ddz and 1h9m) we could only find homologues for one type of proteins, either with or without the repeat (Fig. 6a). This means that for some of the structures, we could not find a single occurrence of a homologous gene in completely sequenced genomes. Accordingly, the proteins lacking homologues in complete genomes have a PDB representative from unsequenced genomes, Porphyridium purpureum (1ddz) and Azotobacter vinelandii (no repeat homologues of 1h9m).

In three other cases (1gtt, 1aln, and 2gvh), the proteins with one single copy of the repeat are much more frequent and have a broader phylogenetic span than the proteins with two symmetrical copies of the repeat. This suggests that the former correspond to the ancestral form. In two of these cases, proteins lacking the repeat exist in archaea, bacteria and eukaryotic genomes, suggesting a very ancient origin.

Homologues of the PDB entry 1knc with or without the repeat exist in very diverse bacterial genomes. To analyse this case in more detail, we aligned the proteins of this family, removed the unaligned parts, and produced a phylogenetic tree (see Materials and Methods). We could not root the tree with a paralog, because we could not find one sufficiently similar. Therefore, we used a midpoint root, which should effectively correspond to the root if proteins evolved at similar rates in their homologous regions. This tree shows three clades with high support, suggesting that the form of the protein without the repeat is ancestral and that repeats arose later and only once in evolution (Fig. 6b). To test the frequency with which other repeats arose in evolutionary history, we repeated the phylogenetic analysis for the other three groups of proteins. One case, 1gtt, corresponds to a repeat that arose once and is present exclusively in enterobacteria. Homologues of the entry 1aln also show a monophyletic origin for the repeat. These results suggest that, in general, the proteins without the repeat arose first and that repeats tend to be monophyletic, that is, belong to one single lineage.

Yet, we found exceptions for both trends. The group of 1h9m is very surprising, as it shows that the protein with the repeat is widespread in genomes, whereas we could not find the gene for a protein without the repeat in any complete genome. This is most parsimoniously explained by claiming ancestrality of the protein with the repeat. Yet, since the two copies of the repeat share sequence similarity, they have probably arisen from an ancient duplication, which requires that a protein without the repeat preceded the extant protein with the repeat. Two scenarios could explain this apparently paradoxical result. (1) The ancestral protein without repeats disappeared after duplication, and a recent deletion created a new one-copy repeat protein that is now present in few closely related organisms. (2) The protein without repeats remained in some, but few, genomes. Since both PDB entries without repeats are from closely related Firmicutes (Clostridium pasteurianum and Sporomusa ovata), the first hypothesis seems more likely. The other unexpected case concerns 2gvh, a family where the protein with the repeat is rare among sequenced genomes, and phylogenetic evidence strongly suggests it has appeared three times independently, once within archaea (present in Sulfolobus), once within α-Proteobacteria (present in Maricaulis maris), and once in Actinobacteria (present in Arthrobacter). Note that this cannot be explained by lateral transfer, because the phylogenetic trees in Fig. 6b are for the protein itself, not for the species phylogeny. Thus, the proteins from these three clades are effectively very different and cluster with proteins lacking the repeat. Hence, while our results suggest that extant repeats generally arose from ancestral proteins lacking repeats, this may not always be the case. Homologous proteins with repeats may have multiple origins and they may have given rise to proteins that lost one of the copies of the repeat.

Fig. 6. Evolutionary analysis of the six families of proteins with symmetrical repeats and homologues lacking them. (a) Presence of homologues in major clades of the tree of life. (b) Cartoons of the phylogenetic trees of the four families having homologues of repeat-containing and repeat-lacking proteins in genomes (see Materials and Methods). Numbers between parentheses in the header indicate the number of proteins used to make the tree. The height of the triangles is proportional to the number of sequences in the clade. The numbers in the nodes are aLRT statistics and bootstrap values (out of 100) and the numbers in the leaves represent (for 2gvh) the number of proteins in the clade with or without the symmetrical repeat.

Discussion

We have shown that long internal structural repeats often show symmetry along a rotation axis. We have also shown that when proteins with symmetrical repeats have homologues lacking repeats and both have a known quaternary structure, the latter have a correspondingly higher homo-oligomerisation state. This is in agreement with our hypothesis that internal duplications may lead to the replacement of an n homo-oligomer by an n/2 homo-oligomer preserving the overall structure. We also show that among the two groups, the protein without repeats tends to be the ancestral protein, in agreement with the idea that such repeats arise from internal genetic amplifications from preexisting proteins. Furthermore, we found that in most cases, the homo-oligomerisation state of proteins with symmetrical repeats is higher than 1, that is, such proteins still form oligomers, and typically homo-oligomers, but with fewer elements. This means that some proposed advantages of homo-oligomers, such as facilitation of allostery, may still apply. Overall, these results suggest that such proteins arise from preexisting proteins that already folded as homo-oligomers. The alternative would be a genetic amplification leading to a symmetrical repeat within a functional protein concomitantly generating de novo a structure resembling a homo-oligomer. This seems less likely, especially in the light of our phylogenetic analysis. We thus favour a scenario where symmetrical repeats arise in proteins that have previously evolved toward the capacity of establishing homo-oligomers. Such amplifications decrease the oligomerisation number, but do not require the creation of interfaces de novo, since an analogous protein complex preexisted thanks to the assembly of a larger number of monomers. Interfaces between the n/2 homomers remain the same as their corresponding parts in the n homomers. If such amplifications lead to a protein capable of folding correctly and performing the same function, then the fixation of this change in populations may occur by purely neutral mechanisms. Naturally, such fixation will be much more likely, and quicker, if it leads to increased fitness. Since genetic amplifications occur at high rates in genomes, the process may occur several times in parallel. This may explain the multiple emergence of symmetrical repeats in the family of acyl-CoA hydrolase (2gvh), as shown in Fig. 6.
Yet, since only one out of sixfamilies showed occurrence of multiple independentgenetic amplifications leading to symmetricalrepeats, it is possible that this constitutes a relativelyrare event, or that most such events are deleteriousand are removed by natural selection.Following these results, several questions comeimmediately to mind: Why are these repeatssymmetrical? What are the consequences of theirappearance? Are there advantages in diminishingoligomerisation number at the cost of increasingprotein size?As a reviewer pointed out to us, there is probablyno selective advantage in symmetry. Yet, aroundhalf of the large internal repeats are symmetrical.This seems much too large to be due to chance. Onecould explain the existence of repeat symmetry in atleast two ways: because symmetry is associatedwith selective features or because of evolutionarycontingency. Symmetry could be intrinsically ad-vantageous because of lower-energy interactions forsymmetrical complexes versus nonsymmetricalones.5 The second possibility results directly fromthe model described in the previous paragraph:domains. Hence, in this case, the internal amplifica-Repeats arising within homo-oligomers will notdisrupt protein function if, among other constrains,they have little impact on protein structure. Ifamplifications are large, they can be facilitated ifthey lead to a protein that has a structure close tothat of the former homo-oligomer. The largeoverrepresentation of repeats with C2 symmetrysuggests that in general, this process involves onevery large amplification, leading to a proteinresembling the association of two monomers of thehomologous protein lacking the repeat. Such ampli-fication will then often be symmetrical just becausehomo-oligomers tend to be symmetrical. If historicalcontingency was the only reason leading to sym-metry, one would expect that such symmetry couldevolve and eventually disappear. Yet, that mighttake a long time, since protein structures evolveslowly. 
Such a process has been proposed for theevolution of single-stranded DNA binding (ssb)proteins. While in most bacteria ssb protein has oneOB-fold and folds as a tetramer, the ssb protein ofDeinococcus radiodurans is a dimer and has a nearlysymmetrical large repeat doubling its OB-folds.37This has been proposed to contribute to its extraor-dinary radio-resistance. Thus, in this case, aninternal amplification led to a nearly symmetricalrepeat that folds into a complex with half themonomers. The small deviation from a 180 rotationaxis might have resulted from the amplificationitself or from the relaxed structural evolution fromsymmetry allowed by the presence of the two OB-folds in the same peptide.We analysed the literature to understand theconsequences of the creation of symmetrical repeatsin the six protein families having homologues withand without repeats and for which quaternarystructure is known. Some studies on these proteinshave previously noted the existence of symmetricalrepeats and found or proposed functional conse-quences for them. For example, it has beensuggested that cytidine deaminase (1aln) arose byduplication.38 The ensuing evolution led to the lossof its zinc-coordinating residues and thereby itscatalytic activity at the C-terminal domain. Indeed,the tetrameric protein has four active sites (one permonomer) compared to two active sites for thedimeric protein. Interestingly, while we found theprotein containing the symmetrical repeat in 200genomes and the protein lacking the repeat in 118genomes, only 2 genomes contain both (Pseudoalter-omonas haloplanktis and Serratia proteamaculans). Thissuggests that the evolution of the proteins with andwithout the repeat has not obliterated their func-tional redundancy.Is there a specific advantage for a symmetricalrepeat over an equivalent homo-oligomer? Internalsymmetrical repeats have an element of symmetrythat leaves the rest of the protein free to evolve awayfrom the constraints of perfect symmetry. 
This slightasymmetry can be structural as in the abovemen-Symmetric Repeats and Oligomerstioned case of the D. radiodurans ssb protein, but itcan also be functional. For example, the internalamplification in HpcE (an isomerase, 1gtt) allowedtion led to the creation of a new catalytic capabilitythat allows the same protein to accomplish twoconsecutive steps in a pathway. There are otherroutes to create asymmetry in homo-oligomers, butthey involve full gene duplication followed bydivergence and adaptation.40 The relative frequencyof these different scenarios and how likely they areof producing a functionally relevant proteins isunclear. One may speculate on other advantages oflarger proteins with symmetrical repeats oversmaller oligomerising proteins: (1) Oligomers areunstable at low concentrations. (2) Misfolded homo-oligomers may result in protein aggregation,41which originates more than 30 different patholog-ical states in humans. Accordingly, homo-oligomersendure stronger selection against the propensity toaggregate than the other proteins.42 (4) Homo-oligomers are wholly symmetrical structures,2which could render them less robust to mutations.Inversely, homo-oligomers probably have severaladvantages over proteins with internal symmetricalrepeats, as they allow modularity and they may beeasier to create if indeed symmetrical repeats arisefrom proteins that are already homo-oligomers.Further work will be necessary to ascertain thesespeculations.Homo-oligomerisation and internal repeats mayalso have complementary roles. This could explainwhy we found homo-oligomers with monomerscontaining internal symmetrical repeats. While thecurrent structural PDB has been shown to stronglyunderrepresent proteins containing repeats,43 theincrease of available structural data will allow testingof the different possible evolutionary scenarios,leading to the selection of internal symmetricalamplifications or homo-oligomers. 
It will also allowquantification of the predictive potential of ourobservations. If a protein has a symmetrical repeat,it will be important to know the probability that ahomologue lacking the repeat folds symmetrically asa homo-oligomer. Conversely, our work shows thathomologues of homo-oligomers with large symmet-rical repeats will tend to have lower oligomerisationnumbers.Materials and MethodsIdentification of long repeatsWe used Swelfe22 to find internal repeats in proteinstructures of the Astral compendium.23 Parameters werethe evolution of a novel active site for a comple-mentary reaction.39 The protein consists of two verysimilar domains, one catalyzing a decarboxylationand the other an isomerization; these two reactionsare two consecutive steps in the breaking down ofhomoprotocatechuate, an aromatic compound. Theactive site is found at the same place in the two531adapted to find long repeats (length of repeat N50 angles, score N200, gap open penalty=70, gap extensionpenalty=30, relative RMSD b0.5). We kept only thesequences.ments were then inspected visually with SEAVIEW51 tothis manuscript. We thank Etienne Loire for theSymmetric Repeats and OligomersThree-dimensional structuresWe searched for similar elements of the structural repeatin the nonredundant data bank Cluster50 from the PDB46(less than 50% sequence identity) with Swelfe (with theoption for comparing two sequences or two structures).We did not use Astral in this analysis because it filtersredundant domains and thus automatically eliminates theproteins with varying numbers of repeats. Proteins withelements matching the structural repeats were thenselected according to their length: it must differ from thequery protein from at least 0.5 times the length of therepeat. 
Prospective proteins were then superimposed withthe initial proteins with Pymol47 to check that they containdifferent numbers of the repeat.Finding full amino acid sequences corresponding toPDB entriesThe amino acid sequence corresponding to the structuralrepeat was compared with all proteins in UniProtKB/TrEMBL48 (Supplementary Fig. 3). A first rapid compar-ison was made with BlastP49 (with default parameters) toretrieve candidate proteins from the data bank. Then weused Swelfe to find proteins containing at least one copyof the repeat. The length of the matched copies must be atleast three-fourths that of the reference copy. After that, anend-gap free global alignment (using the BLOSUM62matrix, gap opening 1.2, gap extension 0.8) was achievedbetween complete sequences with repeats and proteinshaving at least one copy of the repeat. We kept theproteins leading to alignments with N40% similarity. Forone-copy proteins, we removed proteins longer than theoriginal protein minus half of the length of the repeat. Wethen checked that positions corresponding to one copy ofthe repeat were aligned (less than 20% of gaps) and thatpositions corresponding to the other copy matched withgaps (more than 80% of gaps).To find homologous proteins with higher number ofhighest-scoring repeat in each structure. Out of the 188proteins containing such long repeats, we suppressed allwith more than 40% sequence similarity with otherproteins. This left 172 proteins for further analysis. UsingHMMER2, we identified Pfam domains in the proteins.44Symmetrical long repeatsTo assess the symmetry of the repeats, we computedthe angle allowing best superimposition of the two copiesof the repeat in the protein structure with the Zuker andSomorjai program.45 Repeats obtained by a rotation of anangle higher than 170 or lower than 170 correspond toa 2-fold symmetry. 
We considered an interval of 5/+5around 60, 90, 120 to compute the number ofproteins with repeats showing 6-, 4- and 3-fold symmetry,respectively.Looking for one-copy or n-copy proteinsWe searched for proteins with varying numbers ofrepeat copies both in structures and in amino acid532copies of the repeat, we kept proteins longer than the sumof the length of query protein and 0.5 times the repeatlength. Then, using Swelfe, we checked that these proteinsprogram to compute biases in GO terms.Supplementary DataSupplementary data associated with this articlecan be found, in the online version, at doi:10.1016/j.jmb.2009.09.031References1. Goodsell, D. S. & Olson, A. J. (2000). Structuralsymmetry and protein function. Annu. Rev. Biophys.Biomol. Struct. 29, 105153.2. Monod, J., Wyman, J. & Changeux, J. P. (1965). Onthe nature of allosteric transitions: a plausible model.J. Mol. Biol. 12, 88118.3. Blundell, T. L. & Srinivasan, N. (1996). Symmetry,stability, and dynamics of multidomain and multi-component protein systems. Proc. Natl Acad. Sci. USA,93, 1424314248.4. Wolynes, P. G. (1996). Symmetry and the energylandscapes of biomolecules. Proc. Natl Acad. Sci. USA,93, 1424914255.remove regions that were poorly aligned or exclusive totwo-copy repeats. The multiple alignments were thenused as input to PHYML52 to make the phylogenetic treesusing maximum likelihood with the WAG matrix and agamma correction for variable evolutionary rates. Robust-ness of the branches was tested with the aLRT test ofPHYML and with nonparametric bootstrap usingPuzzle.53 Trees were drawn using FigTree.AcknowledgementsA.-L.A. acknowledges support from the ConseilRgional d'Ile de France. 
This project was partlysupported by the PROTEUSproject fromANR-06-CIS(Calcul Intensif et Simulation) and ACI EVOLREP.We thank three anonymous reviewers for comments,criticisms and suggestions that significantly improvedcontain more copies of the repeat than the original protein.The length of the matched copies must be at least threequarters of that of the query repeat. When homologousproteins with the same number of copies of the repeatwere found, only one was kept.Phylogenetic analysisHomologous sequences with one or two copies of therepeat were searched with BLAST49 against 849 completeproteomes (57 archaea, 746 bacteria and 46 eukaryota).Hits with e-value b1e-10 were conserved. The length ofhomologous sequences differed by less than 20% from thatof the query sequence. The sets of homologues with one-and two-copy repeats were put together and aligned withMUSCLE with default parameters.50 The multiple align-http://tree.bio.ed.ac.uk/software/figtree/5. Andre, I., Strauss, C. E., Kaplan, D. B., Bradley, P. &Baker, D. (2008). Emergence of symmetry in homo-oligomeric biological assemblies. Proc. Natl Acad. Sci.USA, 105, 1614816152.6. Levy, E. D., Boeri Erba, E., Robinson, C. V. &Teichmann, S. A. (2008). Assembly reflects evolutionof protein complexes. Nature, 453, 12621265.7. Levy, E. D., Pereira-Leal, J. B., Chothia, C. &Teichmann, S. A. (2006). 3D complex: a structuralclassification of protein complexes. PLoS Comput. Biol.2, e155.8. Pereira-Leal, J. B., Levy, E.D.&Teichmann, S. A. (2006).The origins and evolution of functional modules:lessons from protein complexes. Philos. Trans. R. Soc.London, Ser. B, 361, 507517.9. Goodsell, D. S. & Olson, A. J. (1993). Soluble proteins:size, shape and function. Trends Biochem. Sci. 18,6568.10. Bucciantini, M., Giannoni, E., Chiti, F., Baroni, F.,Formigli, L., Zurdo, J. et al. (2002). Inherent toxicity ofaggregates implies a common mechanism for proteinmisfolding diseases. Nature, 416, 507511.11. Kirschner, M. & Gerhart, J. 
(1998). Evolvability. Proc. Natl Acad. Sci. USA, 95, 8420–8427.
12. Marcotte, E. M., Pellegrini, M., Yeates, T. O. & Eisenberg, D. (1999). A census of protein repeats. J. Mol. Biol. 293, 151–160.
13. Rocha, E. P. (2003). An appraisal of the potential for illegitimate recombination in bacterial genomes and its consequences: from duplications to genome reduction. Genome Res. 13, 1123–1132.
14. Bjorklund, A. K., Ekman, D. & Elofsson, A. (2006). Expansion of protein domain repeats. PLoS Comput. Biol. 2, e114.
15. Treangen, T. J., Abraham, A. L., Touchon, M. & Rocha, E. P. (2009). Genesis, effects and fates of repeats in prokaryotic genomes. FEMS Microbiol. Rev. 33, 539–571.
16. Lavorgna, G., Patthy, L. & Boncinelli, E. (2001). Were protein internal repeats formed by bricolage? Trends Genet. 17, 120–123.
17. Apic, G., Gough, J. & Teichmann, S. A. (2001). An insight into domain combinations. Bioinformatics, 17 (Suppl. 1), S83–S89.
18. Andrade, M. A., Perez-Iratxeta, C. & Ponting, C. P. (2001). Protein repeats: structures, functions, and evolution. J. Struct. Biol. 134, 117–131.
19. Pereira-Leal, J. B. & Teichmann, S. A. (2005). Novel specificities emerge by stepwise duplication of functional modules. Genome Res. 15, 552–559.
20. Bjorklund, A. K., Ekman, D., Light, S., Frey-Skott, J. & Elofsson, A. (2005). Domain rearrangements in protein evolution. J. Mol. Biol. 353, 911–923.
21. Pasek, S., Risler, J. L. & Brezellec, P. (2006). Gene fusion/fission is a major contributor to evolution of multi-domain bacterial proteins. Bioinformatics, 22, 1418–1423.
22. Abraham, A. L., Rocha, E. P. & Pothier, J. (2008). Swelfe: a detector of internal repeats in sequences and structures. Bioinformatics, 24, 1536–1537.
23. Chandonia, J. M., Hon, G., Walker, N. S., Lo Conte, L., Koehl, P., Levitt, M. & Brenner, S. E. (2004). The ASTRAL Compendium in 2004. Nucleic Acids Res. 32, D189–D192.
24. Betancourt, M. R. & Skolnick, J. (2001).
Universal similarity measure for comparing protein structures. Biopolymers, 59, 305–309.
25. Cheng, H., Kim, B. H. & Grishin, N. V. (2008). Discrimination between distant homologs and structural analogs: lessons from manually constructed, reliable data sets. J. Mol. Biol. 377, 1265–1278.
26. Wiener, M., Freymann, D., Ghosh, P. & Stroud, R. M. (1997). Crystal structure of colicin Ia. Nature, 385, 461–464.
27. Eswaramoorthy, S., Kumaran, D., Keller, J. & Swaminathan, S. (2004). Role of metals in the biological activity of Clostridium botulinum neurotoxins. Biochemistry, 43, 2209–2216.
28. Macedo, S., Romao, C. V., Mitchell, E., Matias, P. M., Liu, M. Y., Xavier, A. V. et al. (2003). The nature of the di-iron site in the bacterioferritin from Desulfovibrio desulfuricans. Nat. Struct. Biol. 10, 285–290.
29. Pickles, L. M., Roe, S. M., Hemingway, E. J., Stifani, S. & Pearl, L. H. (2002). Crystal structure of the C-terminal WD40 repeat domain of the human Groucho/TLE1 transcriptional corepressor. Structure, 10, 751–761.
30. Jing, H., Takagi, J., Liu, J. H., Lindgren, S., Zhang, R. G., Joachimiak, A. et al. (2002). Archaeal surface layer proteins contain beta propeller, PKD, and beta helix domains and are related to metazoan cell surface proteins. Structure, 10, 1453–1464.
31. Krieger, I., Kostyukova, A., Yamashita, A., Nitanai, Y. & Maeda, Y. (2002). Crystal structure of the C-terminal half of tropomodulin and structural basis of actin filament pointed-end capping. Biophys. J. 83, 2716–2725.
32. Michaely, P., Tomchick, D. R., Machius, M. & Anderson, R. G. (2002). Crystal structure of a 12 ANK repeat stack from human ankyrinR. EMBO J. 21, 6387–6396.
33. Ashburner, M., Ball, C. A., Blake, J. A., Botstein, D., Butler, H., Cherry, J. M. et al. (2000). Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat. Genet. 25, 25–29.
34. Loire, E., Praz, F., Higuet, D., Netter, P. & Achaz, G. (2009).
Hypermutability of genes in Homo sapiens due to the hosting of long mono-SSR. Mol. Biol. Evol. 26, 111–121.
35. Shimeld, S. M., Purkiss, A. G., Dirks, R. P., Bateman, O. A., Slingsby, C. & Lubsen, N. H. (2005). Urochordate betagamma-crystallin and the evolutionary origin of the vertebrate eye lens. Curr. Biol. 15, 1684–1689.
36. Chan, C., Paul, R., Samoray, D., Amiot, N. C., Giese, B., Jenal, U. & Schirmer, T. (2004). Structural basis of activity and allosteric control of diguanylate cyclase. Proc. Natl Acad. Sci. USA, 101, 17084–17089.
37. Bernstein, D. A., Eggington, J. M., Killoran, M. P., Misic, A. M., Cox, M. M. & Keck, J. L. (2004). Crystal structure of the Deinococcus radiodurans single-stranded DNA-binding protein suggests a mechanism for coping with DNA damage. Proc. Natl Acad. Sci. USA, 101, 8575–8580.
38. Xie, K., Sowden, M. P., Dance, G. S., Torelli, A. T., Smith, H. C. & Wedekind, J. E. (2004). The structure of a yeast RNA-editing deaminase provides insight into the fold and function of activation-induced deaminase and APOBEC-1. Proc. Natl Acad. Sci. USA, 101, 8114–8119.
39. Tame, J. R., Namba, K., Dodson, E. J. & Roper, D. I. (2002). The crystal structure of HpcE, a bifunctional decarboxylase/isomerase with a multifunctional fold. Biochemistry, 41, 2982–2989.
40. Pereira-Leal, J. B., Levy, E. D., Kamp, C. & Teichmann, S. A. (2007). Evolution of protein complexes by duplication of homomeric interactions. Genome Biol. 8, R51.
41. Ding, F., Dokholyan, N. V., Buldyrev, S. V., Stanley, H. E. & Shakhnovich, E. I. (2002). Molecular dynamics simulation of the SH3 domain aggregation suggests a generic amyloidogenesis mechanism. J. Mol. Biol. 324, 851–857.
42. Chen, Y. & Dokholyan, N. V. (2008). Natural selection against protein aggregation on self-interacting and essential proteins in yeast, fly, and worm. Mol. Biol. Evol. 25, 1530–1533.
43. Peng, K., Obradovic, Z. & Vucetic, S. (2004). Exploring bias in the Protein Data Bank using contrast classifiers. Pac. Symp. Biocomput., 435–446.
44. Finn, R.
D., Tate, J., Mistry, J., Coggill, P. C., Sammut, S. J., Hotz, H. R. et al. (2008). The Pfam protein families database. Nucleic Acids Res. 36, D281–288.
45. Zuker, M. & Somorjai, R. L. (1989). The alignment of protein structures in three dimensions. Bull. Math. Biol. 51, 55–78.
46. Berman, H. M., Westbrook, J., Feng, Z., Gilliland, G., Bhat, T. N., Weissig, H. et al. (2000). The Protein Data Bank. Nucleic Acids Res. 28, 235–242.
47. Ordog, R. (2008). PyDeT, a PyMOL plug-in for visualizing geometric concepts around proteins. Bioinformation, 2, 346–347.
48. Wu, C. H., Apweiler, R., Bairoch, A., Natale, D. A., Barker, W. C., Boeckmann, B. et al. (2006). The Universal Protein Resource (UniProt): an expanding universe of protein information. Nucleic Acids Res. 34, D187–D191.
49. Altschul, S. F., Madden, T. L., Schaffer, A. A., Zhang, J., Zhang, Z., Miller, W. & Lipman, D. J. (1997). Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25, 3389–3402.
50. Edgar, R. C. (2004). MUSCLE: a multiple sequence alignment method with reduced time and space complexity. BMC Bioinformatics, 5, 113.
51. Galtier, N., Gouy, M. & Gautier, C. (1996). SEAVIEW and PHYLO_WIN: two graphic tools for sequence alignment and molecular phylogeny. Comput. Appl. Biosci. 12, 543–548.
52. Guindon, S. & Gascuel, O. (2003). A simple, fast, and accurate algorithm to estimate large phylogenies by maximum likelihood. Syst. Biol. 52, 696–704.
53. Schmidt, H. A., Strimmer, K., Vingron, M. & von Haeseler, A. (2002). TREE-PUZZLE: maximum likelihood phylogenetic analysis using quartets and parallel computing.
Bioinformatics, 18, 502–504.