The near-symmetry of proteins

  • Published on
    18-Mar-2017


TITLE PAGE
The near-symmetry of proteins
Proteins symmetry
Maayan Bonjack and David Avnir*
Institute of Chemistry and The Lise Meitner Minerva Center for Computational Quantum Chemistry, The Hebrew University of Jerusalem, Jerusalem 91904, Israel
Key words: symmetry measure, CSM, rotational symmetry, symmetric proteins, symmetry maps, near-symmetry.
*Correspondence to: David Avnir. Institute of Chemistry, The Hebrew University, The Edmond J. Safra Campus - Givat Ram, Jerusalem, IL 9190401. E-mail: david.avnir@mail.huji.ac.il
Research Article
Proteins: Structure, Function and Bioinformatics
DOI 10.1002/prot.24706
This article has been accepted for publication and undergone full peer review but has not been through the copyediting, typesetting, pagination and proofreading process, which may lead to differences between this version and the Version of Record. Please cite this article as an Accepted Article, doi: 10.1002/prot.24706
2014 Wiley Periodicals, Inc.
Received: Aug 14, 2014; Revised: Oct 06, 2014; Accepted: Oct 18, 2014
ABSTRACT
The majority of protein oligomers form clusters which are nearly symmetric. Understanding that imperfection, its origins, and perhaps also its advantages requires converting the vague qualitative language currently used to describe near-symmetry into an accurate quantitative measure that allows one to answer questions such as: 'What is the degree of symmetry deviation of the protein?', 'How do these deviations compare within a family of proteins?', and so on. We developed quantitative methods to answer questions of this type. These methods are capable of analyzing the whole protein, its backbone or selected portions of it, down to the comparison of specific symmetry-related amino-acids, and of visualizing the various levels of symmetry deviation in the form of symmetry maps. We have applied these methods to an extensive list of homomers and heteromers and found that apparently no protein reaches perfect symmetry.
Strikingly, even homomeric protein clusters are never ideally symmetric. We also found that the main burden of symmetry distortion is on the amino-acids near the symmetry axis; that it is mainly the more hydrophilic amino-acids that take part in symmetry-distortive interactions; and more. The remarkable ability of heteromers to preserve near-symmetry, despite their different sequences, was also shown and analyzed. The comprehensive literature on the suggested advantages of symmetric oligomerization raises a yet-unsolved key question: If symmetry is so advantageous, why do proteins stop shy of perfect symmetry? Some tentative answers, to be tested in further studies, are suggested in a concluding outlook.
John Wiley & Sons, Inc. PROTEINS: Structure, Function, and Bioinformatics
INTRODUCTION
The majority of oligomeric proteins appear as clusters of subunits which are symmetric at least to some degree [1-7]. The symmetry point-groups of these clusters are chiral (because the building blocks are chiral [8]), and the most prevalent ones are the cyclic Cn and the dihedral Dn symmetries [1,5]. Protein symmetry research continues to be a central theme in structural biochemistry, and some recent examples are collected in Ref. 9. Understanding the abundance of this structural feature, symmetry, has received much attention, because at first glance it is not clear why this feature is important for protein function, that is, why symmetric clustering has evolved at all.
It has been proposed [1-5] that symmetric oligomerization contributes to an increase in protein stability, because interactions between subunits are maximized; that it helps to avoid unwanted excessive aggregation; that it enhances the coding efficiency, because less information is needed when the subunits are identical; that it reduces synthetic errors, because the larger the protein unit, the more prone it is to such errors; and that it contributes to more efficient cooperative regulation, because a protein with several identical subunits may display allosteric effects. Interestingly, despite these key advantages, the vast majority of protein clusters fall short of perfect symmetry [1-4,7,10]. This appears not only in clusters where the subunits are not identical (hetero-oligomers), but even in homo-oligomers. The settling for imperfect symmetry, even in clusters built from identical units, is a key phenomenon which, to the best of our knowledge, still awaits detailed quantification, exploration of the structural origins of this imperfection, and exploration of the possible advantages of not reaching the ideal symmetry. We believe that understanding these issues can gain from replacing the qualitative language currently in use (quasi-symmetry, pseudo-symmetry, approximate symmetry, near-symmetry, etc. [1-4,7,10]) by quantitative descriptors of symmetry that allow new types of questions, such as: how much of a given symmetry is there in a protein cluster?; by how much is one cluster more symmetric than another?; what is the distribution of symmetry values within a family of related proteins?; how did the symmetry content change with evolution?; which portions of a protein carry most of the burden of the symmetry deviation?; what is the distribution of symmetry distortion within the protein?; and so on.
The use of quantitative descriptors of symmetry (the Continuous Symmetry Measure, CSM) has already proven very useful in identifying and understanding a host of symmetry-related phenomena [11-14], and some encouraging indications of its usefulness in protein structure analyses exist. For instance, Keinan et al. [15] have shown a correlation between the inhibition activity of HIV protease and the changes that the near-C2 symmetry of the inhibitors undergoes upon complexation at the active site. Also of relevance for this introduction is the modeling work of Andre et al., which quantified the symmetry of various modeled assemblies of proteins [2]. For two decades the CSM quantitative symmetry analysis of proteins was held back by the magnitude of the problem: the large number of atoms, with all of their potential permutations (a key step in the calculations), drastically limited the ability to apply CSM analyses to proteins. The computational problem was recently solved [11], decreasing the dependence of the computation time on the number of atoms, N, from N! to N²; this has opened the gates for a detailed symmetry analysis of proteins. Here we describe the development of different variations on the basic tool to quantify and analyze the symmetry content of protein oligomers, tailored to the various analysis needs: symmetry analyses of the whole protein or of selected fractions of it, using either all atoms or only the backbone atoms, and the formation of symmetry maps that reveal at a glance the degree of deviation of various regions from perfect symmetry. Using these methods we show, on a quantitative level, that very often the burden of deviation from symmetry lies near the symmetry axis and in the border regions of the clustered oligomers, and propose why this should be so; we find that the hydrophilic amino-acids are more likely to carry conformational symmetry distortion; and more.
METHODS
The computational tools
The Continuous Symmetry Measure (CSM) [16,17] is a method for quantifying the degree of symmetry of a given object. According to the CSM approach, the G-symmetry point group content of an object is the minimal distance between two objects: an original structure, Q, and a G-symmetric structure, P, which consists of the same atoms and connectivity and is the closest to the original distorted structure. This minimal distance of the object's vertices from the desired G-symmetry is defined by
$$S(G) = \frac{1}{N d^{2}} \min \sum_{i=1}^{N} \left| Q_i - P_i \right|^{2} \qquad (1)$$
where $Q_i$ are the coordinates of the ith atom of the original studied molecule, $P_i$ are the coordinates of the ith atom of the nearest structure which has the desired symmetry, the denominator contains the root mean square size normalization factor of the original centered structure ($d = \sqrt{\tfrac{1}{N}\sum_{i=1}^{N} |Q_i|^{2}}$), and N is the number of analyzed atoms in the structure (see Supporting Information, Fig. S1, for additional details). It should be emphasized that this measure is inherently different from rmsd analyses of the degree of similarity: those tools do not evaluate the symmetry itself as a structural parameter, which is the key issue of this report. The range of the symmetry scale is 0-1, and it is expanded by a factor of 100 for convenience (0-100). If a structure is of perfect G-symmetry, then S(G)=0, and as the structure distorts from the perfect symmetry, S(G) increases. The maximal value of 100 is obtained when the nearest G-symmetric structure reduces to a single point (for instance, when one wishes to find the degree of inversion of HCl). When G is the symmetry point group Cn, the maximal possible value is 50, because the nearest structure collapses to a line in space (for instance, when one wishes to find the degree of pentagonality of a tetragon).
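To make the distance function concrete, here is a minimal Python sketch of S(Cn) for a deliberately simplified case. It assumes that the rotation axis is the z axis through the centroid and that the permutation of symmetry-related atoms is already known; the actual CSM procedure searches for both, so this sketch only gives an upper bound on the true value. The function name `csm_cn_fixed` and its arguments are illustrative and are not part of the authors' software. For a fixed axis and permutation, the nearest symmetric structure is obtained by averaging each atom with its symmetry partners rotated back onto it.

```python
import math


def csm_cn_fixed(points, perm, n):
    """S(Cn) on the 0-100 scale for a FIXED z axis and a FIXED permutation.

    points: list of (x, y, z) tuples.
    perm:   perm[i] is the index of the atom onto which atom i is mapped by
            one rotation of 2*pi/n; applying perm n times must give identity.
    Assumption: no search over axis direction or permutation is performed.
    """
    count = len(points)
    # Center the structure on its centroid (assumed to lie on the axis).
    cx, cy, cz = (sum(p[k] for p in points) / count for k in range(3))
    q = [(x - cx, y - cy, z - cz) for x, y, z in points]

    numerator = 0.0
    for i in range(count):
        # Average atom i with its partners, each rotated back by k steps.
        px = py = pz = 0.0
        j = i
        for k in range(n):
            angle = -2.0 * math.pi * k / n
            x, y, z = q[j]
            px += x * math.cos(angle) - y * math.sin(angle)
            py += x * math.sin(angle) + y * math.cos(angle)
            pz += z
            j = perm[j]
        px, py, pz = px / n, py / n, pz / n
        numerator += (q[i][0] - px) ** 2 + (q[i][1] - py) ** 2 + (q[i][2] - pz) ** 2

    # Sum of squared distances from the centroid equals N * d**2 in Eq. (1).
    denominator = sum(x * x + y * y + z * z for x, y, z in q)
    return 100.0 * numerator / denominator


square = [(1, 0, 0), (0, 1, 0), (-1, 0, 0), (0, -1, 0)]
distorted = [(1.2, 0, 0), (0, 1, 0), (-1, 0, 0), (0, -1, 0)]
print(csm_cn_fixed(square, [1, 2, 3, 0], 4))     # essentially zero
print(csm_cn_fixed(distorted, [1, 2, 3, 0], 4))  # small positive value
```

The perfect square returns a value that is zero up to rounding, and stretching one vertex gives a small positive value, mirroring the behavior of S(G) described in the text.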
S(G) is a global parameter, and therefore allows the comparison of various structures and various symmetries on the same scale. Eq. (1) is a special distance function in that the target structure is not known but is searched for. This involves several minimizations which are carried out both analytically and numerically (see Refs. 16,18,19 for details). As mentioned above, the most time-consuming minimization is over the possible permutations between the atoms of the studied structure and the atoms of the target symmetric structure (which is a mathematical construct, not necessarily a possible molecule), in search of the optimal permutation, namely the one that minimizes S(G). Ref. 11 describes in detail how this problem is solved, resulting in calculation times which scale reasonably, with N². Typically, computing the S(G) value of a protein of 70 kDa takes less than five minutes on an Intel Core i5-3470 @ 3.20 GHz processor. The majority of our analyses concentrate on the most abundant oligomeric near-symmetry, namely Cn. A few examples use the less abundant point-group symmetry Dn, using a computational tool recently developed for that symmetry [20].
The analyzed proteins data
The selection of proteins with near Cn- or Dn-symmetries for analysis was based on a direct literature search. The selection criterion was the explicit qualitative description of symmetry or of near-symmetry, approximate symmetry, etc. in manuscripts [1,3,4,10,21-24] and in the proteins' original crystallographic reports. The homodimers which are mentioned in the last section of the "Results and Discussion" were selected from the database "3D Complex: a structural classification of protein complexes" [25], and here the selection criterion was the mentioning of C2 symmetry. The coordinates of the analyzed proteins (Q in Eq. (1)) were taken from the crystallographic Protein Data Bank (PDB) [26].
We did not use any data which was derived by assuming complete symmetry (such structures have, by definition, S(G)=0); the only crystallographic asymmetric units taken are those that contain the full oligomer. Water molecules were removed, as well as non-protein components such as phosphate, magnesium or glycerol. Cofactors were included if their number reflected the near-cyclic point group of the protein. The number of atoms in each of the subunits might not be the same, either because the protein is a hetero-oligomer, or because the crystallographic data is missing information. Computational methods for addressing these situations are described below. Regarding the quality of the analyzed data, we followed the recommendations of Wlodawer et al. [27] in selecting most of the structures analyzed below; that is, we chose structures with a resolution better than 3.0 Å and with an R factor of less than 0.25. In a few cases we used, for explanatory purposes, structures of lower resolution if the specific structure was mentioned as symmetrical in the original report. We estimated the error of the CSM values by using the crystallographic B factor from the given crystallographic data of the structure. We translated the B factor into the U factor and checked how U, as well as the number of collected reflections, affects the confidence in the coordinates of the structure. The rms of the errors in the coordinates was then translated into a statistical estimate of the error in the symmetry measure, which revealed the error associated with determining the degree of symmetry from X-ray data (see Ref. 28 for full equations and more details). Since, in general, these estimates revealed an insignificant error, CSM values are reported here without it.
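The first step of this error estimate, translating B into U, follows the standard isotropic relation B = 8π²U, where U is the mean-square atomic displacement in Å². The short sketch below shows only that conversion; the helper names are illustrative, and the subsequent propagation of the coordinate error into a CSM error (Ref. 28) is not reproduced here.

```python
import math


def b_to_u(b_factor):
    """Convert an isotropic B factor (in A^2) to U (in A^2) via B = 8*pi^2*U."""
    return b_factor / (8.0 * math.pi ** 2)


def rms_displacement(b_factor):
    """Root-mean-square displacement (in A) implied by an isotropic B factor."""
    return math.sqrt(b_to_u(b_factor))


# Illustrative input only: a B factor of 20 A^2.
print(round(b_to_u(20.0), 4), round(rms_displacement(20.0), 3))
```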
RESULTS AND DISCUSSION
Developing various options for the near-symmetry analysis of proteins
The symmetry analysis of proteins, homo- as well as hetero-oligomers, can be carried out at various degrees of resolution, depending on the specific question at hand: it can be carried out with either the full set of atoms or a sub-set of it, or with fragments of the protein. Consequently, our first task has been to develop computational options for four main categories, as follows:
a. Backbone symmetry analysis of the whole protein oligomer: This analysis takes into account only the coordinates of atoms within the polypeptide backbone, without the residues (Fig. 1b,c). The method gives the user a general sense of the symmetry of the protein; it is analogous to the various basic cartoon presentations of proteins. This method is particularly relevant for heteromeric proteins, because of the remarkable property that the main structural features are preserved despite differences in sequence [29], including the symmetry deviation level. In fact, the symmetry in such heteromeric proteins may reflect the existence of a homomeric, symmetric ancestor oligomer which later specialized by becoming heteromeric [21]. By focusing on the polypeptide backbone one can perform comparative analyses across the board and include both homo- and hetero-oligomers. Because of its generality we recommend this computational tool at least as a first-approach step.
b. All-atoms symmetry analysis of whole protein oligomers: A more detailed symmetry analysis includes the whole structure of the protein, that is, the backbone and residues (usually without the hydrogens, Fig. 1a).
This higher-resolution full analysis may be applied only to homomeric proteins, because in this case the identity of the amino acids is an essential part of the analyzed information (applying this analysis to heteromeric proteins, where the units are not identical, would lead to unreasonable permutations and thus to extremely high CSM values).
c. Symmetry analysis of fragments: It is quite common in protein structural analyses to focus on selected fragments within the full set of data. Furthermore, it is obvious that CSM values of the whole protein hide more specific information on the origin of the symmetry deviation of the whole. For instance, it might be illuminating to know which regions in the protein carry the burden of the deviation, and which are barely affected by it. This analysis can be done with the two options described above: the fragment can be of a backbone, or of a full set of atoms. For instance, one could compare the backbone symmetry of fragments of a homomeric oligomer with fragments of an analogous heteromeric oligomer. This option of local symmetry analysis also opens the possibility of analyzing quasi-symmetry-related portions within monomeric proteins or within a single subunit of an oligomer. In fact, these fragments can be as small as one may wish, down to symmetry-matched amino acids, specific carbonyl groups, and even symmetry-equivalent atoms. For instance, in a near-C3 trimeric oligomer one could choose a specific carbonyl-bearing carbon and its two closest equivalent carbons obtained by that operation, and compute S(C3) for these three atoms.
d. Symmetry maps: Continuing with the last example, one can in fact repeat this procedure for another set of three atoms, and so on, for a set of atoms as large as one wishes to have.
For all practical purposes, there is no need to go to resolutions as high as single atoms: we found that the relevant minimal useful unit is, as expected, the amino-acid. The collection of all of the local CSM values, displayed with a color code in space, then provides a symmetry map of the protein or of portions of it (see Fig. 2). This analysis provides a telltale visualization, which gives at a glance the relative deviations from symmetry within the oligomer structure. The construction of symmetry maps is carried out as follows: First, the protein is divided into sets of symmetry-related amino-acids; each set consists of one amino-acid from each subunit. When the division is completed, the number of sets is identical to the number of amino-acids in the subunit. Second, a CSM calculation is carried out for each set. A graphical decision is then made as to the color code, that is, which amino acids are considered as having very high, high, medium, low and very low CSM values.
Backbone symmetry analysis of the whole protein oligomer
a. Homomers: We begin by applying the backbone symmetry analysis of the whole protein to homomeric proteins with near Cn or Dn symmetry. The conclusion to be reached in this section is not trivial and quite surprising: all of the homo-oligomeric proteins we analyzed, even those which are declared "symmetric", have S(Cn)>0 values (beyond the error range; see Section "The analyzed proteins data"); that is, despite the identical composition of the units, the proteins "select" to stop short of perfect symmetry. The symmetry deviations are small, but enough to let the literature use qualitative descriptors such as "pseudo" and "approximate". Consider for instance the structure of human hemoglobin, which is composed of four subunits, arranged as a dimer of αβ subunits (namely α2β2), and which is commonly described as possessing "pseudo-C2 symmetry" [30] (Fig. 3e).
The analyzed protein backbone in its T (deoxy) quaternary structure has a CSM value of S(C2)=0.004. This is a small symmetry distortion (in the following sections we encounter much larger values), but it definitely justifies the qualitative "pseudo" descriptor used in the literature. Other homomeric proteins were analyzed similarly, and the results are collected in Table I and in Fig. 3 (see Supporting Information, Table S1, for additional analyzed proteins). It is seen that none of the proteins is perfectly symmetric. The various S(Cn) values point to an optimal symmetry deviation at which each protein settles, and it is seen that these optima are not the same. We recall that the CSM scale is unified and comparable for all symmetries. That is, any two proteins, regardless of their symmetry point group, can be compared according to their CSM values. For instance, according to Table I, hemoglobin (S(C2)=0.004) is more symmetric than purine nucleoside phosphorylase (S(C3)=0.005). We found the same universality for near-Dn-symmetric proteins (Table I and Fig. 3). In fact, Dn analysis also allows one to compare its two chiral rotational sub-groups, Cn and the perpendicular C2. For each of those proteins we present the CSM values for Dn, Cn and C2 (Table I and Fig. 3). It is seen that the approximate symmetry shows up also in the two sub-group symmetries. The relative magnitude of the CSM values is in agreement with the expected relation S(Dn) ≥ S(Cn) ≥ S(C2). This is so because it follows the order of the number of elements in each of these symmetry groups: the more elements, the more distant is the nearest fully symmetric structure.
b. Heteromers: We move on with the backbone analysis to heteromeric proteins. Here there are basically two general cases:
1. Same length, different sequence: All of the heteromeric units are of the same backbone length and differ only in their sequences.
The analysis of this case is carried out exactly as was done with the backbone of the homomeric proteins, and examples are provided in Fig. 4 and Table II. For instance, the heterodimer of disintegrin (a protein from the venom of Echis carinatus that binds to integrin receptors) is composed of two monomers which have the same sequence length, and the sequence identity is 84% (Fig. 4a) [31]. The crystal structure is described as having a high level of pseudosymmetry, which translates quantitatively to a CSM value of S(C2)=0.03 for the backbone.
2. Different length units: The second case is of proteins in which some or all subunits have different lengths. Analysis of this case requires additional data processing. Consider, for instance, the 20S proteasome from the archaebacterium Thermoplasma acidophilum (which is part of a proteasome complex that fills an important role in the degradation of unneeded or damaged proteins) [32]. It is composed of four rings, one on top of the other (Fig. 5a), each of which is built from seven subunits (Fig. 5b). As seen in Fig. 5b, the near-C7 symmetry is quite clear, and this is so although each subunit has a unique amino-acid sequence and length (the lengths of the seven subunits are 243, 250, 244, 241, 242, 233 and 244 amino-acids). The question then is how to handle this situation, which is quite common with heteromeric proteins. If no correction is made for the different lengths, the CSM value of the structure is extremely high, S(C7)=49 (because of the lack of proper permutations between the atoms of the oligomer units). That value basically means that there is no C7-symmetry in the 20S proteasome whatsoever, which defies what the eye sees. What is it then that the eye sees? That the bulk of the backbone atoms are indeed nearly C7-symmetric.
Translating this into practice requires either ignoring the extra atoms that have no equivalent symmetry-related atoms (the CSM is then S(C7)=0.2, Fig. 5c); or, following the idea of Frid et al. [33] for analyzing the symmetry of branched structures, adding ghost atoms at the missing parts of the shorter subunits (at the C-terminus); in this case S(C7)=0.5 (Fig. 5d). Each of these two options provides proper permutations and CSM values which properly translate what the eye sees. While the first method ignores atoms, and thus reflects the symmetric core of the oligomer, the second method takes the intrinsic asymmetry of the whole protein structure into consideration; we thus recommend using the second method.
All-atoms symmetry analyses of whole homomers
Contrary to the backbone analysis, all-atoms analyses take into account many more conformational distortions, due to the inclusion of the amino-acid residues. This residue conformational freedom is expected to increase the CSM values compared to the corresponding backbone CSM values. Indeed, returning to hemoglobin, for which we obtained S(C2)=0.004 for the backbone, the symmetry measure increases 2.5-fold to S(C2)=0.01 (Table I) when the full protein is analyzed. As seen in Table I, this trend is true for all of the analyzed proteins. One should note that the larger CSM values are not the result of the larger number of atoms considered, because the measure is normalized according to the number of atoms and size. Rather, the magnitude of the difference between the CSM values of the full and the backbone analyses shows that the residues contribute much more to the overall distortion than the backbone does.
Symmetry analyses of fragments
Quite often, one may wish to concentrate on specific portions of the protein (portions which are symmetry-related in each of the homo- or hetero-oligomer units, or even in a monomeric protein structure), rather than on the whole. Such local focus (for instance, on α-helices, border zones between the oligomer units or specific domains in the structure) provides higher-resolution information compared to the previous sections, because these portions are then not masked by portions which are of less relevance to the question at hand, or which carry only little of the symmetry distortion. Consider, for example, the homodimer of human 1 alcohol dehydrogenase (a protein that participates in the metabolism of ethanol and other substrates, Fig. 6a) [30]. The protein consists of α-helices and β-sheets, and the question we ask is, where are the higher symmetry distortions? Calculations of S(C2) of the β-sheets and of the α-helices separately provide values of S(C2)=0.002 and S(C2)=0.003, respectively. That is, the α-helices tend to be more distorted than the β-sheets. Yet, since the global CSM value is much higher (S(C2)=0.01), the burden of the protein distortion must be elsewhere, in less rigid zones. Indeed, the CSM value of the segments connecting the rigid secondary structures was calculated and found to be relatively high, S(C2)=0.02. This is general: we observed in many other proteins as well that rigid segments have lower CSM values compared to flexible ones. As mentioned above, local symmetry analysis also opens the possibility of analyzing quasi-symmetry-related portions within a monomeric protein or within a subunit of an oligomer. An example of this case is the protein -crystallin (an eye lens protein), which is referred to as a highly symmetrical structure [34], despite the different sequences of the components.
It consists of two domains; each of them contains two 'Greek key' motifs (Fig. 6b). The two domains are related by an approximate 2-fold symmetry, as described in the literature [35]. However, the domains differ in their sequence and length. Therefore, addition of "ghost" atoms is required, as described in Section "Backbone symmetry analysis of the whole protein oligomer". Calculation of the CSM value was carried out only on the relevant motifs, and the linking segments, which add a lot to the symmetry deviation, as we saw above, were ignored. The resulting CSM value is S(C2)=0.20, and it indicates that the two domains are largely related by C2 symmetry, although their sequences and lengths differ.
Symmetry maps
a. Symmetry maps of oligomers: The symmetry map is a high-resolution local analysis which determines the symmetry content of symmetry-matched amino acids within the oligomer. The analysis produces symmetry maps of the kind shown in Fig. 2 for the near-C2-symmetric dimer of HIV-1 protease [30] (Fig. 3m, an aspartic protease; we recall that the near-C2 symmetry of the dimer gave rise to the development of C2-related anti-HIV drugs [36]). The CSM value of the protein in the full analysis was found to be (see above) S(C2)=0.13; let us now see how this translates to the symmetry map of Fig. 2. The literature points specifically at two amino-acids, in positions 50 and 51 (Ile50 and Gly51 in each subunit), as a major source of conformational symmetry distortion. Indeed, the CSM value of Ile50 is extremely high, S(C2)=4.64. The CSM value of Gly51 is lower, S(C2)=0.21, but it is nevertheless still higher than the CSM value of the full protein. Applying a four-color-coded map not only confirms that those amino-acids break the symmetry of the protein, but also reveals other amino acids which cause symmetry deviations.
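The map-construction steps given earlier (group one amino-acid per subunit into a symmetry-related set, compute a local CSM for each set, then assign a color) can be sketched as follows. This is only an illustration of the bookkeeping: residues are matched across subunits by residue number, which is an assumption that holds for simple homomers, and the color cutoffs are invented for the example, since the cutoffs actually used for Fig. 2 are not stated in the text. The local CSM values are taken as given rather than computed.

```python
from collections import defaultdict

# Hypothetical cutoffs, chosen only for illustration.
THRESHOLDS = ((1.0, "red"), (0.5, "orange"), (0.1, "yellow"))


def symmetry_related_sets(residues, n_subunits):
    """Group residues into sets with one member from each subunit.

    residues: iterable of (chain_id, residue_number, residue_name).
    Positions that are not present in every subunit are dropped.
    """
    sets = defaultdict(dict)
    for chain_id, number, name in residues:
        sets[number][chain_id] = name
    return {num: by_chain for num, by_chain in sets.items()
            if len(by_chain) == n_subunits}


def color_for(csm_value, thresholds=THRESHOLDS):
    """Map a local CSM value to a color; values below all cutoffs are green."""
    for limit, color in thresholds:
        if csm_value >= limit:
            return color
    return "green"


def symmetry_map(local_csm):
    """local_csm: {residue_number: CSM of that symmetry-related set}."""
    return {number: color_for(value) for number, value in local_csm.items()}


residues = [("A", 50, "ILE"), ("B", 50, "ILE"),
            ("A", 51, "GLY"), ("B", 51, "GLY"),
            ("A", 99, "PHE")]  # position 99 has no partner in this toy input
print(sorted(symmetry_related_sets(residues, 2)))
# Local values for Ile50 and Gly51 of HIV-1 protease as quoted in the text.
print(symmetry_map({50: 4.64, 51: 0.21}))
```

With these illustrative cutoffs Ile50 falls into the most distorted class and Gly51 into an intermediate one; different cutoffs would shift the boundaries but not the ordering.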
The most distorted amino acids (in this map, 115 into one, and along the near Cn symmetry axis (in dimers, those two elements are of the same region in the protein). This again is general: we shall see it repeatedly in more case analyses (Fig. 7). The area where the components of the oligomer interact is indeed expected to be active in symmetry distortion. Not only is there no reason to expect identical distortions when two similar subunits interact; the contrary is true: the conformational changes that lead to the optimal interaction would often require the two components to adjust to each other, which would mean that each subunit acquires a different conformer. For example, the loop in positions 49-51 (bottom of Fig. 2) is very distorted (colored red, orange and yellow), and this is due to the non-equivalent conformers of Ile50 and Gly51 that are needed in order to form a direct hydrogen bond between them (dashed line, Fig. 2). A mechanical approach to protein structure would suggest that the distortions at the zone where the units interact would also affect more remote zones of the protein. This is also seen in Fig. 2, as regions that are distant from the interacting zone are distorted as well, although to a lesser degree (colored green). Additional symmetry maps are shown in Fig. 7 [37]. In most symmetry maps there is not only a quantitative and visual confirmation of the original qualitative statements in the literature about symmetry-related distorted amino acids, but also an indication of other distorted regions that were not mentioned in the original reports, as seen above for HIV-1 protease. The various symmetry maps show some main trends: First, the most distorted amino-acids, which are indicated by the red- and orange-colored areas, are located near the rotation axis and between the touching zones of the subunits, as was mentioned above. Second, symmetry-related amino acids, located in any type of secondary structure, rigid as well as flexible (i.e.
-helixes or loops), can diverge from perfect symmetry relation (this is in agreement with the finding mentioned above that rigid regions, as a whole, tend to be more symmetric than flexible ones). Third, the less distorted green-colored regions may be present Page 15 of 38John Wiley & Sons, Inc.PROTEINS: Structure, Function, and Bioinformatics16 all over the protein structure, including the protein-solvent interface. Fourth, it was found that there is no decaying gradient in symmetry measure from the most symmetry distorted regions into their near surroundings. However, as was mentioned above, the distortions do have the potential to affect distant regions. b. Comparative symmetry maps of different proteins: Another example of the usefulness of symmetry maps analysis of proteins is the comparison of the symmetry of different proteins which have only partial structure and sequence similarity. Our example is the two homodimeric-proteins, 11 -hydroxysteroid dehydrogenase Type 1 (11 -HSD1) of a guinea pig and of a murine38, which have a sequence homology of 70% and possess a similar structure; both are referred to in the literature as having a pseudo-2-fold axis (Fig. 8). This "pseudo-symmetry" translates into global CSM value of S(C2)=0.019 and S(C2)=0.024 for the murine and the guinea pig, respectively, which in fact is a very interesting result in itself: Despite the fact that the sequential similarity is only 70%, the two proteins are with very similar CSM values, which might show their structural and functional similarity. This agrees with the identified evolutionary trend29 by which keeping the trait of a specific structure along evolution is more important than keeping the sequence, signaling that activity is primarily dependent on the quaternary structure, and indicating the ability to use different building-blocks (the amino-acids) to form similar structures. Let us now look at these proteins at a higher resolution (Fig. 
8), to find local regions with much higher S(C2) values (up to 2.8). An interesting observation made possible by this type of analysis is that the two proteins share the same amino acids with high CSM values, each in its C2-symmetry-related subunits. Out of the 20 most distorted amino acids in each protein, 7 are shared (Glu26, His134, Arg137, Ile203, Glu239, Asn270, Lys274). Note that in this case too, significant parts of the distortion are located near the C2 symmetry axis.

Which amino-acids carry the burden of symmetry distortion?

We now use the symmetry maps to answer this key question, and for that purpose we concentrated on homodimeric proteins (which, as mentioned above, are classified as having C2 symmetry in the database of Ref. 25). We applied the symmetry analysis of fragments to each of these proteins, choosing the individual amino acids as the minimal fragments. That is, each CSM calculation was carried out on a pair of symmetry-related amino acids, one in each of the two subunits (as in the symmetry maps). We then ranked the pairs by their distortion and focused on the 15 most symmetry-distorted C2-related pairs of amino acids, in search of what might characterize them. After testing various possible parameters (such as polarity, acidity, etc.), a very clear link was found with the hydrophilicity/hydrophobicity balance of the amino acids (as defined in Ref. 39): the majority of the 15 most distorted residues are hydrophilic, as can be seen in Table III and in Fig. 9 (Ref. 40). This finding means, perhaps as expected, that a main source of structural readjustment is the multiple polar, asymmetric hydrogen-bond interactions between amino acids or between an amino acid and water molecules.
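The per-pair analysis described above can be illustrated with a minimal sketch. It is not the CSM code used in this work: it assumes that the C2 axis is already known and lies along z through the origin, whereas the actual measure also optimizes the axis. The function names, the factor of 100 and the normalization by the spread about the pair's centroid are illustrative assumptions, and the toy coordinates are invented (they do not come from any PDB entry).

```python
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float, float]
ResiduePair = Tuple[Sequence[Point], Sequence[Point]]


def rotate_c2_about_z(p: Point) -> Point:
    """Rotate a point by 180 degrees about the z axis (the assumed C2 axis)."""
    x, y, z = p
    return (-x, -y, z)


def sq_dist(p: Sequence[float], q: Sequence[float]) -> float:
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def pair_c2_deviation(res_a: Sequence[Point], res_b: Sequence[Point]) -> float:
    """Normalized squared deviation of a residue pair from C2 symmetry.

    res_a[i] and res_b[i] must be the same atom in the two subunits.
    With the axis fixed, the closest C2-symmetric arrangement is the average
    of the pair and its rotated, subunit-swapped image, so the summed squared
    displacement equals 0.5 * sum |a_i - R(b_i)|^2.
    """
    if len(res_a) != len(res_b) or not res_a:
        raise ValueError("both residues need the same, non-zero atom count")
    atoms = list(res_a) + list(res_b)
    n = len(atoms)
    centroid = tuple(sum(p[i] for p in atoms) / n for i in range(3))
    spread = sum(sq_dist(p, centroid) for p in atoms)
    if spread == 0.0:
        raise ValueError("degenerate pair: all atoms coincide")
    displacement = 0.5 * sum(
        sq_dist(a, rotate_c2_about_z(b)) for a, b in zip(res_a, res_b)
    )
    # Scale factor of 100 is an assumption, not taken from the text.
    return 100.0 * displacement / spread


def most_distorted(pairs: Dict[str, ResiduePair], top: int = 15) -> List[Tuple[str, float]]:
    """Return the `top` pairs with the largest deviation, most distorted first."""
    scores = [(name, pair_c2_deviation(a, b)) for name, (a, b) in pairs.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)[:top]


# Invented toy data: (atoms in subunit A, matching atoms in subunit B).
toy_pairs: Dict[str, ResiduePair] = {
    # Exact C2 images of each other about z.
    "pair_1": ([(1.0, 0.0, 0.0), (2.0, 1.0, 0.5)],
               [(-1.0, 0.0, 0.0), (-2.0, -1.0, 0.5)]),
    # One atom shifted slightly off its symmetric position.
    "pair_2": ([(3.0, 0.0, 1.0), (4.0, 1.0, 1.0)],
               [(-3.0, 0.0, 1.0), (-4.0, -1.5, 1.0)]),
    # One atom shifted strongly along z.
    "pair_3": ([(5.0, 0.0, 0.0), (6.0, 1.0, 0.0)],
               [(-5.0, 0.0, 0.0), (-6.0, -1.0, 2.0)]),
}

for name, score in most_distorted(toy_pairs, top=2):
    print(name, round(score, 3))
```

Because the axis is held fixed, the nearest symmetric arrangement has a closed form and no optimization loop is needed; a faithful implementation would additionally search over the axis orientation.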
Such hydrogen-bond interactions can force conformational changes of the amino acids toward a preferred position, as already discussed above for the HIV protease. Hydrophobic residues, on the other hand, are engaged in weaker dispersive interactions, which are less demanding and therefore less distortive. This observation leads us to our final topic of conceptual discussion.

CONCLUSIONS

The shift from qualitative to quantitative descriptions of the symmetry deviations of proteins, as proposed in this report, proved to be an eye-opener: not only does it translate the descriptive language into a meaningful scale, it also allows one to obtain a clear picture of the molecular origin of the asymmetry. Thus, we found that segments located near the symmetry axis and in between the subunits consistently carry part of the burden of symmetry deviations; and we found that hydrophilic amino acids tend to carry the burden of asymmetry by adopting conformers different from those of their symmetry-related counterparts in the protein structure. The strength, as well as the weakness, of the method is that it is a thermodynamic measure: one cannot go back from it to specific structural features, but it does combine such various features into one value.

Such observations bring us back to the question at the opening of the report: If symmetry is so advantageous15, why do proteins stop shy of it? We propose the following list of reasons for this phenomenon, which is quite universal in the world of proteins. All of the following must be tested in future studies and should be regarded as a starting point for discussion. The first important parameter is the free energy, that is, the enthalpic and entropic parameters.
Enthalpy changes must follow the interactions and conformational readjustments that optimize the interfacial contact area of the subunits: insisting on perfect symmetry would not minimize the protein's enthalpy, and thus would not minimize its free energy. We also propose that entropy has a great influence on the symmetry distortion of the protein: a fully symmetric oligomer structure has a certain number of possible microscopic conformations; as the protein deviates from its perfectly symmetric structure, that number increases sharply. That is, the entropy content rises, probably quite steeply, as the protein settles for a symmetry that is less than perfect.

Second, we consider dynamics as another cause of asymmetry: proteins are not static structures but are in constant motion, and thus exist as ensembles of conformations. As dynamics moves any symmetric object away from symmetry (for most of its conformers), the protein necessarily distorts from perfect symmetry. With the CSM tool, a distortion is always a non-negative value, S(G) ≥ 0, namely the distance from perfect symmetry. It follows that the average CSM value of the protein's conformers is always S(G) ≥ 0 as well, because an average of non-negative values is itself non-negative, and it is strictly positive as soon as any conformer is distorted. It is a delicate point that while the geometric average of distorted structures can be perfectly symmetric, the average symmetry measure is still non-zero. The explanation: the average geometry is the most symmetric structure because the directionalities of the fluctuations tend to compensate each other (they are vectorial entities and can have negative values); symmetry deviation values, on the other hand, are never negative.

Last but not least, one must take into account the surroundings: proteins are always in an asymmetric environment, certainly in the cell, but even in the crystalline state.
These asymmetric surroundings do not induce asymmetry in the protein structure, but stabilize it: the hydration shell stabilizes the structure mainly by forming hydrogen-bond interactions. As discussed before, hydrogen-bond interactions encourage asymmetry. We hope this report will stimulate further exploration of the effects of these parameters and of their relative weight in determining protein symmetry and symmetry-related functionality; research in this direction is in progress.

ACKNOWLEDGEMENTS

We thank Dr. Chaim Dryzun for useful advice and Amir Zait for developing the computational tools and for assistance with them.

REFERENCES

1. Goodsell DS, Olson AJ. Structural symmetry and protein function. Annu Rev Biophys Biomol Struct 2000;29:105-153.
2. André I, Strauss CEM, Kaplan DB, Bradley P, Baker D. Emergence of symmetry in homooligomeric biological assemblies. Proc Natl Acad Sci USA 2008;105:16148-16152.
3. Blundell TL, Srinivasan N. Symmetry, stability, and dynamics of multidomain and multicomponent protein systems. Proc Natl Acad Sci USA 1996;93:14243-14248.
4. Kojić-Prodić B, Štefanić Z. Symmetry versus Asymmetry in the Molecules of Life: Homomeric Protein Assemblies. Symmetry 2010;2:884-906.
5. Berchanski A, Segal D, Eisenstein M. Modeling oligomers with Cn or Dn symmetry: application to CAPRI target 10. Proteins 2005;60:202-206.
6. Taylor WR, May ACW, Brown NP, Aszódi A. Protein structure: geometry, topology and classification. Rep Prog Phys 2001;64:517-590.
7. Taylor WR, Aszodi A. Protein geometry, classification, topology and symmetry: A computational analysis of structure (Series in Biophysics). Taylor & Francis; 2004.
8. Alvarez S, Serratosa F. Symmetry guidelines for the design of convergent syntheses. On narcissistic coupling and la coupe du roi. J Am Chem Soc 1992;114:2623-2630.
9. (a) Abad-Zapatero C.
Notes of a protein crystallographer: the beauty of rose windows and the different meanings of symmetry. Acta Crystallogr D Biol Crystallogr 2014;70:907-911.
(b) Schulze B, Sljoka A, Whiteley W. How does symmetry impact the flexibility of proteins? Philos Trans A Math Phys Eng Sci 2014;372:20120041.
(c) Reisinger B, Kuzmanovic N, Löffler P, Merkl R, König B, Sterner R. Exploiting protein symmetry to design light-controllable enzyme inhibitors. Angew Chem Int Ed Engl 2014;53:595-598.
(d) Williams LS, Levdikov VM, Minakhin L, Severinov K, Antson AA. 12-Fold symmetry of the putative portal protein from the Thermus thermophilus bacteriophage G20C determined by X-ray analysis. Acta Crystallogr Sect F Struct Biol Cryst Commun 2013;69:1239-1241.
(e) Hilbert M, Erat MC, Hachet V, Guichard P, Blank ID, Flückiger I, Slater L, Lowe ED, Hatzopoulos GN, Steinmetz MO, Gönczy P, Vakonakis I. Caenorhabditis elegans centriolar protein SAS-6 forms a spiral that is consistent with imparting a ninefold symmetry. Proc Natl Acad Sci U S A 2013;110:11373-11378.
(f) Matsunaga Y, Koike R, Ota M, Tame JRH, Kidera A. Influence of structural symmetry on protein dynamics. PLoS One 2012;7:e50011.
(g) Bayfield OW, Chen C-S, Patterson AR, Luan W, Smits C, Gollnick P, Antson AA. Trp RNA-binding attenuation protein: modifying symmetry and stability of a circular oligomer. PLoS One 2012;7:e44309.
(h) Gross M. Symmetry and complexity in protein oligomers. Curr Biol 2012;22:R175-R177.
(i) Nava J, Kreinovich V. Towards symmetry-based explanation of (approximate) shapes of alpha-helices and beta-sheets (and beta-barrels) in protein structure. Symmetry (Basel) 2012;4:15-25.
(j) Broom A, Doxey AC, Lobsanov YD, Berthin LG, Rose DR, Howell PL, McConkey BJ, Meiering EM. Modular evolution and the origins of symmetry: reconstruction of a three-fold symmetric globular protein. Structure 2012;20:161-171.
10.
Levy Y, Cho SS, Shen T, Onuchic JN, Wolynes PG. Symmetry and frustration in protein energy landscapes: a near degeneracy resolves the Rop dimer-folding mystery. Proc Natl Acad Sci U S A 2005;102:2373-2378.
11. Dryzun C, Zait A, Avnir D. Quantitative symmetry and chirality - a fast computational algorithm for large structures: proteins, macromolecules, nanotubes, and unit cells. J Comput Chem 2011;32:2526-2538.
12. Tuvi-Arad I, Blonder R. Continuous symmetry and chemistry teachers: learning advanced chemistry content through novel visualization tools. Chem Educ Res Pract 2010;11:48-58.
13. Dryzun C. Continuous symmetry measures for complex symmetry group. J Comput Chem 2014;35:748-755.
14. Tuvi-Arad I, Rozgonyi T, Stirling A. Effect of temperature and substitution on Cope rearrangement: a symmetry perspective. J Phys Chem A 2013;117:12726-12733.
15. Keinan S, Avnir D. Quantitative symmetry in structure-activity correlations: The near C2 symmetry of inhibitor/HIV protease complexes. J Am Chem Soc 2000;122:4378-4384.
16. Zabrodsky H, Peleg S, Avnir D. Continuous symmetry measures. J Am Chem Soc 1992;114:7843-7851.
17. Dryzun C, Avnir D. Generalization of the continuous symmetry measure: the symmetry of vectors, matrices, operators and functions. Phys Chem Chem Phys 2009;11:9653-9666.
18. Salomon Y, Avnir D. Continuous symmetry measures: A note in proof of the folding/unfolding method. J Math Chem 1999;25:295-308.
19. Pinsky M, Avnir D. Continuous Symmetry Measures. 5. The Classical Polyhedra. Inorg Chem 1998;37:5575-5582.
20. Pinsky M, Zait A, Bonjack M, Avnir D. Continuous symmetry analyses: Cnv and Dn measures of molecules, complexes, and proteins. J Comput Chem 2013;34:2-9.
21. Blundell TL, Bolanos-Garcia V, Chirgadze DY, Harmer NJ, Lo T, Pellegrini L, Sibanda BL. Asymmetry in the multiprotein systems of molecular biology. Struct Chem 2002;13:405-412.
22. Guerler A, Wang C, Knapp EW.
Symmetric structures in the universe of protein folds. J Chem Inf Model 2009;49:2147-2151.
23. Duran AM, Meiler J. Inverted topologies in membrane proteins: a mini-review. Comput Struct Biotechnol J 2013;8:e201308004.
24. Popov P, Ritchie DW, Grudinin S. DockTrina: docking triangular protein trimers. Proteins 2014;82:34-44.
25. Levy ED, Pereira-Leal JB, Chothia C, Teichmann SA. 3D complex: a structural classification of protein complexes. PLoS Comput Biol 2006;2:e155.
26. http://www.rcsb.org/pdb/.
27. Wlodawer A, Minor W, Dauter Z, Jaskolski M. Protein crystallography for non-crystallographers, or how to get the best (but not more) from published macromolecular structures. FEBS J 2008;275:1-21.
28. Pinsky M, Yogev-Einot D, Avnir D. Statistical analysis of the estimation of distance measures. J Comput Chem 2003;24:786-796.
29. Guerler A, Knapp E-W. GIS: a comprehensive source for protein structure similarities. Nucleic Acids Res 2010;38:W46-52.
30. (a) Lou H, Chen M, Black SS, Bushell SR, Ceccarelli M, Mach T, Beis K, Low AS, Bamford VA, Booth IR, Bayley H, Naismith JH. Altered antibiotic transport in OmpC mutants isolated from a series of clinical strains of multi-drug resistant E. coli. PLoS One 2011;6:e25825.
(b) Hohenester E, Hutchinson WL, Pepys MB, Wood SP. Crystal structure of a decameric complex of human serum amyloid P component with bound dAMP. J Mol Biol 1997;269:570-578.
(c) Dedeo MT, Duderstadt KE, Berger JM, Francis MB. Nanoscale protein assemblies from a circular permutant of the tobacco mosaic virus. Nano Lett 2010;10:181-186.
(d) Hurley TD, Bosron WF, Stone CL, Amzel LM. Structures of three human beta alcohol dehydrogenase variants. Correlations with their functional differences. J Mol Biol 1994;239:415-429.
(e) Fermi G, Perutz MF, Shaanan B, Fourme R. The crystal structure of human deoxyhaemoglobin at 1.74 Å resolution. J Mol Biol 1984;175:159-174.
(f) To be published.
(g) Antson AA, Otridge J, Brzozowski AM, Dodson EJ, Dodson GG, Wilson KS, Smith TM, Yang M, Kurecki T, Gollnick P. The structure of trp RNA-binding attenuation protein. Nature 1995;374:693-700.
(h) Kiser PD, Lodowski DT, Palczewski K. Purification, crystallization and structure determination of native GroEL from Escherichia coli lacking bound potassium ions. Acta Crystallogr Sect F Struct Biol Cryst Commun 2007;63:457-461.
(i) Harata K, Nagahora H, Jigami Y. X-ray structure of wheat germ agglutinin isolectin 3. Acta Crystallogr Sect D Biol Crystallogr 1995;51:1013-1019.
(j) To be published.
(k) Cherezov V, Clogston J, Papiz MZ, Caffrey M. Room to move: crystallizing membrane proteins in swollen lipidic mesophases. J Mol Biol 2006;357:1605-1618.
(l) Sixma TK, Kalk KH, van Zanten BA, Dauter Z, Kingma J, Witholt B, Hol WG. Refined structure of Escherichia coli heat-labile enterotoxin, a close relative of cholera toxin. J Mol Biol 1993;230:890-918.
(m) Swain AL, Miller MM, Green J, Rich DH, Schneider J, Kent SB, Wlodawer A. X-ray crystallographic structure of a complex between a synthetic protease of human immunodeficiency virus 1 and a substrate-based hydroxyethylamine inhibitor. Proc Natl Acad Sci U S A 1990;87:8805-8809.
(n) Shrive AK, Cheetham GM, Holden D, Myles DA, Turnell WG, Volanakis JE, Pepys MB, Bloomer AC, Greenhough TJ. Three dimensional structure of human C-reactive protein. Nat Struct Biol 1996;3:346-354.
(o) Tahirov TH, Inagaki E, Ohshima N, Kitao T, Kuroishi C, Ukita Y, Takio K, Kobayashi M, Kuramitsu S, Yokoyama S, Miyano M. Crystal structure of purine nucleoside phosphorylase from Thermus thermophilus. J Mol Biol 2004;337:1149-1160.
(p) Vitali J, Colaneri MJ, Kantrowitz E. Crystal structure of the catalytic trimer of Methanococcus jannaschii aspartate transcarbamoylase. Proteins 2008;71:1324-1334.
(q) Gill HS, Eisenberg D.
The crystal structure of phosphinothricin in the active site of glutamine synthetase illuminates the mechanism of enzymatic inhibition. Biochemistry 2001;40:1903-1912.
31. (a) Bilgrami S, Yadav S, Kaur P, Sharma S, Perbandt M, Betzel C, Singh TP. Crystal structure of the disintegrin heterodimer from saw-scaled viper (Echis carinatus) at 1.9 Å resolution. Biochemistry 2005;44:11058-11066.
(b) Keating AE, Malashkevich VN, Tidor B, Kim PS. Side-chain repacking calculations for predicting structures and stabilities of heterodimeric coiled coils. Proc Natl Acad Sci U S A 2001;98:14825-14830.
(c) Jabeen T, Singh N, Singh RK, Jasti J, Sharma S, Kaur P, Srinivasan A, Singh TP. Crystal structure of a heterodimer of phospholipase A2 from Naja naja sagittifera at 2.3 Å resolution reveals the presence of a new PLA2-like protein with a novel cys 32-Cys 49 disulphide bridge with a bound sugar at the substrate-binding site. Proteins 2006;62:329-337.
32. Groll M, Ditzel L, Löwe J, Stock D, Bochtler M, Bartunik HD, Huber R. Structure of 20S proteasome from yeast at 2.4 Å resolution. Nature 1997;386:463-471.
33. Milner D, Raz S, Hel-Or H, Keren D, Nevo E. A new measure of symmetry and its application to classification of bifurcating structures. Pattern Recognit 2007;40:2237-2250.
34. Najmudin S, Nalini V, Driessen HP, Slingsby C, Blundell TL, Moss DS, Lindley PF. Structure of the bovine eye lens protein γB(γII)-crystallin at 1.47 Å. Acta Crystallogr D Biol Crystallogr 1993;49:223-233.
35. Blundell T, Lindley P, Miller L, Moss D, Slingsby C, Tickle I, Turnell B, Wistow G. The molecular structure and stability of the eye lens: x-ray analysis of γ-crystallin II. Nature 1981;289:771-777.
36. Greer J, Erickson JW, Baldwin JJ, Varney MD. Application of the three-dimensional structures of protein target molecules in structure-based drug design. J Med Chem 1994;37:1035-1054.
37.
(a) Baker EN, Blundell TL, Cutfield JF, Cutfield SM, Dodson EJ, Dodson GG, Hodgkin DM, Hubbard RE, Isaacs NW, Reynolds CD. The structure of 2Zn pig insulin crystals at 1.5 Å resolution. Philos Trans R Soc Lond B Biol Sci 1988;319:369-456 (PDB code: 4INS).
(b) Moras D, Olsen KW, Sabesan MN, Buehner M, Ford GC, Rossmann MG. Studies of asymmetry in the three-dimensional structure of lobster D-glyceraldehyde-3-phosphate dehydrogenase. J Biol Chem 1975;250:9137-9162 (PDB code: 1GPD).
(c) Jogl G, Rozovsky S, McDermott AE, Tong L. Optimal alignment for enzymatic proton transfer: structure of the Michaelis complex of triosephosphate isomerase at 1.2-Å resolution. Proc Natl Acad Sci U S A 2003;100:50-55 (PDB code: 1NEY).
(d) Sixma TK, Kalk KH, van Zanten BA, Dauter Z, Kingma J, Witholt B, Hol WG. Refined structure of Escherichia coli heat-labile enterotoxin, a close relative of cholera toxin. J Mol Biol 1993;230:890-918 (PDB code: 1LTS).
38. (a) Ogg D, Elleby B, Norström C, Stefansson K, Abrahmsén L, Oppermann U, Svensson S. The crystal structure of guinea pig 11β-hydroxysteroid dehydrogenase type 1 provides a model for enzyme-lipid bilayer interactions. J Biol Chem 2005;280:3789-3794 (PDB code: 1XSE).
(b) Zhang J, Osslund TD, Plant MH, Clogston CL, Nybo RE, Xiong F, Delaney JM, Jordan SR. Crystal structure of murine 11 beta-hydroxysteroid dehydrogenase 1: an important therapeutic target for diabetes. Biochemistry 2005;44:6948-6957 (PDB code: 1Y5M).
39. Eisenberg D, Schwarz E, Komaromy M, Wall R. Analysis of membrane and surface protein sequences with the hydrophobic moment plot. J Mol Biol 1984;179:125-142.
40. (a) Chagot B, Diochot S, Pimentel C, Lazdunski M, Darbon H. Solution structure of APETx1 from the sea anemone Anthopleura elegantissima: a new fold for an HERG toxin. Proteins 2005;59:380-386.
(b) Swan MK, Hansen T, Schönheit P, Davies C.
A novel phosphoglucose isomerase (PGI)/phosphomannose isomerase from the crenarchaeon Pyrobaculum aerophilum is a member of the PGI superfamily: structural evidence at 1.16-Å resolution. J Biol Chem 2004;279:39838-39845.
(c) Jogl G, Rozovsky S, McDermott AE, Tong L. Optimal alignment for enzymatic proton transfer: structure of the Michaelis complex of triosephosphate isomerase at 1.2-Å resolution. Proc Natl Acad Sci U S A 2003;100:50-55.
(d) Kessler D, Rétey J, Schulz GE. Structure and action of urocanase. J Mol Biol 2004;342:183-194.
(e) Swan MK, Hansen T, Schönheit P, Davies C. Structural basis for phosphomannose isomerase activity in phosphoglucose isomerase from Pyrobaculum aerophilum: a subtle difference between distantly related enzymes. Biochemistry 2004;43:14088-14095.
(f) Ziółkowska NE, O'Keefe BR, Mori T, Zhu C, Giomarelli B, Vojdani F, Palmer KE, McMahon JB, Wlodawer A. Domain-swapped structure of the potent antiviral protein griffithsin and its mode of carbohydrate binding. Structure 2006;14:1127-1135.
(g) Willis MA, Song F, Zhuang Z, Krajewski W, Chalamasetty VR, Reddy P, Howard A, Dunaway-Mariano D, Herzberg O. Structure of YciI from Haemophilus influenzae (HI0828) reveals a ferredoxin-like alpha/beta-fold with a histidine/aspartate centered catalytic site. Proteins 2005;59:648-652.
(h) Adam V, Royant A, Nivière V, Molina-Heredia FP, Bourgeois D. Structure of superoxide reductase bound to ferrocyanide and active site expansion upon X-ray-induced photo-reduction. Structure 2004;12:1729-1740.
(i) Meijers R, Adolph H-W, Dauter Z, Wilson KS, Lamzin VS, Cedergren-Zeppezauer ES. Structural evidence for a ligand coordination switch in liver alcohol dehydrogenase. Biochemistry 2007;46:5446-5454.
(j) Liao DI, Calabrese JC, Wawrzak Z, Viitanen PV, Jordan DB. Crystal structure of 3,4-dihydroxy-2-butanone 4-phosphate synthase of riboflavin biosynthesis. Structure 2001;9:11-18.
(k) Rigsby RE, Rife CL, Fillgrove KL, Newcomer ME, Armstrong RN.
Phosphonoformate: a minimal transition state analogue inhibitor of the fosfomycin resistance protein, FosA. Biochemistry 2004;43:13666-13673.

Figure legends

Figure 1. Backbone- and full-analysis scheme of the dimeric HIV protease model; each subunit is indicated by a different color. (a) All atoms (except for hydrogens), including all of the amino acid residues. (b) The peptide backbone structure only (residues are ignored). (c) Cartoon representation of the backbone structure.

Figure 2. A symmetry map of HIV protease (compare with Fig. 1 and see text for explanation). The indicated amino acids are discussed below in Section 4.4.

Figure 3. Homomeric protein structures analyzed in this study (see Table I). Each subunit is indicated by a different color. (a) Porin, (b) Decameric amyloid P, (c) Tobacco mosaic virus disc, (d) Alcohol dehydrogenase, (e) Hemoglobin, (f) Neuraminidase, (g) GroEL, (h) Wheat germ agglutinin, (i) Triose phosphate isomerase, (j) Light-harvesting comp.2, (k) Beta-subunit of types 1&2 heat-labile enterotoxin, (l) C-reactive protein, (m) HIV proteinase, (n) trp RNA-binding attenuation protein, (o) Purine nucleoside phosphorylase, (p) Aspartate transcarbamoylase, (q) Glutamine synthetase.

Figure 4. Heteromeric protein structures analyzed in this study (see Table II). Each subunit is indicated by a different color. (a) Disintegrin heterodimer, (b) GCN4 acid/base heterodimer, (c) Phospholipase A2 (PLA2) heterodimer.

Figure 5. Backbone analysis of the heteromeric protein 20S proteasome (each subunit is indicated by a different color). (a) The whole structure of the protein is composed of four rings, one on top of the other. (b) Backbone structure of the top 7-subunit ring. (c) The analyzed structure, ignoring the extra atoms. (d) The analyzed structure, adding ghost atoms. See text for explanation of (c) and (d).

Figure 6.
Symmetry analysis of protein fragments: (a) The symmetry-related secondary structures of β1 alcohol dehydrogenase. (b) Symmetry-related domains of γ-crystallin. Only coordinates of colored areas were considered in calculations.

Figure 7. Symmetry maps of protein structures, each with its own color code, with a focus on the location of the near-symmetry axes (indicated for each structure). (a) Insulin, (b) D-glyceraldehyde-3-phosphate dehydrogenase, (c) Triosephosphate isomerase, (d) B subunits of heat-labile enterotoxin. For data sources see Ref. 37.

Figure 8. Comparison of two homodimeric proteins with a sequence homology of 70%: (a) 11β-hydroxysteroid dehydrogenase Type 1 (11β-HSD1) of guinea pig; and (b) of mouse. The 15 most C2-distorted amino acids in each structure are colored red.

Figure 9. The 15 most symmetry-distorted amino acids in each protein are shown (red and blue); see Table III for their hydrophobicity characterization. (a) Urocanate hydratase (1UWK). (b) Griffithsin (2GUD). (c) Fosfomycin resistance protein A (1NKI). (d) Alcohol dehydrogenase (2JHF).

Table I. CSM values of the homomeric protein structures in Fig. 3.
   Name (a)                                            PDB ID   G-symmetry   Backbone CSM value   Full CSM value
a  Porin*                                              2XE2     C3           0.001                0.006
b  Decameric amyloid P                                 1LGN     C5           0.002                0.006
c  Tobacco mosaic virus disc*                          3KML     C17          0.002                0.002
d  Alcohol dehydrogenase                               1HDX     C2           0.003                0.01
e  Hemoglobin                                          2HHB     C2           0.004                0.01
f  Neuraminidase*                                      1V0Z     C4           0.004                0.01
g  trp RNA-binding attenuation protein*                1WAP     C11          0.006                0.05
h  GroEL*                                              2NWC     C7           0.008                0.01
i  Wheat germ agglutinin*                              1WGT     C2           0.01                 0.05
j  Triose phosphate isomerase*                         8TIM     C2           0.02                 0.05
k  Light-harvesting comp.2                             2FKW     C9           0.02                 0.03
l  Beta-subunit of types 1&2 heat-labile enterotoxin*  1LTS     C5           0.03                 0.08
m  HIV proteinase*                                     7HVP     C2           0.04                 0.1
n  C-reactive protein*                                 1GNH     C5           0.05                 0.08
o  Purine Nucleoside Phosphorylase                     1ODI     C3           0.005                0.02
                                                                C2           0.002                0.01
                                                                D3           0.006                0.02
p  Aspartate transcarbamoylase                         2RGW     C3           0.004                0.01
                                                                C2           0.001                0.006
                                                                D3           0.005                0.01
q  Glutamine synthetase                                1F52     C6           0.0005               0.0006
                                                                C2           0.0001               0.0002
                                                                D6           0.0008               0.0009

(a) For data sources see Ref. 30. Starred proteins are referred to in the literature as symmetric; all others are referred to as quasi-symmetric etc.

Table II. CSM values of the heteromeric proteins in Fig. 4.

   Name (a)                      PDB ID   G-symmetry   Backbone CSM value
a  Disintegrin heterodimer       1TEJ     C2           0.03
b  GCN4 Acid/Base heterodimer    1KDD     C2           0.06
c  Phospholipase A2 (PLA2)       1Y75     C2           0.09

(a) For data sources see Ref. 31.

Table III. The hydrophilic/hydrophobic ratio of the 15 most symmetry-distorted amino acids in several proteins.

PDB ID (a)   Hydrophilic : hydrophobic ratio
1WKQ         14 : 1
1TZB         14 : 1
1NEY         13 : 2
1UWK         13 : 2
1X9I         13 : 2
2GUD         13 : 2
1MWQ         12 : 3
1VZI         12 : 3
2JHF         12 : 3
1G57         10 : 5
1NKI         10 : 5

(a) For data sources see Ref. 40.
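The hydrophilic : hydrophobic entries of Table III are simple counts over a list of distorted residues, and the tallying step can be sketched as below. This is an illustration only: the grouping in `ASSUMED_HYDROPHILIC` is a placeholder chosen for the example and is not the classification of Ref. 39, and the helper names are hypothetical. The input is the list of seven residues reported in the text as shared between the two 11β-HSD1 structures, not a Table III entry.

```python
import re
from typing import FrozenSet, Iterable, Tuple

# Placeholder grouping for illustration; NOT the scale of Ref. 39.
ASSUMED_HYDROPHILIC: FrozenSet[str] = frozenset(
    {"Arg", "Lys", "Asp", "Glu", "Asn", "Gln", "His", "Ser", "Thr"}
)


def residue_type(label: str) -> str:
    """Extract the three-letter residue code from a label such as 'Glu26'."""
    match = re.fullmatch(r"([A-Za-z]{3})\d+", label)
    if match is None:
        raise ValueError(f"unrecognized residue label: {label!r}")
    return match.group(1).capitalize()


def hydrophilic_ratio(
    labels: Iterable[str],
    hydrophilic: FrozenSet[str] = ASSUMED_HYDROPHILIC,
) -> Tuple[int, int]:
    """Count (hydrophilic, hydrophobic) residues under the given grouping."""
    types = [residue_type(label) for label in labels]
    n_hydrophilic = sum(1 for t in types if t in hydrophilic)
    return n_hydrophilic, len(types) - n_hydrophilic


# The seven shared distorted residues listed in the text for 11β-HSD1.
shared = ["Glu26", "His134", "Arg137", "Ile203", "Glu239", "Asn270", "Lys274"]
n_philic, n_phobic = hydrophilic_ratio(shared)
print(f"{n_philic} : {n_phobic}")  # prints "6 : 1" under the assumed grouping
```

Passing the grouping as a parameter keeps the count independent of any particular hydrophobicity scale, so a different classification can be substituted without changing the tally logic.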