19 July 2011 Richard H. Scheuermann, Ph.D. Department of Pathology




Sequence Feature Variant Type and Evolutionary Trajectory Analysis using the Influenza Research Database (IRD)

19 July 2011

Richard H. Scheuermann, Ph.D.
Department of Pathology
U.T. Southwestern Medical Center

Outline

  • Brief overview of NIAID-Sponsored Influenza Research Database (IRD)
      • Comprehensive integrated database
      • Analysis and visualization tools
      • U.S. NIH-funded, free access, open to all
      • Developed by a team of research scientists, bioinformaticians and professional software developers
      • www.fludb.org
      • www.viprbrc.org for other human viral pathogens
  • Novel approach to genotype-phenotype association studies: Sequence Feature Variant Type (SFVT) analysis
  • Evolutionary Trajectory analysis of the pandemic (H1N1) 2009 strain

Public Health Impact of Influenza

  • Seasonal flu epidemics occur yearly during the fall/winter months and result in 3-5 million cases of severe illness worldwide.
  • More than 200,000 people are hospitalized each year with seasonal flu-related complications in the U.S.
  • Approximately 36,000 deaths occur due to seasonal flu each year in the U.S.
  • Populations at highest risk are children under age 2, adults age 65 and older, and groups with other comorbidities.

Pandemics

  • 1918 Spanish flu (H1N1); 20 - 100 million deaths
  • 1957 Asian flu (H2N2); 1 - 1.5 million deaths
  • 1968 Hong Kong flu (H3N2); 750,000 - 1 million deaths
  • 2009 Swine origin (H1N1); > 16,000 deaths as of March 2010

Source: World Health Organization - http://www.who.int/mediacentre/factsheets/fs211/en/index.html

Influenza Virus

  • Orthomyxoviridae family
  • Negative-strand RNA
  • Segmented
  • Enveloped
  • 8 RNA segments encode 11 proteins
  • Classified based on serology of HA and NA

IRD Overview


  • Data from both public archives (e.g. GenBank, PDB) and novel data derived by IRD through core data analysis and manual curation
  • GenBank data loaded on a daily basis
  • Other loads based on data refresh frequencies of source archives

Search Access to Data

Data accessed through optimized search interfaces

Data Types

Search pages for the different data types

Core Query Attributes

Commonly used search criteria

Advanced Query Options

A variety of advanced search options

Segment search results

Analysis and Visualization

Link to list of analysis and visualization tools

Analysis and Visualization Tools

  • Current analysis tools focused on comparative genomics
  • Emphasis placed on data integration for visualization

Workbench Access

Link to personal workbench to save working sets of sequence and surveillance records, and analysis results

My Private Workbench

  • Example of my workbench showing surveillance, segment and protein working sets as well as SNP analysis results
  • In left panel note sharing function in Access panel

Example of data integration in 3D protein visualization.

  • A: various custom display options
  • B: ribbon diagram of influenza hemagglutinin in complex with a neutralizing antibody
  • C: sequence conservation highlighted; red residues are hypervariable among different virus isolates
  • D: location of an antibody epitope added, highlighted in yellow; note that the antibody epitope corresponds to a hypervariable region (red in panel C)
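One common way to quantify the kind of per-residue conservation highlighted in panel C is Shannon entropy over the columns of a multiple sequence alignment. The sketch below is illustrative only: the slides do not say which conservation score IRD uses, so entropy is an assumption here, and the function name and toy alignment are hypothetical.

```python
import math
from collections import Counter


def column_entropy(aligned_seqs):
    """Return the Shannon entropy (in bits) of every alignment column.

    0.0 means the column is fully conserved; larger values mean more
    variability. Gap characters are counted like any other symbol.
    """
    if not aligned_seqs:
        return []
    length = len(aligned_seqs[0])
    if any(len(s) != length for s in aligned_seqs):
        raise ValueError("sequences must be aligned to the same length")

    scores = []
    for i in range(length):
        counts = Counter(s[i] for s in aligned_seqs)
        total = sum(counts.values())
        entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
        scores.append(entropy)
    return scores


# Hypothetical toy alignment (not real IRD data).
toy_alignment = ["IFDRL", "IFNRL", "IFDRL", "LFDQL"]
scores = column_entropy(toy_alignment)

# Columns with the highest entropy are the candidates for "hypervariable".
most_variable = max(range(len(scores)), key=scores.__getitem__)
```

Mapping the scores onto a color scale (for example, red above some threshold) would reproduce the style of display described for panel C; the threshold itself would be a further assumption.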

Phylogenetic trees of H4 surveillance sequences with custom coloring based on year of isolation. Panels B and C are zoomed-in views of sections of the tree shown in panel A.

Based on these trees, viruses isolated from shore birds (Ruddy turnstone) are more closely related to viruses isolated from Alberta duck species than to those from the Minnesota, North Dakota, Texas duck lineage (panel C). Note that this is not intended to be a definitive study but rather to illustrate IRD functionality.

Other options for coloring in addition to year of isolation include country of isolation, HA subtype, NA subtype, host species, and SFVT.

Multiple sequence alignment of H4 surveillance sequences with custom coloring based on year of isolation. Red arrows indicate positions conserved between viruses isolated from shore birds (Ruddy turnstone) and Alberta duck species, supporting their common origin, in contrast to viruses from the Minnesota, North Dakota, Texas duck lineage.
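The comparison marked by the red arrows can be approximated programmatically: look for alignment columns where two groups share one residue that never appears in a third group. This is a minimal sketch, not IRD's implementation; the function name, group variables and sequences are hypothetical.

```python
def shared_distinct_columns(group_a, group_b, outgroup):
    """Return 0-based alignment columns where group_a and group_b are each
    invariant, carry the same residue, and no outgroup sequence carries it."""
    all_seqs = list(group_a) + list(group_b) + list(outgroup)
    if not group_a or not group_b or not outgroup:
        raise ValueError("all three groups must contain at least one sequence")
    length = len(all_seqs[0])
    if any(len(s) != length for s in all_seqs):
        raise ValueError("sequences must be aligned to the same length")

    hits = []
    for i in range(length):
        residues_a = {s[i] for s in group_a}
        residues_b = {s[i] for s in group_b}
        residues_out = {s[i] for s in outgroup}
        if (len(residues_a) == 1
                and residues_a == residues_b
                and residues_a.isdisjoint(residues_out)):
            hits.append(i)
    return hits


# Hypothetical aligned fragments standing in for the three groups of isolates.
shore_bird = ["ACGTAC", "ACGTAC"]
alberta_duck = ["ACGTTC", "ACGTAC"]
other_lineage = ["ATGTAC", "ATGAAC"]

arrow_columns = shared_distinct_columns(shore_bird, alberta_duck, other_lineage)
```

Requiring each group to be invariant at the column is a deliberately strict choice; a tolerance for minority residues would be a reasonable relaxation on real surveillance data.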


IRD Summary

  • Funded by U.S. National Institute of Allergy and Infectious Diseases (NIAID)
  • Free and open access with no use restrictions
  • Developed by a team of research scientists, bioinformaticians and professional software developers
  • Comprehensive collection of public data
  • Novel derived data, novel analytical tools, unique functions
  • Integration, integration, integration
  • www.fludb.org and www.viprbrc.org

Novel approach to genotype-phenotype association studies: Sequence Feature Variant Type (SFVT) Analysis

Limitations to Phylogenetics

  • Traditional virus phylogenetics focuses on comparative analysis of whole genomes or genome segments, and is most useful for understanding virus evolution
  • However, the genetic determinants of important viral phenotypes, e.g. virulence, host range, replication efficiency, immune response evasion, etc., are determined by focused functional regions of viral proteins
  • Therefore, specific genotype-phenotype associations can be masked by other evolutionary factors that contribute to traditional phylogenetic analysis

SFVT approach

VT-1   I F D R L E T L I L
VT-2   I F N R L E T L I L
VT-3   I F D R L E T I V L
VT-4   L F D Q L E T L V S
VT-5   I F D R L E N L T L
VT-6   I F N R L E A L I L
VT-7   I Y D R L E T L I L
VT-8   I F D R L E T L V L
VT-9   I F D R L E N I V L
VT-10  I F E R L E T L I L
VT-11  L F D Q M E T L V S

Influenza A_NS1_nuclear-export-signal_137(10)

  • Identify regions of protein/gene with known structural or functional properties: Sequence Features (SF)
      • e.g. an alpha-helical region, the binding site for another protein, an enzyme active site, an immune epitope
  • Determine the extent of sequence variation for each SF by defining each unique sequence as a Variant Type (VT)
  • High-level, comprehensive grouping of all virus strains by VT membership for each SF independently
  • Genotype-phenotype association statistical analysis, e.g. genetic determinants of host range, virulence, replication rate

Influenza A_NS1_alpha-helix_171(17)

Influenza A NS1 protein (PDB 2GX9) crystal structure showing:

  • Nuclear Export Signal Sequence Feature (SF) highlighted in red
  • Alpha-helix SF highlighted in green
  • Amino acid alignment with colors showing variation within the nuclear export signal region
  • Each sequence with 1+ substitutions comprises a unique fingerprint or Variant Type (VT)
  • A set of unique sequence substitutions existing within any defined region is a sequence feature variant type (SFVT)
  • Statistical analyses on SFVTs can identify genotype-phenotype relationships

SF definition

  • Based on experimentation reported in the literature and 3D protein structures (PDB records)
  • Captured by manual curation
  • Defined by the specific amino acid positions in the polypeptide chain
  • Annotated with the known structural or functional properties

Influenza A Sequence Features as of 18JUL2011

4128 SFs total
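The Variant Type definition given above (an SF is a set of amino acid positions, and each unique sequence at those positions is one VT) can be sketched in a few lines. This is a minimal illustration under stated assumptions: the strain names and sequences are hypothetical, sequences are assumed to share one coordinate system, and numbering VTs by descending frequency is a guess, since the slides do not say how IRD assigns VT numbers.

```python
from collections import Counter


def assign_variant_types(strain_seqs, positions):
    """Group strains by their residues at the Sequence Feature positions.

    strain_seqs: dict of strain name -> protein sequence (shared coordinates).
    positions:   1-based amino acid positions that define the SF.
    Returns (vt_of_strain, vt_table), where vt_table maps a VT label to the
    residue string that defines it.
    """
    positions = list(positions)
    segments = {
        name: "".join(seq[p - 1] for p in positions)
        for name, seq in strain_seqs.items()
    }
    counts = Counter(segments.values())
    # Assumed numbering scheme: most frequent variant first, ties alphabetical.
    ordered = sorted(counts, key=lambda seg: (-counts[seg], seg))
    label_of = {seg: f"VT-{i}" for i, seg in enumerate(ordered, start=1)}

    vt_of_strain = {name: label_of[seg] for name, seg in segments.items()}
    vt_table = {label: seg for seg, label in label_of.items()}
    return vt_of_strain, vt_table


# Hypothetical strains; the SF is taken to span positions 2-11 of these toy sequences.
toy_strains = {
    "strain_a": "MIFDRLETLILK",
    "strain_b": "MIFDRLETLILR",
    "strain_c": "MIFNRLETLILK",
}
vt_of_strain, vt_table = assign_variant_types(toy_strains, range(2, 12))
```

Because each SF is processed independently, running the same function once per SF yields the per-SF grouping of strains that the association analysis then works on.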

NS1 Sequence Features

SF8 (nuclear export signal)

VT for SF8 (nuclear export signal)

VT-1 strains

Do variations in NS1 sequence features influence influenza virus host range?

NS1 Sequence Features

VT for SF8 (nuclear export signal)

VT distribution by host
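A VT-by-host distribution of this kind can be tabulated from (VT, host) pairs and then tested for association. The sketch below is hedged on several points: the records are invented for illustration, the deck does not name the statistical test used, and Fisher's exact test on a 2x2 table is used here only as a simple stand-in. Like any such test, it assumes the strains were sampled at random.

```python
from collections import Counter
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability of seeing x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Sum every table that is no more likely than the observed one.
    return sum(
        prob(x) for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-9)
    )


# Hypothetical (VT, host) records, not taken from IRD.
records = (
    [("VT-1", "avian")] * 8 + [("VT-1", "human")] * 2
    + [("VT-4", "avian")] * 1 + [("VT-4", "human")] * 9
)
counts = Counter(records)

p_value = fisher_exact_two_sided(
    counts[("VT-1", "avian")], counts[("VT-1", "human")],
    counts[("VT-4", "avian")], counts[("VT-4", "human")],
)
```

A small p-value only indicates that VT and host are not independent in the sample; it does not by itself distinguish a true host-range effect from other explanations.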

Causes of apparent NS1 VT-associated host range restriction

  • Virus spread: capability + opportunity
      • Phenotypic property of the virus: limited capacity
      • Restricted founder effect: limited opportunity
      • Restricted spatial-temporal distribution
  • Sampling bias: assumption of random sampling
      • Oversampling: avian H5N1 in Asia; 2009 H1N1
      • Undersampling: large and domestic cats
  • Linkage to causative variant

VT-11 strains

VT for SF8 (nuclear export signal)

VT lineages

VT-4 lineage

VT-4 lineage = B allele/group

VT-16 & VT-9 lineages

VT-7 lineage


Evolutionary Trajectory analysis of the pandemic (H1N1) 2009 strain

Phylogenetic Analysis

  • Evolutionary origin
      • Select a representative pandemic (H1N1) 2009 sequence from the IRD database
      • BLAST to identify most similar sequences
      • Assess phylogenetic relationships

Pandemic (H1N1) 2009 selection

  • Search for all Influenza A Segment 4 sequence records from Human H1N1 2009.
  • As of 29AUG2009 this search returns 993 sequence records in the IRD.
  • Select A/California/04/2009, which was one of the original index cases of pandemic (H1N1) 2009 in the U.S.
  • Run a BLAST analysis using this query sequence.

BLAST Result

Select top 1000 hits and save as a working set in my own personal workbench

Segment 1 phylogenetic tree

Tree labels: Swine/Ohio/2004; Duck/USA/2000s; Human/USA/2007 (seasonal); Swine/USA/1990s; Pandemic (H1N1) 2009

  • Trees suggest that Segment 1 of pandemic (H1N1) 2009 (a.k.a. swine) is most like one of two North American swine lineages, Swine/Ohio/2004-2007
  • Segment 1 of pandemic (H1N1) 2009 (a.k.a. swine) is quite distinct from human seasonal H1N1 flu from previous years (e.g. 2007, 2008) and from the other swine flu lineage circulating in the USA in the 1990s
  • Similar comparative relationships are seen with Segments 1, 2, 3

Temporal component

  • Reference strain: A/California/04/2009
  • BLAST: return top 1000 results
  • Normalize data
  • Graph nucleotide differences versus isolation year differences

NP chart
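The points on a chart like this (nucleotide differences against isolation year differences, relative to the reference strain) could be computed along the following lines. This is a sketch under assumptions: hits are taken to be already aligned to the reference, "normalize" is interpreted as differences per 100 compared nucleotides (the slides do not define it), and the hit names and sequences are hypothetical.

```python
def trajectory_points(ref_seq, ref_year, hits):
    """Return sorted (year_difference, differences_per_100_nt, name) tuples.

    hits: iterable of (name, isolation_year, sequence aligned to ref_seq).
    Positions where either sequence has a gap or an ambiguous base are skipped.
    """
    points = []
    for name, year, seq in hits:
        if len(seq) != len(ref_seq):
            raise ValueError(f"{name}: sequence is not aligned to the reference")
        compared = mismatches = 0
        for r, h in zip(ref_seq.upper(), seq.upper()):
            if r in "ACGT" and h in "ACGT":
                compared += 1
                mismatches += r != h
        if compared:
            # Assumed normalization: mismatches per 100 compared positions.
            points.append((ref_year - year, 100.0 * mismatches / compared, name))
    return sorted(points)


# Hypothetical reference fragment and hits (not real IRD sequences).
reference = "ACGTACGTAC"
toy_hits = [
    ("hit_1995", 1995, "ATGTACG-AT"),
    ("hit_2007", 2007, "ACGTACGTAT"),
]
points = trajectory_points(reference, 2009, toy_hits)
```

Plotting the first element of each tuple on the x-axis and the second on the y-axis gives one point per BLAST hit; lineages then appear as groups of points with different slopes or offsets.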

NS chart

HA chart

Chart labels: Group 1, Group 3, Group 2

