Richard H. Scheuermann, Ph.D. Department of Pathology Division of Biomedical Informatics

  • Published on
    23-Feb-2016

DESCRIPTION

Standardizing Metadata Associated with NIAID Genome Sequencing Center Projects and their Implementation in NIAID Bioinformatics Resource Centers. Richard H. Scheuermann, Ph.D., Department of Pathology, Division of Biomedical Informatics, U.T. Southwestern Medical Center. N01AI2008038

Transcript


Richard H. Scheuermann, Ph.D.
Department of Pathology
Division of Biomedical Informatics
U.T. Southwestern Medical Center

Standardizing Metadata Associated with NIAID Genome Sequencing Center Projects and their Implementation in NIAID Bioinformatics Resource Centers

N01AI2008038
N01AI40041

Richard H. Scheuermann, Ph.D.
Director of Informatics
J. Craig Venter Institute


Genome Sequencing Centers for Infectious Disease (GSCID)
Bioinformatics Resource Centers (BRC)

www.viprbrc.org
www.fludb.org

In addition to supporting laboratory research projects, the U.S. National Institute of Allergy and Infectious Diseases (NIAID) supports a series of research resources. Three Genome Sequencing Centers, at the University of Maryland, the J. Craig Venter Institute, and the Broad Institute, provide services for rapid and cost-efficient production of high-quality genome sequences and high-throughput genotyping of NIAID Category A-C priority pathogens, microorganisms responsible for emerging and re-emerging infectious diseases and their hosts, related organisms, clinical isolates, and invertebrate vectors of infectious diseases. The sequences and associated metadata are made available and integrated with related pathogen data through one of five NIAID Bioinformatics Resource Centers, each focused on a different subset of human pathogens. Two BRCs are focused on viral pathogens: the Virus Pathogen Resource (ViPR) and the Influenza Research Database (IRD).

High Throughput Sequencing

  • Enabling technology
      • Epidemiology of outbreaks
      • Pathogen evolution
      • Host range restriction
      • Genetic determinants of virulence and pathogenicity
  • Metadata requirements
      • Temporal-spatial information about isolates
      • Selective pressures
      • Host species of specimen source
      • Disease severity and clinical manifestations

Metadata Submission Spreadsheets

Complex Query Interface

Metadata Inconsistencies

  • Each project was providing different types of metadata
  • No consistent nomenclature being used
  • Impossible to perform reliable comparative genomics analysis
  • Required extensive custom bioinformatics system development

GSC-BRC Metadata Standards Working Group

  • NIAID assembled a group of representatives from its three Genome Sequencing Centers for Infectious Diseases (Broad, JCVI, UMD) and five Bioinformatics Resource Centers (EuPathDB, IRD, PATRIC, VectorBase, ViPR)
  • Develop metadata standards for pathogen isolate sequencing projects
  • Bottom-up approach
  • Assemble into a semantic framework

GSC-BRC Metadata Working Groups

Metadata Standards Process

  • Divide into pathogen subgroups: viruses, bacteria, eukaryotic pathogens and vectors
  • Collect example metadata sets from sequencing project white papers and other project sources (e.g. CEIRS)
  • Identify data fields that appear to be common across projects within a pathogen subgroup (core) and data fields that appear to be project specific
  • For each data field, provide a common set of attributes, including definitions, synonyms, allowed value sets (preferably using controlled vocabularies), and expected syntax
  • Merge subgroup core elements into a common set of core metadata fields and attributes
  • Assemble a set of pathogen-specific and project-specific metadata fields to be used in conjunction with core fields
  • Compare, harmonize, and map to other relevant initiatives, including OBI, MIGS, MIxS, BioProjects, BioSamples (ongoing)
  • Assemble all metadata fields into a semantic network (ongoing)
  • Harmonize the semantic network with the Ontology for Biomedical Investigations (OBI)
  • Draft data submission spreadsheets to be used for all white paper and BRC-associated projects
  • Finalize version 1.0 metadata standard and version 1.0 data submission spreadsheet
  • Beta test version 1.0 standard with new white paper projects, collecting feedback
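The per-field attributes named in the process above (definition, synonyms, allowed value set, expected syntax) can be sketched as a small data structure. The following Python sketch is only an illustration, not the working group's actual specification: the field names `host_species` and `collection_date`, their synonyms, the allowed values, and the date pattern are all hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FieldSpec:
    """One metadata field with the attributes listed in the process."""
    name: str
    definition: str
    synonyms: tuple = ()
    allowed_values: Optional[frozenset] = None  # controlled vocabulary, if any
    syntax: Optional[str] = None                # regular expression, if any

    def validate(self, value: str) -> bool:
        """Check a value against the vocabulary and syntax, when defined."""
        if self.allowed_values is not None and value not in self.allowed_values:
            return False
        if self.syntax is not None and re.fullmatch(self.syntax, value) is None:
            return False
        return True


# Hypothetical core fields, invented for illustration only.
HOST_SPECIES = FieldSpec(
    name="host_species",
    definition="Species of the organism from which the specimen was collected",
    synonyms=("host", "specimen source organism"),
    allowed_values=frozenset({"Homo sapiens", "Gallus gallus", "Sus scrofa"}),
)
COLLECTION_DATE = FieldSpec(
    name="collection_date",
    definition="Date on which the specimen was collected",
    synonyms=("date collected",),
    syntax=r"\d{4}-\d{2}-\d{2}",
)


def validate_record(record, specs):
    """Map project-specific column labels onto standard field names.

    Returns (normalized_record, list_of_error_messages).
    """
    index = {}
    for spec in specs:
        for label in (spec.name, *spec.synonyms):
            index[label.lower()] = spec

    normalized, errors = {}, []
    for label, value in record.items():
        spec = index.get(label.strip().lower())
        if spec is None:
            errors.append(f"unknown field: {label}")
        elif not spec.validate(value):
            errors.append(f"invalid value for {spec.name}: {value!r}")
        else:
            normalized[spec.name] = value
    return normalized, errors
```

Under these assumptions, a project that labels its columns "Host" and "date collected" is mapped onto the same standard field names as a project that uses the standard labels directly, which is the kind of consistency the merge step aims for.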

Data Fields: Core Project, Core Sample, Attributes

[Figure: semantic network of the core sample fields (CS1 to CS18) for Specimen Isolation. Entities include organism, organism part, environment, environmental material, equipment, person, affiliation, specimen, specimen type, specimen isolation procedure, isolation protocol, temporal-spatial region, spatial region, temporal interval, GPS location, geographic location, date/time, hypothesis, and IRB/IACUC approval. Roles include specimen source role, specimen capture role, and specimen collector role. Qualities include gender, age, health status, and pathogenic disposition. Relations include has_input, has_output, plays, has_specification, has_part, denotes, located_in, instance_of, is_about, has_authorization, has_affiliation, has_quality, and has_disposition.]

Metadata Processes

[Figure: process chain across Investigation, Specimen Isolation, Material Processing, Sequencing Assay, Data Processing, and Quality Assessment. Entities include specimen source (organism or environmental), specimen collector, specimen, input sample, enriched NA sample, microorganism, microorganism genomic NA, reagents, technician, equipment, isolation protocol, primary data, sequence data, genotype/serotype/gene data, sequence data record, and GenBank ID. Processes include the specimen isolation process, sample processing, sequencing assay, quality assessment assay, data archiving process, and data transformations (image processing, assembly, variant detection, serotype marker detection, gene detection). Relations include has_input, has_output, has_specification, has_part, is_about, denotes, has_quality, instance_of, and located_in, with each process located_in a temporal-spatial region.]

Outcome of Metadata Standards WG

  • Consistent metadata captured across GSCID
  • Guidance to collaborators regarding metadata expectations for sequencing and analysis services
  • Support more standardized BRC interface development
  • Harmonization with related stakeholders: Genomic Standards Consortium MIxS, OBO Foundry OBI, and NCBI BioSample
  • Represented in the context of an extensible semantic framework

Conclusions

  • Metadata standards for microorganism sequencing projects
      • Bottom-up approach focuses standard on important features
      • Harmonizing with related standards from the Genomic Standards Consortium, OBO Foundry and NCBI
      • Being beta-tested by GSCIDs for adoption by all NIAID-sponsored sequencing projects
  • Utility of semantic representation
      • Identified gaps in data field list (e.g. temporal components)
      • Includes logical structure for other, project-specific data fields (extensible)
      • Identified gaps in ontology data standards (use case-driven standard development)
      • Identified commonalities in data structures (reusable)
      • Support for semantic queries and inferential analysis in future
  • Ontology-based framework is extensible
      • Sequencing => omics
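The semantic representation described above can be illustrated with subject-relation-object triples. The Python sketch below is a toy model, not the working group's OBI-based implementation: the relation names (`has_input`, `has_output`, `has_specification`, `located_in`) and the entity labels are taken from the slides, but the specific links between them and the helper functions `objects` and `derived_from` are assumptions made for illustration.

```python
from collections import defaultdict

# Assumed links between processes and entities; the slides name these
# terms, but the exact wiring here is illustrative.
TRIPLES = [
    ("specimen isolation procedure", "has_input", "organism"),
    ("specimen isolation procedure", "has_output", "specimen"),
    ("specimen isolation procedure", "has_specification", "isolation protocol"),
    ("specimen isolation procedure", "located_in", "temporal-spatial region"),
    ("sample processing", "has_input", "specimen"),
    ("sample processing", "has_output", "enriched NA sample"),
    ("sequencing assay", "has_input", "enriched NA sample"),
    ("sequencing assay", "has_output", "primary data"),
    ("data transformation", "has_input", "primary data"),
    ("data transformation", "has_output", "sequence data"),
]


def objects(subject, relation, triples=TRIPLES):
    """Return every object linked to `subject` by `relation`."""
    return [o for s, r, o in triples if s == subject and r == relation]


def derived_from(entity, triples=TRIPLES):
    """Walk has_output/has_input links backwards to list an entity's sources."""
    producers = defaultdict(list)  # output entity -> processes producing it
    inputs = defaultdict(list)     # process -> its input entities
    for s, r, o in triples:
        if r == "has_output":
            producers[o].append(s)
        elif r == "has_input":
            inputs[s].append(o)

    lineage, stack, seen = [], [entity], {entity}
    while stack:
        current = stack.pop()
        for process in producers[current]:
            for source in inputs[process]:
                if source not in seen:
                    seen.add(source)
                    lineage.append(source)
                    stack.append(source)
    return lineage


if __name__ == "__main__":
    print(derived_from("sequence data"))
```

Because every process shares the same input/output pattern, a single traversal can answer provenance questions such as which organism a sequence record ultimately came from. This is one simple form of the semantic queries the conclusions anticipate.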

Subgroups

Pathogen-Specific Metadata Sub-Groups

Sub-groups: Viruses; Bacteria; Eukaryotic Pathogens/Vectors; Merge WG

Members: Richard Scheuermann (ViPR/IRD) - Chair; Bruno Sobral (PATRIC) - Chair; Omar Harb (EuPathDB) - Chair; Dan Sullivan (PATRIC); Matthew Henn (Broad) - Chair; Granger Sutton (JCVI) - Chair; Lis Caler (JCVI) - Chair; Rebecca Will (PATRIC); Tim Stockwell (JCVI); Bill Nierman (JCVI); Dan Neafsey (Broad); Vivien Dugan (JCVI); Dave Wentworth (JCVI); Doyle Ward (Broad); Christina Cuomo (Broad); Ruchi Newman (Broad); Punam Mathur (NIAID); Dan Sullivan (PATRIC); David Roos (EuPathDB); Joana Da Silva (UMD); Valentina Di Francesco (NIAID); Tsega Belachew (NIAID); Brian Brunk (EuPathDB); Chris Stoeckert (EuPathDB); Erin Hine (UMD); Punam Mathur (NIAID); Andrei Gabrielian (NIAID); Lynn Schriml (UMD); Tsega Belachew (NIAID); Dave Rasko (UMD); Punam Mathur (NIAID); Jie Zheng (EuPathDB); Ishwar Chandramouliswaran (JCVI); Laura Brinkac (JCVI); Joana Carneiro da Silva (UMD); Laura Brinkac (JCVI); Rebecca Will (PATRIC); Scott Emrich (VBI); Maria Giovanni (NIAID); Cheryl Murphy (Broad); Ishwar Chandramouliswaran (JCVI); Frank Collins (VBI); Tsega Belachew (NIAID); Lisa Sadzewicz (UMD); Cheryl Murphy (Broad); Tsega Belachew (NIAID); Punam Mathur (NIAID); Luke Tallon (UMD); Luke Tallon (UMD); Ishwar Chandramouliswaran (JCVI); Peter Dudley (NIAID); Vivien Dugan (JCVI); Justin Petersen (Broad); Laura Brinkac (JCVI); Andrei Gabrielian (NIAID); Justin Petersen (Broad); Andrei Gabrielian (NIAID); Cheryl Murphy (Broad); Valentina Di Francesco (NIAID); Ruchi Newman (Broad); Peter Dudley (NIAID); Luke Tallon (UMD); Peter Dudley (NIAID); Alison Yao (NIAID); Chris Stoeckert (EuPathDB); Alison Yao (NIAID); Owen White (UMD); Justin Petersen (Broad); Lynn Schriml (UMD); Lynn Schriml (UMD); Peter Dudley (NIAID); Alison Yao (NIAID); Lynn Schriml (UMD); Karen Nelson (JCVI); Karen Nelson (JCVI); Karen Nelson (JCVI); Karen Nelson (JCVI); Richard Scheuermann (ViPR/IRD); Richard Scheuermann (ViPR/IRD); Richard Scheuermann (ViPR/IRD)

Metadata WG

Metadata Working Group Members and Emails

JCVI
  • Karen Nelson: KNelson@jcvi.org
  • Bill Nierman: wnierman@jcvi.org
  • Granger Sutton: GSutton@jcvi.org
  • Tim Stockwell: tstockwell@jcvi.org
  • Lis Caler: Ecaler@jcvi.org
  • David Wentworth: DWentworth@jcvi.org
  • Ishwar Chandramouliswaran: IChandramouliswaran@jcvi.org
  • Laura Brinkac: lbrinkac@jcvi.org
  • Vivien Dugan: VDugan@jcvi.org

Broad
  • Bruce Birren: bwb@broadinstitute.org
  • Cheryl Murphy: cmurphy@broadinstitute.org
  • Matthew Henn: mhenn@broad.mit.edu
  • Dan Neafsey: neafsey@broadinstitute.org
  • Christina Cuomo: cuomo@broadinstitute.org
  • Sinead Chapman: semrich@nd.edu
  • Justin Petersen: petersen@broadinstitute.org

UMD
  • Claire Fraser-Liggett: cmfraser@som.umaryland.edu
  • Luke Tallon: ljtallon@som.umaryland.edu
  • Lisa Sadzewicz: LSadzewicz@som.umaryland.edu
  • Erin Hine: ehine@som.umaryland.edu
  • Lynn Schriml: lschriml@som.umaryland.edu
  • Owen White: owhite@som.umaryland.edu

PATRIC
  • Bruno Sobral: sobral@vbi.vt.edu
  • Rebecca Will: rwill1@vbi.vt.edu
  • Ron Kenyon: rkenyon@vbi.vt.edu
  • Dan Sullivan: dsulliva@vbi.vt.edu

ViPR
  • Richard Scheuermann: richard.scheuermann@utsouthwestern.edu
  • Burke Squires: Richard.Squires@UTSouthwestern.edu

EuPathDB
  • David Roos: droos@sas.upenn.edu
  • Brian Brunk: brunkb@pcbi.upenn.edu
  • Omar Harb: oharb@pcbi.upenn.edu
  • Chris Stoeckert: stoeckrt@pcbi.upenn.edu

NIAID
  • Maria Giovanni: mgiovanni@mail.nih.gov
  • Valentina Di Francesco: vdifrancesco@niaid.nih.gov
  • Andrei Gabrielian: gabr@niaid.nih.gov
  • Punam Mathur: mathurpu@niaid.nih.gov
  • Tsega Belachew: belachewt@mail.nih.gov
  • Peter Dudley: dudleype@niaid.nih.gov

VBI
  • Scott Emrich: semrich@nd.edu
  • Frank Collins: frank@nd.edu

Merge WG

Members and Emails

  • Dan Sullivan: dsulliva@vbi.vt.edu
  • Rebecca Will: rwill1@vbi.vt.edu
  • Vivien Dugan: VDugan@jcvi.org
  • Ruchi Newman: rnewman@broadinstitute.org
  • Joana Carneiro da Silva: jcsilva@som.umaryland.edu
  • Chris Stoeckert: stoeckrt@pcbi.upenn.edu
  • Maria Giovanni: mgiovanni@mail.nih.gov
  • Karen Nelson: KNelson@jcvi.org
  • Richard Scheuermann: richard.scheuermann@utsouthwestern.edu
  • Punam Mathur: mathurpu@niaid.nih.gov