Principles for Sustainable Data Curation

  • Published on
    20-Feb-2016

DESCRIPTION

Principles for Sustainable Data Curation. Steven Worley, Computational and Information Systems Laboratory, NCAR. Can Research Library Repositories Benefit from the Federal Lab Experience? Topics: My perspective: Research Data Archive @ NCAR; Principles for Sustainable Data Curation.

Transcript

Slide 1: Principles for Sustainable Data Curation
Steven Worley
Computational and Information Systems Laboratory, NCAR

Speaker notes: It is a pleasure to speak to the leadership of the Association of Research Libraries. Over the past 5 years the data curation and stewardship community has been drawing closer to the library community, for good reasons. By working together we can better support science research and productivity. Today is another chance to continue that conversation. Title, dry.

Slide 2: Can Research Library Repositories Benefit from the Federal Lab Experience?

Speaker notes: The best outcome for me is that you take away some best practices that can be applied in your libraries as they develop digital repositories.

Slide 3: Topics
21 March 2012, ARL, Leadership Fellows

  • My perspective: Research Data Archive @ NCAR
  • Principles for Sustainable Data Curation
      • Stable Funding
      • Knowledgeable Staff
      • Robust Digital Storage
      • Protection from Loss
      • Data and Metadata Format
      • Partnerships
  • Data Management Evolution

Slide 4: My perspective: Research Data Archive @ NCAR

Core Data Categories
  • Operational and Reanalysis Model Outputs
  • Meteorological and Oceanographic Observations
  • Remote Sensing Observations
  • Topography, Bathymetry, Vegetation, and Land Use

Speaker notes: A suite of information to support Earth Systems Research.

Slide 5: My perspective: Research Data Archive @ NCAR

Purposes
  • Support climate and weather research at NCAR and UCAR universities
  • Extend data service worldwide

Basic Metrics
  • Established in the 1960s
  • 600+ datasets, 4M+ files
  • 70+ datasets growing on a daily to monthly basis

Slide 6: My perspective: Research Data Archive @ NCAR

Slide 7: [Diagram: simplified data life cycle. Labels: US, International, Users, Requests and Needs, Data, Assistance, Feedback, Management, Supervision, Guidance, Integrity, Access, Stewardship, Curation, Archiving, Metadata, Data Integrity, Preservation.]

Speaker notes: Simplified data life cycle; the focus today is on the curation part.

Slide 8: Sustainable Curation: Stable Funding

Permits:
  • Flexibility
  • Evolution of data management to meet expectations
  • A holistic approach not driven by narrowly defined projects
  • Taking advantage of unplanned opportunities

Necessary to keep the collection viable for the long term.

Slide 9: Sustainable Curation: Knowledgeable Staff

Data domain knowledge enables staff to:
  • Understand data and do integrity checks
  • Choose a data organization that fits the science discipline
  • Design appropriate access systems and do consulting

Consistent staffing levels nurture:
  • Professionals dedicated to best practices

Human-based knowledge cannot be underestimated.

Speaker notes: This is a big challenge for new repositories that have a broad data scope. We find that 5-10 years of experience are needed to develop an expert data scientist.

Slide 10: Sustainable Curation: Robust Digital Storage

Keep pace with digital media evolution:
  • Expect data migration every 2-5 years
      • Tape, disk capacity, etc.
  • Plan, test, and implement migration carefully
      • Mistakes are irrecoverable!
      • Use knowledgeable staff heavily

Why evolve?
  • Users expect more data with faster access
  • Media will eventually fail

Speaker notes: This rate of media evolution is new for library experts.

Slide 11: Sustainable Curation: Protection from Loss

Create backup data and test disaster recovery.

Why?
  • Physical failures
      • Environmental: power outage, fire, flood, ...
      • Hardware: disk system failure, tape degradation
  • Poor curation practices
      • Metadata loss
      • Accidental data overwrites and deletions

Solutions
  • Store the backup at a separate physical location
  • Treat metadata and data as equals, and couple them together

Speaker notes: If you lose metadata, access may not be possible (documentation, software, etc.). The question is not if this will happen, but WHEN!

Slide 12: Sustainable Curation: Protection from Loss

Slide 13: Sustainable Curation: Protection from Loss

RDA : 40%

Slide 14: Sustainable Curation: Data and Metadata Format

Formats are a serious consideration because:
  • Data access must be maintained for the long term

How?
  • Insist that data and metadata are in standard formats
  • Avoid formats that depend on a computer operating system
  • Worry about application-driven formats
      • E.g.: .xls, .xlsx, .doc, .docx, .ppt, .pptx, etc.

Challenge: scientists are reluctant to help.
Curator's nightmare: never-ending data and metadata format diversity.

Speaker notes: Proprietary formats, e.g. instrument-dependent output, are definitely out! Data in Microsoft workbooks is especially hard to deal with; exporting it is error prone, e.g. empty cells, merged cells, etc. DIVERSITY => UNSCALABLE SYSTEM, very expensive.

Slide 15: Sustainable Curation: Partnerships

  • Science productivity is enhanced by partnerships
      • Open sharing of data and metadata
      • Relies heavily on standards
  • No one archive or repository can do it all
      • BUT, users need/want it all
  • Cost saving by sharing

Slide 16: Data Management Evolution: Person-centric

1960s to 1990s

Slide 17: Data Management Evolution: Metadata-centric

1990s to 2010s

Slide 18: Summary: For Research Library Repositories

Sustainable Data Curation
  • Stable Funding
  • Knowledgeable Staff
  • Robust Digital Storage
  • Protection from Loss
  • Data/Metadata Format
  • Partnerships

Speaker notes: Curation is supported by best practices in six areas; bundled together in an operational system, they will facilitate sustainability.

Slide 19: Research Data Archive @ NCAR

http://dss.ucar.edu/
worley@ucar.edu

Speaker notes: Word cloud.
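
Illustration for the "Protection from Loss" advice: the recommendations to keep a backup at a separate location, to couple metadata with data, and to test recovery can be made concrete with a small fixity check. The following is a minimal sketch under stated assumptions, not the Research Data Archive's actual tooling. The function names (`sha256_of`, `build_manifest`, `verify_copy`) and the manifest layout are hypothetical, and it assumes data files and their metadata files live under the same directory tree so that both are checked together.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict:
    """Map every file under root to its checksum.

    Assumption: metadata files sit next to the data files, so both end up
    in the same manifest and are treated as equals.
    """
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_copy(manifest: dict, copy_root: Path) -> list:
    """Return the relative paths that are missing or altered in a copy."""
    problems = []
    for rel_path, expected in manifest.items():
        candidate = copy_root / rel_path
        if not candidate.is_file() or sha256_of(candidate) != expected:
            problems.append(rel_path)
    return problems
```

A curator might build the manifest from the primary archive, then run `verify_copy` against the off-site backup after each disaster-recovery test or media migration; an empty list means every data and metadata file arrived intact.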