Workflows for Digital Preservation and Curation Workshop Open Repositories 2012

Workflows for Digital Preservation and Curation Workshop, Open Repositories 2012. Stacy Kowalczyk, Beth Plale, Kavitha Chandrasekar, Yiming Sun. Agenda: Introduction to Digital Curation; Workflow Systems Overview; Workflows for Digital Curation; Break; Implementing Workflows in Trident.


Workflows for Digital Preservation and Curation Workshop
Open Repositories 2012
Stacy Kowalczyk, Beth Plale, Kavitha Chandrasekar, Yiming Sun

Agenda
  • Introduction to Digital Curation
  • Workflow Systems Overview
  • Workflows for Digital Curation
  • Break
  • Implementing Workflows in Trident
  • Modifying a Workflow
  • Create a new Workflow
  • Creating Components
  • Wrap up
7/10/12

Acknowledgements
  • This workshop was made possible through a generous grant by Microsoft Research
  • And by the Data to Insight Center of Indiana University's Pervasive Technology Institute
  • Quan Zhou, Ph.D. student and developer, for his help with developing components, workflows, and documentation

Introduction to Digital Curation
  • Defining curation
  • Infrastructure for curation
  • Curating the files
  • Curating the object

Defining Curation
Digital curation involves maintaining, preserving and adding value to digital research data throughout its lifecycle. The active management of research data reduces threats to their long-term research value and mitigates the risk of digital obsolescence. Meanwhile, curated data in trusted digital repositories may be shared among the wider research community. As well as reducing duplication of effort in research data creation, curation enhances the long-term value of existing data by making it available for further high quality research.
Digital Curation Center

Curation Infrastructure
  • Repository
  • Public access
  • Policies
  • Processes
  • Institutional support

Curating the Files
  • Bitstream Integrity
      • Fixity
      • Duplicate copies
  • File integrity
      • Format verification
      • Format validation

File Formats
  • Durability
  • Transparency
  • Documentation
  • Ubiquity
  • Renderability
  • Longevity

Format Choices
  • Master files for preservation
      • Highest quality
      • Highest fidelity
      • Lossless
  • Derivative files for active use and delivery
      • Smallest possible for user needs
      • Fast delivery
      • Easy to use format

Curating the Object
  • Context
  • Relationships between files
  • Technical metadata
  • Intellectual metadata
  • To Metadata
  • Implicit/explicit context

Curation Activities
  • Ongoing verification
  • File integrity
  • Object integrity
  • Metadata management
  • Management of obsolescence
  • Hardware
  • Software
  • Formats
  • Documentation

Workflow Systems
  • Purpose of workflow systems
  • Types of workflow systems
  • Trident Workflow Workbench

Why Workflow Systems
  • Repetitive and mundane activities simplified
  • Facilitates and enforces best practices
  • Enables efficient scheduling
  • Machinery for coordinating the execution of services and linking together resources
  • Facilitates outreach to researchers for direct deposit and automatic curation

Types of Workflow Systems
  • Kepler
  • BPEL
  • Ptolemy II
  • Triana
  • Taverna

Trident
  • Open source project
  • Based on Microsoft Workflow Foundation classes
  • Supported by Microsoft Research and academic researchers
  • Integrates with myExperiment
  • Well accepted in the research community: well over 100 peer-reviewed and white papers were discovered from one scholarly aggregation service

Trident Components
  • Trident Management Studio
  • Trident Workflow Composer
  • Trident Workflow Application
  • Microsoft SQL Server
  • Trident Silverlight client for web execution of workflows
  • Microsoft Visual Studio: C# development environment

[Diagram; labels: Design, Visual Workflow Composer, Trident Registry, Workflow Packages (domain specific), Trident Runtime Services, Windows Workflow Foundation, .NET 4.0, Provenance, Monitoring, Workflow Scheduling Service, Admin, Admin Console, Workflow Monitor, Community, Web Portal, search, Launch Monitor, Workflow Launcher, Results Repository, Workflow Repository (myExperiment), Data Access Layer, Data Object Model (data source abstraction layer), Data Storage Providers: SQL Server, Local XML store]

Workflows for Curation
  • Goals
      • Systematic and repeatable processes
      • Helps remove human errors
  • Data Ingest
      • Integrity checks
      • Format normalization/derivative generation
      • Metadata creations
  • Curation activities
      • Integrity checks
      • Format migration
      • Media migration

Data Ingest Workflows
Scenarios
  • Single part objects (individual images)
  • Multi-part objects (a book)
  • Multiple instantiations of a logical object (word, pdf and ppt of a research paper)
  • Multiple multi-part objects (a group of letters)
  • Research data products (multiple files of various types)
  • Scientific workflow process

Single Part Objects Workflow
  • Magic Lantern Slides
  • Individual files
  • Spreadsheet
[Workflow diagram; steps: Derivative Generation, Format Validation and Verification, Fixity Check, Create Tech Metadata, Create Intellectual Metadata, Create Object Metadata, Persistent Identification, Deposit in Repository, Image Quality Checks]

Multi-part Object Workflow
  • Comic Book
  • RIS
  • Set of .tif files
[Workflow diagram; steps: Create Tech Metadata, Derivative Generation, Format Validation and Verification, Fixity Check, Object Integrity, Create Intellectual Metadata, Create Object Metadata, Persistent Identification, Deposit in Repository, Image Quality Checks]

Multiple Instantiations of a Logical Object Workflow
  • Papers
  • Each logical object per subdirectory
  • RIS, word file and (perhaps) supplemental file
[Workflow diagram; steps: Format Normalization, Format Validation and Verification, Fixity Check, Create Tech Metadata, Create Intellectual Metadata, Create Object Metadata, Persistent Identification, Deposit in Repository, Derivative Generation]

Multiple Multi-part Object Workflow
  • Ball collection
  • RIS for collection and Inventory spreadsheet
  • Each logical object in separate subdirectory
[Workflow diagram; steps: Create Tech Metadata, Derivative Generation, Format Validation and Verification, Fixity Check, Object Integrity, Create Intellectual Metadata, Create Object Metadata, Persistent Identification, Deposit in Repository, Image Quality Checks, Collection Integrity, Create Collection Metadata]

Research Data Products
  • Vortex
  • Each subdirectory is an experiment with FGDC metadata
[Workflow diagram; steps: Compress Data, Fixity Check, Create Intellectual Metadata, Create Object Metadata, Persistent Identification, Deposit in Repository]

Workflow Components
Format Conversions (for normalization and derivative generation)
  • .xlsx to .csv
  • .docx to .pdf
  • .ppt to .pdf
  • .tif to .jpg
  • Zipping on demand
  • Image (.tif or .jpg) to .pdf

Workflow Components 2
  • Context creation
      • MIX data generator and validator
      • METS data generator and validator
  • Data Integrity
      • MD5 checksum generator
      • MD5 checksum validator
      • JHOVE for format verification and validation
      • Group validation (for object integrity)

Post Deposit Curation Workflow
Scenarios
  • Fixity verification
  • Format normalization
  • New or additional derivative generation
  • Media migration
  • Persistent identifier updates
  • Metadata updates

Workflows in Trident

Executing Workflows
  • Individual object ingest
  • Multipart object ingest
  • Multiple multipart object ingest
  • Multiple instantiations of a single logical object
  • Research data ingest
  • Scientific workflow
  • Fixity check curation workflow

Implementing Workflows in Trident
  • Launch the Remote Desktop application
  • User: AMAZONA-JJOAL14\oruser
  • PWD: TridentOR12!!
  • Computer IP addresses are on the slip of paper being passed out now.

Trident Workflow Composer

Participant Exercises

Modifying Workflows
Add components to existing workflows
  • Select the Individual Ingest Workflow
      • Add DOI component
      • Before the METS generator component
      • Make the connections
  • Select the Group Ingest Workflow Comic
      • Add the METS generation component
      • After the last component in the main line
      • Make the connections

Simple Curation Workflow Creation
Create a workflow for a simple curation process: validate MD5 checksums
  • Define a directory of image files
  • Define a METS file
  • Define an output location
  • Link the MD5 checksum validation component
  • Link the MD5 checksum report component
  • Save and execute the workflow

Creating Components
Exercise:
  • Create a new Trident workflow component
  • Implement the MARCXML to MODS Stylesheet
  • Kavitha Chandrasekar will demonstrate the process

Wrap Up
  • Thumb drives
  • Trident codeplex site
  • Trident listserv
  • Contributing to Trident
  • Workshop Evaluation Form
  • Ongoing conversation

Contacts for Further Discussion
  • Trident CodePlex site:
  • Trident Listserv:
  • Stacy Kowalczyk: skowalcz@indiana.edu
  • Kavitha Chandrasekar:
  • Yiming Sun:
  • Quan Zhou:
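
The simple curation exercise above (validate MD5 checksums for a directory of image files) can be sketched outside Trident in a few lines of Python. This is only an illustration under stated assumptions, not the workshop's Trident components: the function names `md5_of_file`, `generate_manifest`, and `validate_manifest` are hypothetical, and a plain dictionary stands in for the checksums that would presumably be recorded in the METS file.

```python
import hashlib
from pathlib import Path


def md5_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def generate_manifest(directory: Path) -> dict[str, str]:
    """Checksum generator (hypothetical): map each file name to its MD5."""
    return {
        entry.name: md5_of_file(entry)
        for entry in sorted(directory.iterdir())
        if entry.is_file()
    }


def validate_manifest(directory: Path, manifest: dict[str, str]) -> dict[str, str]:
    """Checksum validator (hypothetical): 'ok', 'mismatch', or 'missing' per file."""
    report = {}
    for name, expected in manifest.items():
        candidate = directory / name
        if not candidate.is_file():
            report[name] = "missing"
        elif md5_of_file(candidate) == expected.lower():
            report[name] = "ok"
        else:
            report[name] = "mismatch"
    return report
```

Running `generate_manifest` at ingest and `validate_manifest` against the stored result later yields a per-file report similar in spirit to the MD5 checksum report step. MD5 appears here only because the slides name it as the fixity algorithm.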

