Digital Preservation Cloud Services for Libraries and Archives

1. DLF 2011, Baltimore, MD
   Digital Preservation Cloud Services for Libraries and Archives
   Quyen L. Nguyen, NARA

2. Outline
  • Introduction
  • LDPaaS
  • Levels of Service and Cost Model
  • Related Work
  • Conclusion
  Oct. 31, 2011, 2011 DLF Forum

3. Functional Requirements
  • Need for Long-Term Digital Preservation
      • Policy mandates: retention of government records
      • Knowledge function: preserve digitized books and born-digital materials
      • History-oriented mandates: preservation of cultural heritage
  • Challenges
      • Rapid growth of digital objects that require archiving
      • Data heterogeneity

4. Desired System Characteristics
  • Dynamic Scalability
      • Increase as well as decrease
  • Cost-effective Maintainability
      • Operation cost
      • Patches: COTS, security
  • Evolvability
      • Technology refresh
      • New features and services

5. Cloud Computing Characteristics
  • Elasticity
      • Computing and storage resources
      • Three levels of cloud services: IaaS, PaaS, and SaaS
      • Quick provisioning (e.g. Cloud Market [3])
      • Pay-as-you-go
  • Cost-efficient Maintenance
      • Economies of scale
      • Maximizing utilization of computing resources
  • Evolvability by configuration

6. OAIS Reference Model
  [Figure: OAIS Reference Model]

7. LDPaaS
  • Long-term Digital Preservation as a Cloud Service
  • Encompasses major OAIS functionalities
      • Not only a storage service,
      • but also a preservation service according to customers' policies: retention period, preservation level, and access level
  • Beneficial to Cloud Service Consumer
      • Relieves records owners of the burden of engineering and provisioning preservation infrastructure
  • Beneficial to Cloud Service Provider
      • Realizes economies of scale by sharing unused computing resources

8. Ingest Provisioning Challenges
  • Unpredictability due to business policies
      • Uneven flow of transfer volume
      • Various object sizes, hence object numbers
      • Various object types
  • Cloud Computing benefits
      • Computation resources: file format identification and application of integrity seal
      • Storage resources: ingest processing buffer space

9. Access Provisioning Challenges
  • Unpredictability of publishing
      • Volume of publishable data sets
      • Spikiness of access request load
      • Access types: Storage Delivery Networks vs Content Delivery Networks
  • Cloud Computing benefits
      • Computation: access-time visualization, zooming, conversion to access format
      • Storage: high-efficiency access disk cache

10. Preservation Provisioning Challenges
  • Prominent preservation methods
      • Bit-level: error detection and correction capabilities
      • Transformation
          • Computing resources for transformation processes
          • Storage serving as a scratchpad for transformation
      • Emulation: virtual machine requirements
  • Cloud Computing benefits
      • Computation: execution of preservation algorithms
      • Storage: preservation processing buffer space

11. Storage Provisioning Challenges
  • It is all about storage capacity

  Scale of Storage Requirement    May Be Best Suited to Function as
  Hyper Large-Scale               Cloud Provider
  Moderate-to-Small-Scale         Cloud Consumer

  • Could there be a Community Cloud?

12. Software Paradigms
  [Figure: software paradigms. Labels: Structural, Object-oriented, SOA, Cloud; Virtualization]

13. System Architecture
  [Figure: System Architecture]

14. SOA-based Ingest Process
  • Ingest process implemented as a composite service
  • Could be implemented by BPEL
  • Ingest steps:
      • Virus Scan
      • File Format Identification (DROID)
      • Metadata Extraction (JHOVE)
      • Integrity Seal
      • Move to Preservation Storage

15. LDPaaS Levels of Service

  Service          Levels
  Ingest           IL1: Transfer Only; IL2: With Format Identification; IL3: Metadata Extraction
  Preservation     PL1: Bit; PL2: Content; PL3: Content, Behavior & Formatting
  Discovery        DL1: Metadata search; DL2: Full content search
  Access           AL1: Passive Viewer; AL2: Interactive Viewer; AL3: Content Mining
  Storage          SL1: Delayed Access - Near-Line Storage; SL2: Rapid Access - High Performance Storage
  Content Server   CL1: Just-in-Time Active; CL2: Always Active
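
The composite ingest service on the SOA-based Ingest Process slide, combined with the ingest levels IL1 to IL3, might be sketched roughly as follows. This is only an illustration under several assumptions that the slides do not state: the ingest levels are treated as cumulative, the virus scan and integrity seal run at every level, the seal is taken to be a SHA-256 digest, and the format and metadata steps are simple stand-ins rather than actual DROID or JHOVE calls. All names (`IngestItem`, `build_ingest`, `ingest`) are hypothetical.

```python
import hashlib
import mimetypes
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class IngestItem:
    """One transferred object plus whatever the ingest steps record about it."""
    name: str
    data: bytes
    info: Dict[str, object] = field(default_factory=dict)


Step = Callable[[IngestItem], IngestItem]


def virus_scan(item: IngestItem) -> IngestItem:
    # Placeholder: a real service would call an antivirus engine here.
    item.info["virus_free"] = True
    return item


def identify_format(item: IngestItem) -> IngestItem:
    # Stand-in for a DROID-style identifier: guesses from the file name only.
    mime, _ = mimetypes.guess_type(item.name)
    item.info["format"] = mime or "application/octet-stream"
    return item


def extract_metadata(item: IngestItem) -> IngestItem:
    # Stand-in for a JHOVE-style extractor: records only the object size.
    item.info["size_bytes"] = len(item.data)
    return item


def integrity_seal(item: IngestItem) -> IngestItem:
    # Assumption: the integrity seal is a SHA-256 digest of the content.
    item.info["sha256"] = hashlib.sha256(item.data).hexdigest()
    return item


def build_ingest(level: str) -> List[Step]:
    """Compose the step list for ingest level IL1, IL2, or IL3 (assumed cumulative)."""
    if level not in ("IL1", "IL2", "IL3"):
        raise ValueError(f"unknown ingest level: {level}")
    steps: List[Step] = [virus_scan]
    if level in ("IL2", "IL3"):
        steps.append(identify_format)
    if level == "IL3":
        steps.append(extract_metadata)
    steps.append(integrity_seal)
    return steps


def ingest(item: IngestItem, level: str, storage: Dict[str, IngestItem]) -> IngestItem:
    """Run the composed steps, then move the item to preservation storage."""
    for step in build_ingest(level):
        item = step(item)
    storage[item.name] = item  # a dict stands in for preservation storage
    return item
```

Calling `ingest(IngestItem("report.pdf", b"..."), "IL3", {})` runs all five steps, while `"IL1"` skips format identification and metadata extraction. A BPEL engine would orchestrate separate services instead of local functions; the list of callables only mirrors the composition idea.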

16. Level of Service Definitions
  Definition 1. Each Content Server has a set of LoS formalized by the following 6-tuple:
      C = (CL, IL, PL, DL, AL, SL).
  Definition 2. Since a customer can have one or more Content Servers, a customer's SLA is specified by the n-tuple:
      L = (C1, ..., Cn),
  if the customer has signed up for n Content Servers, with each Ci being a 6-tuple defined according to Definition 1.

17. LoS - Example 1
  Digital Library Repository
  Define Content Server C1 by C1 = (CL1, IL2, PL2, DL1, AL2, SL2)
  • Content Server: CL1 - Active Just-in-Time - this repository is sporadically used
  • Ingest Service: IL2 - File Format Identification
  • Preservation Service: PL2 - Preservation at the Content Level
  • Discovery Service: DL1 - Metadata Search
  • Access Service: AL2 - Interactive Viewer is provided for access
  • Storage Service: SL2 - Rapid Access, High Performance Disk - the volume is static

18. LoS - Example 2
  Digital Library Repository for Research Publications
  Two Sets of Records Stored in Two Different Content Servers: C1 and C2
  • C1 - Relatively Small Volume of High-Demand Digital Assets
    C1 = (CL1, IL2, PL3, DL1, AL1, SL2)
      • CL1 - Active Just-in-Time Content Server
      • IL2 - File Format Identification
      • PL3 - Preservation at the Content and Formatting Level
      • DL1 - Metadata Search
      • AL1 - Passive Viewer
      • SL2 - High Performance, Rapid Access Storage
  • C2 - Backend Repository, Volume Increasing with Time
    C2 = (CL2, IL2, PL3, DL2, AL1, SL1)
      • CL2 - Always Active Content Server
      • IL2 - File Format Identification
      • PL3 - Preservation at the Content and Formatting Level
      • DL2 - Full Content Search
      • AL1 - Passive Viewer
      • SL1 - Delayed Access Storage
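
Definitions 1 and 2 and the first two examples could be written down in code along these lines. It is a minimal sketch, not part of the presentation: the names `ContentServerLoS`, `make_los`, and `ALLOWED` are hypothetical, and the allowed level codes are taken from the LDPaaS Levels of Service slide.

```python
from typing import NamedTuple, Tuple

# Level codes per service, as listed on the Levels of Service slide.
ALLOWED = {
    "CL": {"CL1", "CL2"},
    "IL": {"IL1", "IL2", "IL3"},
    "PL": {"PL1", "PL2", "PL3"},
    "DL": {"DL1", "DL2"},
    "AL": {"AL1", "AL2", "AL3"},
    "SL": {"SL1", "SL2"},
}


class ContentServerLoS(NamedTuple):
    """Definition 1: the 6-tuple C = (CL, IL, PL, DL, AL, SL)."""
    CL: str
    IL: str
    PL: str
    DL: str
    AL: str
    SL: str


def make_los(*levels: str) -> ContentServerLoS:
    """Build a 6-tuple and reject level codes that do not exist for a service."""
    los = ContentServerLoS(*levels)
    for service, level in los._asdict().items():
        if level not in ALLOWED[service]:
            raise ValueError(f"{level!r} is not a valid {service} level")
    return los


# Definition 2: an SLA is an n-tuple of Content Server 6-tuples.
SLA = Tuple[ContentServerLoS, ...]

# Example 1: one Content Server.
example1: SLA = (make_los("CL1", "IL2", "PL2", "DL1", "AL2", "SL2"),)

# Example 2: two Content Servers, C1 and C2.
example2: SLA = (
    make_los("CL1", "IL2", "PL3", "DL1", "AL1", "SL2"),
    make_los("CL2", "IL2", "PL3", "DL2", "AL1", "SL1"),
)
```

Using a named tuple keeps the positional order of Definition 1 while still allowing access by service, for example `example2[1].SL`.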

19. LoS - Example 3
  Sarbanes-Oxley Act Compliance Business Archive
  Retain and Preserve Records in a Sliding Time Window of Seven Years
  C1 = (CL1, IL2, PL1, DL2, AL1, SL1)
  • PL1 - Preservation Service at the Bit Level
      • Retention Period of Seven Years
      • Elaborate Preservation not Needed
  • SL1 - Delayed Access Storage
      • Archive Intended for Audit Purposes Only - Rapid Access to Data not Essential

20. Cost Model
  • Cost is one of the crucial elements in Cloud Computing
  • Let O = (V, N) be a body of N digital objects with total volume V
  • Cost(O, Service) depends on the level of service. It is a function of V or N or both. Examples:
      • f_IL1 - Utilization cost for digital object transfer; varies with V
      • f_IL2 - File type identification; varies with N
      • f_IL3 - Metadata extraction; varies with N
  • totalCost(O, C) = Σ Cost(O, Service), summed over Service ∈ {Ingest, Preservation, Discovery, Access, Storage}

21. Cost Model Example
  • Let C1 = (CL2, IL2, PL1, DL1, AL1, SL1).
  • Assume:
      f_CL2(V, N) = 20 V + 100 N
      f_IL2(N) = 10 N
      f_PL1(V) = 20 V
      f_DL1(N) = 30 N
      f_AL1(V) = 30 V
      f_SL1(V) = 40 V
  • For set O1 of objects with V1 = 10 GB and N1 = 10^6: totalCost(O1, C1) = 140,000,740
  • For set O2 of objects with V2 = 10^3 GB and N2 = 10^2: totalCost(O2, C1) = 88,000
  • Note: totalCost(O2, C1) < totalCost(O1, C1), although V2 > V1

22. Related Work
  • CiteSeer study by Teregowda [2]:
      • Examines each service in the architecture stack in terms of feasibility and cost of migrating and hosting in the Cloud.
      • Possible integration with Cloud Storage thanks to current virtualized storage component.
  • DuraCloud [5]:
      • Open source platform for digital libraries and archives
      • Adapters to commercially available Cloud Storage services
  • Strategies and SLAs for bit-level preservation by Zierau [6]:
      • Various sub-levels of bit-preservation.
  • archives and indexes data from websites and social networks.
  • - Long-Term Digital Retention and Preservation Reference Model: cloud-based digital archive.
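
The arithmetic of the Cost Model Example can be checked with a short sketch. The coefficients are copied from the transcript as it stands; the names `COST_FUNCTIONS` and `total_cost` are hypothetical, and the sum here includes the Content Server term f_CL2, as the example does. With these coefficients the total reduces to 110 V + 140 N, which gives 140,001,100 for O1 and 124,000 for O2. The slide's figures (140,000,740 and 88,000) would instead correspond to 74 V + 140 N, so one of the volume coefficients may have been transcribed differently from the original slide. The comparison totalCost(O2, C1) < totalCost(O1, C1) holds in both cases.

```python
from typing import Callable, Dict, Iterable

# Each cost function takes (volume in GB, number of objects).
CostFn = Callable[[float, float], float]

# Coefficients as given in the Cost Model Example slide transcript.
COST_FUNCTIONS: Dict[str, CostFn] = {
    "CL2": lambda v, n: 20 * v + 100 * n,
    "IL2": lambda v, n: 10 * n,
    "PL1": lambda v, n: 20 * v,
    "DL1": lambda v, n: 30 * n,
    "AL1": lambda v, n: 30 * v,
    "SL1": lambda v, n: 40 * v,
}


def total_cost(volume_gb: float, count: float, levels: Iterable[str]) -> float:
    """Sum the per-service costs for one Content Server configuration."""
    return sum(COST_FUNCTIONS[level](volume_gb, count) for level in levels)


C1 = ("CL2", "IL2", "PL1", "DL1", "AL1", "SL1")

cost_o1 = total_cost(10, 10**6, C1)     # O1: V1 = 10 GB, N1 = 10^6 objects
cost_o2 = total_cost(10**3, 10**2, C1)  # O2: V2 = 10^3 GB, N2 = 10^2 objects

print(cost_o1, cost_o2, cost_o2 < cost_o1)
```

The point of the example survives the check: a collection with many small objects can cost more than a much larger collection with few objects, because several services scale with N rather than V.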

23. Conclusion
  • Proposed LDPaaS concept: why is it useful?
      • Beneficial to large organizations
      • Beneficial to small organizations
  • Notional cost model useful for establishing a price model associated with a published SLA set.
  • Contend that Cloud Storage Service vendors can augment their portfolios to provide LDPaaS.
  • Community Cloud for Preservation
      • Environment for more collaboration and sharing

24. References
  1. Michael Armbrust et al. A View of Cloud Computing. Communications of the ACM, Volume 53, No. 4, April 2010.
  2. P. Teregowda, B. Urgaonkar, and C. L. Giles. Cloud Computing: A Digital Libraries Perspective. 2010 IEEE 3rd International Conference on Cloud Computing, Miami, FL, July 2010.
  3. Stephen Abrams, Patricia Cruse, and John Kunze. Preservation Is Not a Place. The International Journal of Digital Curation, Issue 1, Volume 4, 2009.
  4. Steve Hitchcock, David Tarrant, Adrian Brown, Ben O'Steen, Neil Jefferies, and Leslie Carr. Towards Smart Storage for Repository Preservation Services. The International Journal of Digital Curation, Issue 1, Volume 5, 2010.
  5. DuraCloud. Available:
  6. Eld Zierau, Ulla Bogvad Kejser, and Hannes Kulovits. Evaluation of Bit Preservation Strategies. 7th International Conference on Preservation of Digital Objects (iPRES2010), Sep. 19-24, 2010, Vienna, Austria.

25. Disclaimer
  The content of this presentation is the personal opinion of the author and does not necessarily reflect any position of the U.S. Government or the National Archives and Records Administration.

26. Thank You!
  Any questions? mailto:quyen.nguyen@nara.gov

