Open Source Options for Digital Curation
Library 2.012, October 4, 2012
Christinger Tomer, University of Pittsburgh
Definitions of Digital Curation
According to the Digital Curation Centre in the U.K., digital curation involves maintaining, preserving, and adding value to digital research data throughout its lifecycle, covering the stewardship of data from the point of conceptualisation to its eventual disposal. It is based on the presumption that such data has multiple uses, including uses in other contexts.

An Illustration of the DCC Curation Lifecycle Model
See http://www.dcc.ac.uk/resources/curation-lifecycle-model

The Role of Libraries in Digital Curation
Heidorn argues that "[i]ncreasingly, data are being recognized as first-class intellectual objects that can undergo quality checks, peer review, distribution, and reuse. The reuse of data contributes as much to society as the reuse of a concept in a journal article. The data set can be cited and contribute to the reputation of the creator of the data for good or ill." He goes on to assert that "[l]ibraries ... have a duty to society to collect, preserve, and disseminate the intellectual output of the society, including this data." (From P. Bryan Heidorn (2011), "The Emerging Role of Libraries in Data Curation and E-science," Journal of Library Administration, 51:7-8, 662-672.)
Key Factors in Digital Curation
Identity -- "Identity is contextual: some objects are associated with information that allows identification only within a limited context (e.g., an object may be uniquely identified only within the context of objects residing on the same server), while others have enough information to make them globally identifiable (e.g., a global identifier such as a GUID or ISBN)."
Authenticity and Understandability -- Evaluation of the understandability of data requires that there be sufficient context (documentation, metadata, or provenance) describing the data, and that the data be usable.
Persistence -- See: Sally Vermaaten, et al., "Identifying Threats to Successful Digital Preservation: the SPOT Model for Risk Assessment," D-Lib Magazine 18 (September/October 2012): 8.
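The Identity factor distinguishes identifiers that are unique only within a limited context, such as a single server, from identifiers that are globally unique. The minimal sketch below makes that distinction concrete under stated assumptions. The local identifier `item/42` and both server URLs are hypothetical placeholders. The sketch supplies the missing context by qualifying the local ID with the server's base URL, then hashes the result into a name-based UUID (RFC 4122 version 5) using Python's standard `uuid` module. This is one illustrative technique, not a method any of the systems discussed here is known to use.

```python
import uuid


def global_id(server_base_url: str, local_id: str) -> uuid.UUID:
    """Derive a repeatable, globally unique ID from a locally scoped one.

    The local ID is unique only on its own server. Prefixing it with the
    server's base URL adds the missing context, and uuid5 (SHA-1, name-based)
    turns that qualified name into a fixed-length UUID.
    """
    qualified = f"{server_base_url.rstrip('/')}/{local_id.lstrip('/')}"
    return uuid.uuid5(uuid.NAMESPACE_URL, qualified)


# Hypothetical case: two servers both use the local identifier "item/42".
a = global_id("https://repo-a.example.org", "item/42")
b = global_id("https://repo-b.example.org", "item/42")

print(a != b)  # True: same local ID, different contexts
print(a == global_id("https://repo-a.example.org/", "/item/42"))  # True: derivation is repeatable
```

The derived UUID is stable for the same inputs, but it does not record where the object lives. Resolving it back to the object would require a separate registry, which this sketch does not attempt.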
More Key Factors
Renderability -- "the property that a digital object is able to be used in a way that retains the object's significant characteristics," meaning that the hardware and software necessary to render the object are available or can be reproduced through emulation.
Integrity -- Integrity of data assumes that the data can be proven to be identical, at the bit level, to some prior accepted or verified state. Data integrity may be required for usability, understandability, authenticity, trust, and thus overall quality.

Access and Usability
Data Generators
Data Seekers
Data Consumers

Archon
Archon's designers refer to it as a "simple archiving" system, but its effective use does require a working knowledge of standards for archival description. Its preview feature is especially effective in the treatment of visual materials.
Archon, which has been developed by a group from the University of Illinois, supports the creation of records conforming to MARC and EAD, as well as their import and export.

ICA AToM
ICA AToM allows creators to build what are effectively compound documents, and allows users to scroll through thumbnails of the documents in the interface.
Artefactual Systems in British Columbia is developing ICA AToM 2.0, which will be available both as open source, community-supported software and as a fee-based service.

ResourceSpace
Document View under ResourceSpace

CWIS

Islandora
Islandora is a hybrid, combining the Fedora Commons repository system as the back end with Drupal, the LAMP-based CMS, as the front end. This hybrid approach is gaining popularity among developers, who believe that successful design must provide a place for narrative treatments.
The idea underlying Islandora is that a creator places an object in the repository, Fedora Commons, and then links that object to other materials (e.g., text and images) that are mounted through the CMS, a Drupal instance.

Omeka
Omeka is based on the "LAMP" architecture.
Perhaps its most important feature is its modular design.

Omeka's Modularity
Omeka supports a wide array of plugins designed to enhance the functionality of the system. One of the examples in this illustration is the Creative Commons Chooser, which allows the creator of an object to select the appropriate license from the entry interface.

Omeka.net

DSpace
DSpace is based on Java and Apache Tomcat and runs equally well on *nix or Windows systems.

ePrints3
ePrints is another LAMP-based system, distinguished by its reliance on Perl and by its popularity, which owes much to its ease of use, particularly in the generation of metadata.

DSpace, with Manakin Interface
HubZero Client Interface
HubZero, which was developed at Purdue University, is based on the Joomla content management system and uses MySQL as its back end. The main aim of the system is to provide a platform on which researchers can mount and annotate datasets.

Penn State's ScholarSphere
Released on September 24, 2012, ScholarSphere is another hybrid system, based on Hydra, a Ruby on Rails front end, and Fedora Commons.

The Common Sense of IR Plus
IR Plus is a system developed by the University of Rochester Libraries. It is another variation on the hybrid theme. In this case, it uses Apache Tomcat's WebDAV extension to support both personal file storage and public archiving. Depositors can make materials mounted in the personal storage area available to collaborative groups and/or the public by toggling a software switch. This design is intended to reduce the friction associated with the use of other archiving/repository systems.

Sharing & Publishing under IR Plus
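The IR Plus design, in which material stays in personal storage until a switch opens it to a group or the public, can be pictured as a visibility setting attached to each stored item. The Python sketch below is hypothetical. The class names, field names, and the three visibility levels are illustrative assumptions and are not drawn from IR Plus's actual code or data model. It shows only the access rule that such a switch implies.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional, Set


class Visibility(Enum):
    """Hypothetical visibility levels for a deposited item."""
    PRIVATE = "private"  # personal storage: owner only
    GROUP = "group"      # shared with named collaborators
    PUBLIC = "public"    # published to anyone, including anonymous visitors


@dataclass
class Item:
    """A single stored file; publishing changes its visibility, not its location."""
    owner: str
    title: str
    visibility: Visibility = Visibility.PRIVATE
    collaborators: Set[str] = field(default_factory=set)

    def can_view(self, user: Optional[str]) -> bool:
        """Return True if `user` (None means an anonymous visitor) may see the item."""
        if self.visibility is Visibility.PUBLIC or user == self.owner:
            return True
        return self.visibility is Visibility.GROUP and user in self.collaborators


item = Item(owner="alice", title="Field notes", collaborators={"bob"})
print(item.can_view("bob"))   # False: the item is still in personal storage
item.visibility = Visibility.GROUP  # flip the switch for the collaborative group
print(item.can_view("bob"))   # True
print(item.can_view(None))    # False: not yet public
item.visibility = Visibility.PUBLIC
print(item.can_view(None))    # True
```

In this sketch, publishing changes a single field rather than copying the file into a separate archive. That is one plausible reading of the reduced friction the design aims for.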
Interoperability and Related Issues
Are key archival and/or bibliographic standards, such as MARC and/or EAD, supported?
Does the system support the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH)?
To what extent is content exportable?
To what extent is the system itself portable?
Is the system extensible?
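A practical way to answer the OAI-PMH question is to send a harvesting request and parse the reply. The sketch below uses only the Python standard library. It builds a `ListRecords` request URL for a placeholder endpoint (`https://repository.example.org/oai`, an assumption). To keep the example self-contained, it parses a small, hand-written sample response instead of contacting a live server. From that response it extracts record identifiers, Dublin Core titles, and the `resumptionToken` that signals further pages.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple
from urllib.parse import urlencode

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url: str, token: Optional[str] = None) -> str:
    """Build a ListRecords request; a resumptionToken replaces the other arguments."""
    params = {"verb": "ListRecords"}
    if token:
        params["resumptionToken"] = token
    else:
        params["metadataPrefix"] = "oai_dc"  # unqualified Dublin Core, which every OAI-PMH repository must support
    return f"{base_url}?{urlencode(params)}"


def parse_records(xml_text: str) -> Tuple[List[Tuple[str, str]], Optional[str]]:
    """Return (identifier, title) pairs and the resumptionToken, or None on the last page."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        ident = rec.findtext("oai:header/oai:identifier", default="", namespaces=NS)
        title = rec.findtext(".//dc:title", default="", namespaces=NS)
        records.append((ident, title))
    token = root.findtext(".//oai:resumptionToken", namespaces=NS)
    return records, (token or None)  # an empty token element also means "no more pages"


# Hand-written sample response (illustrative only, not from a real repository).
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <responseDate>2012-10-04T00:00:00Z</responseDate>
  <request verb="ListRecords" metadataPrefix="oai_dc">https://repository.example.org/oai</request>
  <ListRecords>
    <record>
      <header>
        <identifier>oai:repository.example.org:1</identifier>
        <datestamp>2012-09-24</datestamp>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Sample dataset</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

records, token = parse_records(SAMPLE)
print(records)  # [('oai:repository.example.org:1', 'Sample dataset')]
print(list_records_url("https://repository.example.org/oai", token))
```

A harvester would loop: fetch the URL, parse it, and repeat with the returned token until `parse_records` yields `None`. Whether a given system's export passes through this route cleanly is one concrete measure of the exportability question above.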
Ease of Use
Does the creation of objects within the system require a professional level of knowledge of metadata generation?
What are the characteristics of the workflow? Does the workflow support multiple roles?
Does the system incorporate lookup features based on Web APIs?
How does the system support the organization of objects once they have been mounted?

Documentation and Support
Is the system supported by an active documentation project? What is the quality of the documentation that is available?
Are there user forums through which questions about configuration, content creation, and/or bugs may be addressed?
How often is the software updated?
In the case of extensible systems, how productive are the developer communities providing extensions, plugins, themes, etc.?
Factors in Evaluating Open Source Software
License
Activity and Age of the Project
Unit Tests
Code Quality
Base Use Test
Modification Test