Data Curation: Challenges and Opportunities for Research Libraries
Brian E. C. Schottlaender
The Audrey Geisel University Librarian
26 September 2012
OSU Library Futures Seminar

Should I Talk About
declining: budgets? numbers of staff? transactions?
closing branch libraries?
rationalizing collections?
repurposing space?
bottom-up strategic planning?
moving to a service program-based organizational structure?

No, I Think I'll Talk About
DATA CURATION

Overview
The Scholarly Record
Stewardship
Data Curation
Why do data need to be curated?
Why should libraries curate data?
What should research libraries do?

The Scholarly Record?
The scholarly record is that which has already been written in all disciplines ... that stable body of graphic information, upon which each discipline bases its discussions, and against which each discipline measures its progress.
Ross Atkinson. Text Mutability and Collection Administration. Library Acquisitions: Practice & Theory, Vol. 14 (1990)

What Does the Scholarly Record Include?
E-only journals
Reviews
Preprints and working papers
Encyclopedias, dictionaries, and annotated content
Nancy L. Maron and K. Kirby Smith. Current Models of Digital Scholarly Communication: Results of an Investigation Conducted by Ithaka for the Association of Research Libraries (November 2008)

The Scholarly Record
Scholarly Publishing (e.g., journal articles)
Libraries
Trusted Third Parties (e.g., JSTOR, Portico)
Stable

What Does the Scholarly Record Include?
E-only journals
Reviews
Preprints and working papers
Encyclopedias, dictionaries, and annotated content
Data resources
Nancy L. Maron and K. Kirby Smith. Current Models of Digital Scholarly Communication: Results of an Investigation Conducted by Ithaka for the Association of Research Libraries (November 2008)

The Scholarly Record
Scholarly Publishing (e.g., journal articles)
Libraries
Trusted Third Parties (e.g., JSTOR, Portico)
Stable

What Does the Scholarly Record Include?
E-only journals
Reviews
Preprints and working papers
Encyclopedias, dictionaries, and annotated content
Data resources
Blogs
Discussion forums
Professional and academic hubs
Nancy L. Maron and K. Kirby Smith. Current Models of Digital Scholarly Communication: Results of an Investigation Conducted by Ithaka for the Association of Research Libraries (November 2008)

The Scholarly Record
INPUTS, OPERATORS, OUTPUTS
Scholarly Raw Material (e.g., archives, data)
Scholarly Inquiry/Discourse (e.g., blogs, wikis, open notebooks)
Scholarly Publishing (e.g., journal articles)
Archives
Data Centers [Some in Libraries; Some Not]
Libraries
Trusted Third Parties (e.g., JSTOR, Portico)
?????
Stable
Less Stable
Very unstable
Emergent
Infrastructures largely self-contained

Stewardship 1
Stewardship is a core value that includes notions of mission, responsibility, integrity, trust, accountability, service, preservation and sustainability for future use.
Sharon E. Farb. Libraries, Licensing, and the Challenge of Stewardship. First Monday, Vol. 11, No. 7 (3 July 2006)
As a society and as educational institutions, we have a collective responsibility to preserve and make available, along a continuum of a life cycle, our digital heritage.
Jeffrey L. Horrell. Converting and Preserving the Scholarly Record: An Overview. LRTS, Vol. 52, No. 1 (January 2008)

Stewardship 2
There is a need for a close linking between digital data archives, scholarly publications, and associated communication. The potential for an expanded role for research libraries in the area of digital data stewardship affords opportunities to address these important linkages.
Stakeholder groups have different expertise, outlooks, assumptions, and motivations ... Collaboration models to share expertise and resources will be critical.
To Stand the Test of Time: Long-Term Stewardship of Digital Data Sets in Science and Engineering: A Report to the National Science Foundation from the ARL Workshop on New Collaborative Relationships (2006)

Stewardship 3
Historically, universities have played a leadership role in the advancement of knowledge and shouldered substantial responsibility for the long-term preservation of knowledge through their university libraries. An expanded role for some research and academic libraries and universities, along with other partners, in digital data stewardship is a topic for critical debate and affirmation.
The scale of the challenge regarding the stewardship of digital data requires that responsibilities be distributed across multiple entities and partnerships that engage institutions, disciplines, and interdisciplinary domains.
To Stand the Test of Time (2006)

Data Curation: What Is It?
The activity of managing and promoting the use of data from its point of creation, to ensure it is fit for contemporary purpose, and available for discovery and reuse. For dynamic datasets this may mean continuous enrichment or updating to keep it fit for purpose. Higher levels of curation will also involve maintaining links with annotation and other published materials.
Philip Lord, Alison Macdonald, Liz Lyon, and David Giaretta. From Data Deluge to Data Curation. eScience All Hands Meeting 2004 (2004)

Data Curation: What's It Include?
Design
Creation or Collection
Processing
Analysis
Appraisal
Selection
Description
Discovery
Dissemination
Repurposing
Storage
Preservation
Etc.

Curation Model
Panos Constantopoulos, et al. DCC&U: An Extended Digital Curation Lifecycle Model. The International Journal of Digital Curation, Issue 1, Vol. 4 (2009)

Actors
As we move from small to large scale data sharing, where data are managed and maintained for broad access, we also are seeing an increase in the number and type of intermediaries. Intermediaries, in the form of organizations and the people who work for them, prepare data for reuse by eliciting, organizing, storing, packaging and/or preserving data, and by performing various roles in dissemination and facilitation.
Ixchel M. Faniel and Ann Zimmerman. Beyond the Data Deluge: A Research Agenda for Large-Scale Data Sharing and Reuse. The International Journal of Digital Curation, Issue 1, Vol. 6 (2011)

... and Stakeholders
Disciplinary experts
Functional experts
Developers
Curators
Preservationists
Users
Archives
Data Centers
Libraries
Institutions
Professional Societies
Publishers
Governments

The Curation Ecosystem 1
Systems Providers
Data Providers
Service Providers
Funders
Policy Makers
Data Consumers

The Curation Ecosystem 2
... the activities of curation are highly interconnected within a system of systems, including institutional, national, scientific, cultural, and social practices as well as economic and technological systems. Data curation is a nascent set of technologies and practices emerging in the context of this complex and rapidly evolving socio[economic]-technical ecosystem.
Anna Gold. Data Curation and Libraries: Short-Term Developments, Long-Term Prospects. http://digitalcommons.calpoly.edu/cgi/viewcontent.cgi?article=1027&context=lib_dean

Why do data need to be curated?
The more effectively that data can be manipulated, mined, managed, analyzed and served to communities, the better the conduct of science can be supported.
The more we can eliminate boundaries in this exponentially growing sea of data, the better data can be shared, enabling multidisciplinary and collaborative research.
The more effectively students and faculty gain the data intensive knowledge and skills, the larger the impact will be on science and society.
NSF-OCI Task Force on Data and Visualization. Report Draft Final (March 7, 2011)
Why do data need to be curated?
Because data reuse requires it.
Why do data need to be reused?
Because trans-domain research requires it.
Why is trans-domain research important?
Because solving grand challenges requires it.
Why is solving grand challenges important?
Because they affect all of us.
Why do data need to be curated? 3
Because the government says so.

Why Should Research Libraries Curate Data?
Because we can: Research libraries, archives, and other stewardship institutions have the capacity to aggregate and hold data, manage metadata, deal with rights management and access, and help users.
Because we must: uncurated data are as good as lost, even if the bits are stored forever, because they cannot be interpreted correctly.
Because, left to their own devices, scientists won't: many if not most scientists focus on the shortest path to a particular scientific result rather than the best long-term solution for data reuse or data-service.
NSF-OCI Task Force on Data and Visualization. Report Draft Final (March 7, 2011)

What Should Research Libraries Do?
Stop waiting and start proactive engagement locally.
Stake a claim in the production cycle.
Start retraining and repurposing staff.
Be a doer, not a broker, wherever possible.
Consider digital curation collaborations.
Actualize collaborative engagement.
Tyler Walters and Katherine Skinner. New Roles for New Times: Digital Curation for Preservation. Association of Research Libraries (2011)

What Have I Done?
Reached out to the San Diego Supercomputer Center (on whose Executive Committee I sit) to co-create the campus Research Cyberinfrastructure Initiative (RCI), funded by the Chancellor.
Leveraged the NDSA-funded Chronopolis Federated Preservation Environment to create a Research Data Curation Services Program.
Hired a Director, and reallocated portions of two domain specialists and a metadata analyst to her.
Created Sample Data Management Plans for various NSF Directorates.
Launched five curation pilots in the Humanities and the Sciences.
Joined DPN and am preparing to field-test Chronopolis as a DPN data triad.

And So
An Example
And So

Conclusion
Digital scholarly output cannot be de-coupled from the raw material and inquiry operations that generate that output, at least not as easily as analog scholarly output can be. It can't be, it needn't be, and it shouldn't be. Its stewardship calls for a more expansive view of what constitutes the scholarly record, a view that encompasses more and different inputs, outputs, and stakeholders; and a more distributed and interoperant organizational and technical infrastructure.

QUESTIONS?