Online Scientific Data Curation, Publication, and Archiving (Microsoft Corporation, 2002-07)
VandenBerg, Jan; Stoughton, Christopher; Thakar, Ani R.; Szalay, Alexander S; Gray, Jim
Science projects are data publishers. The scale and complexity of current and future science data change the nature of the publication process. Publication is becoming a major project component. At a minimum, a project must preserve the ephemeral data it gathers. Derived data can be reconstructed from metadata, but metadata is ephemeral. Longer term, a project should expect some archive to preserve the data. We observe that published scientific data needs to be available forever; this gives rise to the data pyramid of versions and to data inflation, where the derived data volumes explode. As an example, this article describes the Sloan Digital Sky Survey (SDSS) strategies for data publication, data access, curation, and preservation.

Open Access from Digital Library Viewpoint (2009-12-04)
Choudhury, Sayeed
Johns Hopkins University has recently received two awards through the US National Science Foundation (NSF) that relate to open access issues. The first award is through the NSF DataNet program, which focuses on the development of data curation infrastructure; the second relates to a feasibility study of an open access repository of NSF-funded research. Both awards specifically consider the potential impact of open access to both data and publications, particularly as it relates to scientific research and learning across multiple disciplines or domains.
Data curation: An ecological perspective (College and Research Library News, 2010-04)
Choudhury, Sayeed
The library community has shown a great deal of interest in potential roles supporting new forms of scholarship, often called “eScience.”1 Scientific research has indeed become increasingly data-intensive, but the “eScience” label omits the humanities and social sciences, where scholars from a diverse range of disciplines are also exploring new modes of research and teaching using data. For example, social scientists are accessing data from fields such as the health sciences and environmental sciences, and using tools such as geographical information systems to study the connection between health and personal relationships or environmental conditions. The recent “Digging into Data” solicitation from the National Endowment for the Humanities (NEH) represents an important acknowledgment of such developments within the humanities. Created in part to “promote the development and deployment of innovative research techniques in large-scale data analysis,” this program follows others in adopting a broad definition of data that includes almost any information that can exist in digital form.2 Although NEH administered this solicitation, funding for Digging into Data also came from three other agencies: the National Science Foundation (NSF), the UK Joint Information Systems Committee (JISC), and the Canadian Social Sciences and Humanities Research Council (SSHRC). This diverse combination of funding agencies from three countries provides evidence of widespread, growing interest in data-driven scholarship within the humanities. Fundamentally, there is a shift from a document-centric view of scholarship to a data-centric view, which has driven recent developments in cyberinfrastructure.
Building our Brand: Researchers supporting Research Teachers supporting Teaching (Johns Hopkins University Libraries, 2010-01-16)
Tabb, Winston; Jordan-Mowery, Sonja; Dalrymple, Candice; Choudhury, Sayeed
The overarching goal of the Data Conservancy (DC) is to support new forms of inquiry and learning to meet these challenges through the creation, implementation, and sustained management of an integrated and comprehensive data curation strategy.

NSF DataNet: Curating Scientific Data (Georgia Institute of Technology, 2009-05-18)
Choudhury, Sayeed; Kunze, John
The National Science Foundation (NSF) DataNet program aims to create "a set of exemplar national and global data research infrastructure organizations (dubbed DataNet Partners) that provide unique opportunities to communities of researchers to advance science and/or engineering research and learning." The DataNet program calls upon a diverse group of researchers and practitioners, including domain scientists, computer scientists, information scientists, digital librarians, and archivists, to come together in new organizations that can address the research and development challenges of developing and sustaining data curation infrastructure. In the first round of the DataNet program, two pending awards will be made to DataNet partners led by Johns Hopkins University and the University of New Mexico. This panel session will provide an overview of both awards (described below) and offer a chance for discussion and engagement with the Open Repositories audience.

DataONE (Observation Network for Earth): Envisioning a New Distributed Organization and Cyberinfrastructure to Enable Science (John Kunze)
In order to understand the nature and pace of change of life on Earth, nothing less than a new type of distributed organization is required to provide perpetual access to the data needed by scientists, decision-makers, and citizens.
DataONE proposes to develop this organization to meet the needs of science and society for open, robust, and secure access to well-described and easily discovered Earth observational data. DataONE's success rests on established partnerships among participating institutions that have multi-decade expertise in a wide range of fields: libraries, archives, environmental observing systems and research networks, data and information management, science synthesis centers, and professional societies. DataONE will address four key challenges: data loss, data dispersion, data deluge, and poor data practice. DataONE will enable the creation of new insights and knowledge through universal access to data concerning life on Earth and the environment that sustains it.

The Data Conservancy: A Digital Research and Curation Virtual Organization (Sayeed Choudhury)
The Data Conservancy (DC) is one of two pending awards through the first round of the National Science Foundation's DataNet program. DC embraces a shared vision: data curation is not an end, but rather a means to collect, organize, validate, and preserve data to address grand research challenges that face society. The overarching goal of DC is to support new forms of inquiry and learning to meet these challenges through the creation, implementation, and sustained management of an integrated and comprehensive data curation strategy. Through a well-defined management policy, DC will provide the foundation for a diverse, international team to iteratively develop, deploy, and evaluate infrastructure in a manner that combines rapid implementation with research, all with continual progress toward sustainability. A user-centered design methodology will guide the immediate development process, coupled with innovative longer-term information science research to fully understand data practices and curation across our initial disciplinary base of astronomy, biodiversity, earth sciences, and social sciences.
Digital Data Preservation and Curation: A Collaboration Among Libraries, Publishers, and the Virtual Observatory (2006-10-27)
Plante, Ray; Milkey, Robert; Vishniac, Ethan; Szalay, Alex; Dilauro, Tim; Choudhury, Sayeed; Steffen, Julie; Hanisch, Robert
Astronomers are producing and analyzing data at ever more prodigious rates. NASA's Great Observatories, ground-based national observatories, and major survey projects have archive and data distribution systems in place to manage their standard data products, and these are now interlinked through the protocols and metadata standards agreed upon in the Virtual Observatory (VO). However, the digital data associated with peer-reviewed publications is only rarely archived. Most often, astronomers publish graphical representations of their data but not the data themselves, so other astronomers cannot readily inspect the data to either confirm the interpretation presented in a paper or extend the analysis. Highly processed data sets reside on departmental servers and astronomers' personal computers, and may or may not be available a few years hence. We are investigating ways to preserve and curate the digital data associated with peer-reviewed journals in astronomy. The technology and standards of the VO provide one component of the necessary technology. A variety of underlying systems can be used to physically host a data repository, and indeed this repository need not be centralized. The repository must, however, be managed, and data must be documented through high-quality, curated metadata. Multiple access portals must be available: the original journal, the host data center, the Virtual Observatory, or any number of topically oriented data services using VO-standard access mechanisms.
The Data Conservancy: Building a Sustainable System for Interdisciplinary Scientific Data Curation and Preservation (2009)
Hanisch, Robert; Choudhury, Sayeed
The Data Conservancy (DC) is one of two awards through the US National Science Foundation’s DataNet program. The goal of the DataNet program is to create “a set of exemplar national and global data research infrastructure organizations (dubbed DataNet Partners) that provide unique opportunities to communities of researchers to advance science and/or engineering research and learning.” The DC embraces a shared vision: data curation is not an end, but rather a means to collect, organize, validate, and preserve data to address the grand research challenges that face society. The overarching goal of the Data Conservancy is to support new forms of inquiry and learning to meet these challenges through the creation, implementation, and sustained management of an integrated and comprehensive data curation strategy. DC will address this overarching goal with a comprehensive project comprising four interdependent threads: 1) infrastructure research and development, 2) computer science and information science research, 3) broader impacts, and 4) sustainability. The DC is led by the Sheridan Libraries at Johns Hopkins University. Working with the Sloan Digital Sky Survey data and the US National Virtual Observatory, the Sheridan Libraries have developed an initial architectural design, data models and metadata profiles, and organizational models to support data curation. The DC will build upon these initial lessons learned from the partnership between the library and astronomy community and extend them into the life sciences, earth sciences, and social sciences. Use cases will provide the initial framework for technical requirements.
A robust information science and computer science research agenda will highlight the scientific requirements and inform the development of a data framework for observations and a theoretical framework for data curation. These activities will guide the development of new curricula at library and information science schools, thereby building capacity for a new generation of data scientists. One of the central tenets of DC’s sustainability plan relates to the leadership role of the library. The Sheridan Libraries at Johns Hopkins University have established a leadership position in prototyping data curation systems and services, especially as they relate to astronomy. One of the key outcomes of DC will be a new model for libraries in the digital age. Data curation has several fundamental implications for libraries as they relate to collections, services, and infrastructure. The North American Association of Research Libraries has already engaged the DC in its effort to consider these implications strategically as a means to transform the library’s role and contributions toward building and sustaining data curation infrastructure.

Collecting for Digital Repositories: Data Perspective (2009-07-12)
Choudhury, Sayeed

The Data Conservancy: A Blueprint for Research Libraries in the Data Age (2010-03-26)
Choudhury, Sayeed
The Data Conservancy (DC) is one of two awards through the National Science Foundation (NSF)'s DataNet program. DC shares a common vision: data curation is a means to collect, organize, validate, and preserve data so that scientists can find new ways to address the grand research challenges that face society. The Sheridan Libraries at Johns Hopkins University are the lead organization for DC. The Sheridan Libraries have been focused on data curation for years and have already developed prototype tools and resources and established fee-based agreements for data curation.
Fundamentally, DC will develop a blueprint for research libraries to support new forms of data-driven science. By acting as a laboratory for infrastructure development, research studies, and educational internships, the Sheridan Libraries will ultimately incorporate data curation into their overall collection development strategy. Libraries have long supported a diverse community of researchers, and DC represents an opportunity for libraries to honor their principles while adapting their practices for the data age. Ultimately, this strategy will result in greater sustainability by leveraging the support for research libraries within universities and academic centers.

Initiatives from the NSF's DataNet Program: DataONE and the Data Conservancy (EDUCAUSE, 2009)
Lynch, Clifford; Choudhury, Sayeed
This session, from the EDUCAUSE 2009 Annual Conference, features reports on two initiatives from the National Science Foundation's DataNet Program. The Data Observation Network for Earth (DataONE) represents a new virtual organization that will enable new science and knowledge creation through universal access to data about life on Earth and the environment that sustains it. The Data Conservancy (DC) embraces a shared vision: data curation is not an end, but rather a means to collect, organize, validate, and preserve data to address the grand research challenges that face society. This session is presented by Sayeed Choudhury, Associate Dean for Library Digital Program at the Johns Hopkins University, and Clifford Lynch, Executive Director of the Coalition for Networked Information.

The Data Conservancy: A Web Science View of Data Curation (Georgia Institute of Technology, 2010-02-25)
Choudhury, Sayeed
The Data Conservancy is one of two initial awards through the National Science Foundation's DataNet Program.
The Data Conservancy shares a common vision that data curation is not an end, but rather a means to provide persistent access to a variety of scientific data for addressing grand challenge research problems. In addition to the infrastructure development that lies at the core of the Data Conservancy, the project team is directly focusing on a semantic view of data and other forms of content as compound objects that describe a full picture of the scientific process. This presentation will feature an overview of the Data Conservancy with an emphasis on the data framework aspects of the project.