Research Data Mgmt Systems

NOTES: Michal Strutin, SCU

The NSF mandate for submitting data management plans with a grant proposal is a game-changer. Yet the mandate has no teeth, and there is no way to assess whether data plans have been carried out. Canada will implement a similar requirement this year. There are many systems, but there’s a need to be on the same page. REDCap is one of the most used data management systems in clinical sciences. REDCap is a free database that allows restricted access, among other options.

How to work with faculty to meet their data needs: storage, organization, preservation. An IR (institutional repository) is one way. Storing large data sets costs money. How to cost these models? DPN (Digital Preservation Network) is one organization preserving academic data. Perhaps Stanford or HathiTrust could be a backup repository for smaller schools. Because of the NSF mandate, we need not only storage, but also workshops on data creation, metadata, and management.

At the RDAP (Research Data Access and Preservation) Summit, they heard representatives from NOAA, NIH, etc. But it seems these federal agencies have no intention of putting teeth in data management plans.

Will researchers be able to reuse their own data?

NSF points researchers to different directorates for data management. NSF says the university is responsible for data preservation, not the scientists. The grant is with the institution, not the PI. The university owns the data.

New England Libraries have an eScience Portal and a New England eScience Day.

OSTP (White House Office of Science and Technology Policy): all agencies must come out with plans to make data more available.

Has anyone’s institution said, “Data management is our responsibility and we’re going to do it”? Doris Helfer (CSU-Northridge) says her institution has committed. CSUN Library is working with IT and the provost is committed. They are using their IR (DSpace platform). If an item is not appropriate for the IR, the IR will point to it.

Someone else is using ePrints.

Eugene: UCB has 8 terabytes (accommodating videos, etc.) and their library absorbs the cost, for now. But cost is a major point.

Retention schedules: how long will you keep the data?
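
One way to make the retention question concrete is to record a retention period with each deposit and compute when the data comes up for review. The sketch below is only an illustration: the function names and the 10-year period in the test values are hypothetical, not something any attendee's institution reported.

```python
from datetime import date


def retention_end(deposit_date: date, years: int) -> date:
    """Return the date on which a retention period of `years` ends."""
    try:
        return deposit_date.replace(year=deposit_date.year + years)
    except ValueError:
        # Deposit made on Feb 29 and the target year is not a leap year:
        # fall back to Feb 28 of the target year.
        return deposit_date.replace(year=deposit_date.year + years, day=28)


def is_due_for_review(deposit_date: date, years: int, today: date) -> bool:
    """True once the retention period (a hypothetical policy value) has elapsed."""
    return today >= retention_end(deposit_date, years)
```

A repository could run a check like this periodically to list data sets whose keep-or-discard decision is due.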

DSpace is a data access system, not a data management system.

What kinds of metadata does science support? Many types: Darwin Core, for instance. But the sciences do not have data dictionaries as they have in social sciences. There’s no standardization. DSpace uses Dublin Core.
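
To illustrate what a Dublin Core description of a data set can look like, here is a minimal sketch that writes a few Dublin Core elements as XML. The namespace is the standard Dublin Core elements namespace; the `record` wrapper element, the function name, and the sample values are assumptions made for illustration, and this is not DSpace's own import format.

```python
import xml.etree.ElementTree as ET

# Standard namespace for the 15 Dublin Core elements.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def dublin_core_record(fields: dict[str, list[str]]) -> str:
    """Serialize Dublin Core element names and values as an XML string.

    The <record> wrapper is a hypothetical container, not part of any standard.
    """
    root = ET.Element("record")
    for element, values in fields.items():
        for value in values:  # Dublin Core elements are repeatable
            child = ET.SubElement(root, f"{{{DC_NS}}}{element}")
            child.text = value
    return ET.tostring(root, encoding="unicode")


# Sample values are invented for illustration only.
example = dublin_core_record({
    "title": ["Stream temperature readings (hypothetical dataset)"],
    "creator": ["Example, Researcher"],
    "date": ["2014"],
    "type": ["Dataset"],
    "format": ["text/csv"],
})
```

Discipline-specific schemes such as Darwin Core would need their own element sets; the same pattern applies with a different namespace and vocabulary.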

Some science depts. have hired people to standardize metadata. One attendee says their institution has a data librarian, who explains data management to researchers and shows what is needed so someone who discovers the data can understand what they’ve found; e.g., spreadsheets with column heads and an overview of the data. (Sarah E Lester, Stanford)
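
The "column heads and an overview" advice can be partly automated. As a rough sketch (the function name and output layout are hypothetical), the code below reads a CSV file's header row and produces a data dictionary stub that the researcher then fills in with descriptions.

```python
import csv
import io


def data_dictionary_stub(csv_text: str, sample_rows: int = 3) -> list[dict[str, str]]:
    """Build one entry per column: its name, a blank description, sample values."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # Read at most `sample_rows` data rows to show example values.
    rows = [row for _, row in zip(range(sample_rows), reader)]
    entries = []
    for column in reader.fieldnames or []:
        examples = [row[column] for row in rows if row.get(column)]
        entries.append({
            "column": column,
            "description": "",  # left blank for the researcher to complete
            "examples": "; ".join(examples),
        })
    return entries
```

For a file with the header `site,temp_c`, this yields two entries, each listing the first few values found in that column.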

Need a web form with standardized fields to connect metadata fields. It should also be possible to customize fields.
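
A minimal sketch of that idea, assuming a fixed set of core fields plus locally added ones: the core field names, the class, and its methods are all hypothetical, chosen only to show how standardized and custom fields could sit side by side behind a deposit form.

```python
from dataclasses import dataclass, field

# Hypothetical core fields; a real form would take these from the chosen standard.
CORE_FIELDS = ("title", "creator", "date", "description")


@dataclass
class DepositForm:
    """Deposit form with required core fields and optional custom fields."""
    custom_fields: list[str] = field(default_factory=list)

    def all_fields(self) -> list[str]:
        return list(CORE_FIELDS) + self.custom_fields

    def missing_required(self, submission: dict[str, str]) -> list[str]:
        # A core field counts as missing if absent or blank.
        return [name for name in CORE_FIELDS
                if not submission.get(name, "").strip()]

    def unknown_fields(self, submission: dict[str, str]) -> list[str]:
        # Fields that are neither core nor configured as custom.
        allowed = set(self.all_fields())
        return sorted(name for name in submission if name not in allowed)
```

Keeping the core fields fixed is what lets records from different departments line up, while the custom list leaves room for discipline-specific fields.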

Dataverse uses DDI (Data Documentation Initiative) standards. Dataverse (OA) allows you to analyze data.

Is anyone issuing DOIs for data sets? Michael Golden (Lawrence Berkeley Natl Lab) says his institution will. Some authors don’t want separate data DOIs because PIs want citations coming to their articles not to their data. Michael Habib (Scopus/Elsevier) says there’s a desire for DOIs.

Two librarians say their schools are trying to build data management teams. Purdue, U-Minnesota, and Stanford are well along, but most schools don’t have financial means or separate personnel to build such teams.

Target outreach to those who get grants to see if they need help or want to deposit. Work with the Office of Sponsored Research. Another idea: target researchers whose cycle of grant funding has closed and find out “what did you do with your data?”

Helfer: CSUN has a copyright session with an expert, so depositors can submit a copy of their work without fear of copyright violation: post peer-review, but pre-publication.

Stanford has had a data management librarian for two years; however, it’s been hard to get deposits and work with the researchers.

An article in Science(?) in the last couple of weeks [did not find] says NSF will not give a grantee new funding if the grantee has not done anything with their data management plan.

Does all raw data have to have a report associated with it? Someone says yes.

DRYAD Repository puts DOIs in paper. See DRYAD’s JDAP (Joint Data Archiving Policy) and associated list of representative journals. Habib says they link to Pangaea (correct link?).

How do people like Thomson Reuters’ (newish) Data Citation Index? No connections of the data sets to the works themselves. How to make things come together?

Habib: we’re just starting to index monograph citations. But it’s hard to tie together platforms in a meaningful way. Elsevier is the only big company that allows you to data-mine.

Two or three librarians have said they are not really promoting it (monograph citations? data mining?), until it’s improved.

Some researchers are asking to mine data from their own research.

At one institution: the IR is a place for students to store work. Thus, student data storage is building, year upon year.

Suggestion: do environmental scans: what you need and at what point in order to understand the data. One researcher wanted to build a workflow of teaching resources. Until something better comes along, Drive, Dropbox, etc. provide the data management universe for some.

Carnegie Mellon librarian Steven I. Van Tuyl (thanks for link, Paige, U-Redlands) gave a presentation on data management. He surveys faculty in smaller segments; e.g., what researchers need between the time they submit their grant proposal and the time they complete their research and write articles.

Researchers often don’t know what to do with data. They need data education.

NSF: big grant for Data Science Education. What are data science competencies?

There’s a need to integrate data-management literacy into the research process.

DIL (Data Information Literacy) Symposium – 2013. Hosted by Purdue, U-Minnesota, U-Oregon, Cornell, and IMLS to educate on data management. Susan Boyd (Santa Clara U) created a LibGuide based on the symposium.

U-MN has done well getting data management into graduate classrooms. Years ago, Purdue identified data as something to follow and the data life cycle is ingrained.

Try to get more buy-in from your institution’s Office of Sponsored Research. Data management is still very new in the purview of the university world. Getting high-level support is still difficult. But what if you suddenly have researchers running to your doorstep and you don’t have enough in place; i.e., an IR, etc.?

Perhaps there’s a need to build a system just for data. But you also need to foster a discovery piece.

U-Oregon has 60- and 90-minute slide decks to share data management training.

ACRL New England Chapter Scholarly Communication Interest Group’s spring program, May 8: “Teaching Research Data Management with the New England Collaborative Data Management Curriculum” (NECDMC).

Oregon State University is an example of a pilot institution using NECDMC.

[Not mentioned at topic discussion, but good to know: Dr. Philip Bourne is the first Associate Director for Data Science at NIH. He was keynote at SPARC in March 2014.]