Creating a data management plan

As part of the Freies Wissen Fellowship Program, I was recently encouraged to prepare a data management plan (DMP) for my project. This was new to me. In my study program (and, I suspect, in most social science programs), issues of data collection or generation, analysis, storage, and sharing were usually discussed on an ad hoc basis with supervisors or within small research teams.

Preparing an exhaustive DMP can be a tedious process at an early stage of a project, but it has obvious benefits down the road.

What is a data management plan?

A data management plan is a “written document that describes the data you expect to acquire or generate during the course of a research project, how you will manage, describe, analyze, and store those data, and what mechanisms you will use at the end of your project to share and preserve your data.” [1]

DMPs vary in format, but they typically aim to answer questions such as:

  • What type of data will be analyzed or created?
  • What file format will be used?
  • How will the data be stored and backed up during the project?
  • How will sensitive data be handled?
  • How will the data be shared and archived after the project?
  • Who will be responsible for managing the data?

A good DMP should be a “living document” in the sense that it needs to evolve with the research project.

Why do you need a data management plan?

A DMP forces you to anticipate and address issues of data management that will arise in the course of the research project. There are a few reasons why such an approach can be beneficial, as highlighted by the University of Lausanne:

  • Taking a systematic and rigorous approach to data management saves time and money.
  • Developing a good backup strategy can prevent tragedies such as the loss of irrecoverable data.
  • More and more funders are asking for DMPs.
  • A DMP is a first logical step when planning to open your data to the public.
  • Generally, DMPs are an integral part of honest, responsible, and transparent research. [2]

A good example of the direction public funders are taking with regard to DMPs is the EU's Horizon 2020 Program. As part of the Open Research Data Pilot, the EU now encourages applicants to submit a DMP detailing how their research data will be findable, accessible, interoperable, and reusable (FAIR). [3]

Creating my own data management plan

For my project, I followed the DMP model proposed by Bielefeld University. This concise model summarizes the essential elements of a DMP in eight sections. You can find my plan here.

Challenges

I thought the exercise would be rather straightforward, as my dissertation relies mostly on existing data. My contribution lies primarily in merging and harmonizing repeated cross-national surveys and combining them with national-level indicators.
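To make this merging step more concrete, here is a minimal Python sketch of the general idea. It assumes that each survey labels countries in its own way and that national-level indicators are keyed by country and year. The mapping table, the field names (`cntry`, `wave_year`, `country`, `yr`), and the indicator `gdp_pc` with its value are hypothetical placeholders, not my actual data or pipeline.

```python
# Hypothetical harmonization table: each survey codes countries differently,
# so every source-specific label is mapped to one common ISO 3166-1 alpha-3 code.
COUNTRY_CODES = {
    "Germany": "DEU",
    "DE": "DEU",
    "France": "FRA",
    "FR": "FRA",
}


def harmonize(records, country_field, year_field):
    """Return copies of respondent records with common 'iso3' and integer 'year' keys."""
    out = []
    for rec in records:
        iso3 = COUNTRY_CODES.get(rec[country_field])
        if iso3 is None:
            # Fail loudly instead of silently dropping respondents.
            raise KeyError(f"unmapped country label: {rec[country_field]!r}")
        out.append({**rec, "iso3": iso3, "year": int(rec[year_field])})
    return out


def merge_indicators(respondents, indicators):
    """Left-join country-year indicators onto respondent records.

    `indicators` maps (iso3, year) -> dict of national-level variables.
    Respondents without a matching country-year keep their row, with the
    indicator values set to None, so missing context stays visible.
    """
    names = sorted({k for values in indicators.values() for k in values})
    merged = []
    for rec in respondents:
        context = indicators.get((rec["iso3"], rec["year"]), {})
        merged.append({**rec, **{n: context.get(n) for n in names}})
    return merged


if __name__ == "__main__":
    # Two toy "surveys" with different country and year conventions.
    survey_a = [{"cntry": "DE", "wave_year": "2008", "trust": 3}]
    survey_b = [{"country": "France", "yr": 2010, "trust": 2}]
    pooled = merge_indicators(
        harmonize(survey_a, "cntry", "wave_year") + harmonize(survey_b, "country", "yr"),
        {("DEU", 2008): {"gdp_pc": 35000}},  # made-up indicator value
    )
    for row in pooled:
        print(row)
```

The left join is a deliberate choice. It keeps every respondent even when a country-year indicator is missing, so gaps in the national-level data show up as `None` and are not silently dropped.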

I nonetheless faced some challenges. The first was that not all data providers state precisely what their terms of use are, and very few use an explicit license. Most impose conditions similar to those of the World Values Survey: “The data are available without restrictions, provided that 1) they are used for non-profit purposes, 2) data files are not redistributed, 3) correct citations are provided wherever results based on the data are published.” [4]

The second challenge was developing a backup strategy. One advantage of a DMP is that it quickly exposes your weaknesses. In my case, it was clear that I had not thought enough about how to safely save and archive the thousands of lines of code I had written over the years. (Code is also data in a broad sense!) That is why I am looking for better ways to protect my files and, soon, to distribute my replication material.
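One simple safeguard in a backup strategy is to record a checksum for every script and check that the backup copy matches it. The Python sketch below assumes the code lives in plain files under a project folder and that the relevant extensions are Stata (`.do`), R (`.R`), and Python (`.py`). The function names and extensions are illustrative assumptions, not features of any particular backup tool.

```python
import hashlib
from pathlib import Path

# Assumed script extensions; adjust to whatever languages the project uses.
DEFAULT_PATTERNS = ("*.do", "*.R", "*.py")


def sha256_of(path, chunk_size=65536):
    """Hash a file in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root, patterns=DEFAULT_PATTERNS):
    """Map each matching file (path relative to root) to its SHA-256 checksum."""
    root = Path(root)
    manifest = {}
    for pattern in patterns:
        for path in sorted(root.rglob(pattern)):
            if path.is_file():
                manifest[path.relative_to(root).as_posix()] = sha256_of(path)
    return manifest


def verify_backup(original_root, backup_root, patterns=DEFAULT_PATTERNS):
    """Return the sorted names of files missing from, or different in, the backup."""
    source = build_manifest(original_root, patterns)
    backup = build_manifest(backup_root, patterns)
    return sorted(name for name, digest in source.items() if backup.get(name) != digest)
```

Calling `verify_backup("project", "/mnt/backup/project")` after each backup (both paths hypothetical) would list every script that is missing or has changed. An empty list means the copy matches the original.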

I am currently testing solutions such as the Open Science Framework (OSF). OSF seems a good option because it is free, open source, and well integrated with other platforms such as GitHub and Dropbox. I will report on my progress in upcoming posts.

References

  1. Stanford Libraries. “Data management plans.” https://library.stanford.edu/research/data-management-services/data-management-plans (accessed November 13, 2017).
  2. Université de Lausanne. “Réaliser un Data Management Plan.” https://uniris.unil.ch/researchdata/sujet/realiser-un-data-management-plan/ (accessed November 13, 2017).
  3. European Commission. Directorate-General for Research & Innovation. “H2020 Programme: Guidelines on FAIR Data Management in Horizon 2020.” http://ec.europa.eu/research/participants/data/ref/h2020/grants_manual/hi/oa_pilot/h2020-hi-oa-data-mgt_en.pdf (accessed November 13, 2017).
  4. World Values Survey. “Integrated EVS/WVS 1981-2008 Conditions of Use.” http://www.worldvaluessurvey.org/WVSContents.jsp (accessed November 13, 2017).

New blog

Several factors are pushing for the implementation of open science ideas and principles in social science. First, social science increasingly relies on large datasets comprising hundreds of thousands, if not millions, of observations. This has raised questions about the transparency and reproducibility of data collection processes. Second, social scientists are using ever more sophisticated statistical techniques to analyze their data. Because these methods require both specialized knowledge and substantial computing infrastructure, it becomes increasingly important to communicate empirical strategies clearly and to rely on open-source software. Finally, social science has had its own infamous cases of scientific fraud, which could have been avoided if the authors had made their data and methodology public (see LaCour and Green (2014), “When contact changes minds,” Science).

For all these reasons, as a political scientist, I have felt the need to engage more with the notions of open access, open data, open source, and open methodology.

Following my nomination for a “Freies Wissen” Fellowship, I have decided to use this website as a platform for discussing open science ideas in sociology and political science. In the coming months, I will also describe concretely the challenges and opportunities I encounter when communicating my own work within an open science framework.