Interview conducted by Selma Korlat, March 2017.
The Swiss Foundation for Research in Social Sciences (FORS) is a national centre of expertise in the social sciences and serves researchers in Switzerland and abroad. FORS aims to promote a research culture of data sharing and secondary analysis for the social sciences. Brian Kleiner, Head of the Data and Research Information Services unit, works with his team to provide open and easy access to data, in order to encourage a wider and more effective use of existing quantitative and qualitative data for addressing research questions in the social sciences in Switzerland.
Can you please explain what FORS is, in a nutshell? What are your main activities?
Funded by the Swiss federal government and the Swiss National Science Foundation, FORS, in cooperation with its host, the University of Lausanne, is a national research infrastructure that supports and facilitates high quality social science research. With a staff of over 40, it does this by providing open and free access to national level data on topics of interest regarding social, economic, and political conditions in Switzerland. Beyond the provision of data intended for secondary analysis, FORS conducts methodological research and offers training and consulting services, with the aim of improving research and data quality, while at the same time increasing the efficiency of data collection. FORS is committed to strengthening the social sciences by producing, disseminating, and promoting high quality data, and by publishing state of the art methodological and substantive research. FORS conducts a variety of national and international surveys and makes the data available to researchers soon after collection.
The Data and Research Information Services (DARIS) unit of FORS maintains a central archive for social science data in Switzerland, as well as a database of Swiss research project descriptions from 1993 onwards. DARIS collects, preserves, and disseminates data and research information for the purpose of secondary analyses. Swiss research data accepted in the archive of DARIS are generally of high potential interest for the scientific community. Priority is given to data that address timely issues and themes and that are readily comprehensible and usable with straightforward and clear documentation. DARIS places significant focus on engaging researchers in Switzerland and promoting secondary analysis of both quantitative and qualitative data.
The unit is also an active player within larger European social science data infrastructures, such as the Consortium of European Social Science Data Archives (CESSDA), in order to ensure that researchers in Switzerland have access to a wider range of data that address their needs.
FORS also advocates for open data. Why is open data important in social science research? What are the main advantages of open data?
The open data movement stems from the growing awareness that publicly funded data should be publicly available. Within the scientific context, the underlying principles of open data concern enhancing reproducibility and transparency, as well as reducing redundancy in costly data collection. Open data makes it possible for science to thrive by allowing greater transparency and replicability of findings. It also allows for data to be used to their fullest to address new research questions, rather than funding unneeded new data collections. At FORS we believe that data should be made available to the greatest extent possible, not just those data used for publications. Researchers often end up analysing only a small proportion of the data they collect, and a lot of rich data are regularly wasted. In addition, funders should play a key role in moving the research culture towards more efficient and effective research. Their open data policies should be as researcher-oriented as possible, such that researchers see the benefits of these policies for their own work.
Regarding all these advantages – do you think that each country should establish a national research data archive?
Data archives play a central role in ensuring data access and research transparency, since they provide the basic infrastructure for data preservation, documentation, discovery, and access. They serve as an intermediary between those who produce data and those who use them for replication or secondary analyses. In short, archives make replication possible by curating and storing original data that can be discovered and accessed, along with relevant documentation, now or far in the future. In my view, each country should have the infrastructure needed to manage and preserve data produced within its borders. National research data archives are staffed by experts in both data management and specific disciplines, with the linguistic and research skills needed to communicate with the data producers and to curate the data. All countries in the European Union have research communities that produce data.
In your opinion, is the establishment of the national data archives necessary even in smaller and less developed countries? Can open data archives influence the development of the research community in less developed countries (such as the Western Balkan countries)?
Having national data archives is even more important in smaller countries, since new data collections are generally less easily funded, and so it is even more important to ensure the long-term preservation and re-use of existing data. Data archives, by nature, also tend to promote data sharing and the re-use of existing data, and thus have an influence on the research culture.
Does the establishment of an archive require governmental support? Are there any alternatives to funding the archive?
Having governmental support is extremely important, since it can ensure the sustainability of a research infrastructure, which by definition should be stable and long-running. Such sustainability allows for the development of a chain of trust between the data producers, the data archive, and the data users. Additionally, only government-funded archives can become members of CESSDA, the Consortium of European Social Science Data Archives, since only states are formally recognised as members. On the other hand, one can imagine cases where the archive is funded from other sources, e.g., universities and private organisations.
FORS represents Switzerland as a member of CESSDA. What is the main advantage of membership in such an international network? Do the benefits outweigh the costs, given that the country has to pay a membership fee to CESSDA?
There are many benefits of CESSDA membership. First, being a member opens a wide variety of resources to your data archive for infrastructure development and training on best practices in the field of data preservation and dissemination of data and digital objects. Members are eligible to participate in CESSDA work plan projects, so national membership fees easily make their way back to the archives in the form of funding in exchange for manpower. In addition, membership means that national service providers (i.e. archives) can be involved in many European level projects (e.g., Horizon 2020) in which CESSDA figures as a partner. Second, membership helps to facilitate access for researchers to important resources of relevance to the European social science research agenda regardless of the location of either researcher or data.
Do the national archives of different countries in CESSDA cooperate? What form does this cooperation take? Has, for example, FORS or Switzerland collaborated with another member of CESSDA regarding the archiving of research data?
FORS is currently involved in a variety of collaborative projects with other CESSDA members, such as the building of a European question bank, and a project to develop international standards for data management. In such projects the work is distributed among partners, usually with one organisation taking the lead. There are regular meetings, mostly by way of teleconferences, but sometimes also face-to-face.
Since membership in CESSDA requires governmental support, what can be done in countries whose governments are not interested in joining to highlight the significance of membership? Which activities could raise awareness of the importance of membership in this network and of access to other databases and archives?
For countries with ministries that are hesitant or reluctant to join CESSDA, a lot of lobbying and persuasion is needed. Face-to-face events where representatives from ministries are present can be useful to explain the work and benefits of national archives and CESSDA membership. Researchers can also be enlisted to show their support for national data archives, since they are the main beneficiaries of such services.
If we are aware of the benefits of archiving data, why, in your opinion, do researchers not share their data more? Could the opening and sharing of the data harm, in any way, the primary authors / researchers?
Researchers are generally in favour of data sharing, but can be more reluctant when it comes to their own data! This can be for many reasons. Sometimes it is because they do not have the time, resources, or skills to do what is needed to make their data available to other researchers (e.g., documentation, anonymisation). Sometimes they worry that others may not use their data in appropriate ways, or that they will be criticised for how they conducted their work. In some cases researchers fail to obtain informed consent from respondents, or they make promises that make it impossible to share the data. I believe that data sharing is not only an ethical obligation of researchers, but can also benefit them by making their work known to others. Plus, more and more journals are requiring that data be cited in publications, recognising the original data producers. In my view, no harm can be done to researchers who share their data, as long as the work was well done, of course.
How is the citation of primary authors regulated in these archives? Is there a possibility of plagiarism or misuse of data in any other way?
Data citation is very important for primary authors of data, since it allows their work to be recognised in publications by peers who use their data. Archives generally have mechanisms that support data citation. For example, when data producers deposit their data with an archive, they sign a deposit contract that defines the rights and obligations of the producer and the archive. This document often includes a formal data citation proposed by the archive that codifies the ownership, the title, and the year of the collection. It is this citation that is then used in data user contracts, where it is recommended for use in publications. Users agree in the contracts to cite the data in any future publications where the data are used for analyses. I have never heard of a case of data plagiarism, where someone claims that data produced by others are actually their own. Data citations help to prevent this from occurring, since they clearly identify ownership of the data.
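As an illustration of the kind of formal data citation described above, here is a minimal sketch in Python. It assumes only the three elements named in the answer (ownership, title, year) plus an optional distributor. The class name, the field names, the output layout, and the example values are all hypothetical and do not reproduce the citation format of FORS or any other archive.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataCitation:
    """Hypothetical record of the elements a deposit contract might codify."""

    producers: tuple[str, ...]  # ownership: who produced the data
    title: str                  # title of the data collection
    year: int                   # year of the collection
    distributor: str = ""       # optional: the archive distributing the data

    def format(self) -> str:
        """Render the citation in one illustrative layout (an assumption)."""
        authors = "; ".join(self.producers)
        text = f"{authors} ({self.year}). {self.title} [Dataset]."
        if self.distributor:
            text += f" Distributed by {self.distributor}."
        return text


# Invented example values, for illustration only.
citation = DataCitation(
    producers=("Muster, A.", "Beispiel, B."),
    title="Example Household Survey",
    year=2016,
    distributor="Example Data Archive",
)
citation_text = citation.format()
print(citation_text)
```

Keeping the citation as structured fields rather than free text means the same wording can be reused unchanged in the deposit contract and the user contract.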
However, open data include certain risks regarding the personal information of participants in the study. What are the procedures and steps that are taken in order to protect the identity of the respondents?
Protecting respondents is the first priority of data archives, and all steps are taken to ensure the confidentiality of data. In this sense, social science data are rarely truly “open”, since measures to protect respondents make it impossible to disseminate the data without conditions. First, not just anyone can access the data. Archives usually define the eligible user base, most often people affiliated with research institutions. Only authenticated individuals from this group can have access to the data. Second, all users must sign a contract that legally binds them to certain conditions and the proper use of the data, including not trying to identify individuals. Third, identifying information is removed from datasets, making it extremely difficult to discover the individual respondents. Taken together, these measures work well to prevent harm to respondents. For highly sensitive data that are difficult to anonymise fully, additional access restrictions can be imposed, such as prior approval of the data producer. Data archives take care to ensure that their practices are aligned with national and international data protection laws.
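The third measure, removing identifying information, can be sketched roughly as follows. This is a simplified illustration, not an archive's actual procedure: the column names and example records are invented, and it handles only direct identifiers. Real anonymisation also has to consider indirect identifiers, such as rare combinations of age, occupation, and place of residence.

```python
# Hypothetical names of direct identifier fields; a real dataset defines its own.
DIRECT_IDENTIFIERS = frozenset({"name", "email", "phone", "address"})


def remove_direct_identifiers(records, identifiers=DIRECT_IDENTIFIERS):
    """Return copies of the records without direct identifier fields.

    Each record receives a sequential case number so that rows remain
    distinguishable once names and contact details are gone.
    """
    cleaned = []
    for case_id, record in enumerate(records, start=1):
        kept = {
            field: value
            for field, value in record.items()
            if field.lower() not in identifiers
        }
        cleaned.append({"case_id": case_id, **kept})
    return cleaned


# Invented example records, for illustration only.
survey = [
    {"name": "A. Example", "email": "a@example.org", "age_group": "30-39", "trust_score": 7},
    {"name": "B. Example", "email": "b@example.org", "age_group": "50-59", "trust_score": 4},
]
public_file = remove_direct_identifiers(survey)
print(public_file)
```

The function returns new records and leaves the originals untouched, so the full file can stay under restricted access while only the cleaned copy is disseminated.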
Should the written consent of the participants be given before archiving the research data?
For highly sensitive data that cannot easily be fully anonymised, informed consent is advisable. This is often the case with research projects involving qualitative data, where there are fewer respondents, and a great deal of personal information is revealed. Consent could be written, but it could also be recorded orally. Archives often provide advice and models on how to go about obtaining informed consent from respondents. Researchers sometimes make the mistake of promising that the data will only be used by them and thereafter destroyed. This is often not necessary, and a promise that the data will be “used for research purposes” is sufficient. This allows for the data to be shared and re-used beyond the original research team.
Once the archive is established, who should be in charge of maintaining it? What does that require?
There are a few key elements that must be in place for maintaining a data archive. First, it has to have secure funding for a certain period (at least several years), with the possibility of renewal. There is no point in having an archive that lasts only a few years! Second, it must have competent and trained staff, including, to start, at least a director and one or two data experts to manage the curation of the data. Third, there should be a technical infrastructure for the database and related tools, and ideally an IT specialist available for all technical matters. In the best case, the basic infrastructure would be provided by the host institution (e.g., computers, servers, statistical software, Internet). Archiving software is also a must, and this would have to be adopted and possibly customised.
How long should the data be stored after archiving? In other words, what is the data retention policy?
The value of having a data archive is that it offers long-term preservation of data. This means that the data are assured to be safely secured and machine-readable for at least 50 years. The archive is obligated to preserve files in non-proprietary formats (e.g., CSV, PDF), to monitor changes in formats, and to migrate file formats where needed to the most recent versions. Related documentation should also be monitored and its formats kept up to date, so that the data can always be used in a meaningful way.
Brian Kleiner, Ph.D., is the Head of the Data and Research Information Services unit at FORS – Swiss Foundation for Research in the Social Sciences. Dr Kleiner manages a team of researchers who provide access to a large volume of research data in the social sciences for the research community in Switzerland and abroad.