Open data
Data can be shared even when it is not related to a paper. However, researchers tend to share data alongside their papers, so that readers can see the structure of their data more clearly, re-run analyses from the manuscript, run additional analyses, and use the data to answer new questions.
Data can look very different depending on the research field, for example:
- Biology: Genomic data from projects like the Human Genome Project, providing DNA sequences from humans and other organisms.
- Social Sciences: Survey data on demographics, attitudes, and behaviours collected by organisations like the United Nations or national statistical agencies.
- Medicine: Clinical trial data, including study protocols, patient demographics, treatment interventions, and health outcomes.
- History: Archives of historical documents, such as diaries, letters, manuscripts, and government records, providing insights into past events, societies, and cultures.
- Literature: Text datasets containing literary works, poetry, plays, and other written texts, facilitating analysis of language use, stylistic trends, and cultural themes.
- Musicology: Musical score datasets, containing compositions from different composers, genres, and historical periods, for analysis of musical structure and style.
Even within one study, there will often be multiple levels of data. For example, in a study using interviews there might be video recordings of the interviews themselves (the source data), transcripts of the interviews (the processed data), and quantitative or qualitative codes applied to the transcript text (the coding data).
It is possible for all of these to be shared if participants have agreed to this and don’t mind being identifiable, but usually it is important to protect the anonymity of participants. This is often possible with transcript data (after any identifying information about participants has been removed or anonymised), but it would be very difficult with video recordings of the participants.
When we talk about open data, the phrase ‘as open as possible, as closed as necessary’ is often used, meaning that researchers should strive to make their data open, but not where this would be unethical or illegal. Researchers must work within the ethical codes that apply to their country and type of data collection. For example, in Europe, the General Data Protection Regulation sets out rules for dealing with ‘personal data’, i.e., any information related to an identifiable individual. To ensure human participants cannot be identified, we as researchers must remove all identifying information from our datasets before sharing them.
In some cases, this is obvious and simple, and doesn’t affect the usefulness of the shared data, for example removing IP addresses from data collected online. However, there are other cases where this is much more complex and may mean the data cannot be shared at all, for example where qualitative data on a very specific topic makes participants identifiable.
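As a rough illustration of the simple case, the sketch below drops identifying columns such as IP addresses from an online survey export before it is shared. The column names (`ip_address`, `email`) and the example rows are made up for illustration; a real dataset will have its own structure. Removing direct identifiers like this may not be enough on its own to make participants unidentifiable.

```python
import csv
import io

# Hypothetical column names; adjust these to match the real survey export.
IDENTIFYING_COLUMNS = {"ip_address", "email"}


def strip_identifying_columns(rows, identifying=IDENTIFYING_COLUMNS):
    """Return copies of the rows without the identifying columns."""
    return [
        {key: value for key, value in row.items() if key not in identifying}
        for row in rows
    ]


# Made-up example data standing in for an online survey export.
raw_csv = """participant,ip_address,email,age_group,response
p01,192.0.2.10,a@example.org,25-34,agree
p02,192.0.2.11,b@example.org,35-44,disagree
"""

rows = list(csv.DictReader(io.StringIO(raw_csv)))
shareable_rows = strip_identifying_columns(rows)

# Write the reduced rows back out as CSV text that could be shared.
output = io.StringIO()
writer = csv.DictWriter(output, fieldnames=list(shareable_rows[0].keys()))
writer.writeheader()
writer.writerows(shareable_rows)
print(output.getvalue())
```

The shared file keeps the `participant`, `age_group`, and `response` columns, so the analyses can still be re-run. Whether the remaining columns could identify someone in combination still needs to be judged for each dataset.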
Sharing data and materials
