1.2 Reproducible research
What is reproducible research?
‘Reproducible research’ is well established in the sciences and understood statistically. Does it apply to humanities too?
‘Scientific method’ describes research practice involving forming and testing hypotheses and theories, by observation and experimentation, using logical reasoning – and communicating the results to others. Another researcher’s ability to conduct the same experiment independently with the same results – reproducible research – is fundamental to scientific training. Humanists might not practise exactly this scientific method, but the underlying idea translates to the humanities: confidence in an outcome grows when multiple researchers independently come to the same conclusion. For example, do we come to the same conclusion using different collections?
The rise of digital methods has led to a different emphasis on reproducibility: if different people analyse the same data, do they get the same result? Are their tools behaving and being used consistently? To make this possible to check, researchers are encouraged to publish their data and software openly. This enables others to verify their work, shares the tools to test it, and makes the data available. Since digital humanities involves using data and software, these notions of sharing and ‘computational’ reproducibility are useful.
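As a minimal sketch of what ‘computational’ reproducibility can look like in practice, the Python example below checks two things: that a second researcher is working on exactly the same data (by comparing a published checksum), and that a random sampling step gives the same result when the same seed is used. The corpus text and the function names `fingerprint` and `sample_lines` are assumptions made up for illustration, not part of any particular project.

```python
import hashlib
import random


def fingerprint(data: bytes) -> str:
    """Return the SHA-256 checksum of the data as a hex string."""
    return hashlib.sha256(data).hexdigest()


def sample_lines(text: str, k: int, seed: int) -> list:
    """Draw k lines at random, using a fixed seed so the draw can be repeated."""
    rng = random.Random(seed)  # a private generator, unaffected by other code
    return rng.sample(text.splitlines(), k)


# Hypothetical corpus: in real work this would be read from a published file.
corpus = "letter one\nletter two\nletter three\nletter four\n"

# The original researcher publishes this checksum alongside the data.
published_digest = fingerprint(corpus.encode("utf-8"))

# A second researcher confirms they hold identical data ...
same_data = fingerprint(corpus.encode("utf-8")) == published_digest

# ... and that the analysis step gives the same result with the same seed.
first = sample_lines(corpus, 2, seed=42)
second = sample_lines(corpus, 2, seed=42)

print("Same data:", same_data)
print("Same sample:", first == second)
```

Fixing the seed only guarantees the same sample with the same version of the software, which is one reason for publishing details of the software alongside the data.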
The FAIR data principles promoted in open science are also pertinent to humanities. A way to make data a ‘first class citizen’ is to connect it with existing processes, e.g. using emerging practices in data citation and software citation.
A key technology to support linking data is the interoperability standard called RDF (Resource Description Framework), where ‘triples’ of data are easily interlinked. The Web Ontology Language (OWL) is used to describe taxonomies and classifications, including the CIDOC Conceptual Reference Model, an ontology used in museum documentation. Like FAIR, Linked Data makes data machine processable, and it brings communities together.
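To illustrate the idea of a ‘triple’, here is a minimal sketch in plain Python. It does not use a real RDF library; each triple is simply a (subject, predicate, object) tuple. The `http://example.org/` namespace, the resource names and the helper functions `match` and `to_ntriples` are all hypothetical, invented for this example.

```python
# Each statement is a (subject, predicate, object) triple of identifiers.
EX = "http://example.org/"  # hypothetical namespace, for illustration only

triples = [
    (EX + "letter1", EX + "writtenBy", EX + "personA"),
    (EX + "letter1", EX + "heldBy", EX + "museumX"),
    (EX + "letter2", EX + "writtenBy", EX + "personA"),
]


def match(store, s=None, p=None, o=None):
    """Return the triples that fit a pattern; None acts as a wildcard."""
    return [
        t for t in store
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def to_ntriples(store):
    """Write the triples out one per line in the N-Triples text format."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in store)


# Which resources were written by personA?
by_person_a = match(triples, p=EX + "writtenBy", o=EX + "personA")
print([t[0] for t in by_person_a])

print(to_ntriples(triples))
```

Because every part of a triple is a web-style identifier, data from two collections can be interlinked simply by using the same identifier for the same person, place or object.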
Open software practices provide an example of collaborative working and of sharing intellectual outputs. They are also crucial to reproducible research – if we can’t see how the software works, how can we interpret its outputs?
Open source software’s source code is released under a licence which encourages broad reuse, e.g. the ‘MIT License’.
Open source software development is a collaborative model: users are co-developers, early software release and frequent integration are encouraged, and the work is supported by software sharing platforms like GitHub.
For further information see the UK Software Sustainability Institute website.
Lab notebooks and executable documents
Scientists have long kept laboratory notebooks to record their experiments (compare with similar humanities practices for recording research). These have evolved into Electronic Lab Notebooks (ELNs): some are designed for specific purposes or research areas, others are general purpose. ELNs may be particularly useful in lab-like humanities scenarios. Other common tools for creating and sharing documents also exist, e.g. a simple spreadsheet.
Jupyter Notebooks are also a popular way to create and share documents containing programs (keeping the source code with the text). Google’s Colab notebook environment is a great way to conduct, record and share digital research – with others or your future self.
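As a small, hedged illustration of the kind of cell you might put at the top of a notebook, the Python sketch below records when and with which software version an analysis was run, so that the record travels with the text. The function name `environment_record` and the wording of the note are assumptions for this example; the code uses only the Python standard library and would run equally well outside a notebook.

```python
import json
import platform
from datetime import datetime, timezone


def environment_record(note: str) -> dict:
    """Collect basic details about when and where this analysis was run."""
    return {
        "note": note,
        "recorded_at": datetime.now(timezone.utc).isoformat(),
        "python_version": platform.python_version(),
        "platform": platform.platform(),
    }


# Hypothetical note describing the step being recorded.
record = environment_record("First pass at counting place names in the letters")

# Printing the record keeps it visible in the notebook next to the results.
print(json.dumps(record, indent=2))
```

The values printed will differ from machine to machine, which is exactly why they are worth recording.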
Activity 3 Discovering lab notebooks
Find out how to use Jupyter Notebooks from The Programming Historian.