2.5 Accessing or creating data for your research
Your research project may involve creating your own digital data, for example by photographing a collection in an archive. You should pay attention to any conditions the archive imposes on re-use and publication. Archives may, for example, insist that researchers do not republish the photographs they take and instead keep them for ‘private use or study’.
If you do not create your own research data, there are three broad ways to access digital collections created by others:
- Access via a ‘user interface’
- Access via an ‘application programming interface’ (API)
- Access via direct download (or via the collection holder making a copy on a physical medium such as a hard drive and delivering to you)
If we compare these methods of accessing data to going shopping, access via a user interface is like browsing the shelves in a shop, an API is more like placing a bulk order from a warehouse, and ‘direct download’ is like a ‘click and collect’ order for your groceries (someone else has done the packing for you).
You looked at automated means such as web scraping in Session 4, where code is used to extract bulk data in ways that are not necessarily explicitly authorised (and may in fact be explicitly forbidden by the website owners). There are of course also ‘manual’ means of doing the same thing, for example copying and pasting from a website.
What these processes all have in common is that you should make sure you know the following about your data. Otherwise you may encounter difficulties later in your research, for example when submitting your thesis or submitting a paper for publication. If you get this wrong, you may even lose access to services, have to unpublish your work, or risk legal action against you.
- Where and how you got it
- How you changed it
- Who should be credited
- Who can make further use of it
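One way to keep track of these four points is to store a small provenance record alongside the data itself. The following is a minimal sketch in Python, not a prescribed method: the class name `ProvenanceRecord`, its field names, and the example archive and dates are all hypothetical and chosen purely for illustration.

```python
import json
from dataclasses import dataclass, field, asdict
from datetime import date


@dataclass
class ProvenanceRecord:
    """Hypothetical record covering the four questions in the list above."""
    source: str          # where you got the data
    access_method: str   # how you got it (user interface, API, download, own photographs)
    date_accessed: str   # when you got it, as an ISO date string
    credit: str          # who should be credited
    reuse_terms: str     # who can make further use of it, and under what conditions
    changes: list = field(default_factory=list)  # how you changed it

    def log_change(self, description: str) -> None:
        """Append a plain-language note describing one change made to the data."""
        self.changes.append(description)

    def to_json(self) -> str:
        """Serialise the record so it can be saved next to the data files."""
        return json.dumps(asdict(self), indent=2, ensure_ascii=False)


# Illustrative values only: the archive, collection and date are invented.
record = ProvenanceRecord(
    source="Example Archive, Collection X (hypothetical)",
    access_method="own photographs taken in the reading room",
    date_accessed=date(2024, 1, 15).isoformat(),
    credit="Example Archive (hypothetical)",
    reuse_terms="private use or study only; no republication",
)
record.log_change("Cropped images and converted them to greyscale")
print(record.to_json())
```

Saving the printed JSON as a text file in the same folder as the data is one simple option; a spreadsheet or a plain README file recording the same four points would serve equally well.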