Session 3: Metadata and search
This session is written by Anne Alexander from The University of Cambridge.
Transcript: Video 3
The key idea in this session is that data is constructed, not found: data is always a representation of the world, not the world itself. We'll also look at an additional set of complications in how data represents the world, which arise when humans create data for other humans to read using machines.
How, when, and by whom data is constructed is sometimes relatively easy to see. But at other times, this information needs to be uncovered or inferred. Traditional humanistic approaches developed by scholars working with material texts can be adapted to working with digital data.
In this session, you'll be applying a medieval interpretive practice, the accessus ad auctores, to digital texts. The accessus was a section inserted at the beginning of manuscripts in which copyists and authors answered the following questions:
Who is the author? What is the subject matter of the text? Why was the text written? How was the text composed? When was the text written or published? Where was the text written or published? And by which means was the text written or published?
Some of these questions are similar to those posed and answered by metadata, which is to say data about data. As we will see, metadata, like the data it describes, also involves multiple interpretive decisions and plays a major role in the visibility or invisibility of texts, images, and sounds. This is especially true when accessing data over the internet.
Metadata is enormously important to search engines, for example. These are very powerful tools. But like any other method for finding and sorting information, researchers should not simply accept their results as a given. One of the outcomes of this session should be a deeper critical awareness of how search engines work and what kinds of questions to ask of them.
After studying this session, you should have:
- understood the difference between data and metadata
- familiarised yourself with the basic elements of a search engine
- reflected on some of the ways in which search technologies shape scholarship and research.
In the last session, you learned about the processes of transformation that change material texts, and recordings of images and sounds on physical media, into digital data. Johanna Drucker observes that there is a tension between this radical change from material to digital and the idea that ‘data’ is a set of observations, an empirical recording of the world as it is. She proposes reconceiving ‘data’ (something which is given) as ‘capta’ (something which is taken).
From this distinction, a world of differences arises. Humanistic inquiry acknowledges the situated, partial, and constitutive character of knowledge production, the recognition that knowledge is constructed, taken, not simply given as a natural representation of pre-existing fact.
It is crucial to remember the constructed nature of ‘data’ produced by digitisation processes as we grapple with the next challenge: how to find the needles in this digital haystack of information. Now that everything has been flattened out into a machine-readable series of 1s and 0s, how can humans make sense of it?
This is where metadata comes in. Metadata is data about data. It is the information attached to data to explain who made it, what format it has been captured in, where and when it was made, and a host of other things which the authors or curators of the data want to share. Metadata predates digital data: for example, a library catalogue records metadata about the material items in the collection. However, metadata becomes even more important for digital data than for its analogue counterpart. Have you ever lost a crucial file on your computer because you called it DRAFT_1.doc or something equally forgettable? At least with a book on your bookshelf, your memory may be jogged by its colour or even the wear and tear along the spine.
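To make the distinction between data and metadata concrete, here is a minimal sketch in Python. It is an illustration only: the `Record` class, the `find` function, the field names (`creator`, `subject`, `created`, `format`) and all the values are hypothetical, chosen to echo the DRAFT_1.doc example rather than to follow any particular metadata standard.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    """A file's content (the data) plus a description of it (the metadata)."""
    filename: str
    content: str                                   # the data itself
    metadata: dict = field(default_factory=dict)   # data about the data


def find(records, **criteria):
    """Return the filenames whose metadata matches every given criterion."""
    return [
        r.filename
        for r in records
        if all(r.metadata.get(key) == value for key, value in criteria.items())
    ]


# Hypothetical files: the names say nothing, the metadata says a lot.
records = [
    Record(
        "DRAFT_1.doc",
        "Notes on medieval prologues...",
        {"creator": "A. Student", "subject": "accessus ad auctores",
         "created": "2021-03-04", "format": "application/msword"},
    ),
    Record(
        "DRAFT_2.doc",
        "Things to buy this week...",
        {"creator": "A. Student", "subject": "shopping",
         "created": "2021-03-05", "format": "application/msword"},
    ),
]

print(find(records, subject="accessus ad auctores"))
```

The search never looks at the file contents or at the unhelpful filenames. It finds DRAFT_1.doc only because someone chose to record a subject for it, which is the sense in which metadata makes digital data findable.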
Like data, metadata gives off an appearance of objectivity, of simply being an empirical recording of facts about the data. But it is just as much the result of subjective choices made by humans, shaped by their spectrum of motives, beliefs and opinions, and by their training in different professions and disciplines. Even within the same profession or discipline, there can be wide variation in metadata standards, meaning that even versions of the same original text could end up being described in different ways.
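The effect of such variation can be sketched in a few lines of Python. The two catalogue entries below are invented for illustration: they describe the same work, but two hypothetical cataloguers have made different choices about name order, capitalisation and how to record the language. The function names `differing_fields` and `exact_search` are likewise assumptions made for this example.

```python
# Two hypothetical descriptions of the same text, made under different conventions.
catalogue_a = {
    "title": "The Canterbury Tales",
    "creator": "Chaucer, Geoffrey",   # surname first
    "language": "enm",                # a language code
}
catalogue_b = {
    "title": "Canterbury tales",
    "creator": "Geoffrey Chaucer",    # forename first
    "language": "Middle English",     # a language name
}


def differing_fields(a, b):
    """List the metadata fields on which two descriptions disagree."""
    return sorted(key for key in a.keys() | b.keys() if a.get(key) != b.get(key))


def exact_search(records, field_name, query):
    """Return the names of the records whose field matches the query exactly."""
    return [name for name, rec in records.items() if rec.get(field_name) == query]


records = {"catalogue_a": catalogue_a, "catalogue_b": catalogue_b}

print(differing_fields(catalogue_a, catalogue_b))
print(exact_search(records, "creator", "Geoffrey Chaucer"))
```

Every field differs between the two entries, and an exact search for the creator "Geoffrey Chaucer" returns only the second one. Neither description is wrong, but a reader using a simple search tool would see one copy of the text and miss the other.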