3.4 Critical search
The top results in a major web search are likely to be filtered according to the user’s location and personalised by their prior web browsing and searching (in other words, showing them answers based on what they looked for in the past). Results may be ranked according to other criteria as well, including the file type or the publisher (perhaps privileging content published by an affiliate of the search engine’s parent company). The search engine could also block access to particular documents outright, perhaps based on laws banning pornography or material deemed supportive of a political movement opposed to the government in the state where the search engine company operates.
To use search critically (an essential skill for digital humanities), it is important to unlearn what commercial web search has taught us. That means asking questions about the metadata, the index and the query system, and trying to work out what the search engine might not be showing us and where its biases come from.
This is essential not only when attempting to repurpose web search engines for research (Rogers, 2019), but also when using digital search to study the past, for example by searching digitised collections in historical and cultural archives.
Returning to the transformation of text into data from the previous session, one way that search engines shape how we see historical document collections stems from the difficulties that Optical Character Recognition (OCR) tools encounter in turning historical typefaces into machine-readable text. The lack of standardised spellings and the presence of characters such as the ‘long S’ (ſ), which has fallen out of use in modern printing, make the contents of these documents much more opaque to machines than modern documents.
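To make the long S problem concrete, the following minimal Python sketch compares a plain keyword search with a more tolerant one. It is an illustration only, not a description of how any particular archive's search works. The sample sentences, the function names and the assumption that OCR software sometimes misreads ‘ſ’ as ‘f’ are all introduced here for the example.

```python
import re

# Hypothetical sample texts: the same sentence as a modern edition,
# as a faithful transcription with long S, and as OCR output that
# has (by assumption) misread every long S as the letter f.
documents = {
    "modern": "The estate passed into the possession of his son.",
    "long_s": "The eſtate paſſed into the poſſeſſion of his ſon.",
    "ocr_misread": "The eftate paffed into the poffeffion of his fon.",
}


def exact_match(query, text):
    """Plain keyword search: the query must appear letter for letter."""
    return query.lower() in text.lower()


def long_s_pattern(query):
    """Build a regular expression in which every 's' in the query
    may also appear as a long S or as a misread 'f'."""
    parts = []
    for ch in query:
        if ch.lower() == "s":
            parts.append("[sſf]")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.IGNORECASE)


def tolerant_match(query, text):
    """Keyword search that tolerates long S and s/f confusion."""
    return long_s_pattern(query).search(text) is not None


for name, text in documents.items():
    print(name, exact_match("possession", text), tolerant_match("possession", text))
```

The plain search finds ‘possession’ only in the modern text, while the tolerant search finds it in all three. The tolerant search has a cost, however: because it treats ‘f’ as a possible ‘s’, a query for ‘sat’ would also match ‘fat’. Asking which of these trade-offs a given search engine has made, and whether it tells us, is part of using search critically.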