
Can I use 'we' and 'I' in my essay? Introducing corpus linguistics

Updated Friday 1st September 2017

An introduction to using a corpus to get answers in linguistics.

[Image: Should this be I or should this be we? Credit: Emanuele Rosso, CC-BY-NC-ND licence]

Should you be an I or a we when writing essays? An obvious way of finding out the answer to this question is by searching through successful essays. We could do this by simply reading example essays one by one, looking for the pronouns 'we' and 'I', counting the number of each pronoun and seeing which occurred most often. Another, easier and more accurate, way is to use a computer to search through a large collection of essays. Looking at texts in this way is known as corpus linguistics and is a rapidly growing area in the study of language.

What is a corpus?

But what exactly is meant by a corpus? A corpus (plural corpora) is basically a collection of texts, selected and organised in a principled way, and stored on a computer so that you can search it easily. It could be anything from a few hundred words (e.g. a collection of your Facebook status updates) to several billion words (e.g. corpora compiled by trawling webpages).

You can search corpora that already exist, using a 'concordancer' or other types of software, or you could even build your own corpus if you want to investigate a particular type of text and you can't find an existing corpus.
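As a rough illustration of what searching a tiny corpus of your own might look like, here is a minimal Python sketch. The two example 'essays' are invented for illustration, and the simple tokenising rule is an assumption (it treats contractions such as "I'm" as a single word, so they are not counted as 'I'); a real concordancer handles these details far more carefully.

```python
import re
from collections import Counter

# Hypothetical mini-corpus: in practice these would be full essays loaded from files.
essays = [
    "In this essay I argue that we need more data. We then test the claim.",
    "We measured the samples twice. I recorded the results, and we compared them.",
]

def count_pronouns(texts, pronouns=("i", "we")):
    """Count whole-word, case-insensitive occurrences of each pronoun."""
    counts = Counter()
    for text in texts:
        # Lower-case the text and split it into simple word tokens.
        tokens = re.findall(r"[a-z']+", text.lower())
        counts.update(t for t in tokens if t in pronouns)
    return counts

counts = count_pronouns(essays)
print("we:", counts["we"], "I:", counts["i"])
```

Because the search matches whole words only, the 'we' inside a word such as 'were' is not counted.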

Googlefight

We're all now used to searching large collections of texts quickly. Every time you use a search engine you're effectively trawling through vast numbers of entries.

So why don't linguists just use Google as a large corpus to find out how language works? Searching the web works as a very rough and ready way of quickly getting a sense of how language items are used.

You might be interested in trying a 'Google fight' to resolve disputes over how frequently two words or phrases are used! Go to www.googlefight.com. Here's what I found when I searched for 'we' and 'I':

[Image: screengrab comparing the number of returns for 'I' against 'we' on Google. Copyright: Googlefight]

So 'I' wins the Googlefight. But what does this actually mean?

If you search the web using Google or another search engine, your search will include all sorts of webpages – and duplicates of webpages. You also don't have any idea about the kinds of text the word appears in or about the other words your search item is used with (the 'co-text').

If we want to know whether 'we' is more common than 'I' in student essays, for example, looking on Google wouldn't be a very good way to go about it. And if we wanted to know if one group of students (for example Engineering students) use 'we' in their academic writing more than another group (such as History students) then an internet search wouldn't be any help at all.

To answer the question in the title to this article, we'd get a more accurate result by searching a collection of student writing such as the British Academic Written English or 'BAWE' corpus (pronounced 'boar' like the animal).

This corpus contains not just essays but also lab reports, case studies, literature reviews, and other types of writing that undergraduate and master's students do at university. Here I've used the free site Sketch Engine Open and I've searched the whole BAWE corpus:

[Image: screengrab showing instances of the word 'we' in the British Academic Written English corpus as returned by Sketch Engine. Copyright: Sketch Engine]

From this screenshot, we can see that BAWE contains 15,718 instances of 'we' (or 1,885 per million words). A similar search for 'I' reveals that there are 13,069 instances in the whole student corpus (or 1,568 per million words).

So in the BAWE corpus, 'we' is more frequent than 'I'; this is the opposite result to Googlefight. Searching a corpus of student writing gives us results from this type of text and not from all texts found on the web.
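The 'per million words' figures are normalised frequencies: the raw count divided by the size of the corpus, multiplied by one million. The short Python sketch below shows that arithmetic using the counts quoted above. The article does not state the corpus size, so the sketch back-calculates it from the quoted figures; treat that size as an inference rather than an official number.

```python
def per_million(hits, corpus_size):
    """Normalised frequency: occurrences per million words."""
    return hits / corpus_size * 1_000_000

def implied_corpus_size(hits, rate_per_million):
    """Back-calculate the corpus size implied by a raw count and its per-million rate."""
    return hits / rate_per_million * 1_000_000

# Figures quoted in the article for the BAWE corpus.
we_hits, we_rate = 15_718, 1_885
i_hits, i_rate = 13_069, 1_568

size_from_we = implied_corpus_size(we_hits, we_rate)
size_from_i = implied_corpus_size(i_hits, i_rate)

# Both estimates come out at roughly 8.3 million, so the two rates are consistent.
print(round(size_from_we), round(size_from_i))

# How many times more frequent 'we' is than 'I' in this corpus.
print(round(we_hits / i_hits, 2))
```

Normalising in this way is what makes it possible to compare corpora of different sizes, for example writing from two different disciplines.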

Co-text

A concordancer (unlike Googlefight) also shows us the co-text, that is, the words appearing before and after our search term (in this case 'we'). Another piece of software that shows us the co-text to 'I' and 'we' is the 'Wordtree'. Below you can see a search for words occurring after 'we'. You can access the Wordtree online.

[Image: screengrab from Wordtree showing the words which follow 'we'. Copyright: Wordtree]
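To give a feel for what a concordancer does, here is a minimal keyword-in-context sketch in Python. It is not how Sketch Engine or Wordtree work internally; the sample sentence, the `concordance` function name and the three-word window are all assumptions made for illustration.

```python
import re

def concordance(text, keyword, width=3):
    """Return (left co-text, keyword, right co-text) for each match of keyword."""
    tokens = re.findall(r"[A-Za-z']+", text)
    lines = []
    for i, tok in enumerate(tokens):
        if tok.lower() == keyword.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, tok, right))
    return lines

# Hypothetical sample text standing in for a real corpus.
sample = (
    "In this report we present our findings. "
    "First we describe the method, and then we discuss the results."
)

hits = concordance(sample, "we")
for left, kw, right in hits:
    # Right-align the left co-text so the keyword lines up in a column.
    print(f"{left:>20} | {kw} | {right}")
```

Lining the keyword up in a column makes it easy to scan the words that follow it, here 'present', 'describe' and 'discuss'. Note that this simple version ignores punctuation, so the co-text can run across sentence boundaries.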

The answer

So, is 'we' or 'I' more common in essay-writing? The answer overall from our search of the BAWE corpus is that 'we' is more common. But to give a more useful and accurate answer, you might want to also look at particular disciplines such as English Literature or Biological Sciences.

And you might also want to consider whether you're writing an 'essay' or a 'literature review' or a 'lab report'.

But in student writing overall, 'we' takes first place!

Follow-on links

Using Sketch Engine to explore the BAWE corpus

The free version of Sketch Engine gives access to several corpora, including BAWE. From the homepage of Sketch Engine, choose a corpus, then click ‘concordance’ and type a word or phrase in the text box. This will produce a list of concordance lines which can then be sorted. The ‘help’ function gives very clear guidance for more advanced searches.

Reading more about BAWE

You can find out more about the British Academic Written English corpus (BAWE) from the BAWE website.

Corpus linguistics resources

Dave Lee, a corpus linguist, has collected a wealth of links to resources such as corpus software, corpora, conference and journal papers on his website Bookmarks for Corpus-based Linguists. The 'Courses' link from his site takes you to a comprehensive collection of online courses in corpus linguistics, including introductory sessions. The 'Teaching' section contains a range of data-driven learning resources.

MOOCs

You could also look for free, short courses on FutureLearn, such as The University of Lancaster's Corpus Linguistics.
