2 How many words do you need?

You’ve just seen that there are an estimated 250,000 distinct words in English. Imagine you were learning English as a new language from scratch; if you wanted to learn even half the words in the language and set yourself the rather tall order of learning ten new words a day, it would take you over 30 years to achieve it. So what’s going on?

Although there are an estimated 250,000 distinct words in English, a conservative estimate of how many words a well-educated native speaker knows suggests the figure is somewhere between 20,000 and 27,000 word families (Goulden, Nation and Read, 1990; Zechmeister et al., 1995). Learning vocabulary at the same rate as before, that would take between about five and a half and seven and a half years. Learning 10 new words a day, day in, day out, is still a tall order, but around 7 years seems more manageable, and if you studied intensively, or were able to immerse yourself in the language, it might be achievable.
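The arithmetic behind these estimates can be checked with a short calculation. The sketch below is only illustrative: it assumes a steady 10 new words on every day of a 365-day year, and the function name years_to_learn is made up for this example.

```python
WORDS_PER_DAY = 10   # learning rate used in the text
DAYS_PER_YEAR = 365  # assumption: study every single day, leap years ignored


def years_to_learn(word_count, words_per_day=WORDS_PER_DAY):
    """Return how many years it takes to learn word_count words."""
    days = word_count / words_per_day
    return days / DAYS_PER_YEAR


# Targets taken from the figures quoted in the text.
targets = {
    "half of 250,000 words": 250_000 // 2,
    "20,000 word families": 20_000,
    "27,000 word families": 27_000,
}

for label, count in targets.items():
    print(f"{label}: {years_to_learn(count):.1f} years")
```

Changing WORDS_PER_DAY shows how sensitive the estimate is to the learning rate: at five words a day, every figure doubles.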

Another way to estimate how many words you might need as a learner is to work out the number of words needed to do what you would like or need to do in the language, such as dealing with spoken language when you go on holiday, reading newspapers or watching TV. Here, the research shows that a small number of word types occur very frequently and make up most of the words in spoken or written text. The table below shows how the most frequent 1000 word families account for over three-quarters of the words in text. The next two frequency bands (so a vocabulary of 3000 word families) bring the total to nearly 90 per cent of the words in these texts. After that the effect decreases: each further band of 1000 word families adds less than 2 per cent, so increasing your vocabulary beyond that point makes comparatively little difference to how much of a text you can understand.

Table 1 Frequency bands

| Frequency band | Example words | % Coverage added by level | Cumulative % |
|---|---|---|---|
| 1,000 | the, history | 77.86 | 77.86 |
| 2,000 | accommodate, prefer | 8.23 | 86.09 |
| 3,000 | digest, receipt | 3.70 | 89.16 |
| 4,000 | elastic, thread | 1.79 | 90.95 |
| 5,000 | locker, tranquil | 1.04 | 91.99 |
| 6,000 | diligent, undertake | 0.70 | 92.69 |
| 7,000 | fossil, jagged | 0.65 | 93.34 |
| 8,000 | abhor, obtrusive | 0.40 | 93.74 |
| 9,000 | remorse, wrench | 0.32 | 94.06 |
| 10,000 | barricade, pigment | 0.32 | 94.38 |
| 11,000 | glitzy, scam | 0.16 | 94.54 |
| 12,000 | epitome, resonate | 0.14 | 94.68 |
| 13,000 | outdo, tipsy | 0.12 | 94.80 |
| 14,000 | secede, yearbook | 0.10 | 94.90 |
(Source: Schmitt and Schmitt, 2014, figures courtesy of Mark Davies.)

Vocabulary size and text coverage for the written LOB Corpus (1,000–14,000 levels) (in Schmitt and Schmitt, 2014, adapted from Nation, 2006, p. 64) – the LOB corpus is a collection of one million words from different UK written sources.
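One way to read Table 1 is to ask how large a vocabulary is needed to reach a given level of coverage. The sketch below is a simple lookup over the cumulative figures as printed in the table; the function name smallest_vocabulary_for and the chosen targets are assumptions made for illustration only.

```python
# Cumulative coverage (%) by vocabulary size in word families,
# copied from the "Cumulative %" column of Table 1.
CUMULATIVE_COVERAGE = {
    1_000: 77.86, 2_000: 86.09, 3_000: 89.16, 4_000: 90.95, 5_000: 91.99,
    6_000: 92.69, 7_000: 93.34, 8_000: 93.74, 9_000: 94.06, 10_000: 94.38,
    11_000: 94.54, 12_000: 94.68, 13_000: 94.80, 14_000: 94.90,
}


def smallest_vocabulary_for(target_percent):
    """Return the smallest vocabulary size listed in the table whose
    cumulative coverage reaches target_percent, or None if none does."""
    for size in sorted(CUMULATIVE_COVERAGE):
        if CUMULATIVE_COVERAGE[size] >= target_percent:
            return size
    return None


for target in (75, 85, 90, 95):
    print(f"{target}% coverage: {smallest_vocabulary_for(target)}")
```

Going from 75 to 85 per cent takes one extra band, going from 85 to 90 takes two more, and 95 per cent is not reached even at 14,000 word families, which is the diminishing return the table illustrates.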

So vocabulary acquisition research shows that knowing the most common 1000 or so word families is enough to recognise over 75 per cent of the words in a typical text. All of a sudden, the task seems a bit less daunting!

So, how do you find out which 1000 words to start with? You can get a frequency dictionary in the language you are learning, or do an online search for ‘the 1000 most common words in (+ your target language)’.

However, you might also want to combine learning the most frequent words in the language with words that are useful to you personally. For instance, the German word Stieftochter (stepdaughter) doesn’t even appear in my Frequency Dictionary of German, which includes the most common 4000 words. Yet, as I happen to have one, I made sure I learned it very early on, so that I could talk about both my Tochter (daughter) and my Stieftochter when I talked about my family in German.


