AI matters


3.1 AI and sustainability

The connection between AI and sustainability has been increasingly explored over the last decade or so, and rising concern about sustainable practices across local and global economies provides an opportunity to take part in the broader discussion of the role AI plays in sustainability.

Let’s consider specific AI areas and technologies in the context of sustainability. A major area of AI is Machine Learning (ML) and in particular the use of Artificial Neural Networks (ANNs). Both ML and ANNs are explained in more detail in the boxes below. ANNs have become ubiquitous in AI within industry, particularly where there is a need to build applications using very large amounts of data. However, data-oriented AI of this kind is also very complex and expensive in terms of the machine resources it requires, and that computational expense translates into vastly increased energy consumption, which makes this form of AI quite unsustainable.

In addition, recent advances in AI techniques within Natural Language Processing (NLP) (see the information box about NLP below), many of them driven by ML and ANNs, are behind remarkable improvements in applications such as Question Answering and Automatic Summarisation. The next activity will focus on some of this work within NLP.

Activity 7

Timing: 20 minutes

In an article on the VentureBeat website, Kyle Wiggers (Wiggers, 2020) discusses the costs of using recent techniques for combining machine learning and NLP in detail – read this article and carry out the following activities.

Part 1

Using information from the article by Wiggers, or any additional information you can find, try to work out exactly how expensive GPT-3 was to train (for an additional challenge, try to include details of the cost in terms of the computing resources that were required). Here is some background on Language Modelling (LM), a key technology discussed throughout Wiggers’ article:

  • LM involves modelling the surface patterns that occur in huge volumes of text data; increasingly, such data comes from the web (in particular, social media sites such as Reddit and Twitter). A powerful aspect of LM is that it is predictive: given the start of a sentence, such a system can predict a word, or string of words, that could be used to form a complete, even grammatically correct, sentence. Some of the larger language models (e.g. GPT-3) can do this kind of prediction for entire texts. However, a note of caution regarding LM is in order. Such models capture only information about the surface patterns of language; they have no information about the deeper meanings of the words and combinations of words they generate.
  • So, while it is impressive that with enough data, larger language models can apparently generate grammatically correct sentences, it is important to keep in mind that these models have no understanding of what these sentences mean.
  • Finally, given that predictive capability underlies human linguistic expertise more generally, LM is also an important cross-cutting area of NLP, and it has been shown to boost the performance of systems on other tasks (including all of the tasks described in the information box immediately below).
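The predictive idea behind LM can be illustrated with a toy bigram model, which predicts the next word from counts of which words have followed which. This is a deliberately minimal sketch: the corpus and the rule of picking the most frequent continuation are illustrative assumptions, and real language models such as GPT-3 instead use large neural networks trained on billions of words.

```python
from collections import Counter, defaultdict

# A toy corpus standing in for the web-scale text that real language models use.
corpus = "the cat sat on the mat . the dog sat on the rug .".split()

# Count, for each word, which words have followed it (a bigram model).
following = defaultdict(Counter)
for current, nxt in zip(corpus, corpus[1:]):
    following[current][nxt] += 1

def predict_next(word):
    """Return the most frequent continuation seen after `word`, or None."""
    counts = following[word]
    return counts.most_common(1)[0][0] if counts else None

print(predict_next("sat"))  # "on" — both occurrences of "sat" are followed by "on"
print(predict_next("on"))   # "the"
```

Note that the model has no idea what "sat" or "on" mean; it only knows which surface patterns co-occur, which is exactly the caution raised in the first bullet point above.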


To begin with, we need to work out how to measure the cost of using a computer to build and use AI technology such as GPT-3. A standard way of doing this is to count how many floating point operations (FLOPs) a computer performs for a particular task – computers use such operations when building or using AI technology like GPT-3 (‘floating point’ refers to how computers perform arithmetic with so-called real numbers, which can represent continuous quantities). The rate at which a computer carries out these operations is measured in FLOPs per second (FLOPS).

Given the scale of GPT-3, we need to talk about teraFLOPs (TFLOPs), where 1 teraFLOP is a trillion (10¹²) floating point operations. It turns out that building the largest version of GPT-3 took several million TFLOP/s-days of compute – the work done by a machine sustaining one TFLOP per second for several million days (Brown et al., 2020) – and the financial cost of this has been estimated at many millions of US dollars (Wiggers, 2020).
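To get a feel for these numbers, here is some rough, illustrative arithmetic. The 3.64 million TFLOP/s-days figure is the training compute reported for the largest GPT-3 model (Brown et al., 2020); the sustained throughput of 30 TFLOPS for a single GPU is purely an assumption made for the sake of this sketch.

```python
# Published compute estimate for the largest GPT-3 model (Brown et al., 2020),
# expressed in teraFLOP/s-days (one TFLOP per second, sustained for one day).
TRAINING_COMPUTE_TFLOPS_DAYS = 3_640_000

# Assumed sustained throughput of a single GPU, in TFLOP per second.
GPU_SUSTAINED_TFLOPS = 30

# How long one such GPU would take to perform all of that work.
gpu_days = TRAINING_COMPUTE_TFLOPS_DAYS / GPU_SUSTAINED_TFLOPS
print(f"Single-GPU training time: {gpu_days:,.0f} days "
      f"(about {gpu_days / 365:,.0f} GPU-years)")
```

In other words, a single GPU of this (assumed) speed would need on the order of centuries, which is why such models are trained on thousands of machines in parallel, and why the financial and energy costs are so high.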

By way of balancing the discussion, Wiggers (2020) includes the following note about efforts at increasing efficiency by OpenAI (the company behind GPT-3):

There’s also evidence that efficiency improvements might offset the mounting compute requirements; OpenAI’s own surveys suggest that since 2012, the amount of compute needed to train an AI model to the same performance on classifying images in a popular benchmark (ImageNet) has been decreasing by a factor of two every 16 months.

So, while the environmental impact of recent AI technology is considerable (as noted by Wiggers, 2020), it is important to also note that companies are working on improving the efficiency of such technology (for some details about this, see e.g. Walleser, 2021).
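The ‘factor of two every 16 months’ figure quoted from the OpenAI survey compounds quickly, rather like interest. A short sketch of the reduction it implies (the 16-month doubling period is the only input; everything else follows from it):

```python
def compute_reduction(months):
    """Factor by which the compute needed for fixed performance falls,
    if it halves every 16 months (the OpenAI survey figure)."""
    return 2 ** (months / 16)

for years in (2, 4, 8):
    factor = compute_reduction(12 * years)
    print(f"After {years} years: {factor:.1f}x less compute for the same result")
# After 4 years the factor is 8.0x; after 8 years it is 64.0x.
```

So if the trend holds, a task that needs a given amount of compute today should need only about a sixty-fourth of it eight years from now, which is the kind of efficiency gain that could offset some of the growth in model size.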

Part 2

Consider the following claim: A bigger model automatically means a better model. Is the example of GPT-3, as presented in Wiggers’ article, evidence for or against this claim?



First, let’s clarify what we mean by ‘better’. On the one hand, a better model will perform better, in that it is more accurate and reliable; on the other hand, a better model should also exhibit fewer biases (especially in relation to important categories such as gender and race). Wiggers (2020) notes that there is a general problem of diminishing returns in performance for larger models. Further, the approach of building such models by throwing ever greater volumes of data at ever larger models does nothing to address bias in that data (which, as already noted, is largely drawn from the web and retains the kinds of biases typically found on the internet).