Session 3: Evaluating and correcting the output – 45 minutes

7 Evaluating the results

A piece of paper with typed lines on it. The typed text has been amended by hand with a red pen.

We began this course with a key message about the reality of using current GenAI tools – they make mistakes, and you should not trust the results they produce. We then looked at how best to prompt GenAI to undertake tasks with quite complex requirements.

We now turn to the next essential step – reviewing the outputs for accuracy and usefulness.

It may be tempting to ask GenAI questions about topics you have no background in – but that leaves you unable to evaluate the output. Would you ask a random stranger whether you have committed a serious criminal offence which could lead to imprisonment, and then follow their advice without being able to check whether it was correct, accurate and meaningful for you?

Reviewing the outputs is already part of some prompt frameworks. In the CLEAR framework, for example, the R is for Reflection. However, even when a framework omits this step, some form of checking is essential.

When reviewing the output, what are we looking for, and what does that require from us?

At a trivial level, we do this constantly in conversations and when reading. We sense-check what we’ve understood. It’s only when we spot something unusual that we address it. This is illustrated in the image below.

Described image

When things become more complex or ambiguous, or if we don’t understand the subject, this becomes much more difficult.

See if you can spot the incorrect element in the following reply:

 

Character A: Can you tell me how to tie a bowline knot – you know the one that makes a loop at the end of a line?

Character B: Yes, I can do that. First, make a small loop near the end of the rope. Take the working end (the shorter end) of the rope and pass it up through the loop from underneath. Wrap the working end behind the standing part (the longer end of the rope). Bring the longer end back down through the loop (the same way it came in). Hold the standing part and pull the working end to tighten the knot securely.

 

To be correct, the last-but-one step should read ‘Bring the working end back down through the loop’. However, unless you know how to tie a bowline knot, you’re unlikely to spot the mistake simply by reviewing the output.
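For readers comfortable with a little code, the same point can be shown as a rough sketch: the mistake only becomes visible once the reply is compared against a source you already trust. The step lists below are condensed paraphrases of the reply above, and `reference_steps` and `find_mismatches` are hypothetical names invented for this illustration. The exact-match comparison works here only because the two lists are worded identically apart from the error; real outputs need human judgement, not string matching.

```python
# Steps condensed from Character B's reply (paraphrased for this sketch).
reply_steps = [
    "Make a small loop near the end of the rope.",
    "Pass the working end up through the loop from underneath.",
    "Wrap the working end behind the standing part.",
    "Bring the longer end back down through the loop.",
    "Hold the standing part and pull the working end to tighten.",
]

# Assumption: a reference you have already verified from a trusted source.
reference_steps = [
    "Make a small loop near the end of the rope.",
    "Pass the working end up through the loop from underneath.",
    "Wrap the working end behind the standing part.",
    "Bring the working end back down through the loop.",
    "Hold the standing part and pull the working end to tighten.",
]


def find_mismatches(reply, reference):
    """Return (step_number, reply_text, reference_text) for each differing step."""
    return [
        (number, got, expected)
        for number, (got, expected) in enumerate(zip(reply, reference), start=1)
        if got != expected
    ]


mismatches = find_mismatches(reply_steps, reference_steps)
for number, got, expected in mismatches:
    print(f"Step {number} differs:")
    print(f"  reply:     {got}")
    print(f"  reference: {expected}")
```

Without `reference_steps` there is nothing to compare against, which is the position of a reader who has never tied a bowline.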

With LLM outputs, we first need to check that the content is accurate and not misleading. We then need to check that the presentation of the output meets the various requirements we set out in the prompt. Let’s break that down.