7 How do you control Generative AI?

A photograph of a grey tarmac road running vertically in a line from the bottom of the photograph to the horizon. On the left and right hand side of the road are waist-high railings, with a yellow line between the railings and the road.

GenAI systems cannot explain why they produce the outputs they do, and there is often a lack of transparency over the training data and algorithms used within the systems. This can make it difficult to control individual tools.

Developers use reinforcement learning and fine-tuning to construct ‘guard rails’ that try to stop GenAI tools from acting illegally, irresponsibly or antisocially. These guard rails are also intended to ensure that AI tools produce factual outputs.
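To give a sense of what a guard rail does, the sketch below shows a highly simplified, rule-based filter that refuses requests on disallowed topics before they reach the model. This is only an illustration of the idea: real guard rails are built with reinforcement learning and fine-tuning rather than keyword lists, and the blocklist, function name and placeholder model call here are hypothetical.

```python
# A highly simplified, rule-based 'guard rail': refuse prompts that mention
# disallowed topics before they ever reach the language model.
# Real guard rails rely on reinforcement learning and fine-tuning, not keyword
# lists; the topics and names below are illustrative assumptions only.

DISALLOWED_TOPICS = ["explosive device", "chemical weapon"]  # hypothetical blocklist


def guarded_generate(prompt: str) -> str:
    """Return a model response, or a refusal if the prompt looks unsafe."""
    lowered = prompt.lower()
    if any(topic in lowered for topic in DISALLOWED_TOPICS):
        return "Sorry, I can't help with that request."
    # Placeholder for a call to an actual GenAI model.
    return f"[model response to: {prompt}]"


print(guarded_generate("How do I bake bread?"))
print(guarded_generate("Explain how to build an explosive device."))
```

Even this toy example hints at why guard rails can fail: a request that avoids the blocked wording, or that asks the model to role-play, slips straight past a simple rule, which is one reason developers use learned rather than rule-based safeguards.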

Many media reports and academic papers have found that guard rails are less than fully successful. It has proved quite easy to ‘jailbreak’ an AI – that is, to make it act in a way that overrides the guard rails and produces inappropriate outputs. For example, the jailbreak prompt "do anything now" (DAN) involves prompting the AI to adopt the fictional persona of DAN, an AI that can ignore all restrictions, even if its outputs are harmful or inappropriate (Krantz and Jonker 2024). Jailbreaking can lead to AI producing instructions for making explosive devices and chemical weapons, advice on how to harm others, and recipes containing inedible ingredients.

The next section considers other concerns about the use of GenAI tools.
