3 Accuracy and reliability

Ensuring that AI systems deliver accurate and reliable outputs is essential in fields that impact people’s rights and wellbeing – like healthcare, finance, and law.

Accuracy measures how closely an AI system’s outputs align with the correct or intended results. Reliability refers to the system’s ability to maintain this accuracy consistently across different scenarios and over time.
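
To make the distinction concrete, the short Python sketch below computes accuracy per evaluation scenario and treats reliability as the consistency of that accuracy across scenarios. The scenario data and the standard-deviation measure are illustrative assumptions, not a standard taken from this course.

```python
from statistics import mean, stdev

# Hypothetical evaluation data: for each scenario, pairs of
# (model output, correct answer). Purely illustrative values.
scenarios = {
    "routine queries": [("A", "A"), ("B", "B"), ("C", "C"), ("D", "A")],
    "edge cases":      [("A", "B"), ("B", "B"), ("C", "C"), ("D", "D")],
    "adversarial":     [("A", "A"), ("B", "C"), ("C", "A"), ("D", "D")],
}

def accuracy(pairs):
    """Fraction of outputs that match the correct result."""
    return sum(out == truth for out, truth in pairs) / len(pairs)

per_scenario = {name: accuracy(pairs) for name, pairs in scenarios.items()}

# One simple (assumed) reliability reading: high mean accuracy
# combined with low spread across scenarios.
print("Per-scenario accuracy:", per_scenario)
print(f"Mean accuracy: {mean(per_scenario.values()):.2f}")
print(f"Spread (std dev): {stdev(per_scenario.values()):.2f}")
```

A system with a high mean but a large spread is accurate on average yet unreliable: its performance cannot be trusted to hold across different scenarios.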

As adoption of generative AI (GenAI), and in particular Large Language Models (LLMs), increases, hallucinations – confident but incorrect outputs – have emerged as a particular concern, affecting accuracy, reliability, and trust.

Such errors can erode trust, harm individuals, and undermine the credibility of services. In legal and medical settings, professionals remain accountable for AI-supported decisions. Regulatory bodies such as the Solicitors Regulation Authority emphasise that AI cannot replace the need for competent human judgment (Solicitors Regulation Authority, 2022).

To mitigate these risks, organisations can invest in rigorous testing and monitoring. Benchmarking against industry and human standards, ensuring dataset diversity, providing clear indicators of AI confidence, and training staff to evaluate AI outputs critically are all vital.
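
As one illustration of a confidence indicator, the sketch below routes low-confidence outputs to human review. The threshold, the function name `triage`, and the (answer, confidence) structure are assumptions made for this example, not a prescribed standard.

```python
# Illustrative sketch: flag low-confidence AI outputs for human review.
REVIEW_THRESHOLD = 0.80  # assumed cut-off; tune per application

def triage(answer: str, confidence: float) -> str:
    """Return the answer with a clear confidence indicator,
    flagging uncertain outputs for human review."""
    if confidence < REVIEW_THRESHOLD:
        return f"[NEEDS HUMAN REVIEW] {answer} (confidence {confidence:.0%})"
    return f"{answer} (confidence {confidence:.0%})"

print(triage("The contract clause is enforceable.", 0.62))
print(triage("The invoice total is 1,250 GBP.", 0.97))
```

Making confidence visible in this way supports the staff training point above: reviewers learn to treat flagged outputs as drafts to verify rather than answers to accept.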

Real-time monitoring, feedback loops, and periodic retraining can help prevent performance drift and maintain operational standards. Monitoring should be an ongoing process rather than a one-off exercise. Emerging best practice suggests that AI dashboards, performance metrics, and automated anomaly-alert systems (Evidently AI, 2025) are becoming commonplace.
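
The sketch below shows, in plain Python, the kind of automated drift check such systems perform: accuracy is tracked over a rolling window of recent outcomes and an alert is raised when it falls below a baseline. The window size, baseline, and margin are assumptions for illustration; this is not the API of any particular monitoring tool.

```python
from collections import deque
import random

WINDOW = 100          # number of recent outcomes to track (assumed)
BASELINE = 0.90       # accuracy expected at deployment (assumed)
ALERT_MARGIN = 0.05   # tolerated drop before alerting (assumed)

recent = deque(maxlen=WINDOW)

def record_outcome(correct: bool) -> None:
    """Log whether the latest AI output was correct; alert on drift."""
    recent.append(correct)
    if len(recent) == WINDOW:
        rolling_accuracy = sum(recent) / WINDOW
        if rolling_accuracy < BASELINE - ALERT_MARGIN:
            print(f"ALERT: rolling accuracy {rolling_accuracy:.2f} "
                  f"below baseline {BASELINE:.2f}; investigate drift.")

# Hypothetical feed of outcomes, simulating a system running at
# roughly 85% accuracy, below the alert threshold.
random.seed(0)
for _ in range(500):
    record_outcome(random.random() < 0.85)
```

In practice, the alert would feed a dashboard or ticketing system and trigger the feedback and retraining loop described above.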

It is important to have rigorous processes in place for testing, validation, and monitoring across the AI lifecycle, particularly given the rapid evolution of GenAI.

4 Reverse engineering