| Site: | OpenLearn Create |
| Course: | Trustworthy and Democratic AI - Creating Awareness and Change |
| Book: | Module 2: Real-Life Examples of Bias |
Bias can enter AI systems at multiple stages, affecting everything from initial data collection to final deployment.
In Module 2, we cover the following Lessons:
Lesson 2.2: Bias in Criminal Law - The Case of COMPAS
Lesson 2.3: Bias in the Healthcare System - The Case of Predictive Algorithms
Lesson 2.4: Bias in Hiring Algorithms - The Case of Amazon and Beyond
Lesson 2.5: Bias in Government Fraud Detection Systems - Cases in the UK and Netherlands
Lesson 2.6: Key Takeaways and Warnings
Lesson 2.7: Poisoning ML Systems with Non-Randomness
To understand how these biases emerge and their impact, let’s look at some real-world examples of bias at different stages of AI development.
These examples highlight how bias can infiltrate various stages of AI development, reinforcing the importance of vigilance and ethical considerations throughout the process. By understanding these potential pitfalls, we can better anticipate, identify, and address biases, creating more equitable AI systems.
Watch the panel discussion titled "Societal impact of AI" from the AI Olympiad 2024, focused on ethics, fairness and trust. The lecturer discusses how AI is used and how, if we are not careful, things can go wrong, presenting real-life examples.
If you would like to watch more, here is another video lecture titled “On Bias, Interpretability and Robustness”.
LESSON 2.2: BIAS IN CRIMINAL LAW - THE CASE OF COMPAS
The COMPAS software (Correctional Offender Management Profiling for Alternative Sanctions) is a risk assessment tool used in U.S. courts to evaluate the likelihood that a defendant will reoffend. Its algorithm assesses general recidivism risk, violent recidivism potential, and the risk of pretrial misconduct, aiming to support judicial decisions around sentencing, parole, and bail. Although COMPAS does not directly consider race in its calculations, a 2016 ProPublica investigation revealed significant racial disparities in its predictions.
ProPublica’s analysis found that COMPAS was nearly twice as likely to wrongly label Black defendants as high risk for reoffending compared to white defendants: 45% of Black defendants who did not go on to reoffend were classified as high risk, versus only 23% of white defendants. Conversely, COMPAS incorrectly labeled white defendants who subsequently reoffended as low risk more often: 48% of white defendants compared with 28% of Black defendants. Even when controlling for other factors like prior crimes, age, and gender, Black defendants were still 77% more likely than white defendants to be assigned a higher risk score for violent recidivism.
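To see how such error-rate disparities are measured, here is a minimal Python sketch that computes a false positive rate and a false negative rate separately for two groups. The records are entirely hypothetical, not ProPublica’s data; they only illustrate the kinds of quantities behind the 45% vs. 23% and 48% vs. 28% figures above.

```python
# Minimal sketch of a group-wise error-rate audit (hypothetical records, not ProPublica's data).
# For each group we compute:
#   false positive rate = labelled high risk but did not reoffend / all who did not reoffend
#   false negative rate = labelled low risk but did reoffend / all who did reoffend

records = [
    # (group, predicted_high_risk, reoffended)
    ("A", True, False), ("A", True, True), ("A", False, False), ("A", True, False),
    ("B", False, True), ("B", False, False), ("B", True, True), ("B", False, True),
]

def error_rates(rows):
    false_pos = sum(1 for _, pred, actual in rows if pred and not actual)
    negatives = sum(1 for _, _, actual in rows if not actual)
    false_neg = sum(1 for _, pred, actual in rows if not pred and actual)
    positives = sum(1 for _, _, actual in rows if actual)
    return (false_pos / negatives if negatives else 0.0,
            false_neg / positives if positives else 0.0)

for group in ("A", "B"):
    rows = [r for r in records if r[0] == group]
    fpr, fnr = error_rates(rows)
    print(f"group {group}: false positive rate {fpr:.2f}, false negative rate {fnr:.2f}")
```

An audit like this asks whether the two error rates are balanced across groups, which is exactly the comparison ProPublica reported.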
Some critics later challenged ProPublica’s findings, arguing that the investigation inaccurately interpreted COMPAS’s results and implied that all actuarial risk assessments are inherently biased. However, further research found that COMPAS is no better at predicting recidivism than a random person, raising serious questions about the validity and reliability of using such algorithms in legal decision-making.
This example underscores the potential dangers of using biased algorithms in criminal law. Even if race is not an explicit variable in an algorithm, systemic biases embedded in other data points—like socioeconomic status, zip code, or arrest records—can lead to racially biased outcomes. In critical applications like criminal sentencing, the implications of algorithmic bias are profound, potentially leading to unfair treatment and perpetuating historical injustices. This case highlights the urgent need for transparency, rigorous testing, and ethical oversight in the deployment of AI systems in the justice system.
Hao, K., & Stray, J. (2019). Can you make AI fairer than a judge? Play our courtroom algorithm game. MIT Technology Review. Link
Angwin, J., Larson, J., Mattu, S., & Kirchner, L. (2022). Machine bias. In Ethics of data and analytics (pp. 254-264). Auerbach Publications. Link
Angwin, J., Larson, J., Mattu, S., & Kirchner, L. (2016). Machine bias: There’s software used across the country to predict future criminals. And it’s biased against Blacks. ProPublica. Link
Yong, E. (2018). A popular algorithm is no better at predicting crimes than random people. The Atlantic, January 17, 2018. Link
Larson, J., Mattu, S., Kirchner, L., & Angwin, J. (2016). How we analyzed the COMPAS recidivism algorithm. ProPublica. Link
Park, A. L. (2019). Injustice ex machina: Predictive algorithms in criminal sentencing. UCLA Law Review. Link
LESSON 2.3: BIAS IN THE HEALTHCARE SYSTEM - THE CASE OF PREDICTIVE ALGORITHMS
In 2019, researchers discovered racial bias in a widely used algorithm designed to predict which patients in U.S. hospitals would require additional medical care. This algorithm, intended to allocate extra resources to high-risk patients, disproportionately favored white patients over Black patients, reducing the likelihood that Black patients in need would receive the additional care they required (Obermeyer et al., 2019).
The issue stemmed from the algorithm’s reliance on past healthcare costs as a proxy for patients’ future healthcare needs. However, healthcare spending does not accurately reflect patient health needs across racial lines. Due to systemic inequalities, Black patients often incur lower healthcare costs than white patients with similar health conditions, not because they are healthier, but because of limited access to care and other socioeconomic barriers. Consequently, the algorithm mistakenly concluded that Black patients were healthier than equally sick white patients, reducing the number of Black patients identified for additional care by more than half.
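A small simulation can make this proxy problem concrete. The sketch below uses purely synthetic numbers, not the data from Obermeyer et al.: two groups have identical health needs, but one incurs lower historical costs because of an assumed access gap, so ranking patients by past spending under-selects that group for extra care.

```python
import random

random.seed(0)

# Synthetic patients: equal underlying health need, but group "B" historically
# incurs lower costs (an assumed access gap), mimicking the proxy problem.
patients = []
for i in range(1000):
    group = "A" if i % 2 == 0 else "B"
    need = random.gauss(50, 10)                  # true health need (unobserved by the model)
    access = 1.0 if group == "A" else 0.6        # assumed barrier to care for group B
    cost = need * access + random.gauss(0, 5)    # observed historical spending
    patients.append((group, need, cost))

# "Algorithm": enrol the top 20% of patients ranked by historical cost.
cutoff = sorted(p[2] for p in patients)[int(0.8 * len(patients))]
enrolled = [p for p in patients if p[2] >= cutoff]

for g in ("A", "B"):
    share = sum(1 for p in enrolled if p[0] == g) / len(enrolled)
    print(f"group {g}: share of enrolled patients = {share:.2f}")
# Despite identical need distributions, group B is heavily under-represented.
```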
The developers of the algorithm had not initially accounted for this discrepancy, unaware of how closely healthcare spending is linked to race. Once the bias was uncovered, the researchers and the health services company behind the algorithm collaborated to address the issue, ultimately reducing the bias by 80%.
This example illustrates how bias can inadvertently enter healthcare algorithms when using proxies like healthcare costs, which are influenced by systemic inequalities. The case highlights the importance of scrutinizing which data points are used and how these might unintentionally perpetuate existing biases. By examining and modifying algorithms thoughtfully, the healthcare industry can work toward more equitable treatment, ensuring that all patients receive the care they need based on health status rather than historical spending patterns.
Jee, C. (2019). A biased medical algorithm favored white people for health-care programs. MIT Technology Review. Link
LESSON 2.4: BIAS IN HIRING ALGORITHMS - THE CASE OF AMAZON AND BEYOND
In 2014, Amazon developed an experimental hiring tool that used AI to review job applications and rate candidates from one to five stars. This automated tool was designed to streamline recruitment for positions like software developers and other technical roles, aiming to provide efficient, objective assessments. However, by 2015, Amazon discovered that this system was not evaluating candidates in a gender-neutral way, revealing a significant bias against women.
The core of the problem lay in the AI model’s training data: ten years’ worth of applications submitted to Amazon, during which the majority of candidates were men. As a result, the algorithm “learned” that male applicants were more desirable for technical roles, penalizing resumes that suggested the applicant might be female. For instance, it downgraded applications that mentioned terms like “women’s chess club captain” or all-women’s colleges, reflecting inherent biases in the historical data rather than actual candidate qualifications.
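The mechanism is easy to reproduce in miniature. The following sketch is a contrived illustration, not Amazon’s system: a simple text classifier is trained on a handful of made-up resumes with historically biased hiring labels, and the learned weights show how a term like “women’s” can pick up a negative coefficient even though it says nothing about qualifications.

```python
# Toy illustration of how a resume classifier can absorb historical gender bias.
# Contrived data, not Amazon's model: the labels simply reflect a past in which
# few resumes containing "women's" led to a hire.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

resumes = [
    "software engineer java leadership", "python developer chess club captain",
    "java developer women's chess club captain", "software engineer women's college python",
    "python java engineer", "women's coding society java developer",
]
hired = [1, 1, 0, 0, 1, 0]   # biased historical outcomes

vec = CountVectorizer()
X = vec.fit_transform(resumes)
clf = LogisticRegression().fit(X, hired)

# Inspect the learned weights: the token "women" ends up with a negative
# coefficient, purely because of the biased historical labels.
for term, coef in sorted(zip(vec.get_feature_names_out(), clf.coef_[0]), key=lambda t: t[1]):
    print(f"{term:>10s}  {coef:+.2f}")
```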
Once Amazon detected this bias, its engineers attempted to adjust the system to be gender-neutral. However, management ultimately decided to discontinue the tool in 2017, having lost trust in the system’s fairness and reliability and recognizing the risk that other biases could emerge.
Despite Amazon’s decision to end its use of AI for resume review, similar automated sorting tools are still widely used across industries. Most of these tools rely on basic pattern matching to filter candidates based on keywords that match the job requirements, with some integrating machine learning to assess skill relevance. This approach has its own vulnerabilities: some candidates have discovered that pasting the job description or relevant keywords into their resume in a white font (invisible to human reviewers) can trick the system into prioritizing their application.
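Why does the white-font trick work? Most screening pipelines reduce a resume to plain text before matching keywords, so styling information such as font colour is discarded. The hypothetical sketch below shows how a naive keyword scorer counts hidden terms exactly like visible ones.

```python
# Sketch of why naive keyword screening is easy to game: once a resume is
# reduced to plain text, styling such as a white font colour is gone, so
# hidden keywords count exactly like visible ones. (Hypothetical example.)

job_keywords = {"python", "kubernetes", "terraform", "aws"}

visible_text = "Experienced administrator with strong communication skills."
hidden_text = "python kubernetes terraform aws"   # pasted in white font, invisible to humans

def keyword_score(resume_text: str) -> int:
    tokens = {t.strip(".,").lower() for t in resume_text.split()}
    return len(job_keywords & tokens)

print(keyword_score(visible_text))                      # 0
print(keyword_score(visible_text + " " + hidden_text))  # 4 -> resume jumps up the ranking
```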
These examples highlight several critical issues in AI-driven hiring. The Amazon case in particular exemplifies the importance of using balanced, representative data and of ongoing monitoring to prevent bias in hiring algorithms. As automated hiring tools become more common, companies must ensure that these systems are fair, transparent, and adaptable enough to address emerging biases and unintended manipulation.
Dastin, J. (2018). Insight - Amazon scraps secret AI recruiting tool that showed bias against women. Reuters web-page. Link
Dastin, J. (2022). Amazon scraps secret AI recruiting tool that showed bias against women. In Ethics of data and analytics (pp. 296-299). Auerbach Publications. Link
Schneier, B. (2023). Hacking AI resume screening with text in a white font. Schneier on Security blog. Link
LESSON 2.5: BIAS IN GOVERNMENT FRAUD DETECTION SYSTEMS - CASES IN THE UK AND NETHERLANDS
In recent years, the use of AI algorithms by government agencies to detect fraud has raised serious ethical concerns due to instances of algorithmic bias. These systems, intended to improve efficiency and reduce fraud, have inadvertently discriminated against individuals based on nationality or ethnic background, leading to significant personal and financial repercussions for those affected.
Case in the United Kingdom
In 2023, investigative journalists from The Guardian revealed that the British Home Office uses AI to flag suspected sham marriages. Although the tool was designed to streamline the evaluation process, internal reviews showed that it disproportionately flagged individuals from Albania, Greece, Romania, and Bulgaria as likely to be involved in sham marriages. Similarly, the UK Department for Work and Pensions (DWP) employs an AI tool to identify potential fraud in benefits claims. However, the system frequently flagged Bulgarian claimants as suspicious, resulting in suspended benefits and potential financial hardship for these individuals.
Both agencies have claimed their processes are fair because human officers make the final decisions. However, experts point out that due to limited resources, officials rely heavily on the AI’s initial assessment, which means the bias inherent in the algorithm often influences the final decision. Those impacted by these decisions may never realize that they were targeted based on a biased algorithm, as government agencies do not fully disclose the inner workings of their automated processes.
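One way auditors can surface this kind of disparity is to compare flag rates across groups. The sketch below uses made-up counts, not real Home Office or DWP figures, to show how a simple flag-rate ratio highlights which groups bear the brunt of a system’s referrals.

```python
# Minimal sketch of a flag-rate disparity check across groups
# (hypothetical counts, not real Home Office or DWP figures).

flag_counts = {          # group: (number flagged, total claims processed)
    "group_1": (120, 1000),
    "group_2": (30, 1000),
    "group_3": (25, 1000),
}

rates = {g: flagged / total for g, (flagged, total) in flag_counts.items()}
baseline = min(rates.values())

for group, rate in rates.items():
    print(f"{group}: flag rate {rate:.1%}, ratio to lowest group {rate / baseline:.1f}x")
# A large ratio does not prove discrimination on its own, but it shows which
# groups carry the burden of the system's referrals and where to investigate.
```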
Stacey, K. (2023). UK officials use AI to decide on issues from benefits to marriage licences. The Guardian. Link
Case in the Netherlands
A particularly alarming example of algorithmic bias is the Dutch childcare benefits scandal, which came to light in the Netherlands in 2019, where the Dutch Tax Authority had used an AI system to create risk profiles for spotting fraud in childcare benefits. The algorithm flagged families as potential fraudsters based on factors like nationality, specifically targeting individuals of Turkish and Moroccan descent, as well as those with dual nationality and lower incomes. The consequences were devastating: individuals wrongly flagged as fraudsters were required to repay large sums of money, pushing many families into severe poverty. Some affected families experienced extreme distress, with tragic cases of suicide and children placed into foster care due to the financial hardship.
The Dutch Data Protection Authority (DPA) launched an investigation and found that the algorithmic system used by the tax authority was “unlawful, discriminatory, and improper.” In 2021, the DPA fined the Dutch Tax Authority 2.75 million euros, followed by an additional 3.7 million euro fine in 2022 for the misuse of personal data in its “fraud identification facility.”
This scandal led to the resignation of the entire Dutch government in January 2021, although a new cabinet led by the same prime minister later took office. The incident remains a stark reminder of the potential harm caused by biased algorithms in government decision-making. Experts warn that, without significant regulatory safeguards, similar cases could emerge in other countries.
Heikkilä, M. (2022). Dutch scandal serves as a warning for Europe over risks of using algorithms. POLITICO. Link
Heikkilä, M. (2022). AI: Decoded: A Dutch algorithm scandal serves a warning to Europe—The AI Act won’t save us. Politico. Link
Kuźniacki, B. (2023). The Dutch Childcare Benefit Scandal Shows That We Need Explainable AI Rules. [Online]. Link
LESSON 2.6: KEY TAKEAWAYS AND WARNINGS
The cases in the UK and the Netherlands underscore the profound risks of using poorly understood and insufficiently regulated AI systems for government fraud detection. They highlight the urgent need for rigorous oversight, transparency, and accountability in the application of AI to government decision-making, particularly when such decisions have high stakes for individuals’ lives. Government agencies must implement robust safeguards to prevent bias, ensure fairness, and protect vulnerable populations from harm.
LESSON 2.7: POISONING ML SYSTEMS WITH NON-RANDOMNESS
Sometimes bias is introduced into a machine learning system deliberately: an attacker tries to poison the ML model or plant a backdoor in the system. The most obvious way to do this is to change the underlying dataset (by adding biased data) or the model architecture. However, researchers have found that the same effect can be achieved merely by changing the order in which data are supplied to the model. Imagine a company that wants a credit-scoring system that is secretly sexist, while the model architecture and the data make the model look fair. The company could develop an ML model with high accuracy and collect a set of financial data that is highly representative of the whole population. But instead of shuffling the data randomly, they start the model’s training on ten rich men and ten poor women from that set. This creates an initialisation bias, which then poisons the whole system. Bias in ML is therefore not just a data problem; it can be introduced in very subtle ways. The stochastic nature of modern learning procedures means that the fairness of a model also depends on randomness: a random number generator with a backdoor can undermine a neural network and secretly introduce bias into a model that otherwise looks fair. AI developers should therefore also pay attention to the training process and be especially careful about their assumptions about randomness.
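To illustrate the idea, here is a deliberately tiny, deterministic sketch (not the researchers’ actual experiment): the same twenty training examples are fed to one pass of logistic-regression SGD in two different orders. Gender is unrelated to the label, yet the adversarial ordering, which leads with “rich men” and “poor women”, leaves the model with a positive weight on the gender feature.

```python
import math

# A deterministic toy sketch of a data-ordering attack (not the researchers'
# original experiment). Both runs see exactly the same 20 examples; only the
# order differs. "gender" is statistically unrelated to the label, and
# "income" alone explains it.

LR = 0.3  # learning rate

def sgd_gender_weight(examples, steps=None):
    """One pass of plain logistic-regression SGD; returns the learned gender weight."""
    w_gender, w_income, bias = 0.0, 0.0, 0.0
    for gender, income, label in examples[:steps]:
        p = 1.0 / (1.0 + math.exp(-(w_gender * gender + w_income * income + bias)))
        err = p - label
        w_gender -= LR * err * gender
        w_income -= LR * err * income
        bias -= LR * err
    return w_gender

rich_man   = (1, 1.0, 1)    # gender=1, high income, good credit outcome
rich_woman = (0, 1.0, 1)
poor_man   = (1, -1.0, 0)
poor_woman = (0, -1.0, 0)

balanced_order = [rich_man, rich_woman, poor_man, poor_woman] * 5
poisoned_order = [rich_man] * 5 + [poor_woman] * 5 + [rich_woman] * 5 + [poor_man] * 5

for name, order in [("balanced", balanced_order), ("poisoned", poisoned_order)]:
    print(f"{name} order: gender weight after 10 steps = "
          f"{sgd_gender_weight(order, 10):+.2f}, after the full pass = "
          f"{sgd_gender_weight(order):+.2f}")

# Approximate outcome: the balanced order keeps the gender weight near zero,
# while the poisoned order front-loads a clearly positive weight -- the
# "initialisation bias" described above -- that the later examples only
# partly undo within this pass.
```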
Watch video lectures presenting additional real-life examples of bias in AI:
Thinking critically about digital data collection: Twitter and beyond (duration 0:37:22)
The alluring promise of objectivity: Big data in criminal justice (duration 0:25:20)
Beyond the headlines: How to make the best of machine learning models in the wild (duration 1:03:43)
Exploring Racial Bias in Classifiers for Face Recognition (duration 0:12:26)
Does Gender Matter in the News? Detecting and Examining Gender Bias in News Articles (duration 0:16:34)
Bias Issues and Solutions in Recommender System (duration 0:57:29)
Mitigating Demographic Biases in Social Media-Based Recommender Systems (duration 0:16:10)
Gender Bias in Fake News: An Analysis (duration 0:31:24)
Never Too Late to Learn: Regularizing Gender Bias in Coreference Resolution (duration 0:11:05)
Auditing for Bias in Algorithms Delivering Job Ads (duration 0:14:39)
Mitigating Gender Bias in Captioning Systems (duration 0:14:52)
Understanding the Impact of Geographical Bias on News Sentiment: A Case Study on London and Rio Olympics (duration 0:11:57)
Discriminative Bias for Learning Probabilistic Sentential Decision Diagrams (duration 0:10:58)