Algorithmic Bias: Evidence for Discrimination in Automated Systems
nonacademicresearch.org Editorial
- Submitted: May 9, 2026
- Version: v1
- License: CC-BY-4.0
- Identifier: nar:aya4d3uwucbuhxsaht
Abstract
Automated decision-making systems — used in hiring, lending, criminal justice risk assessment, and healthcare — have been found to produce discriminatory outcomes across multiple studies and real-world audits. The evidence covers several distinct phenomena: facial recognition systems that are significantly less accurate for darker-skinned women than for lighter-skinned men; recidivism prediction tools whose error rates differ sharply by race; and credit scoring models that encode historical patterns of discrimination. Whether these findings constitute bias in a morally actionable sense depends on contested frameworks for algorithmic fairness that are mathematically irreconcilable when base rates differ.
Manuscript
title: "Algorithmic Bias: What the Evidence Shows About Discrimination in Automated Systems" abstract: "Automated decision systems are increasingly used in high-stakes contexts — criminal justice risk assessment, hiring, lending, healthcare — with the promise of replacing inconsistent human judgment with objective, data-driven decisions. Evidence shows that these systems can encode and amplify existing inequalities, sometimes producing outcomes that are measurably more unfair than human judgment would be. Understanding the sources of algorithmic bias and the evidence for its real-world effects is necessary for evaluating policy responses." topic: technology author: nonacademicresearch.org Editorial date: 2026-05-09
Algorithmic Bias: What the Evidence Shows About Discrimination in Automated Systems
Abstract
Automated decision systems are increasingly used in high-stakes contexts — criminal justice risk assessment, hiring, lending, healthcare — with the promise of replacing inconsistent human judgment with objective, data-driven decisions. Evidence shows that these systems can encode and amplify existing inequalities, sometimes producing outcomes that are measurably more unfair than human judgment would be. Understanding the sources of algorithmic bias and the evidence for its real-world effects is necessary for evaluating policy responses.
Background
The appeal of algorithmic decision-making is straightforward: computers apply rules consistently, do not have bad days, are not influenced by the race or attractiveness of the person in front of them, and can process more information than human judges can hold in mind. In theory, algorithms can reduce the arbitrary variation and unconscious bias that characterize human judgment.
In practice, algorithms are trained on historical data that reflects historical human decisions — which themselves embedded societal inequalities. An algorithm trained on past hiring decisions by a company that historically hired few women will learn to favor resumes resembling those of past hires. An algorithm trained on past criminal justice decisions in a system that prosecuted Black defendants at higher rates will reflect those rates. This is not a bug in a specific algorithm; it is a general structural challenge.
The field of algorithmic fairness — which sits at the intersection of computer science, statistics, law, and social science — has formalized these concerns and produced substantial empirical evidence about how specific deployed systems behave.
The Evidence
COMPAS and Criminal Justice Risk Assessment
The most widely discussed case of potential algorithmic bias involves COMPAS (Correctional Offender Management Profiling for Alternative Sanctions), a commercial risk assessment tool used in U.S. courts to inform pretrial detention, sentencing, and parole decisions. COMPAS generates a risk score predicting likelihood of reoffending.
ProPublica's investigation (Angwin et al., 2016) analyzed COMPAS scores for more than 7,000 defendants in Broward County, Florida, and compared the scores with rearrest records over the following two years. They found that Black defendants were nearly twice as likely to be falsely flagged as high risk (labeled high risk but not arrested for a new crime) as white defendants, while white defendants were more likely to be falsely labeled low risk (labeled low risk but subsequently arrested). These disparities held after controlling for prior crimes, type of charge, and age.
Northpointe (the company that produces COMPAS) responded that its scores are equally accurate for Black and white defendants and satisfy predictive parity: a defendant labeled high risk is about equally likely to reoffend regardless of race (Dieterich et al., 2016). This dispute crystallized a technically important result: Chouldechova (2017, Big Data) proved that when the prevalence of the outcome (here, reoffending rates) differs between groups, it is impossible to simultaneously satisfy all three common fairness definitions — equal false positive rates, equal false negative rates, and equal positive predictive values. Different notions of fairness are mathematically incompatible when base rates differ.
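A small numeric sketch (with invented base rates, not the Broward County figures) makes the incompatibility concrete: if two groups have different reoffending rates and a tool is forced to have the same false positive and false negative rates for both, its positive predictive value necessarily differs between them.

```python
# Illustrative numbers only -- invented base rates, not COMPAS data.
# With the false positive rate (FPR) and false negative rate (FNR) forced to be
# equal across two groups whose base rates differ, the positive predictive
# value (PPV) of a "high risk" label cannot also be equal.

def ppv(base_rate: float, fpr: float, fnr: float) -> float:
    """P(reoffends | flagged high risk), by Bayes' rule."""
    true_pos = (1 - fnr) * base_rate        # flagged and reoffends
    false_pos = fpr * (1 - base_rate)       # flagged but does not reoffend
    return true_pos / (true_pos + false_pos)

fpr, fnr = 0.20, 0.30                        # identical error rates for both groups
for group, base_rate in [("group A", 0.50), ("group B", 0.30)]:
    print(f"{group}: base rate {base_rate:.0%}, PPV {ppv(base_rate, fpr, fnr):.1%}")

# group A: base rate 50%, PPV 77.8%
# group B: base rate 30%, PPV 60.0%
```

Equalizing PPV instead would force the error rates apart; no amount of tuning the score avoids the trade-off.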
This means that "fair" in an algorithmic context requires a normative choice among competing values, not simply better engineering.
Hiring Algorithms
Amazon developed and then abandoned an AI hiring tool after discovering it penalized resumes containing the word "women's" (as in "women's chess club") and downgraded graduates of all-women's colleges (Dastin, 2018, Reuters). The system had been trained on a decade of past hiring decisions at Amazon, which — like most large technology companies — had been predominantly male. The algorithm learned to associate male-linked features with successful candidates.
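The mechanism is easy to reproduce with synthetic data. The sketch below is not a reconstruction of Amazon's system; it only shows that when historical hiring labels penalize a group, a model with no explicit gender feature can still learn to penalize a proxy for that group (the hypothetical `womens_club` feature stands in for phrases like "women's chess club").

```python
# Synthetic illustration of label bias; not based on Amazon's tool or data.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 5000
skill = rng.normal(size=n)                    # true qualification, identical across groups
womens_club = rng.integers(0, 2, size=n)      # hypothetical proxy feature (0/1)

# Historical "hired" labels carried a penalty for the proxy feature,
# independent of skill -- this is the bias the model will inherit.
hired = (skill + 1.0 * (1 - womens_club) + rng.normal(size=n) > 1.0).astype(int)

X = np.column_stack([skill, womens_club])
model = LogisticRegression().fit(X, hired)
print("coefficients [skill, womens_club]:", model.coef_.round(2))
# The womens_club coefficient comes out strongly negative: the model reproduces
# the historical penalty even though skill is distributed identically.
```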
Raghavan et al. (2020, FAccT) surveyed hiring tools in commercial use and found that most vendors of automated screening tools provided little public evidence of formal bias audits, and that available methodological frameworks for auditing were not consistently applied.
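For context on what a basic audit can look like, one long-standing screening check outside the machine-learning literature is the EEOC's "four-fifths rule": compare selection rates across groups and flag possible adverse impact when any group's rate falls below 80% of the highest group's rate. The sketch below uses invented applicant counts purely for illustration.

```python
# Four-fifths (80%) rule check on selection rates; toy numbers for illustration.
selections = {"group A": (120, 400), "group B": (60, 350)}   # (selected, applicants)

rates = {group: sel / total for group, (sel, total) in selections.items()}
highest = max(rates.values())
for group, rate in rates.items():
    flag = "adverse impact flagged" if rate / highest < 0.8 else "ok"
    print(f"{group}: selection rate {rate:.1%}, impact ratio {rate / highest:.2f} ({flag})")
# group A: 30.0%, ratio 1.00 (ok); group B: 17.1%, ratio 0.57 (adverse impact flagged)
```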
Facial Recognition: A Case of Differential Accuracy
Buolamwini and Gebru (2018, FAccT Proceedings) conducted a systematic audit of three commercially deployed facial recognition systems from Microsoft, IBM, and Face++. Error rates for gender classification varied dramatically by skin tone and gender: the highest error rates (up to 35% for darker-skinned women) were approximately 34 percentage points higher than the lowest error rates (under 1% for lighter-skinned men). The finding was attributed to training datasets that overrepresented lighter-skinned and male faces.
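The core of the audit methodology is simply disaggregation: compute error rates per intersectional subgroup rather than reporting a single aggregate accuracy figure. A minimal sketch with toy predictions (not the Gender Shades data) is below.

```python
# Disaggregated error-rate audit; toy data, not the Gender Shades benchmark.
import pandas as pd

df = pd.DataFrame({
    "skin_tone": ["darker", "darker", "darker", "lighter", "lighter", "lighter"],
    "gender":    ["female", "female", "male",   "female",  "male",    "male"],
    "correct":   [0,        1,        1,        1,         1,         1],   # 1 = prediction matched label
})

# Aggregate accuracy hides the gap; per-subgroup error rates expose it.
error_rates = 1 - df.groupby(["skin_tone", "gender"])["correct"].mean()
print(error_rates.rename("error_rate"))
```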
These findings prompted IBM and Microsoft to update their systems, and subsequent audits showed that the disparities narrowed but did not disappear. The basic problem — that systems trained on non-representative datasets perform worse on underrepresented groups — applies broadly to image and voice recognition systems.
Healthcare Risk Scoring
Obermeyer et al. (2019, Science) audited a commercial algorithm used by U.S. health systems to identify high-risk patients for care management programs. The algorithm used healthcare cost as a proxy for medical need — a seemingly reasonable choice, since sicker patients generally cost more. But Black patients at the same level of illness incurred lower costs than white patients because they had less access to care. The algorithm therefore systematically underestimated illness severity for Black patients, assigning them lower risk scores and reducing their likelihood of enrollment in care management programs.
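A toy simulation (with invented parameters, not the Obermeyer et al. data) shows how the proxy goes wrong: if one group incurs lower cost at the same level of illness, any score that ranks patients by cost will enroll fewer of them and require them to be sicker before they qualify.

```python
# Toy simulation of the cost-as-proxy failure; all parameters are invented.
import numpy as np

rng = np.random.default_rng(1)
n = 10_000
illness = rng.gamma(shape=2.0, scale=1.0, size=n)   # true medical need, same distribution for both groups
group_b = rng.integers(0, 2, size=n).astype(bool)   # group facing barriers to care

# Same illness, lower observed cost for group B because of reduced access to care.
cost = illness * np.where(group_b, 0.7, 1.0) + rng.normal(0.0, 0.2, size=n)

# A "risk score" that ranks patients by cost, enrolling the top 10% in care management.
enrolled = cost >= np.quantile(cost, 0.90)

for name, mask in [("group A", ~group_b), ("group B", group_b)]:
    print(f"{name}: enrollment rate {enrolled[mask].mean():.1%}, "
          f"mean illness among enrolled {illness[mask & enrolled].mean():.2f}")
# Group B is enrolled less often, and its enrolled patients are sicker on average,
# even though the underlying illness distribution is identical.
```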
The study estimated that the bias reduced the proportion of Black patients receiving extra care by more than half relative to a race-unaware measure of actual illness severity. The health system subsequently adjusted the algorithm to reduce this disparity.
Counterarguments
Some researchers argue that algorithmic systems, even when imperfect, may outperform the human judges they replace. Stevenson and Doleac (2022, Journal of Law and Economics) found that counties using risk assessment tools in bail decisions achieved similar public safety outcomes with lower pretrial detention rates compared to counties without them — suggesting net benefits even if the tools are imperfect. On this view, the relevant comparison is against the human judgment the tools replace, not against an ideal.
There is also debate about whether "bias" properly describes statistical disparities that may reflect real differences in underlying risk — a distinction with significant ethical and legal implications that requires careful specification of what is being measured and compared.
What We Can Conclude
The evidence that deployed algorithmic systems can produce discriminatory outcomes — measured by differential error rates across demographic groups — is robust and comes from audits in criminal justice, hiring, healthcare, and facial recognition. These are not theoretical concerns; they have been documented in systems actually making consequential decisions affecting people's lives.
The core insight from the theoretical literature is that different fairness criteria are mathematically incompatible when base rates differ — meaning algorithmic fairness requires normative choices, not only better engineering. The central policy implication is that algorithmic systems in high-stakes domains require independent auditing, transparency, and accountability structures comparable to those applied to other consequential decision processes.
References
- Angwin, J., et al. (2016, May 23). Machine bias. ProPublica. https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing
- Buolamwini, J., & Gebru, T. (2018). Gender shades: Intersectional accuracy disparities in commercial gender classification. Proceedings of Machine Learning Research, 81, 77–91.
- Chouldechova, A. (2017). Fair prediction with disparate impact: A study of bias in recidivism prediction instruments. Big Data, 5(2), 153–163. https://doi.org/10.1089/big.2016.0047
- Dastin, J. (2018, October 10). Amazon scraps secret AI recruiting tool that showed bias against women. Reuters.
- Dieterich, W., et al. (2016). COMPAS risk scales: Demonstrating accuracy, equity and predictive parity. Northpointe, Inc.
- Obermeyer, Z., et al. (2019). Dissecting racial bias in an algorithm used to manage the health of populations. Science, 366(6464), 447–453. https://doi.org/10.1126/science.aax2342
- Raghavan, M., et al. (2020). Mitigating bias in algorithmic hiring: Evaluating claims and practices. Proceedings of the 2020 ACM Conference on Fairness, Accountability, and Transparency (FAccT '20).
- Stevenson, M.T., & Doleac, J.L. (2022). Algorithmic risk assessment in the hands of humans. Journal of Law and Economics, 65(2), 195–241. https://doi.org/10.1086/718400
Versions (1)
- v1 (May 9, 2026) — initial publication
Cite this paper
nonacademicresearch.org Editorial (2026). Algorithmic Bias: Evidence for Discrimination in Automated Systems. nonacademicresearch.org. nar:aya4d3uwucbuhxsaht
@misc{qbppiqk9,
title = {Algorithmic Bias: Evidence for Discrimination in Automated Systems},
author = {nonacademicresearch.org Editorial},
year = {2026},
howpublished = {nonacademicresearch.org},
note = {nar:aya4d3uwucbuhxsaht},
}
Temporary identifier: this paper carries a temporary nar:* identifier valid for citation within the independent research community. A permanent DOI will be minted via DataCite once the platform completes nonprofit registration.