As clinicians, we share a common goal of doing what is best for our patients, and this obligation extends to critically evaluating the tools we adopt into our practice. Clinical algorithms, which we define as “mathematical models that aid decision-making,”1 are ubiquitous tools in healthcare, influencing everything from diagnosis to treatment. These algorithms can enhance decision-making by providing insights that surpass the limits of individual human memory and cognition. Now, with the integration of artificial intelligence (AI), algorithms are becoming increasingly tailored to individual patients through the incorporation of nuanced data, like genetic profiles, social drivers of health, and real-time physiological metrics.
While algorithms are foundational to clinical reasoning and may become more beneficial as technology advances, they can also fail our patients. Carelessly designed algorithms or those explicitly designed to increase the profits of healthcare companies can exacerbate existing biases and increase health disparities. Algorithms should serve patients, not profits. To provide the highest standard of care, we must demand best practices in algorithm design through transparency and accountability so that healthcare technologies work in the service of patients and advance health equity. This article will explore the evolution of algorithms in healthcare, highlight examples of both their benefits and biases, and propose actionable strategies to ensure their ethical and equitable use in clinical practice.
Algorithms Are Foundational to Healthcare
What we would recognize today as algorithms first appeared in healthcare in the 19th century, when advances in probabilistic reasoning began influencing medical practice.2 During the 20th century, algorithms advanced from simple scoring systems (like the Apgar score for infant health) to computer-based tools that relied on hundreds of programmed rules to complement physician expertise and aid clinical decision-making. By the 1990s, algorithms had become integral to electronic medical records, enabling real-time decision support with tools like automated drug-interaction alerts.3 These algorithms, built on rule-based systems and “if-then” directives, belong to the first epoch of AI, which focused on encoding human knowledge into machines to improve efficiency and accuracy.4 Though a major advancement, this type of AI still carried a “constant risk of human logic errors in its construction and bias encoded in its rules.”5 It also adapted poorly to real-world situations that fell outside its programmed rules.
Models in the second epoch of AI, which began in the 2000s, could “learn” from errors, making them more accurate and adaptable. With advances in computing power, AI became capable of processing vast datasets and discovering patterns on its own rather than relying on pre-programmed rules. But significant limitations remained; for example, a model’s performance degraded when the real-time data it encountered differed significantly from its training data. Additionally, models of this era were “task specific,” meaning each could perform only a single task.6
The third epoch of AI, which began around 2018, marks a transformative leap with new systems that can perform a wide variety of functions when given simple instructions.7 While these capabilities could improve healthcare communication and reduce administrative burdens, they also come with challenges, such as potentially generating “hallucinations” that are plausible but incorrect. Hallucinations are a byproduct of how these advanced models are trained. Experts describe the training process as showing the model a document and having it guess at the next word in the text until it guesses correctly. With each error, the model must adapt its “internal representation of how words fit together.”8 After showing the model many documents, it learns patterns in language and can predict the next word in the sequence. Hallucinations occur because the model is trained to be a language predictor—not a reality predictor. So perhaps it is not surprising that this new AI can perpetuate historical biases that underlie our language, worsening health equity. For example, researchers have demonstrated that AI language models can perpetuate covert racism through dialect prejudice, disproportionately associating African American English (AAE) with negative stereotypes and less prestigious jobs and assigning harsher judicial outcomes to AAE speakers compared to those using Standard American English.9
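To make the next-word-prediction training described above concrete, here is a minimal sketch in Python that uses a simple bigram counter in place of a large neural network; the tiny “corpus” and the counting approach are illustrative stand-ins, not how commercial systems are actually built.

```python
# Toy illustration of next-word prediction: a bigram counter stands in for a
# large neural network. It learns which word most often follows another, so it
# predicts what is *likely to be said*, not what is *true*.
from collections import Counter, defaultdict

corpus = (
    "the patient was admitted with chest pain . "
    "the patient was discharged home . "
    "the patient was admitted with shortness of breath ."
).split()

# "Training": count how often each word follows each preceding word.
following = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    following[prev][nxt] += 1

def predict_next(word: str) -> str:
    """Return the continuation seen most often in training."""
    options = following.get(word)
    return options.most_common(1)[0][0] if options else "<unknown>"

print(predict_next("patient"))   # 'was' -- fluent, but says nothing about reality
print(predict_next("admitted"))  # 'with'
```

Because the model learns only which words tend to follow which, fluent output is no guarantee of factual output, which is the root of hallucination.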
Examples of Clinical Algorithms
Clinical algorithms can be categorized into four buckets based on their purpose: assisting clinicians in making decisions, screening for high-risk or early disease, allocating resources, and evaluating physiologic function.
1. Clinical Decision Support Tools (CDSTs): Prompting Action
CDSTs are algorithms that prompt clinicians to act. These tools use patient data, like vital signs or lab results, to provide real-time recommendations. Widely used CDSTs include the National Early Warning Score (NEWS) and the Systemic Inflammatory Response Syndrome (SIRS) criteria, which flag when a patient’s condition might be worsening. These algorithms calculate a score, and high scores alert the care team to take action, like calling the rapid response team.10 Antimicrobial stewardship algorithms are another example of CDSTs. These tools identify cases where antibiotics may not be appropriate or where adjustments to dosage or type are needed, reducing the spread of antibiotic resistance and improving patient outcomes.11
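To illustrate the scoring pattern behind early-warning tools like NEWS and SIRS, the sketch below maps a few vital signs to points, sums them, and raises an alert above a threshold. The point bands and the escalation cutoff are simplified placeholders for illustration, not the published NEWS or SIRS criteria.

```python
# Simplified early-warning-style score: the point bands and the escalation
# threshold below are illustrative placeholders, not the published NEWS or
# SIRS cutoffs.
def vitals_score(resp_rate: int, heart_rate: int, systolic_bp: int, temp_c: float) -> int:
    score = 0
    score += 2 if resp_rate >= 25 or resp_rate <= 8 else (1 if resp_rate >= 21 else 0)
    score += 2 if heart_rate >= 130 or heart_rate <= 40 else (1 if heart_rate >= 110 else 0)
    score += 2 if systolic_bp <= 90 else (1 if systolic_bp <= 100 else 0)
    score += 1 if temp_c >= 38.5 or temp_c <= 35.5 else 0
    return score

score = vitals_score(resp_rate=26, heart_rate=118, systolic_bp=96, temp_c=38.7)
if score >= 4:  # hypothetical escalation threshold
    print(f"Score {score}: alert the care team / consider a rapid response call")
else:
    print(f"Score {score}: continue routine monitoring")
```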
2. Screening Algorithms: Identifying High-Risk Individuals and Early Disease
Screening algorithms identify individuals at higher risk of developing disease or having poor outcomes and can help detect diseases in their early stages. These tools often draw on data from patients’ medical histories, lab results, or imaging studies. Advanced algorithms, including those powered by AI, analyze subtle patterns in mammograms to detect breast cancer. These algorithms help radiologists reduce the likelihood of missed diagnoses and identify cases earlier, when treatment may be more effective.12 The Atherosclerotic Cardiovascular Disease Risk Estimator Plus (go.aft.org/l9j) is an example of a widely used screening tool. Used to predict 10-year and lifetime risk of cardiovascular disease, the algorithm incorporates social factors like smoking status and race alongside clinical variables like cholesterol levels and blood pressure.13
3. Resource Allocation Algorithms: Optimizing Healthcare Delivery
Resource allocation algorithms are tools for distributing resources, and they can be tailored to prioritize various factors, including maximizing overall benefit, addressing needs, ensuring fairness, enhancing efficiency, and optimizing profitability. An example of a resource allocation algorithm is the Model for End-Stage Liver Disease (MELD) score, which is used to rank patients on the waiting list for liver transplants. This algorithm calculates a score based on lab values like bilirubin, creatinine, and international normalized ratio to estimate how urgently a patient needs a transplant. Patients with higher scores are prioritized, ensuring that organs are allocated to those who are most critically ill (particularly if score changes over time are considered).14 Another allocation example is the Sequential Organ Failure Assessment (SOFA) score, which is commonly used in ICUs to track the severity of a patient’s illness. This algorithm evaluates how well a patient’s organs are functioning based on various clinical measurements, including blood pressure and oxygen levels. During crises like the COVID-19 pandemic, the SOFA score was used to help allocate resources like ventilators when demand exceeded supply.15
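To show how such a ranking score is computed, here is a sketch of the classic (pre-MELD-Na) MELD formula as it is commonly published; current allocation policy uses updated variants, so the coefficients and bounds should be read as illustrative rather than authoritative.

```python
import math

def classic_meld(bilirubin_mg_dl: float, creatinine_mg_dl: float, inr: float,
                 on_dialysis: bool = False) -> int:
    """Classic (pre-MELD-Na) MELD score as commonly published; illustrative only,
    since current allocation policy uses updated variants."""
    # Conventional floors/caps: values below 1.0 are set to 1.0; creatinine is
    # capped at 4.0 (and set to 4.0 for patients on dialysis).
    bili = max(bilirubin_mg_dl, 1.0)
    creat = 4.0 if on_dialysis else min(max(creatinine_mg_dl, 1.0), 4.0)
    inr = max(inr, 1.0)
    raw = 9.57 * math.log(creat) + 3.78 * math.log(bili) + 11.2 * math.log(inr) + 6.43
    return int(min(max(round(raw), 6), 40))  # scores are bounded between 6 and 40

print(classic_meld(bilirubin_mg_dl=3.2, creatinine_mg_dl=1.8, inr=1.6))  # 22
```

Higher lab values produce a higher score, which moves the patient up the waiting list, as the paragraph above describes.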
4. Equations That Estimate Physiologic Functions
Some medical algorithms estimate how well different parts of the body are functioning, providing quick and practical measurements to guide clinicians in diagnosing and treating various conditions. For example, the estimated glomerular filtration rate (eGFR) is an algorithm used to assess kidney function. This tool helps diagnose and monitor chronic kidney disease and has impacted referrals for kidney transplants and patients’ position on the waitlist.16 Other examples include the Fibrosis-4 Index,17 which estimates liver damage in people with chronic liver disease, and the Montreal Cognitive Assessment,18 which screens for cognitive impairment across domains such as memory, attention, and executive function.
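To show what one of these estimating equations looks like in practice, here is a sketch of the Fibrosis-4 Index as commonly published (age times AST, divided by platelet count times the square root of ALT); the lab values are hypothetical, and interpretation cutoffs vary by guideline, so none are hard-coded.

```python
import math

def fib4_index(age_years: float, ast_u_l: float, alt_u_l: float,
               platelets_10e9_l: float) -> float:
    """FIB-4 = (age x AST) / (platelet count x sqrt(ALT)), as commonly published."""
    return (age_years * ast_u_l) / (platelets_10e9_l * math.sqrt(alt_u_l))

# Hypothetical labs; interpretation cutoffs vary by guideline and are omitted.
print(round(fib4_index(age_years=61, ast_u_l=54, alt_u_l=40, platelets_10e9_l=150), 2))  # ~3.5
```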
The Problem: Algorithmic Bias
Unfortunately, algorithms can also fail us. We highlight examples of biased algorithms to illustrate the scope of the problem and then examine the algorithmic life cycle to understand why biases arise and are perpetuated.
The eGFR, noted above, is a canonical example of a race-inclusive algorithm that has potentially harmed Black patients. Race-inclusive eGFR calculators estimate a higher eGFR for Black patients than for white patients with the same serum creatinine level.19 This adjustment was historically justified by higher average serum creatinine levels among Black individuals, a difference that was attributed to increased muscle mass without supporting evidence. By reporting higher eGFR values for Black patients, the algorithm delayed referrals to kidney specialists or transplantation, worsening outcomes for a group already disproportionately affected by end-stage renal disease.20
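The arithmetic behind this delay is simple, as the sketch below shows: the 2009 race-inclusive CKD-EPI creatinine equation multiplied the computed eGFR by roughly 1.16 for patients identified as Black, so the same labs could place one patient above a common referral threshold and another below it. The base eGFR value is a placeholder, not a full implementation of the equation.

```python
# Illustration of how a race coefficient shifts reported kidney function.
# The 2009 CKD-EPI creatinine equation multiplied the computed eGFR by ~1.159
# for patients identified as Black; the 2021 refit removed race. The base
# value below is a placeholder, not a full implementation of the equation.
RACE_COEFFICIENT = 1.159  # multiplier from the 2009 equation (illustrative)

base_egfr = 55.0  # hypothetical race-free estimate, mL/min/1.73 m^2
reported_if_black = base_egfr * RACE_COEFFICIENT

print(f"Reported eGFR, patient identified as Black: {reported_if_black:.0f}")
print(f"Reported eGFR, otherwise identical patient: {base_egfr:.0f}")
# ~64 vs. 55: the same labs sit above a common referral threshold (60) for one
# patient and below it for the other, delaying the first patient's referral.
```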
Treatment decisions for lung conditions are based on percentile values rather than absolute values because lung function varies greatly with factors like height, age, and sex. Historically, pulmonary function test (PFT) algorithms have also been race-normalized due to observed population-level differences between Black patients and patients of other races. This adjustment, which falsely assumes race to be a reliable proxy for genetic variation, leads to inaccurate assessments, delays in diagnosis, and worse outcomes for Black individuals.21 The tool provides faulty information for important clinical decisions, such as determining eligibility for lung cancer resection or choosing among options like pneumonectomy, wedge resection, or palliative care. PFTs—and all algorithms that include race as a variable—should be critically reevaluated. Race is a social construct, not a biological determinant. Thus, efforts to improve PFTs should avoid normalizing population-level differences and instead seek to understand how social drivers of health—like environmental pollution, occupational exposures, housing quality, and access to healthcare—contribute to disparities in lung function.
A striking example of a biased resource-allocation algorithm that harmed Black patients is one of the commercial risk-prediction tools that influence the care of nearly 200 million people in the United States each year.22 These tools, which are integral to high-risk care management programs, estimate how sick a patient is likely to become in the future.23 Patients predicted to be sicker are allocated more healthcare resources to prevent morbidity. This particular biased algorithm included past healthcare utilization, as measured by insurance claims data, as a proxy for severity of illness to predict future healthcare need and select patients for enrollment in a care management program.24 However, healthcare utilization is a poor proxy for severity of illness because Black patients are more likely than white patients to face barriers to accessing and utilizing healthcare in the United States. On average, Black patients generate lower costs in administrative claims databases despite having comparable or greater levels of illness as compared to white patients. Thus, Black patients had to be sicker than white patients to be selected for enrollment in the program. Restating this quantitatively, Black patients who met the enrollment criteria had 26.3 percent more chronic illnesses than white patients.25 This highlights how a poorly designed algorithm can embed and amplify health disparities.
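A few lines of invented arithmetic make the mechanism plain. Under the assumption (for illustration only) that each chronic condition generates lower recorded claims costs for patients facing access barriers, a fixed cost cutoff forces those patients to carry more illness before they qualify for the program.

```python
# Toy arithmetic of the proxy problem: if access barriers suppress recorded
# spending, a cost-based enrollment cutoff requires patients facing barriers
# to be sicker before they qualify. All numbers are invented.
ENROLLMENT_COST_CUTOFF = 6000  # hypothetical predicted-cost threshold for the program

claims_cost_per_condition = {
    "full access to care": 1200,  # each chronic condition generates ~$1,200 in claims
    "access barriers":      800,  # barriers suppress utilization and recorded cost
}

for group, per_condition in claims_cost_per_condition.items():
    # Minimum number of chronic conditions needed to clear the cost cutoff
    # (ceiling division).
    conditions_needed = -(-ENROLLMENT_COST_CUTOFF // per_condition)
    print(f"{group}: {conditions_needed} chronic conditions needed to qualify")

# Output: 5 conditions with full access vs. 8 with barriers -- equally sick
# patients are treated unequally by the cost proxy.
```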
The Algorithmic Life Cycle
There are five sequential phases of the life cycle an algorithm undergoes from inception to decommissioning: (1) problem formulation; (2) data selection, assessment, and management; (3) algorithm development, training, and validation; (4) deployment and integration of algorithms in intended settings; and (5) algorithm monitoring, maintenance, updating, or deimplementation.26 Others have comprehensively described types of biases that arise in different parts of the algorithm life cycle.27 In this article, we give representative examples.
Phase one (problem formulation) establishes why the algorithm is being built in the first place. The underlying purpose of algorithms can vary widely, ranging from optimizing health outcomes to maximizing profits.28 Important considerations at this initial phase include: What is the problem the algorithm seeks to address? What is the desired outcome? Who are the key actors? Explicitly outlining these aspects is foundational for later phases.29 An example of bias in phase one is a resource-allocation tool designed to minimize costs over other objectives. By prioritizing profits, this tool could decrease care for underserved populations who may require more resources (like transportation assistance to distant tertiary care centers), embedding systemic inequities into the system from the start.
In phase two (data selection, assessment, and management), data are chosen and prepared to “teach” the algorithm how to perform a task through proxy selection (assigning measurable variables to represent complex concepts).30 A simple comparison would be an expert physician compiling a representative set of example patient presentations of appendicitis and cholecystitis to teach an intern how to discern between the two conditions. Data vary in completeness, depth, and relevance, so developers should be aware of and transparent about the limits of the data used to create an algorithm.31 For example, the Gail Model is a screening algorithm that predicts a woman’s risk of developing breast cancer. This widely used tool was developed primarily with data from white women. Its predictions are less accurate for other groups, overestimating breast cancer risk by nearly twofold in Asian women.32
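One simple safeguard at this phase is to compare the demographic mix of the training data against the population the tool will actually serve; the sketch below shows such a check, with invented group labels and percentages.

```python
# Sketch of a phase-two representativeness check: compare the demographic mix
# of the training data with the population where the model will be deployed.
# Group labels and percentages are invented for illustration.
training_mix   = {"white": 0.82, "Black": 0.06, "Asian": 0.05, "Hispanic": 0.07}
deployment_mix = {"white": 0.48, "Black": 0.24, "Asian": 0.10, "Hispanic": 0.18}

for group, target in deployment_mix.items():
    train = training_mix.get(group, 0.0)
    ratio = train / target
    flag = "  <-- underrepresented relative to the patients served" if ratio < 0.5 else ""
    print(f"{group:>8}: {train:.0%} of training data vs. {target:.0%} of patients{flag}")
```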
Data selected in phase two are fed to the model in phase three (algorithm development, training, and validation) to learn patterns and be “trained” to produce the output. An example of bias in development is the American Heart Association’s Get With the Guidelines–Heart Failure Risk Score, a support tool designed to guide decisions such as which patients with symptoms of heart failure should be referred to a cardiologist. This algorithm assigns additional risk points—without any scientific justification—to patients identified as non-Black, systematically categorizing Black patients as having lower risk.33 Thus, this biased algorithm would prompt clinicians to under-refer Black patients to cardiologists, which is particularly troublesome given significant disparities in mortality and readmission rates among Black patients with heart failure.34
Validation is the process of testing a model to ensure it performs well on new data and produces accurate, fair, and consistent output across groups.35 An example of bias in validation would be failing to correct an algorithm designed to predict surgical outcomes that consistently performed well for men but poorly for women. To mitigate such bias, experts employ “counterfactual fairness” as a useful framework. The idea is rooted in a thought experiment: if we could imagine a counterfactual world where everything about each individual that impacts surgical outcomes is the same except for a specific demographic attribute, like race or gender, a fair algorithm would have the same accuracy in both the actual and counterfactual worlds.36 Let’s go back to our resource-allocation example: If a Black patient is denied enrollment in a care management program, counterfactual fairness asks, Would the decision change if this same patient, with identical health conditions and needs, were white? If the answer is yes, the algorithm is unfair and needs to be corrected.
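A minimal sketch of this counterfactual check appears below: hold every input fixed, flip only the demographic attribute, and see whether the model’s decision changes. The predictor and feature names are hypothetical placeholders standing in for a trained model.

```python
# Minimal counterfactual-fairness spot check: hold every input fixed, flip
# only the demographic attribute, and see whether the decision changes.
# The predictor and feature names are hypothetical placeholders.
from typing import Callable, Dict

Patient = Dict[str, object]

def flip_changes_decision(patient: Patient,
                          predict_enroll: Callable[[Patient], bool],
                          attribute: str = "race",
                          counterfactual_value: str = "white") -> bool:
    """Return True if changing only `attribute` flips the model's decision."""
    actual = predict_enroll(patient)
    counterfactual = dict(patient, **{attribute: counterfactual_value})
    return predict_enroll(counterfactual) != actual

# Hypothetical biased rule standing in for a trained model: it leans on a
# cost proxy that runs lower, at the same illness burden, for Black patients.
def biased_predictor(p: Patient) -> bool:
    assumed_cost = p["chronic_conditions"] * (800 if p["race"] == "Black" else 1200)
    return assumed_cost >= 6000

patient = {"race": "Black", "chronic_conditions": 6}
print(flip_changes_decision(patient, biased_predictor))  # True -> the tool is unfair
```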
The final two phases of the algorithmic life cycle center on integration into clinical workflow (phase four) and continuous algorithm monitoring in real-world settings (phase five). Here it is important to consider how clinicians interpret and use the output from the algorithm. One critical issue in phase four is “automation bias,” where clinicians over-trust models and inappropriately act on their outputs, even when they are less accurate for certain groups.37 Uncritical reliance on flawed algorithms that systematically disadvantage certain groups risks perpetuating harm through misdiagnoses, delays in care, and worse health outcomes.38 Accessibility considerations are also crucial to these final phases. Deployment and monitoring require technical resources, training, and expertise primarily available in resource-rich settings, disadvantaging low-resource environments such as rural healthcare facilities. Tools that require advanced sensors or smartphone apps might not be accessible to patients in all communities, thereby limiting their potential benefits.
Ultimately, mitigating algorithmic bias requires an intentional approach at every stage of an algorithm’s life cycle. Because the phases are interconnected, biases introduced in one phase can ripple through subsequent phases. Further, a deployed algorithm can become a creator of clinical data; biased data could then be selected and used to train future algorithms, thereby amplifying inequities over time. Ethical issues,* including fairness and equity, should be explicitly discussed and addressed at each phase of the life cycle to ensure that the algorithms integrated into our everyday clinical workflows help everyone.40
What Can Institutions and Policymakers Do?
Addressing algorithmic bias begins with targeted, actionable strategies from all stakeholders in the healthcare ecosystem. We agree with fellow clinicians and researchers who advocate for a shared responsibility framework† that involves developers, healthcare facilities, and legislative and regulatory bodies.43 Each of these groups plays a pivotal role in mitigating algorithmic bias and promoting health equity.
AI developers have a critical responsibility to ensure transparency to combat biases. They must use diverse datasets during development, rigorously test for bias, and disclose any limitations. Report cards for machine learning models can help ensure transparency and usability for end users. Examples include the TRIPOD+AI‡ checklist, which offers structured guidelines for reporting on prediction models in healthcare, and the “Model Facts” label, which consolidates actionable information for clinicians, such as appropriate use cases and limitations.44 Developers should provide clear documentation and accessible repositories detailing their methodologies and training data, enabling healthcare facilities to assess and monitor the algorithms they implement.45 Further, a centralized, open-access repository enables researchers from diverse disciplines—such as computer science, ethics, medicine, law, and economics—to collaboratively investigate algorithms, fostering interdisciplinary innovation and developing novel solutions to optimize outcomes and minimize bias. Developer transparency was crucial to fixing the biased resource-allocation algorithm described earlier. Researchers were able to examine all aspects of the algorithm, including the training data, to discover the root issue. The researchers then contacted the developers, who independently reevaluated their product and confirmed the results from the research team. Both parties then investigated solutions collaboratively to create a better, more nuanced proxy variable that combined health prediction with cost prediction.46 The success of such collaborative investigation to prevent and mitigate bias hinges on transparency from those creating and distributing algorithms.
Healthcare institutions, particularly those with limited resources, face significant challenges in adopting and evaluating AI technologies. Standardized guidelines and affordable tools are needed. Establishing federal standards for algorithm testing, alongside low-cost or free bias-detection tools, can allow under-resourced facilities to benefit from algorithmic tools without exacerbating inequities.47 Furthermore, accreditation and certification programs should offer independent evaluations of clinical algorithms, providing healthcare facilities with assurance that the tools they adopt meet rigorous standards for safety and equity.48
Regulatory and legislative bodies also have crucial roles to play. President Biden’s 2023 Executive Order 14110, “Safe, Secure, and Trustworthy Development and Use of Artificial Intelligence,” explicitly recognized the risks AI poses in exacerbating discrimination and bias, particularly in “critical fields like healthcare, financial services, education, housing, law, and transportation” where mistakes or misuse directly harm citizens.49 The order emphasized the federal government’s commitment to ensuring AI serves the public good by advancing equity and civil rights, preventing new types of discrimination, and holding those developing and deploying AI accountable through rigorous regulations.
However, President Trump has revoked Executive Order 14110, prioritizing rapid development and deployment of AI over its responsible governance. The Trump administration claims that Biden’s order imposed “onerous and unnecessary government control” and constrained private sector innovation, ignoring the substantial risks that unchecked AI poses to patients and society.50 This policy shift revokes accountability measures and safeguards against bias, undermining public safety and health equity in favor of unfettered development.
By contrast, the European Union balances AI innovation with public protection through its Artificial Intelligence Act (AI Act). This legislation establishes a risk-based regulatory framework that imposes stricter rules on high-risk AI systems and prohibits AI systems that pose an “unacceptable risk.” For example, prohibited practices include:
- deploying subliminal, manipulative, or deceptive techniques to distort behaviour and impair informed decision-making, causing significant harm.
- exploiting vulnerabilities related to age, disability, or socio-economic circumstances to distort behaviour, causing significant harm.…
- social scoring, i.e., evaluating or classifying individuals or groups based on social behaviour or personal traits, causing detrimental or unfavourable treatment of those people.51
Under the EU’s AI Act, an AI system is considered high-risk if it “profiles individuals, i.e. automated processing of personal data to assess various aspects of a person’s life, such as work performance, economic situation, health, preferences, interests, reliability, behaviour, location or movement.” For AI systems determined to be high-risk, the developers assume the most responsibilities, and they must adhere to stringent requirements to ensure compliance and safety. These responsibilities include managing risks at different stages of an AI system’s development, such as ensuring data are relevant, representative, and free of errors for their intended purpose; designing AI systems to be appropriately accurate; keeping detailed records so mistakes are traceable; and enabling AI deployers to implement human oversight.52 The EU’s AI Act provides a regulatory framework that could guide other nations in developing responsible AI governance.
Clinical Algorithms, Bias Threats, and Key Considerations

| Algorithm Type | Example Algorithm | Bias Threat | Key Considerations |
| --- | --- | --- | --- |
| Clinical decision support tool | Get With the Guidelines–Heart Failure Risk Score | Faulty algorithm development incorporating race as biological determinant not supported by scientific evidence | Reevaluate race-based adjustments and transition to models that incorporate social drivers of health. |
| Screening algorithm | Gail Model for breast cancer | Underrepresentative data: Developed using data from primarily white women in the United States | Expand datasets to include diverse populations; validate models in nonwhite populations to ensure accurate predictions. |
| Resource allocation algorithm | High-risk care management program | Biased proxy variable (previous costs as measured by insurance data) for severity of illness that was confounded by poor access to care | Developers must ensure transparency by documenting methodologies and providing accessible repositories detailing training data. Transparency allows healthcare facilities and researchers to assess and monitor algorithms, and fosters collaborative efforts to identify and mitigate bias. |
| Estimates of physiologic function | Estimated glomerular filtration rate; pulmonary function tests | Faulty algorithm development incorporating race as biological determinant not supported by scientific evidence | Transition to race-neutral models and address social drivers of health, like environmental factors and healthcare access. |
What Can Clinicians, Unions, and Activists Do?
Addressing algorithmic bias requires a multifaceted approach that leverages our roles as clinicians, union members, and political activists. Each person has unique opportunities to mitigate bias and foster equity within their specific context.
Clinicians should critically evaluate algorithms’ outputs and understand their limitations.§ As the end users of algorithms, clinicians share responsibility for ensuring these tools improve care and advance health equity. We must demand transparency regarding why and how the tools were developed. Further, we should always supplement an algorithm with human judgment, including forgoing the algorithm altogether if our questions reveal careless development or uncertain validity. The following questions can serve as a framework for clinicians to ensure they are using ethical and equitable tools for their patients:
- What problem does the algorithm address?
- Is the goal of the algorithm to make patients healthier?
- Are patient characteristics in the datasets used for development similar to those of the patients you care for and are applying the algorithm to?
- Were biases tested for and mitigated?
- How will the algorithm be monitored, maintained, and updated?
Union members can mitigate algorithmic bias by advocating for transparency, accountability, and equitable practices in their workplaces. They can push for oversight committees to evaluate algorithms for bias, negotiate for third-party audits of algorithms used in their workplaces, and demand access to information and training on algorithmic limitations. Through advocacy, union members can drive systemic changes, influencing both workplace policies and broader industry standards to prioritize equity and safety in algorithmic implementation.
Improving the health and well-being of all people, including the most marginalized, should be the goal of all healthcare workers and the healthcare system. In a political climate increasingly hostile to diversity, equity, and inclusion (DEI) initiatives, with recent and anticipated attacks on DEI programs, mitigating bias and advancing health equity requires persistent advocacy to ensure that health equity and the health of all remain central to public policy. Transparency is essential; advocates can push for regulations that require clear communication about how algorithms are designed, the data they use, and their potential impacts. Addressing bias effectively also requires engaging our communities: advocates should insist that marginalized voices be included in the development, deployment, and monitoring of algorithms, ensuring their needs and concerns are reflected in healthcare solutions. Accountability is also vital; advocates can help ensure that affected communities have mechanisms for redress. Raising public awareness about algorithmic decisions and advocating for equitable access to technology can also reduce systemic inequities. Lastly, advocates can demand continuous oversight and the deactivation of biased algorithms to ensure fairness, equity, and the ethical use of algorithms in healthcare.
Algorithms are foundational to healthcare, and now, with the increasing integration of AI, they are becoming more powerful and patient-centered as they incorporate large amounts of data to make tailored recommendations. However, algorithms are not inherently neutral instruments; their design and use can perpetuate systemic inequities, exacerbating health disparities. Ethical concerns arise when algorithms reflect biased assumptions or fail to consider marginalized populations. As technology advances, we have an ethical obligation to leverage these tools to improve patient care. At the same time, we must ensure fairness by critically evaluating their development at every stage. Only through this dual commitment can we build a healthcare system that is equitable, inclusive, and accessible to all.
Clara M. Bosco, MD, is a general surgery resident at the University of Arizona and a postdoctoral ethics scholar at the MacLean Center for Clinical Medical Ethics at the University of Chicago. Marshall H. Chin, MD, MPH, is the Richard Parrillo Family Distinguished Service Professor of Healthcare Ethics in the Department of Medicine at the University of Chicago and an associate director of the MacLean Center for Clinical Medical Ethics. William F. Parker, MD, PhD, is an assistant professor of Medicine and Public Health Sciences at the University of Chicago.
*AI technologies pose environmental justice concerns, disproportionately harming the health of low-income and racial and ethnic minority communities.39 Energy-intensive data centers, often reliant on fossil fuels, contribute to greenhouse gas emissions, local pollution, and climate change, all of which disproportionately burden marginalized communities. If AI is to be implemented ethically to improve patients’ health, it is also crucial to address its environmental impacts. (return to article)
†Organizations like the American Medical Association (AMA) and the National Academy of Medicine (NAM) are addressing the governance of AI in healthcare. The NAM’s draft AI Code of Conduct Framework emphasizes ethical, safe, and equitable use of AI through clearly defined principles and commitments.41 Similarly, the AMA advocates for transparency, safety, and effective integration of AI into clinical workflows.42 (return to article)
‡TRIPOD stands for Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis. (return to article)
§We understand that clinicians are already pressed for time. Unions should demand algorithm oversight committees as part of ongoing patient safety and equity work to help clinicians acquire this essential information about algorithms that is critical for patient care. (return to article)
Endnotes
1. M. Chin et al., “Guiding Principles to Address the Impact of Algorithm Bias on Racial and Ethnic Disparities in Health and Health Care,” JAMA Network Open 6, no. 12 (2023): e2345050.
2. D. Mayer, “A Brief History of Medicine and Statistics,” in Essential Evidence-Based Medicine, 2nd ed. (Cambridge, UK: Cambridge University Press, 2009), 1–8; and U. Tröhler, “Development of Clinical Probabilistic Practice in Britain Before Gavarret Part 1: The Long 18th Century,” Journal of the Royal Society of Medicine 113, no. 10 (October 2020): 403–6.
3. D. Bates et al., “The Impact of Computerized Physician Order Entry on Medication Error Prevention,” Journal of the American Medical Informatics Association 6, no. 4 (July 1999): 313–21.
4. M. Howell, G. Corrado, and K. DeSalvo, “Three Epochs of Artificial Intelligence in Health Care,” JAMA 331, no. 3 (January 16, 2024): 242–44.
5. Howell, Corrado, and DeSalvo, “Three Epochs.”
6. Howell, Corrado, and DeSalvo, “Three Epochs.”
7. Howell, Corrado, and DeSalvo, “Three Epochs.”
8. Howell, Corrado, and DeSalvo, “Three Epochs.”
9. V. Hofmann et al., “AI Generates Covertly Racist Decisions About People Based on Their Dialect,” Nature 633 (2024): 147–54.
10. O. Usman, A. Usman, and M. Ward, “Comparison of SIRS, qSOFA, and NEWS for the Early Identification of Sepsis in the Emergency Department,” American Journal of Emergency Medicine 37, no. 8 (August 2019): 1490–97.
11. S. Kanjilal et al., “A Decision Algorithm to Promote Outpatient Antimicrobial Stewardship for Uncomplicated Urinary Tract Infection,” Science Translational Medicine 12, no. 568 (2020): eaay5067.
12. S. McKinney et al., “International Evaluation of an AI System for Breast Cancer Screening,” Nature 577 (2020): 89–94.
13. American College of Cardiology, “ASCVD Risk Estimator Plus,” tools.acc.org/ascvd-risk-estimator-plus/#!/calculate/estimate.
14. T. Bittermann and P. Abt, “MELD-Based Allocation at 20: Can We Evolve and Mature?,” American Journal of Transplantation 21, no. 11 (November 2021): 3511–12.
15. W. Miller et al., “Accuracy of the Sequential Organ Failure Assessment Score for In-Hospital Mortality by Race and Relevance to Crisis Standards of Care,” JAMA Network Open 4, no. 6 (2021): e2113891.
16. N. Eneanya, W. Yang, and P. Reese, “Reconsidering the Consequences of Using Race to Estimate Kidney Function,” JAMA 322, no. 2 (2019): 113–14.
17. A. Blanco-Grau et al., “Assessing Liver Fibrosis Using the FIB4 Index in the Community Setting,” Diagnostics (Basel) 11, no. 12 (November 29, 2021): 2236.
18. Z. Nasreddine, “Montreal Cognitive Assessment (MoCA®): Version 8.3 English,” Geriatric Tool Kit, University of Missouri, geriatrictoolkit.missouri.edu/cog/MoCA-8.3-English-Test-2018-04.pdf.
19. D. Vyas, L. Eisenstein, and D. Jones, “Hidden in Plain Sight—Reconsidering the Use of Race Correction in Clinical Algorithms,” New England Journal of Medicine 383, no. 9 (June 17, 2020): 874–82.
20. Vyas, Eisenstein, and Jones, “Hidden in Plain Sight”; and Eneanya, Yang, and Reese, “Reconsidering the Consequences.”
21. S. Bonner et al., “Clinical Implications of Removing Race-Corrected Pulmonary Function Tests for African American Patients Requiring Surgery for Lung Cancer,” JAMA Surgery 158, no. 10 (2023): 1061–68; and S. Beaverson et al., “Things We Do for No Reason™: Race Adjustments in Calculating Lung Function from Spirometry Measurements,” Journal of Hospital Medicine 18, no. 9 (September 2023): 845–47.
22. Z. Obermeyer et al., “Dissecting Racial Bias in an Algorithm Used to Manage the Health of Populations,” Science 366, no. 6464 (October 25, 2019): 447–53.
23. A. Rajkomar et al., “Ensuring Fairness in Machine Learning to Advance Health Equity,” Annals of Internal Medicine 169, no. 12 (December 4, 2018): 866–72.
24. Obermeyer et al., “Dissecting Racial Bias.”
25. Obermeyer et al., “Dissecting Racial Bias.”
26. J. Roski et al., “Enhancing Trust in AI Through Industry Self-Governance,” Journal of the American Medical Informatics Association 28, no. 7 (July 2021): 1582–90; and Chin et al., “Guiding Principles.”
27. Rajkomar et al., “Ensuring Fairness”; Roski et al., “Enhancing Trust”; and Chin et al., “Guiding Principles.”
28. B. Collins et al., “Addressing Ethical Issues in Healthcare Artificial Intelligence Using a Lifecycle-Informed Process,” JAMIA Open 7, no. 4 (December 2024): ooae108.
29. M. Ng et al., “The AI Life Cycle: A Holistic Approach to Creating Ethical AI for Health Decisions,” Nature Medicine 28, no. 11 (November 2022): 2247–49.
30. L. Nakayama et al., “Unmasking Biases and Navigating Pitfalls in the Ophthalmic Artificial Intelligence Lifecycle: A Narrative Review,” PLOS Digital Health 3, no. 10 (October 8, 2024): e0000618.
31. Ng et al., “The AI Life Cycle.”
32. S. Rostami et al., “Discriminatory Accuracy of the Gail Model for Breast Cancer Risk Assessment Among Iranian Women,” Iranian Journal of Public Health 49, no. 11 (November 2020): 2205–13; and X. Wang et al., “Assessment of Performance of the Gail Model for Predicting Breast Cancer Risk: A Systematic Review and Meta-Analysis with Trial Sequential Analysis,” Breast Cancer Research 20 (2018): 18.
33. Vyas, Eisenstein, and Jones, “Hidden in Plain Sight.”
34. L. Eberly et al., “Identification of Racial Inequities in Access to Specialized Inpatient Heart Failure Care at an Academic Medical Center,” Circulation: Heart Failure 12, no. 11 (October 29, 2019): e006214.
35. Chin et al., “Guiding Principles.”
36. M. Kusner et al., “Counterfactual Fairness,” Advances in Neural Information Processing Systems 30 (2017), papers.nips.cc/paper/2017.
37. Rajkomar et al., “Ensuring Fairness.”
38. Vyas, Eisenstein, and Jones, “Hidden in Plain Sight”; and Eberly et al., “Identification of Racial Inequities.”
39. Roski et al., “Enhancing Trust.”
40. A. Crimmins et al., eds., Fifth National Climate Assessment (Washington, DC: US Global Change Research Program, 2023), nca2023.globalchange.gov.
41. L. Adams et al., “Artificial Intelligence in Health, Health Care, and Biomedical Science: An AI Code of Conduct Principles and Commitments Discussion Draft,” NAM Perspectives (April 8, 2024): 10.31478/202403a.
42. J. Lubell, “How Health AI Can Be a Physician’s ‘Co-Pilot’ to Improve Care,” American Medical Association, January 16, 2025, ama-assn.org/print/pdf/node/131801.
43. R. Ratwani, K. Sutton, and J. Galarraga, “Addressing AI Algorithmic Bias in Health Care,” JAMA 332, no. 13 (October 1, 2024): 1051–52.
44. G. Collins et al., “TRIPOD+AI Statement: Updated Guidance for Reporting Clinical Prediction Models That Use Regression or Machine Learning Methods,” BMJ 385 (April 16, 2024): e078378; and M. Sendak et al., “Presenting Machine Learning Model Information to Clinical End Users with Model Facts Labels,” npj Digital Medicine 3 (2020): 41.
45. Ratwani, Sutton, and Galarraga, “Addressing AI Algorithmic Bias.”
46. Obermeyer et al., “Dissecting Racial Bias.”
47. Ratwani, Sutton, and Galarraga, “Addressing AI Algorithmic Bias.”
48. Roski et al., “Enhancing Trust.”
49. J. Biden, “Safe, Secure, and Trustworthy Development and Use of Artificial Intelligence,” Executive Order 14110, Federal Register, November 1, 2023, federalregister.gov/documents/2023/11/01/2023-24283/safe-secure-and-trustworthy-development-and-use-of-artificial-intelligence.
50. White House, “Fact Sheet: President Donald J. Trump Takes Action to Enhance America’s AI Leadership,” January 23, 2025, whitehouse.gov/fact-sheets/2025/01/fact-sheet-president-donald-j-trump-takes-action-to-enhance-americas-ai-leadership.
51. “High-Level Summary of the AI Act,” Future of Life Institute, February 27, 2024, artificialintelligenceact.eu/high-level-summary.
52. “High-Level Summary.”
[Illustrations by Eva Vázquez]