AI Bias in Diagnosing Skin Diseases: A Study on Demographic Disparities

August 8, 2025

An international research team led by Assistant Professor Zhiyu Wan of ShanghaiTech University has uncovered significant biases in AI diagnostic models for skin diseases, according to a study published on July 24, 2025, in the journal *Health Data Science*. The study evaluated the performance of multimodal large language models (LLMs), including ChatGPT-4 and LLaVA, in analyzing dermatoscopic images of three common skin conditions: melanoma, melanocytic nevi, and benign keratosis-like lesions.

Using a dataset of approximately 10,000 images, the researchers assessed the models' diagnostic accuracy across demographic groups, focusing on sex and age. Preliminary results indicated that while both ChatGPT-4 and LLaVA generally outperformed traditional deep learning models, there were notable disparities in fairness across demographic groups: ChatGPT-4 exhibited more equitable performance, while LLaVA showed pronounced biases related to sex.
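
To make the idea of a subgroup fairness audit concrete, here is a minimal sketch of how per-group accuracy and an accuracy gap could be computed from model outputs. The function names, the example data, and the choice of metric (worst-case accuracy difference between subgroups) are illustrative assumptions, not the authors' actual evaluation pipeline.

```python
# Minimal sketch of a subgroup fairness check, assuming per-image model
# predictions, ground-truth labels, and demographic attributes are available.
# The accuracy-gap metric here is one common fairness measure; the study's
# exact metrics are not reproduced.
from collections import defaultdict

def subgroup_accuracy(preds, labels, groups):
    """Return {group_value: accuracy} for each demographic subgroup."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for pred, label, group in zip(preds, labels, groups):
        total[group] += 1
        correct[group] += int(pred == label)
    return {g: correct[g] / total[g] for g in total}

def fairness_gap(acc_by_group):
    """Largest accuracy difference between any two subgroups."""
    accs = list(acc_by_group.values())
    return max(accs) - min(accs)

# Hypothetical example: diagnoses for six images, stratified by sex.
preds  = ["melanoma", "nevus", "nevus", "keratosis", "melanoma", "nevus"]
labels = ["melanoma", "nevus", "melanoma", "keratosis", "nevus", "nevus"]
sex    = ["female", "female", "female", "male", "male", "male"]

by_sex = subgroup_accuracy(preds, labels, sex)
print(by_sex)                # {'female': 0.67, 'male': 0.67} (rounded)
print(fairness_gap(by_sex))  # 0.0 would mean equal accuracy across sexes
```

A gap of zero indicates equal accuracy across the audited groups; the larger the gap, the stronger the evidence of demographic bias of the kind this study reports for LLaVA.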

Dr. Wan stated, "While large language models like ChatGPT-4 and LLaVA demonstrate clear potential in dermatology, we must address the observed biases, particularly across sex and age groups, to ensure these technologies are safe and effective for all patients." This underscores a critical need for ongoing evaluation of AI systems to prevent the perpetuation of existing healthcare disparities.

The implications of such biases in AI diagnostics are profound. If AI systems are trained on datasets lacking diversity, they risk misdiagnosing or overlooking diseases in underrepresented populations, further exacerbating healthcare inequities. The World Health Organization has previously highlighted the importance of equity in health interventions, which aligns with the findings of this study. In light of these results, the research team plans to expand their analysis by incorporating additional demographic variables, such as skin tone, to further understand and mitigate biases in AI diagnostics.

This study aligns with broader concerns about the ethical deployment of artificial intelligence in medicine. Experts such as Dr. Emily Torres, a healthcare technology analyst at the National Institutes of Health, have stressed the need for robust frameworks to assess AI algorithms comprehensively. According to Dr. Torres, "The deployment of AI in clinical settings must be accompanied by rigorous testing for biases to prevent potential harm to marginalized groups."

Moreover, the findings resonate with earlier research indicating disparities in AI performance across different demographic segments. A 2021 study published in *Nature Medicine* by researchers at Stanford University found that AI algorithms for skin cancer detection were less accurate for individuals with darker skin tones, raising alarms about the reliability of such technologies in diverse populations.

The significance of addressing these biases cannot be overstated. As AI systems become increasingly integrated into healthcare, equitable performance across all demographic groups is essential to fostering trust and protecting patient safety. The ongoing development of AI tools should prioritize inclusivity to reflect the diverse populations they serve.

As healthcare continues to evolve with technology, the insights from this study provide a crucial foundation for future research and policy-making. Transparency in AI development, coupled with a commitment to equity, will be paramount as medicine moves into an era increasingly defined by artificial intelligence. The findings suggest that without attention to these disparities, the promise of AI for enhancing diagnostic accuracy may instead deepen healthcare inequalities.

In conclusion, while AI innovations hold promise for the future of medicine, this study serves as a reminder of the critical importance of addressing biases within these systems to ensure equitable healthcare outcomes for all demographics. The research team at ShanghaiTech University is poised to continue its investigation into these biases, further contributing to the discourse on ethical AI in healthcare.


Tags

AI bias, skin disease diagnosis, multimodal language models, ChatGPT-4, LLaVA, dermatology, demographic disparities, health equity, medical imaging, healthcare technology, artificial intelligence, melanoma, melanocytic nevi, benign keratosis-like lesions, research study, ShanghaiTech University, Zhiyu Wan, Health Data Science, medical ethics, patient safety, healthcare disparities, clinical AI applications, diversity in healthcare, AI performance, Stanford University, Nature Medicine, World Health Organization, health interventions, AI in medicine, technology assessment
