When LLM-Generated Code Perpetuates User Interface Accessibility Barriers, How Can We Break the Cycle?

Abstract: The integration of Large Language Models (LLMs) into web development workflows has the potential to revolutionize user interface design, yet their ability to produce accessible interfaces remains underexplored. In this paper, we present an evaluation of LLM-generated user interfaces against the accessibility criteria of the Web Content Accessibility Guidelines (WCAG 2.1), comparing the output of ChatGPT and Claude under two distinct prompt types—accessibility-agnostic and accessibility-oriented. Our evaluation approach, consisting of automated testing, expert evaluation, and LLM self-reflection, reveals that accessibility-oriented prompts increase the number of WCAG criteria met and reduce violation rates, but persistent barriers remain, particularly in semantic structure. We argue that advancing accessible user interface development through LLM-generated code requires not just enhanced prompting but deeper semantic understanding and context awareness in these systems. Based on our findings, we outline opportunities for future work.
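To make the automated-testing step of the evaluation concrete, the sketch below shows how a tool might flag two WCAG 2.1 criteria that automated checkers commonly cover. This is an illustrative example using only Python's standard library, not the paper's actual pipeline; the class and function names are hypothetical.

```python
# Hypothetical sketch of an automated WCAG 2.1 check (not the paper's pipeline):
# flags 1.1.1 Non-text Content (<img> without an alt attribute) and
# 3.1.1 Language of Page (<html> without a lang attribute).
from html.parser import HTMLParser


class WcagChecker(HTMLParser):
    def __init__(self):
        super().__init__()
        self.violations = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # An empty alt="" is valid for decorative images, so only a
        # missing attribute counts as a violation here.
        if tag == "img" and "alt" not in attrs:
            self.violations.append("WCAG 1.1.1: <img> missing alt attribute")
        if tag == "html" and not attrs.get("lang"):
            self.violations.append("WCAG 3.1.1: <html> missing lang attribute")


def check(html: str) -> list[str]:
    """Return the list of violations found in an HTML snippet."""
    checker = WcagChecker()
    checker.feed(html)
    return checker.violations
```

For example, `check('<html><img src="a.png"></html>')` would report both violations, while `check('<html lang="en"><img src="a.png" alt="logo"></html>')` would report none. Real evaluations would rely on a mature engine such as axe-core, which covers far more criteria than this sketch.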

Authors: Alexandra-Elena Gurita, Radu-Daniel Vatavu

Conference: W4A ’25, the 22nd International Web for All Conference. ACM, New York, NY, USA

Link: https://dx.doi.org/10.1145/3744257.3744266