Assessing The Performance of Artificial Intelligence Models In Autism Spectrum Disorder: Accuracy and Readability of ChatGPT, Gemini, and Microsoft Copilot

Yasin Calışkan; Abas Haşimoğlu

doi:10.5455/PBS.20250722043208

Original Research
Online Published: 20 Jan 2026

PBS. 2026; 16(1): 46-56

Abstract
Objective: Autism Spectrum Disorder (ASD) is a neurodevelopmental disorder affecting social communication and involving restricted, repetitive behaviors. With AI tools like ChatGPT-4, Gemini, and Microsoft Copilot becoming increasingly popular information sources for healthcare professionals and families, this study aimed to evaluate and compare their accuracy and readability when responding to ASD-related questions.
Methods: In this cross-sectional study, we presented 88 questions (45 Frequently Asked Questions [FAQs] and 43 guideline-based) to the three AI models. We sourced questions from social media, parent forums, and clinical guidelines. Two blinded child psychiatrists evaluated response accuracy using a four-grade scale, while readability was assessed using four established indices: Flesch-Kincaid Grade Level, Gunning Fog Index, Coleman-Liau Index, and Flesch Reading Ease.
Results: For FAQs, accuracy rates showed significant differences (p=0.001): Gemini (100%), ChatGPT-4 (95.6%), and Microsoft Copilot (71.1%). For guideline-based questions, accuracy also varied significantly (p=0.010): Gemini (86.0%), ChatGPT-4 (83.7%), and Microsoft Copilot (55.8%). Interestingly, Microsoft Copilot provided the most readable FAQ responses, while Gemini offered the most balanced readability for guideline-based questions.
Conclusion: Our findings show that Gemini and ChatGPT-4 are highly accurate for ASD information, particularly for complex scientific content, while Microsoft Copilot produced more accessible text despite lower accuracy. These results suggest that different models may better serve different audiences: healthcare professionals might benefit from the precision of Gemini or ChatGPT-4, while general users might prefer Copilot's readability. This highlights opportunities for improving both reliability and accessibility in healthcare communication.

Key words: Artificial Intelligence, Autism spectrum disorder, ChatGPT-4, Gemini, Microsoft Copilot, Readability
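
The four readability indices named in the Methods can be illustrated with a short sketch. The abstract does not say which software the authors used, so the code below is only an assumption-laden illustration: it applies the standard published formulas, and the helper names (count_syllables, scores_from_counts, readability), the vowel-group syllable heuristic, and the simplified "three or more syllables" rule for complex words are all hypothetical choices made here, not the study's method.

```python
import re


def count_syllables(word: str) -> int:
    """Crude heuristic (an assumption): count vowel groups, drop a likely silent final 'e'."""
    word = word.lower()
    n = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and n > 1:
        n -= 1
    return max(n, 1)


def scores_from_counts(sentences, words, syllables, letters, complex_words):
    """Apply the standard formulas for the four indices to raw text counts."""
    if sentences <= 0 or words <= 0:
        raise ValueError("text must contain at least one sentence and one word")
    words_per_sentence = words / sentences
    syllables_per_word = syllables / words
    return {
        # Higher score means easier text.
        "flesch_reading_ease": 206.835 - 1.015 * words_per_sentence
        - 84.6 * syllables_per_word,
        # The remaining three approximate a US school grade level.
        "flesch_kincaid_grade": 0.39 * words_per_sentence
        + 11.8 * syllables_per_word - 15.59,
        "gunning_fog": 0.4 * (words_per_sentence + 100 * complex_words / words),
        # Coleman-Liau uses letters and sentences per 100 words.
        "coleman_liau": 0.0588 * (letters / words * 100)
        - 0.296 * (sentences / words * 100) - 15.8,
    }


def readability(text: str) -> dict:
    """Tokenize naively and score the text; tokenization rules are simplifications."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text)
    syllables = [count_syllables(w) for w in words]
    letters = sum(ch.isalpha() for w in words for ch in w)
    # Simplified Gunning Fog rule: any word with three or more syllables is "complex".
    complex_words = sum(1 for s in syllables if s >= 3)
    return scores_from_counts(
        len(sentences), len(words), sum(syllables), letters, complex_words
    )


if __name__ == "__main__":
    sample = "The cat sat on the mat. The dog ran."
    for name, value in readability(sample).items():
        print(f"{name}: {value:.2f}")
```

Because syllable counting and sentence splitting are heuristic, scores from a sketch like this will differ somewhat from those of dedicated readability tools; it is meant only to show how the four indices relate to sentence length, word length, and syllable counts.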

How to Cite this Article

Pubmed Style

Calışkan Y, Haşimoğlu A. Assessing The Performance of Artificial Intelligence Models In Autism Spectrum Disorder: Accuracy and Readability of ChatGPT, Gemini, and Microsoft Copilot. PBS. 2026; 16(1): 46-56. doi:10.5455/PBS.20250722043208

Web Style

Calışkan Y, Haşimoğlu A. Assessing The Performance of Artificial Intelligence Models In Autism Spectrum Disorder: Accuracy and Readability of ChatGPT, Gemini, and Microsoft Copilot. https://www.pbsciences.org/?mno=272541 [Access: June 22, 2026]. doi:10.5455/PBS.20250722043208

AMA (American Medical Association) Style

Calışkan Y, Haşimoğlu A. Assessing The Performance of Artificial Intelligence Models In Autism Spectrum Disorder: Accuracy and Readability of ChatGPT, Gemini, and Microsoft Copilot. PBS. 2026; 16(1): 46-56. doi:10.5455/PBS.20250722043208

Vancouver/ICMJE Style

Calışkan Y, Haşimoğlu A. Assessing The Performance of Artificial Intelligence Models In Autism Spectrum Disorder: Accuracy and Readability of ChatGPT, Gemini, and Microsoft Copilot. PBS. (2026), [cited June 22, 2026]; 16(1): 46-56. doi:10.5455/PBS.20250722043208

Harvard Style

Calışkan, Y. & Haşimoğlu, A. (2026) Assessing The Performance of Artificial Intelligence Models In Autism Spectrum Disorder: Accuracy and Readability of ChatGPT, Gemini, and Microsoft Copilot. PBS, 16 (1), 46-56. doi:10.5455/PBS.20250722043208

Turabian Style

Calışkan, Yasin, and Abas Haşimoğlu. 2026. Assessing The Performance of Artificial Intelligence Models In Autism Spectrum Disorder: Accuracy and Readability of ChatGPT, Gemini, and Microsoft Copilot. Psychiatry and Behavioral Sciences, 16 (1), 46-56. doi:10.5455/PBS.20250722043208

Chicago Style

Calışkan, Yasin, and Abas Haşimoğlu. "Assessing The Performance of Artificial Intelligence Models In Autism Spectrum Disorder: Accuracy and Readability of ChatGPT, Gemini, and Microsoft Copilot." Psychiatry and Behavioral Sciences 16 (2026), 46-56. doi:10.5455/PBS.20250722043208
