Amritjot Dhillon, MS
Medical Student
TouroCOM
Hicksville, New York, United States
Rajwinder Singh, BS
Research Associate
Complete Orthopedics
Stony Brook, New York, United States
Romona Ling
Undergraduate Student
Cornell University
Ithaca, New York, United States
Mohammad Athar, MD
Attending Orthopedic Surgeon
Complete Orthopedics
Stony Brook, New York, United States
Suhirad Khokhar, MD
Attending Orthopedic Surgeon
Complete Orthopedics
Stony Brook, New York, United States
Objectives:
Artificial intelligence (AI) technology has the potential to improve the comprehensibility of medical information in rehabilitation medicine. Search engines such as Google, Bing, and Yahoo give patients ready access to medical information, but the quality and readability of information on rehabilitation conditions, such as functional impairments or mobility issues, are often inadequate. The American Medical Association (AMA) and the Centers for Disease Control and Prevention (CDC) recommend that public medical information be written at an eighth-grade level, while the National Institutes of Health (NIH) suggests a sixth-grade level. We evaluated four popular AI tools (ChatGPT, Google Gemini, Meta AI, and Microsoft Copilot) on their ability to provide accurate and understandable information about ankle instability.
Design:
Responses to a series of prompts about ankle instability were assessed for quality and then simplified to a seventh-grade reading level. Physicians graded each response against a set of criteria: "Yes" (1 point) if the response met a criterion, "Partially" (0.5 points) if it somewhat met it, and "No" (0 points) if it did not, for a maximum score of 18. Additionally, the Flesch-Kincaid score was used to evaluate the readability of each response.
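The readability values reported below lie on a 0-to-100 scale on which higher scores indicate easier reading, consistent with the Flesch Reading Ease formulation (often reported, as here, under the Flesch-Kincaid name); this interpretation is ours rather than stated in the abstract. A standard statement of that formula is:

\[
\text{Reading Ease} = 206.835 - 1.015\left(\frac{\text{total words}}{\text{total sentences}}\right) - 84.6\left(\frac{\text{total syllables}}{\text{total words}}\right)
\]

On this scale, scores of 60 to 70 correspond roughly to an eighth- to ninth-grade reading level and scores of 30 to 50 to a college level, which matches the grade-level interpretations given in the Results.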
Results:
Quality scores differed significantly between original and simplified responses: original responses averaged 10.38 out of 18, while simplified responses averaged 7.79 (P = 0.001). Meta AI consistently performed best, with original and simplified scores of 11.33 and 8.67, respectively. ChatGPT had the lowest original score (9.67), while Google Gemini and Microsoft Copilot tied for the lowest simplified score (7.33).
Original responses were written at a college reading level (Flesch-Kincaid score: 34.6), while simplified responses were written at an eighth- to ninth-grade level (Flesch-Kincaid score: 69.2), a significant improvement in readability (P = 0.004). However, simplification reduced the quality of the responses.
Conclusions:
While AI tools can effectively simplify medical information for public use, maintaining a balance between readability and high-quality content remains a challenge.