Letter to the Editor | Volume 185, 109732, April 2023

Can ChatGPT pass the life support exams without entering the American Heart Association course?

      To the Editor,

      ChatGPT is a large language model developed by OpenAI [1], trained on a massive dataset of text from the internet. It can generate human-like responses to a variety of questions and prompts, in multiple languages and subject areas. To our knowledge, the performance of ChatGPT has not been examined in the life support and resuscitation space. In this study we tested the accuracy of ChatGPT’s answers to the American Heart Association (AHA) Basic Life Support (BLS) and Advanced Cardiovascular Life Support (ACLS) exams.
      We employed ChatGPT [1] (OpenAI, San Francisco; versions of 9 and 30 January 2023) to answer the life support exams (AHA BLS Exams A and B from February 2016, 25 questions each; and AHA ACLS Exams A and B from March 2016, 50 questions each). Because ChatGPT’s training data does not extend beyond 2021, we selected these older versions of the exams. Questions based on the interpretation of images were excluded because ChatGPT does not accept such input. Scenario-based question series were posed within a single session, taking advantage of ChatGPT’s within-session memory retention, whereas each stand-alone question was posed in a new session [2,3].
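      The letter does not describe the querying step beyond the session protocol above; purely as an illustration, the following Python sketch shows how a similar protocol could be scripted against the OpenAI chat API. The client setup, model name, and helper functions are assumptions made for this sketch, not the authors’ workflow (the study used the ChatGPT web interface).

      # Illustrative sketch only: the study used the ChatGPT web interface,
      # not the API. Assumes the openai Python package (v1+) and an API key
      # in the OPENAI_API_KEY environment variable.
      from openai import OpenAI

      client = OpenAI()
      MODEL = "gpt-3.5-turbo"  # stand-in model name for this sketch

      def ask(messages):
          # Send the running conversation and return the model's reply text.
          response = client.chat.completions.create(model=MODEL, messages=messages)
          return response.choices[0].message.content

      def answer_stand_alone(question):
          # Each stand-alone question starts a fresh session (no shared history).
          return ask([{"role": "user", "content": question}])

      def answer_scenario_series(questions):
          # A scenario-based series reuses one session so earlier turns are retained.
          messages, replies = [], []
          for question in questions:
              messages.append({"role": "user", "content": question})
              reply = ask(messages)
              messages.append({"role": "assistant", "content": reply})
              replies.append(reply)
          return replies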
      Each answer provided by ChatGPT was compared with the exam answer key provided by the American Heart Association; the threshold for passing each exam is 84% [4]. In addition to overall performance, we also asked ChatGPT to estimate a “level of correctness” (LOC) for each of its answers.
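      As a minimal sketch of the grading step described above, assuming multiple-choice answers recorded as option letters, the following compares the model’s responses with the answer key and applies the 84% pass threshold; the example answer lists are hypothetical, not study data.

      # Grade answers against the AHA answer key; the lists here are hypothetical.
      PASS_THRESHOLD = 0.84  # the 84% pass mark cited above

      def grade(model_answers, answer_key):
          correct = sum(a == k for a, k in zip(model_answers, answer_key))
          accuracy = correct / len(answer_key)
          return accuracy, accuracy >= PASS_THRESHOLD

      accuracy, passed = grade(["A", "C", "B", "D"], ["A", "C", "D", "D"])
      print(f"accuracy={accuracy:.1%}, passed={passed}")  # accuracy=75.0%, passed=False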
      In total, 96 stand-alone and 30 scenario-based questions were used to test ChatGPT’s performance. ChatGPT achieved 68% (17/25) and 64% (16/25) accuracy on the two 25-question AHA BLS exams and 68.4% (26/38) and 76.3% (29/38) accuracy on the two 38-question AHA ACLS exams. From each AHA ACLS exam, 12 questions were removed because they required electrocardiogram interpretation. For 21.5% (25/116) of answers, ChatGPT provided a reference; the AHA and the American College of Cardiology (84%; 21/25) were the most commonly referenced sources. The overall LOC across all exams was 89.5% (95% CI: 87.4–91.6%), with BLS Exam A being the highest (93.8%; 95% CI: 90.6–97.0%) (Fig. 1).
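      Intervals of the kind reported above can be obtained with a standard normal-approximation 95% confidence interval over the per-answer LOC values; the sketch below illustrates the calculation with a made-up list of LOC values rather than the study data.

      import statistics

      def loc_ci(loc_values, z=1.96):
          # Mean self-reported LOC with a normal-approximation 95% CI.
          mean = statistics.mean(loc_values)
          se = statistics.stdev(loc_values) / len(loc_values) ** 0.5
          return mean, mean - z * se, mean + z * se

      mean, lo, hi = loc_ci([90, 85, 95, 88, 92, 80, 97, 89])  # hypothetical values
      print(f"LOC={mean:.1f}% (95% CI: {lo:.1f}-{hi:.1f}%)")
      # -> LOC=89.5% (95% CI: 85.7-93.3%)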
      Fig. 1. ChatGPT’s correctness level based on the four AHA exams.
      In this study, ChatGPT did not reach the passing threshold on any of the exams. Our results are similar to those of a study that used the United States Medical Licensing Examination (USMLE) to test ChatGPT’s performance [2,5].
      We observed that for scenario-based questions ChatGPT provided not only the answer, as it did for stand-alone questions, but also insightful explanations to support the given answer. In comparison with similar artificial intelligence-based systems [6-8], the answers provided by ChatGPT were on average highly relevant and accurate, and showed significantly better congruence with resuscitation guidelines than a previous study reported [8].
      Although the references provided by ChatGPT for each answer were very general, the rationale given for the answers was often considerably more detailed than the rationale provided in the ACLS exam key. In conclusion, despite the overestimated LOC, ChatGPT has shown promise as a powerful reference and self-learning tool for preparing for the life support exams.

      Conflict of interest

      Nino Fijačko is a member of the ERC BLS Science and Education Committee and the ILCOR Education, Implementation and Teams Task Force. Christopher Picard holds equity in Cavenwell AI (Ottawa, Ontario, Canada). Matthew Douma, Lucija Gosak and Gregor Štiglic declare that they have no conflict of interest.

      References

      1. OpenAI. ChatGPT blog. (Accessed 1 February 2023, at: https://openai.com/blog/chatgpt/).

      2. Kung TH, Cheatham M, Medinilla A, et al. Performance of ChatGPT on USMLE: Potential for AI-Assisted Medical Education Using Large Language Models. medRxiv. 2022.

      3. Antaki F, Touma S, Milad D, et al. Evaluating the Performance of ChatGPT in Ophthalmology: An Analysis of its Successes and Shortcomings. medRxiv. 2023.

      4. Heart and Stroke Foundation of Canada. Instructor resource for resuscitation programs in Canada. (Accessed 1 February 2023, at: https://resuscitation.heartandstroke.ca/).

      5. Liévin V, Hother CE, Winther O. Can large language models reason about medical questions? arXiv preprint arXiv:2207.08143. 2023.

      6. Alagha EC, Helbing RR. Evaluating the quality of voice assistants’ responses to consumer health questions about vaccines: an exploratory comparison of Alexa, Google Assistant and Siri. BMJ Health & Care Informatics. 2019;26:e100075.

      7. Miner AS, Milstein A, Schueller S, et al. Smartphone-based conversational agents and responses to questions about mental health, interpersonal violence, and physical health. JAMA Internal Medicine. 2016;176:619-625.

      8. Picard C, Smith KE, Picard K, et al. Can Alexa, Cortana, Google Assistant and Siri save your life? A mixed-methods analysis of virtual digital assistants and their responses to first aid and basic life support queries. BMJ Innovations. 2019;6:1.