February 23, 2024 | AI and NLP

What One Year of ChatGPT has Taught Me About the Future of Quality Measurement – Part 1. The AI HEDIS® Analyst

It’s been over a year since the release of OpenAI’s ChatGPT, and this is my first blog post on how I think this new technology will fundamentally change healthcare Quality measurement and improvement. Why the wait?

With the extraordinary flurry of activity, publications, and releases, it took time to form a stable and coherent view of where we are and how these innovations will impact our field. At the same time, our team at Astrata has been testing how we will apply this technology, with exciting initial results!

In this two-part blog, I’ll break down these new developments for busy Quality leaders, with the goal of showing you why generative AI technologies like ChatGPT could turn Quality measurement on its head over the next few years.

Rebecca Jacobson, MD, MS, FACMI

Co-Founder, CEO, and President

Why All the Hype?

By now, almost everyone has heard of ChatGPT, and most people have either used it or used an application built on it (for example, the Bing search engine or other Microsoft products). ChatGPT is a kind of generative artificial intelligence (a.k.a. “genAI”), meaning that it uses AI to generate new content. This technology is based on a type of model called a large language model, named for reasons that will become clear in Part 2 of this blog. ChatGPT generates text in the form of answers, conversations, documents, emails, and so on, while other genAI technologies and products can generate images and videos. The apparent creativity and human-like behavior of generative AI applications have created a wave of development, investment, and innovation that has been likened to the early days of the Internet.

Importantly, some GenAI solutions can also make use of tools, including other computer applications. That means you can give such a system a problem, and it can write a piece of computer code, analyze the resulting data, and produce visualizations and explanations. That is a remarkable achievement, and it maps directly onto work that goes on constantly in the measurement of healthcare quality.

Imagine if a HEDIS analyst could simply tell a computer in plain language what they wanted to know about a population, and the computer would go off and write the necessary code, determine which members are non-compliant, create a table of compliance segmented by value-based care contract, and then create specific campaigns with personalized text messages to engage members in closing their care gaps. I think this level of automation will arrive in the near to medium term.
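To make that scenario a little more concrete, here is a minimal Python sketch of the kind of code such an assistant might write behind the scenes. Everything in it is an assumption for illustration: the member records, the field names (`member_id`, `contract`, `compliant`), the contract labels, and the message wording are all invented, and real HEDIS measure logic is far more involved than a single true/false flag.

```python
from collections import defaultdict

# Hypothetical member records. The field names and values are invented
# for illustration and do not come from any real plan or measure.
members = [
    {"member_id": "M001", "contract": "VBC-A", "compliant": True},
    {"member_id": "M002", "contract": "VBC-A", "compliant": False},
    {"member_id": "M003", "contract": "VBC-B", "compliant": False},
    {"member_id": "M004", "contract": "VBC-B", "compliant": False},
    {"member_id": "M005", "contract": "VBC-B", "compliant": True},
]


def non_compliant(records):
    """Return the IDs of members who have an open care gap."""
    return [r["member_id"] for r in records if not r["compliant"]]


def compliance_by_contract(records):
    """Return the compliance rate per contract, rounded to two decimals."""
    counts = defaultdict(lambda: [0, 0])  # contract -> [compliant, total]
    for r in records:
        counts[r["contract"]][1] += 1
        if r["compliant"]:
            counts[r["contract"]][0] += 1
    return {c: round(ok / total, 2) for c, (ok, total) in counts.items()}


def outreach_message(member_id, service):
    """Draft a simple outreach text for one member (wording is hypothetical)."""
    return (
        f"Hello, member {member_id}. Our records show you may be due for "
        f"{service}. Reply YES to schedule."
    )


if __name__ == "__main__":
    print(compliance_by_contract(members))
    for member_id in non_compliant(members):
        print(outreach_message(member_id, "a screening"))
```

The point is not the code itself, which any analyst could write, but that the analyst would only need to describe the question; the assistant would produce and run something like this on their behalf.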

Has GenAI Been Used in Healthcare?

Yes. Applications that use Generative AI technology are already in use in healthcare. Most notably, they are being used in the ambient documentation space. In this task, the technology “listens” to the conversation of a provider and patient and is able to produce appropriate clinical documentation which the provider can review and enter into the electronic health record, potentially at a fraction of the effort. But generative AI has many other important applications in healthcare and biomedicine. And I would argue that quality measurement is an excellent place to start using generative AI.

What Does This Have to Do With Quality Measurement?

One reason quality measurement is so ripe for this technology is that health plans and VBC organizations need to analyze vast amounts of data coming from heterogeneous sources (many different providers and their electronic health records). The standard way of thinking about this problem most closely aligns with the interoperability requirements of ONC and CMS. Make electronic health records collect structured data, and then make that data standard and interoperable. We’ve undertaken decades of work towards that goal, and we have made significant progress. But this is being done at great cost and with many unintended consequences, including creating an almost impossible degree of provider burden. What if we could easily read and extract the necessary data from free text notes, dictated summaries, and text boxes without forcing providers to turn every item of data into a button press? I do not think this is science fiction anymore.

Another reason quality measurement and improvement is an excellent area to apply GenAI is that the current process is complex, requiring many people and a diverse set of processes and solutions. As every health plan leader knows, this is all very costly. GenAI could reduce these costs while enhancing our ability to manage these populations.

At the same time, quality measurement has less innate risk of harm to patients and members than activities involving clinical decision-making and direct patient care. At the end of the day, there is always a clinician standing between a gap and a member. In fact, quality measurement is one of a set of administrative activities in healthcare that seem especially likely to become automated by AI over the next few years.

Many health plans are already using natural language processing (NLP) methods to deal with specific business problems such as moving to prospective HEDIS®, Risk Adjustment record review, and HCC suspecting. Large language models are in many ways the next big advance in NLP, and I expect them to produce a stunning improvement in the accuracy of the existing technologies. I think this will happen quickly, and it will allow risk and quality abstractors to work far more productively. Perhaps it will even change what regulators allow if (or maybe when) these technologies rival or exceed human performance.

While the future of this technology is very bright, there are some storm clouds ahead.

In Part 2 of this blog (“Moving beyond the Hype”), I will dig deeper to explain how these models work, what barriers stand between us and this exciting new world, and what could possibly go wrong. We will also tackle some important questions about how genAI will impact the quality workforce, and how health plans can choose wisely in selecting partners. Finally, I’ll provide a reading list for those who really want to go deep!