June 19, 2025 | AI and NLP, Quality Improvement
As the number of AI use cases and solutions for payers continues to grow, one thing we hear consistently is that “trust” is a major barrier to fully adopting AI solutions. A recent publication by McKinsey cites risk concerns as the largest challenge payers face in adopting GenAI. In many ways, I think we use the word “trust” as shorthand for a variety of underlying concerns. In this blog I’ll unpack five of those concerns and then outline what payers and vendors can do as they continue to create and adopt AI solutions.
Rebecca Jacobson, MD, MS, FACMI
Co-Founder, CEO, and President
The extraordinary investment that has gone into the AI ecosystem over the past several years has created a bit of an echo-chamber and has set expectations very high. While there is no question that AI will have a powerful impact on healthcare in the coming years, executives are skeptical that implementations will yield the kind of ROI that is being advertised. Some estimate that only 30% of AI pilots go on to production usage.
Payers should look for evidence of real, measurable value and evaluate vendors for previous, real-world success in AI implementations that go beyond the pilot phase.
Vendors should remember that AI models alone provide limited value to payers, unless they are embedded in solutions that tackle true pain points, achieve better results than what is currently available, and offer a price point that demonstrates ROI. Examples of ROI case studies that Astrata has done can be found here.
At the core of any healthcare AI solution is measurement of “performance” (the word AI folks use to talk about the accuracy of their models). An AI vendor that knows what they are doing is obsessed with measuring performance and builds performance measurement into every part of their product and implementation lifecycle. A lack of rigorous performance data is another frequent factor that erodes trust. Beyond performance/accuracy, implementations should measure KPIs that directly track to ROI – whether that’s increased efficiency or improved outcomes. Payers may not yet feel comfortable evaluating the performance of AI solutions on their own. External consultants can be helpful, but they do not replace the need for rigor and transparency from vendors.
Payers should look for vendors that can describe exactly how they measure performance, including when in the software/model lifecycle they measure it (both internally and after deployment). Transparency around these results, including detailed reporting at regular intervals, should be an expectation.
Vendors should use AI operations (AI ops) best practices to test and validate their models prior to deployment, and once retraining or tuning on local data is complete. Evaluation methodology should be appropriate to the use case and output. Expert-labeled gold standards may be needed to evaluate performance, especially to estimate the frequency of false negatives. Modifications to models and software should always be tested to assess their impact before re-release. Astrata incorporates evaluation into every step, starting with NLP measure development, and extending through local customization and deployment. If you’re evaluating performance of AI for quality measurement, you can read more about the difference between performance metrics and KPIs here.
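To make this concrete, here is a minimal sketch (in Python, for illustration only; the function and example data are hypothetical, not Astrata’s implementation) of scoring a model’s binary chart findings against an expert-labeled gold standard, including the false negative rate discussed above:

```python
# Illustrative only: compare binary model findings to expert gold-standard labels
# for the same set of charts. Names and data are hypothetical.

def evaluate_against_gold_standard(predictions, gold_labels):
    """Return precision, recall, and false negative rate for binary findings."""
    tp = sum(1 for p, g in zip(predictions, gold_labels) if p and g)
    fp = sum(1 for p, g in zip(predictions, gold_labels) if p and not g)
    fn = sum(1 for p, g in zip(predictions, gold_labels) if not p and g)

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0      # sensitivity
    fn_rate = fn / (tp + fn) if (tp + fn) else 0.0     # evidence the model missed

    return {"precision": precision, "recall": recall, "false_negative_rate": fn_rate}

# Example: did the NLP model find qualifying evidence in each chart?
model_findings  = [True, True, False, True, False]
expert_findings = [True, False, False, True, True]
print(evaluate_against_gold_standard(model_findings, expert_findings))
```

Note that estimating the false negative rate requires expert labels on charts the model did not flag, which is exactly why expert-labeled gold standards matter.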
Another trust-buster that comes up frequently is the degradation of AI performance over time as the distribution of the data changes. This is a particular problem right now for payers, because most payers are in the midst of markedly increasing their use of clinical data. When significant new unseen data is added, AI model performance may decline. It’s important to monitor for drift on an ongoing basis, and to retrain or retune models if performance degrades beyond acceptable limits.
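As a simple illustration of what ongoing drift monitoring can look like, the sketch below (hypothetical names, thresholds, and data; not any vendor’s production system) compares the rate of positive model findings in a recent window against a baseline captured at deployment and flags shifts beyond an agreed tolerance:

```python
# Illustrative sketch of drift monitoring on model outputs; a production system
# would also monitor input data distributions and per-measure performance.

def check_prediction_drift(baseline_positive_rate, recent_predictions, tolerance=0.10):
    """Flag drift if the recent positive-finding rate moves more than
    `tolerance` (absolute) away from the baseline rate."""
    if not recent_predictions:
        return {"drift": False, "recent_rate": None, "baseline_rate": baseline_positive_rate}
    recent_rate = sum(recent_predictions) / len(recent_predictions)
    return {"drift": abs(recent_rate - baseline_positive_rate) > tolerance,
            "recent_rate": recent_rate,
            "baseline_rate": baseline_positive_rate}

# Example: the baseline hit rate was 22%; a newly added clinical data feed
# shifts the observed rate well above that, which should trigger investigation.
status = check_prediction_drift(0.22, [1, 0, 1, 1, 0, 1, 0, 1, 1, 0])
if status["drift"]:
    print("Investigate possible drift:", status)
```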
Payers should expect ongoing model monitoring and should work with their AI vendors to establish oversight of these programs. Often these programs are tightly integrated with operations, so that anomalies and concerning trends that appear during use of the system are identified early and addressed.
Vendors operating their systems with human oversight should ensure that any user-facing tools make it easy to provide feedback related to performance. For example, Astrata deploys our HEDIS NLP solution within the workflow of prospective payer medical record review (MRR) teams and continuously evaluates disagreements between the AI model and MRR teams. When systems are operating with only intermittent human review, automation should be in place to catch significant deviations in expected results (which can alert payer teams to dig deeper with their vendors).
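The sketch below illustrates the kind of automation described above, assuming a simple review log of paired model and reviewer findings (the data structure and alert threshold are hypothetical):

```python
# Hypothetical sketch: track how often MRR staff overturn the AI model's finding
# and raise an alert when disagreement climbs past an agreed ceiling.

def disagreement_alert(review_log, max_disagreement=0.05):
    """review_log: list of (model_finding, reviewer_finding) booleans."""
    if not review_log:
        return "No reviewed charts yet"
    disagreements = sum(1 for model, reviewer in review_log if model != reviewer)
    rate = disagreements / len(review_log)
    if rate > max_disagreement:
        return f"ALERT: model/reviewer disagreement at {rate:.1%} over {len(review_log)} charts"
    return f"OK: disagreement at {rate:.1%}"

print(disagreement_alert([(True, True), (True, False), (False, False), (True, True)]))
```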
How much human oversight is needed to verify results depends on the use case. Use cases that have material impact on patient care are not yet ready for full AI automation in my opinion. But that does not mean that all back-office functions can be fully automated either. In fact, activities that have regulatory or audit oversight or that have significant financial implications often require humans “in-the-loop”. Examples of these kinds of activities include quality measurement and risk adjustment.
Payers should be cautious as AI programs are being rolled out, ensuring that each use case is separately vetted by the appropriate internal experts to establish the degree of human oversight that is needed. When possible, payers should try to use established workflows for validation and oversight, for example by using AI to provide initial results, while moving staff into a reviewer capacity. This minimizes unnecessary costs while ensuring that the right expertise is in place to supervise.
Vendors should create products that support humans “in-the-loop” until we are ready to gradually shift from partial to full automation, which may take some time. As an example, Astrata’s Chart Review solution provides triage capabilities to existing MRR teams, enabling automated first-pass review and moving abstractors into a role more like that of a reviewer. This allows efficiency gains greater than 700%, giving teams that previously focused only on the hybrid season a way to transition to population-scale prospective HEDIS without increasing team size.
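For illustration, here is a simplified first-pass triage sketch; the confidence thresholds and routing labels are assumptions for this example, not Astrata’s actual Chart Review logic:

```python
# Simplified illustration of confidence-based triage: high-confidence results get
# a quick human sign-off, low-confidence charts are deprioritized, and everything
# in between goes to a full abstractor review.

def triage_chart(p_evidence, auto_accept=0.95, auto_reject=0.05):
    """Route a chart based on the model's estimated probability that
    qualifying evidence is present; uncertain cases go to a human."""
    if p_evidence >= auto_accept:
        return "queue_for_reviewer_confirmation"   # likely hit: quick human sign-off
    if p_evidence <= auto_reject:
        return "deprioritize"                      # very unlikely to contain evidence
    return "route_to_abstractor"                   # uncertain: full human review

for p in (0.98, 0.02, 0.60):
    print(p, "->", triage_chart(p))
```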
Because of the scale and volume of protected health information involved, AI systems present unique security and privacy challenges and require increased scrutiny. At the same time, the rapid growth in AI vendors has created a cohort of startups that have not yet developed the kind of security and privacy maturity that is a prerequisite for working in healthcare. On the payer side, organizational change is needed to provide governance and oversight as new AI solutions meet the real world.
Payers should insist that AI vendors (even startups) meet basic security standards such as HITRUST certification. Payers that establish AI governance early will be better positioned to evaluate how and where models are trained, whether their data is separated from other payer data throughout the training process, and how vendors adhere to foundational standards.
Vendors should adopt risk-based approaches to AI security, including access controls, data protections, and appropriate deployment strategies. They should seek to implement guidance from existing governance, risk, and compliance frameworks, such as those coming from the Coalition for Health AI (CHAI) found here. At Astrata, we are proud to be contributing to emerging AI standards and accreditations from national organizations focused on using AI for a variety of healthcare purposes, including quality measurement.
By following these best practices, vendors and payers can better work together to build and retain trust in healthcare AI systems. We know that trust is the currency of progress in healthcare. And it’s up to all of us to keep it growing!