December 10, 2025 | Digital Quality Measures (DQM), Digital Quality Transformation, Quality Improvement
Digital Quality HEDIS® Engines – How to Judge Performance Results
As the industry transitions to digital quality, we are likely to see many vendors reporting on the performance of their FHIR and CQL digital HEDIS® engines. Performance in this context means how fast the engine can run, and it’s a good indicator of operational efficiency. It’s also very likely correlated with the cost to run the engine, and therefore the cost to customers. It’s clear to Astrata that the new digital engines should significantly outperform traditional engines, which will either let health plans run measures more frequently at the same cost, or reduce product costs at the frequency they run today. That makes performance metrics a key set of indicators to track and understand as you evaluate your digital HEDIS® vendor options.
Rebecca Jacobson, MD, MS, FACMI
Co-Founder, CEO, and President
How is performance actually measured? The CQL/FHIR technology underlying the new digital HEDIS® engines is very different from the technology used in the traditional HEDIS® engines that came before. As vendors develop their digital engines, we are all tracking and improving performance as we find more ways to increase speed and reduce cost. Typically, vendors use Synthea to generate synthetic FHIR data for these experiments. Synthea is an open-source synthetic data generation system made available by MITRE. NCQA uses Synthea data to generate the FHIR test decks that vendors use to complete the NCQA validation (certification) process for digital engines. As vendors improve their digital engines, they test how fast the engines can go on a specific dataset of members. For example, you may see that a specific digital engine can run 2M members across 79 measures in under 24 hours.
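To make a claim like that concrete, it helps to translate it into sustained throughput. Here is a back-of-envelope sketch using the hypothetical figures quoted above (not measurements from any particular engine):

```python
# Back-of-envelope: what "2M members across 79 measures in under
# 24 hours" implies about sustained throughput.
members = 2_000_000
measures = 79
hours = 24

evaluations = members * measures           # 158,000,000 member-measure pairs
per_second = evaluations / (hours * 3600)  # ~1,829 evaluations/sec

print(f"{evaluations:,} member-measure evaluations")
print(f"~{per_second:,.0f} evaluations per second, sustained for a day")
```

Whether that rate is impressive depends entirely on how much data each evaluation has to touch, which is the subject of the next section.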
Don’t be fooled. It’s all about the data. These runtime metrics are great, but they leave out critical variables that you must be aware of in order to understand whether you will see similar results on your member population: (1) the total number of FHIR resources, and (2) the distribution of those FHIR resources across the population. To understand this, let’s unpack the issue of data complexity. Imagine a perfectly healthy 25-year-old who sees her PCP every 2 years, and compare her to a 50-year-old with multiple chronic conditions who sees his PCP at least 6 times a year. The medical chart for our young, healthy member will be small and simple: no medications, few lab tests, and limited diagnoses. The number of FHIR resources needed to represent her data will likewise be small. In contrast, many more FHIR resources are needed to represent the older, sicker member. This matters because runtime results from a digital engine (2M members * 79 measures in <24 hours) are meaningless unless the number and distribution of resources are similar to what you expect to see in your population (or at least in a typical real population). If the test data is too simple and does not reflect a real population, you can bet that those runtime results are inflated.
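In FHIR terms, each member’s chart is typically delivered as a Bundle, and the resource count is simply the number of entries in it. The following Python sketch uses two toy bundles (the resource mixes are invented for illustration) to show how different the two members above might look:

```python
def count_resources(bundle: dict) -> int:
    # A FHIR Bundle carries one resource per "entry" element.
    return len(bundle.get("entry", []))

# Two toy member bundles; the resource mixes are invented for illustration.
healthy_25yo = {"resourceType": "Bundle", "entry": [
    {"resource": {"resourceType": "Patient"}},
    {"resource": {"resourceType": "Encounter"}},    # one recent PCP visit
    {"resource": {"resourceType": "Observation"}},  # a single vital sign
]}
chronic_50yo = {"resourceType": "Bundle", "entry": [
    {"resource": {"resourceType": t}}
    for t in (["Patient"] + ["Encounter"] * 6 + ["Condition"] * 4
              + ["MedicationRequest"] * 8 + ["Observation"] * 40)
]}

print(count_resources(healthy_25yo))  # 3
print(count_resources(chronic_50yo))  # 59
```

Real charts are far larger than these toys, but the ratio between the two members is the point: an engine timed only on members like the first will look much faster than it will be on members like the second.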
How does this compare to real data?
As part of our efforts to continuously increase the performance of our eMeasure digital engine, we have created HEDIS® datasets that mirror the complexity of real member HEDIS® FHIR data, based on an analysis of real health plan data. The results of our analysis are shown below. Some caution is warranted given the small sample size; we expect to update these averages as our number of implementations grows.
| | Plan #1 | Plan #2 | Plan #3 | Average |
|---|---|---|---|---|
| Average Resources per Bundle | 2,482 | 1,048 | 1,480 | 1,670 |
| Median Resources per Bundle | 1,699 | 812 | Unavailable | 1,255.5 |
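If you want to build synthetic test data whose per-member resource counts resemble these figures, one simple approach (a sketch, not a description of Astrata’s actual method) is to sample counts from a lognormal distribution fitted to the pooled mean and median above:

```python
import math
import random

# Pooled figures from the table above.
mean_count = 1670.0
median_count = 1255.5

# For a lognormal, median = exp(mu) and mean = exp(mu + sigma**2 / 2),
# so both parameters can be solved from the two observed statistics.
mu = math.log(median_count)
sigma = math.sqrt(2 * math.log(mean_count / median_count))

def sample_resource_count() -> int:
    """Draw one member's FHIR resource count from the fitted distribution."""
    return max(1, round(random.lognormvariate(mu, sigma)))

counts = [sample_resource_count() for _ in range(100_000)]
print(sum(counts) / len(counts))  # should land near the 1,670 mean
```

A right-skewed distribution of this kind also reproduces the property the table hints at (mean above median): a minority of complex members carries a disproportionate share of the resources.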
It’s important to keep in mind that the complexity of data (and thus the number of FHIR resources) will vary across measures, across populations, and across sources of data. But the numbers shown above should give you a much more realistic sense of what to look for when you evaluate runtime metrics from a digital engine. If you see a statement such as “2M members * 79 measures over 250M FHIR resources in under 24 hours,” be aware that it is unlikely to reflect performance in the real world, where the number of FHIR resources is likely to be roughly ten times higher.
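The arithmetic behind that caution is straightforward, using the hypothetical claim above and the averages from the table:

```python
# The hypothetical vendor claim: 2M members, 250M FHIR resources.
claimed_resources = 250_000_000
claimed_members = 2_000_000
observed_average = 1670  # average resources per bundle, from the table

per_member = claimed_resources / claimed_members  # 125 resources per member
print(per_member)                                 # 125.0
print(observed_average / per_member)              # ~13x more data in real life
```

A claim built on 125 resources per member is being tested on roughly a tenth of the data volume a realistic population would carry.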
Running daily. For most health plans, daily HEDIS® runs have been a long-standing goal, although very few plans can accomplish them unless they are very small. The reasons for this are complex: they include both the underlying technology of traditional measurement engines and the complexity of the work processes that support transforming and loading the data. Digital engines can be expected to run much faster and to simplify data operations substantially. Daily runs are an expected benefit of digital engines, which makes it very important to evaluate your potential digital vendor carefully: you want to ensure that they can support daily HEDIS® runs across your populations.
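To gauge whether a vendor can realistically support daily runs for a plan of your size, it helps to estimate the daily data volume involved. A rough sketch, reusing the illustrative 2M-member population and the pooled average from the table (your plan’s numbers will differ):

```python
# Rough daily data volume for a hypothetical 2M-member plan,
# using the pooled average of ~1,670 resources per member.
members = 2_000_000
avg_resources_per_member = 1670

total_resources = members * avg_resources_per_member  # 3,340,000,000
per_hour = total_resources / 24

print(f"{total_resources:,} FHIR resources per full-population run")
print(f"~{per_hour:,.0f} resources/hour to finish inside a 24-hour window")
```

A daily cadence means sustaining that rate every day, alongside the data refresh itself, which is why performance claims grounded in realistic resource counts matter so much.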
Shouldn’t we be testing on real data? By generating synthetic data that closely matches the complexity of real data, we can test variables such as speed over increasing dataset sizes. But synthetic data can only take us so far. Currently, the number of members, populations, and data sources running through these digital engines is not enough to give us a good picture of performance on real data. However, initial results at Astrata suggest that real-data performance will be similar to what we see with synthetic data. Over the next year, we expect to have health plan customers running entire populations across all HEDIS® measures, and we think it’s likely that other vendors will get there too. As more performance data emerges, we will be able to describe more precisely the efficiency of these digital engines and their potential impact on speed and cost.
In summary, be careful about how you interpret performance results based on synthetic data. Make sure you understand the profile of the synthetic data so you can assess how well the performance results will carry over to the real world. Ask questions, and evaluate vendors critically on their ability to support digital HEDIS® on real-world data.