Since the introduction of the National Surgical Quality Improvement Program (NSQIP) in 1994 and its subsequent adoption in both the academic and private sectors through the early 2000s, academic surgeons have been interested in understanding the quality and utility of the data the program collects. High-caliber quality metrics matter for a number of reasons, including identifying where to focus quality improvement resources and which providers and facilities to reward or penalize in an increasingly pay-for-performance-driven healthcare industry. Given the stakes involved in the accurate reporting of quality metrics and their use as a surrogate for performance, the topic often inspires passionate debate.
A recently published study in JAMA Surgery from Krell et al. provides additional insight into the reliability of quality improvement data for comparing hospital surgical quality. The authors ask a fundamental question: can risk-adjusted morbidity and mortality, as measured by NSQIP, reliably profile hospital performance? They conclude that these measures do not reliably differentiate hospital performance. This question underlies any discussion of the appropriateness of using these or similar metrics to direct payments and quality improvement resources.
JAMA Surgery carried an invited commentary from Rhoads and Wren on the study, and the topic is sure to inspire additional discussion in a variety of circles. One forum where this conversation has been ongoing is Twitter, where past AAS president David Berger (@DHBBaylorMed) asked whether this study represents a setback to the measurement of surgical quality. He posed the question to Justin Dimick (@jdimick1, senior author on the paper and AAS President-Elect) and Clifford Ko (@cliffordkomd, Director of ACS NSQIP). The discussion evolved to touch on a few additional key topics relevant to all surgeons engaged in quality improvement:
https://twitter.com/DHBBaylorMed/status/444080894576324608
.@DHBBaylorMed @cliffordkomd Don't think so – just need to make sure measures are meaningful. Reliability = power calculation for quality.
— Justin B. Dimick (@jdimick1) March 13, 2014
@DHBBaylorMed Just need to choose our measures carefully, based on reliability among other factors, so we don't waste our resources.
— Justin B. Dimick (@jdimick1) March 13, 2014
https://twitter.com/DHBBaylorMed/status/444090938894192641
@DHBBaylorMed Resisters can use anything as ammunition. I think we need to push the science forward either way.
— Justin B. Dimick (@jdimick1) March 13, 2014
@DHBBaylorMed Focus on combined measures (multiple operations) for mortality and procedure-specific morbidity only for common procedures.
— Justin B. Dimick (@jdimick1) March 13, 2014
https://twitter.com/DHBBaylorMed/status/444098305736187904
We'll always try to improve reliability, but can't be distracted. Need to keep our "eyes on the prize"-which is better care and outcomes.
— Clifford Ko MD (@cliffordkomd) March 13, 2014
As you can see from the exchange above, a major concern is whether this study could harm the field of surgical quality and be used as ammunition by those who oppose quality improvement projects. As Dr. Berger notes, “Unfortunately many will use this article as justification not to participate [in] a surgical outcomes program.” This is a legitimate concern. Physician reluctance to participate in quality improvement programs is common and has multiple causes. Given the psychological tendency to interpret new information as consistent with existing beliefs, it is not unreasonable to expect some surgeons to view this study as confirmation of flaws in NSQIP.
Although some who read only the title might take this study as a condemnation of NSQIP, the reality is significantly more complex. The authors do not appear to conclude that NSQIP should be discounted, but rather that improvements are needed to maximize the reliability of many surgical quality improvement platforms. The principal issue raised by the study is that the combination of low-volume facilities, infrequent procedures, and sampling methods yields data that may not reliably reflect hospitals’ true performance. Dr. Dimick states it concisely: “Reliability = power calculation for quality.” While the study prompts caution and thoughtfulness when interpreting NSQIP data, it does not justify condemning the program’s utility. Rather, it reinforces a core principle of quality improvement: iterative, data-driven revision of risk-adjusted outcome measurement. Adjustment and fine-tuning are part of the process.
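To build intuition for the “power calculation” analogy, consider a toy simulation (the hospital counts, complication rates, and caseloads below are illustrative assumptions, not figures from the study): when each hospital contributes only a small sample of cases, observed complication rates are dominated by chance, and the resulting hospital rankings correlate poorly with true underlying quality.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical scenario: 100 hospitals whose true complication
# rates cluster around 10% (drawn from a Beta(10, 90) distribution).
n_hospitals = 100
true_rates = rng.beta(10, 90, n_hospitals)

def ranks(x):
    """Convert values to ranks (0 = lowest)."""
    return np.argsort(np.argsort(x))

for caseload in (25, 100, 500, 2000):
    # Each hospital's observed rate comes from a binomial sample of cases.
    observed = rng.binomial(caseload, true_rates) / caseload
    # Spearman-style rank correlation between observed and true quality.
    rho = np.corrcoef(ranks(true_rates), ranks(observed))[0, 1]
    print(f"{caseload:5d} cases/hospital -> rank correlation {rho:.2f}")
```

With only a few dozen cases per hospital, the observed ranking is mostly noise; only at high volumes does it begin to track the true ordering, which is precisely the concern raised for low-volume facilities and infrequent procedures.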
Reliability is just one facet of quality improvement metrics. As academic surgeons, our ultimate goal should be to advance patient care while advancing the science behind it. Krell et al. have presented a highly informative analysis of how factors such as sampling and caseloads can affect outcome measures. In light of this study, their suggestions to improve data collection and to account for reliability when assessing performance through outcome measures are highly valuable. The improvements they suggest, along with other techniques for ensuring that the field has accurate measures of performance, are critical to appropriately focusing quality improvement projects and directing provider reimbursement.
The Twitter exchange shows that this topic can generate valuable debate about how surgeons collect outcomes data and how they use those data in practice. These conversations will only become more important as pressure grows to use outcomes metrics as a gauge of performance for reimbursement purposes.
- How can current quality measurements be improved?
- How will using quality metrics to determine physician reimbursement change the world of surgery?
- How should surgeons decide where to focus their quality improvement projects?
Let us know by joining the conversation on Twitter (@AcademicSurgery) or in the comments section below. We look forward to hearing your thoughts!