Economic evaluation of policy initiatives in the organisation and delivery of healthcare: a case study of gastroenterology endoscopy services

Background: Complex clinical interventions are increasingly subject to evaluation by randomised trial linked to economic evaluation. However, evaluations of policy initiatives tend to eschew experimental designs in favour of interpretative perspectives which rarely allow the economic evaluation methods used in clinical trials. As evidence of the cost effectiveness of such initiatives is critical in informing policy, it is important to explore whether conventional economic evaluation methods apply to experimental evaluations of policy initiatives.
Methods: We used mixed methods based on a quasi-experimental design to evaluate a policy initiative whose aim was to expedite the modernisation of gastroenterology endoscopy services in England. We compared 10 sites which had received funding and support to modernise their endoscopy services with 10 controls. We collected data from five waves of patients undergoing endoscopy. The economic component of the study compared sites by levels of investment in modernisation and patients’ use of health service resources, time off work and health related quality of life.
Results: We found no statistically significant difference between intervention and control sites in investment in modernisation or any patient outcome including health.
Conclusions: This study highlights difficulties in applying the rigour of a randomised trial and associated techniques of economic evaluation to a policy initiative. It nevertheless demonstrates the feasibility of using this approach, although further work is needed to demonstrate its generalisability in other applications. The present application shows that the small incentives offered to intervention sites did not enhance modernisation of gastroenterology endoscopy services or improve patient outcomes.


Background
Clinical interventions are increasingly evaluated by randomised controlled trials (RCTs). The methods of economic evaluation which are used alongside RCTs are becoming well established (e.g. [1,2]). In contrast, evaluators of policy initiatives tend to eschew experimental designs, advocating more interpretative perspectives [3] and adopting context-dependent approaches like 'realistic evaluation' [4]. These are less amenable to the rigorous economic evaluation methods used in clinical trials.
In recent years, however, there have been efforts to extend the rigour of the RCT to the evaluation of clinical interventions that are more complex than single treatments. For example, the UK Medical Research Council (MRC) has developed a framework for the design and evaluation of 'complex interventions' which includes a definitive RCT [5]. Policy initiatives are arguably even more complex than complex clinical interventions. Given the need for evidence of the effectiveness of policy initiatives, however, the rigour of the RCT is still the goal. This paper describes a study which evaluated a policy initiative using a quasi-experimental approach and economic appraisal.

The case study
In 2002, the National Health Service Modernisation Agency (NHSMA) initiated its Modernising Endoscopy Services (MES) programme to help gastroenterology departments in England modernise their endoscopy services. The specific purpose of the MES programme was to improve waiting times and throughput of endoscopy services by better matching demand and supply in endoscopy units and thus improve clinical outcomes. Following a pilot study, NHSMA chose 26 sites to receive £30,000 to fund approved service redesign plans and provided them with specially designed data software (Toolkit™) to analyse their endoscopy services, together with training and support from the MES itself. The Toolkit™ required the daily input of the number of referrals, patients waiting and lost appointment slots by procedure type and reason. These data were used by the endoscopy staff to monitor services and were uploaded to the MES programme for external analysis. Sites whose applications were unsuccessful were still eligible to use the Toolkit™ outside the MES programme.
This paper describes the methods and results of applying standard economic appraisal principles to evaluate the MES project. The final section discusses how the economic evaluation of this policy initiative differed from that in traditional clinical trials.

Methods
The National Institute for Health Research funded the ENIGMA study ('Evaluating New approaches In Gastroenterology initiated by the Modernisation Agency') through its Service Delivery and Organisation Programme to conduct an independent evaluation of the impact of the MES programme on services, patients and professionals [6]. ENIGMA adopted a quasi-experimental approach by selecting and recruiting random samples of ten of the 26 'intervention' sites chosen by the NHSMA and ten of the 70 'control' sites whose applications for funding to support modernisation had been unsuccessful; we stratified both samples by number of beds in the hospital.
We collected data from patients referred to these sites for endoscopy across five waves from April 2004 to April 2006. We assessed 9,154 patients for eligibility and invited 7,974 to participate; 3,818 consented to take part. ENIGMA gained ethical approval from the Wales Multi-Centre Research Ethics Committee and research governance approval from each site.
The controlled nature of ENIGMA made it a good candidate for economic evaluation. It addressed five economic questions:
1. What was the cost of modernisation in each site?
2. Did modernisation costs differ between intervention and control sites?
3. Was there a difference in use of other NHS resources between intervention site patients and control site patients?
4. Was there a difference in time off work between intervention and control site patients?
5. Was there a difference in health outcomes as measured by quality adjusted life-years (QALYs) between intervention and control site patients?
These are standard economic evaluation questions to address rigorously when undertaking economic evaluation alongside clinical trials. Attempting to apply them to a policy initiative, however, created numerous difficulties which often required creativity to overcome.

Economic methods
The main economic analysis was a cost-utility study comparing total NHS costs with effects in terms of QALYs.

Cost of modernisation
In economic evaluation alongside clinical trials, estimation of direct intervention costs, i.e. the value of the resources used in the provision of the intervention, is normally straightforward in well-defined interventions. Although the MES project sought to modernise endoscopy services, we could not find a widely agreed definition of 'modernisation', on which the cost of the intervention depends. As the stated challenge of the final phase of the MES project was to "introduce new ways of working", for purposes of ENIGMA we defined modernisation as changes in working practices. Examples of such modernisation thus included changing methods (e.g. new IT systems to manage waiting lists), altering staff skill mix (e.g. training nurses to perform endoscopy), purchasing different equipment (e.g. long term decontamination units) and setting up new processes to monitor progress (e.g. modernisation team meetings). However we did not consider doing more of the same, for example by increasing staff numbers, as modernisation, although this clearly represents service improvement.
As many modernising changes began before the start of the evaluation we could not specify costs at the level of precision normally possible in prospective monitoring of resource use in economic evaluations alongside clinical trials. Instead, we estimated the resource costs of modernisation via two rounds of semi-structured interviews with key personnel at each study site, completed one year apart, early in 2006 and 2007. We asked sites to identify the individual with most knowledge of all aspects of the endoscopy service at that site to act as our informant. In response they nominated a range of managerial (e.g. Endoscopy Manager, Clinical Services Manager) and clinical (e.g. Consultant, Nurse Consultant, Endoscopy Sister) staff.
Before the first interview, we asked each site to provide relevant documentary sources describing its modernisation efforts and related resource consequences. We reviewed these and sent summaries to the selected informants before their first interview. These began with an explanation of principles to ensure that respondents had a clear understanding of what we considered a cost of modernisation. The second interview gave them the opportunity to respond to queries from the first interview and identify developments since the first visit.
Following standard methods of economic evaluation, we then measured resources which had been identified as contributing to modernisation in relevant natural or physical units and valued them using local data or national sources. In particular we valued staff hours according to grade; where we could not be sure of the grade, we estimated costs from the mid-point of the scale for similar advertised jobs [7].
Costs of off-site training courses undertaken by staff explicitly for modernisation included the money paid for courses where known, or else the cost for similar courses from web sources [8], plus the value of the trainee's time. In-house training included time of trainer and trainee, again valued according to grade. Where training led to staff re-grading, we included the extra cost of the re-graded post in the costs of modernisation.
We used the equipment costs incurred by sites, or else the most frequently incurred cost for similar equipment by other sites. We considered both training and equipment as one-off investments providing a flow of benefits over time and amortised all such costs by assuming five-year lifetimes and using a universal discount rate of 3.5% [9]. We also undertook a sensitivity analysis to test the effect of 10-year lifetimes on the costs of training and equipment. We adjusted all costs to common 2006 prices using the Health Service Price Index.
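The amortisation step can be illustrated with a minimal sketch. It assumes the conventional equivalent annual cost formula, in which the outlay is divided by an annuity factor for payments at the end of each year; the text does not state which annuitisation convention was used, and the function names and the 10,000 outlay below are hypothetical, not study data.

```python
def annuity_factor(rate: float, years: int) -> float:
    """Present value of 1 paid at the end of each of `years` years (assumed convention)."""
    if rate == 0:
        return float(years)
    return (1 - (1 + rate) ** -years) / rate


def equivalent_annual_cost(outlay: float, rate: float = 0.035, years: int = 5) -> float:
    """Spread a one-off outlay over its assumed lifetime at the given discount rate."""
    return outlay / annuity_factor(rate, years)


# Hypothetical equipment outlay (illustrative only, not taken from the study).
outlay = 10_000.0
base_case = equivalent_annual_cost(outlay, years=5)     # five-year lifetime
sensitivity = equivalent_annual_cost(outlay, years=10)  # ten-year lifetime
print(f"EAC over 5 years:  {base_case:.2f}")
print(f"EAC over 10 years: {sensitivity:.2f}")
```

Under this convention, lengthening the assumed lifetime lowers the annual cost, which is the direction of effect the sensitivity analysis tests.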

Patients' use of other health service resources
Modernising endoscopy services may also have indirect consequences for other health service resources. For example, shorter waiting times could affect patients' use of General Practitioner (GP) services. We therefore collected data on patients' use of other NHS services by questionnaires to patients recruited in five waves beginning in April 2004 and ending in April 2006. Each wave completed questionnaires at referral (Baseline Questionnaire, BQ), immediately after the procedure (Post Procedure Questionnaire, PPQ) and twelve months thereafter (12MQ). The time between being referred and undergoing the procedure varied between patients. We accepted returned questionnaires until the end of 2007. Where data were missing for the 12MQ, we adopted the last case carried forward method. For example, where BQ (time point 0) and PPQ (time point 1) data were present but 12MQ (time point 2) data were missing, we carried data forward from the PPQ. We did not apply a further weighting of the carried-forward PPQ data, based on the relative relationship of PPQ to 12MQ data, as we anticipated that it would be unlikely to have any major impact on the results.
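The carry-forward rule described above can be sketched as follows. This is an illustration only: the function name and the example values are hypothetical, and missing responses are represented here by None.

```python
from typing import List, Optional, Sequence


def carry_forward(values: Sequence[Optional[float]]) -> List[Optional[float]]:
    """Replace each missing value (None) with the most recent observed value.

    A value missing at the first time point stays missing, because there is
    nothing earlier to carry forward.
    """
    filled: List[Optional[float]] = []
    last_seen: Optional[float] = None
    for value in values:
        if value is not None:
            last_seen = value
        filled.append(last_seen)
    return filled


# Hypothetical patient with BQ and PPQ data but no 12MQ response.
patient_costs = [120.0, 85.0, None]  # time points 0 (BQ), 1 (PPQ), 2 (12MQ)
print(carry_forward(patient_costs))  # [120.0, 85.0, 85.0]
```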
When we undertook ENIGMA, there was little guidance on how to collect resource use by patient recall, e.g. [10]. We therefore adapted a patient recall questionnaire which we had previously used successfully [11]. Questions asked patients about visits to GP surgeries and outpatient clinics, home visits by health professionals, hospital admissions including as a day case, and drugs prescribed, all within the three-month period before completing each questionnaire. Although use of a three-month period produces gaps in the data, it has the advantage of providing more accurate patient recall of resource use than longer periods [12]. Accuracy was considered more important than completeness as the focus here is on differences between groups. Table 1 shows the unit costs of these resources. Given the short timescale, we did not discount these costs.

Value of patients' lost productivity
Economic appraisals need to specify the perspective of their evaluations. In the UK, the National Institute for Health and Care Excellence (NICE) guides the cost-effective use of the National Health Service (NHS) budget by recommending which treatments the NHS should provide. It therefore prefers the perspective of NHS and personal social services [1].
Although there are also theoretical arguments to support this narrower perspective [18], we additionally assessed time off work by service users, as this might be reduced by improved service delivery. We did this by asking questions about time lost from work in the BQ, PPQ and 12MQ. We compared lost work time and its value, estimated using average male and female earnings from Table 1.

Quality of life and statistical analysis
For economic evaluation, our primary outcome was health-related quality of life measured by the EQ-5D [19], scores from which we then converted into QALYs. We undertook analysis of covariance (ANCOVA) with QALY scores as dependent variable, group (intervention or control) as primary independent variable, and age, gender, procedure type, degree of urgency, waiting time (i.e. time between BQ and PPQ), NHS resource use before baseline, number of beds and teaching status of hospital, wave of recruitment and innovation scores as covariates.
Recruiting a large sample across many sites or 'clusters' carries a risk of losing statistical power through intra-site correlation. We therefore augmented analysis of covariance with multi-level modelling [20] to address this issue, in particular in costs at PPQ and 12MQ, using MLwiN version 2.0 [21].
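The multi-level models themselves are not reproduced here. As a rough illustration of why intra-site correlation erodes statistical power, the sketch below uses the standard design-effect approximation for equal cluster sizes, 1 + (m - 1) x ICC. This is a swapped-in textbook approximation, not the study's method, and all numbers are hypothetical.

```python
def design_effect(mean_cluster_size: float, icc: float) -> float:
    """Variance inflation from clustering, assuming roughly equal cluster sizes."""
    return 1 + (mean_cluster_size - 1) * icc


def effective_sample_size(n: int, mean_cluster_size: float, icc: float) -> float:
    """Number of independent observations the clustered sample is 'worth'."""
    return n / design_effect(mean_cluster_size, icc)


# Hypothetical values: 2,000 patients in clusters of 100 with a small ICC.
n_patients = 2000
cluster_size = 100
for icc in (0.0, 0.01, 0.05):
    n_eff = effective_sample_size(n_patients, cluster_size, icc)
    print(f"ICC={icc:.2f}: effective sample size is about {n_eff:.0f}")
```

With an intra-site correlation of zero the design effect is 1 and no power is lost, which is why negligible site-level clustering can justify reporting unadjusted results.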
As PPQ and 12MQ questionnaires varied in the length of waiting time from baseline, an ANCOVA based adjustment was carried out on data from these two questionnaires to account for baseline effects across all resource use items as well as for costs. To address the inevitable skewness in cost data, we also used nonparametric bootstrapping [22] when analysing costs.
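A minimal sketch of non-parametric bootstrapping for a difference in mean costs follows. It assumes the simple percentile method with independent resampling within each group; the text does not say which bootstrap interval was used, and the cost values, function name and parameters are hypothetical.

```python
import random
from statistics import mean
from typing import Sequence, Tuple


def bootstrap_mean_difference(
    intervention: Sequence[float],
    control: Sequence[float],
    n_resamples: int = 2000,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[float, float, float]:
    """Return (observed difference, lower bound, upper bound) using the percentile method."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_resamples):
        # Resample each group with replacement, keeping the original group sizes.
        i_sample = rng.choices(intervention, k=len(intervention))
        c_sample = rng.choices(control, k=len(control))
        diffs.append(mean(i_sample) - mean(c_sample))
    diffs.sort()
    lower_idx = max(0, int((alpha / 2) * n_resamples))
    upper_idx = min(n_resamples - 1, int((1 - alpha / 2) * n_resamples) - 1)
    return mean(intervention) - mean(control), diffs[lower_idx], diffs[upper_idx]


# Hypothetical right-skewed costs per patient (illustrative only).
intervention_costs = [0, 0, 15, 20, 35, 40, 60, 90, 150, 600]
control_costs = [0, 10, 20, 25, 45, 50, 80, 120, 300, 900]
observed, low, high = bootstrap_mean_difference(intervention_costs, control_costs)
print(f"mean difference {observed:.1f}, 95% interval from {low:.1f} to {high:.1f}")
```

Because the interval is built from the empirical distribution of resampled differences, it does not rely on costs being normally distributed.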

Results
Cost of modernisation
Table 2 shows the estimated mean costs of modernisation. As it proved difficult to specify exactly when many modernisation activities began, we treated all training and equipment costs that occurred during the 12-month modernisation period between site visits as 'initial' costs occurring simultaneously. We adopted the same approach to 'one-off' costs of other activities which finished when modernisation was complete. Table 2 shows total initial costs in column 7. This is the sum of the Equivalent Annual Cost (EAC) of equipment and training (column 4), the one-off costs (column 5) and the annual recurring costs such as permanent new modernisation-related posts (column 6). Dividing these initial costs by the activity (i.e. average number of endoscopies) in 2004 yields the marginal initial cost per patient, i.e. the additional cost per patient due to modernisation. Table 2 also shows costs in subsequent years, namely recurring costs plus EAC of training and equipment, the activity in 2005 and the resulting marginal subsequent cost per patient-year.
All costs varied widely: equipment from zero to £260,000; training from £400 to £31,000; and one-off costs from zero to £245,000. As an example of the latter, one site produced a 'modernisation initiative endoscopy list' and cleared it by sending patients to the local private hospital (£68,000).
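The per-patient arithmetic behind Table 2 can be sketched as below. The function names and all figures are hypothetical and serve only to show how the columns combine; they are not values from the table.

```python
def initial_cost_per_patient(
    eac: float, one_off: float, recurring: float, activity: float
) -> float:
    """(EAC of equipment and training + one-off costs + recurring costs) / endoscopies."""
    if activity <= 0:
        raise ValueError("activity must be positive")
    return (eac + one_off + recurring) / activity


def subsequent_cost_per_patient(eac: float, recurring: float, activity: float) -> float:
    """(Recurring costs + EAC of equipment and training) / endoscopies in a later year."""
    if activity <= 0:
        raise ValueError("activity must be positive")
    return (recurring + eac) / activity


# Hypothetical site (illustrative figures only).
eac, one_off, recurring = 2_000.0, 5_000.0, 3_000.0
print(initial_cost_per_patient(eac, one_off, recurring, activity=4_000))  # 2.5
print(subsequent_cost_per_patient(eac, recurring, activity=5_000))        # 1.0
```

One-off costs appear only in the initial figure, so the subsequent cost per patient-year is lower whenever activity does not fall.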

Patients' use of other health service resources
After the first wave of patient recruitment, one intervention and one control site were withdrawn from the study because they were unable to comply with the strict patient recruitment criteria [6]. Patient-level data on use of NHS resources, time off work and quality of life were thus available for 18 of the 20 recruited sites (9 intervention, 9 control). A total of 3,818 BQs, 2,940 PPQs and 2,588 12MQs were available for complete case analyses (CCA). Imputation of missing data increased the number of analysable PPQs to 3055 and 12MQs to 3039. However, comparison of mean differences between patients in intervention and control sites showed little difference in results between methods. For example, all differences that were statistically significant (p <0.05) in the CCA remained significant with imputation and no non-significant results became significant. Thus although the imputation used the full dataset, it had little effect on results. It was therefore decided to report results only from the CCA.
Results of the multi-level modelling (MLM) did not show any statistically significant effects between sites on the five selected cost variables in either PPQ or 12MQ. Estimates of intra-site correlation, which reflect the degree of clustering at site-level, did not show any statistically significant site effects for the observed resource use data. Accordingly we conducted no further MLM analyses and report the remaining results without the minimal adjustments for clustering.
Table 3 shows the raw costs of other NHS resources, split by wave and group. Table 4 shows mean differences in NHS costs, where baseline costs are bootstrapped and post procedure and end of follow up costs are adjusted for baseline differences and waiting time. Most differences were negative, indicating that patients from intervention sites incurred lower costs, but few differences reached statistical significance. All differences which were statistically significant at PPQ became non-significant at 12MQ. The only statistically significant difference at 12MQ was for drugs in Wave 4 (p = 0.04) but this was not seen in Wave 5.

Value of patients' lost productivity
Table 5 shows mean number of days off work and value of lost productivity by wave. There was a statistically significant difference in time off work in favour of the intervention in Wave 4 (adjusted mean difference = -1.78 days; 95% CI from -3.47 to -0.09 days) but not in any other wave. When valued using relevant average earnings (male or female), however, the adjusted mean difference of -£27 was not statistically significant (95% CI from -£57 to £3). This result mirrors those seen for other NHS resources.
Taken together, the results suggest that total costs may have been lower in intervention sites, but the difference was not statistically significant.

Quality of life
Table 6 shows two statistically significant adjusted mean differences in QALYs: against the intervention in Wave 2 at 12 months; and favouring the intervention in Wave 5 immediately after the procedure. The differences in the other 8 analyses were not significant, with five favouring the intervention group and three favouring the control group. The other quality of life measures used in ENIGMA replicated this finding that the MES initiative did not have a statistically significant impact on quality of life [6].

Discussion
The economic questions addressed here were part of a larger study which examined the impact of the Modernising Endoscopy Services (MES) programme on a range of criteria by a variety of methods. These included: questionnaires to study sites about process data and innovation histories; questionnaires to patients about outcomes and waiting times; questionnaires to general practitioners with patients in ENIGMA about the modernisation of endoscopy services at their local study site; interviews to obtain the views of patients and professionals working at study sites; and focus groups to obtain the views of professionals working at non-study sites. Results from all methods were essentially consistent with no conflicting messages [6]. The quasi-experimental design of ENIGMA sought to evaluate whether a policy initiative based on a small financial contribution plus non-financial support from MES programme staff enhanced the modernisation of endoscopy services. We have previously succeeded in randomising Primary Care Trusts to evaluate a policy initiative to facilitate collaboration between general medical practices and community pharmacies in North and East Yorkshire [23,24]. Not surprisingly, however, the NHSMA preferred to choose 26 of the 80 applicant sites which they judged most likely to succeed. Hence the closest we could come to a rigorous quasi-experiment was by drawing stratified random samples from these 26 successful sites and the remaining 70 unsuccessful sites. If the choices made by NHSMA had been better than random, our evaluation would have been biased.
We also had to compromise when applying standard economic evaluation methods alongside our quasi-experiment. While the intervention took the clear form of support from MES, its aim was to expedite 'modernisation', for which there was no clear definition. Identifying which investments to include as costs of modernisation thus required an operational definition. So we chose to define modernisation as doing things differently rather than doing more of the same. We then relied on key personnel to decide whether each activity fitted that definition. Interviewees often had difficulty in accepting that this definition excluded some initiatives of which they were proud, for example creating new consultant and nursing posts which had reduced waiting times. However, not all examples were as clear; thus the need to rely on informants' judgment whether an initiative represented modernisation or improvement was unavoidable and open to bias. It also proved difficult to isolate investments supported by the £30,000 provided to intervention sites from the other investments in modernisation.
Timing also posed several problems. Clinical trials normally introduce interventions at a defined time, whereas many study sites had started modernising long before MES began. This made it difficult to judge whether the cost of each modernisation activity fell within the relevant timeframe. Thus the wide variation in the equipment costs associated with modernisation, from zero to £260,000, depended on the situation at the start of the modernisation period. For example, one endoscopy service which reported no new equipment costs had recently occupied a new unit at a cost of some £2.5 million. Nevertheless, since its new equipment was in place before MES began, we could not attribute the costs of this equipment to MES. At the other extreme, one financially constrained service made no investments in new equipment. Similarly, the wide variation in training costs, from £444 to £31,100, reflected training needs at the start of the modernisation period.
The fact that the Department of Health launched MES at a time when modernisation was one of their key policy objectives for the NHS as a whole exacerbated these problems. Thus, despite our use of control sites to adjust for extraneous factors, the momentum of modernisation meant that the specific effects of an endoscopy-specific initiative were sure to be small. In addition, the policy context did not allow the study design to minimise contamination between groups as could have been done within an RCT. Control sites were not barred from using the Toolkit™ and while we do not know how many did so, any such use would dilute the intervention effect.
As a major aim of the study was to investigate whether standard methods of economic evaluation could be applied to a policy initiative, we intentionally applied a rigorous costing methodology. While the difficulties discussed above reduced the precision of our cost estimates, ENIGMA showed that many standard methods of economic evaluation could be applied without difficulty, in particular to the estimation of the cost of patients' use of other health services and the value of patient level lost productivity and health related quality of life.

Conclusions
This case study has shown that even for an initiative as unspecified as Modernising Endoscopy Services, it is possible to design an economic evaluation alongside a quasi-experiment, albeit with caveats about the conclusions drawn. This design enabled us to compare the effects of the MES programme with those of a plausible alternative.
Subject to those caveats, ENIGMA has concluded that small incentives such as those offered to intervention sites by the NHSMA do not stimulate further investment. Furthermore they do not affect patient outcomes, other demands on the NHS or losses to industry from sickness absence. While less rigorous designs can also draw such economic conclusions, they suffer from the weaknesses inherent in studies which do not define a comparator.
Our use of a case study to explore the broad issue of whether standard economic evaluation techniques can be applied to policy initiatives using quasi-experimental designs inevitably raises questions about the extent to which the messages are generalisable, and further applications are clearly required.