Review of Australian health economic evaluation – 245 interventions: what can we say about cost effectiveness?

Background There is an increasing body of published cost-utility analyses of health interventions which we sought to draw together to inform research and policy. Methods To achieve consistency in costing base and policy context, study scope was limited to Australian-based cost-effectiveness analyses. Through a comprehensive literature review we identified 245 health care interventions that met our study criteria. Results The median cost-effectiveness ratio was A$18,100 (~US$13,000) per QALY/DALY/LY (quality adjusted life year gained or, disability adjusted life year averted or life year gained). Some modalities tended to perform worse, such as vaccinations and diagnostics (median cost/QALY $58,000 and $68,000 respectively), than others such as allied health, lifestyle, in-patient interventions (median cost/QALY/DALY/LY all at ~A$9,000~US$6,500). Interventions addressing some diseases such as diabetes and impaired glucose tolerance or alcohol and drug dependence tended to perform well (median cost/QALY/DALY/LY < A$3,700, < US$5,000). Interventions targeting younger persons < 25 years (median cost/QALY/DALY/LY < A$41,200) tended to perform less well than those targeting adults > 25 years (median cost/QALY/DALY/LY < A$16,000). However, there was also substantial variation in the cost effectiveness of individual interventions within and across all categories. Conclusion For any given condition, modality or setting there are likely to be examples of interventions that are cost effective and cost ineffective. It will be important for decision makers to make decisions based on the individual merits of an intervention rather than rely on broad generalisations. Further evaluation is warranted to address gaps in the literature and to ensure that evaluations are performed in areas with greatest potential benefit.


Conclusion:
For any given condition, modality or setting there are likely to be examples of interventions that are cost effective and cost ineffective. It will be important for decision makers to make decisions based on the individual merits of an intervention rather than rely on broad generalisations. Further evaluation is warranted to address gaps in the literature and to ensure that evaluations are performed in areas with greatest potential benefit.

Background
Because resources are limited not all potentially beneficial services can be funded. Choices must be made in allocating scarce resources. Economic evaluation can help inform resource allocation choices by comparing costs and consequences of two or more alternatives. Comparisons between interventions will be more robust where they are country specific, at least in terms of input costs, which differ considerably between countries. To date Australian economic evaluations have not been systemati-cally described, appraised or explored, except for decisions of the PBAC (Pharmaceutical Benefits Advisory Council) between 1991 to 1996 [1]. However, given confidentiality of data, the performance of specific interventions was not reported. There is now a substantial body of published health economic evaluations in Australia that have used 'final and global' measures of performance (life years, quality adjusted life years and disability adjusted life years) which allows comparison across health care interventions.
The aim of the current paper is to describe and explore Australian published economic evaluations and to analyse the distribution of published cost-effectiveness ratios. This analysis will determine whether there is any identifiable pattern in published cost-effectiveness ratios. The results will potentially to assist policy makers with resource allocation decisions and will identify gaps in the types of interventions evaluated.

Searching for cost-effectiveness studies
The Medline OVID database from 1966 to present was searched in April 2005 for relevant studies using key words for "cost effectiveness" and "economic evaluation" combined with the key word "Australia". In addition websites of Australian health economics centres and government health departments were searched [See Additional file 1]. Key words such as "cost", "economic" and "evaluation" were used separately. Bibliographies of the articles reviewed were searched for further relevant articles, and a key author search was conducted for authors identified with multiple relevant publications. No restrictions were made by year of publication, and all publicly available reports and papers were eligible for inclusion.

Selection
Studies of economic analysis of lifesaving or quality enhancing "health" interventions were eligible for inclusion, defined as broadly fitting within the context of the health care system. An initial selection of potentially relevant articles was made by one reviewer (KD). This selection was broad and overly inclusively.
The following inclusion criteria were then applied by two reviewers (KD and DM) independently to each full text article initially identified as potentially relevant. Consensus was reached by discussion.
• Resources were estimated in Australian dollars.
• The economic evaluation presented as cost per LY saved, death averted, QALY gained or DALY averted, or this could be simply calculated from the figures provided.
• The article was published in English.
• The article was not a duplicate publication. The most complete or recent work by the authors was selected for inclusion with supplementary information retrieved from other reports. Publication on similar interventions by different authors did not class as duplicate.
• The study was primary research. Review articles citing the work of others were excluded, although the reference lists were searched for additional relevant publications.

Validity assessment
An assessment of the quality of the economic evaluations was performed by one reviewer (KD) following study inclusion. The quality criteria reflect items taken from a framework for quality of cost-effectiveness models developed by Sculpher et al [2]. This instrument was chosen as it incorporates economic modelling as well as evaluation and is therefore broader in scope than other quality appraisal checklists that only apply to economic evaluation. The Sculpher framework provides a list of dimensions of a quality economic model and what constitutes good practice. In addition a list of questions is provided in order to enable the framework to be used as a practice tool for critical appraisal. There are a number of dimensions to the framework including structure, disease states, options (comparators), time horizon, cycle length, data identification, data incorporation, internal and external consistency. The items deemed most appropriate for our brief appraisal were taken from the categories 'options', 'data identification' and 'data incorporation'.
The strength of underlying evidence was rated strong (RCT or meta-analysis) or limited (not RCT or meta-analysis). The comparator chosen for the evaluation was rated either as appropriate (described and justified) or inappropriate (not described or justified). Measurement of costs was rated as appropriate (marginal, clearly described, sources of price and quantity data cited) or inappropriate. Each evaluation was rated as having sensitivity analysis performed or not performed.

Data abstraction
Where articles included analysis of more than one intervention, data were extracted for each separate intervention. Data abstraction was performed by one reviewer (KD) with checking of key variables by a second reviewer (LS). The following types of variables were extracted: the characteristics of the target disease and patients, details of the intervention, nature of publication and study methodology, and estimated performance. Policy relevant variables, including funding status were separately ascertained (Table 1). These variables were chosen for their possible relationship to cost effectiveness, based on the author's knowledge of the literature and their experiences with priority setting exercises.

Data synthesis
Cost per LY/QALY/DALY estimates were reviewed and recalculated where necessary to ensure each referred to marginal costs and benefits. Estimates were standardised by translating values into June 2005 estimates using the health component of the CPI [3]. If a study reported a range for the cost-effectiveness results, the study was examined to determine if different estimates related to different interventions and/or distinct target populations. If this was the case, the cost-effectiveness ratio for each distinct population and/or intervention was extracted. However where such sub-groups were the result of post hoc analysis not consistent with delivery of the intervention a standardised figure across all groups was calculated using Australian population data (eg proportion male/female in target age group). If the range simply represented upper and lower limits from sensitivity analyses, a central estimate was used where reported or calculated as the mean if not.
In the event that a reference year was not reported for costs, we used the publication year minus two to reflect the usual delay in publishing original research. Categorisation of the type of intervention, type of patients and results was possible for all studies included in the review. The only sources of missing data were discount rate, time horizon and length of intervention benefit which were purely descriptive variables.

Analysis
Data were described using medians and interquartile ranges for continuous data and proportions for categorical data. The pattern of cost-effectiveness results across the 245 interventions was explored through a combination of descriptive and regression analyses. Ordinary least squares regression was undertaken to identify variables that might explain variation in the cost per LY/QALY/DALY estimates. Ordered logit regression was undertaken to iden-tify variables that might explain variation in the cost per LY/QALY/DALY group. All regressions adjusted for intracluster correlation present in the data because data on multiple interventions were drawn from many of the papers included in our review. We used the robust Huber/ White sandwich estimator to adjust population-average models for intra-cluster correlation, yielding robust standard errors suitable for calculating confidence intervals around estimated regression coefficients [4].
All potentially relevant intervention and publication characteristics listed in Table 2 were initially included in the regression and retained on the basis of their contribution to the regression as evaluated by t-and F-tests (enter p ≤ 0.05) for individual and joint significance, with care taken to ensure stability in the magnitude and direction of the beta coefficients when adding or dropping a potentially relevant variable. Collinearity between included variables and potentially relevant variables excluded from the regression was investigated using standard diagnostics and by methodically entering, removing and re-entering combinations of variables. Results were confirmed by examining outputs from backwards and forwards stepwise regression analyses as evaluated by the probability of F (enter p ≤ 0.05, remove p ≥ 0.10).

Trial flow
The Medline search lead to 912 results, of which 42 (4.6%) were identified as potentially relevant through screening titles and abstracts. An additional 11 papers or reports were identified through key author searches, 9 through reviewing bibliographies of identified articles and 52 through the website searches. A total of 114 full text documents were examined for inclusion in this review ( Figure 1 describes the exclusion process) with a total of 77 (68%) included.

Descriptive results
Of the 77 included documents, sufficient information was available to calculate cost per QALY, DALY or LY estimates  Government report 18 (7) Other peer-reviewed report 26 (11) Other non peer reviewed report 16 (7) Publication type General medicine 52 (21) Specialist medicine 85 (35) Health economics/policy/HTA/public health 108 (44)  for 245 interventions. Table 2 summarises the patient/disease characteristics, intervention details, methodology, quality and implications related to these 245 interventions.

Cost effectiveness
For studies reporting LYs, the median value was A$18,720 per LY gained and for those reporting QALYs or DALYs, the median value was A$17,830 per QALY/DALY. The median economic performance using QALYs/DALYs where available or LYs otherwise across all 245 interventions was A$18,100 per LY/QALY/DALY. Eleven interventions (5%) were more costly and less effective than their comparators and were therefore dominated, 21 interventions (8%) were both more effective and cheaper than their comparator and thus dominant. Figure 2 illustrates the distribution of incremental cost per LY/QALY/DALY ratios. A large number of interventions (n = 91, 37%) reported ICERS that were less than A$10,000 per LY/ QALY/DALY, (including the 8% that were dominant). One hundred and forty-six interventions (60%) reported ICERs that were less than A$25,000 per LY/QALY/DALY. A further 41 interventions (17%) were reported with an incremental cost of greater than A$100,000 per LY/QALY/ DALY (including the 5% that were dominated). Table 3 presents the median, 25 th and 75 th percentile costeffectiveness ratio of each category and reports statistical significance. Statistically significantly higher median incremental cost-effectiveness ratios (ICERs) (performed worse) were found for interventions targeted at children/ youth compared to adults, for medical interventions compared with lifestyle interventions, vaccinations compared to all other modalities, evaluations where downstream cost impacts were not included. In relation to quality variables, the small number of evaluations that did not use an appropriate comparator and did not meet minimum standards of quality performed better. Evaluations based on strong quality evidence (strength of evidence) with regards to treatment effect were associated with similar median cost-effectiveness estimates as evaluations with limited quality evidence. Those that were associated with statistically significantly lower median ICERs, included the following: • non-medical interventions (allied health community, media, education) compared to medical (physician consult, pharmaceutical, in-patient, vaccinations), • treatment interventions compared to diagnosis/screening/prevention, • interventions where the individual was able to reduce their own risk of disease or injury, • interventions where the condition was cause by patients' own behaviour, and • interventions that were partially funded (some government subsidy but not to meet all clinical need) rather than fully or not funded all.
Figures 3, 4 and 5 illustrate results for the variables modality, objective and type of disease (DRG). Diagnostic tests were associated with higher cost-effectiveness ratios and greater variation than were screening, treatment and prevention. Because there were small numbers of interventions in some DRG groups, we have reported on the 6 DRG groups containing a sufficient number of interventions for meaningful between-group comparisons. The cost-effectiveness ratios varied across DRG groups with the 'alcohol and drug use' and 'metabolic disease' categories having relatively little variation around a particularly low median cost-effectiveness ratio, the 'mental disease/ disorder' group having the highest median cost-effectiveness ratio and the 'musculoskeletal' and 'infection groups' having the most variability. Examining modality; pharmaceuticals and vaccinations had higher and more varied cost-effectiveness ratios than other modalities, whilst allied health interventions and inpatient care had the lowest median cost-effectiveness ratios.
That said, the extent of variation in the data is such that there were examples of highly cost-effective and cost-ineffective care within most categories.
Description of study flow Figure 1 Description of study flow.

Exploring determinants of cost effectiveness
Linear regression analysis using the enter method was undertaken to identify variables from those listed in Table  2 that might explain variation in the cost per LY/QALY/ DALY estimates. The intra-cluster correlation coefficient for cost per LY/QALY/DALY (ICC = 0.332, 95%CI: 0.17, 0.49) suggested that some adjustment should be made for clustering by paper in this analysis. Table 4  Interpretation of the parameter estimates is straightforward. Pharmaceuticals (compared to non-pharmaceuticals) and interventions primarily benefiting persons aged between 25 and 65 years would generally have a lower cost per LY/QALY/DALY than an intervention benefiting older or younger age groups. The quality of evaluation also made a significant contribution to the regression such that a failure to conduct sensitivity analysis was associated with a lower cost per LY/QALY/DALY ratio. Interventions targeting persons able to reduce their own risk of death/ disease and interventions that are partially funded out of patient contributions would also generally have a higher cost per LY/QALY/DALY than otherwise. It is, however, important to note that the regression explains only a small proportion of the overall variance in cost per LY/QALY/ DALY group.
We also undertook an ordered logit regression to identify variables from those listed in Table 2     ysis. Table 5 summarises parameter estimates, model fit and individual significance of included variables from the Huber/White sandwich estimator. TARGET (general population = 1 versus specific = 0), DISEASE STAGE (limit disability after harm has occurred = 1 versus avert, slow or halt disease or injury = 0), CAUSED BY (patient's own behaviour contributed to condition = 1 verus not = 0) DOWNSTREAM (downstream costs/savings = 1 included versus not = 0), NOT FUNDED (not funded = 1 versus fully or partially funded = 0) and Q-OVERALL (adequate comparator, costs and sensitivity analysis = 1 versus not = 0) were significant predictors but explained just 8.1% of variance in economic performance as expressed in terms of cost per LY/QALY/DALY group. CAUSED BY was the most important independent variable based on the size of the beta coefficient (1.4, P < 0.001).

Discussion & conclusion
Through this study, data are now available on the economic performance, expressed in Australian costs, of a wide range of interventions that address different health problems, using alternative modalities and intervening at various stages in disease development. The identification of a large number of interventions (37%) reported at less than A$10,000 per LY/QALY/DALY (including 8% that were dominant), which is below any putative funding threshold is important in itself. It raises issues about the relationship between cost effectiveness and funding decisions and the appropriateness of current funding thresholds. These matters are explored elsewhere [5].
We identified some interesting findings by category, for example that interventions targeted at children were generally less cost-effective than those targeting adults. This is perhaps not surprisingly, especially in relation to chronic disease prevention where benefits are typically delayed at least into middle age. Similarly, 'population approaches' were not found to be more cost effective than more targeted approaches, which may reflect very large differences in effectiveness. It would be interesting to explore the especially good and especially poor performance of some classes of intervention; such as the poor performance of diagnostics and vaccinations or the favourable performance of allied health and lifestyle interventions and those addressing diabetes and drug/alcohol abuse. That said category averages should be interpreted with care due to the identified wide variation in cost effectiveness with no 'magic bullet' answers to resource allocation. In terms of policy decision it would be best to assess each potential intervention on its own merits rather than rely on broad generalisations [6][7][8][9][10].
We also note that this is the first review of publicly available Australian economic evaluations, which provides valuable information to guide policy and research, but also highlights the continued need for improvement in quality of economic evaluation and transparency. This type of exercise, summarising the cost effectiveness of different interventions and subgroups has been proposed as a useful priority setting task [11], with precedents in the United States [12,13]. This review, in summarising all the published Australian economic evaluations also provides a platform for investigating where evaluations have been targeted and what this says about implicit priorities. It also allows an exploration of the distribution of cost-effectiveness ratios relative to funding thresholds and an analysis of the quality of evaluations. From this work we can for instance map the areas subject to economic evaluation in Australia against the existing burden of disease, and  assess the scope of coverage of modalities and delivery settings to check for alignment of research priorities. In order to achieve system wide allocative efficiency in health care, information is required across a broad range of interventions, considering target diseases, age groups, disease stage, modality and delivery settings.
The limitations of this review include a reliance on publicly available evaluation reports. While it is possible that some studies were missed through our original search focus on Medline, a later search of the HEED (NHS Economic Evaluation Database, Cochrane Library) database using the same search terms identified no additional studies.
With regards to quality, this review has inherited the quality of the original work, which we have attempted to describe. Interestingly, the pattern of cost effectiveness of interventions where evaluations were based on limited non-RCT evidence did not differ from those based on stronger RCT evidence. There is no reason to presume that potential biases will systematically impact on cost-effectiveness results.
A significant limitation of this work is that the economic evaluation methods varied significantly between interventions thus impacting on the comparisons made. This is illustrated in identified differences in discounting, perspective, time horizons, choice of comparators and strength of underlying evidence. The strength of this work therefore lies in the rich description of existing evaluations. Ideally all outcome measures would be identical to assist with comparisons. However, we would contend that there is enough common ground between the outcome measures QALY, DALY and LY for cost-effectiveness ratios to be sensibly compared. Evaluations reporting cost per LYs gained may have generally focused on length of life because quality of life was not expected to vary greatly relative to the impact on mortality. Despite differences in the concept of 'health' underlying adjustments for morbidity using the QALY or the DALY, these do include both mortality and morbidity effects. However, we acknowledge that this is a potential source of error. We were limited in that study resources only permitted one person to perform data extraction of variables. This is unlikely to have lead to bias against single interventions or group of interventions, but may have involved a particular interpretation of variables extracted across all studies.
The list of interventions and associated cost-effectiveness ratios is reported [See Additional file 2](the authors would be pleased to provide a copy of the full database on request). However, the use of these cost-effectiveness results as a strict league table was not the intended purpose of this exercise; rather this work was intended as a broader information resource for research and policy. The review is not a complete priority setting tool as it does not include all potentially important interventions and in that context, methodological differences between studies that we have drawn on are important.

Relation to previous research
The cost effectiveness of Australian Pharmaceuticals has been previously reported in a review of PBAC (Pharmaceutical Benefits Advisory Council) decision making from 1991 to 1996 [1]. Twenty-six submissions were analysed with a median cost per LY of A$43,550 ($1998/1999) which is higher than the median estimate for pharmaceu-   ticals reported here of $A22,000 ($2005). The interventions were all drawn from submission by the pharmaceutical companies to the PBAC. Companies have a vested interest in these evaluations which are used both to inform whether a drug will be listed on the PBS (Pharmaceutical Benefits Schedule) and subsidized by government and the approved price. This creates an incentive to report a cost/LY just below the apparent funding threshold, which on the basis of funding decisions would seem to lie within the range of $A40,00 to $A70,000/LY or/ QALY [1]. Our sample of pharmaceuticals was also larger than the previous sample. However, it is also true that the cost-effectiveness profile will depend on the actual list of interventions included, with individual results also impacted by any 'agendas' of the researchers. For this reason our review was limited to reports in the public domain.
In the US a large scale review exercise was undertaken of 500 life saving interventions across the areas of health, transport and environment [14]. Tengs  A more recent review was conducted of cost-utility analyses in the United States (494 studies and 1433 cost-effectiveness ratios) [12,13,15]. The results of this review are also comparable with a median ICER of US$20,133 (with dollar estimates taken from studies covering 1976 to 2001 -unadjusted and non-standardised).
Published cost-effectiveness results may reflect the research interests or priorities of researchers or industry, the visibility of certain diseases, the strength of advocacy and industry backing rather than the health needs of society [13]. The results of this review identify implicit priorities. Knowing where economic evaluations have been focused in the past, it would be useful to determine where cost-effectiveness efforts in Australia are likely to yield the greatest benefit. A more coordinated approach to health economic evaluation may lead to a better coverage of the priority health areas and important interventions and could also be used to encourage greater consistency in results, aiding comparability.

Implications for further research
This exercise should be repeated for other countries, as findings are likely to vary according to the delivery arrangements and costing structure of different health systems. There is the opportunity using datasets such as this for a more in depth analysis of the quality of economic evaluation, which could be used to inform evaluator training and guide methodological advances. It would also be possible to compare the quality of evaluations over time to assess improvements.
Another application of this work is to explore the extent to which economic evaluation informs policy making. Our recent extension to this work [5] addresses some of the issues concerning the funding of interventions including an exploration of the characteristics of interventions that are related to a higher chance of funding at particular costeffectiveness thresholds. This provides evidence of the apparent success of current priority setting arrangements in guiding the health sector towards a more efficient allocation of resources across modalities and across diseasestage [16].