### Quantile regression

QR was used to explore how the expenditure/outcome relationship differs across 151 local commissioners of NHS health care in England, focussing on differences between commissioners with low and high mortality rates in different clinical areas (the periods and variables considered are detailed in the "Data" section).

In classical linear regression, the estimated covariate effects are the same across the data distribution. QR provides a more complete picture of covariate effects by estimating a family of conditional quantile functions [12]. It can estimate a point at any part of the distribution without splitting the sample into different groups. Different quantiles are obtained by minimising a sum of asymmetrically weighted absolute residuals, with the median (0.5 quantile or 50th percentile) obtained by minimising the unweighted sum of absolute residuals. Other quantiles use asymmetric weights. For example, the 0.75 quantile (75th percentile) leaves ¾ of observations below and ¼ above it. The point estimate of this conditional quantile is obtained by minimising the sum of absolute residuals while penalising underpredictions more than overpredictions, where the weight assigned to underpredictions (0.75) equals the quantile and the weight assigned to overpredictions is 0.25. A classical linear regression model has a single slope coefficient, which in this case represents the percentage impact on mortality of marginal changes in expenditure, hereafter referred to as *mortality elasticity*. QR estimates a separate slope coefficient, and therefore a different mortality elasticity estimate, for each quantile by applying different weights at different points of the outcome distribution.
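The asymmetric weighting can be illustrated with a minimal sketch. The function names and the seven sample values are hypothetical and are not taken from the study data, and the sketch covers only the unconditional case (no covariates).

```python
def pinball_loss(residual, tau):
    """Asymmetric absolute loss: weight tau when the observation lies above
    the prediction (underprediction), 1 - tau when it lies below."""
    return tau * residual if residual >= 0 else (tau - 1.0) * residual


def sample_quantile_by_loss(values, tau):
    """Return the observed value that minimises the summed pinball loss."""
    return min(values, key=lambda q: sum(pinball_loss(v - q, tau) for v in values))


# Hypothetical outcome values, for illustration only.
values = [3, 5, 8, 9, 12, 15, 21]

median = sample_quantile_by_loss(values, 0.5)   # symmetric weights
q75 = sample_quantile_by_loss(values, 0.75)     # underpredictions weigh 0.75
print(median, q75)  # prints "9 15"
```

With weights of 0.75 and 0.25 the minimiser moves from the middle observation (9) to the sixth of the seven ordered values (15), which leaves five observations below it and one above.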

For comparability with the estimates published by Lomas et al. [4], we used the same specification for the QR model as their linear regression model: an outcome function linking mortality and expenditure in a clinical area, including covariates correlated with expenditure and mortality, such as different health needs due to demographic composition and socioeconomic factors. The estimator is:

$${Q}_{h}\left({\tau }_{i}|{n}_{ij}, {x}_{ij}\right)={\alpha }_{j}\left({\tau }_{i}\right)+{\beta }_{j}\left({\tau }_{i}\right){x}_{ij}+{\gamma }_{j}\left({\tau }_{i}\right){n}_{ij}+{w}_{ij} \tag{1}$$

where \({Q}_{h}\left({\tau }_{i}|{n}_{ij}, {x}_{ij}\right)\) is the \(\tau\)th quantile of mortality rate *h* for each commissioner *i* in each clinical area *j*, conditional on health care need \(n_{ij}\) and local health expenditure per head \(x_{ij}\). The random error \(w_{ij}\) is allowed to be correlated with \(x_{ij}\) so that endogeneity of health expenditure can be accommodated. The effect of expenditure on mortality is measured by \(\beta\), which can be assessed at any point \(\tau\) of the mortality distribution in the range (0, 1). Edney et al. have estimated a similar QR model of mortality reduction for Australia [13].
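To make the idea of a quantile-specific slope \(\beta(\tau)\) concrete, the following is a minimal sketch for a single regressor. It is not the estimator used in this study: it omits the need covariate and any treatment of endogeneity, the function names are hypothetical, and the data are a toy construction. It relies on the fact that a two-parameter quantile regression has an optimum passing through at least two observations, so an exhaustive search over pairs of points is exact for small samples.

```python
from itertools import combinations


def check_loss(residual, tau):
    """Weight tau on positive residuals and (1 - tau) on negative ones."""
    return residual * (tau - (1.0 if residual < 0 else 0.0))


def fit_quantile_line(x, y, tau):
    """Fit y = alpha + beta * x at quantile tau by exhaustive search over
    lines through pairs of observations (feasible for small samples only)."""
    best = None
    for i, j in combinations(range(len(x)), 2):
        if x[i] == x[j]:
            continue  # a vertical line is not a valid candidate
        beta = (y[j] - y[i]) / (x[j] - x[i])
        alpha = y[i] - beta * x[i]
        loss = sum(check_loss(yk - (alpha + beta * xk), tau)
                   for xk, yk in zip(x, y))
        if best is None or loss < best[0]:
            best = (loss, alpha, beta)
    return best[1], best[2]


# Toy data (hypothetical): two outcome values at each expenditure level.
spend = [1, 2, 3, 4, 5]
x = spend + spend
y = [10 - 3 * s for s in spend] + [10 - s for s in spend]

alpha_25, beta_25 = fit_quantile_line(x, y, 0.25)
alpha_75, beta_75 = fit_quantile_line(x, y, 0.75)
print(beta_25, beta_75)  # prints "-3.0 -1.0"
```

In this toy example the fitted slope is steeper at the 0.25 quantile than at the 0.75 quantile, which is the kind of variation across \(\tau\) that a single least-squares slope would average over.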

There are 151 quantiles for each clinical area equation, each of which, \(\tau_{i}\), represents a different local commissioner ranked according to Standardised Years of Life Lost Rate (SYLLR). Within each clinical area, the first quantile is therefore 1/151, for the commissioner with the largest SYLLR, and the last is 151/151 = 1, for the commissioner with the smallest SYLLR.
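The mapping from ranked commissioners to quantile positions can be sketched as follows. The commissioner labels and SYLLR values are hypothetical, only four units are used instead of 151, and ties are not handled.

```python
def quantile_positions(syllr):
    """Map each commissioner to tau = rank / N, where rank 1 is the
    commissioner with the largest SYLLR and rank N the smallest."""
    ranked = sorted(syllr, key=syllr.get, reverse=True)
    n = len(ranked)
    return {name: (position + 1) / n for position, name in enumerate(ranked)}


# Hypothetical SYLLR values for four commissioners.
syllr = {"W": 42.0, "X": 55.5, "Y": 38.2, "Z": 61.3}
taus = quantile_positions(syllr)
print(taus)  # prints "{'Z': 0.25, 'X': 0.5, 'W': 0.75, 'Y': 1.0}"
```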

QR produces estimates that are more robust than those of classical linear regression in the presence of non-normally distributed errors and outliers, and conditional quantiles are preserved under monotone transformations of the outcome, such as the logarithm.

Our method accounts for the potential endogeneity of expenditure by commissioner, which may result, for example, from areas with poorer health outcomes receiving more funding; health expenditures per person are adjusted according to population needs measured by the “unified weighted population index” [14]. Our Instrumental Variables (IVs) are socioeconomic variables justifiable on an empirical basis and related to IVs proposed on theoretical grounds, for example as part of the funding rule used to allocate health budgets across local authorities [15, 16]. The exogeneity and validity tests of the IVs were performed using Generalized Method of Moments (GMM) estimation, the results of which are given in Additional file 2 Supplementary Material: Tables S2 to S7. However, GMM estimates are sensitive to the number of near-redundant instruments, which have been shown to produce a finite-sample bias towards underestimation of the mortality elasticity in a similar sample of English NHS local commissioners [15]. We used GMM estimation in four models, with two IVs in two models and three and four IVs in the other two, so the test of overidentifying restrictions and the GMM estimate of the elasticity are unlikely to be affected by redundant instruments.

Where the exogeneity hypothesis was not rejected, we estimated the conditional mean of the mortality distribution by Ordinary Least Squares (OLS) and the mortality elasticity at different quantiles using a simple QR model. Where there was evidence of endogeneity, the QR model accounted for it. The method is a two-step generated regressors approach, proposed in the context of QR for recursive structural equation models by Ma and Koenker [17] and recently extended by Chen et al. [18]. As a robustness check, we also tested the control function approach, a similar two-stage method proposed by Chernozhukov et al. [19], in one PBC (Cancer); it produced almost identical mortality elasticities at different quantiles.

The first of the two stages is the IV two-stage least squares (2SLS) estimator. The second is a system of QR models in which the joint variance and covariance of the system are estimated by bootstrap; this adjusts for the measurement error in generated regressors and improves the robustness of inference in small samples.

### Data envelopment analysis

DEA is a linear programming-based method that establishes a best-practice production frontier in which each production unit’s efficiency can be judged against the performance of similar units [20]. In this case, the production units are local commissioners. DEA does not assume a specific functional form for the production function that underlies the frontier and allows analysis of multiple inputs and outputs. It therefore allows us to include more than one health outcome in addition to or replacing mortality in analysing the relationship between expenditure and health outcomes. The estimate of the relative efficiency of each commissioner is in effect the potential that they have to change expenditure in a clinical area without affecting health outcomes. ‘Input oriented’ DEA allows us to observe how much the inputs (in our case healthcare expenditures) of less efficient commissioners could in principle be decreased without affecting outcomes. The opportunity cost of funding a new health technology in terms of health outcomes will be lower if it is possible to release funds by improving the efficiency with which existing services are provided.

DEA constructs a measure of technical efficiency based on the distance between composite inputs and composite outputs. It identifies the most efficient commissioners, those that achieve the highest level of health outcomes at a given expenditure, which form the production frontier. An efficiency score is obtained for each commissioner: a score of 1 indicates full efficiency, and a score below 1 indicates that the commissioner operates at less than best-practice efficiency, below the frontier.

Figure 1 illustrates efficiency scores and the possible decrease in expenditure that a commissioner could achieve without affecting outcomes in a particular clinical area. A and B represent efficient commissioners who would reduce health outcomes if they spent less; C and D represent inefficient commissioners who could reorganise their production of health to achieve the same outcomes with lower expenditure, that is, without incurring opportunity costs. The expenditure reduction by commissioner D (∆*) would improve efficiency without affecting health outcomes. The ratio of ∆* to Ω** shows the proportion of current expenditure that could be cut without affecting outcomes.
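The logic of an input-oriented efficiency score can be sketched for the simplest possible case of one input (expenditure) and one output (a health outcome). This is an illustration under stated assumptions, not the model estimated here: the analysis in this paper allows multiple inputs and outputs, which requires a linear programming solver, and the choice between CRS and VRS is guided by a formal test. The sketch assumes VRS, the function name is hypothetical, and the labels A to D merely echo Figure 1; the numbers are invented.

```python
def vrs_input_efficiency(units):
    """Input-oriented efficiency under variable returns to scale for one
    input and one output. `units` maps a name to (expenditure, outcome).

    For each unit, the frontier expenditure is the cheapest way of reaching
    at least its outcome using one peer or a convex mix of two peers.
    """
    points = list(units.values())
    scores = {}
    for name, (x0, y0) in units.items():
        # Single peers (including the unit itself) achieving at least y0.
        candidates = [x for x, y in points if y >= y0]
        # Mixes of a lower-outcome and a higher-outcome peer that hit y0 exactly.
        for x_low, y_low in points:
            for x_high, y_high in points:
                if y_low < y0 < y_high:
                    w = (y_high - y0) / (y_high - y_low)  # weight on low peer
                    candidates.append(w * x_low + (1 - w) * x_high)
        scores[name] = min(candidates) / x0
    return scores


# Hypothetical (expenditure, outcome) pairs.
units = {"A": (10, 4), "B": (20, 8), "C": (15, 4), "D": (25, 6)}
scores = vrs_input_efficiency(units)

# Expenditure that could in principle be released without lowering outcomes.
releasable = {name: units[name][0] * (1 - score) for name, score in scores.items()}
```

In this toy example A and B score 1 and form the frontier, while D scores 0.6: a mix of A and B achieves D's outcome with an expenditure of 15 rather than 25, so 40% of D's expenditure is releasable in this sketch.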

If there are economies of scale in health production, then the size of the unit will influence efficiency. If present, this needs to be adjusted for, in order to focus on technical efficiency and inefficiency. We used Simar and Wilson's returns-to-scale test for input-oriented DEA to guide the choice of model, which tests a constant returns to scale (CRS) assumption against the alternative of variable returns to scale (VRS), using the ratio of means [21] and the mean of ratios less one [22]. The Kruskal–Wallis rank test examined frontier shifts between CRS and VRS.

DEA efficiency scores are sensitive to outliers. The Bogetoft and Otto test [23] was applied to identify outliers. Commissioners with a test statistic below 0.975 were considered outliers and excluded from the estimation.

As noted above, DEA may include environmental variables (EVs), which are exogenous factors that affect outputs (health outcomes) but are not under the control of the commissioners. Commonly used methods for handling EVs in DEA have two problems: prior assumptions about the direction of the effects are needed, and estimated efficiency scores cannot be directly linked to the efficiency frontier. We avoid these problems by following the Fried et al. three-stage procedure [24]. First, DEA is applied to health outcomes and inputs only, to identify outliers and obtain initial measures of commissioners’ performance. Secondly, stochastic frontier analysis (SFA) [20] is used to regress first-stage performance measures against selected EVs. This provides, for each input, a three-way decomposition of performance variation into that attributable to EV effects, inefficiency and statistical noise. Thirdly, inputs are adjusted to account for the impact of the EV effects and the statistical noise uncovered in the second stage, and DEA is used to re-evaluate commissioners’ efficiency.
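One common way of writing the third-stage input adjustment, following the general form of the Fried et al. procedure, is sketched below. The notation is introduced here for illustration and is an assumption rather than the specification reported in this paper: \(z_{i}\) denotes the EV vector of commissioner *i*, \(\hat{\beta}_{j}\) the second-stage EV coefficients for the input in clinical area *j*, \(\hat{v}_{ij}\) the estimated statistical noise, and \(x_{ij}^{A}\) the adjusted input.

$$x_{ij}^{A}={x}_{ij}+\left[\underset{i}{\max }\left\{{z}_{i}{\hat{\beta }}_{j}\right\}-{z}_{i}{\hat{\beta }}_{j}\right]+\left[\underset{i}{\max }\left\{{\hat{v}}_{ij}\right\}-{\hat{v}}_{ij}\right]$$

Under this form, the inputs of commissioners operating in relatively favourable environments, or benefiting from relatively favourable noise, are adjusted upwards, so that all commissioners are re-evaluated as if they faced the least favourable conditions observed in the sample.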

We apply three exclusion criteria for outcome variables: more than 20% of data missing; intermediate rather than final outcome; and the outcome is less important for the estimation of the efficiency scores, according to a Kolmogorov–Smirnov test, than a second outcome with which it is highly correlated (R ≥ 0.5).

### Comparing DEA and QR

DEA and QR explore health system efficiency from different perspectives. QR estimates parametrically the mortality elasticity assuming an underlying production function, while DEA estimates non-parametric efficiency scores for all units at or outside the production function. We used two comparison methods to assess the consistency of QR and DEA findings and what they tell us in combination about the efficiency of expenditure at the margin: Spearman rank correlation to represent the sign of the pairwise correlation between the efficiency score and the absolute value of the mortality elasticity; and a t-test comparing the mean mortality elasticities between efficient and inefficient commissioners.
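The first comparison can be sketched as follows: a Spearman rank correlation computed as the Pearson correlation of average ranks. The function names and the four pairs of values are hypothetical and are not results from this study.

```python
def average_ranks(values):
    """Return 1-based ranks, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1  # average rank of the tied block
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1
    return ranks


def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / (var_a * var_b) ** 0.5


def spearman(a, b):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(a), average_ranks(b))


# Hypothetical DEA efficiency scores and absolute mortality elasticities.
efficiency = [1.0, 0.9, 0.8, 0.6]
abs_elasticity = [0.2, 0.5, 0.7, 1.1]
rho = spearman(efficiency, abs_elasticity)
print(rho)  # prints "-1.0"
```

A negative coefficient, as in this invented example, would indicate that less efficient commissioners tend to have larger absolute mortality elasticities; only the sign and rank ordering matter, not the magnitudes of the underlying values.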