As a starting point for discussion, note that the resource-allocation question depends on how the underlying measurements are made. With a fixed budget, if all one desires is a prioritized rank order of investments, then ordinal scales (placing choices in the desired order) or interval scales (such as Fahrenheit and Celsius temperatures) suffice. Many forms of multi-criteria decision analysis produce interval scales. In this setting, investments are made in rank order until the fixed budget is exhausted.
A more refined approach might allocate a fixed investment budget across scalable investments by choosing how large or small each potential project might be to maximize the overall value of the investments. Assigning appropriate budget shares to each potential investment option requires that the multi-attribute model provide a ratio scale, not just an interval scale. Finally, if one wishes to have a clear decision about whether or not to invest—analogous to the outcome of cost–benefit analysis—then benefits and costs must be measured in the same monetary units (e.g., dollars, euros, yuan or rupees). Since multi-criteria models do not automatically convert benefit measures to monetary units, and often only determine ordinal priority ranks, they do not yet provide generalizable resource allocation rules.
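The scale distinction matters in practice: interval scales are unique only up to an affine transformation, so ratios of interval-scale values carry no meaning. A minimal sketch illustrates the point, using temperature (the classic interval-scale example) as a stand-in for hypothetical benefit scores:

```python
# Interval scales are preserved only up to an affine transformation,
# so ratios of scale values change with the (arbitrary) zero point.
# Celsius/Fahrenheit stand in here for hypothetical benefit scores.

def c_to_f(c):
    """Affine rescaling, analogous to re-anchoring an interval scale."""
    return c * 9 / 5 + 32

a, b = 10.0, 20.0                       # two scores on the original scale
ratio_original = b / a                  # 2.0: apparently "twice as good"
ratio_rescaled = c_to_f(b) / c_to_f(a)  # 68/50 = 1.36: the "ratio" vanished
```

Because budget shares are themselves ratios of a total, they are well defined only when benefit scores sit on a ratio scale, one with a true zero that survives rescaling.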
In some settings, resource allocation does not enter the picture. For example, an individual health care patient choosing among insured treatment alternatives need not consider resource costs, but rather selects among the options provided by the patient’s health plan. But in most real decisions—such as insurance coverage decisions and health technology assessment—budget constraints invariably enter the picture. In our own recent work focused on systems analysis for prioritizing new vaccine research and development, the issue of choosing investments within a fixed budget—or determining the level of such a budget—was not a design factor [15, 20, 21]. But in some settings, more guidance about resource allocation is needed.
The recent ISPOR task force assessing multi-criteria decision analysis models discussed this challenge, summarizing previous work on the topic and offering three alternatives (none of which they deemed wholly satisfactory), and urged further research on the issue [19]. The approaches they identified from that literature survey included the following:
Directly include costs
One approach directly includes cost as an attribute in the multi-criteria analysis—lower costs being better—with a user-assigned weight within the model. This is equivalent to asking the decision maker (when establishing the weights) to evaluate willingness to pay for the benefits. But as the ISPOR task force report notes, “stakeholders do not have the knowledge to estimate the benefits that would have to be forgone to fund an alternative. Instead, this would require the forgone alternatives to be identified and evaluated using the same” multi-criteria framework.
Score comparable interventions
The scoring approach seeks to find existing interventions that might be eliminated to free up funds for the new investment, hence identifying the “opportunity cost” of the new intervention. The scores for the candidates are generated by the multi-criteria model, providing a comparative view of their performance. But this approach contains circular logic: how does one know which programs to disinvest in until all programs have been evaluated on the same multi-criteria metric? Interventions that seem candidates for elimination under cost-effectiveness analysis may look very good in a multi-criteria model, and vice versa. So selecting a group of comparable interventions is an incomplete and defective approach. Rankings of investment priorities can shift markedly when criteria beyond cost-effectiveness—such as public fear during a disease outbreak—enter the analysis [10].
Modified cost–benefit calculations
This approach omits “cost” from the multi-criteria model, evaluates each of the options, and then calculates a cost–benefit ratio, similar to an incremental cost-effectiveness ratio. The only difference is that here the “benefit” metric has multiple dimensions—unlike the unidimensional QALY, DALY, or similar health benefit measure in cost-effectiveness analysis. Multi-attribute models create an index specific to each decision maker’s preferences, and thus such indexes are not comparable with one another. Hence, unless all multi-attribute evaluations use the same ratio-scale measurement, comparing cost–benefit ratios is impossible. Yet forcing all measurements into a single multi-attribute framework defeats its very purpose—that is, allowing different stakeholders to specify their own preference functions.
In a similar fashion, one recent analysis recommends calculating the ratio of multi-criteria value scores to cost [22]. This approach sorts the available choices from most favorable to least favorable and then funds them in order until the investment budget is exhausted. Unfortunately, this approach offers no advice on the proper size of the investment budget—it is exogenous in their analysis. Nor does it allow for the possibility that at least some of the investments are scalable, which would introduce further investment options beyond those originally considered. The rule is akin to calculating QALYs per unit of cost (the inverse of the usual metric) and investing in the most favorable options until the budget is exhausted. It provides neither a cutoff rule—as has become common in cost-effectiveness analysis—nor a budget-setting mechanism; so while it places organizations on the efficient frontier, it does not identify the best point on that frontier. Economists call this technical or “X-efficiency,” but it is an incomplete measure of overall efficiency, since it ignores “allocative efficiency” [23].
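The ranking rule just described can be sketched in a few lines. The interventions, value scores, and budget below are hypothetical illustrations; note that the budget arrives as an exogenous input, exactly the limitation noted above.

```python
# Sketch of the value-score-per-cost ranking rule [22]:
# sort options by descending value/cost, then fund whole (non-scalable)
# options in order until the exogenously given budget is exhausted.

def allocate_by_ratio(options, budget):
    """Greedily fund options by descending value/cost ratio."""
    ranked = sorted(options, key=lambda o: o["value"] / o["cost"], reverse=True)
    funded, remaining = [], budget
    for opt in ranked:
        if opt["cost"] <= remaining:
            funded.append(opt["name"])
            remaining -= opt["cost"]
    return funded, remaining

# Hypothetical interventions with multi-criteria value scores and costs.
options = [
    {"name": "vaccine A",   "value": 80, "cost": 40},   # ratio 2.00
    {"name": "screening B", "value": 55, "cost": 20},   # ratio 2.75
    {"name": "therapy C",   "value": 90, "cost": 60},   # ratio 1.50
]
funded, left = allocate_by_ratio(options, budget=70)
# funded == ["screening B", "vaccine A"]; therapy C is skipped, 10 unspent
```

The sketch makes the critique concrete: nothing inside the procedure tells the decision maker whether a budget of 70 is too large or too small, and the all-or-nothing funding of each option ignores the possibility of scaling a project up or down.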
In real-world settings, budgets do not simply descend from on high: somebody has to work through and determine the desired level of investment. Thus something further is needed to guide such decisions. To make headway, we need an approach that guides either the choice of cutoff values or the setting of an optimal budget. The two tasks cannot be done independently—one implies the other.
League tables
One could also construct a league table of multi-attribute evaluations of various interventions to provide guidance, just as early cost-effectiveness analysis used league tables to guide resource allocation before analysts became comfortable choosing a cutoff value. Earlier use of league tables for cost-effectiveness rested on a strong assumption, namely that previous decisions about health care interventions were made with implicit cost-effectiveness tradeoffs in mind. But if those league tables also reflected (even informally) the value of other attributes, then they could overstate the willingness to pay for a QALY. This approach is logically equivalent to one of the approaches evaluated by the ISPOR task force and contains the same defect: one does not know ex ante which technologies appropriately belong in the league table and which are simply out of bounds.
Willingness to pay and accept
As a proxy for resource allocation, one could also survey measures of willingness to pay (WTP) and willingness to accept (WTA), as is common, for example, in environmental policy [24]. However, behavioral scientists cast serious doubt on the validity of such approaches, arguing that the responses may reflect attitudes but do not represent true willingness to pay [25]. A prominent concern is framing, where responses depend on the way the question is posed. The distinction often hinges on WTP (as in “how much would you be willing to pay to avoid unpleasant situation X?”) versus WTA (as in “how much would you need to be paid before you would accept unpleasant situation X?”). Answers differ greatly depending on this framing: whether or not the respondent currently owns the good, and whether or not it is available in a market.
A review comparing numerous WTA and WTP studies in the same economic areas (e.g., health, environment) concluded that WTA values regularly exceed WTP values, with the gap largest for non-market goods. The authors concluded that the less a good resembles an ordinary market good—that is, the less readily it can be bought and sold—the higher the ratio [26]. They reported a typical WTA/WTP ratio of 7.2 across a number of subject areas (including health, environment, water resources, and others). A more recent review found a WTA/WTP ratio of 5.1 for goods involving health and safety [27]. These results appear to show that WTA studies—such as those involving wage premiums for risky occupations—severely overstate the more desirable measure, willingness to pay. If so, then relying on these WTA measures instead of agreed-upon cost-per-QALY measures would be inappropriate.
Since “health” is perhaps the quintessential non-market good, one might expect the WTA/WTP ratios for the value of life and life years to be significantly higher than for typical market goods. However, some proponents of using labor-force studies (wage differentials for risky occupations) to measure the value of life argue that, properly interpreted, the gap is not nearly so large as this literature suggests, thus seeking to restore confidence in the large value-of-life measures found in the health and safety literature [28].