Comparing the monetary value of a quality-adjusted life year from the payment card and the open-ended format

Background: The payment card (PC) format and the open-ended (OE) format are common methods for eliciting willingness-to-pay (WTP) for one additional quality-adjusted life year (QALY). The aim of this research is to compare these two formats in eliciting the monetary value of a QALY.
Methods: A contingent valuation survey was carried out using a pre-designed questionnaire with various hypothetical scenarios. The difference between the PC and the OE formats was evaluated by a two-sample equality test. Furthermore, generalized linear models were estimated to control for observed heterogeneity and to test theoretical validity.
Results: In total, 461 individuals were involved, among whom 235 (51%) answered the PC question, while 226 (49%) answered the OE question. Excluding zero responses, the mean WTP values of the two formats varied dramatically across scenarios, ranging from 13,278 to 280,177 RMB for the PC and from 18,119 to 620,913 RMB for the OE. The OE format tended to elicit lower values for less serious conditions and higher values for more serious conditions. However, equality tests of the mean and median showed no significant difference between the two formats in any scenario. For both the OE and PC formats, most variables had a significant effect on WTP/QALY. Moreover, joint estimation indicated a statistically significant positive effect of the OE format. Further analysis demonstrated that the imbalanced distribution of zero responses caused the main difference between the two formats.
Conclusions: With the grouped data, WTP/QALY estimates from the PC and OE formats did not differ significantly, whereas the pooled data yielded significantly higher estimates for the OE format. Both formats were found to be valid. More research on the differences between, and the validity of, various WTP elicitation methods is recommended for a robust estimation of WTP/QALY.
Supplementary Information The online version contains supplementary material available at 10.1186/s12962-021-00298-0.

cost, indicating that whether a medicine is worth its cost depends on whether the health-related outcomes it produces exceed the health outcomes that could have been generated if some other medicine had been funded [3]. Demand-side methods, by contrast, are in line with the methods used in other public sectors and with a welfarist approach, in which the monetary value of one additional QALY is estimated as willingness-to-pay per QALY (WTP/Q) by contingent valuation (CV) surveys. It is believed that WTP/Q can help improve efficiency at the margin within the healthcare sector as well as between sectors [4]. CV is usually used to elicit monetary values of a nonmarket good or service [5] by asking participants to state their willingness-to-pay (WTP) for obtaining a good, in this context a QALY gain (always a small amount). In the last decade, numerous studies have estimated WTP/Q [6][7][8][9][10][11][12][13]. Typically, individuals have been asked about their WTP for health gains for which utility values were measured by EQ-5D population tariffs, Time-Trade-Off, Standard Gamble or Visual Analogue Scale.
However, great disparities exist in the type of health gain, respondents' characteristics, and survey methodology, all of which may influence the estimates of WTP/Q. Ryen et al. [14] included 24 studies and indicated that the WTP/Q value is significantly higher if the QALY gain comes from life extension rather than from quality of life improvements. By comparing two similar surveys, Bobinac and colleagues [8] stated that WTP/Q is higher when the health gain in the survey scenario is uncertain. However, the impact of different CV questionnaire formats has barely been investigated.
CV questionnaire format denotes the approach by which the respondent is required to provide their WTP, of which four classical techniques have been in use: iterative bidding, dichotomous choice, open-ended (OE) and payment card (PC) [15]. In this research, we focus on the latter two techniques, the OE and the PC format.
The PC technique was proposed by Mitchell [16] and first used in the general economics literature by Jones-Lee et al. [17]. Respondents are given a specific range of monetary values and asked to select the maximum value they would be willing to pay for a particular benefit. Because it imitates real-life decisions well by letting respondents ponder their WTP, the PC has become a prevalent method of eliciting WTP in health economics. The OE elicitation technique directly asks the respondent the maximum they would be willing to pay in a hypothetical scenario. Respondents are prone to anchoring on proposed values when the elicitation technique suggests them. Because the OE method does not suggest an answer, it can lead to a more precise and independent WTP value than other elicitation techniques [18]. It was further verified that the OE format is an effective technique if the final decision depends on a quantile instead of the mean [19].
There are several reasons why the PC and the OE methods were chosen for this research. First, these two methods have been used broadly in estimating WTP per QALY [7][8][9][10]. Moreover, both methods are easy to understand and require only a short interview time, which is important because respondent burden is a major concern given the complexity of the hypothetical scenarios used in estimating WTP/Q.
Given the popularity of the PC and the OE in health economics, and more specifically in estimating the monetary value of a QALY, a natural next step is a direct comparison of these two formats. Although no research has compared these two methods in estimating WTP/QALY, studies have examined the discrepancies between elicitation methods in other fields. A general finding is that for health-related goods, the OE format yields lower WTP values [20,21]. However, for environmental goods [22] or an ambulance helicopter service [23], relatively equal values were reported.
The aim of this research is straightforward: to compare the PC and the OE formats. First, we examined the difference between the WTP/QALY estimates from these two methods. Furthermore, we investigated the theoretical validity of each method to determine which method elicits a more valid monetary value of a QALY.

Study design and sample
We conducted a CV survey of the general Chinese population between June 1st, 2019 and August 10th, 2019. A relatively low response rate was observed in the pilot study of the probability sample survey. Hence, quota sampling was used in the final survey, with quotas based on sex, age, and income. Study participants were first recruited in person by trained interviewers, and we then interviewed those who satisfied the quotas. A questionnaire that measures maximum WTP per QALY for various hypothetical scenarios was used in this research. The survey was carried out by trained interviewers by telephone (via the mobile app "WeChat"). Five different health statuses were defined using five-level EuroQol five-dimensional questionnaire (EQ-5D-5L) descriptions [24,25], including three treatment settings and two end-of-life scenarios. More details will be discussed in the next section. All subjects were asked for their full consent to participate in the study and no financial incentives were offered.

For treatment scenarios, a hypothetical scenario with an EQ-5D-5L description (the health states listed in Table 1) was explained to participants. Without any treatment, they would live with the described health state for XX months. After XX months, they would fully recover. For each hypothetical health state, the WTP value was measured by the respondents' willingness to purchase the treatment.
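The scenario durations follow from the identity QALY = period (years) × utility, as spelled out in the note to Table 1. The sketch below is only an illustration of that arithmetic: the function names are hypothetical, and the utility value of 0.8 is an assumed number for demonstration, not a tariff value taken from the study.

```python
def treatment_period_months(qaly_gain, utility_before, utility_after=1.0):
    """Months lived in the described state so that a full cure yields `qaly_gain`.

    Implements: period = QALY gain / (utility after - utility before) * 12.
    `utility_after` defaults to 1.0 (perfect health), as in the treatment scenarios.
    """
    return qaly_gain / (utility_after - utility_before) * 12


def terminal_illness_period_months(qaly_gain, utility, baseline_months=3):
    """Implements: period = QALY gain / utility * 12 + 3 (the 3 baseline months)."""
    return qaly_gain / utility * 12 + baseline_months


def immediate_death_period_months(qaly_gain, utility):
    """Implements: period = QALY gain / utility * 12."""
    return qaly_gain / utility * 12


# Assumed utility, for illustration only (not a value reported in the paper).
assumed_utility = 0.8
for gain in (0.2, 0.4):
    months = immediate_death_period_months(gain, assumed_utility)
    print(f"QALY gain {gain}: {months:.1f} months at utility {assumed_utility}")
```

Keeping the three formulas as separate functions mirrors the three cases in the table note and makes it easy to check each scenario's duration against its stated QALY gain.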
We also specified the following conditions to each respondent to clarify the assumed situation: (a) the treatment was not reimbursed by public health insurance, and the full amount had to be paid beforehand; (b) loss of income due to the illness need not be considered (it is compensated by social security); and (c) payment for the treatment will influence the respondents' household.
The "terminal illness scenario" reflected the assumption that participants suffered from a terminal disease with 3 months in a severe health state (EQ-5D-5L description: 44332). A newly developed treatment could prolong life expectancy by 12 months (0.2 QALY) or 23 months (0.4 QALY) in that severe health state. For the "immediate death scenario", we assumed that, because of a fatal sickness, the respondents would die immediately. However, in this scenario we hypothesized that there was a treatment that could prolong life expectancy by 3 months (0.2 QALY) or 6 months (0.4 QALY) in health state 11115.

The WTP payment was defined as the amount of out-of-pocket expense to purchase an assumed intervention. Participants were asked whether they would pay for the treatment. Those who replied "No" were then asked to give their reasons. If the answer was "yes", the participant

Table 1 Scenarios of questionnaire
a Since QALY = the period of life length (year) * utility of health state, the period was calculated as follows.
For treatment scenarios, the period (month) = QALY gain / (utility of health state after treatment − utility of health state before treatment) * 12.
Health state after treatment is perfect health; hence, the period (month) = QALY gain / (1 − utility of health state before treatment) * 12.
For terminal illness and immediate death, the treatment can prolong life expectancy in the assumed health state, which should result in a 0.2 or 0.4 QALY gain. Hence, for terminal illness, the period (month) = QALY gain / utility of health state * 12 + 3. For immediate death, the period (month) = QALY gain / utility of health state * 12.

USD 14,753). We sent the payment card to respondents before the survey started, and those who agreed to pay for the assumed intervention were asked to choose their maximum WTP from the payment card.

Data analysis
Previous studies have applied two different methods of converting the data on WTP and QALY gains into WTP per QALY estimates, namely the aggregated method and the disaggregated method. The aggregated approach calculates the ratio by dividing the mean of WTP by the mean of QALY, whereas the disaggregated method estimates WTP/QALY for each individual and subsequently takes the mean of these individual values. The disaggregated method has been shown to be more appropriate, as it takes account of heterogeneity in preferences as well as the individual's marginal rate of substitution between health and money [26,27]. Hence, the disaggregated method was applied in this research.
Descriptive statistics (mean, SD, median, inter-quartile range, minimum, maximum) for the WTP values of the PC and the OE formats were computed. Zero responses in each format were compared and then excluded from further analysis. We compared the mean and the median WTP/QALY obtained from the two elicitation methods for the various scenarios using a two-sample equality test with bootstrapping.
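The difference between the two conversion methods can be seen in a short sketch. The data below are invented purely for illustration, and the function names are hypothetical; only the two definitions (ratio of means versus mean of individual ratios) come from the text.

```python
from statistics import mean


def aggregated_wtp_per_qaly(wtp, qaly):
    """Ratio of means: mean(WTP) / mean(QALY gain)."""
    return mean(wtp) / mean(qaly)


def disaggregated_wtp_per_qaly(wtp, qaly):
    """Mean of ratios: average of each respondent's own WTP / QALY gain."""
    return mean([w / q for w, q in zip(wtp, qaly)])


# Hypothetical responses (RMB) and the QALY gain each respondent valued.
wtp = [10_000, 20_000, 60_000]
qaly = [0.2, 0.2, 0.4]

print("aggregated:   ", round(aggregated_wtp_per_qaly(wtp, qaly)))
print("disaggregated:", round(disaggregated_wtp_per_qaly(wtp, qaly)))
```

The two estimates generally differ whenever individual WTP/QALY ratios vary across respondents, which is why the choice of method matters.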
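The text does not specify which bootstrap procedure was used for the equality test, so the following is only one plausible variant, shown as a sketch: both samples are shifted to share the pooled statistic (imposing the null hypothesis), resampled with replacement, and the share of resampled differences at least as large as the observed one is reported as a two-sided p-value. The function name, defaults and seed are assumptions.

```python
import random
from statistics import mean, median


def bootstrap_equality_test(a, b, stat=mean, n_boot=2000, seed=0):
    """Two-sample bootstrap test of equal `stat` (e.g. mean or median).

    Returns (observed difference, two-sided p-value).
    """
    rng = random.Random(seed)
    stat_a, stat_b = stat(a), stat(b)
    observed = stat_a - stat_b
    pooled = stat(list(a) + list(b))
    # Shift each sample so that both have the pooled statistic (null imposed).
    a0 = [x - stat_a + pooled for x in a]
    b0 = [x - stat_b + pooled for x in b]
    extreme = 0
    for _ in range(n_boot):
        resampled_a = rng.choices(a0, k=len(a0))
        resampled_b = rng.choices(b0, k=len(b0))
        if abs(stat(resampled_a) - stat(resampled_b)) >= abs(observed):
            extreme += 1
    # Add-one correction keeps the p-value strictly positive.
    return observed, (extreme + 1) / (n_boot + 1)
```

Passing `stat=median` gives the corresponding test for medians. With skewed WTP data, a resampling test of this kind avoids relying on normality of the sample statistic.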
Generalized linear models (GLMs) were estimated to control for observed heterogeneity and to test theoretical validity. In a broad sense, the theoretical validity of WTP/QALY estimates refers to whether the estimates concur with the underlying theory. The following variables were selected for regression analysis in conformity with previous research [11][12][13]: age, income, hypothetical health state, and QALY gain. Age was a significant factor of WTP/QALY in previous research [11], with younger age associated with a higher WTP/QALY. Income is positively associated with WTP/QALY [12] and thus should be captured in the regression analysis. Furthermore, we also assumed that a worse health state scenario [13] and a smaller QALY gain [9] should lead to a higher WTP/QALY. For the base-case analysis, we included only positive WTP. To better understand the difference between the two formats, we also included all WTP responses, with zero WTP/QALY converted into 1 RMB. To reduce the impact of outliers, the top 1% of values in both the OE and PC formats were trimmed for sensitivity analysis. Moreover, we excluded all 18 respondents who agreed to pay for the intervention but did not give exact answers. Categorical variables were coded with dummy variables. We estimated GLMs with a log link. To choose an appropriate variance function for the GLMs, we performed a modified Park test, which indicated a gamma distribution. The log link has the advantage of focusing on differences between groups of participants with respect to arithmetic rather than geometric means. Statistical analysis was performed with IBM SPSS version 23.0 and Stata version 14.0.

Table 2 displays the demographic characteristics of respondents. In total, 461 individuals were involved, among whom 235 (51%) answered the PC question, while 226 (49%) answered the OE question. 61% of participants had a college degree.
Around 35% of respondents had an income of less than 3000 RMB per month. Almost 19% of participants reported that they had some health problems. However, for all the dimensions in EQ-5D-5L, most respondents reported no problem. The mean utility score of respondents was 0.95. A small portion of respondents (5%) had experienced hospitalization during the year. We found no significant differences between elicitation methods for any variable except education (p = 0.001).

Comparing formats with unconditional analysis
The distribution of WTP/Q of the PC and the OE formats is displayed in Fig. 1. Furthermore, Table 3 presents descriptive statistics of WTP/QALY for the two elicitation methods. The number of zero responses was small: 12 (5.1%) for the PC and 34 (15.0%) for the OE. Detailed information about zero responses can be found in Table 3. Across questionnaires, median values ranged from 8000 to 258,000 RMB for the PC format and from 7500 to 500,000 RMB for the OE format. Figure 2 displays the ratio of accepted bids according to the elicitation method. The two lines cross, indicating that the OE format tended to elicit more extreme values, though the difference between the two elicitation methods did not seem to be substantial. The results of the equality tests of mean and median are presented in Table 4 and were, to some degree, consistent with the figure of the ratio of accepted bids. The general tendency was that for the mild health state scenario, the PC yielded a higher mean value, whereas for all other four health scenarios, the OE method yielded much higher mean WTP/QALY values, except in the terminal illness scenario with a 0.4 QALY gain. The equality test of means with bootstrap indicated no significant difference between the means of the two techniques for any type of questionnaire. No differences were found in the median for these two formats in any of the five scenarios.
Ye et al. Cost Eff Resour Alloc (2021) 19:45

Comparing formats with conditional analysis
Separate estimations by elicitation format
For each elicitation method we looked for the determinants of WTP/QALY with GLMs (Table 5). For both the OE and PC formats, most variables were found to have a significant effect on the value of WTP/QALY. The monetary value of a QALY was significantly influenced by the valuation scenario in both models: participants were prepared to pay more for more serious conditions, although the difference in WTP/QALY between the terminal illness scenario and the base-case group (immediate death) was not significant in either format. Furthermore, we confirmed that for both formats, a smaller QALY gain led to a higher WTP/QALY. We found a positive effect of income on WTP/QALY for both formats, which argues for the validity of the stated-preference survey [24]. Age was assumed to be negatively related to participants' WTP: as respondents' age increased, their WTP/QALY decreased. However, age was a statistically significant factor only for the OE format, not for the PC technique. The sensitivity analysis confirmed the robustness of these findings and can be found in Additional file 2.
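For readers less familiar with this model class, the specification described in the Data analysis section (log link, gamma variance chosen by a modified Park test) can be written out as follows. This is a generic textbook formulation rather than the authors' exact estimation code; the symbols $x_i$, $\beta$, $\phi$, $\gamma_0$ and $\lambda$ are notation introduced here.

```latex
% Log-link GLM for respondent i's WTP/QALY y_i with covariates x_i
% (age, income, health state scenario, QALY gain):
\begin{align}
  \mathbb{E}[y_i \mid x_i] &= \mu_i = \exp\!\left(x_i^{\top}\beta\right), \\
  \operatorname{Var}(y_i \mid x_i) &= \phi\,\mu_i^{\lambda}.
\end{align}
% Modified Park test: estimate \lambda from the fitted means \hat{\mu}_i via
\begin{equation}
  \mathbb{E}\!\left[(y_i - \hat{\mu}_i)^2 \mid x_i\right]
    = \exp\!\left(\gamma_0 + \lambda \ln \hat{\mu}_i\right).
\end{equation}
% \lambda \approx 0: Gaussian, \lambda \approx 1: Poisson-type,
% \lambda \approx 2: gamma, \lambda \approx 3: inverse Gaussian.
```

Under the log link, $\exp(\beta_k)$ is the multiplicative change in the arithmetic mean of WTP/QALY associated with a one-unit change in covariate $k$, so a negative coefficient on QALY gain corresponds to a lower expected WTP/QALY for larger gains.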

Joint estimation over the two elicitation formats
We studied the impact of the elicitation technique on WTP over the whole sample by introducing a dummy variable for the OE format (with the PC format as the reference). The GLM on the whole sample including only positive WTP indicated a statistically significant positive effect of the OE format, suggesting that the OE format tends to elicit higher WTP/QALY than the PC format. Nevertheless, when we converted all zero WTP/Q into 1 RMB, the difference between the two formats became insignificant. We may conclude that the imbalanced distribution of zero responses caused the main difference between the two formats. Regarding the determinants of WTP, the joint estimation confirms the previous results: a significant and negative effect of age and QALY gain, and a significant and positive effect of income. WTP/QALY was also affected by the valuation scenario.
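Why the treatment of zero responses can change the format effect is easy to see in a toy calculation. In a log-link GLM that contains only an intercept and the OE dummy, the exponentiated dummy coefficient equals the ratio of the two groups' arithmetic means; the study's joint model includes further covariates, so the sketch below is a simplification. All values are invented, with zero shares of 5% and 15% chosen only to echo the imbalance reported above.

```python
from statistics import mean


def positive_only(values):
    """Base-case sample: drop zero responses."""
    return [v for v in values if v > 0]


def recode_zeros(values, floor=1.0):
    """All-response sample: zero WTP/QALY recoded to `floor` (1 RMB in the text)."""
    return [v if v > 0 else floor for v in values]


def mean_ratio(oe, pc):
    """Ratio of arithmetic means, OE relative to PC."""
    return mean(oe) / mean(pc)


# Hypothetical WTP/QALY values (RMB): 1 of 20 PC and 3 of 20 OE answers are zero.
pc = [0] + [50_000] * 19
oe = [0] * 3 + [55_000] * 17

ratio_positive = mean_ratio(positive_only(oe), positive_only(pc))
ratio_all = mean_ratio(recode_zeros(oe), recode_zeros(pc))
print(f"positive responses only: {ratio_positive:.3f}")
print(f"zeros recoded to 1 RMB:  {ratio_all:.3f}")
```

In this invented example the OE advantage among positive responses disappears once the more frequent OE zeros are kept in the sample, which is the mechanism the joint estimation points to.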

Discussion
We compared WTP/QALY estimates generated from the PC format and the OE format and found that the mean WTP values of these two formats varied dramatically for different scenarios and QALY gains. The OE format tended to elicit more extreme values, indicating that for

The theoretical validity can be examined by determining whether the results are consistent with theoretical constructs. Probing the theoretical validity is the most popular test of validity applied to stated-preference techniques, mostly because it is comparatively easy to perform. The performance of both the PC and OE formats appears to be highly satisfactory, although the PC format failed to comply with the assumption that being younger leads to a higher WTP/QALY. In comparison with the PC format, the OE technique had a stronger association with most variables in the regression model. In theory, the PC question tends to cause range bias. In the OE form, respondents can answer the WTP question only after careful reflection [18], which might be a fundamental procedure in assessing the value of health.
We did not pool all the data and present overall mean WTP/QALY estimates for each format because mean WTP/QALY varied across scenarios with different QALY gains. Instead, detailed descriptive statistics for the different types of questionnaires were reported for each format. The OE format was associated with lower WTP/QALY for less serious conditions and higher values for more serious conditions. Moreover, the results of the GLM regressions demonstrated that the OE format tended to yield higher WTP/QALY, which is inconsistent with previous research in healthcare. In a study of women's WTP for a screening process, Donaldson et al. [20] concluded that the PC format was associated with higher mean and median WTP, a finding confirmed by the study of colorectal cancer screening by Whynes and colleagues [21]. We may argue here that screening, like the mild treatment scenario in this research, could be considered a less serious condition, for which the OE format tended to elicit lower values.

This is the first study to compare WTP/QALY estimates generated from the PC and the OE formats. However, we encountered certain practical limitations. First, only the theoretical validity of the two elicitation methods was examined; essential elements like external validity and reliability were not assessed in this study. Second, quota sampling instead of probability sampling was applied in this research. Hence, the participants in this study may not be perfectly representative of the Chinese population. Due to the cognitive challenge of this type of survey, most studies of WTP include a sample with a higher education compared to the general population [28,29]. In this study, those with higher levels of education were over-represented. However, we found that education level had no