The remarkable growth of expenditure on healthcare in many countries in recent decades has directed attention to the analysis of efficiency, the performance of public sectors and the need to provide policy-makers with evidence-based knowledge on which to base informed decisions [5, 48]. We reviewed studies that measured technical efficiency, defined by Farrell as producing the maximum amount of output from a specific amount of input, or producing a given output from the minimum quantities of inputs. We assessed relevant studies conducted in public hospitals in the Gulf, Iran and Turkey. Despite dissimilarities between the GCC countries, Iran and Turkey, there are also similarities in culture and in the health systems. These similarities justify including the latter two countries in the review, and their inclusion provides an opportunity to share knowledge across countries with similar settings for future empirical analyses of public health systems.
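Farrell's definition can be stated compactly. As a sketch in our own notation (the symbol \(T\) for the production possibility set is our assumption, not taken from the reviewed studies):

```latex
% Input- and output-oriented technical efficiency in the Farrell sense.
% T denotes the production possibility set of feasible (input, output) pairs.
\mathrm{TE}_{\mathrm{in}}(x, y)  = \min\{\theta > 0 : (\theta x,\, y) \in T\}, \qquad
\mathrm{TE}_{\mathrm{out}}(x, y) = \bigl[\max\{\phi > 0 : (x,\, \phi y) \in T\}\bigr]^{-1}.
```

A hospital is technically efficient when the relevant score equals 1; a score below 1 indicates the feasible proportional contraction of inputs (or, for the output measure, the achievable proportional expansion of outputs).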
We assessed the impact of model characteristics on reported efficiency scores using a meta-analysis based on 25 observations extracted from 22 different studies. Most of these studies were found in six high-quality databases of scientific publications, but these yielded no studies of GCC countries. We therefore searched the grey literature for Gulf-focused papers, which were absent from the indexed scientific databases because efficiency analysis is a relatively new line of research in the Gulf region. The studies found in the published literature and those sourced from the grey literature were mutually exclusive. To the best of our knowledge, this is the first attempt to conduct a systematic review and to quantify the effect of model specifications on hospital efficiency scores in the GCC countries and comparable nations.
We found that DEA was the dominant method by which public hospital efficiency was assessed in the reviewed studies: just three studies applied the SFA method, all conducted in Turkey [41,42,43]. In the Gulf region and in Iran, efficiency was exclusively measured via DEA and other systematic reviews have found the same method to be common internationally [12, 25]. The use of DEA is well justified by its capability to handle multiple inputs and outputs in different units, and also its functional flexibility in practical application [10, 49].
The reviewed studies originating from Iran and Turkey primarily used an input orientation, whereby outputs are held fixed and the analyst explores the proportional reduction in inputs. Such an approach is very practical, since hospital managers and policy-makers have more control over inputs than over outputs, as shown in previous research [50, 51]. In contrast, two of the four studies from Gulf countries applied an output-orientation model [45, 47], while the remaining two employed both input- and output-orientation models [28, 46]. Thus, the health-related policy objective within the GCC was to hold inputs constant and explore the proportional expansion of outputs. This approach matches the target of Gulf governments, which is to enhance the provision of national and domestic health services to meet the growing demand for healthcare; in these countries, this is the primary goal of healthcare development strategy plans [2, 52]. Furthermore, this approach was appropriate because reduction of existing health resources has not been a priority of the Gulf nations’ health strategies, at least in recent years [2, 45].
Our meta-analysis showed no significant difference in estimated efficiency between the two technology orientations. Given the scarcity of efficiency estimates and related knowledge in the Gulf region, we encourage further investigation and more research in this area. Ideally, such studies should be undertaken with a variety of technology orientations, considering the goals and functions of the public hospitals.
The studies we reviewed often had limitations, including aggregation of inputs, mainly in the labour category, and aggregation of the costs of different types of capital and of labour prices. Outputs focused mainly on healthcare activities, ignoring health outcomes and making no adjustment for differences in case mix or quality of care across hospitals. This may explain the high efficiency scores of some hospitals despite a low quality of care. A further limitation was sample heterogeneity (the number and size of hospitals in each study, the hospitals’ activities, etc.), which may affect efficiency scores, since in general the studies made no appropriate adjustments for such heterogeneity. The studies often failed to describe the causes of inefficiency, did not attempt to evaluate misspecification of the efficiency models, and lacked internal validation of the efficiency findings, which could skew the policy implications. Moreover, like Varabyova in 2016, our quality assessment revealed frequent failure to report the underlying production theory, to justify the chosen model assumptions, to report study limitations, and to report the presence of outliers. These limitations raise concerns about the accuracy, reliability and generalizability of these studies. We suggest that researchers concentrate on the characteristics of the efficiency models and related methodological issues, and we encourage transparent reporting of the relevant findings.
We observed, as other authors have done, that scarcity of data underlies many of these limitations. Most studies included in this review selected their variables according to the available secondary data sources, rather than collecting new and more relevant data to construct the best possible measure of performance [51, 53]. It has been argued separately by Afzali and Hollingsworth that many hospital databases contain insufficient data on a broad range of hospital functions and on quality of care, including preventive care, health promotion and staff development activities. The GCC Health report 2015 confirms that the same data gaps occur in the GCC. Thus, improving hospital databases, through quality data collection and processing techniques, the inclusion of data from different levels of health provision, and the capture of valid data reflecting demand, quality of care and patterns of healthcare activity, is a critical step towards better-quality hospital efficiency studies [17, 53]. Such improvements would support further efficiency research by revealing weaknesses in the healthcare production process, and would thereby guide policy-makers towards potential reforms in the region.
The findings from our meta-analysis showed no significant differences in the estimated efficiency scores irrespective of the analysis method employed, i.e. SFA or DEA. Among the Turkish papers, three studies applied SFA methods and five used DEA. Although SFA reported higher efficiency scores, the difference was not statistically significant, a finding in line with most previous reviews [12, 50].
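The composed-error structure that distinguishes SFA from DEA is worth stating. A standard half-normal sketch (the notation is ours, not drawn from the reviewed Turkish studies):

```latex
% Stochastic frontier production function with composed error:
% v_i is symmetric random noise, u_i >= 0 is the inefficiency term.
\ln y_i = x_i^{\prime}\beta + v_i - u_i, \qquad
v_i \sim N(0, \sigma_v^2), \quad u_i \sim N^{+}(0, \sigma_u^2).
```

Here \(v_i\) absorbs random noise and measurement error while \(u_i \ge 0\) captures inefficiency, with technical efficiency estimated as \(\mathrm{TE}_i = \exp(-u_i)\); it is this separation of noise from inefficiency that DEA lacks.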
Technically, in the DEA approach the entire distance from a decision-making unit (DMU) to the efficient frontier measures inefficiency, whereas in SFA this distance comprises both inefficiency and estimation error; consequently, inefficiency takes a higher value in DEA than in SFA even on the same data. Although the choice between DEA and SFA may substantially affect the results, there is no agreement in the literature as to which method reflects best practice [10, 25]. Rather, the choice of nonparametric and/or parametric methods in any analysis depends on the specification of the production function, the assumptions about the distribution of the error components, the production theory orientation and the rationale for the returns-to-scale assumption [23, 25]. Our analysis found that DEA studies applying VRS reported higher efficiency scores, though not significantly so, than those using the CRS assumption, since DEA under VRS envelops the data more tightly and places more hospitals on the frontier [10, 25].
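The envelopment programme behind these DEA scores can be sketched in a few lines. Below is a minimal input-oriented DEA solver under CRS, with the convexity constraint added for VRS; the function name and toy data are our own illustration (not taken from the reviewed studies), assuming `numpy` and `scipy` are available:

```python
import numpy as np
from scipy.optimize import linprog

def dea_input_efficiency(X, Y, vrs=False):
    """Input-oriented DEA efficiency score for every DMU.

    X: inputs, shape (m_inputs, n_dmus); Y: outputs, shape (s_outputs, n_dmus).
    Solves, for each DMU k:  min theta  s.t.  X @ lam <= theta * x_k,
    Y @ lam >= y_k, lam >= 0, and (if vrs) sum(lam) == 1.
    """
    m, n = X.shape
    s = Y.shape[0]
    scores = []
    for k in range(n):
        # Decision variables: [theta, lam_1, ..., lam_n]; minimise theta.
        c = np.r_[1.0, np.zeros(n)]
        # Input rows:  sum_j lam_j * x_ij - theta * x_ik <= 0
        A_in = np.hstack([-X[:, [k]], X])
        # Output rows: -sum_j lam_j * y_rj <= -y_rk
        A_out = np.hstack([np.zeros((s, 1)), -Y])
        A_ub = np.vstack([A_in, A_out])
        b_ub = np.r_[np.zeros(m), -Y[:, k]]
        A_eq = b_eq = None
        if vrs:
            # Convexity constraint sum(lam) = 1 gives variable returns to scale.
            A_eq = np.r_[0.0, np.ones(n)].reshape(1, -1)
            b_eq = [1.0]
        res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                      bounds=[(None, None)] + [(0, None)] * n)
        scores.append(res.x[0])
    return np.array(scores)

# Toy panel: 4 hospitals, 1 input (e.g. staff), 1 output (e.g. discharges).
X = np.array([[2.0, 4.0, 8.0, 6.0]])
Y = np.array([[2.0, 4.0, 6.0, 3.0]])
crs = dea_input_efficiency(X, Y)
vrs_scores = dea_input_efficiency(X, Y, vrs=True)
```

On this toy panel the VRS scores are never below the CRS scores: the convexity constraint envelops the data more tightly and places more units on the frontier, which is exactly the pattern noted above.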
Our analysis found a negative relationship between sample size and the estimated efficiency scores, as observed in other studies [36, 40]. Similar findings have been reported in previous literature reviews, which argued that inflated efficiency scores may arise in small samples because of sparsity: a hospital can be deemed efficient simply because no comparator exists within the sample [12, 16, 25]. Moreover, DEA can overestimate efficiency scores when the number of hospitals is small relative to the number of input and output variables. Several of the empirical analyses used a small sample relative to the number of variables and reported high efficiency scores [27, 31, 35, 39, 40]. To remedy such problems, Hollingsworth suggested that the number of units in an efficiency assessment should be at least three times the combined number of inputs and outputs. Clearly, further development of efficiency models is required to match the complexity of production in public hospitals and to present efficiency findings convincingly.
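Hollingsworth's rule of thumb is easy to operationalise as a screening check before running a DEA model; a minimal sketch (the function names are ours, purely illustrative):

```python
def min_dmus(n_inputs: int, n_outputs: int, factor: int = 3) -> int:
    """Minimum number of DMUs (hospitals) under the rule of thumb that the
    sample should be at least `factor` times the combined count of inputs
    and outputs."""
    return factor * (n_inputs + n_outputs)

def sample_adequate(n_dmus: int, n_inputs: int, n_outputs: int) -> bool:
    """True if the sample size meets the rule-of-thumb threshold."""
    return n_dmus >= min_dmus(n_inputs, n_outputs)
```

For example, a model with 3 inputs and 2 outputs would call for at least 15 hospitals, so a 12-hospital study would fail the check and its high scores should be read with caution.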
Although we conducted a comprehensive literature search across several databases, we might have missed some relevant studies. To mitigate this, we hand-searched reference lists and the grey literature to identify additional studies. Our findings regarding SFA would be better supported if more than three SFA studies had been available for critical analysis. Nevertheless, the study site chosen for our review (the Gulf region) may generate strong interest among policy-makers, stakeholders, researchers and academics. Another interesting point arising from our review of Gulf-region studies is that the output orientation was generally preferred to the input orientation, whereas studies originating in other countries commonly used the input orientation.