Neural networks and hospital length of stay: an application to support healthcare management with national benchmarks and thresholds

Background The problem of correct inpatient scheduling is extremely significant for healthcare management. Extended length of stay can have negative effects on the supply of healthcare treatments, reducing patient accessibility and creating missed opportunities to increase hospital revenues by means of other treatments and additional hospitalizations. Methods Adopting available national reference values and focusing on a Department of Internal and Emergency Medicine located in the North-West of Italy, this work assesses prediction models of hospitalizations with length of stay longer than the selected benchmarks and thresholds. The prediction models investigated in this case study are based on Artificial Neural Networks and examine risk factors for prolonged hospitalizations in 2018. With respect current alternative approaches (e.g., logistic models), Artificial Neural Networks give the opportunity to identify whether the model will maximize specificity or sensitivity. Results Our sample includes administrative data extracted from the hospital database, collecting information on more than 16,000 hospitalizations between January 2018 and December 2019. Considering the overall department in 2018, 40% of the hospitalizations lasted more than the national average, and almost 3.74% were outliers (i.e., they lasted more than the threshold). According to our results, the adoption of the prediction models in 2019 could reduce the average length of stay by up to 2 days, guaranteeing more than 2000 additional hospitalizations in a year. Conclusions The proposed models might represent an effective tool for administrators and medical professionals to predict the outcome of hospital admission and design interventions to improve hospital efficiency and effectiveness.

purposes are not contradictory, since shorter hospital stays have obvious economic effects but, when associated with a rigorous clinical pathway, they also have huge health benefits, e.g., limiting the risks of hospital infection [3], thrombosis [4], and reduced physical autonomy [5]. Moreover, inefficiency invariably causes reduced accessibility: overcrowding of emergency departments and medical units is an alarming cause of delayed access to life saving services. Smooth in-hospital flows require a complex mix of conditions that have to come together in the pursuit of a shared goal. These conditions include: clinical choices, logistics, information technology and quality of health records, connections between units and services, and also personal motivation and team working. At the basis of any effort to change, however, there is the ability to select measurable indicators, together with tools for predicting the type of hospital stay and its possible outcomes, and this is precisely the purpose of our work.

Literature review: length of stay and prediction models
Length of stay (LOS) is one of the most widely used and easily available efficiency indicators [6]. This indicator assesses the speed of the clinical pathway and has the advantage of being uniformly measured and quite comparable. In the last two decades, several prediction models have been developed by scholars using consolidated techniques (e.g., Markov models, Multistage models, Discrete-Event models) to study LOS and the impact of selected covariates on this outcome (e.g., sex, age, admission method, diagnosis, severity of illness, hospital characteristics). As for the observations under investigation, the literature is quite heterogeneous, focusing on specific medical units or departments, as well as single hospitals or the whole healthcare system. Some authors adopt a continuous-time hidden Markov model with discrete states to model inpatient behavior, representing the flow of patients around departments of geriatric medicine and investigating the effect of two selected covariates on occupancy time, i.e., age and sex [7]. Other authors propose a Gamma mixture risk-adjusted model to analyze maternity LOS within obstetrical Diagnosis Related Groups (DRGs) and connected determinants, i.e., patients' demographic characteristics, health provision factors, and other clinical and hospital factors [8]. In both cases, the determination of pertinent factors would help hospital administrators and clinicians to efficiently manage LOS and expenditure. Taking a Texas Medical Center in Houston into account, Kapadia and colleagues model the flow of patients in a pediatric intensive care unit using a discrete Markovian process [9]. The aim of their work is to represent the flow of patients in relation to their illness (measured through the Pediatric RISk of Mortality, PRISM) in order to minimize total costs associated with hospitalization. Note that the PRISM score is based on seven physiological and seven laboratory variables, each reflecting an organ system dysfunction, which are combined together into a score that represents a proxy for the severity of the illness and the risk of mortality in the current hospitalization [10]. The results suggest that stochastic models, like Markov chains and Monte-Carlo simulations, can be very successful in assessing LOS if the illness risk is well defined.
Jeon and colleagues propose another study, with practical values that are useful in resource planning, relying on 5 years' retrospective data for patients admitted to the Belfast City Hospital with a diagnosis of stroke [11]. Using a phase-type recovery model, they investigate LOS according to patients' age, type of stroke (i.e., hemorrhagic, cerebral infarction, and transient ischemic attack) and mode of discharge (i.e., the patient may die, be transferred to a nursing home, or be discharged to the individual's usual residence). On the other hand, Akkerman and Knip analyze the department of cardiac surgery of a Dutch hospital, looking at the relationships among patients' LOS, beds availability, and hospital waiting lists [12]. In detail, they investigate patients' LOS in hospital wards following cardiac surgery, describing and evaluating several scenarios for hospital management using Markov chain theory and simulation experiments.
Taking a large integrated healthcare delivery system into account (more than 300,000 hospitalizations occurring over 2 years in 17 hospitals), Harrison and Escobar adopt Multistage models to describe LOS distributions, trying to establish whether such models could be used for patient groups restricted by diagnosis, severity of illness, and the hospital supplying the treatments [13]. Considering the Irish healthcare system and focusing on delayed discharges, Rashwan and colleagues illustrate a system dynamics methodology used to model the flow of elderly patients, in order to better understand the dynamic complexity behind this phenomenon [14]. According to their results, there is evidence that the proposed system dynamic methodologies can support healthcare management in analyzing both different care pathways and delayed discharges of patients, optimizing their LOS. Finally, concentrating on elderly patients treated by the Regional Healthcare System of Italy's Abruzzo Region, Gordon and colleagues introduce a new methodology, based on a series of conditional Coxian phase-type distributions, that clusters patients according to their covariates (i.e., age, gender, and admission method) and LOS in hospital and then models patient pathways, making it possible to predict their LOS in hospital and community care [15]. This approach can shed new light on the rates at which patients move between healthcare suppliers (i.e., hospital and community care), supporting managers in reducing the negative effects of bed occupancy and of the premature discharge of patients without a suitable period of convalescence.
Clearly, according to the aforementioned literature, the problem of correct inpatient scheduling is relevant for healthcare management. On the one hand, the care pathways and internal organization of beds can affect the patients' chances to receive treatment and be hospitalized. On the other hand, bed occupancy represents a missed opportunity to increase hospital revenues by means of other treatments and/or hospitalizations, increasing the supply of healthcare services on the market. Considering public healthcare systems, this means longer waiting lists and growing inefficiency, which policy makers will have to face. Although the literature proposes interesting and effective methods to identify the best settings and approaches to managing beds [16,17], there is a gap concerning the latter point, i.e., estimating the opportunity cost of such interventions, and this is a fundamental step to support the implementation of the proposed interventions by policy makers. With the purpose of filling this knowledge gap, our paper applies a deep learning technique, i.e., Artificial Neural Networks (ANNs) to identify inpatients with a negative outcome and then, assuming effective interventions by the medical professionals, quantifies the increase in revenues for the additional hospitalizations that might be achieved. According to Shahid and colleagues, ANNs are leveraging machine-learning techniques that can support physicians with their diagnosis, as well as managers to support their decisions and to improve delivery of efficient care [18]. Indeed, according to Shahid and colleagues, ANNsbased solutions applied on the meso-and macro-level of decision-making suggests the promise of its use in contexts involving complex, unstructured or limited information, supporting an effective and efficient supply of care by medical institutions.
In detail, our work analyzes LOS according to selected covariates extracted from an administrative hospital database, adopting national benchmarks and thresholds to identify "too long" hospitalizations and presenting predictive models based on ANNs. Both the methodology applied to the case study (i.e., ANNs) and the dataset used (with more than 16,000 hospitalizations and 20 covariates) represent further contributions to the current knowledge, supporting both hospital administrators and clinicians in efficiently managing LOS and expenditure. Indeed, these prediction models may represent an effective tool for administrators and medical professionals to predict the outcome of hospital admission, selecting the patient groups at higher risk of negative outcomes, and to design interventions to improve hospital efficiency.

Methods
We investigate the Department of Internal and Emergency Medicine (DIEM) of a general hospital located in the North-West of Italy. In detail, the DIEM is structured around 13 clinical units: cardiology, hematology, geriatrics, infectious diseases, internal medicine, emergency medicine, nephrology, neurology, coronary care, gastroenterology, oncology, respiratory diseases, and rheumatology.
Administrative data were extracted from the hospital database, collecting information on more than 16,000 hospitalizations between January 2018 and December 2019. Then, the data were combined together (as inputs or outputs), applying ANNs, to create prediction models that may be adopted by physicians to support their managerial and clinical decision making. Such decisionmaking support, based on ANNs, may be able to identify patients with negative outcomes, guiding physicians and managers in the necessary interventions. In detail, considering hospitalizations in 2018, we performed a macro analysis on the whole sample without any distinctions among clinical units (i.e., the whole DIEM) and a micro analysis on the main clinical units and their hospitalizations. Lastly, we took hospitalizations in 2019 into consideration and tested the proposed prediction models, estimating expected clinical and managerial benefits.

Artificial Neural Networks (ANNs)
ANNs are complex models organized in layers, (multilayer) formed by neurons (also called perceptrons) and interconnected via synapses (weights). Due to their wellknown ability to generalize behaviors [19], ANNs have been successfully applied to many clinical fields [18], such as urinary tract infections and celiac disease [19,20], total hip replacement surgery [21], early ruling-in/ ruling-out of patients with suspected acute myocardial infarction using frequent biochemical monitoring [22], as well as in hospital management such as, for example, to evaluate patients' admissions to EDs [23,24], patients' readmission [25], and patients' appointment scheduling [26].
As highlighted in Fig. 1, the first layer is called "input layer" and it is composed of a number of neurons (or nodes) equal to that of the variables analyzed (in our specific case, as many neurons as the available pieces of information on the patients). The last layer is the "output layer", from which the results of the models are derived. The number of nodes in this layer depends on the type of answer expected. Typically, there is only one neuron because the result is expressed in dichotomous form. Between the input layer and the output layer there are hidden layers, which can be more than one. The literature points out that a single hidden layer can approximate any functional form [27,28]. The number of neurons in the hidden layers must be found empirically [29,30], although some authors have tried to define specific rules [31][32][33]. 1 The links between the layers are the "synapses", mathematically called weights, which collect information about the relationships between the input variables and the expected outputs. These relationships are formalized through activation functions that are generally non-linear in the first layer (i.e., tansig, logsig, hardlim, and so on). In our model, these functions are non-linear from the input layer to the hidden one (tansigmoidal in the first architecture and logsigmoidal in the second one) 2 and linear from the hidden layer to the output one. The relationships between the layers are collected in weight matrixes and their analysis makes it possible to evaluate the contribution of each piece of information to the definition of the expected outputs [34][35][36]. The Multi-Layer Perceptron (MLP) network is represented in Fig. 1 and its links are feed-forward because the connections come from the input layer to the hidden one and from the hidden layer to the output one. Backward or recursive relationships are not considered in this framework. The feed-forward MultiLayer Perceptron works with a supervised learning technique through a back-propagation algorithm. Note that there are some network frameworks that use an unsupervised procedure, i.e., Self-Organizing Map or Kohonen networks [38]. The supervised learning with the back-propagation algorithm works following this procedure: the initial sample is divided into two sub-samples, i.e., the training sample and the validation sample. In the first phase, only elements from the training sample are introduced into the model and, through the back-propagation algorithm, the network attempts to minimize the mean-square output error over the entire training set. The ANN computes weights matrixes until a predefined error threshold is reached [37]. In this step, the model is fed information about the patients but also about the selected target. This stage is very important because the ANN learns from the data and collects, within weight matrixes, information about the relationships between the variables. It clearly emerges that dividing the initial sample into training and validation is critical: the training set must represent all possible types of patients with their specific characteristics. In our case, we adopt the following proportions: 2/3 in the training sample and 1/3 in the validation sample.
Once the weights and ANN framework are defined (i.e., type of activation functions between layers; number of hidden layers and their nodes; other technical parameters, such as the search function for the optimal gradient, etc.), these parameters are applied to the validation sample, which is introduced into the ANN without any information on the selected outcomes. The ANN applies the framework to the new data in order to evaluate results and the ability of the model to provide correct classification. In our model, information about the patients is introduced into the input layer with the aim to obtain an outcome for each patient, indicating whether the selected outcomes are likely (1 if they are, 0 if they are not). Afterwards, we apply the algorithm aimed at maximizing either the specificity or the sensitivity value of our model, i.e., we can decide whether to maximize the network's ability to identify the number of true negatives or true positives [38]. Concerning computational issues of the algorithm, a bootstrap procedure is used in this study in order to improve the robustness of estimates. Moreover, since in the training phase ANN weights are randomly set every time, that algorithm allows us to run the ANN for a defined number of times (i.e., 100 replications). Finally, the algorithm presented here runs in the training phase and yields network parameters trying to ensure, if possible, minimum levels of sensitivity (i.e., 0.80) and specificity (i.e., 0.70) defined at the beginning of the computation. Thus, when a physician introduces new patient data into the model to forecast a negative outcome, the answer obtained about a potential future event is specifically based on levels of sensitivity and specificity set ex-ante. Analyses were performed using the MAT-LAB (release R2019b, 64bit) software.

Applications with smart solutions
The main application of this new technology is the adoption of smart solutions (e.g., a mobile app) to customize the stratification of patients admitted to the DIEM [39]. Clinical information about the patients might be collected at his/her admission to the Department and then processed by the algorithm based on ANNs to support the decision making process regarding hospitalization and specialist investigations. On the one hand, the adoption of these smart solutions gives the opportunity to customize risk stratification in real time, according to the specific clinical case (i.e., the patient's health status). On the other hand, further collection of data about incoming patients makes it possible to gather new evidence to refine the algorithm, so that updated versions of our innovative technology will become ever more effective at every access. In other words, ANNs and its application to smart solutions can provide an effective learning decision-making system to support health services.

Input and output specification
According to the empirical strategy and the proposed methodology based on ANNs, two model definitions have to be identified, i.e., which inputs and outputs are adopted for the macro and micro analysis respectively. In both analyses, two outcomes are introduced: • long hospitalization, which is a dummy variable equal to 1 if the hospitalization is longer than the national average, 0 otherwise; • outlier hospitalization which is a dummy variable equal to 1 if the hospitalization is longer than a specific reference value, 0 otherwise.
In the estimation of these outcomes, we consider all types of hospital discharges (e.g., to the patient's home, to another hospital, as well as death). The former outcome is based on a dynamic and variable benchmark, which is equal to the average of all hospitalizations occurred under a specific DRG code for a selected year (data source: Annual Report published by the Italian Ministry of Health, 2018). The latter outcome is based on a static threshold, which has been fixed for every specific DRG by the Italian Ministry of Health in 2008 through a dedicated regulation (i.e., Decreto Ministero della Salute del 18 Dicembre 2008). Obviously, the threshold is higher than the benchmark, which means that the outliers are a sub-sample of the long hospitalizations, and this is exactly why we decided to adopt both outcomes in our analysis, so that we may be able to estimate an interval for our potential interventions (i.e., between the benchmark and the threshold).
For what concerns the macro analysis, that is to say, the model specification in which we consider the whole sample of observations without any distinctions regarding departments (i.e., among clinical units), the inputs are the following: • sex, which is a dummy variable equal to 1 if the patient is male, 0 otherwise; • age, a matrix of four dummy variables according to the seniority class of patients, i.e., "18-40", "41-65", "66-75", "> 75"; • presence of cancer, which is a dummy variable equal to 1 if the patient has received a diagnosis of cancer; • type of admission, which is a matrix of three dummy variables according to the access, i.e., "urgent hospitalization with direct access", "urgent hospitalization from the emergency rooms" or "planned hospitalization"; • time slot, which is a matrix of three dummy variables according to the access, i.e., "morning", "afternoon" or "night"; • day of admission, which is a matrix of seven dummy variables according to the access, ranging from Monday to Sunday. • principal discharge diagnosis, which is a matrix of eleven dummy variables according to general areas of these diagnosis, i.e., selecting the 10 classes with the highest number of cases, which represent at least over 80% of the observations, plus one residual dummy variable. In the micro analysis, that is to say, the model specification in which we look at single clinical units and their hospitalizations, the inputs are the same as in the macro analysis, but the selected top 10 diagnosis classes, which are more specific in this case, and the presence of internal transfers among the same units are also investigated. Note that we use the clinical unit to which patients are first admitted as our selection criterion. Finally, note that the micro analysis focuses on the most representative clinical units, i.e., internal medicine, cardiology, emergency medicine, geriatrics, respiratory diseases, neurology, and oncology. Obviously, taking the 2018 hospitalizations into account, one model for each clinical unit is run in order to collect specific weights that can explain the relation between inputs and outcomes.

Empirical strategy
The results of the prediction models computed according to the aforementioned specifications are evaluated on the basis of the following indexes: • incorrect classification, i.e., the proportion of actual positives and actual negatives that are not correctly identified as such, in relation to the selected outcome and the whole population of patients; • sensitivity, i.e., the proportion of actual positives that are correctly identified as such, in relation to the selected outcome and the population of positive patients; • specificity, i.e., the proportion of actual negatives that are correctly identified as such, in relation to the selected outcome and the population of negative patients; • false positive rate (Type I error), i.e., the proportion of patients identified as false positives, in relation to the selected outcome and the population of actual negative patients; • false negative rate (Type II error), i.e., the proportion of patients identified as false negatives, in relation to the selected outcome and the population of actual positive patients; • Positive Likelihood Ratio, i.e., the change in pre-test probability caused by a positive test results, with values that range between 1 and + ∞ (the higher, the better); • Negative Likelihood Ratio, i.e., the change in pre-test probability caused by a negative test results, with values that range between 0 and 1 (the lower, the better); • Area Under the Curve, i.e., the area under the receiver operating characteristic curve, with values that range between 0 and 1 (the higher, the better).
These indexes represent the expected quality of our prediction models, i.e., the ability of our models to correctly identify patients in relation to the selected outcome. Finally, to complete the analysis, we have also estimated the Garson index for each variable introduced into the ANN, which represents its percentage contribution to the outcomes under investigation [35], as well as the expected sign of their contribution according to the NN synaptic weights [40]. Afterwards, considering the outcomes under investigation, Table 2 shows the percentage of hospitalizations in the clinical units, highlighting main differences. Considering the overall DIEM, 40% of the hospitalizations lasted more than the national average, and almost 3.74% were outliers. Note that, based on the aforementioned specification, these hospitalizations are a sub-sample of the whole set of longer hospitalizations. Finally, Table 2 illustrates other descriptive statistics on the DIEM units and their performance in 2018. In detail, the table shows the average LOS estimated on the basis of administrative data and the expected average LOS according to the national benchmarks. By comparing the two columns, we can identify the efficiency gap. For example, on average, the geriatrics unit has an overall LOS equal to 12.516 days for its hospitalizations, while the national average LOS was 9.808 days. This would translate into an average loss of more than 2 days for every hospitalization. Tables 9 and 10 in Appendix report all medical units, as well as the number of available beds and the number of hospitalizations in 2018. This information will be used to estimate the managerial impact of the proposed stratification rule. Lastly, Table 11 in Appendix show the correlation matrix among the main inputs introduced in the model definition.

Results
This section presents the results of our prediction models based on hospitalizations in 2018 (Table 3). In detail, taking the selected outcomes into account, the table displays the results in relation to the indexes defined in the previous section, as well as the number of observations (i.e., hospitalizations) used to validate every ANN (1/3 of the sample). Finally, in order to provide a more complete picture of our models and identify possible intervention strategies, Table 4 shows the weight of the covariates   used in the ANNs, which is expressed as Garson indexes [35], and the expected sign of their contribution, which is expressed as a function of the NN synaptic weights [40]. Focusing on Table 3 and considering the macro analysis, the prediction ability of our model is equal to 69.34% (sensitivity) and 48.59% (specificity) in case of hospitalizations longer than the national average; while it is equal to 62.73% (sensitivity) and 51.72% (specificity) in case of hospitalizations longer than the national threshold. In other words, according to the results and the sample under investigation, our prediction models can identify correctly more than sixty percent of the hospitalizations with an expected excessive LOS. Focusing on the micro analysis, we observe a certain level of heterogeneity in our prediction models, which could be due to the internal organization of our medical units, and the impact of our inputs on the correct identification of the outcome under investigation (e.g., scheduling). Obviously, the values of sensitivity and specificity reflect the strategy adopted by authors in orienting the model [39]. Indeed, authors decided to maximize the ability of our prediction models in identifying correctly the subjects with hospitalizations longer than the selected values (i.e., national average or national threshold), guaranteeing the possibility for selected interventions aimed to save financial resources and, even more important, to improve the health conditions of these patients.
These interventions could be oriented to the internal organization of these hospitalizations, coherently with the insights proposed in Table 4. Indeed, depending on the selected outcome and the unit under investigation, the Table 4 indicates the contribution of every covariate in predicting that outcome (expressed as a percentage) and whether the impact is positive or negative (highlighted in the tables in green or red respectively). In other words, the red cells indicate that the variable can decrease the probability of obtaining the selected outcome, while the green cells indicate that the variable can increase the probability of obtaining the selected outcome. Note that we also consider the top 10 diagnoses as covariates (i.e., the diagnoses with the highest number of hospitalizations), although they are not presented here.
Focusing on patients' admission to the DIEM (first column of Table 4), according to the estimated indexes, hospitalization during the night can contribute to the final outcome in the proportion of 2.46% (i.e., 2.46 is the weight of this covariate), reducing the likelihood of a LOS longer than the national average. Similarly, if the admission occurs during the afternoon, the contribution to the final outcome is almost double (i.e., 4.62%) and, even more importantly, its sign is opposite (i.e., increasing the probability of a LOS above the national average). In other words, at the DIEM level (i.e., macro analysis), the admission of patients in the morning or in the afternoon can increase the expected probability of a LOS longer than the national average, with the afternoon having the highest impact, while nighttime hospitalizations can decrease the likelihood of that outcome. Obviously, this information can be combined with the other information (e.g., main diagnosis, day and type of admission), so that managers and medical professionals may identify ways to reduce the current LOS, working both on current admission procedures and scheduling. By looking at the covariates in the micro analysis (i.e., investigation of the single units), we can observe a certain degree of heterogeneity in our results, which is probably due to the different internal organization of the units and the clinical pathways adopted. Note that separate ANN frameworks have been run for each clinical unit; therefore, it is not surprising that the Garson indexes and relative contributions of the covariates are different among them, since they represent the detected heterogeneity. What about the managerial and clinical implications of our results?

Clinical, managerial and economic implications of ANNs prediction models
Let us assume that the DIEM decides to adopt interventions necessary to improve its performance in 2019, based on the prediction models relying on the data extracted for 2018. In particular, let us imagine that the hospital management decides to use dedicated human resources to monitor all the patients identified as positive for the first outcome (i.e., hospitalizations with expected length above the national average and, thus, classified as longer) or, alternatively, to reorganize current hospital admissions according to the results displayed in Tables 4 (e.g., planning non-urgent hospital admissions on specific days). Finally, let us assume that the planned interventions are efficient, that is to say, they are successful in reducing the length of hospitalizations so as to reach the selected benchmark (i.e., the national average).   Figure 2 shows the results of this simulation following the aforementioned hypothesis (i.e., efficient interventions for all patients selected as positive to the first patient management outcome). In detail, the first column presents the observed average length of hospitalizations based on the 2019 activities, while the second column shows the expected average length according to the national benchmarks. Finally, the third column displays the theoretical LOS assuming that the DIEM carries out interventions on the patients identified as positive using the prediction models based on ANNs, and also assuming that the interventions are completely efficient (i.e., the DIEM reduces the length of all the selected hospitalizations so as to reach the national benchmark). By comparing the reported values, the readers can easily observe the impact of the supporting decision rule proposed here, which may significantly reduce the length of hospitalizations. On average, considering the whole department and the macro prediction model, LOS could decrease from 9.41 to 7.23 days, that is to say, on average, the department could make hospitalizations shorter by more than 2 days. Note that, by working on the patients identified as positive to the selected outcome and reducing their LOS, we may obtain a performance indicator that would be much better than the national average (i.e., 7.23 vs. 9.23).
The extra days gained thanks to this intervention could be used for other hospitalizations, lessening the current overcrowding and increasing the revenues of the department. Focusing on the latter outcome, Table 5 shows the economic impact of this improvement, reporting the number of additional hospitalizations and expected additional revenues. These revenues are estimated adopting the average hospitalization values collected in 2018, based on the specific DRG of these hospitalizations and current reimbursement values.
On average, adopting the prediction model presented here to identify patients at risk of prolonged LOS, the DIEM could expect additional revenues equal to more than 12 million Euro. Considering the single clinical units and using the prediction models showed in Table 5, we can observe that the most significant contribution is expected for Neurology, Emergency medicine, Cardiology and Internal medicine (i.e., economic impact higher than 1 million Euro). This is exactly the contribution of our prediction model based on ANNs to the management of patients: it can support medical professionals in the identification of the most critical cases and then, as they carry out the interventions that they deem necessary, it can improve outcomes and the efficiency of clinical units.
What about the second prediction model aimed at identifying outlier hospitalizations? Following the same approach as above and assuming effective interventions on all the patients with the identified outcome, our calculations reveal that the DIEM could shorten the average LOS up to 9.11 days, with a reduction equal to 0.30. This improvement could provide 288 additional hospitalizations and additional revenues amounting to € 1,318,428.27. Note that outlier hospitalizations only represent 3.74% of the total hospitalization sample under investigation (see Table 3), and this is why the expected impact is lower. Taking the clinical units into account, the most relevant economic implications are expected for Internal medicine, with 139 additional hospitalizations and expected additional revenues equal to € 456,320.00.
As highlighted in "Methods" section, the outliers are a sub-sample of the longer hospitalizations, and this is precisely why we decided to adopt both outcomes in our analysis, so that we would be able to estimate an interval for our potential interventions (i.e., between the benchmark and the threshold). Looking at the results, we can identify that interval, which is between € 1,318,428.27 and € 12,153,936.00 in terms of additional revenues and between 288 and 2650 in terms of additional days.
Finally, Table 6 focuses on the interventions that could be adopted to improve LOS, simulated by our prediction models for 2019 and considering three key diagnosis groups (i.e., heart failure, pneumonia, and sepsis). The second column of the table presents the observed average length of hospitalizations, while the third column displays the expected average length of hospitalizations according to the national benchmark. The fourth column, instead, shows the hypothetically achievable LOS if the DIEM adopts interventions on the patients identified by the prediction models based on ANNs, and assuming that these interventions are efficient (i.e., assuming that the DIEM can reduce the length of hospitalizations enough to reach the reference values). Focusing on these three key diagnosis groups rather than on the clinical units, Table 6 indicates where there might be opportunities to reduce LOS through the specific clinical pathways that regulate the treatment of patients. Obviously, this approach provides more easily readable and understandable outputs, shedding new light on the link between LOS and clinical pathways. For example, the prediction models are able to identify and support the appropriate monitoring of patients with heart failure and, assuming that the interventions are effective, 1437 additional days might become available for hospitalizations, which means 200 additional hospitalizations (considering the average LOS for this diagnosis group). According to our data, 90% of the additional days and hospitalizations are concentrated in two specific types of diagnosis: "unspecified congestive heart failure (code 4280)" and "left heart failure (code 4281)". Therefore, medical professionals may work to modify the current clinical pathways for these two conditions, belonging to the macro classification "heart failure", in order to optimize LOS.

Discussion
The problem of correct inpatient scheduling is extremely significant for healthcare management [41,42]. On the one hand, extended LOS duration can have negative effects on the supply of healthcare treatments, reducing patient accessibility. On the other hand, there can also be negative effects on the budget of healthcare suppliers, creating missed opportunities to increase hospital revenues by means of other treatments and/or additional hospitalizations. Finally, considering the market of healthcare services, social planners would have to tackle the consequences of lower supplier competitiveness, which could be even more relevant if we consider the current regulations on patients' rights in cross-border healthcare (i.e., Directive 2011/24/EU of the European Parliament and of the Council of 9 March 2011). According to our evidence, we can identify the main areas where managers could implement an internal reorganization to increase hospital's efficiency and the effectiveness of its treatments, as well as the expected economic outcomes of these interventions. Indeed, the interpretation of the collected results can shed new light on the internal organization of medical units and its role as LOS driver. For instance, at macro level (i.e., considering the DIEM), results suggest that hospitalizations during the weekend (i.e., Saturday or Sunday) can have a negative impact on the expected LOS, increasing its probability of being longer than the national average. At the same time, ceteris paribus, hospitalizations during the night can decrease this probability. These insights could be clearly explained by the internal organization of these medical units and the impossibility to supply promptly treatments and interventions to hospitalized patients (e.g., during the weekend). At the same time, considering hospitalization during the night, we can imagine there could be more opportunities to plan properly patients' therapeutic path, as well as more possibilities to pay appropriate attention to their diagnosis and the successive interventions. This could be explained by a lower demand of care in this specific moment (i.e., during the night), which can lead to minimize waste of time and resources in the initial steps.
Nevertheless, managers cannot control all LOS drivers. Results suggest that planned hospitalizations have a longer LOS, which is perfectly coherent with the type of access and it is outside the control of local management. With respect to this specific point, we might support the necessity to coordinate the territorial competence of ERs and the supply of treatments by policy makers, so that there could be an equilibrium between planned and urgent hospitalizations (according to Table 1, the number of non-urgent hospitalizations is less than 20%). Indeed, in order to contain the health expenditure, the number of medical centers and the supply of treatments have decreased significantly in the last years, creating a congestion effect that has increased the waiting lists and the access to the medical facilities by urgent cases (up to 80% in our case study).
These results are coherent with previous findings. On the one hand, literature emphasizes that medical units have to take care of urgent admissions and this adds a lot to the resulting inpatient pathway [43]. On the other hand, there are factors that strictly depend on the organizational process that could affect LOS [44,45]. Indeed, coherently with our results, literature suggests that much of the variation in hospitals' LOS is not attributable to patient illness, but rather it is due to differences in practice style [46] and it is potentially avoidable [47]. Moreover, as highlighted in our results, daily and timely kinetics of admissions are a known factor linked to the duration of hospital stay [44,48].

Limits
Even though our results are interesting and may effectively support physicians in their daily professional activities, our work also has some limits due to the available data and main assumptions (i.e., reference values and interventions).
First of all, the administrative data should be supported by more precise clinical information, if available. Indeed, gathering clinical and administrative information might help improve the current prediction models, reducing the number of false positives (i.e., lower cost in the patient monitoring phase, since there would be no risk of hospitalizations that are "too long"), as well as the number of false negatives (i.e., patients that are not monitored since there are wrong assumptions on their expected LOS).
Secondly, our results are solely based on the data collected in the specific department under investigation, with its specific internal organization, which clearly affects the final outcomes. Accordingly, our approach might prove effective only if adopted by health providers with similar characteristics and clinical pathways. Accordingly, there are clear limits in the external validity of our evidence, which are driven by the sample, the specific internal organization and the socio-economic environment under investigation.
Finally, we work under the assumption that the reference values are realistic and achievable, which we cannot take for granted. Taking the first outcome into account (i.e., longer LOS), the adopted reference values are based on the national average according to the DRG classification. This means that they are estimated considering all hospitalizations in Italy, classified under that specific DRG code upon discharge and assuming homogeneous procedures that could make the collected values comparable. Therefore, even if the data could be normalized to avoid outliers, the reference values might still be influenced by the sample selection, i.e., by the heterogeneity among regional healthcare systems and the specific procedures adopted by providers. If data become available, our DIEM should be compared with other departments that work within the same regional healthcare system

Conclusion
Adopting available national reference values and focusing on a Department of Internal and Emergency Medicine (DIEM) located in the North-West of Italy, this work assesses prediction models of hospitalizations with LOS longer than the selected benchmarks and thresholds. The prediction models investigated in this case study are based on Artificial Neural Networks (ANNs) and, according to our results, they might provide administrators and medical professionals with an effective tool to predict the outcome of hospital admissions and design interventions to improve hospital efficiency. Indeed, assuming effective interventions on all the subjects identified by the prediction models, the management of the DIEM could reduce the average LOS by up to 2 days, guaranteeing more than 2000 additional hospitalizations in a year. An innovative approach that might be based on smart solutions (e.g., a mobile app) to customize the stratification of patients admitted to the DIEM and to support the professionals' decision making.