The approximation technique is restricted to three-state progressive, time-homogenous Markov disease processes such as the one shown in Fig. 1 [15]. *Progressive* means that transitions are irreversible (i.e. patients cannot return to health from illness). *Time-homogenous* means that transition rates do not change over time. *Markov* means that transition rates do not depend on disease history; in other words, the probability that a patient transitions from state *x* to state *y* during a particular time period depends only on their current state and is independent of the states they occupied previously.

The data needed to use the approximation technique can be abstracted from most articles reporting PFS and OS analyses. The number of patients experiencing an event and the number of censored patients in both the PFS (\(N_{pfs}^e\) and \(N_{pfs}^c\)) and OS (\(N_{os}^e\) and \(N_{os}^c\)) analyses can be determined from the article text or the patients-at-risk table. To obtain the remaining data points, the PFS and OS KM curves need to be digitized. Digitized KM curves can then be used to reconstruct individual patient data using validated algorithms to determine the event times in the PFS and OS analyses [16]. The approximation technique requires that we make note of the maximum observation time (event or censoring) in the PFS and OS analyses (\(\tau _{pfs}\) and \(\tau _{os}\) respectively). The areas under the PFS and OS curves (\(AUC_{pfs}\) and \(AUC_{os}\) respectively) are calculated by summing the area under each step of the KM curve.
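The step-wise summation can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation; the function name `km_auc` and its inputs (the step times of a digitized KM curve and the survival probability immediately after each step) are assumptions made for the example.

```python
def km_auc(step_times, surv_after, tau):
    """Area under a KM step function on [0, tau].

    step_times : increasing times at which the curve drops (hypothetical input)
    surv_after : survival probability immediately after each drop
    tau        : truncation time, e.g. the maximum observation time
    """
    area = 0.0
    prev_t, prev_s = 0.0, 1.0  # the curve starts at S(0) = 1
    for t, s in zip(step_times, surv_after):
        if t >= tau:
            break
        # Rectangle between the previous drop and this one.
        area += prev_s * (t - prev_t)
        prev_t, prev_s = t, s
    # Final rectangle from the last drop before tau up to tau.
    area += prev_s * (tau - prev_t)
    return area
```

Calling `km_auc` with the digitized PFS steps and \(\tau _{pfs}\) would give \(AUC_{pfs}\); the same call with the OS steps and \(\tau _{os}\) would give \(AUC_{os}\).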

We denote \(h \rightarrow i\), \(h \rightarrow d\), and \(i \rightarrow d\) transition rates as \(\lambda _{hi}\), \(\lambda _{hd}\), and \(\lambda _{id}\). For the time-homogenous disease processes (i.e. constant transition rates), exit times from the (i) healthy state (i.e. \(h \rightarrow i\) or \(h \rightarrow d\) transition) and (ii) ill state (i.e. \(i \rightarrow d\) transition) are exponentially distributed. Furthermore, once a patient exits health, the probability that they make an \(h \rightarrow d\) transition is

$$\begin{aligned} \rho =\frac{\lambda _{hd}}{\lambda _{hi}+\lambda _{hd}} \end{aligned}$$

We will refer to \(\rho\) as the risk of death for healthy patients. As there are only two possible transitions out of health, the probability that a transition out of the health state is an \(h \rightarrow i\) transition is \(1-\rho\).

The mean time of exit from the healthy state (i.e. mean progression-free survival time) is a biased measure in the presence of right censoring [17]. Instead, we calculate the restricted mean progression-free survival time (\({\mathrm {RMPFST}}^{-\tau }\)), which is interpreted as the mean progression-free survival time if observation is restricted to a truncation time \(\tau\) [18]. Since the exit time from health is exponentially distributed, the \({\mathrm {RMPFST}}^{-\tau }\) can be calculated as

$$\begin{aligned} {\mathrm {RMPFST}}^{-\tau }=\frac{1-e^{-\left( \lambda _{hi}+ \lambda _{hd}\right) \tau }}{\lambda _{hi}+\lambda _{hd}}. \end{aligned}$$

(1)

By definition, the area under the PFS curve is equal to \({\mathrm {RMPFST}}^{-\tau }\) when \(\tau\) is set to the maximum observation time in the PFS analysis, \(\tau _{pfs}\) [19, 20]. Using Formula 1, we can then numerically solve for \(\lambda _{hi}+\lambda _{hd}\) using standard algorithmic methods [21]. Simultaneous events in the PFS and OS analyses indicate \(h \rightarrow d\) transitions. Therefore, letting \(N_{simul}\) denote the number of events that occur at the same time in both analyses, we can approximate the risk of death for healthy patients as

$$\begin{aligned} \rho \approx \frac{N_{simul}}{N_{pfs}^e}. \end{aligned}$$

(2)
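As an illustration of how Formulas 1 and 2 might be used together, the sketch below solves Formula 1 for \(\lambda _{hi}+\lambda _{hd}\) by bisection and then splits the total rate using \(\rho\). Bisection is only one of several possible root-finding methods and is chosen here as an assumption, not because the source prescribes it; all function names are hypothetical. The split relies on the definition of \(\rho\), which implies \(\lambda _{hd}=\rho (\lambda _{hi}+\lambda _{hd})\).

```python
import math


def rmst_exponential(rate, tau):
    """Formula 1: restricted mean of an exponential exit time with the given rate."""
    return -math.expm1(-rate * tau) / rate


def solve_exit_rate(auc_pfs, tau_pfs, tol=1e-12):
    """Find the total exit rate whose restricted mean equals auc_pfs (bisection)."""
    if not 0.0 < auc_pfs < tau_pfs:
        raise ValueError("auc_pfs must lie strictly between 0 and tau_pfs")
    lo, hi = 1e-12, 1.0
    # The restricted mean decreases as the rate increases, so grow hi
    # until the restricted mean falls below the target.
    while rmst_exponential(hi, tau_pfs) > auc_pfs:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if rmst_exponential(mid, tau_pfs) > auc_pfs:
            lo = mid  # rate too small: restricted mean still too large
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)


def split_exit_rate(total_rate, n_simul, n_pfs_events):
    """Formula 2, then lambda_hd = rho * total and lambda_hi = (1 - rho) * total."""
    rho = n_simul / n_pfs_events
    return (1.0 - rho) * total_rate, rho * total_rate
```

For example, `split_exit_rate(solve_exit_rate(auc_pfs, tau_pfs), n_simul, n_pfs_events)` would return the pair (\(\lambda _{hi}\), \(\lambda _{hd}\)) under these assumptions.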

To approximate \(\lambda _{id}\) we need to use information gathered from the OS analysis. It is more challenging to define an exact formula for the restricted mean overall survival time (\(\mathrm {RMOST}^{-\tau }\)) than for the \({\mathrm {RMPFST}}^{-\tau }\) because exit from the alive state (i.e. healthy or ill) is defined by a mixture of two exponential distributions: exit from health and exit from illness. However, if we know the death times \(o_i^e\) and censoring times \(o_j^c\) for a cohort of initially alive patients, of whom \(N_{os}^e\) had an observed event and \(N_{os}^c\) were right censored, we can approximate \({\mathrm {RMOST}}^{-\tau }\) truncated to \(\tau _{os}\), \({\mathrm {RMOST}}^{-\tau _{os}}\), using inverse probability weighting [22]

$$\begin{aligned} {\mathrm {RMOST}}^{-\tau _{os}} \approx \left( \left( \frac{N_{os}^e+N_{os}^c}{N_{os}^e}\right) \sum \limits _{i=1}^{N_{os}^e}o_i^e + \sum \limits _{j=1}^{N_{os}^c}o_j^c \right) \left( \frac{1}{N_{os}^e+N_{os}^c}\right) . \end{aligned}$$

(3)
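Formula 3 translates directly into code. The following is a minimal sketch that evaluates the expression exactly as written above; the function name `rmost_ipw` and the list inputs are assumptions for illustration.

```python
def rmost_ipw(death_times, censor_times):
    """Formula 3: weighted approximation of the restricted mean OS time.

    death_times  : observed death times o_i^e (hypothetical input list)
    censor_times : right-censoring times o_j^c (hypothetical input list)
    """
    n_e = len(death_times)
    n_c = len(censor_times)
    if n_e == 0:
        raise ValueError("at least one observed event is required")
    n = n_e + n_c
    # Death times are up-weighted by N / N^e; censoring times enter unweighted.
    weighted_sum = (n / n_e) * sum(death_times) + sum(censor_times)
    return weighted_sum / n
```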

Next, we determine the total person-time of observation in the OS analysis

$$\begin{aligned} E_{os}= \sum \limits _{i=1}^{N_{os}^e}o_i^e + \sum \limits _{j=1}^{N_{os}^c}o_j^c. \end{aligned}$$

(4)

If censoring times are not denoted on the OS curve, it is not possible to determine \(o_j^c\). However, we can rearrange Formula 3 to yield

$$\begin{aligned} \sum \limits _{j=1}^{N_{os}^c}o_j^c \approx {\mathrm {RMOST}}^{-\tau _{os}} \left( N_{os}^e+N_{os}^c \right) - \sum \limits _{i=1}^{N_{os}^e}o_i^e\left( \frac{N_{os}^e+N_{os}^c}{N_{os}^e}\right) . \end{aligned}$$

If we substitute this relationship into Formula 4 we obtain

$$\begin{aligned} E_{os}&\approx \sum \limits _{i=1}^{N_{os}^e}o_i^e+ {\mathrm {RMOST}}^{-\tau _{os}} \left( N_{os}^e+N_{os}^c \right) -\sum \limits _{i=1}^{N_{os}^e}o_i^e \left( \frac{N_{os}^e+N_{os}^c}{N_{os}^e}\right) \nonumber \\&\approx {\mathrm {RMOST}}^{-\tau _{os}}\left( N_{os}^e+N_{os}^c \right) + \sum \limits _{i=1}^{N_{os}^e}o_i^e \left( 1- \frac{N_{os}^e+N_{os}^c}{N_{os}^e} \right) \end{aligned}$$

(5)
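The sketch below evaluates Formula 5 for the case where censoring times are unavailable. It assumes, by analogy with the PFS case, that the restricted mean passed in is taken from the area under the corresponding KM curve; that choice, along with the function and parameter names, is an assumption of this example rather than something stated explicitly in the text.

```python
def person_time_without_censor_times(restricted_mean, event_times, n_censored):
    """Formula 5: total person-time when censoring times are not reported.

    restricted_mean : restricted mean survival time (assumed here to be the
                      area under the KM curve up to the maximum observation time)
    event_times     : observed event times o_i^e
    n_censored      : number of right-censored patients N^c
    """
    n_e = len(event_times)
    if n_e == 0:
        raise ValueError("at least one observed event is required")
    n = n_e + n_censored
    return restricted_mean * n + sum(event_times) * (1.0 - n / n_e)
```

When the restricted mean is itself computed from Formula 3 with known censoring times, this expression reduces to the plain sum in Formula 4, which offers a quick consistency check.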

We can repeat the same calculations using the corresponding data from the PFS analysis to approximate the total person-time of observation in the PFS analysis, \(E_{pfs}\). We then approximate the total person-time of observation in the ill state as

$$\begin{aligned} E_{ill} \approx E_{os} - E_{pfs}. \end{aligned}$$

(6)

If we make the assumption that the number of \(i \rightarrow d\) transitions is

$$\begin{aligned} N_{id} \approx N_{os}^e - N_{simul}, \end{aligned}$$

(7)

we can compute [23]

$$\begin{aligned} \lambda _{id} \approx \frac{N_{id}}{E_{ill}}. \end{aligned}$$

(8)
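Formulas 6 to 8 can be combined into a single step, sketched below. The function name and the guard against non-positive person-time are assumptions added for illustration.

```python
def ill_to_death_rate(e_os, e_pfs, n_os_events, n_simul):
    """Formulas 6-8: approximate the ill-to-death transition rate.

    e_os, e_pfs : total person-time in the OS and PFS analyses
    n_os_events : number of deaths observed in the OS analysis
    n_simul     : number of simultaneous PFS and OS events
    """
    e_ill = e_os - e_pfs          # Formula 6: person-time spent in the ill state
    if e_ill <= 0:
        raise ValueError("person-time in the ill state must be positive")
    n_id = n_os_events - n_simul  # Formula 7: deaths that occurred from illness
    return n_id / e_ill           # Formula 8: events per unit person-time
```

For instance, with 500 and 300 units of person-time in the OS and PFS analyses, 50 deaths and 10 simultaneous events, the sketch returns 40 / 200 = 0.2 events per unit of time.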