---
title: "ROPE-based trial design for single-arm one-stage phase II trials with binary endpoints"
author: | 
  | Riko Kelter
  | Institute of Medical Statistics and Computational Biology
  | Faculty of Medicine
  | University of Cologne
  | Cologne, Germany
date: "`r format(Sys.Date(), '%d %B %Y')`"
bibliography: references.bib
output:
  rmarkdown::html_vignette:
    mathjax: default
    includes:
      in_header: mathjax-config.html
vignette: >
  %\VignetteIndexEntry{ROPE-based trial design for single-arm one-stage phase II trials with binary endpoints}
  %\VignetteEngine{knitr::rmarkdown}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.width  = 6,
  fig.height = 5,
  dpi        = 100,
  fig.retina = 1,
  dev        = "png",
  dev.args   = list(type = "cairo-png")
)

library(bfbin2arm)
```

## Introduction

This vignette illustrates how to use `design_singlearm_onestage_rope()` to calibrate
ROPE-based equivalence designs for single-arm phase II trials with binary endpoints. ROPE stands for the region of practical equivalence, which has been proposed by @Kruschke2018, @kruschkeDoingBayesianData2014 and @Kruschke2018a. The idea itself is older and appears under various names in different contexts, see @Kelter2021BMCHodgesLehmann, @Liao2020, @lindeDecisionsEquivalenceComparison2023, @Lakens2018, @Wellek2010 and @panUntappedPotentialBayesian2025. Replacing the test of a point-null hypothesis with the test of a small interval hypothesis dates back at least to @Hodges1954.

## Setup

We consider a single-arm binomial model

\[
Y \mid p \sim \mathrm{Binomial}(n, p),
\]

where \(Y\) is the number of responders among \(n\) patients and \(p \in (0,1)\) is
the true response probability under the experimental treatment.
We fix a benchmark response rate \(p_0\) (e.g. historical control or standard of care)
and define the risk difference

\[
\Delta = p - p_0.
\]

We work with a symmetric ROPE formulation on the risk-difference scale.
Let \(\delta > 0\) be the equivalence margin. In terms of \(\Delta\) we define

\[
H_0:\; |\Delta| > \delta,
\]

\[
H_1:\; |\Delta| \le \delta.
\]

Equivalently, on the response-probability scale the ROPE is

\[
[p_0 - \delta,\; p_0 + \delta] \cap (0,1),
\]

and the hypotheses can be written as

\[
H_0:\; p \notin [\,p_0 - \delta,\; p_0 + \delta\,],
\]

\[
H_1:\; p \in [\,p_0 - \delta,\; p_0 + \delta\,].
\]

## The region of practical equivalence (ROPE) 

The **region of practical equivalence (ROPE)** on the risk-difference scale is

\[
\mathcal{R}_\Delta = [-\delta, \delta],
\]

where \(\delta > 0\) is a prespecified equivalence margin.
Equivalently, on the response-probability scale the ROPE for \(p\) is

\[
\mathcal{R}_p = [p_0 - \delta,\; p_0 + \delta] \cap (0,1).
\]

Given a beta analysis prior

\[
p \sim \mathrm{Beta}(a, b),
\]

the posterior after observing \(Y = y\) is

\[
p \mid y \sim \mathrm{Beta}(a + y,\; b + n - y),
\]

and the posterior ROPE probability is

\[
\Pr\bigl(p \in \mathcal{R}_p \mid y\bigr)
  = F_{\mathrm{Beta}(a+y,\,b+n-y)}(p_0 + \delta)
  - F_{\mathrm{Beta}(a+y,\,b+n-y)}(p_0 - \delta),
\]

with endpoints truncated to \([0,1]\) if needed. A ROPE-based equivalence decision rule declares **practical equivalence** if

\[
\Pr\bigl(p \in \mathcal{R}_p \mid y\bigr) \ge \gamma_{\mathrm{eq}},
\]

where \(\gamma_{\mathrm{eq}} \in (0.5, 1)\) is a pre-specified evidence threshold.
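
As a minimal sketch of the formulas above, the posterior ROPE probability can be computed in base R with `pbeta()`. The helper name `rope_posterior_prob()` and the outcome of 30 responders among 100 patients are illustrative assumptions and not part of the package API.

```{r}
# Hypothetical helper (not part of bfbin2arm): posterior ROPE probability
# under a Beta(a, b) analysis prior after observing y responders out of n.
rope_posterior_prob <- function(y, n, p0, delta, a = 1, b = 1) {
  lower <- max(0, p0 - delta)  # truncate the ROPE endpoints to [0, 1]
  upper <- min(1, p0 + delta)
  pbeta(upper, a + y, b + n - y) - pbeta(lower, a + y, b + n - y)
}

# Illustrative outcome: 30 responders out of 100, p0 = 0.30, delta = 0.12
pr_rope <- rope_posterior_prob(y = 30, n = 100, p0 = 0.30, delta = 0.12)
pr_rope
pr_rope >= 0.80  # practical equivalence is declared if TRUE (gamma_eq = 0.80)
```

Because `pbeta()` is vectorized, the same helper can be evaluated for all responder counts `0:n` at once, which is convenient when studying a design.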

## Design and analysis priors

At the **design stage** we distinguish between three priors:

- an **analysis prior** \(\mathrm{Beta}(a, b)\) used to compute posterior ROPE probabilities,
- a **design prior under equivalence** \(H_1: \Delta \in [-\delta, \delta]\),
  typically \(\mathrm{Beta}(a_1, b_1)\) centred near \(p_0\),
- a **design prior under non-equivalence** \(H_0: \Delta \notin [-\delta, \delta]\),
  typically \(\mathrm{Beta}(a_0, b_0)\) centred away from the ROPE.

These design priors induce beta–binomial predictive distributions for \(Y\) under
equivalence and non-equivalence, respectively. Under the equivalence design prior
\(\pi_1\) we define **ROPE-based Bayesian power** as

\[
\text{Power}_\text{ROPE}(n)
= \Pr_{\pi_1}\bigl( \Pr(p \in \mathcal{R}_p \mid Y) \ge \gamma_{\mathrm{eq}} \bigr),
\]

and under the non-equivalence design prior \(\pi_0\) we define the **ROPE-based
Bayesian type-I error** as

\[
\alpha_\text{ROPE}(n)
= \Pr_{\pi_0}\bigl( \Pr(p \in \mathcal{R}_p \mid Y) \ge \gamma_{\mathrm{eq}} \bigr).
\]
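
The two definitions above can be evaluated exactly by summing the beta-binomial predictive probabilities over all responder counts that trigger the equivalence decision. The following sketch assumes illustrative values (`n = 100`, \(p_0 = 0.30\), \(\delta = 0.12\), \(\gamma_{\mathrm{eq}} = 0.80\), design priors \(\mathrm{Beta}(36, 84)\) and \(\mathrm{Beta}(60, 40)\)); the helper names are hypothetical and the package's internal implementation may differ.

```{r}
# Hypothetical helpers (not the package implementation).
# Beta-binomial predictive pmf of Y for n patients under a Beta(a, b) design prior.
dbetabinom_sketch <- function(y, n, a, b) {
  exp(lchoose(n, y) + lbeta(y + a, n - y + b) - lbeta(a, b))
}

# Predictive probability of declaring equivalence at sample size n:
# analysis prior Beta(a, b), design prior Beta(da, db).
prob_equivalence <- function(n, p0, delta, gamma_eq, a, b, da, db) {
  y <- 0:n
  lower <- max(0, p0 - delta)
  upper <- min(1, p0 + delta)
  post_rope <- pbeta(upper, a + y, b + n - y) - pbeta(lower, a + y, b + n - y)
  sum(dbetabinom_sketch(y, n, da, db)[post_rope >= gamma_eq])
}

# Assumed illustrative values, see the lead-in above
power_rope <- prob_equivalence(100, 0.30, 0.12, 0.80, a = 1, b = 1, da = 36, db = 84)
alpha_rope <- prob_equivalence(100, 0.30, 0.12, 0.80, a = 1, b = 1, da = 60, db = 40)
c(power = power_rope, type1 = alpha_rope)
```

The only difference between the two calls is the design prior, which mirrors the definitions: the decision rule is the same, only the predictive distribution of \(Y\) changes.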

## ROPE decision illustrations

In this section we illustrate the ROPE-based decision rule for four prototypical
outcomes in a single-arm binomial model with analysis prior
\(p \sim \mathrm{Beta}(1,1)\), benchmark response rate \(p_0 = 0.30\), and ROPE
\(\mathcal{R}_p = [p_0 - \delta, p_0 + \delta] = [0.18, 0.42]\) with
\(\delta = 0.12\).

For an observed responder count \(Y = y\) out of \(n\) patients, the posterior is

\[
p \mid y \sim \mathrm{Beta}(a + y,\; b + n - y),
\]

and the symmetric ROPE probability is

\[
\Pr\bigl(|p - p_0| \le \delta \mid y\bigr)
  = \Pr(p_0 - \delta \le p \le p_0 + \delta \mid y).
\]

We adopt the following simple decision rule:

- **Equivalence accepted** if \(\Pr(|p - p_0| \le \delta \mid y) \ge \gamma_{\mathrm{eq}}\).
- **Non-equivalence accepted** if \(\Pr(|p - p_0| > \delta \mid y) \ge \gamma_{\mathrm{diff}}\).
- **Indecisive** otherwise,

with \(\gamma_{\mathrm{eq}} = \gamma_{\mathrm{diff}} = 0.80\) in the examples below.
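
The three-way rule can be sketched in a few lines of base R. The helper `rope_decision()` is a hypothetical name introduced for illustration; the responder counts are those of the four illustrative scenarios in this section, each with \(n = 100\).

```{r}
# Hypothetical helper illustrating the three-way rule; not a package function.
rope_decision <- function(y, n, p0 = 0.30, delta = 0.12, a = 1, b = 1,
                          gamma_eq = 0.80, gamma_diff = 0.80) {
  lower <- max(0, p0 - delta)
  upper <- min(1, p0 + delta)
  pr_in <- pbeta(upper, a + y, b + n - y) - pbeta(lower, a + y, b + n - y)
  ifelse(pr_in >= gamma_eq, "Equivalence accepted",
         ifelse(1 - pr_in >= gamma_diff, "Non-equivalence accepted", "Indecisive"))
}

# Responder counts of the four scenarios (n = 100 each)
y_obs <- c(30, 35, 18, 10)
data.frame(y = y_obs, decision = rope_decision(y_obs, n = 100))
```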

```{r, echo = FALSE}
plot_rope_posterior <- function(n, y,
                                p0 = 0.30,
                                delta = 0.12,
                                a = 1, b = 1,
                                gamma_eq = 0.80,
                                gamma_diff = 0.80,
                                main = "") {
  shape1 <- a + y
  shape2 <- b + n - y
  p_min  <- max(0, p0 - delta)
  p_max  <- min(1, p0 + delta)

  p_grid <- seq(0, 1, length.out = 1000)
  dens   <- dbeta(p_grid, shape1, shape2)

  rope_prob <- pbeta(p_max, shape1, shape2) - pbeta(p_min, shape1, shape2)
  diff_prob <- 1 - rope_prob

  decision <- if (rope_prob >= gamma_eq) {
    "Equivalence accepted"
  } else if (diff_prob >= gamma_diff) {
    "Non-equivalence accepted"
  } else {
    "Indecisive"
  }

  plot(p_grid, dens, type = "n",
       xlab = expression(p), ylab = "Posterior density",
       main = main)

  usr <- par("usr")
  x_min <- usr[1]
  x_max <- usr[2]
  y_min <- usr[3]
  y_max <- usr[4]

  # Lighter matte background regions
  h0_col <- adjustcolor("#DCEAF7", alpha.f = 0.55)  # light matte blue
  h1_col <- adjustcolor("#F7DDDD", alpha.f = 0.55)  # light matte red

  # H0: outside ROPE
  rect(xleft = x_min, ybottom = y_min,
       xright = p_min, ytop = y_max,
       col = h0_col, border = NA)
  rect(xleft = p_max, ybottom = y_min,
       xright = x_max, ytop = y_max,
       col = h0_col, border = NA)

  # H1: inside ROPE
  rect(xleft = p_min, ybottom = y_min,
       xright = p_max, ytop = y_max,
       col = h1_col, border = NA)

  # Posterior density and benchmark
  lines(p_grid, dens, lwd = 2)
  abline(v = p0, lty = 2)

  # Build plotmath label explicitly
  rope_label <- expression(scriptstyle(R)[p])

  text(x = (p_min + p_max) / 2 + 0.05,
       y = y_min + 0.06 * (y_max - y_min),
       labels = rope_label,
       cex = 1.05)

  text(x = (x_min + p_min) / 2,
       y = y_min + 0.50 * (y_max - y_min),
       labels = expression(H[0]),
       col = "#5B84B1",
       cex = 1.1)

  text(x = (p_max + x_max) / 2,
       y = y_min + 0.50 * (y_max - y_min),
       labels = expression(H[0]),
       col = "#5B84B1",
       cex = 1.1)

  text(x = (p_min + p_max) / 2,
       y = y_min + 0.78 * (y_max - y_min),
       labels = expression(H[1]),
       col = "#C06C84",
       cex = 1.1)

  legend("topright",
         legend = c(
           sprintf("y = %d / n = %d", y, n),
           sprintf("Pr(ROPE | y) = %.2f", rope_prob),
           sprintf("Pr(outside ROPE | y) = %.2f", diff_prob),
           sprintf("Decision: %s", decision)
         ),
         bty = "n")
}
```

### 1) Equivalence accepted

We choose an outcome \((n, y)\) for which the posterior is concentrated inside
the ROPE and \(\Pr(|p - p_0| \le \delta \mid y) \ge \gamma_{\mathrm{eq}}\),
so the decision is to **accept equivalence**.

```{r, echo = FALSE, eval = FALSE}
par(mar = c(4, 4, 3, 1))
plot_rope_posterior(
  n = 100,
  y = 30,             # close to p0 * n = 30
  main = "Scenario 1: Equivalence accepted"
)
```
```{r, echo = FALSE, out.width = "80%", fig.align = "center", fig.cap = "Figure 1: Illustration of the first possible scenario in a ROPE-based clinical phase II trial with binary endpoints: Equivalence is accepted, because sufficient posterior probability mass concentrates inside the ROPE. The true data-generating process follows the alternative hypothesis, that is, equivalence indeed holds."}
knitr::include_graphics("figures/singlearm-onestage-rope-scenario1.png")
```

The plot illustrates this first possible outcome.

### 2) Type-I error: equivalence concluded under the null hypothesis

Conceptually, a type-I error occurs when the *true* data-generating process
is non-equivalent (e.g. \(p = 0.55\) or 0.60), but the observed data still lead
the ROPE rule to **accept equivalence**. Thus, $H_0$ is true and $p \notin [\,p_0 - \delta,\; p_0 + \delta\,]$ holds. 

In this plot we **do not** change the posterior calculation: the posterior is always
conditional on the observed \((n,y)\) and the analysis prior. To illustrate a
type-I error, we choose \((n,y)\) such that:

- \(y\) is plausible under a non-equivalence scenario (e.g. generated from
  \(p = 0.55\)), **and**
- the resulting posterior still satisfies
  \(\Pr(|p - p_0| \le \delta \mid y) \ge \gamma_{\mathrm{eq}}\).

For illustration we tune \(y\) so that this happens:

```{r echo = FALSE, eval = FALSE}
par(mar = c(4, 4, 3, 1))
plot_rope_posterior(
  n = 100,
  y = 35,             # pick y so Pr(ROPE|y) >= 0.8 although the true p exceeds p0 + delta
  main = "Scenario 2: Equivalence accepted (type-I error case)"
)
```
```{r echo = FALSE, out.width = "80%", fig.align = "center", fig.cap = "Figure 2: Illustration of the second possible scenario in a ROPE-based clinical phase II trial with binary endpoints: Equivalence is accepted, because sufficient posterior probability mass concentrates inside the ROPE. In contrast to the first possible scenario, the true data-generating process follows the null hypothesis. Thus, a ROPE-based type-I-error occurs."}
knitr::include_graphics("figures/singlearm-onestage-rope-scenario2.png")
```


In this scenario, in contrast to scenario 1 above, the *true* \(p\) lies
outside the ROPE (under $H_0$), but due to sampling variability the posterior
still concentrates enough mass inside the ROPE to meet the equivalence threshold.
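
To get a feeling for how rare such an outcome is, one can fix the true response probability at a non-equivalent value and compute the binomial probability that the observed count falls into the equivalence decision region. The sketch below assumes \(n = 100\), a true value \(p = 0.55\), the ROPE \([0.18, 0.42]\), a \(\mathrm{Beta}(1,1)\) analysis prior and \(\gamma_{\mathrm{eq}} = 0.80\). Note that this is a calculation for a fixed \(p\), not the design-prior-averaged type-I error defined earlier, and the variable names are illustrative.

```{r}
# Sketch: how often does the ROPE rule accept equivalence when the true
# response probability is fixed at a non-equivalent value (assumed p = 0.55)?
n_pat  <- 100
y_all  <- 0:n_pat
pr_in  <- pbeta(0.42, 1 + y_all, 1 + n_pat - y_all) -
  pbeta(0.18, 1 + y_all, 1 + n_pat - y_all)
accept <- pr_in >= 0.80                  # outcomes y that lead to "equivalence"

range(y_all[accept])                     # equivalence decision region at n = 100
p_false_eq <- sum(dbinom(y_all[accept], n_pat, 0.55))
p_false_eq                               # Pr(accept equivalence | true p = 0.55)
```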

### 3) Indecisive result

Here we choose \((n,y)\) such that neither threshold is reached:

- \(\Pr(|p - p_0| \le \delta \mid y) < \gamma_{\mathrm{eq}}\),
- \(\Pr(|p - p_0| > \delta \mid y) < \gamma_{\mathrm{diff}}\).

The posterior spreads substantial mass both inside and outside the ROPE, and
the decision is **indecisive**.

```{r echo = FALSE, eval = FALSE}
par(mar = c(4, 4, 3, 1))
plot_rope_posterior(
  n = 100,
  y = 18,             # tuned so ROPE probability is between ~0.3 and 0.7
  main = "Scenario 3: Indecisive (posterior straddles ROPE)"
)
```
```{r echo = FALSE, out.width = "80%", fig.align = "center", fig.cap = "Figure 3: Illustration of the third possible scenario in a ROPE-based clinical phase II trial with binary endpoints: The result is indecisive, because neither does sufficient posterior probability mass concentrate inside the ROPE, nor outside the ROPE."}
knitr::include_graphics("figures/singlearm-onestage-rope-scenario3.png")
```

### 4) Clear non-equivalence

Finally, we choose an outcome where the posterior lies mostly outside the ROPE,
so that \(\Pr(|p - p_0| > \delta \mid y) \ge \gamma_{\mathrm{diff}}\) and we
**accept non-equivalence**.

```{r echo = FALSE, eval = FALSE}
par(mar = c(4, 4, 3, 1))
plot_rope_posterior(
  n = 100,
  y = 10,             # clearly below the ROPE region
  main = "Scenario 4: Non-equivalence accepted"
)
```
```{r echo = FALSE, out.width = "80%", fig.align = "center", fig.cap = "Figure 4: Illustration of the fourth possible scenario in a ROPE-based clinical phase II trial with binary endpoints: Non-equivalence is accepted, because sufficient posterior probability mass concentrates outside the ROPE."}
knitr::include_graphics("figures/singlearm-onestage-rope-scenario4.png")
```
In this last case, the posterior mass lies almost entirely below the ROPE, which indicates that the treatment performs worse than the standard of care with response probability $p_0$.

## A first ROPE-based design example

We now provide a simple example of the calibration function
`design_singlearm_onestage_rope()`, which calibrates a single-arm one-stage phase II
design using the ROPE as the primary measure of evidence.

We consider a setting with benchmark response rate \(p_0 = 0.30\) and regard
differences up to 0.12 as clinically negligible. Thus the ROPE on \(p\) is
\(\mathcal{R}_p = [0.18, 0.42]\).

We use:

- a **uniform analysis prior** \(\mathrm{Beta}(1,1)\),
- a **non-equivalence design prior** \(\mathrm{Beta}(60,40)\) with mean 0.60,
  representing clearly superior response compared to 0.30 (non-equivalence); this is the design prior under $H_0$,
- an **equivalence design prior** \(\mathrm{Beta}(36,84)\) with mean 0.30,
  representing plausible equivalence scenarios; this is the design prior under $H_1$,
- an **equivalence threshold** \(\gamma_{\mathrm{eq}} = 0.80\),
- a **target ROPE-based power** of 0.80 under the equivalence design prior,
- a **maximum ROPE-based type-I error** of 0.10 under the non-equivalence design prior,
- a **sustain requirement** of `sustain_n = 10`, meaning the criteria must hold
  for 10 consecutive sample sizes starting from the selected \(n^\ast\).
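
The sustain requirement can be pictured as a search for the first sample size at which a logical "criteria met" indicator stays `TRUE` for `sustain_n` consecutive values. The sketch below uses a hypothetical helper `first_sustained()` and a made-up toy indicator; the search inside `design_singlearm_onestage_rope()` may be implemented differently.

```{r}
# Hypothetical helper: smallest index i such that
# ok[i], ..., ok[i + sustain_n - 1] are all TRUE (NA if no such index exists).
first_sustained <- function(ok, sustain_n) {
  n_ok <- length(ok)
  if (n_ok < sustain_n) return(NA_integer_)
  for (i in seq_len(n_ok - sustain_n + 1)) {
    if (all(ok[i:(i + sustain_n - 1)])) return(i)
  }
  NA_integer_
}

# Toy indicator over n = 20, ..., 40 (made-up values for illustration only):
# the criteria hold once at n = 25, fail at n = 26, and hold from n = 27 onwards.
n_grid <- 20:40
ok     <- c(rep(FALSE, 5), TRUE, FALSE, rep(TRUE, 14))

n_grid[first_sustained(ok, sustain_n = 1)]   # 25: first n meeting the criteria
n_grid[first_sustained(ok, sustain_n = 10)]  # 27: first n sustaining them for 10 values
```

With `sustain_n = 1` the isolated success at `n = 25` would be selected, whereas `sustain_n = 10` skips it because the criteria fail again at the next sample size.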

```{r}
des_baseline <- design_singlearm_onestage_rope(
  n_min = 20,
  n_max = 200,
  p0 = 0.30,     # benchmark response rate p0
  delta = 0.12,  # ROPE half-width: equivalence if p in [0.18, 0.42]
  gamma_eq = 0.80,  # posterior ROPE probability threshold for equivalence

  # Analysis prior: p ~ Beta(a, b), used for posterior and ROPE decision
  a = 1,
  b = 1,

  # Design prior under H0 (non-equivalence): p ~ Beta(da0, db0)
  # Here: mean 0.60, representing clearly higher response than 0.30.
  da0 = 60,
  db0 = 40,

  # Design prior under H1 (equivalence): p ~ Beta(da1, db1)
  # Here: mean 0.30, representing plausible equivalence scenarios.
  da1 = 36,
  db1 = 84,

  # Target ROPE-based power under H1 (equivalence design prior)
  target_power = 0.80,

  # Maximum ROPE-based type-I error under H0 (non-equivalence design prior)
  target_type1 = 0.10,

  # Stability requirement: criteria must hold for 10 consecutive n values
  sustain_n = 10
)
```
We can take a look at the resulting design object:
```{r}
des_baseline
```

The printed output reports:

- the search range for \(n\),
- the ROPE specification (`p0`, `delta`, `gamma_eq`),
- the analysis and design priors in beta parameterization,
- the target power and type-I constraints,
- the chosen `sustain_n`,
- the selected sample size `Selected n`,
- the ROPE-based power and type-I error at that \(n\),
- and the equivalence decision region \([y_{\min}^{\mathrm{eq}}, y_{\max}^{\mathrm{eq}}]\),
  i.e. all responder counts \(y\) that lead to practical equivalence.

### Summarizing the design

We can summarize the calibration grid and the selected design via:

```{r, eval = FALSE}
summary(des_baseline)
```

The summary object (not shown here) contains:

- the selected row of the grid (with `n`, `y_eq_min`, `y_eq_max`, `power`, `type1`),
- the first and last 10 rows of the evaluated `n` values.

In particular:

- `y_eq_min` and `y_eq_max` are the smallest and largest responder counts
  for which the posterior ROPE probability is at least `gamma_eq`, so that equivalence would be concluded;
- `power` is the ROPE-based Bayesian power under the equivalence design prior at that `n`;
- `type1` is the ROPE-based Bayesian type-I error under the non-equivalence design prior
  at that `n`.

These summaries allow you to inspect how power and type-I error evolve with increasing
sample size, and how the equivalence decision region moves.

### Plotting the design

The overview plot visualizes operating characteristics, priors, and a textual summary:

```{r echo = TRUE, eval = FALSE}
plot(des_baseline)
```
```{r, echo = FALSE, out.width = "100%", fig.align = "center", fig.cap = "Figure 5: Illustration of calibrated single-arm one-stage design of a ROPE-based clinical phase II trial with binary endpoint."}
knitr::include_graphics("figures/singlearm-onestage-rope-fig5.png")
```
- The **upper left panel** shows ROPE-based power and type-I error as functions of \(n\),
  with horizontal lines at `target_power` and `target_type1`, and a vertical line at
  the selected `n`.
- The **upper right panel** displays a textual summary of the key inputs and outputs
  (priors, ROPE, thresholds, selected `n`, power, type-I, and equivalence region).
- The **lower left panel** displays the design priors under `H0`
  and `H1` overlaid: their beta densities highlight which response probabilities are
  regarded as typical under non-equivalence and equivalence, respectively.
- The **lower right panel** displays the analysis prior \(\mathrm{Beta}(a,b)\),
  which governs the posterior ROPE probabilities used in the decision rule.

You can also visualize only the operating characteristics or the decision region:

```{r echo = TRUE, eval = FALSE}
plot(des_baseline, what = "operating_characteristics")
```
```{r echo = FALSE, out.width = "80%", fig.align = "center", fig.cap = "Figure 6: Visualization of the operating characteristics of a calibrated single-arm one-stage design of a ROPE-based clinical phase II trial with binary endpoint."}
knitr::include_graphics("figures/singlearm-onestage-rope-fig6.png")
```
```{r echo = TRUE, eval = FALSE}
plot(des_baseline, what = "decision_region")
```
```{r echo = FALSE, out.width = "80%", fig.align = "center", fig.cap = "Figure 7: Visualization of the equivalence region for increasing sample size of a calibrated single-arm one-stage design of a ROPE-based clinical phase II trial with binary endpoint."}
knitr::include_graphics("figures/singlearm-onestage-rope-fig7.png")
```
The decision-region plot shows how the range of responder counts leading to equivalence
changes with `n`, providing intuition about how stringent the rule is at different
sample sizes.
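
The bounds of the equivalence decision region can also be recomputed by hand for a few sample sizes. The following sketch assumes the settings of this example (\(p_0 = 0.30\), \(\delta = 0.12\), \(\gamma_{\mathrm{eq}} = 0.80\), analysis prior \(\mathrm{Beta}(1,1)\)) and reports only the smallest and largest accepted count; `eq_region_sketch()` is a hypothetical helper, not a package function.

```{r}
# Hypothetical helper: smallest and largest responder count y for which the
# posterior ROPE probability is at least gamma_eq at a given sample size n.
eq_region_sketch <- function(n, p0 = 0.30, delta = 0.12, gamma_eq = 0.80,
                             a = 1, b = 1) {
  y     <- 0:n
  pr_in <- pbeta(min(1, p0 + delta), a + y, b + n - y) -
    pbeta(max(0, p0 - delta), a + y, b + n - y)
  y_eq  <- y[pr_in >= gamma_eq]
  if (length(y_eq) == 0) {
    return(c(y_eq_min = NA_integer_, y_eq_max = NA_integer_))
  }
  c(y_eq_min = min(y_eq), y_eq_max = max(y_eq))
}

n_values   <- c(50, 100, 200)
region_tab <- cbind(n = n_values, t(sapply(n_values, eq_region_sketch)))
region_tab
```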

## Example 1: Oncology phase II equivalence trial

In this section we illustrate a full ROPE-based design calibration in a setting
resembling a single-arm phase II oncology trial with a binary endpoint such as
objective response rate (ORR), compare @chenBayesianTwostageDesign2022, @kelterBayesianGroupSequentialPredictive2024 and @Lee2008. For definiteness, we assume:

- Historical control ORR \(p_0 = 0.30\) based on previous phase II data.
- The new treatment is considered *clinically non-inferior / equivalent* if its
  true ORR lies within ±12 percentage points of \(p_0\), that is,
  \(\mathcal{R}_p = [0.18, 0.42]\). This is a common margin in phase II oncology trials, compare @hashimSystematicReviewNoninferiority2021.
- We want a high probability to conclude practical equivalence when the true ORR
  is near 0.30, and a low probability to conclude equivalence when the true ORR
  is clearly better or worse than 0.30 (non-equivalence).

### Clinical hypotheses and ROPE

On the response-probability scale we set \(p_0 = 0.30\) and \(\delta = 0.12\).
The ROPE for equivalence is

\[
\mathcal{R}_p = [p_0 - \delta,\; p_0 + \delta] = [0.18, 0.42].
\]

We formulate the hypotheses as

\[
H_0:\; p \notin [\,p_0 - \delta,\; p_0 + \delta\,]
     \quad \text{(non-equivalence, clinically relevant difference)},
\]
\[
H_1:\; p \in [\,p_0 - \delta,\; p_0 + \delta\,]
     \quad \text{(practical equivalence)}.
\]

We adopt the following ROPE-based decision rule:

- **Accept equivalence** (\(H_1\)) if
  \(\Pr(p \in \mathcal{R}_p \mid y) \ge \gamma_{\mathrm{eq}}\).
- **Accept non-equivalence** (\(H_0\)) if
  \(\Pr(p \notin \mathcal{R}_p \mid y) \ge \gamma_{\mathrm{diff}}\).
- **Indecisive** otherwise.

For this example, we set \(\gamma_{\mathrm{eq}} = \gamma_{\mathrm{diff}} = 0.80\).

### Analysis and design priors

We separate the analysis prior from the design priors.

- **Analysis prior** for ORR:

  \[
  p \sim \mathrm{Beta}(1,1),
  \]
  
  a uniform prior on \((0,1)\), reflecting weak prior information.

- **Design prior under equivalence** \(H_1\):
  
  \[
  p \sim \mathrm{Beta}(a_1, b_1) = \mathrm{Beta}(36, 84),
  \]
  
  which has mean \(36 / (36 + 84) = 0.30\) and moderate concentration
  around \(p_0 = 0.30\). This prior represents plausible ORR values under
  practical equivalence.

- **Design prior under non-equivalence** \(H_0\):
  we consider superior scenarios where the ORR is clearly higher than the upper ROPE bound 0.42.
  For concreteness we choose
  
  \[
  p \sim \mathrm{Beta}(60, 40),
  \]
  
  which is centred at 0.6 and places most mass clearly outside the ROPE
  interval [0.18, 0.42]. This prior represents clinically relevant departures
  from equivalence (e.g. strong improvement), and is used to quantify
  ROPE-based type-I error for wrongly declaring equivalence in such
  scenarios.

These design priors induce beta–binomial predictive distributions for the
response count \(Y\) under \(H_1\) and \(H_0\), respectively.
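
A quick plausibility check of the design priors is to compute how much prior mass each of them places inside the ROPE \([0.18, 0.42]\). The helper name `rope_mass()` below is an illustrative assumption.

```{r}
# Prior mass inside the ROPE [0.18, 0.42] under a Beta(shape1, shape2) design prior
rope_mass <- function(shape1, shape2, lower = 0.18, upper = 0.42) {
  pbeta(upper, shape1, shape2) - pbeta(lower, shape1, shape2)
}

c(H1_Beta_36_84 = rope_mass(36, 84),   # equivalence design prior
  H0_Beta_60_40 = rope_mass(60, 40))   # non-equivalence design prior
```

The equivalence design prior should place nearly all of its mass inside the ROPE and the non-equivalence design prior nearly none. Neither prior is truncated to its hypothesis, so small amounts of mass on the "wrong" side of the ROPE boundaries are expected.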

Under the equivalence design prior \(\pi_1\), the ROPE-based Bayesian power is

\[
\text{Power}_\text{ROPE}(n)
= \Pr_{\pi_1}\bigl( \Pr(p \in \mathcal{R}_p \mid Y) \ge \gamma_{\mathrm{eq}} \bigr),
\]

and under the non-equivalence design prior \(\pi_0\), the ROPE-based Bayesian
type-I error is

\[
\alpha_\text{ROPE}(n)
= \Pr_{\pi_0}\bigl( \Pr(p \in \mathcal{R}_p \mid Y) \ge \gamma_{\mathrm{eq}} \bigr).
\]

### Calibration target

For this oncology-inspired example we consider the following calibration goals:

- ROPE-based power under \(H_1\) at least 80%:
  \(\text{Power}_\text{ROPE}(n) \ge 0.80\).
- ROPE-based type-I error under \(H_0\) at most 10%:
  \(\alpha_\text{ROPE}(n) \le 0.10\).
- A stability requirement `sustain_n = 10`, meaning that the criteria must hold
  for 10 consecutive sample sizes starting at the selected \(n^\ast\).
  This guards against local non-monotonicities in the discrete predictive
  curves.

We search over a one-stage sample size range of 20 to 200 patients.

```{r}
des_onc <- design_singlearm_onestage_rope(
  n_min = 20,
  n_max = 200,
  p0 = 0.30,
  delta = 0.12,
  gamma_eq = 0.80,

  # Analysis prior p ~ Beta(a, b)
  a = 1, b = 1,

  # Design priors under H0 and H1
  da0 = 60, db0 = 40,   # H0: non-equivalence, mean ~0.60
  da1 = 36, db1 = 84,   # H1: equivalence, mean ~0.3

  target_power = 0.80,
  target_type1 = 0.10,
  sustain_n = 10
)

des_onc
```

The printed output shows the selected sample size \(n^\ast\), ROPE-based power and type-I
error at that \(n^\ast\), and the equivalence decision region in terms of the
responder counts \(y\) that lead to practical equivalence.

```{r, eval = FALSE}
summary(des_onc)
```

The summary (not shown here) gives the first and last rows of the calibration grid, along with
the selected design point. These values can be reported, e.g. as a table listing \(n^\ast\), the ROPE region
\(\mathcal{R}_p = [0.18, 0.42]\), the decision thresholds
\(\gamma_{\mathrm{eq}}, \gamma_{\mathrm{diff}}\) and the resulting ROPE-based
power and type-I error. This is primarily helpful when analyzing a specific design or the relationship between the operating characteristics and the sample size.

### Visualization

We can inspect the operating characteristics and prior structure in more detail.

```{r echo = TRUE, eval = FALSE}
plot(des_onc)
```
```{r echo = FALSE, out.width = "100%", fig.align = "center", fig.cap = "Figure 8: Visualization of the calibrated ROPE-based oncology single-arm one-stage phase II design with binary endpoints."}
knitr::include_graphics("figures/singlearm-onestage-rope-fig8.png")
```

- The upper-left panel shows ROPE-based power and type-I error as functions of \(n\).
- The upper-right panel summarizes the design numerically.
- The lower-left panel overlays the design priors under \(H_0\) and \(H_1\).
- The lower-right panel shows the analysis prior.

For example, the equivalence design prior `Beta(36, 84)` reflects prior belief
that in realistic equivalence scenarios, the ORR is close to 30%, whereas the
non-equivalence design prior `Beta(60, 40)` reflects scenarios with substantially
higher ORR around 60%.

To see how the equivalence decision region changes with sample size, we can
plot the decision region directly:

```{r echo = TRUE, eval = FALSE}
plot(des_onc, what = "decision_region")
```
```{r echo = FALSE, out.width = "100%", fig.align = "center", fig.cap = "Figure 9: Visualization of the equivalence region of ROPE-based oncology single-arm one-stage phase II designs with binary endpoints for increasing sample size."}
knitr::include_graphics("figures/singlearm-onestage-rope-fig9.png")
```

This plot shows, for each evaluated sample size \(n\), the range of responder counts
\(y\) that would lead the trial to conclude practical equivalence. For the
selected \(n^\ast\), this region is reported in the upper right panel of Figure 8: If 20 to 35 patients respond in the phase II trial (out of \(n^\ast = 94\)), then equivalence of the novel drug or treatment to the reference probability $p_0=0.30$ (of the standard of care) is established. Thus, we then accept $H_1:p \in [\,p_0 - \delta,\; p_0 + \delta\,]$.

Figure 8 also shows that both the Bayesian power and the Bayesian type-I error rate meet their calibration targets at the selected \(n^\ast\).


## Example 2: Sensitivity analysis via grid exploration

Here we explore the impact of the ROPE half-width $\delta$ and the posterior probability threshold $\gamma_{\mathrm{eq}}$ for establishing equivalence on the calibrated design, while the analysis prior and the design priors are held fixed.

```{r grid-exploration, message=FALSE, warning=FALSE, echo = FALSE}
library(dplyr)
library(tidyr)
library(purrr)
library(ggplot2)
library(knitr)

# Fixed setup: oncology-inspired equivalence example
n_min <- 10
n_max <- 250
p0    <- 0.30

# Analysis prior
a <- 1
b <- 1

# Design priors under H0 (non-equivalence) and H1 (equivalence)
da0 <- 60
db0 <- 40   # mean = 0.60

da1 <- 36
db1 <- 84   # mean = 0.30

# Calibration targets
target_power  <- 0.80
target_type1  <- 0.10
sustain_n     <- 10

# Grid for ROPE half-width and posterior threshold
delta_grid    <- c(0.10, 0.12, 0.15)   # ROPE half-widths
gamma_eq_grid <- c(0.75, 0.80, 0.90)

grid <- expand.grid(
  delta    = delta_grid,
  gamma_eq = gamma_eq_grid,
  KEEP.OUT.ATTRS = FALSE,
  stringsAsFactors = FALSE
)

# Helper to extract a concise summary from the design object
extract_design_summary <- function(fit, delta, gamma_eq) {
  tibble(
    delta    = delta,
    gamma_eq = gamma_eq,
    n_star   = if (!is.null(fit$n_star)) fit$n_star else NA_real_,
    power_H1 = if (!is.null(fit$selected$power)) fit$selected$power else NA_real_,
    type1_H0 = if (!is.null(fit$selected$type1)) fit$selected$type1 else NA_real_
  )
}

# Wrapper to run the design calibration for one grid point
run_design_grid <- function(delta, gamma_eq) {
  fit <- design_singlearm_onestage_rope(
    n_min      = n_min,
    n_max      = n_max,
    p0         = p0,
    delta      = delta,
    gamma_eq   = gamma_eq,
    gamma_diff = gamma_eq,              # same threshold for non-equivalence
    direction  = "equivalence",
    a          = a,
    b          = b,
    da0        = da0,
    db0        = db0,
    da1        = da1,
    db1        = db1,
    calibration        = "Bayesian",
    dp                 = NULL,
    target_power       = target_power,
    target_type1       = target_type1,
    target_pce_h0      = NULL,
    target_freq_power  = NULL,
    target_freq_type1  = NULL,
    sustain_n          = sustain_n,
    return_grid        = TRUE
  )

  extract_design_summary(fit, delta, gamma_eq)
}

# Run the calibration for every (delta, gamma_eq) combination in the grid
results_grid <- pmap_dfr(
  list(grid$delta, grid$gamma_eq),
  run_design_grid
) %>%
  arrange(delta, gamma_eq)

# Keep only rows where a feasible design was found
results_grid_feasible <- results_grid %>%
  filter(!is.na(n_star), !is.na(power_H1), !is.na(type1_H0))

# Inspect which combinations dropped out (for checking)
results_grid %>%
  mutate(feasible = !is.na(n_star)) %>%
  print()

# Table for the vignette / paper
kable(
  results_grid,
  digits = 3,
  caption = "Grid exploration for the oncology equivalence example: calibrated sample size n*, ROPE-based Bayesian power under H1, and ROPE-based Bayesian type-I error under H0 for different ROPE half-widths and posterior probability thresholds."
)
```
```{r echo = FALSE, eval = FALSE}
# Plot n* versus gamma_eq, stratified by delta (feasible designs only)
ggplot(
  results_grid_feasible,
  aes(x = gamma_eq, y = n_star, color = factor(delta), group = factor(delta))
) +
  geom_line(linewidth = 0.8) +
  geom_point(size = 2) +
  labs(
    x = expression(gamma[eq]),
    y = expression(n^"*"),
    color = expression(delta),
    title = "Calibrated sample size n* across ROPE widths and posterior thresholds"
  ) +
  theme_minimal(base_size = 12)
```
```{r echo = FALSE, out.width = "80%", fig.align = "center", fig.cap = "Figure 10: Calibrated sample size n* across ROPE widths and posterior thresholds for the oncology equivalence phase II trial."}
knitr::include_graphics("figures/singlearm-onestage-rope-fig10.png")
```
```{r echo = FALSE, eval = FALSE}
# Plot type-I error versus gamma_eq, stratified by delta (feasible designs only)
ggplot(
  results_grid_feasible,
  aes(x = gamma_eq, y = type1_H0, color = factor(delta), group = factor(delta))
) +
  geom_line(linewidth = 0.8) +
  geom_point(size = 2) +
  geom_hline(
    yintercept = target_type1,
    linetype = "dashed",
    color = "grey40"
  ) +
  labs(
    x = expression(gamma[eq]),
    y = expression(alpha[H[0]](n^"*")),
    color = expression(delta),
    title = "ROPE-based Bayesian type-I error at the calibrated sample size"
  ) +
  theme_minimal(base_size = 12)
```
```{r echo = FALSE, out.width = "80%", fig.align = "center", fig.cap = "Figure 11: ROPE-based Bayesian type-I-error at the calibrated sample sizes for the oncology equivalence phase II trial for different ROPE widths."}
knitr::include_graphics("figures/singlearm-onestage-rope-fig11.png")
```

---

## Example 3: Revisiting the first example with PCE(H0) and frequentist power

Here we revisit the first example of the oncology trial, now adding a target constraint on the probability of compelling evidence for $H_0$, denoted PCE($H_0$), as well as target constraints on frequentist power and frequentist type-I error:
```{r example-pce-freq, message=FALSE, warning=FALSE}
library(dplyr)
library(tidyr)
library(purrr)
library(ggplot2)
library(knitr)

# Oncology-inspired equivalence example: revisited
n_min <- 10
n_max <- 300
p0    <- 0.30

# ROPE and evidence thresholds
delta      <- 0.12
gamma_eq   <- 0.925
gamma_diff <- 0.90

# Analysis prior
a <- 1
b <- 1

# Design priors as in the first example
da0 <- 60
db0 <- 40   # non-equivalence prior (H0)

da1 <- 36
db1 <- 84   # equivalence prior (H1)

# Calibration targets
target_power      <- 0.80   # Bayesian predictive power under H1
target_type1      <- 0.10   # Bayesian predictive type-I error under H0
target_pce_h0     <- 0.80   # predictive compelling evidence for H0
target_freq_power <- 0.80   # frequentist power at dp (here dp = p0)
target_freq_type1 <- 0.10   # frequentist type-I error at ROPE boundaries

# Point alternative for frequentist power
dp <- p0

# Design calibration in "full" mode
fit_pce_freq <- design_singlearm_onestage_rope(
  n_min      = n_min,
  n_max      = n_max,
  p0         = p0,
  delta      = delta,
  gamma_eq   = gamma_eq,
  gamma_diff = gamma_diff,
  direction  = "equivalence",
  a          = a,
  b          = b,
  da0        = da0,
  db0        = db0,
  da1        = da1,
  db1        = db1,
  calibration        = "full",
  dp                 = dp,
  target_power       = target_power,
  target_type1       = target_type1,
  target_pce_h0      = target_pce_h0,
  target_freq_power  = target_freq_power,
  target_freq_type1  = target_freq_type1,
  sustain_n          = 10,
  return_grid        = TRUE
)

fit_pce_freq
```
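
To make the two frequentist quantities concrete, the following base-R sketch evaluates them for a single fixed sample size. It assumes that equivalence is declared when the posterior probability of the ROPE, $P(|p - p_0| < \delta \mid y)$, reaches `gamma_eq` under the Beta(`a`, `b`) analysis prior. The helper names `post_rope()` and `declare_prob()` and the choice `n_fix = 100` are illustrative and not part of the package, whose internal computations may differ.

```r
# Posterior probability that p lies inside the ROPE (p0 - delta, p0 + delta)
# under a Beta(a, b) analysis prior, after y responders among n patients.
# (Hypothetical helper, not a package function.)
post_rope <- function(y, n, p0, delta, a = 1, b = 1) {
  lower <- max(p0 - delta, 0)
  upper <- min(p0 + delta, 1)
  pbeta(upper, a + y, b + n - y) - pbeta(lower, a + y, b + n - y)
}

# Probability of declaring equivalence when the true response rate is p_true:
# sum the binomial probabilities of all outcomes y that pass the threshold.
# (Hypothetical helper; assumes the posterior-threshold rule stated above.)
declare_prob <- function(p_true, n, p0, delta, gamma_eq, a = 1, b = 1) {
  y <- 0:n
  declare <- post_rope(y, n, p0, delta, a, b) >= gamma_eq
  sum(dbinom(y[declare], n, p_true))
}

n_fix <- 100  # illustrative sample size, not the calibrated n*

# Frequentist power at the point alternative dp = p0
power_at_p0 <- declare_prob(0.30, n_fix, p0 = 0.30, delta = 0.12, gamma_eq = 0.925)

# Frequentist type-I error: the larger of the two ROPE-boundary values
type1_worst <- max(
  declare_prob(0.30 - 0.12, n_fix, p0 = 0.30, delta = 0.12, gamma_eq = 0.925),
  declare_prob(0.30 + 0.12, n_fix, p0 = 0.30, delta = 0.12, gamma_eq = 0.925)
)

round(c(power_at_p0 = power_at_p0, type1_worst = type1_worst), 3)
```

Because the binomial outcome is discrete, both quantities can move non-monotonically as `n_fix` changes, so it is worth evaluating the sketch over a range of sample sizes rather than at a single value.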

You can summarise and visualise the calibrated design:
```{r echo = FALSE, eval = FALSE}
plot(fit_pce_freq)
```
```{r echo = FALSE, out.width = "100%", fig.align = "center", fig.cap = "Figure 12: Calibrated one-stage ROPE-based oncology equivalence phase II design with additional constraints on the probability of compelling evidence for the null hypothesis. In contrast to the earlier example, the probability of compelling evidence for the null hypothesis must now reach 80%, and frequentist power and type-I error rate must also meet their respective targets of 80% and 10%."}
knitr::include_graphics("figures/singlearm-onestage-rope-fig12.png")
```

```{r example-pce-freq-summary, message=FALSE, warning=FALSE}
library(dplyr)
library(tidyr)
library(purrr)
library(ggplot2)
# Extract selected row and key operating characteristics
sel <- fit_pce_freq$selected

summary_tab <- tibble(
  quantity = c(
    "Selected sample size n*",
    "Bayesian power under H1 at n*",
    "Bayesian type-I error under H0 at n*",
    "PCE(H0) at n*",
    "Frequentist power at p = p0",
    "Frequentist type-I error (worst boundary)"
  ),
  value = c(
    fit_pce_freq$n_star,
    sel$power,
    sel$type1,
    sel$pce_h0,
    sel$freq_power,
    sel$freq_type1
  )
)

kable(
  summary_tab,
  digits = 3,
  col.names = c("Quantity", "Value"),
  caption = "Operating characteristics of the calibrated equivalence design with constraints on Bayesian power, Bayesian type-I error, PCE(H0), and frequentist power/type-I error."
)
```

Optionally, you can compare this design to the original first example (purely Bayesian calibration) by recomputing the first example and putting both designs side by side in a small table:

```{r example-pce-freq-comparison, message=FALSE, warning=FALSE}
des_onc_with_freq_power <- design_singlearm_onestage_rope(
  n_min = 20,
  n_max = 200,
  p0 = 0.30,
  delta = 0.12,
  gamma_eq = 0.80,
  
  # frequentist power at p = 0.3
  dp = 0.3,

  # Analysis prior p ~ Beta(a, b)
  a = 1, b = 1,

  # Design priors under H0 and H1
  da0 = 60, db0 = 40,   # H0: non-equivalence, mean ~0.60
  da1 = 36, db1 = 84,   # H1: equivalence, mean ~0.3

  target_power = 0.80,
  target_type1 = 0.10,
  target_freq_type1 = 0.10,
  target_freq_power = 0.80,
  sustain_n = 10,
  calibration = "Bayesian"
)

sel_orig <- des_onc_with_freq_power$selected
sel_new  <- fit_pce_freq$selected

comparison_tab <- tibble(
  design = c("Bayesian (original)", "Full (Bayes + frequentist + PCE(H0))"),
  n_star = c(sel_orig$n, sel_new$n),
  bayes_power = c(sel_orig$power, sel_new$power),
  bayes_type1 = c(sel_orig$type1, sel_new$type1),
  pce_h0 = c(sel_orig$pce_h0, sel_new$pce_h0),
  freq_power = c(sel_orig$freq_power, sel_new$freq_power),
  freq_type1 = c(sel_orig$freq_type1, sel_new$freq_type1)
)

kable(
  comparison_tab,
  digits = 3,
  caption = "Comparison of the original Bayesian calibration and the extended design with additional constraints on PCE(H0) and frequentist power/type-I error."
)
```

This third example stays within the equivalence framework but shows how the **same posterior-threshold decision rule** can be calibrated to satisfy additional Bayesian and frequentist criteria, including a lower bound on predictive compelling evidence for \(H_0\).
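
The Bayesian criteria can be sketched in the same spirit by averaging the decision rule over the beta-binomial prior-predictive distribution implied by each design prior. The sketch below assumes that Bayesian power and type-I error are the predictive probabilities of declaring equivalence under the \(H_1\) and \(H_0\) design priors, and that PCE(\(H_0\)) is the predictive probability, under the \(H_0\) design prior, that the posterior probability of lying outside the ROPE reaches `gamma_diff`. These definitions, the helper names, and `n = 100` are assumptions made for illustration; the package may define or compute the quantities differently.

```r
# Posterior probability of the ROPE under a Beta(a, b) analysis prior
# (hypothetical helper, not a package function).
post_rope <- function(y, n, p0, delta, a = 1, b = 1) {
  lower <- max(p0 - delta, 0)
  upper <- min(p0 + delta, 1)
  pbeta(upper, a + y, b + n - y) - pbeta(lower, a + y, b + n - y)
}

# Beta-binomial prior-predictive pmf of y responders among n patients
# when p ~ Beta(da, db); evaluated on the log scale for numerical stability.
dbetabinom <- function(y, n, da, db) {
  exp(lchoose(n, y) + lbeta(y + da, n - y + db) - lbeta(da, db))
}

# Predictive operating characteristics at a fixed n (assumed definitions).
predictive_ocs <- function(n, p0, delta, gamma_eq, gamma_diff,
                           a, b, da0, db0, da1, db1) {
  y  <- 0:n
  pr <- post_rope(y, n, p0, delta, a, b)
  eq   <- pr >= gamma_eq          # compelling evidence for equivalence (H1)
  diff <- (1 - pr) >= gamma_diff  # compelling evidence for non-equivalence (H0)
  c(
    power  = sum(dbetabinom(y[eq],   n, da1, db1)),
    type1  = sum(dbetabinom(y[eq],   n, da0, db0)),
    pce_h0 = sum(dbetabinom(y[diff], n, da0, db0))
  )
}

ocs <- predictive_ocs(
  n = 100, p0 = 0.30, delta = 0.12, gamma_eq = 0.925, gamma_diff = 0.90,
  a = 1, b = 1, da0 = 60, db0 = 40, da1 = 36, db1 = 84
)
round(ocs, 3)
```

Under these assumptions, the outcomes counted for `type1` and `pce_h0` are disjoint whenever `gamma_eq + gamma_diff > 1`, so the two quantities can sum to at most one, with any remainder corresponding to inconclusive outcomes.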

## Summary

This vignette has shown how `design_singlearm_onestage_rope()` can be used to:

- define a baseline ROPE-based equivalence design in a realistic phase II range,

- quantify how the evidence threshold `gamma_eq`, ROPE width `delta`, design priors, and sustain requirement influence the required sample size, power, and type-I error.
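
As a small illustration of the sustain requirement, the sketch below assumes that it means the calibration targets must hold for `sustain_n` consecutive sample sizes before a design is selected. This reading, the helper `first_sustained()`, and the toy input are assumptions for illustration only and are not taken from the package.

```r
# Return the first sample size at which a criterion holds for at least
# `sustain_n` consecutive entries of the grid (hypothetical helper).
first_sustained <- function(ok, n_grid, sustain_n) {
  runs   <- rle(ok)                       # run-length encoding of TRUE/FALSE
  ends   <- cumsum(runs$lengths)          # last index of each run
  starts <- ends - runs$lengths + 1       # first index of each run
  hit    <- which(runs$values & runs$lengths >= sustain_n)
  if (length(hit) == 0) return(NA_integer_)
  n_grid[starts[hit[1]]]
}

# Toy example: the criterion is first met at n = 11, but only sustained from n = 13
n_grid <- 10:16
ok     <- c(FALSE, TRUE, FALSE, TRUE, TRUE, TRUE, FALSE)

n_first     <- first_sustained(ok, n_grid, sustain_n = 1)
n_sustained <- first_sustained(ok, n_grid, sustain_n = 3)
```

A rule of this kind guards against selecting a sample size at which the targets are met only because of the discreteness of the binomial distribution.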

In practice, we recommend exploring such grids of tuning parameters together with clinicians, to arrive at a design in which the ROPE, the evidence thresholds, and the priors are all clinically interpretable and the resulting sample size is operationally feasible.

## References