This function performs rolling validation of short-term forecasts generated by EpiEstim or similar models. It divides the input time series into overlapping validation windows and repeatedly runs forecasts to assess model performance across different time segments.

Usage

generate_validation(
  data,
  start_date,
  validate_window_size = 7,
  window_size = 7,
  n_days = 7,
  type = NULL,
  smooth_data = FALSE,
  smoothing_cutoff = 10,
  ...
)

Arguments

data

A data frame containing at least the columns "date" and "confirm". The "date" column should be of class Date, and "confirm" should be numeric.
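As a rough illustration of the expected input shape, the sketch below builds a small data frame with the two required columns and checks their classes. The name toy_data and the case counts are made up for illustration and do not come from the package.

```r
# Hypothetical toy input: 28 days of made-up daily case counts.
toy_data <- data.frame(
  date = seq(as.Date("2024-10-16"), by = "day", length.out = 28),
  confirm = rep(c(3, 5, 8, 6, 4, 7, 9), times = 4)
)

# Checks mirroring the documented requirements for `data`.
stopifnot(
  all(c("date", "confirm") %in% names(toy_data)),
  inherits(toy_data$date, "Date"),
  is.numeric(toy_data$confirm)
)

# A candidate start_date should appear in the "date" column.
candidate_start <- as.Date("2024-10-23")
start_is_valid <- candidate_start %in% toy_data$date
```

Running checks like these before calling generate_validation() can surface a character "date" column or a start_date outside the data range early.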

start_date

A Date (or date-convertible string) specifying the starting point for validation. Must exist in the "date" column.

validate_window_size

Integer. The number of days between successive validation windows (default: 7).

window_size

Integer. The sliding window size (in days) used by the forecasting model (default: 7).

n_days

Integer. The number of future days to forecast in each validation iteration (default: 7).

type

Character. Type of epidemic. Must be one of "flu_a", "flu_b", "rsv", "sars_cov2", or "custom" (default: NULL). Passed to fit_epiestim_model().

smooth_data

Logical. Whether to smooth the input case counts prior to forecasting (default: FALSE).

smoothing_cutoff

Numeric. Threshold used for smoothing when smooth_data = TRUE (default: 10).

...

Additional arguments passed to generate_forecast().

Value

A list of forecast results, each element corresponding to one validation window. Each element contains the output returned by generate_forecast() for that particular window.

Details

The validation procedure evaluates forecasts under realistic temporal conditions: each forecast uses only the data available up to its validation endpoint. Starting from the earliest date, the function repeatedly:

  1. Takes a growing subset of data up to the current validation endpoint.

  2. Runs the forecast using generate_forecast().

  3. Moves the validation window forward by validate_window_size days.

This results in a set of forecasts that can be compared to observed data to evaluate predictive performance across time.
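The three steps above can be sketched in base R as follows. This is not the package's implementation: naive_forecast() is a hypothetical stand-in for generate_forecast() (it repeats the mean of the last seven observations), rolling_validation_sketch() is an illustrative name, and the assumption that validation endpoints fall at start_date plus multiples of validate_window_size is ours.

```r
# Hypothetical stand-in for generate_forecast(): repeats the mean of the
# last 7 observed counts for each of the next n_days days.
naive_forecast <- function(train, n_days = 7) {
  last_date <- max(train$date)
  data.frame(
    date = last_date + seq_len(n_days),
    predicted = rep(mean(tail(train$confirm, 7)), n_days)
  )
}

# Illustrative rolling validation loop (assumed endpoint spacing).
rolling_validation_sketch <- function(data, start_date,
                                      validate_window_size = 7,
                                      n_days = 7) {
  start_date <- as.Date(start_date)
  stopifnot(start_date %in% data$date)
  data <- data[order(data$date), ]

  # Assumed endpoints: start_date + k * validate_window_size, k = 1, 2, ...
  first_endpoint <- start_date + validate_window_size
  if (first_endpoint > max(data$date)) return(list())
  endpoints <- seq(first_endpoint, max(data$date), by = validate_window_size)

  results <- lapply(seq_along(endpoints), function(i) {
    # Step 1: growing subset of data up to the current endpoint.
    train <- data[data$date <= endpoints[i], ]
    # Step 2: forecast n_days ahead from that subset.
    naive_forecast(train, n_days = n_days)
  })
  # Step 3 is implicit: each endpoint lies validate_window_size days
  # after the previous one.
  names(results) <- format(endpoints)
  results
}

# Made-up data for illustration only.
toy_data <- data.frame(
  date = seq(as.Date("2024-10-16"), by = "day", length.out = 28),
  confirm = rep(c(3, 5, 8, 6, 4, 7, 9), times = 4)
)
results <- rolling_validation_sketch(toy_data, "2024-10-16")
```

Each element of results holds the forecast made from one endpoint, so it can be joined to the observed counts by date to compute an error measure per window.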

Examples

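# Simulate case data and aggregate flu A counts over the study period
data <- simulate_data()
formatted_data <- get_aggregated_data(data, "date", "flu_a", "2024-10-16", "2024-12-31")

# Run rolling validation from the first date in the aggregated data
start_date <- as.Date("2024-10-16")
validation_results <- generate_validation(formatted_data, start_date, type = "flu_a")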