Validate forecast performance over multiple time windows

This function performs rolling validation of short-term forecasts generated by EpiEstim or similar models. It divides the input time series into overlapping validation windows and repeatedly runs forecasts to assess model performance across different time segments.

Usage

generate_validation(
  data,
  start_date,
  validate_window_size = 7,
  window_size = 7,
  n_days = 7,
  type = NULL,
  smooth_data = FALSE,
  smoothing_cutoff = 10,
  seed = 123,
  ...
)

Arguments

data: A data frame containing at least the columns "date" and "confirm". The "date" column should be of class Date, and "confirm" should be numeric.
start_date: A Date (or date-convertible string) specifying the starting point for validation. Must exist in the "date" column.
validate_window_size: Integer. The number of days between successive validation windows (default: 7).
window_size: Integer. The sliding window size (in days) used by the forecasting model (default: 7).
n_days: Integer. The number of future days to forecast in each validation iteration (default: 7).
type: Character. Type of epidemic. Must be one of "flu_a", "flu_b", "rsv", "sars_cov2", or "custom". Passed to fit_epiestim_model.
smooth_data: Logical. Whether to smooth the input case counts prior to forecasting (default: FALSE).
smoothing_cutoff: Numeric. Threshold used for smoothing when smooth_data = TRUE (default: 10).
seed: Integer or NULL. Random seed used to make the forecast reproducible (default: 123). With a fixed seed, the same input data produces identical results across runs. If NULL, results may vary between runs because the reproduction number estimation and projection steps involve stochastic sampling.
...: Additional arguments passed to generate_forecast().

Value

A list of forecast results, each element corresponding to one validation window. Each element contains the output returned by generate_forecast() for that particular window.

Details

The validation procedure ensures that forecasts are evaluated under realistic temporal conditions. Starting from the earliest date, the function repeatedly:

1. Takes a growing subset of data up to the current validation endpoint.
2. Runs the forecast using generate_forecast().
3. Moves the validation window forward by validate_window_size days.

This results in a set of forecasts that can be compared to observed data to evaluate predictive performance across time.
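The loop described above can be sketched in base R. This is an illustration under stated assumptions, not the package's implementation: naive_forecast() is a hypothetical stand-in for generate_forecast(), rolling_validation() is a hypothetical name, and the choice of the first endpoint (window_size days after start_date) is an assumption that may differ from what generate_validation() actually does.

```r
# Toy daily counts using the documented "date"/"confirm" column layout.
set.seed(123)
toy <- data.frame(
  date = seq(as.Date("2024-10-16"), by = "day", length.out = 42),
  confirm = rpois(42, lambda = 20)
)

# Hypothetical stand-in for generate_forecast(): repeats the mean of the
# last `window_size` observations for each of the next `n_days` days.
naive_forecast <- function(train, window_size = 7, n_days = 7) {
  data.frame(
    date = max(train$date) + seq_len(n_days),
    predicted = mean(tail(train$confirm, window_size))
  )
}

# Hypothetical rolling loop: grow the training subset, forecast, step forward.
rolling_validation <- function(data, start_date, validate_window_size = 7,
                               window_size = 7, n_days = 7) {
  start_date <- as.Date(start_date)
  stopifnot(start_date %in% data$date)
  # Assumption: the first endpoint leaves `window_size` days of history, and
  # the last endpoint leaves `n_days` of observed data to score against.
  endpoints <- seq(start_date + window_size - 1,
                   max(data$date) - n_days,
                   by = validate_window_size)
  lapply(seq_along(endpoints), function(i) {
    train <- data[data$date >= start_date & data$date <= endpoints[i], ]
    naive_forecast(train, window_size, n_days)
  })
}

results <- rolling_validation(toy, "2024-10-16")

# Compare each forecast with the observed counts (mean absolute error).
mae <- vapply(results, function(fc) {
  scored <- merge(fc, toy, by = "date")
  mean(abs(scored$predicted - scored$confirm))
}, numeric(1))
```

Each element of results corresponds to one validation window, mirroring the list structure described under Value. The per-window error in mae is one simple way to summarise predictive performance across time; other scoring rules could be substituted.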

Examples

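# Simulate data and aggregate flu_a counts by date over the given range
data <- simulate_data()
formatted_data <- get_aggregated_data(data, "date", "flu_a", "2024-10-16", "2024-12-31")
# Run rolling validation starting from the first date in the series
start_date <- as.Date("2024-10-16")
validation_results <- generate_validation(formatted_data, start_date, type = "flu_a")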
