Validate forecast performance over multiple time windows
Source: R/validation_functions.R
generate_validation.Rd

This function performs rolling validation of short-term forecasts generated by EpiEstim or similar models. It divides the input time series into overlapping validation windows and repeatedly runs forecasts to assess model performance across different time segments.
Usage
generate_validation(
  data,
  start_date,
  validate_window_size = 7,
  window_size = 7,
  n_days = 7,
  type = NULL,
  smooth_data = FALSE,
  smoothing_cutoff = 10,
  ...
)

Arguments
- data
  A data frame containing at least the columns "date" and "confirm". The "date" column should be of class Date, and "confirm" should be numeric.
- start_date
  A Date (or date-convertible string) specifying the starting point for validation. Must exist in the "date" column.
- validate_window_size
  Integer. The number of days between successive validation windows (default: 7).
- window_size
  Integer. The sliding window size (in days) used by the forecasting model (default: 7).
- n_days
  Integer. The number of future days to forecast in each validation iteration (default: 7).
- type
  Character. Type of epidemic. Must be one of "flu_a", "flu_b", "rsv", "sars_cov2", or "custom". Passed to fit_epiestim_model().
- smooth_data
  Logical. Whether to smooth the input case counts prior to forecasting (default: FALSE).
- smoothing_cutoff
  Numeric. Threshold used for smoothing when smooth_data = TRUE (default: 10).
- ...
  Additional arguments passed to generate_forecast().
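
The input requirements above can be checked before calling generate_validation(). The sketch below is a minimal illustration, assuming only the constraints stated in this section; check_validation_input() is a hypothetical helper, not part of the package, and the package may perform its own checks differently.

```r
# Hypothetical helper (not part of the package): checks the input
# requirements listed under "Arguments" before running validation.
check_validation_input <- function(data, start_date) {
  # Both required columns must be present
  missing_cols <- setdiff(c("date", "confirm"), names(data))
  if (length(missing_cols) > 0) {
    stop("Missing column(s): ", paste(missing_cols, collapse = ", "))
  }
  # "date" must be of class Date and "confirm" must be numeric
  if (!inherits(data$date, "Date")) stop("'date' must be of class Date")
  if (!is.numeric(data$confirm)) stop("'confirm' must be numeric")
  # start_date may be a Date or a date-convertible string
  start_date <- as.Date(start_date)
  if (!(start_date %in% data$date)) {
    stop("start_date ", format(start_date), " is not in the 'date' column")
  }
  # Return the parsed start date invisibly so it can be reused
  invisible(start_date)
}

# Usage with a small, made-up data frame
cases <- data.frame(
  date = seq(as.Date("2024-10-16"), by = "day", length.out = 30),
  confirm = seq_len(30)
)
check_validation_input(cases, "2024-10-16")
```

A start date outside the "date" column, such as "2025-01-01" for the data above, stops with an error instead of silently producing no validation windows.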
Value
A list of forecast results, each element corresponding to one validation window. Each element contains the output returned by generate_forecast() for that particular window.
Details
The validation procedure evaluates forecasts under realistic temporal conditions: each forecast is produced only from data available up to its validation endpoint. Starting from start_date, the function repeatedly:
- Takes a growing subset of the data, from the earliest date up to the current validation endpoint.
- Runs the forecast on that subset using generate_forecast().
- Moves the validation endpoint forward by validate_window_size days.
The result is one forecast per validation window, each of which can be compared with the case counts observed afterwards to evaluate predictive performance over time.
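
The rolling procedure described above can be sketched in base R. This is a minimal illustration under stated assumptions, not the package's implementation: naive_forecast() is a hypothetical stand-in for generate_forecast() that repeats the mean of the last window_size counts, the rule of stopping once fewer than n_days observations remain after an endpoint is an assumption, and the mean absolute error is only one possible score.

```r
# Hypothetical stand-in for generate_forecast(): repeats the mean of
# the last `window_size` observed counts for `n_days` future days.
naive_forecast <- function(confirm, window_size, n_days) {
  rep(mean(tail(confirm, window_size)), n_days)
}

# Hypothetical sketch of the rolling (expanding-window) validation loop.
rolling_validation_sketch <- function(data, start_date,
                                      validate_window_size = 7,
                                      window_size = 7, n_days = 7) {
  start_date <- as.Date(start_date)
  # Assumed stopping rule: keep only endpoints followed by n_days of data
  last_end <- max(data$date) - n_days
  if (start_date > last_end) return(list())
  # Endpoints spaced validate_window_size days apart
  endpoints <- seq(start_date, last_end, by = validate_window_size)
  lapply(endpoints, function(end) {
    # Growing subset: all data up to and including the current endpoint
    train <- data[data$date <= end, ]
    pred <- naive_forecast(train$confirm, window_size, n_days)
    # Counts observed in the n_days after the endpoint, used for scoring
    obs <- data$confirm[data$date > end & data$date <= end + n_days]
    list(end = end, pred = pred, obs = obs, mae = mean(abs(pred - obs)))
  })
}

# Usage: a linear trend of 42 daily counts, validated from 2024-10-15
cases <- data.frame(
  date = seq(as.Date("2024-10-01"), by = "day", length.out = 42),
  confirm = 1:42
)
res <- rolling_validation_sketch(cases, "2024-10-15")
sapply(res, `[[`, "mae")  # 7 7 7
```

With a linear trend, the naive forecast lags the data by a constant amount, so every window gets the same error. Expanding rather than fixed-length training subsets mirror the "growing subset" step; a real model such as EpiEstim would instead use window_size to define its sliding estimation window.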
Examples
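# Simulate example case data
data <- simulate_data()
# Aggregate the simulated "flu_a" data between 2024-10-16 and 2024-12-31
formatted_data <- get_aggregated_data(data, "date", "flu_a", "2024-10-16", "2024-12-31")
# Run rolling validation starting from 2024-10-16
start_date <- as.Date("2024-10-16")
validation_results <- generate_validation(formatted_data, start_date, type = "flu_a")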