Forecasting TSA Data with R

2024-10-10

If we want to forecast future TSA Passenger volumes in R, we can use the tidyverts packages fable, tsibble, and feasts.
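
Loading the three time series packages looks like any other library call (the full script at the end of this post loads the remaining helper packages used throughout):


                    # Tidy time series packages
                    library(tsibble)   # tidy temporal data frames
                    library(fable)     # tidy forecasting models
                    library(feasts)    # decomposition and feature tools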

These packages apply tidy data principles to time series data and forecasting. They were coauthored by the forecasting great Rob Hyndman. If you are interested in learning more about forecasting, I highly recommend Rob Hyndman and George Athanasopoulos's fantastic textbook: Forecasting: Principles and Practice (3rd Edition) (FPP3), which uses these tidyverts packages and workflows.

Plotting History

We have monthly TSA Passenger data from 2019 to September 2024, and it looks like this:


                    autoplot(df, passengers) + 
                        ggtitle("TSA Passenger History") +
                        scale_y_continuous(
                            name="Passengers",
                            labels=label_number(scale_cut = cut_short_scale())
                        ) +
                        theme(axis.title.x = element_blank())
                
Line plot of TSA Passenger History.

We do have the data at a daily level, but for this exercise we're just forecasting monthly.
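
The daily data gets rolled up to months and converted to a tsibble (the tidyverts time series data frame) in the data prep step of the full code:


                    # Sum daily passengers by month
                    df <- df |>
                        group_by(date = floor_date(date, "month")) |>
                        summarise(passengers = sum(passengers)) |>
                        # Exclude any partial current month
                        filter(date < floor_date(today(), "month"))

                    # Convert to a tsibble with a year-month index
                    df <- df |>
                        mutate(date = yearmonth(date)) |>
                        as_tsibble(index = date)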

There is clear seasonality in the data, and there has been an upward trend coming out of the Covid downturn.


                    # Time series decomposition
                    dcmp <- df |>
                        model(stl = STL(passengers))
                    
                    components(dcmp) |>
                        autoplot() +
                        scale_y_continuous(
                            name="Passengers",
                            labels=label_number(scale_cut = cut_short_scale())
                        )
                
Line plots decomposing TSA Passenger History time series into trend, seasonality, and remainder components.

Evaluating Models with a Train/Test Split

The fable package provides several time series models to choose from. We can evaluate which models perform the best by breaking our data into separate train/test sets. We fit each model on the training data, generate a forecast from each model over the test set time period, and evaluate each model's forecast against our test set to measure accuracy.

We'll start with four baseline models, which give us simple but still effective benchmarks to compare the more complex models against. We'll use the ARIMA() function to find the best-fitting ARIMA model, and we'll try four different ETS models. We know that trend and seasonality are likely to continue, so we'll limit ourselves to just these ETS specifications, but you could try others. Alternatively, the ETS() function can select the best-fitting ETS model for you, just like the ARIMA() function.

We recently learned that our models will likely perform best if we ignore pre-Covid data and just assume history begins in April 2020, so we'll use that cutoff in our model selection and final forecasts here.
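
Here's how that cutoff and the train/test split are set up, excerpted from the full code at the end of the post (the last ~15% of months are held out as the test set):


                    # The models will be more accurate if we ignore pre-Covid
                    df <- df |> filter_index("2020-04" ~ .)

                    # Split data into train/test sets
                    size_of_data <- nrow(df)
                    size_of_test <- round(size_of_data * 0.15)
                    train <- df |> slice_head(n = size_of_data - size_of_test)
                    test <- df |> slice_tail(n = size_of_test)
                    months_to_forecast <- nrow(test)

With the split in place, we can fit the candidate models on the training set: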


                    # Forecast with fable
                    fit <- train |> model(
                        # Baseline models
                        mean = MEAN(passengers),
                        naive = NAIVE(passengers),
                        drift = NAIVE(passengers ~ drift()),
                        snaive = SNAIVE(passengers),
                        # tslm = TSLM(passengers ~ trend() + season()),
                        
                        # Auto-ARIMA and Auto-ETS models
                        arima = ARIMA(passengers, stepwise=FALSE, approximation=FALSE),
                        # ets = ETS(passengers),
                
                        # ETS models
                        ets_aaa = ETS(passengers ~ error('A') + trend('A') + season('A')),
                        ets_aada = ETS(passengers ~ error('A') + trend('Ad') + season('A')),
                        ets_aam = ETS(passengers ~ error('A') + trend('A') + season('M')),
                        ets_aadm = ETS(passengers ~ error('A') + trend('Ad') + season('M'))
                    )              
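
With the models fit, we generate forecasts over the test window and score them against the held-out actuals (as in the full code):


                    # Generate forecasts over the test period
                    fcst <- fit |> forecast(h = months_to_forecast)

                    # View model forecast accuracy against the test set
                    accuracy(fcst, df) |>
                        arrange(RMSE)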
                
Terminal output of initial forecast model results.

Thankfully, our ARIMA and ETS models do mostly outperform our baselines! The ets_aada and ets_aaa models perform the best. The models with multiplicative seasonality don't perform as well and actually underperform some of our baselines; they also seem to take much longer to fit and forecast. The arima model underperforms the seasonal naive baseline but outperforms the rest.

One nice feature of fable is that it makes it very easy to combine, or ensemble, multiple forecast models. Hyndman and Athanasopoulos describe the benefits of this approach:

"The results have been virtually unanimous: combining multiple forecasts leads to increased forecast accuracy. In many cases one can make dramatic performance improvements by simply averaging the forecasts."

Based on our results, we'll try creating combined forecasts of the arima model with each of the two best performing ETS models: ets_aada and ets_aaa.


                    # Create combined models
                    fcst <- fit |> mutate(
                        arima_ets_aaa = (arima + ets_aaa) / 2,
                        arima_ets_aada = (arima + ets_aada) / 2,
                    ) |> forecast(h = months_to_forecast)
                    
                    # View model forecast accuracy against test set
                    accuracy(fcst, df) |>
                        arrange(RMSE)
                
Terminal output of forecast model results when adding in combination models.

After adding in our two combined models, the plain ETS models still perform the best. But the combination models are right behind them.

We can plot the forecasts from the ARIMA, combination, and top ETS models to see how they compare to the historical actuals in our test data:


                    autoplot(filter(fcst, .model %in% c(
                        'arima',
                        'ets_aaa',
                        'ets_aada',
                        'arima_ets_aaa',
                        'arima_ets_aada'
                    )), df, level = NULL) +
                        ggtitle("Top Forecast Models by RMSE on TSA Passenger Test Data") +
                        scale_y_continuous(
                            name="Passengers",
                            labels=label_number(scale_cut = cut_short_scale())
                        ) +
                        theme(axis.title.x = element_blank())
                

The models are generally overshooting what has actually happened so far in 2024, which would be reasonable if the existing trend from 2021 to 2023 were simply to continue. 2023 was the first year to fully recover to 2019 levels, so 2024's growth is organic growth rather than recovery from Covid. Since the models only have data from 2020 through 2023 in their training set, they can't know that growth would level off in 2024 once volumes hit pre-Covid levels.

Line plot showing performance of top forecast models against test set.

Based on these results, I would pick either the top-performing ets_aada model or the combined arima_ets_aada model. While the ets_aada model is more accurate against the test set right now, over the long term I would probably prefer an ensemble. As I've worked on this forecast over the past couple of months, the top-performing model has swapped between ETS and ARIMA as new data arrives each month, so long-term I'd prefer an ensemble model that takes both the ARIMA and ETS approaches into account.

Evaluating Models with Cross-Validation

Another approach to time series model evaluation is cross-validation. This is a little different from the cross-validation you may have used in other machine learning contexts. For cross-validation with time series data, we start with a base set of only the first n historical data points, fit our models, generate forecasts, and measure accuracy. We then repeat the process, adding one (or more) data point to the base history at each step and fitting, forecasting, and measuring accuracy again. In each iteration we're forecasting one or more steps out. Hyndman and Athanasopoulos use the visualization below in FPP3 to help explain the process: the blue dots are the base history we're using to fit our models in each iteration, and the orange dots are the future points we're trying to estimate.

Visual from Hyndman and Athanasopoulos showing how cross-validation works in time series contexts.
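
In the tidyverts workflow, these expanding training windows are built with tsibble's stretch_tsibble(); here is the setup from the full code:


                    # Create cross-validation dataset
                    df_tr <- df |>
                        filter_index("2020-04" ~ .) |>
                        # Start with a 24-month base (two full years for the seasonal
                        # ETS models), growing by one month per iteration
                        stretch_tsibble(.init = 24, .step = 1) |>
                        # Drop the final 12 iterations so every 12-month forecast
                        # has actuals to score against
                        filter(!(.id %in% tail(unique(.id), 12)))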

One neat benefit of this approach is that if we're forecasting more than one step ahead in each iteration, we can see the average model accuracy across all of the CV sets at 1 to n steps out.


                    # Plot forecast accuracy by months out
                    accuracy(fcst |> group_by(h), df) |>
                        filter(!(.model %in% c(
                            'drift',
                            'mean',
                            'naive',
                            'snaive',
                            'tslm'
                        ))) |>
                        ggplot(mapping = aes(x = h, y = RMSE, color = .model)) +
                        geom_line() +
                        labs(
                            title = "TSA Passenger History",
                            x = "Forecast RMSE N Months Out"
                        ) +
                        scale_y_continuous(
                            name="RMSE",
                            labels=label_number(scale_cut = cut_short_scale())
                        ) +
                        scale_x_continuous(
                            breaks = seq(1, 12)
                        ) +
                        theme(panel.grid.minor = element_blank())                        
                

As we'd expect, model accuracy gets worse the further out we try to forecast. The chart below shows how the models compare at different forecast horizons. The best-performing model across the CV sets is the ETS(A,Ad,A) model, and the second best is the combination of ARIMA and ETS(A,Ad,A).

Line chart showing the forecast error over 12 months across multiple cross-validation iterations from the top models we're investigating.

The total model accuracy across all CV steps and forecast horizons is below. It's also possible to calculate accuracy at specific forecast horizons if needed; a minimal sketch follows the output.

Terminal output showing the total model accuracy across all CV steps and forecast periods.
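
For example, here's a minimal sketch that keeps only the 12-months-out accuracy for each model (the h column was added to the forecasts earlier, so this is just a filter on the by-horizon accuracy results):


                    # Accuracy at a single horizon: 12 months out
                    accuracy(fcst |> group_by(h), df) |>
                        filter(h == 12) |>
                        arrange(RMSE)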

Final Forecast

Based on what we've seen by evaluating different models with a train/test split and cross-validation, we know that the ARIMA and ETS(A,Ad,A) models tend to perform the best, along with the combination of the two. Ultimately, I would choose the combination so that we can hopefully benefit from ensembling over the long term.

Let's perform a final forecast, fitting ARIMA and ETS(A,Ad,A) models on the full dataset and then combining them. The chart below shows the forecast from October 2024 through December 2025 for the three models. The arima forecast is slightly lower and the ets_aada forecast is slightly higher; the combined model, being the average of the two, runs through the middle. The 80% and 95% prediction intervals are shown by default by autoplot() in the tidyverts packages. Both the ETS and ARIMA models ultimately give very similar forecasts, and they appear to capture reasonable trend and seasonality for the next 15 months.
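
The fit and forecast step, from the full code:


                    # Number of months to forecast to reach December 2025
                    months_to_forecast <- lubridate::interval(
                        ym(max(df$date)) + months(1), ymd('2026-01-01')
                    ) %/% months(1)

                    # Fit ARIMA and ETS(A,Ad,A) on the full post-Covid dataset, then combine
                    fit <- df |> model(
                        arima = ARIMA(passengers, stepwise=FALSE, approximation=FALSE),
                        ets_aada = ETS(passengers ~ error('A') + trend('Ad') + season('A')),
                    ) |> mutate(
                        arima_ets_aada = (arima + ets_aada) / 2
                    )

                    # Forecast all three models
                    fcst <- fit |> forecast(h = months_to_forecast)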

Line chart showing the TSA Passenger history data with our final forecasts (plus prediction intervals) from the ARIMA, ETS, and combination models.

Finally, let's plot just the point forecast of the final chosen ensemble model with the full history.
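
That plot filters the fable down to the ensemble model and draws its point forecast over the full history (from the full code; df_hist holds the complete, unfiltered history):


                    # Plot final combination model point forecast
                    fcst <- fcst |> filter(.model == 'arima_ets_aada')
                    autoplot(fcst, df_hist, level = NULL) +
                        ggtitle("TSA Passenger Forecast | Combination ARIMA + ETS(A,Ad,A) Model")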

Line chart showing the TSA Passenger history data plus the point forecast from the chosen ensemble model for the rest of 2024 and 2025.

Conclusions

In this case, we're forecasting through the end of this year and all of next year. Let's look at what the forecast says total annual TSA Passenger volume and growth will be for this year and next.


                    # Append the point forecast to history (as in the full code)
                    fcst <- fcst |>
                        as_tibble() |>
                        select(date, .mean) |>
                        rename(passengers = .mean)
                    df <- bind_rows(df, fcst)

                    # Calculate annual volume and percent change in TSA Passengers
                    df |>
                        as_tibble() |>
                        mutate(Year = year(date)) |>
                        group_by(Year) |>
                        summarize(passengers = sum(passengers)) |>
                        arrange(Year) |>
                        mutate(pct_chg = percent(passengers / lag(passengers) - 1))
                
Terminal output showing the annual volume and percent change in TSA Passenger volume. This is based on historical data plus the forecast we've created for the end of 2024 and all of 2025.

The forecast is showing 2024 will finish the year up +4.9% and 2025 will grow +3.8%, which sounds reasonable.

Full Code


                # Forecasting TSA Passenger data with R ----------------------------------------

                # Standard packages
                library(readr)
                library(dplyr)
                library(lubridate)
                library(tidyr)
                library(ggplot2)
                library(scales)
                library(forcats)
                library(gt)
                
                # Tidy time series packages
                library(tsibble)
                library(fable)
                library(feasts)
                
                # Read in and prepare data. Plot full history.
                prepare_data <- function(){
                    # Read in data
                    df <- read_csv('tsa.csv')
                    # Peek at the most recent rows (print() needed inside a function)
                    print(tail(df))
                    
                    # Sum passengers by month
                    df <- df |> 
                        group_by(date = floor_date(date, "month")) |>
                        summarise(passengers = sum(passengers)) |>
                        # Exclude any extra dates in the current month where we don't have a full month of data
                        filter(date < floor_date(today(), "month"))
                
                    # Convert data to tsibble
                    df <- df |> mutate(date = yearmonth(date))
                    df <- df |> as_tsibble(index = date)
                
                    # Plot data and return tsibble dataframe
                    p <- autoplot(df, passengers) + 
                        ggtitle("TSA Passenger History") +
                        scale_y_continuous(
                            name="Passengers",
                            labels=label_number(scale_cut = cut_short_scale())
                        ) +
                        theme(axis.title.x = element_blank())
                    ggsave("images/tsa-passenger-history.png", width=16.18, height=10, units='cm')
                    return(df)
                }
                
                
                # Time series decomposition
                stl_decomposition <- function(df){
                    dcmp <- df |>
                        model(stl = STL(passengers))
                    p <- components(dcmp) |>
                        autoplot() +
                        scale_y_continuous(
                            name="Passengers",
                            labels=label_number(scale_cut = cut_short_scale())
                        )
                    ggsave("images/time-series-decomposition.png", width=16.18, height=10, units='cm')
                }
                
                
                # Compare models using train test split
                train_test_models <- function(df){
                    # The models will be more accurate if we ignore pre-Covid
                    df <- df |> filter_index("2020-04" ~ .)
                
                    # Split data into train/test sets
                    size_of_data <- nrow(df)
                    size_of_test <- round(size_of_data * 0.15)
                    train <- df |> slice_head(n = size_of_data - size_of_test)
                    test <- df |> slice_tail(n = size_of_test)
                    months_to_forecast <- nrow(test)
                
                    # Forecast with fable
                    fit <- train |> model(
                        # Baseline models
                        mean = MEAN(passengers),
                        naive = NAIVE(passengers),
                        drift = NAIVE(passengers ~ drift()),
                        snaive = SNAIVE(passengers),
                        # tslm = TSLM(passengers ~ trend() + season()),
                        
                        # Auto-ARIMA and Auto-ETS models
                        arima = ARIMA(passengers, stepwise=FALSE, approximation=FALSE),
                        # ets = ETS(passengers),
                
                        # ETS models
                        ets_aaa = ETS(passengers ~ error('A') + trend('A') + season('A')),
                        ets_aada = ETS(passengers ~ error('A') + trend('Ad') + season('A')),
                        ets_aam = ETS(passengers ~ error('A') + trend('A') + season('M')),
                        ets_aadm = ETS(passengers ~ error('A') + trend('Ad') + season('M'))
                    )
                    # Generate forecasts
                    fcst <- fit |> forecast(h = months_to_forecast)
                    
                    # View model forecast accuracy against test set
                    accuracy(fcst, df) |>
                        arrange(RMSE) |>
                        print()
                
                    # Create combined models
                    fcst <- fit |> mutate(
                        arima_ets_aaa = (arima + ets_aaa) / 2,
                        arima_ets_aada = (arima + ets_aada) / 2,
                    ) |> forecast(h = months_to_forecast)
                        
                    # View model forecast accuracy against test set
                    accuracy(fcst, df) |>
                        arrange(RMSE) |>
                        print()
                    
                    # Plot results
                    p <- autoplot(filter(fcst, .model %in% c(
                        'arima',
                        'ets_aaa',
                        'ets_aada',
                        'arima_ets_aaa',
                        'arima_ets_aada'
                    )), df, level = NULL) +
                        ggtitle("Top Forecast Models by RMSE on TSA Passenger Test Data") +
                        scale_y_continuous(
                            name="Passengers",
                            labels=label_number(scale_cut = cut_short_scale())
                        ) +
                        theme(axis.title.x = element_blank())
                    ggsave("images/train-test-top-forecast-models.png", width=16.18, height=10, units='cm')
                }
                
                
                # Compare models using cross-validation
                perform_cross_validation <- function(df){
                    # Create cross-validation dataset
                    df_tr <- df |>
                        filter_index("2020-04" ~ .) |>
                        # We need at least two years of data in order to fit the seasonal ETS models
                        # This starts our base data with 24 months and increases each iteration by 1 month
                        stretch_tsibble(.init=24, .step=1) |>
                        # We forecast 12 months out in each iteration, so the last base set can
                        # only include data through September 2023. This drops the final 12 iterations.
                        filter(!(.id %in% tail(unique(.id), 12)))
                    
                    # Fit models to CV dataset
                    fit <- df_tr |> model(
                        # Baseline models
                        mean = MEAN(passengers),
                        naive = NAIVE(passengers),
                        drift = NAIVE(passengers ~ drift()),
                        snaive = SNAIVE(passengers),
                        # tslm = TSLM(passengers ~ trend() + season()),
                        
                        # Auto-ARIMA and Auto-ETS models
                        arima = ARIMA(passengers, stepwise=FALSE, approximation=FALSE),
                        # ets = ETS(passengers),
                
                        ets_aaa = ETS(passengers ~ error('A') + trend('A') + season('A')),
                        ets_aada = ETS(passengers ~ error('A') + trend('Ad') + season('A')),   
                    
                    # Add combined models
                    ) |> mutate(
                        arima_ets_aaa = (arima + ets_aaa) / 2,
                        arima_ets_aada = (arima + ets_aada) / 2,
                    )
                
                    # Forecast 12 months ahead for each iteration
                    fcst <- fit |>
                        forecast(h = 12) |>
                        group_by(.id, .model) |>
                        mutate(h = row_number()) |>
                        ungroup() |>
                        # Need to convert back to a fable (forecast table) to use in the accuracy() function next
                        as_fable(response = "passengers", distribution = passengers)
                    
                    # Print model accuracy
                    accuracy(fcst, df) |> arrange(RMSE) |> print()
                    
                    # Plot forecast accuracy by months out
                    p <- accuracy(fcst |> group_by(h), df) |>
                        filter(!(.model %in% c(
                            'drift',
                            'mean',
                            'naive',
                            'snaive',
                            'tslm'
                        ))) |>
                        ggplot(mapping = aes(x = h, y = RMSE, color = .model)) +
                        geom_line() +
                        labs(
                            title = "TSA Passenger History",
                            x = "Forecast RMSE N Months Out"
                        ) +
                        scale_y_continuous(
                            name="RMSE",
                            labels=label_number(scale_cut = cut_short_scale())
                        ) +
                        scale_x_continuous(
                            breaks = seq(1, 12)
                        ) +
                        theme(panel.grid.minor = element_blank())
                        
                    ggsave("images/cross-validation-forecast-error.png", width=16.18, height=10, units='cm')
                
                }
                
                
                # Forecast data using chosen model
                final_forecast <- function(df){
                    # Forecast data using chosen models
                    # The models will likely be more accurate if we ignore pre-Covid
                    df_hist <- df
                    df <- df |> filter_index("2020-04" ~ .)
                
                    # Find number of months to forecast if we want a forecast through 2025
                    months_to_forecast <- lubridate::interval(
                        ym(max(df$date)) + months(1), ymd('2026-01-01')
                    ) %/% months(1)
                            
                    # Fit models
                    fit <- df |> model(
                        arima = ARIMA(passengers, stepwise=FALSE, approximation=FALSE),
                        ets_aada = ETS(passengers ~ error('A') + trend('Ad') + season('A')),
                    ) |> mutate(
                        arima_ets_aada = (arima + ets_aada) / 2
                    )
                    
                    # Forecast
                    fcst <- fit |> forecast(h = months_to_forecast)
                    
                    # Plot all three forecasts with prediction intervals
                    p <- autoplot(fcst, df) +
                        ggtitle("TSA Passenger Forecast | ARIMA, ETS(A,Ad,A), and Combination") +
                        scale_y_continuous(
                            name="Passengers",
                            labels=label_number(scale_cut = cut_short_scale())
                        ) +
                        theme(axis.title.x = element_blank())
                    ggsave("images/final-forecast-three-models.png", width=16.18, height=10, units='cm')
                        
                    # Plot final combination model point forecast
                    fcst <- fcst |> filter(.model == 'arima_ets_aada')
                    p <- autoplot(fcst, df_hist, level = NULL) +
                        ggtitle("TSA Passenger Forecast | Combination ARIMA + ETS(A,Ad,A) Model") +
                        scale_y_continuous(
                            name="Passengers",
                            labels=label_number(scale_cut = cut_short_scale())
                        ) +
                        theme(axis.title.x = element_blank())
                    ggsave("images/final-forecast.png", width=16.18, height=10, units='cm')
                        
                    return(fcst)
                }
                
                
                # Calculate annual volume and percent change
                calculate_annual_change <- function(df, fcst){
                    fcst <- fcst |>
                        as_tibble() |>
                        select(date, .mean) |>
                        rename(passengers = .mean)
                
                    df <- bind_rows(df, fcst)
                
                    # Calculate annual volume and percent change in TSA Passengers
                    df |>
                        as_tibble() |>
                        mutate(Year = year(date)) |>
                        group_by(Year) |>
                        summarize(passengers = sum(passengers)) |>
                        arrange(Year) |>
                        mutate(pct_chg = percent(passengers / lag(passengers) - 1)) |>
                        print()
                }
                
                
                # Run code
                df <- prepare_data()
                stl_decomposition(df)
                train_test_models(df)
                perform_cross_validation(df)
                fcst <- final_forecast(df)
                calculate_annual_change(df, fcst)