Baselines, Prophet and NeuralProphet#
This notebook is the first of a series that introduces the application of popular, recently developed time series forecasting methods. In particular, we emphasize the use of consistent evaluation metrics and analysis across all models and model configurations.
Use these notebooks as tools to explore the application of various forecasting methods to multivariate time series datasets, and to inspire an experimental approach for comparing multiple models and model configurations.
This notebook explores the application of Prophet and NeuralProphet to exchange rate forecasting, as well as two baseline methods using sktime.
if 'google.colab' in str(get_ipython()):
!pip install prophet
!pip install git+https://github.com/ourownstory/neural_prophet.git # may take a while
#!pip install neuralprophet # much faster, but may not have the latest upgrades/bugfixes
!pip install sktime
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
import pandas as pd
from sktime.forecasting.naive import NaiveForecaster
from prophet import Prophet
from neuralprophet import NeuralProphet
Data Loading#
Load exchange rate data file#
The dataset used here includes daily exchange rates between CAD and 12 other currencies from 2007 to 2017.
if 'google.colab' in str(get_ipython()):
from google.colab import drive
drive.mount('/content/drive')
# your Google Drive path may also begin with /content/drive/MyDrive/
data_filename = "/ssd003/projects/forecasting_bootcamp/bootcamp_datasets/boc_exchange/dataset.csv"
data_df = pd.read_csv(data_filename, index_col=0)
data_df.index = pd.to_datetime(data_df.index)
data_df = data_df.reset_index().rename({'index':'date'}, axis=1)
data_df
Split data according to use case#
For simplicity, this notebook uses a conventional training and testing split over the dataset. Other notebooks will give examples of rolling cross validation using multiple validation periods given by a set of cutoff dates.
The purpose of this notebook is to explore a simpler problem formulation using multiple models. The experiments and analysis can be easily adapted for rolling cross validation.
lag_time = 90
lead_time = 60
train_size = 0.8
train_df = data_df.iloc[:int(len(data_df)*train_size)]
test_df = data_df.iloc[int(len(data_df)*train_size):]
To ensure that we have enough data for testing, we need to withhold at least lag_time + lead_time observations from the dataset. Assuming we want to test a fitted model on all available examples in the test set, the number of testing examples can be computed as follows.
n_test_cases = len(test_df) - lag_time - lead_time + 1
print(f" Timesteps in test_df: {len(test_df)}")
print(f"Number of test examples: {n_test_cases}")
Iterating over test examples#
To help with iterating over valid pairs of input and target data, we define a PyTorch-like dataset class. In this notebook, we’ll use this primarily for iterating over test examples, since both Prophet and NeuralProphet impose their own specialized formats for passing in training data.
class ForecastingDataset:
def __init__(self, data_df, lag_time, lead_time, feature_columns):
self.n_examples = len(data_df) - lag_time - lead_time + 1
assert self.n_examples > 0, "Dataset must contain at least one example."
assert "date" in data_df.columns or "ds" in data_df.columns, "Source DataFrame must contain a date/ds column."
self.df = data_df[feature_columns]
if 'date' in data_df.columns:
self.dates = data_df.date
elif 'ds' in data_df.columns:
self.dates = data_df.ds
self.lag_time = lag_time
self.lead_time = lead_time
def __len__(self):
return self.n_examples
def __getitem__(self, idx):
input = self.df.iloc[idx:idx+self.lag_time]
output = self.df.iloc[idx+self.lag_time:idx+self.lag_time+self.lead_time]
input_dates = self.dates.iloc[idx:idx+self.lag_time]
output_dates = self.dates.iloc[idx+self.lag_time:idx+self.lag_time+self.lead_time]
return input, output, input_dates, output_dates
Next, we instantiate an indexable test_dataset.
feature_columns = [col for col in test_df if col.endswith("_CLOSE")]
test_dataset = ForecastingDataset(test_df, lag_time, lead_time, feature_columns)
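As a quick sanity check, we can inspect the first example returned by the dataset. Each input spans lag_time rows and each target spans lead_time rows of the selected feature columns.
x, y, x_d, y_d = test_dataset[0]
print(x.shape)  # (lag_time, number of *_CLOSE feature columns)
print(y.shape)  # (lead_time, number of *_CLOSE feature columns)
print(x_d.iloc[-1], "->", y_d.iloc[0])  # last input date -> first target date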
Evaluation Metrics#
In order to objectively compare the out-of-sample forecasting performance of this and other models, we will need to collect output in a consistent format and apply a suite of standard evaluation metrics:
Mean Squared Error (MSE)
Root Mean Squared Error (RMSE)
Mean Absolute Error (MAE)
Mean Absolute Percentage Error (MAPE)
See the article Time Series Forecast Error Metrics You Should Know for an overview of these and other popular forecasting error metrics.
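For reference, with \(y_t\) denoting the observed value, \(\hat{y}_t\) the forecast, and \(n\) the number of forecasted time steps, these metrics are defined as:
\( \mathrm{MSE} = \frac{1}{n}\sum_{t=1}^{n}(y_t - \hat{y}_t)^2 \)
\( \mathrm{RMSE} = \sqrt{\mathrm{MSE}} \)
\( \mathrm{MAE} = \frac{1}{n}\sum_{t=1}^{n}\lvert y_t - \hat{y}_t \rvert \)
\( \mathrm{MAPE} = \frac{1}{n}\sum_{t=1}^{n}\left\lvert \frac{y_t - \hat{y}_t}{y_t} \right\rvert \)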
from sklearn.metrics import mean_squared_error, mean_absolute_error, mean_absolute_percentage_error
metrics = {
'mse': mean_squared_error,
'rmse': lambda y_true, y_pred: np.sqrt(mean_squared_error(y_true, y_pred)),
'mae': mean_absolute_error,
'mape': mean_absolute_percentage_error
}
def compute_error_statistics(error_metrics_dict, exp_name):
return {
'mean': pd.DataFrame(error_metrics_dict).mean(axis=0).rename(f'{exp_name}_mean_metrics'),
'std': pd.DataFrame(error_metrics_dict).std(axis=0).rename(f'{exp_name}_std_metrics'),
'max': pd.DataFrame(error_metrics_dict).max(axis=0).rename(f'{exp_name}_max_metrics'),
}
Baseline Forecasts#
Let’s begin our experiments by producing forecasts using naïve estimators. A common baseline is persistence forecasting, where the forecast is simply an extension of the last known observation of the time series. A second baseline is the mean window forecast, where we take the mean over a window of observations and use this value for forecasts. The following code produces and collects the baseline forecasts into lists.
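As a toy illustration with made-up values: for a three-observation window ending in 1.32, the persistence forecast repeats the last observation, while the mean window forecast repeats the window average.
# Toy illustration (hypothetical values, not from the dataset).
toy_window = np.array([1.30, 1.31, 1.32])
persistence_forecast = np.repeat(toy_window[-1], 3)     # [1.32, 1.32, 1.32]
mean_window_forecast = np.repeat(toy_window.mean(), 3)  # [1.31, 1.31, 1.31]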
baseline_model_persistence = NaiveForecaster(strategy='last')
baseline_model_mean = NaiveForecaster(strategy='mean', window_length=lag_time)
forecasts_persistence = []
forecasts_mean = []
for i in range(len(test_dataset)):
x, y, x_d, y_d = test_dataset[i]
persistence_fc = baseline_model_persistence.fit_predict(x['USD_CLOSE'], fh=list(range(1, lead_time+1)))
persistence_fc = pd.Series(persistence_fc.values, index=y_d)
forecasts_persistence.append(persistence_fc)
mean_fc = baseline_model_mean.fit_predict(x['USD_CLOSE'], fh=list(range(1, lead_time+1)))
mean_fc = pd.Series(mean_fc.values, index=y_d)
forecasts_mean.append(mean_fc)
print(i, end='\r')
Compute error metrics over the baseline forecasts#
In this notebook, we want to compare the performance of the experimental models (Prophet, NeuralProphet) against the baselines (persistence and mean window extension). The following code applies each of the four evaluation metrics to every example in the test set.
def compute_baseline_error_metrics(forecasts, test_dataset):
errors = {metric_name:[] for metric_name in metrics.keys()}
for i in range(len(forecasts)):
fc = forecasts[i]
x, y, x_d, y_d = test_dataset[i]
for metric_name, metric_fn in metrics.items():
errors[metric_name].append(metric_fn(y_true=y['USD_CLOSE'], y_pred=fc))
return errors, forecasts
persistence_errors, _ = compute_baseline_error_metrics(forecasts_persistence, test_dataset)
mean_errors, _ = compute_baseline_error_metrics(forecasts_mean, test_dataset)
The following code uses the function compute_error_statistics to reduce the per-example evaluation metrics over the entire test set to three summary statistics (mean, standard deviation, and maximum).
persistence_stats = compute_error_statistics(persistence_errors, 'persistence')
persistence_stats['mean']
mean_window_stats = compute_error_statistics(mean_errors, 'mean_window')
mean_window_stats['mean']
We now collect the mean evaluation statistics for each metric into a DataFrame so that we can later compare these to experimental models.
results_df = pd.DataFrame(persistence_stats['mean']).T
results_df = pd.concat([results_df, mean_window_stats['mean'].to_frame().T])
results_df
Visualizing forecasts over the test set#
For each example in the test set, we have produced a forecast between 1 and lead_time days into the future. As we will see later, this is difficult to visualize over the whole test set. Instead, we can visualize the value of each forecast at a single time step into the future. The code below visualizes the baseline forecasts at the maximum lead time. As we can see, the persistence forecast is exactly the ground truth shifted lead_time days into the future. In the context of exchange rate forecasting, this baseline may be difficult to beat.
Persistence Forecasts At Max Lead Time#
max_fcs = [{'date': fc.index[-1], 'yhat': fc.iloc[-1]} for fc in forecasts_persistence]
max_fcs = pd.DataFrame(max_fcs)
plt.figure(figsize=(12,3))
plt.plot(test_df.date, test_df['USD_CLOSE'], color='blue', label='ground truth')
plt.plot(max_fcs.date, max_fcs.yhat, color='red', label='forecast')
plt.title(f"Forecasts at max lead time ({lead_time} samples) - Persistence")
plt.legend(loc='upper right')
# Plot ground truth
plt.figure(figsize=(12,3))
ground_truth = test_df[['date', 'USD_CLOSE']]
plt.plot(ground_truth.date, ground_truth['USD_CLOSE'], label='ground truth')
# Plot example single forecast
plt.plot(forecasts_persistence[-1], label='forecast')
plt.legend()
Mean Window Forecasts At Max Lead Time#
max_fcs = [{'date': fc.index[-1], 'yhat': fc.iloc[-1]} for fc in forecasts_mean]
max_fcs = pd.DataFrame(max_fcs)
plt.figure(figsize=(12,3))
plt.plot(test_df.date, test_df['USD_CLOSE'], color='blue', label='ground truth')
plt.plot(max_fcs.date, max_fcs.yhat, color='red', label='forecast')
plt.title(f"Forecasts at max lead time ({lead_time} samples) - Mean Window")
plt.legend(loc='upper right')
# Plot ground truth
plt.figure(figsize=(12,3))
ground_truth = test_df[['date', 'USD_CLOSE']]
plt.plot(ground_truth.date, ground_truth['USD_CLOSE'], label='ground truth')
# Plot example single forecast
plt.plot(forecasts_mean[-1], label='forecast')
plt.legend()
Prophet#
Prophet is a univariate forecasting method that supports additional future regressors. It does not support lagged regressors, i.e. it cannot use historical values of multiple series to predict a single target series. We include it as a baseline because it is popular, lightweight, interpretable, and performs very well in some domains.
Prophet is based on a Generalized Additive Model (GAM):
\( y(t) = g(t) + s(t) + h(t) + \epsilon_t\)
where \(y(t)\) is the target series, \(g(t)\) is the trend function, \(s(t)\) is the seasonality or periodic function, \(h(t)\) is a function reflecting holidays or other irregular events, and \(\epsilon_t\) is an error term that is assumed to be normally distributed.
Despite being formulated as an additive model, multiplicative interaction between seasonality and trend components is supported (using a log transform). In the implementation, this is easily configurable using a constructor parameter. See the documentation for more details.
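For example, multiplicative seasonality can be enabled directly in the constructor (we do not use this option in this notebook):
# Sketch: configure Prophet with multiplicative rather than additive seasonality.
multiplicative_model = Prophet(seasonality_mode='multiplicative')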
Data Preparation#
Prophet, like most forecasting packages, imposes its own specific format for input data. It expects inputs in the form of a Pandas DataFrame with two columns, ds and y, which correspond to Pandas-formatted timestamps and the target time series, respectively.
In this example, we create a Prophet DataFrame by selecting the columns date and USD_CLOSE from the Bank of Canada exchange rate dataset. We then rename those columns to ds and y, respectively.
Note that the ds column is already correctly formatted using the Pandas datetime format, since we converted it immediately after loading the data. When reading CSVs, always be sure to check that datestamps are properly formatted.
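One quick way to verify this is with a dtype check:
# Confirm that the date column was parsed into a datetime dtype before fitting Prophet.
assert pd.api.types.is_datetime64_any_dtype(train_df['date']), "date column is not datetime-typed"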
prophet_model_df = train_df[['date', 'USD_CLOSE']].rename({'date':'ds', 'USD_CLOSE':'y'}, axis=1)
prophet_model_df
Model Initialization and Fitting#
For our baseline model, we fit Prophet using its default configuration.
model = Prophet()
model = model.fit(prophet_model_df)
Produce Forecasts#
To produce a forecast using a fitted Prophet model, we need to pass it a dataframe with the desired timestamps in a column named ds. In the example below, we use the fitted model object to produce a dataframe future with dates that extend len(test_df) days beyond the training dates. Passing future to the fitted model’s predict function returns a dataframe populated with a detailed forecast, including model component values and confidence ranges.
Notice here that we are asking Prophet to produce a single forecast for the entire test period. We are doing this because Prophet does not support inference using fixed-sized inputs in the same way that every other technique considered in our bootcamp does.
future = model.make_future_dataframe(periods=len(test_df))
forecast = model.predict(future)
forecast.tail(5)
Plotting Prophet Forecasts#
The following code visualizes the application of the fitted Prophet model to both in-sample (training) and out-of-sample (testing) data. Visualization and evaluation of forecasting models using out-of-sample data is crucial for estimating future performance.
fig, ax = plt.subplots(figsize=(15, 6))
ax.fill_between(forecast.ds.iloc[:-len(test_df)],
forecast.yhat_lower.iloc[:-len(test_df)],
forecast.yhat_upper.iloc[:-len(test_df)],
color='blue', label='In-Sample confidence interval (80%)', alpha=0.15)
ax.fill_between(forecast.ds.iloc[-len(test_df):],
forecast.yhat_lower.iloc[-len(test_df):],
forecast.yhat_upper.iloc[-len(test_df):],
color='red', label='Out-of-Sample confidence interval (80%)', alpha=0.1)
ax.scatter(prophet_model_df.ds, prophet_model_df['y'], color='slategrey', s=3, linewidths=0, label='Train Samples')
ax.scatter(test_df.date, test_df['USD_CLOSE'], color='salmon', s=3, linewidths=0, label='Test Samples')
ax.plot(forecast.ds.iloc[:-len(test_df)],
forecast.yhat.iloc[:-len(test_df)], color='blue', label='In-Sample Forecast')
ax.plot(forecast.ds.iloc[-len(test_df):], forecast.yhat.iloc[-len(test_df):],
color='red', label='Out-of-Sample Forecast')
ax.legend(loc='upper left')
ax.grid(axis='y')
plt.show()
Prophet Forecasts At Max Lead Time#
As we did with the baseline methods, let’s visualize Prophet’s forecasts at maximum lead time.
# We can use our ForecastingDataset class to help with formatting Prophet's output.
forecast_eval_dataset = ForecastingDataset(forecast.iloc[-len(test_df):], lag_time, lead_time, ['yhat'])
fig, ax = plt.subplots(figsize=(9,4))
forecasts_at_max_lead = []
dates_at_max_lead = []
for i in range(len(forecast_eval_dataset)):
x, y, x_d, y_d = forecast_eval_dataset[i]
x_gt, y_gt, x_gt_d, y_gt_d = test_dataset[i]
forecasts_at_max_lead.append(y.values[-1])
dates_at_max_lead.append(y_d.values[-1])
ax.plot(dates_at_max_lead, forecasts_at_max_lead, color='red', label='forecast')
ax.plot(test_df.date, test_df['USD_CLOSE'], color='blue', label='ground truth')
plt.legend()
plt.title(f"Forecasts at max lead time ({lead_time} samples) - Prophet")
plt.show()
With the help of the ForecastingDataset class defined earlier, we iterate over each forecast and ground-truth pair, computing and collecting the evaluation metrics defined above.
def compute_error_metrics(ground_truth_dataset, forecast_dataset):
errors = {metric_name:[] for metric_name in metrics.keys()}
for i in range(len(forecast_dataset)):
x, y, x_d, y_d = forecast_dataset[i]
x_gt, y_gt, x_gt_d, y_gt_d = ground_truth_dataset[i]
for metric_name, metric_fn in metrics.items():
errors[metric_name].append(metric_fn(y_true=y_gt['USD_CLOSE'], y_pred=y))
return errors
error_metrics = compute_error_metrics(test_dataset, forecast_eval_dataset)
prophet_stats = compute_error_statistics(error_metrics, 'prophet')
prophet_stats['mean']
Let’s now append the mean evaluation metrics to the results DataFrame that we will use for comparative evaluation against other models’ forecasts.
Please note that the comparison is not completely fair - Prophet has to predict 672 steps into the future at once, whereas our baselines only have to predict the next 60 days.
results_df = pd.concat([results_df, prophet_stats['mean'].to_frame().T])
results_df.sort_values('mae')
NeuralProphet#
Let’s proceed to explore the NeuralProphet model. Please review the official NeuralProphet documentation to learn more.
In the words of its developers, NeuralProphet is “based on neural networks, inspired by Facebook Prophet and AR-Net, built on PyTorch”. A very important differentiating feature is that NeuralProphet conveniently supports lagged regressors. In the context of this running example, NeuralProphet supports the use of multiple other currencies’ time series. With this expanded flexibility, however, the model is more complex, with a greater number of design choices and hyperparameters to consider.
The official documentation on lagged regressors (lagged covariates) gives several examples for configuring NeuralProphet models to use lagged regressors, but commentary and suggestions on best practices are largely absent.
In the following code, we will consider a small number of NeuralProphet model configurations applied to the same forecasting task from above. Importantly, we retain the same train/test (in-sample/out-of-sample) split, and we will apply the same evaluation metrics to NeuralProphet’s forecasts.
Data Formatting#
NeuralProphet’s data format is very similar to Prophet’s. We prepare new DataFrames for training and evaluation.
np_train_df = train_df.reset_index().rename({'date':'ds', 'USD_CLOSE':'y'}, axis=1).drop('index', axis=1)
np_test_df = test_df.reset_index().rename({'date':'ds','USD_CLOSE':'y'}, axis=1).drop('index', axis=1)
Of course, the most important difference between the DataFrames prepared for Prophet and NeuralProphet is that, with NeuralProphet, we have the opportunity to include data about the non-target variables as lagged regressors.
np_train_df.head(5)
Baseline/Default Model#
A baseline NeuralProphet model with lagged regressors using default initialization parameters, except:
n_lags=lag_time, specifying that the autoregressive component of the model should use the past lag_time daily observations as inputs
n_forecasts=lead_time, specifying that our use case is to predict the target signal lead_time days into the future
NeuralProphet also allows you to specify a validation_df in fit(), on which the model will be evaluated every epoch. We are not using this feature here, but a brief sketch follows below.
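As a brief sketch of that option (assuming your NeuralProphet version accepts the validation_df argument to fit, and using an arbitrary 90/10 split of the univariate training frame):
# Sketch only: hold out the last 10% of the training data so that NeuralProphet
# reports validation metrics after every epoch. Not used elsewhere in this notebook.
np_uni_df = np_train_df[['ds', 'y']]
val_split = int(len(np_uni_df) * 0.9)
np_model_val_demo = NeuralProphet(n_lags=lag_time, n_forecasts=lead_time)
val_metrics = np_model_val_demo.fit(np_uni_df.iloc[:val_split], freq='D', validation_df=np_uni_df.iloc[val_split:])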
np_model = NeuralProphet(n_lags=lag_time, n_forecasts=lead_time)
# Add the non-target feature columns as lagged regressors
feature_cols = [col for col in np_train_df if col not in ('USD_CLOSE', 'ds', 'y')]
for feature in feature_cols:
np_model.add_lagged_regressor(f'{feature}')
np_model.fit(np_train_df, freq='D')
After fitting, you can plot the learned model parameters, including those for the additional 11 lagged regressors (each using the past 90 days of observations).
np_model.plot_parameters()
NeuralProphet, rather annoyingly, does not collect forecasts into a single yhat variable, but rather into a separate stepX column for each of the lead times. For example, the following is a single 60-day forecast:
x, y, x_d, y_d = test_dataset[0]
x = x.reset_index().rename({'date':'ds', 'USD_CLOSE':'y'}, axis=1).drop('index', axis=1)
x = x.assign(ds=x_d.reset_index().drop('index', axis=1).values)
y = y.reset_index().rename({'date':'ds', 'USD_CLOSE':'y'}, axis=1).drop('index', axis=1)
np_future_df = np_model.make_future_dataframe(x, periods=len(y))
np_forecast = np_model.predict(np_future_df, decompose=False, raw=True)
np_forecast
To get a more usable data structure, the following function takes a NeuralProphet forecast dataframe and turns it into a time series of its predictions:
def yhat_from_neuralprophet_forecast(np_forecast, y_d):
return pd.Series(np_forecast.T.iloc[1:].set_index(y_d).iloc[:,0], name='np_yhat').rename_axis('ds')
The forecast from above would now look like this:
yhat_from_neuralprophet_forecast(np_forecast, y_d)
Since NeuralProphet uses a fixed-size input sequence (lagged observations) to produce forecasts, we iterate over the input sequences in the test set and use them as model inputs to produce forecasts. This mode of inference should be more familiar to machine learning practitioners than Prophet’s. Note that NeuralProphet requires us to first format the input data using the make_future_dataframe function before running inference using the predict function. We define the following function, which produces forecasts for each of the input/ground-truth-output sequences in the test set.
def collect_np_forecasts(np_model, test_dataset):
forecasts = []
for i in range(len(test_dataset)):
x, y, x_d, y_d = test_dataset[i]
x = x.reset_index().rename({'date':'ds', 'USD_CLOSE':'y'}, axis=1).drop('index', axis=1)
x = x.assign(ds=x_d.reset_index().drop('index', axis=1).values)
y = y.reset_index().rename({'date':'ds', 'USD_CLOSE':'y'}, axis=1).drop('index', axis=1)
np_future_df = np_model.make_future_dataframe(x, periods=len(y))
np_forecast = np_model.predict(np_future_df, decompose=False, raw=True)
fc_series = yhat_from_neuralprophet_forecast(np_forecast, y_d)
forecasts.append(fc_series)
return forecasts
Similarly to what we defined for Prophet, we define the following function for computing and collecting evaluation metrics over all of the forecasts.
def compute_np_error_metrics(forecasts):
errors = {metric_name:[] for metric_name in metrics.keys()}
for i in range(len(forecasts)):
fc = forecasts[i]
gt = test_df.loc[test_df.date.isin(fc.index)].sort_values('date') # Sort by date as a safeguard to keep alignment with the forecast index.
for metric_name, metric_fn in metrics.items():
errors[metric_name].append(metric_fn(y_true=gt['USD_CLOSE'], y_pred=fc))
return errors, forecasts
forecasts = collect_np_forecasts(np_model, test_dataset)
np_baseline_error_metrics, fcs = compute_np_error_metrics(forecasts)
Plot all forecasts#
We have the option to visualize complete forecasts at every time step, but it does not tell us much about the model’s performance.
fig, ax = plt.subplots(figsize=(15,6))
for i in range(len(forecasts)):
fc = forecasts[i]
gt = test_df.loc[test_df.date.isin(fc.index)]
ax.plot(fc.index[:], fc[:], alpha=0.1, color='red')
ax.plot(gt.date, gt['USD_CLOSE'], alpha=0.1, color='blue')
plt.title(f"Forecasts at all lead times (1 to {lead_time} samples)")
plt.show()
Plot all forecasts at max lead time#
max_fcs = [{'date': fc.index[-1], 'yhat': fc.iloc[-1]} for fc in forecasts]
max_fcs = pd.DataFrame(max_fcs)
plt.figure(figsize=(12,3))
plt.plot(test_df.date, test_df['USD_CLOSE'], color='blue', label='ground truth')
plt.plot(max_fcs.date, max_fcs.yhat, color='red', label='forecast')
plt.title(f"Forecasts at max lead time ({lead_time} samples) - Neural Prophet Default")
plt.legend(loc='upper right')
# Plot ground truth
plt.figure(figsize=(12,3))
ground_truth = test_df[['date', 'USD_CLOSE']]
plt.plot(ground_truth.date, ground_truth['USD_CLOSE'], label='ground truth')
# Plot example single forecast
plt.plot(forecasts[-1], label='forecast')
plt.legend()
Append evaluation metrics to results_df#
results_df = pd.concat([results_df, compute_error_statistics(np_baseline_error_metrics, 'neural_prophet_baseline')['mean'].to_frame().T])
results_df.sort_values('mae')
Restricted model#
The baseline NeuralProphet model does not perform well on out-of-sample data. We can consider multiple changes to the model’s configuration and hyperparameters in pursuit of better performance. Let’s consider the following configuration, which restricts the model to using only the last observed value of the lagged regressors, as opposed to n_lags past observations. While less expressive, this model may be less prone to overfitting.
np_model_last_sample_only = NeuralProphet(n_lags=lag_time, n_forecasts=lead_time)
# Add the non-target feature columns as lagged regressors
feature_cols = [col for col in np_train_df if col not in ('USD_CLOSE', 'ds', 'y')]
for feature in feature_cols:
np_model_last_sample_only.add_lagged_regressor(f'{feature}', n_lags=1)
np_model_last_sample_only.fit(np_train_df, freq='D')
forecasts = collect_np_forecasts(np_model_last_sample_only, test_dataset)
np_last_sample_only_error_metrics, fcs = compute_np_error_metrics(forecasts)
Once again, we are able to plot the learned parameters of the model. The lagged regressors are now grouped together in a single chart, as only one value of each is used.
np_model_last_sample_only.plot_parameters()
Plot forecasts at max lead time#
max_fcs = [{'date': fc.index[-1], 'yhat': fc.iloc[-1]} for fc in forecasts]
max_fcs = pd.DataFrame(max_fcs)
plt.figure(figsize=(12,3))
plt.plot(test_df.date, test_df['USD_CLOSE'], color='blue', label='ground truth')
plt.plot(max_fcs.date, max_fcs.yhat, color='red', label='forecast')
plt.title(f"Forecasts at max lead time ({lead_time} samples) - Neural Prophet 1-Step Lag")
plt.legend(loc='upper right')
# Plot ground truth
plt.figure(figsize=(12,3))
ground_truth = test_df[['date', 'USD_CLOSE']]
plt.plot(ground_truth.date, ground_truth['USD_CLOSE'], label='ground truth')
# Plot example single forecast
plt.plot(forecasts[-1], label='forecast')
plt.legend()
Append evaluation metrics to results_df#
results_df = pd.concat([results_df, compute_error_statistics(np_last_sample_only_error_metrics, 'neural_prophet_last_sample_only')['mean'].to_frame().T])
results_df.sort_values('mae')
Model with Sparse Neural Autoregression#
In the previous parameter plots, you could see large weights on all of the autoregressive features. You can discourage NeuralProphet from relying on so many of them by regularizing the AR coefficients towards sparsity. In this case, we set ar_reg to 10, which imposes strong sparsity regularization on the AR coefficients. We also reduce the AR depth to 10 days. Note: NeuralProphet applies this sparsity factor only to the regular AR coefficients, not to the lagged regressor coefficients, where higher sparsity would arguably make more sense.
We can also adjust other parameters, such as the learning rate of the AR-Net. Another change applied to this model is the loss function, now MAE instead of the default Huber loss.
np_model_sparse_nar = NeuralProphet(n_lags=10,
n_forecasts=lead_time,
ar_reg=10,
learning_rate=5e-3,
loss_func='MAE'
)
# Add the non-target feature columns as lagged regressors
feature_cols = [col for col in np_train_df if col not in ('USD_CLOSE', 'ds', 'y')]
for feature in feature_cols:
np_model_sparse_nar.add_lagged_regressor(f'{feature}', n_lags=1)
np_model_sparse_nar.fit(np_train_df, freq='D')
forecasts = collect_np_forecasts(np_model_sparse_nar, test_dataset)
np_sparse_ar_error_metrics, fcs = compute_np_error_metrics(forecasts)
np_model_sparse_nar.plot_parameters()
Plot forecasts at max lead time#
max_fcs = [{'date': fc.index[-1], 'yhat': fc.iloc[-1]} for fc in forecasts]
max_fcs = pd.DataFrame(max_fcs)
plt.figure(figsize=(12,3))
plt.plot(test_df.date, test_df['USD_CLOSE'], color='blue', label='ground truth')
plt.plot(max_fcs.date, max_fcs.yhat, color='red', label='forecast')
plt.title(f"Forecasts at max lead time ({lead_time} samples) - Neural Prophet Sparse")
plt.legend(loc='upper right')
# Plot ground truth
plt.figure(figsize=(12,3))
ground_truth = test_df[['date', 'USD_CLOSE']]
plt.plot(ground_truth.date, ground_truth['USD_CLOSE'], label='ground truth')
# Plot example single forecast
plt.plot(forecasts[-1], label='forecast')
plt.legend()
Append evaluation metrics to results_df#
results_df = pd.concat([results_df, compute_error_statistics(np_sparse_ar_error_metrics, 'neural_prophet_sparse_ar')['mean'].to_frame().T])
results_df.sort_values('mae')
Reflections and Next Steps#
So far, the best performing ‘model’ is the persistence forecast. This is, of course, an unsatisfactory result. The best performing experimental model on the exchange rate dataset is the restricted NeuralProphet model that uses only the last observation of the lagged regressors as features. Of course, we have only considered a very small number of NeuralProphet configurations; many more model and hyperparameter configurations are possible. Please refer to the NeuralProphet documentation for detailed information. However, finding a better configuration may require significant effort, either manual or automated (via a hyperparameter search, for example). In practical forecasting use cases, it may be important to consider the time, resources, and effort needed to find a forecasting model that beats the baseline.
The following notebooks in this series will cover additional models (N-BEATS and DeepAR) as well as rolling cross validation using NeuralProphet. In order to compare the out-of-sample forecasts produced by this notebook with those of other notebooks, the results_df DataFrame is saved below. Hopefully we will find a model that performs better than the baseline in a continued out-of-sample evaluation experiment!
Note: The path here should be changed to your local drive folder so that the file can be retrieved in other notebooks. See the beginning of this notebook for context.
output_filename = "exchange_rate_mean_test_metrics.csv"
results_df.to_csv(output_filename)
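In a follow-up notebook, the saved metrics can be reloaded for comparison, for example:
# Sketch: reload the saved per-model mean metrics in a later notebook.
saved_results_df = pd.read_csv(output_filename, index_col=0)
saved_results_df.sort_values('mae')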