Introduction to Time Series Analysis#

Introduction#

With the advent of big data, an increasing amount of information is being captured by organizations to optimize their operations and enhance their product offerings. Often, this data forms a time series - a sequence of data points indexed by time. Forecasting involves predicting future values given past observations of a time series. Common applications of time series forecasting include predicting future sales, patient outcomes, asset prices and resourcing needs.

Figure: a time series with missing values.

This two-part notebook series provides an overview of machine learning for time series forecasting. Throughout the series, we will discuss exploratory analysis, data preprocessing, modelling and evaluation. In doing so, we will encounter several time series forecasting methods, from classical statistical approaches (ARIMA, exponential smoothing, etc.) to state-of-the-art neural network-based methods (N-HiTS, Temporal Fusion Transformer, etc.). To illustrate concepts, we will use Python packages such as pandas, statsmodels and darts. The outline for this notebook, Intro to Time Series, is below:

Outline#

  • Intro to Time Series

  • Basic Time Series Operations

    • Missing Values

    • Resampling

    • Denoising

  • Time Series Analysis

    • Time Series Decomposition

    • Autocorrelation and Partial Autocorrelation

    • Cross Correlation

    • Stationarity

    • Making Data Stationary

  • Conclusion

  • Next Steps

Environment Configuration and Imports#

IF USING COLAB: After installing the dependencies into the notebook, you must restart the kernel: go to the Runtime tab and select Restart and Run All. This only needs to be done after the first execution of the notebook in a given session.

import sys

# Install dependencies when running in Colab
in_colab = 'google.colab' in sys.modules

if in_colab:
    !pip install darts
    !pip install pyyaml==5.4.1
    !pip install -U matplotlib

%load_ext autoreload
%autoreload 2
%matplotlib inline

import numpy as np
import pandas as pd 

import statsmodels.api as sm
from statsmodels.tsa.stattools import adfuller
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

import matplotlib.pyplot as plt

from darts import TimeSeries
from darts.datasets import GasRateCO2Dataset
from darts.datasets import AirPassengersDataset
from darts.dataprocessing.transformers import MissingValuesFiller

Intro to Time Series#

Time series data is a sequence of data points indexed by time. In the simplest case, each data point \(x_i\) is a scalar, in what's referred to as a univariate time series. We can represent a univariate time series with \(N\) observations using a value vector \(\mathbf{x} \in R^N\), where the index \(i = 1 \dots N\) represents the time step:

(1)#\[\begin{align} \mathbf{x} = \begin{bmatrix} x_{1} \\ \vdots \\ x_{N} \end{bmatrix} \end{align}\]

In contrast, a multivariate time series has a vector \(\mathbf{x}_i \in R^{M}\) of observations at each time step. In this way, a multivariate time series is a set of time series over the same indices. It follows that multivariate time series are represented as matrices \(X \in R^{N \times M}\), where the row index \(i = 1 \dots N\) is the time step and the column index \(j = 1 \dots M\) is the time series:

(2)#\[\begin{align} X = \begin{bmatrix} x_{1, 1} & \cdots & x_{1, M}\\ \vdots & \ddots & \vdots\\ x_{N, 1} & \cdots & x_{N, M}\\ \end{bmatrix} \end{align}\]
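To make the notation concrete, here is a minimal sketch of a univariate series as a vector and a multivariate series as a matrix (the second variable below is made up purely for illustration):

# Univariate: N = 3 observations as a vector
x_uni = np.array([112.0, 118.0, 132.0])

# Multivariate: N = 3 time steps, M = 2 series as a matrix
X_multi = np.array([[112.0, 5.2],
                    [118.0, 5.5],
                    [132.0, 6.1]])

print(x_uni.shape)    # (3,)
print(X_multi.shape)  # (3, 2)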

Load Univariate Data#

To solidify this notion, we will use the Air Passengers Dataset, a univariate time series that tracks the number of commercial airline passengers \(x_i\) by month \(i\). We can use the darts.datasets.AirPassengersDataset.load method to load the dataset:

ts = AirPassengersDataset().load()
df = ts.pd_dataframe()
df
component #Passengers
Month
1949-01-01 112.0
1949-02-01 118.0
1949-03-01 132.0
1949-04-01 129.0
1949-05-01 121.0
... ...
1960-08-01 606.0
1960-09-01 508.0
1960-10-01 461.0
1960-11-01 390.0
1960-12-01 432.0

144 rows × 1 columns

Generate Missing Value DF#

null_index_list = [i for i in range(len(df)) if i % 8 == 0]
mv_df = df.copy()
# Use .loc with the index to avoid chained-assignment issues
mv_df.loc[mv_df.index[null_index_list], "#Passengers"] = np.nan
mv_df
component #Passengers
Month
1949-01-01 NaN
1949-02-01 118.0
1949-03-01 132.0
1949-04-01 129.0
1949-05-01 121.0
... ...
1960-08-01 606.0
1960-09-01 508.0
1960-10-01 461.0
1960-11-01 390.0
1960-12-01 432.0

144 rows × 1 columns

mv_ts = TimeSeries.from_dataframe(mv_df)
mv_ts.plot()

Time Series Operations#

Missing Values#

As evidenced by the above plot, there are missing values at some indices \(i\) in the time series. In general, a missing value \(x_i\) can be filled with the most recent non-missing value \(x_j\), \(j < i\) (forward fill) or with a function (average, etc.) of previous non-missing values. It is important to note that missing values should always be filled using only previous values to avoid introducing a look-ahead bias. We can apply the pandas.DataFrame.ffill method to the underlying pandas.DataFrame in order to generate the forward-filled series:

ffill_df = mv_df.ffill()
ts = TimeSeries.from_dataframe(ffill_df)
ts.plot()
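darts also provides the MissingValuesFiller transformer (imported above). A minimal sketch of its use is below; note that its default behaviour interpolates using values on both sides of each gap, so applying it before a train/test split can introduce the look-ahead bias discussed above:

# Fill missing values via interpolation (uses future values - beware look-ahead)
filler = MissingValuesFiller()
filled_ts = filler.transform(mv_ts)
filled_ts.plot()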

Resampling#

The temporal resolution is the frequency at which the observations in the time series occur. Common temporal resolutions include hourly, daily and weekly data. It is generally assumed that the temporal resolution of the time series is constant across observations. We can use resampling to change the temporal resolution of the time series, either by upsampling or by downsampling.

Figure: downsampling and upsampling a time series.

Downsampling: Decreases the frequency of observations by reducing the sampling rate of the input time series. This involves omitting intermediate values \(x_j\) that lie between retained points \(x_i\) and \(x_k\). More formally, the downsampling operation maps the original time series \(\mathbf{x} \in R^{N}\) to a lower-resolution time series \(\mathbf{\hat{x}} \in R^{D}\) with \(D < N\).

We can downsample the monthly time series to quarterly using the darts.TimeSeries.resample method:

quarterly_series = ts.resample("3M")
quarterly_series.plot()

Upsampling: Increases the frequency of observations by generating intermediate values \(x_j\), interpolating or filling between adjacent time series values \(x_i\) and \(x_k\). More formally, the upsampling operation maps the original time series \(\mathbf{x} \in R^{D}\) to a higher-resolution time series \(\mathbf{\hat{x}} \in R^{N}\) with \(D < N\).

Similar to downsampling, we can upsample the time series from monthly to weekly data using the darts.TimeSeries.resample method:

weekly_series = ts.resample("1W")
weekly_series.plot()
The resampling can also be performed directly on the underlying pandas.DataFrame, for example aggregating with the mean when downsampling to quarterly data:

ds_df = df.resample("3M").mean()
ds_df.index.names = ["Quarter"]
print(ds_df.columns)
Index(['#Passengers'], dtype='object', name='component')
ds_ts = TimeSeries.from_dataframe(ds_df)
ds_ts.plot()
us_df = ds_df.resample("1M").ffill()
us_df.index.names = ["Month"]
us_ts = TimeSeries.from_dataframe(us_df)
us_ts.plot()
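As an alternative sketch, pandas can also interpolate linearly between the quarterly values when upsampling, rather than forward filling:

# Upsample to monthly data with linear interpolation instead of forward fill
us_interp_df = ds_df.resample("1M").interpolate(method="linear")
us_interp_ts = TimeSeries.from_dataframe(us_interp_df)
us_interp_ts.plot()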

Denoising#

Because a time series is a realization of a stochastic process, it is vulnerable to noise. It is often helpful to denoise the data - a process that involves filtering out noise to extract the underlying signal. There are many approaches to denoising time series data; we will explore just one - average smoothing.

Average Smoothing: Involves iteratively computing the average over a sliding window of observations that is advanced by a stride across the entire time series. This computation can be realized as average pooling on the input time series, with padding to ensure the output time series has the same dimensionality:

(3)#\[\begin{align} \hat{x} = AvgPool(Padding(x)) \end{align}\]

We can write a custom function to perform the operation on a column of the pandas.DataFrame underlying the darts.TimeSeries object. A new time series can be initialized with the updated dataframe:

def average_smoothing(signal, kernel_size=3, stride=1):
    # Zero-pad both ends so the output has the same length as the input
    padding_size = kernel_size // 2
    padded_signal = np.pad(signal, padding_size)
    samples = []
    start = 0
    # Slide the window across the padded signal, averaging each window
    while start + kernel_size <= len(padded_signal):
        samples.append(padded_signal[start:start + kernel_size].mean())
        start += stride
    return np.array(samples)
# Get index and time series
np_ts = df["#Passengers"].to_numpy()
index = df.index

# Average Smoothing 
dn_as = average_smoothing(np_ts)
dn_as_df = pd.DataFrame()
dn_as_df.index = index 
dn_as_df["#Passengers"] = dn_as

# Plot new time series
dn_as_ts = TimeSeries.from_dataframe(dn_as_df)
dn_as_ts.plot()
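As an aside, the same computation can be expressed more compactly as a convolution with a uniform kernel. The sketch below assumes an odd kernel_size and a stride of one:

def average_smoothing_conv(signal, kernel_size=3):
    # Averaging over a window is convolution with a uniform kernel
    kernel = np.ones(kernel_size) / kernel_size
    padded = np.pad(signal, kernel_size // 2)
    return np.convolve(padded, kernel, mode="valid")

# Should match the loop-based implementation above
assert np.allclose(average_smoothing_conv(np_ts), average_smoothing(np_ts))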

Time Series Analysis#

Time Series Decomposition#

In time series analysis, it is common to decompose a time series into multiple components representing different factors of variation. These include the trend, cyclical, seasonal and residual (irregular) components.

  • Trend: General tendency of the time series to increase or decrease over time.

  • Cyclical: Fluctuations in the values of the time series occurring at irregular intervals.

  • Seasonal: Fluctuations in the values of the time series occurring at regular intervals.

  • Residual: Fluctuations in the values of time series, which are a result of unforeseen and unpredictable forces.

The following figure illustrates the various components of a time series:

Figure: a time series decomposed into its components.

It is worth noting that the trend is often lumped together with an intercept term known as the level. Additionally, the cyclical component is usually not included unless it is clear that cycles exist in the time series (e.g., economics-related time series). With this assumption in place, we can formally specify the interaction of the sub-components of the time series. Two simple models for time series decomposition are additive decomposition and multiplicative decomposition.

Additive Decomposition: Assumes additive interaction between the sub-components of the time series. Additive decomposition is most appropriate when the magnitude of the seasonal fluctuations is not proportional to the magnitude of the time series (i.e., trend + level). The additive decomposition can be formally specified as follows:

(4)#\[\begin{align} x_{i} = s_{i} + t_{i} + r_{i} \end{align}\]

where \(x_{i}\) is the time series value, \(s_{i}\) is the seasonal component, \(t_{i}\) is the trend component and \(r_i\) is the residual component, all at time \(i\).

In the case of the Air Passengers time series, the additive decomposition can be performed with the sm.tsa.seasonal_decompose function with model="additive" as follows:

# Helper to plot the components returned by sm.tsa.seasonal_decompose
def plot_seasonal_decompose(y, model):
    decomposition = sm.tsa.seasonal_decompose(y, model=model, extrapolate_trend='freq')
    fig = decomposition.plot()
    fig.set_size_inches(14, 7)
    plt.show()

# Decompose Air Passengers (additive)
plot_seasonal_decompose(df, model="additive")

Multiplicative Decomposition: Assumes multiplicative interaction between the sub-components of the time series. Multiplicative decomposition is most appropriate when the magnitude of the seasonal fluctuations is proportional to the magnitude of the time series (i.e., trend + level). The multiplicative decomposition can be formally specified as follows:

(5)#\[\begin{align} x_{i} = s_{i} \times t_{i} \times r_{i} \end{align}\]

where \(x_{i}\) is the time series value, \(s_{i}\) is the seasonal component, \(t_{i}\) is the trend component and \(r_i\) is the residual component, all at time \(i\).

In the case of the Air Passengers time series, the multiplicative decomposition can be performed with the sm.tsa.seasonal_decompose function with model="multiplicative" as follows:

# Decompose Air Passengers (multiplicative)
plot_seasonal_decompose(df, model="multiplicative")

By looking at the plots of the residuals, it is evident that the multiplicative decomposition is more appropriate for this time series: its residuals resemble random noise with no discernible structure. In the case of the additive decomposition, the residuals still exhibit underlying patterns, indicating they are not exclusively noise.
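As a quick sanity check (a sketch), the components returned by sm.tsa.seasonal_decompose can be recombined to reproduce the original series, since the residual is defined as whatever the trend and seasonal components do not explain:

# Recombine the multiplicative components: x = s * t * r
decomp = sm.tsa.seasonal_decompose(df["#Passengers"], model="multiplicative", extrapolate_trend='freq')
reconstructed = decomp.trend * decomp.seasonal * decomp.resid
print(np.allclose(reconstructed, df["#Passengers"]))  # expect True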

Autocorrelation and Partial Autocorrelation#

A central task in time series analysis is discovering the dependencies among the time steps of a series. Autocorrelation and partial autocorrelation are two metrics that measure these dependencies.

Autocorrelation: Correlation of a signal with a lagged version of itself as a function of the delay. Given a time series \(\mathbf{^{t}x}\) and its \(k\)-lagged series \(\mathbf{^{t-k}x}\), the autocorrelation \(a_{k}\) can be calculated as follows:

(6)#\[\begin{align} a_{k} = \frac{\mathbb{E}[(\mathbf{^{t}x} - \mu_{x})(\mathbf{^{t-k}x} - \mu_{x})]}{\sqrt{\mathbb{E}[(\mathbf{^{t}x} - \mu_{x})^2] \cdot \mathbb{E}[(\mathbf{^{t-k}x} - \mu_{x})^2]}} = \frac{cov(\mathbf{^{t}x}, \mathbf{^{t-k}x})}{\sqrt{var(\mathbf{^{t}x}) \cdot var(\mathbf{^{t-k}x})}} \end{align}\]

where \(cov\) is a function that computes covariance and \(var\) is a function that computes variance. By computing \(a_k\) for all lags from \(1\) to \(K\), we generate a vector \(\mathbf{a} \in R^K\) where each entry represents the autocorrelation for a specific lag. In the case of the Air Passengers Dataset, the autocorrelation with \(K=25\) can be plotted using the statsmodels.graphics.tsaplots.plot_acf function:

# Autocorrelation
plot_acf(df['#Passengers'], lags=25)
plt.show()


The autocorrelation ranges from -1 (the series tend to move in opposite directions) to 1 (the series tend to move in the same direction). From the above plot, the lagged series \(\mathbf{^{t-k}x}\) has the highest autocorrelation, 1, when \(k=0\). This makes sense intuitively because \(\mathbf{^{t}x}\) and \(\mathbf{^{t-k}x}\) are identical when \(k=0\). You will also notice a shaded area, which corresponds to the confidence interval. If an autocorrelation value falls outside this region, it is statistically significant.
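To connect the plot back to equation (6), here is a minimal sketch of the common sample estimator, which uses the overall mean and variance of the series rather than separate statistics for each segment (so it can differ slightly from the statsmodels estimate):

def autocorr(x, k):
    # Sample autocorrelation at lag k (assumes 1 <= k < len(x))
    x = np.asarray(x, dtype=float)
    mu = x.mean()
    num = ((x[k:] - mu) * (x[:-k] - mu)).mean()
    den = ((x - mu) ** 2).mean()
    return num / den

print(autocorr(df["#Passengers"], 12))  # strong yearly dependence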

Partial Autocorrelation: Correlation of a signal with a lagged version of itself as a function of the delay, with the relationships of intervening observations removed. Partial autocorrelation helps derive the dependencies among time steps more accurately by controlling for the effects of the other lags. Given a time series \(\mathbf{^{t}x}\) and its \(k\)-lagged series \(\mathbf{^{t-k}x}\), the partial autocorrelation \(p_{k}\) can be calculated as follows:

(7)#\[\begin{align} p_{k} = \frac{cov(\mathbf{^{t}x}, \mathbf{^{t-k}x}| \mathbf{^{t-1}x}, \cdots, \mathbf{^{t-k+1}x})}{\sqrt{var(\mathbf{^{t}x} | \mathbf{^{t-1}x}, \cdots, \mathbf{^{t-k+1}x}) \cdot var(\mathbf{^{t-k}x} | \mathbf{^{t-1}x}, \cdots, \mathbf{^{t-k+1}x})}} \end{align}\]

where \(cov\) computes the covariance between \(\mathbf{^{t}x}\) and \(\mathbf{^{t-k}x}\) conditioned on the intermediate values, and \(var\) computes the variance of \(\mathbf{^{t}x}\) and \(\mathbf{^{t-k}x}\) conditioned on the intermediate values. By computing \(p_k\) for all lags from \(1\) to \(K\), we generate a vector \(\mathbf{p} \in R^K\) where each entry represents the partial autocorrelation for a specific lag. In the case of the Air Passengers Dataset, the partial autocorrelation with \(K=25\) can be plotted using the statsmodels.graphics.tsaplots.plot_pacf function:

# Partial Autocorrelation
plot_pacf(df['#Passengers'], lags=25, method='ywm')
plt.show()


The partial autocorrelation also ranges from -1 (opposite directions) to 1 (same direction). In contrast to the autocorrelation plot, the partial autocorrelation plot highlights dependencies based solely on the direct contribution of each lag. You will also notice a shaded area corresponding to the confidence interval; if a partial autocorrelation value falls outside this region, it is statistically significant.
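If the numeric values behind the plot are needed, statsmodels exposes them directly. A sketch using the same Yule-Walker method as the plot above:

from statsmodels.tsa.stattools import pacf

# First few partial autocorrelations (index 0 is the trivial lag-0 value of 1)
print(pacf(df["#Passengers"], nlags=5, method="ywm"))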

Cross Correlation#

Often, it is beneficial to compare multiple time series to discover what relationships exist between them. By generalizing the notion of autocorrelation, we can define cross-correlation, which describes the relationship between a pair of time series at different lags. Because the lag can be applied to either series, cross-correlation is defined for both positive and negative lags: a positive lag of one time series is a negative lag of the other, and vice versa.
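To make the sign convention concrete, here is a minimal sketch of the Pearson correlation between \(\mathbf{^{t}x}\) and \(\mathbf{^{t-k}y}\) for a single lag \(k\); swapping the two series flips the sign of the lag:

def cross_corr(x, y, k):
    # Correlate x at time t with y at time t - k
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    if k > 0:
        x, y = x[k:], y[:-k]
    elif k < 0:
        x, y = x[:k], y[-k:]
    return np.corrcoef(x, y)[0, 1]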

To illustrate the concept of cross-correlation, we can compute an affine transformation (a constant shift) of the Air Passengers time series and compute the cross-correlation between the two time series at different lags, up to a maximum lag of \(K = \pm 10\). The matplotlib.pyplot.xcorr function is useful for generating a cross-correlation plot for a pair of time series:

passengers = df["#Passengers"]
passengers_shifted = passengers + 10

f, axarr = plt.subplots(1, 1, figsize=(10, 5))
axarr.xcorr(passengers, passengers_shifted, maxlags=10, normed=True)
axarr.set_title("Cross Correlation Plot")
axarr.set_xlabel("Lag")
axarr.set_ylabel("Correlation")
plt.show()

Stationarity#

Many techniques for time series forecasting assume that the underlying time series is stationary. A stationary time series has a constant mean and variance over time. In many cases, time series are not stationary. As a quick check, we can assess whether a given time series is stationary by visualizing its rolling mean and standard deviation as a function of time.

In the case of the Air Passenger time series, the visualization can be generated as follows:

### Plot rolling statistics to visually check stationarity
def test_stationarity(timeseries, title):

    # Determine rolling statistics
    rolmean = pd.Series(timeseries).rolling(window=12).mean()
    rolstd = pd.Series(timeseries).rolling(window=12).std()

    fig, ax = plt.subplots(figsize=(16, 4))
    ax.plot(timeseries, label=title)
    ax.plot(rolmean, label='rolling mean')
    ax.plot(rolstd, label='rolling std')
    ax.legend()

test_stationarity(df["#Passengers"], "Air Passengers")

Clearly, the time series is not stationary: both the rolling mean and the rolling standard deviation change over time. In some cases this may not be apparent from a visualization, so we can resort to statistical tests to determine whether or not the time series is stationary. One such test is the Augmented Dickey-Fuller test, which can be performed using the statsmodels.tsa.stattools.adfuller function as follows:

# Augmented Dickey-Fuller Test

def ADF_test(timeseries, dataDesc):
    print(' > Is the {} stationary ?'.format(dataDesc))
    dftest = adfuller(timeseries.dropna(), autolag='AIC')
    print('Test statistic = {:.3f}'.format(dftest[0]))
    print('P-value = {:.3f}'.format(dftest[1]))
    print('Critical values :')
    for k, v in dftest[4].items():
        # Fail to reject the null (non-stationarity) when the test statistic
        # exceeds the critical value
        print('\t{}: {} - The data is {}stationary with {}% confidence'.format(
            k, v, 'not ' if v < dftest[0] else '', 100 - int(k[:-1])))
ADF_test(df["#Passengers"], "Air Passenger Time Series")
 > Is the Air Passenger Time Series stationary ?
Test statistic = 0.815
P-value = 0.992
Critical values :
	1%: -3.4816817173418295 - The data is not stationary with 99% confidence
	5%: -2.8840418343195267 - The data is not stationary with 95% confidence
	10%: -2.578770059171598 - The data is not stationary with 90% confidence

Making Data Stationary#

There are several methods for making time series data stationary. This is required when employing a time series analysis method that assumes a constant mean and variance across the entire time series. The methods explored in this section are detrending and differencing.

Detrending: Involves subtracting the underlying trend from the time series to generate the detrended series. The trend can be estimated using average smoothing or other methods. In the case of the Air Passengers Dataset, we can detrend the time series and then evaluate whether the detrended series is stationary:

# Detrending (this also divides by the rolling std, which additionally rescales the series)
y = df["#Passengers"]
y_detrend = (y - y.rolling(window=12).mean()) / y.rolling(window=12).std()

test_stationarity(y_detrend, 'de-trended data')
ADF_test(y_detrend, 'de-trended data')
 > Is the de-trended data stationary ?
Test statistic = -2.481
P-value = 0.120
Critical values :
	1%: -3.4865346059036564 - The data is not stationary with 99% confidence
	5%: -2.8861509858476264 - The data is not stationary with 95% confidence
	10%: -2.579896092790057 - The data is not stationary with 90% confidence

Although the ADF test does not yet deem the detrended series stationary, it is a significant improvement over the original time series.

Differencing: Removes underlying cyclical and seasonal patterns by subtracting a lagged value of the series from each observation. In the case of the Air Passengers Dataset, we can difference the time series and then evaluate whether the differenced series is stationary:

# Differencing with a 12-month (seasonal) lag
y_12lag = y - y.shift(12)
test_stationarity(y_12lag,'12 lag differenced data')
ADF_test(y_12lag,'12 lag differenced data')
 > Is the 12 lag differenced data stationary ?
Test statistic = -3.383
P-value = 0.012
Critical values :
	1%: -3.4816817173418295 - The data is not stationary with 99% confidence
	5%: -2.8840418343195267 - The data is stationary with 95% confidence
	10%: -2.578770059171598 - The data is stationary with 90% confidence

With differencing, the processed time series is considered stationary at the 95% and 90% confidence levels, although not at the highest (99%) level. To mitigate this, we can explore other methods for making the data stationary, or combine differencing and detrending to hopefully yield more robust results.

# Detrending and Differencing 
y_12lag_detrend =  y_detrend - y_detrend.shift(12)
test_stationarity(y_12lag_detrend,'12 lag differenced de-trended data')
ADF_test(y_12lag_detrend,'12 lag differenced de-trended data')
 > Is the 12 lag differenced de-trended data stationary ?
Test statistic = -3.181
P-value = 0.021
Critical values :
	1%: -3.4924012594942333 - The data is not stationary with 99% confidence
	5%: -2.8886968193364835 - The data is stationary with 95% confidence
	10%: -2.5812552709190673 - The data is stationary with 90% confidence
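One practical note: models fit on the differenced scale produce outputs on that scale, so they must be mapped back by inverting the transform. A minimal sketch for the 12-lag difference:

# Invert the 12-lag difference by adding back the value from 12 steps earlier
y_recovered = y_12lag + y.shift(12)
print(np.allclose(y_recovered.dropna(), y[12:]))  # expect True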

Conclusion#

In this notebook, we explored univariate time series analysis. Topics covered include:

  • Basic Time Series operations such as imputing missing values, resampling and denoising.

  • Discovering dependencies between steps of a time series with autocorrelation and partial autocorrelation.

  • Discovering dependencies between steps of a pair of time series with cross-correlation.

  • Decomposing a time series into trend, seasonal, cyclical and residual components.

  • Determining if a series is stationary and methods for making data stationary.

Next Steps#

  • Apply techniques for time series analysis to the dataset for your use case (or a similar open-source dataset)

  • Dive deeper into the topics we explored in the following notebook: TS-0: The Basics