Chapter 4 - ARMA Models


The previous chapter introduced what a model is and how it can be used in time series analysis. It then described the simple linear regression model, which is particularly helpful when dealing with trends. Furthermore, it is efficient, simple and easily interpretable, making it an ideal first choice. However, the linear regression model comes with certain disadvantages: it assumes linearity and it does not take autocorrelation into account. Autoregressive Moving Average (ARMA) models are more powerful and, in the context of time series analysis, can be used to understand and predict data that varies over time without necessarily exhibiting a trend.
 
At their core, ARMA models help capture and model the dependencies in a time series by considering both past values of the series (i.e. autocorrelation) and past random errors (often called shocks) that might have influenced its behavior. This dual approach makes ARMA models especially effective for capturing patterns in time-dependent data where simple trend lines or averages from linear regression models fall short. This links naturally to real-world applications: many types of time series, such as stock prices, weather patterns, economic indicators and social media activity, exhibit behavior that isn't purely random but is not fully predictable either. The patterns found in these applications often have both a persistent structure (e.g. temperatures over seasons or stock market trends over years) and random fluctuations (such as day-to-day weather changes or daily trading noise).
 
ARMA models make it possible to break a time series down into two components: a predictable component and a seemingly random one. This opens the possibility of creating models that balance historical dependencies (autocorrelation) and unpredictable shifts (noise), which makes these models both flexible and simple. The focus on only autocorrelation and noise allows them to adapt to a wide variety of time series patterns without the need for excessive complexity.
 

4.1 White Noise & Random Walks

In chapter one, a stochastic process was described as a sequence of stochastic variables. In turn, a time series was then described as observing this sequence from some starting time $t = 1$ up to a final time $t = n$. Similarly, a white noise process can be described. In what follows, such a process will be denoted by $\varepsilon_t$.
Def 4.1 - White Noise Process
A white noise process $\varepsilon_t$ is a sequence of i.i.d. observations with zero mean and variance $\sigma^2$.
In this way, a white noise process can be seen as a simple type of time series that consists only of random values adhering to specific properties (i.e. zero mean and fixed variance), where each value is independent of the others. One important remark is that, due to these properties, a value from a white noise series at any given point in time is completely unpredictable based on previous values, which makes it a purely random process.
 
The white noise process, or just white noise in general, is essential for time series analysis since it acts as a building block for more complicated models. Many time series models, including ARMA models, rely on white noise to capture the random fluctuations that cannot be explained by underlying patterns or trends. For example, a random walk (without drift) can be defined using the first difference operator:
$$\nabla X_t = X_t - X_{t-1} = \varepsilon_t$$
Understanding white noise creates a baseline for randomness. More sophisticated models can then be introduced that exhibit structured dependencies and patterns, layered on top of the random noise, to capture meaningful trends, cycles or seasonal effects in the data. In general, white noise serves as a fundamental concept, marking the contrast between randomness and predictability in time series analysis.
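As a small illustration, the following sketch (Python with NumPy; the seed and series length are arbitrary choices) builds a random walk by accumulating white noise and checks that taking first differences recovers the noise, matching the first-difference definition above.

```python
import numpy as np

rng = np.random.default_rng(42)                 # arbitrary seed, for reproducibility
n = 500                                         # arbitrary series length

eps = rng.normal(loc=0.0, scale=1.0, size=n)    # white noise: i.i.d., mean 0, variance 1
walk = np.cumsum(eps)                           # random walk: X_t = X_{t-1} + eps_t

# First-differencing the walk (with X_0 = 0) gives back the original noise.
diff = np.diff(walk, prepend=0.0)
print(np.allclose(diff, eps))                   # True
```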
Example 4.1 - White Noise & Autocorrelation
The following image illustrates a white noise process and its autocorrelation function. The ACF of a white noise process is relatively straightforward.
  • At a lag of 0, the autocorrelation is 1, since each data point perfectly correlates with itself.
  • For all other lags, the ACF is approximately zero, since each value within the white noise process is independent of the others. This means there is no correlation between values at different time points.
[Figure: simulated white noise process and its ACF]
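To make the example concrete, here is a minimal sketch (plain NumPy, arbitrary seed) that estimates the sample ACF of a simulated white noise series; the lag-$k$ autocorrelation is computed as the lag-$k$ sample autocovariance divided by the sample variance.

```python
import numpy as np

rng = np.random.default_rng(0)
eps = rng.normal(size=1000)                     # simulated white noise

def sample_acf(x, max_lag):
    """Sample autocorrelations: lag-k autocovariance divided by the variance."""
    x = x - x.mean()
    var = np.sum(x * x) / len(x)
    return [np.sum(x[k:] * x[:len(x) - k]) / (len(x) * var) for k in range(max_lag + 1)]

for k, r in enumerate(sample_acf(eps, 5)):
    print(f"lag {k}: {r:+.3f}")                 # lag 0 is exactly 1, higher lags are near 0
```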
Example 4.2 - Random Walk & Autocorrelation
The following image illustrates a random walk defined by the following model:
$$X_t = X_{t-1} + \varepsilon_t$$
where $X_{t-1}$ is the previous value in the series and $\varepsilon_t$ is the white noise term. Unlike plain white noise, which has no memory of past values, each step in a random walk depends directly on the prior value. This makes the walk non-stationary (i.e. its variance grows over time). The ACF for a random walk is distinctive because it shows strong and persistent correlation at all positive lags.
  • At low lags, the autocorrelation is high. Since each value in a random walk is closely related to the one just before it, the ACF at low lags is often close to one.
  • There is a gradual decay in the ACF. This pattern reflects the fact that a random walk has a “memory” of all past values, as each value is the cumulative sum of all previous noise terms.
[Figure: simulated random walk and its ACF]
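The slow decay can be reproduced numerically; the sketch below relies on statsmodels' `acf` helper (an assumption about the available tooling) applied to a simulated random walk.

```python
import numpy as np
from statsmodels.tsa.stattools import acf

rng = np.random.default_rng(1)
walk = np.cumsum(rng.normal(size=1000))         # random walk: cumulative sum of white noise

# Autocorrelations stay close to 1 at low lags and only decay gradually.
for k, r in enumerate(acf(walk, nlags=10)):
    print(f"lag {k:2d}: {r:.3f}")
```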

4.2 Moving Average (MA) Models

Before moving average (MA) models can be described, it is important to recall the concepts of both stationarity and a stochastic process. A stochastic process was described in the last section. Stationarity, however, requires the following conditions to hold for a stochastic process $X_t$:
  • The mean $E[X_t]$ must be the same for all $t$
  • The variance $\operatorname{Var}(X_t)$ must be the same for all $t$
  • The covariance $\operatorname{Cov}(X_t, X_{t+k})$ must be the same for all $t$ and each lag $k$
Def 4.2 - Moving Average of Order 1
A stationary stochastic process $X_t$ is a moving average of order 1 (denoted as MA(1)) if it satisfies:
$$X_t = \mu + \varepsilon_t + \theta\,\varepsilon_{t-1}$$
where $\mu$ and $\theta$ are unknown parameters.
This is a time series model where the current value in the series is a weighted combination of the current and previous values of a random shock or noise. In an MA(1) process, each value in the series depends on the noise at the current time step and at the previous time step. The noise $\varepsilon_t$ is assumed to be white noise (i.e. i.i.d. with mean 0 and constant variance $\sigma^2$). The (unknown) parameter $\theta$ controls how much influence the previous shock has on the current value. Note that, for $\mu = 0$ and $\theta = 0$, the result is plain white noise as described in the previous section.
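As an illustration, the sketch below simulates an MA(1) series directly from the defining equation; the values $\mu = 0$ and $\theta = 0.7$ are arbitrary example choices, not taken from the text.

```python
import numpy as np

rng = np.random.default_rng(7)
n, mu, theta = 1000, 0.0, 0.7                   # arbitrary example parameters

eps = rng.normal(size=n + 1)                    # one extra shock so eps_{t-1} exists at t = 0
x = mu + eps[1:] + theta * eps[:-1]             # MA(1): X_t = mu + eps_t + theta * eps_{t-1}

print(x.mean(), x.var())                        # close to mu and (1 + theta**2) * sigma**2
```

The sample variance comes out close to $(1 + \theta^2)\,\sigma^2$, which is exactly the quantity derived in the proof below.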
 
Autocorrelations of MA(1)
The autocorrelations of an MA(1) process are given by
$$\rho_1 = \frac{\theta}{1 + \theta^2}, \qquad \rho_k = 0 \;\text{ for } k \geq 2$$
Proof of Autocorrelations for MA(1)
  1. Given is the model equation of an MA(1) process: $X_t = \mu + \varepsilon_t + \theta\,\varepsilon_{t-1}$.
  2. The variance is described as $\gamma_0 = \operatorname{Var}(X_t) = \operatorname{Var}(\mu + \varepsilon_t + \theta\,\varepsilon_{t-1})$. The constant $\mu$ does not contribute to the variance, so it can be ignored.
  3. The rules of variance for sums of random variables can be applied. Since $\varepsilon_t$ and $\varepsilon_{t-1}$ are independent, their covariance is 0 and thus $\operatorname{Var}(X_t) = \operatorname{Var}(\varepsilon_t) + \theta^2\operatorname{Var}(\varepsilon_{t-1})$.
  4. Next, the following variance properties can be used:
      • $\operatorname{Var}(\varepsilon_t) = \operatorname{Var}(\varepsilon_{t-1}) = \sigma^2$, since this is a property of white noise
      which leads to $\gamma_0 = (1 + \theta^2)\,\sigma^2$.
  5. The covariance of order one is given by $\gamma_1 = \operatorname{Cov}(X_t, X_{t-1}) = \operatorname{Cov}(\varepsilon_t + \theta\,\varepsilon_{t-1},\; \varepsilon_{t-1} + \theta\,\varepsilon_{t-2})$, which can be expanded to $\operatorname{Cov}(\varepsilon_t, \varepsilon_{t-1}) + \theta\operatorname{Cov}(\varepsilon_t, \varepsilon_{t-2}) + \theta\operatorname{Cov}(\varepsilon_{t-1}, \varepsilon_{t-1}) + \theta^2\operatorname{Cov}(\varepsilon_{t-1}, \varepsilon_{t-2})$.
  6. There only exists a dependence in all these terms between $\varepsilon_{t-1}$ and itself, meaning all other terms are 0. The covariance in this case equals the variance of the white noise, $\sigma^2$. This leads to $\gamma_1 = \theta\,\sigma^2$.
  7. From here, it follows that $\rho_1 = \dfrac{\gamma_1}{\gamma_0} = \dfrac{\theta\,\sigma^2}{(1 + \theta^2)\,\sigma^2} = \dfrac{\theta}{1 + \theta^2}$.
  8. Following similar reasoning, it can be shown that $\gamma_k = 0$ and thus $\rho_k = 0$ for all lags $k \geq 2$.
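A quick numerical sanity check of this result is sketched below (arbitrary $\theta$ and a long simulated series): the lag-1 sample autocorrelation should land close to $\theta / (1 + \theta^2)$.

```python
import numpy as np

rng = np.random.default_rng(3)
theta, n = 0.6, 200_000                         # arbitrary parameter, long series

eps = rng.normal(size=n + 1)
x = eps[1:] + theta * eps[:-1]                  # MA(1) with mu = 0

xc = x - x.mean()
rho1 = np.sum(xc[1:] * xc[:-1]) / np.sum(xc * xc)   # lag-1 sample autocorrelation
print(rho1, theta / (1 + theta**2))                  # both roughly 0.44
```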
Def 4.3 - Moving Average of Order Q
A stationary stochastic process $X_t$ is a moving average of order q (denoted as MA(q)) if it satisfies:
$$X_t = \mu + \varepsilon_t + \theta_1\,\varepsilon_{t-1} + \theta_2\,\varepsilon_{t-2} + \dots + \theta_q\,\varepsilon_{t-q}$$
where $\mu$ and $\theta_1, \dots, \theta_q$ are unknown parameters.
Autocorrelations of MA(q)
The autocorrelations of an MA(q) process are equal to zero for lags larger than $q$. If the correlogram shows a strong decline and becomes non-significant after lag $q$, there is evidence that the series was generated by an MA(q) process.
Example 4.3 - Moving Average MA(4) & Autocorrelation
The following image illustrates a moving average process of order 4 and its ACF. As mentioned, the ACF becomes negligible for lags larger than 4.
[Figure: simulated MA(4) process and its ACF]
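A possible way to reproduce such a simulation is sketched below, using statsmodels' `ArmaProcess` helper (an assumption about the tooling; the MA coefficients are arbitrary). The estimated ACF should be negligible beyond lag 4.

```python
import numpy as np
from statsmodels.tsa.arima_process import ArmaProcess
from statsmodels.tsa.stattools import acf

# MA(4) with arbitrary coefficients; ArmaProcess expects the leading 1 for lag 0.
process = ArmaProcess(ar=np.array([1.0]), ma=np.array([1.0, 0.6, -0.4, 0.3, 0.5]))
x = process.generate_sample(nsample=5000)

for k, r in enumerate(acf(x, nlags=8)):
    print(f"lag {k}: {r:+.3f}")                 # roughly zero for lags 5 and beyond
```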
Parameter Estimation & Validation
When a time series is expected to be generated by an MA(q) model, the parameters $\mu$ and all $\theta_1, \dots, \theta_q$ should be estimated. This can be done using any of the techniques described in chapter 2. Often the maximum likelihood estimator will suffice: assuming the noise terms are Gaussian, the likelihood of the observed series is maximized with respect to the unknown parameters, which in practice is done numerically.
The estimated parameters lead to residuals, which should behave approximately like white noise. In order to validate an MA(q) model, it is therefore often a good idea to make a correlogram of the residuals. The ACF should ideally contain a spike at lag 0 (as is always the case), but no significant values at other lags.
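A minimal sketch of this estimate-then-validate workflow, assuming the statsmodels package and an MA(2) model as an arbitrary example order:

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.tsa.stattools import acf

rng = np.random.default_rng(5)
eps = rng.normal(size=1002)
y = 2.0 + eps[2:] + 0.5 * eps[1:-1] - 0.3 * eps[:-2]   # simulated MA(2), arbitrary parameters

# order=(0, 0, 2) means: no AR terms, no differencing, two MA terms.
result = ARIMA(y, order=(0, 0, 2)).fit()
print(result.params)                            # estimates of mu, theta_1, theta_2 and sigma^2

# Validation: the residuals should look like white noise.
print(acf(result.resid, nlags=5))               # negligible autocorrelation beyond lag 0
```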
Example 4.4 - Simulation
In this example four graphs are shown. They have the following meaning:
  • The first graph visualizes the result of a simulated MA process (a model of the form given in Def 4.3).
    • This requires estimating the parameters $\mu$ and $\theta_1, \dots, \theta_q$.
  • The second graph shows the predicted values against the actual values, together with the upper and lower bounds of the confidence interval. These represent a range around each predicted value that reflects the uncertainty of the prediction. They give a margin within which the true future values are expected to fall, based on the variability of the data and the uncertainty in parameter estimation (see Chapter 2 - Estimators).
  • The third graph shows the residuals between the predicted and observed data.
  • The final graph shows the correlogram (ACF) of the residuals. As expected, there is an autocorrelation of 1 at lag 0, but at other lags the autocorrelation is negligible, indicating a good fit.
[Figure: simulated MA process, predictions with confidence intervals, residuals and residual ACF]
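Finally, a sketch of how the predictions and confidence intervals in the second graph could be produced with statsmodels (the model order, forecast horizon and parameters are arbitrary assumptions):

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(8)
eps = rng.normal(size=501)
y = eps[1:] + 0.7 * eps[:-1]                    # simulated MA(1), arbitrary parameter

result = ARIMA(y, order=(0, 0, 1)).fit()

forecast = result.get_forecast(steps=10)        # predictions for the next 10 steps
print(forecast.predicted_mean)                  # point predictions
print(forecast.conf_int(alpha=0.05))            # 95% lower and upper confidence bounds
```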