Chapter 1 - Introduction To Time Series Analysis



This chapter provides an introductory overview of the basic concepts related to time series analysis (abbr. TSA). Before an analysis of a time series can be made, it is important to write down a clear and concise definition of what a time series actually is or what it represents.
Def. 1.1 - Time Series A time series is a sequence of data points measured over successive time intervals.
Time series analysis then focuses on identifying patterns and structures in sequentially ordered data points. When data points are spaced out (evenly) over a period of time, the result is often a time series. Analyzing these time series makes it possible to uncover underlying structures, make forecasts and understand the behavior of a metric of interest over time. Simple examples of time series include
  • Daily stock prices
  • Monthly sales
  • Yearly rainfall
Example 1.1 - Stock Market
The stock market is a prime example of a time series, which is updated on a daily basis. Each day, the prices of stocks or shares fluctuate, either increasing or decreasing, based on a range of influencing factors. Investors seek to predict these market movements to determine whether the price of a stock will rise or fall. Accurate predictions enable them to make informed decisions about buying new stocks or selling existing investments, thereby optimizing their financial returns.
The following figure illustrates the prices of the Microsoft stock (MSFT) in a typical "candlestick" chart format. As an investor, gaining insights into how the price of this stock may evolve over time can be valuable for making informed decisions. For instance, the Exponential Moving Average (EMA) displayed in the chart indicates that the stock price has been trending upward over the past few weeks. This upward trend suggests it may not be an optimal time to purchase additional shares, as prices are currently higher.
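[Figure: candlestick chart of the Microsoft (MSFT) stock price with an Exponential Moving Average (EMA) overlay]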
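The Exponential Moving Average mentioned above can be sketched in a few lines. This is a minimal illustration, not the method used to produce the chart: the smoothing factor `alpha = 2 / (span + 1)` is one common convention, seeding with the first observation is an assumption, and the prices are made up.

```python
def ema(values, span):
    """Exponential moving average with smoothing factor alpha = 2 / (span + 1)."""
    alpha = 2.0 / (span + 1)
    out = [values[0]]  # seed with the first observation (an assumed convention)
    for v in values[1:]:
        # New EMA = weighted mix of the new value and the previous EMA.
        out.append(alpha * v + (1 - alpha) * out[-1])
    return out


prices = [100.0, 102.0, 101.0, 105.0, 107.0]  # hypothetical closing prices
smoothed = ema(prices, span=3)                # alpha = 0.5
```

A rising EMA, as in this toy series, is what the text refers to as an upward trend over the recent observations.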
When conducting time series analysis or examining a time series, it is crucial to establish the context of the data being analyzed. This context includes the nature of the data itself, which is typically a representation of a specific metric over time. In a two-dimensional graph, the horizontal axis (commonly the $x$-axis) represents the temporal dimension, while the vertical axis (the $y$-axis) depicts the magnitude of the data metric. In addition to the temporal and data dimensions, it is important to specify the sampling frequency of the data. In the previous example, the data is sampled daily, with each candlestick in the chart representing the opening and closing prices of a stock, as well as the daily maximum and minimum prices. Another key aspect to note is the sample size, denoted by $n$, which refers to the total number of data points collected. In this example, with daily data collection over a period of 90 days, the sample size would be $n = 90$.
 
Building on these foundational concepts, a time series can be described in greater detail as a sequence of data points collected at regular or irregular intervals over time. Unlike traditional data analysis, where observations are often treated as independent, time series data contains a clear temporal structure. This temporal ordering is essential because each data point in a time series is linked to its predecessors and potentially influences future values. This temporal dependence is a key characteristic that sets time series analysis apart from other types of data analysis.
 
One of the most important characteristics of time series data is temporal dependency. Temporal dependency refers to the tendency that observations close in time are more likely to be strongly correlated. This dependency often decays as the time interval between observations increases. Neglecting these dependencies can lead to inaccurate conclusions, as traditional methods assume that observations are independent and identically distributed (i.i.d.). However, time series analysis methods explicitly account for these relationships, making them ideal for forecasting and understanding dynamic systems.
 
Furthermore, time series can exhibit various patterns such as trends (long-term upward or downward movements), seasonality (repeating patterns at regular intervals), and cycles (longer-term oscillations driven by external factors). Identifying and understanding these patterns is essential for making accurate predictions. This is the goal of time series analysis: to extract these structures, remove noise, and create models that capture the underlying processes driving the data.

1.1 Stochastic Processes

Time series analysis requires the notion of a stochastic process, which is a mathematical model that is used to describe systems or phenomena that evolve over time in a way that incorporates randomness. More formally, it can be described as a collection of random variables indexed by time (or space) that describe how a system changes across different time points or locations. Each random variable in the process represents the state of the system at a given point in time, and the evolution of the system is influenced by both deterministic rules and random fluctuations.
Def. 1.2 - Stochastic Process A stochastic process is a sequence of stochastic variables $Y_1, Y_2, Y_3, \ldots$. The process is observed from $t = 1$ to $t = n$, which yields a sequence of numbers
$$y_1, y_2, \ldots, y_n$$
which from now on, will be called a time series.
Randomness
One of the key characteristics of a stochastic process is the randomness or uncertainty of the process. In a deterministic process, the future states of the system are entirely determined by the current state (and in a way, by the first state of the process). In a stochastic process, future states are probabilistic and can vary even if the current state is known. The randomness that is introduced at each step or interval means that multiple “realizations” or “paths” of the process are possible, with each path representing one possible outcome of the process over time.
 
Time/Space-based Indexing
Another key characteristic of a stochastic process is that it is indexed by time or space, whether continuous or discrete. This makes it an ideal framework for modeling time-dependent phenomena (such as time series data). In time series analysis specifically, the index set is usually discrete, meaning the process evolves at specific points in time (e.g. the stock price example). However, it must be noted that stochastic processes can also be continuous in time (e.g. Brownian motion).
 
Dependency Structure
A third and final characteristic of a stochastic process is the form of dependency. There are independent processes where each point is independent of others. Markov processes are stochastic processes where the future state depends only on the current state and not on past states (this is called the Markov property). Finally, there exist autoregressive processes where the future state depends on a combination of the current and previous states.
Example 1.2 - Stochastic Processes This example shows three types of stochastic processes: random walks, Poisson processes and a time series obtained from an ARMA process.
  • Random walks are basic stochastic processes where each step is random and independent of the previous step.
    • [Figure: sample path of a random walk]
  • Poisson processes are stochastic processes that model the occurrence of events over time, where the events happen independently and at a constant average rate.
    • [Figure: sample path of a Poisson process]
  • ARMA/ARIMA models are examples of stochastic processes that are often used specifically in time series analysis. The next value in the series is modeled as a function of previous values (autoregression) and random “shocks”, incorporating randomness at each time step.
    • [Figure: time series generated by an ARMA process]
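The three processes in this example can be simulated with the Python standard library. The sketch below makes several assumptions that are not taken from the example itself: the random walk uses steps of +1 or -1, the Poisson process is observed as a cumulative event count at integer times, and the ARMA process is an ARMA(1,1) with hypothetical coefficients `phi` and `theta`.

```python
import random


def random_walk(n, rng):
    """Y_t = Y_{t-1} + e_t with e_t equal to +1 or -1, starting from 0."""
    path, level = [], 0
    for _ in range(n):
        level += rng.choice((-1, 1))
        path.append(level)
    return path


def poisson_counts(n, rate, rng):
    """Cumulative number of events of a Poisson process at times t = 1..n.

    Events are generated from exponential inter-arrival times with mean 1/rate.
    """
    counts, count = [], 0
    next_event = rng.expovariate(rate)
    for t in range(1, n + 1):
        while next_event <= t:
            count += 1
            next_event += rng.expovariate(rate)
        counts.append(count)
    return counts


def arma11(n, phi, theta, rng):
    """Y_t = phi * Y_{t-1} + e_t + theta * e_{t-1} with Gaussian shocks e_t."""
    series, y_prev, e_prev = [], 0.0, 0.0
    for _ in range(n):
        e = rng.gauss(0.0, 1.0)
        y = phi * y_prev + e + theta * e_prev
        series.append(y)
        y_prev, e_prev = y, e
    return series


rng = random.Random(42)  # fixed seed so the realization is reproducible
walk = random_walk(200, rng)
events = poisson_counts(200, rate=0.5, rng=rng)
arma = arma11(200, phi=0.6, theta=0.3, rng=rng)
```

Changing the seed produces a different realization of each process, which is the "multiple paths" idea described under Randomness.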
Over the course of this syllabus, only regularly spaced, discrete time series will be treated. Note also that the observations in a time series are generally not independent of each other: previous values influence future values. This dependence requires the introduction of stationarity in what follows.
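Def. 1.3 - Stationarity A stochastic process $Y_t$ is (weakly) stationary if the following conditions hold:
  • $E[Y_t]$ is the same for all $t$
  • $\operatorname{Var}(Y_t)$ is the same for all $t$
  • $\operatorname{Cov}(Y_t, Y_{t-k})$ is the same for all $t$, and this holds for every lag $k$ (the value may differ between lags)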
In the previous example, the random walk and Poisson processes were clearly non-stationary. In contrast, the ARMA process can be seen as a stationary process.
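The non-stationarity of the random walk can be checked numerically. The sketch below simulates many independent realizations of a random walk with standard Gaussian steps (an assumption made here for illustration) and compares the variance across realizations at two points in time. For this particular walk the theoretical variance at time $t$ equals $t$, so the second condition of Def. 1.3 fails.

```python
import random
import statistics

rng = random.Random(0)
n_paths, n_steps = 2000, 100

# Simulate independent realizations of Y_t = Y_{t-1} + e_t, starting from 0.
paths = []
for _ in range(n_paths):
    level, path = 0.0, []
    for _ in range(n_steps):
        level += rng.gauss(0.0, 1.0)
        path.append(level)
    paths.append(path)


def variance_at(t):
    """Variance across realizations at time t (t counts from 1)."""
    return statistics.pvariance([p[t - 1] for p in paths])


var_early = variance_at(10)   # theoretical value: 10
var_late = variance_at(100)   # theoretical value: 100
```

A stationary process would give roughly the same value at both time points; here the variance grows with time.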

1.2 Autocorrelations

Autocorrelations will play an important role when talking about time series analysis. An autocorrelation is the correlation of a signal (or sample) with a delayed version of itself over successive time intervals. For time series in particular, it measures how similar a time series is to a “lagged” version of itself. It quantifies how the current values of a variable relate to its past values at different time lags. This enables the quantification of the degree of dependence between observations in a time series at different points in time.
 
In simpler terms, the autocorrelation of order $k$ describes the relation between a value at time $t$ and a value at time $t-k$, where $k$ is the lag or time difference. If the autocorrelation is high for a particular lag, it means that the values at those time points are strongly related. The opposite is also true: low autocorrelation values for a particular lag mean that the values at those time points are only weakly related or not at all.
 
Difference between correlation and autocorrelation
In general, a regular correlation is the relationship between two distinct variables at the same point in time. An autocorrelation, on the other hand, is the correlation of a variable with itself at a different point in time. This makes autocorrelation a powerful tool to detect temporal dependencies, as will become clear over the course of this syllabus.
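Def. 1.4.a - Autocorrelation The autocorrelation of order $k$ is defined as:
$$\rho_k = \operatorname{Corr}(Y_t, Y_{t-k}) = \frac{\operatorname{Cov}(Y_t, Y_{t-k})}{\operatorname{Var}(Y_t)}$$
The autocorrelation can be estimated by:
$$\hat{\rho}_k = \frac{\sum_{t=k+1}^{n} (y_t - \bar{y})(y_{t-k} - \bar{y})}{\sum_{t=1}^{n} (y_t - \bar{y})^2}$$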
Def. 1.4.b - Autocorrelation Function
The autocorrelation can also be expressed as the autocorrelation function (abbr. ACF) of the order $k$:
$$k \mapsto \rho_k, \qquad k = 0, 1, 2, \ldots$$
 
An autocorrelation of order $k$ provides useful insights about a stochastic process. It offers an explanation for the behavior and can help characterize the underlying dynamics of a stochastic process. In the context of time series analysis, autocorrelation is a fundamental concept because it helps identify whether there is a temporal relationship between observations. Time series data often exhibits patterns or dependencies over time (e.g. trends, cycles or seasonality) which can be captured by analyzing autocorrelations. They can be further separated into three distinct categories.
 
Positive Autocorrelations
When positive autocorrelation exists, high (or low) values in the time series tend to be followed by high (or low) values. An example of this behavior is a stock market trend, where consecutive days of rising stock prices indicate a positive autocorrelation.
 
Negative Autocorrelations
When negative autocorrelation exists, high values are followed by low values (and vice versa), indicating that the time series oscillates or alternates over time. This is common in cyclical data, such as alternating high and low demand in different seasons.
 
No Autocorrelation
If there is no autocorrelation (i.e. the autocorrelation is close to zero for all orders $k \geq 1$), the time series may be random or follow a white noise process where there are no meaningful temporal patterns present.
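The three categories can be reproduced with a simulated first-order autoregressive series $Y_t = \phi Y_{t-1} + \varepsilon_t$. This model and the coefficient values below are illustrative assumptions, not part of the definitions above: a positive $\phi$ should give a positive lag-1 autocorrelation, a negative $\phi$ a negative one, and $\phi = 0$ reduces to white noise.

```python
import random


def lag1_autocorrelation(y):
    """Sample autocorrelation at lag 1 (estimator of Def. 1.4.a with k = 1)."""
    n = len(y)
    mean = sum(y) / n
    num = sum((y[t] - mean) * (y[t - 1] - mean) for t in range(1, n))
    den = sum((v - mean) ** 2 for v in y)
    return num / den


def ar1(n, phi, rng):
    """Simulate Y_t = phi * Y_{t-1} + e_t with standard Gaussian shocks."""
    series, prev = [], 0.0
    for _ in range(n):
        prev = phi * prev + rng.gauss(0.0, 1.0)
        series.append(prev)
    return series


rng = random.Random(1)
r_positive = lag1_autocorrelation(ar1(2000, 0.8, rng))   # expected near 0.8
r_negative = lag1_autocorrelation(ar1(2000, -0.8, rng))  # expected near -0.8
r_none = lag1_autocorrelation(ar1(2000, 0.0, rng))       # expected near 0
```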
 
While the values of autocorrelations provide basic information about the underlying stochastic process (or in this particular case, time series), often more information can be retrieved. This includes temporal dependencies and the stationarity of a stochastic process. Furthermore, autocorrelations help to identify patterns, guide model selection and detect white noise.
  • Temporal Dependencies - Autocorrelation identifies whether the values in a time series are dependent on previous values. This is essential for constructing models that account for these dependencies, such as ARIMA models, which rely on the assumption that past values affect future ones.
  • Stationarity - For many stochastic processes, stationarity (where statistical properties like the mean and variance do not change over time) is a key assumption. Autocorrelation helps determine whether a series is stationary or not. A stationary series typically exhibits rapidly decaying autocorrelations, while a non-stationary series may show persistent correlations over long lags.
  • Pattern Identification - If the autocorrelation remains significant even at large lags, this often suggests the presence of a trend (i.e., a long-term movement upward or downward). If the autocorrelation shows periodic peaks at regular intervals (e.g., significant correlation at every 12 months in a monthly dataset), this indicates the presence of a seasonal pattern.
  • Model Selection - In time series modeling, autocorrelation plays a role in choosing the appropriate model. For example, autoregressive (AR) models assume that future values can be predicted from past values, which is detected through significant autocorrelation at lower lags. On the other hand, moving average (MA) models account for autocorrelation in the error terms, and are identified by examining the ACF (autocorrelation function) of residuals after fitting an AR model.
  • White Noise Detection - In a purely random or white noise process, autocorrelations at all lags are close to zero, indicating no discernible pattern or structure in the data. This helps in distinguishing between random processes and those that exhibit structure, such as trends or cycles.
 
Formula for Autocorrelation
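The correlation between two variables $X$ and $Y$ is expressed as
$$\rho_{X,Y} = \frac{\operatorname{Cov}(X, Y)}{\sigma_X \sigma_Y}$$
This expression standardizes covariance to a dimensionless measure of linear association between two variables. In the case of time series, more specifically autocorrelation, this formula translates to
$$\rho_k = \frac{\operatorname{Cov}(Y_t, Y_{t-k})}{\sigma_{Y_t} \sigma_{Y_{t-k}}}$$
In the case of a stationary time series, this can be simplified further since $\sigma_{Y_t} = \sigma_{Y_{t-k}}$ to become
$$\rho_k = \frac{\operatorname{Cov}(Y_t, Y_{t-k})}{\operatorname{Var}(Y_t)}$$
Covariance is a measure for the linear relationship between two (auto)correlated variables. More specifically, it quantifies how changes in one variable are linearly associated with changes in the other. It is defined using the following expression
$$\operatorname{Cov}(X, Y) = E\big[(X - E[X])(Y - E[Y])\big]$$
In this case it is again possible to further simplify, since $E[Y_t] = E[Y_{t-k}] = \mu$ in the case of stationarity:
$$\operatorname{Cov}(Y_t, Y_{t-k}) = E\big[(Y_t - \mu)(Y_{t-k} - \mu)\big]$$
Variance is, in theory, a special case of covariance, where the covariance of a variable with itself is determined.
$$\operatorname{Var}(X) = \operatorname{Cov}(X, X)$$
This measures the spread of $X$ around its mean. This can be seen intuitively once the formula of covariance is applied, resulting in the well known formula for variance.
$$\operatorname{Var}(X) = E\big[(X - E[X])^2\big]$$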
(Auto)Correlation then adjusts the covariance by the product of the standard deviations of the two variables. The division by the product of standard deviations removes units, allowing $\rho$ to be a pure number which is normalized between -1 and 1. The standard interpretation for correlation is as follows:
  • $\rho = 1$ implies a perfect positive linear relationship. This means that if the value of $X$ increases, the value of $Y$ increases linearly.
  • $\rho = 0$ implies no linear relationship.
  • $\rho = -1$ implies a perfect negative linear relationship. This means that if the value of $X$ increases, the value of $Y$ decreases linearly.
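From here, it is relatively simple to derive the formula for autocorrelation. After all, $\rho_k = \operatorname{Corr}(Y_t, Y_{t-k})$ for some lag $k$. Substituting the formula for correlation, the following expression is obtained
$$\rho_k = \frac{\operatorname{Cov}(Y_t, Y_{t-k})}{\sqrt{\operatorname{Var}(Y_t)\,\operatorname{Var}(Y_{t-k})}}$$
As mentioned previously, for most time series models (and in particular those where autocorrelation will be meaningful), weak stationarity is assumed. Weak stationarity implies that
  • $E[Y_t] = \mu$ for all $t$.
  • $\operatorname{Var}(Y_t) = \sigma^2$ for all $t$.
  • $\operatorname{Cov}(Y_t, Y_{t-k})$ depends only on the lag $k$ and not on $t$. We will express this as $\gamma_k$, the autocovariance at lag $k$.
With these assumptions in mind, the expression for $\rho_k$ can be heavily simplified to
$$\rho_k = \frac{\gamma_k}{\gamma_0}$$
where $\gamma_0 = \operatorname{Var}(Y_t) = \sigma^2$. To obtain $\hat{\rho}_k$ we will need to resort to estimation of both $\gamma_k$ and $\gamma_0$.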
Chapter 2 - Estimators dives deeper into the estimation techniques used in the following derivation.
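  • An estimate of the variance $\gamma_0$ is given by the following point estimator:
$$\hat{\gamma}_0 = \frac{1}{n} \sum_{t=1}^{n} (y_t - \bar{y})^2$$
    • where $\bar{y}$ is the sample mean. The sample mean can also be obtained using another point estimator:
$$\bar{y} = \frac{1}{n} \sum_{t=1}^{n} y_t$$
  • An estimate of the autocovariance $\gamma_k$ (for lag $k$) is defined as the average product of the deviations of $y_t$ and $y_{t-k}$ from their respective means. It is given by the following point estimator:
$$\hat{\gamma}_k = \frac{1}{n-k} \sum_{t=k+1}^{n} (y_t - \bar{y})(y_{t-k} - \bar{y})$$
Once these point estimates have been obtained, the resulting point estimator for the autocorrelation is given by
$$\hat{\rho}_k = \frac{\hat{\gamma}_k}{\hat{\gamma}_0} = \frac{n}{n-k} \cdot \frac{\sum_{t=k+1}^{n} (y_t - \bar{y})(y_{t-k} - \bar{y})}{\sum_{t=1}^{n} (y_t - \bar{y})^2}$$
Finally, since $k \ll n$, we can expect $\frac{n}{n-k} \approx 1$. The result is the formula for autocorrelation as expressed in definition 1.4.a.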
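The estimator that results from this derivation is easy to compute directly. The sketch below is one possible implementation of the sample autocorrelation, using the ratio of the lagged sum of deviation products to the total sum of squared deviations. The function name and the toy data are illustrative assumptions.

```python
def sample_autocorrelation(y, k):
    """Sample autocorrelation at lag k: lagged products over total squares."""
    n = len(y)
    if not 0 <= k < n:
        raise ValueError("lag k must satisfy 0 <= k < len(y)")
    mean = sum(y) / n
    # Numerator: products of deviations that are k time steps apart.
    num = sum((y[t] - mean) * (y[t - k] - mean) for t in range(k, n))
    # Denominator: total sum of squared deviations (n times the variance estimate).
    den = sum((v - mean) ** 2 for v in y)
    return num / den


y = [1.0, 2.0, 3.0, 4.0, 5.0]      # toy data, purely illustrative
r0 = sample_autocorrelation(y, 0)  # lag 0 is always 1
r1 = sample_autocorrelation(y, 1)  # deviations -2,-1,0,1,2 give 4 / 10
```

Note that a constant series has zero variance, so the denominator vanishes and the estimator is undefined in that case.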

1.2.1 The Correlogram

To get a graphical representation of the autocorrelation, a correlogram can be used. A correlogram is a plot of $\hat{\rho}_k$ against the order $k$. More specifically, it is a plot of the autocorrelation function for different orders $k$. In most cases, two lines can be found on a correlogram. These lines represent the critical values of the test statistic for testing the null-hypothesis that $\rho_k = 0$ for a specific order value $k$. As has been stated before, the autocorrelation (for a specific order $k$) being near zero provides certain indications about the time series.
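The numbers behind a correlogram can be computed without any plotting library. The sketch below assumes the commonly used approximate 95% critical values $\pm 1.96 / \sqrt{n}$, which hold under the hypothesis that the series is white noise. The text above does not state which critical values are drawn, so this choice is an assumption, as is the simulated white noise input.

```python
import math
import random


def sample_acf(y, max_lag):
    """Sample autocorrelations for lags 0..max_lag."""
    n = len(y)
    mean = sum(y) / n
    dev = [v - mean for v in y]
    den = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / den
            for k in range(max_lag + 1)]


rng = random.Random(7)
noise = [rng.gauss(0.0, 1.0) for _ in range(500)]  # simulated white noise

acf = sample_acf(noise, max_lag=20)
bound = 1.96 / math.sqrt(len(noise))  # approximate 95% critical value

# Lags whose autocorrelation falls outside the two horizontal lines.
outside = [k for k in range(1, 21) if abs(acf[k]) > bound]
```

For white noise, roughly one lag in twenty is expected to fall outside the bounds by chance alone, so a few isolated exceedances are not evidence of structure.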
Example 1.3 - The Correlogram To continue example 1.1, the correlogram for the Microsoft stock is computed over a range of lags. In the correlogram, a dampened oscillation can be seen. This is quite typical for stock market data. The effect observed here is often called mean-reverting behavior (possibly due to cyclical or seasonal effects). In terms of stock market data, this indicates that the stock price tends to “revert” or “correct” itself after moving away from an equilibrium (or a central tendency, such as the moving average). When compared to the data presented in the original example, this is indeed the case. Furthermore, the oscillation seems to dampen when higher lags are considered. This indicates that stock prices typically do not exhibit long-term memory (i.e. after some number of days, the autocorrelation decays to near-zero). This means that there is no significant relationship between prices separated by longer time intervals. The oscillation in stock data can also be attributed to a variety of other factors, such as market cycles, trends and investor behavior. Furthermore, indicators such as the (exponential) moving average often influence investors and algorithmic traders. Their trading strategies can amplify short-term trends, leading to autocorrelated price movements over short lags.
[Figure: correlogram of the Microsoft (MSFT) stock price]
Population Level Vs. Sample Level
One important note on the correlogram is the level at which the correlogram is constructed. In statistics, scientists often try to reject or not reject a null hypothesis. A null hypothesis is a claim about the population. Often, this hypothesis states that there is no effect or relationship. As mentioned before, the null hypothesis for autocorrelation can be expressed as
$$H_0: \rho_k = 0$$
meaning that there is no autocorrelation present at a lag $k$ in the population. The null hypothesis is framed at the population level since it reflects general assumptions. The sample data can then be used to test these assumptions. More specifically, it allows conclusions to be generalized back to the population.
 
The correlogram is calculated at sample level because it relies on finite, observed data. The values present in a correlogram estimate the true population autocorrelation but they do not represent it directly. The key takeaway here is that correlograms are empirical tools used to infer patterns from sample data. Their accuracy depends on the sample size and data quality.
 
Theory Vs. Practice
In most cases, for a time series model, a theoretical correlogram can be constructed using the rules governing expected values and variance within a time series. This allows for a theoretical correlogram, which represents the true autocorrelation function (ACF) of the population (note, we just discussed that in practice correlograms are not at population level). The theoretical correlogram is deterministic and is not influenced by sampling variability.
 
In practice however, the correlogram has to be computed from observed sample data. This data is subject to randomness and sampling variability, which in turn results in deviations from the theoretical correlogram if a certain model is used or expected. Practical correlograms can be used to approximate the theoretical correlogram, where the accuracy will improve with sample size and data quality.
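The gap between a theoretical and a practical correlogram can be illustrated with a model whose ACF is known in closed form. For a stationary first-order autoregressive process $Y_t = \phi Y_{t-1} + \varepsilon_t$ with $|\phi| < 1$, the theoretical autocorrelation is $\rho_k = \phi^k$. The model, the value of $\phi$ and the sample size below are assumptions chosen for illustration.

```python
import random


def sample_acf(y, max_lag):
    """Sample autocorrelations for lags 0..max_lag."""
    n = len(y)
    mean = sum(y) / n
    dev = [v - mean for v in y]
    den = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / den
            for k in range(max_lag + 1)]


phi, n, max_lag = 0.6, 5000, 5
rng = random.Random(11)

# One observed realization of the AR(1) process.
y, prev = [], 0.0
for _ in range(n):
    prev = phi * prev + rng.gauss(0.0, 1.0)
    y.append(prev)

theoretical = [phi ** k for k in range(max_lag + 1)]  # deterministic
empirical = sample_acf(y, max_lag)                    # subject to sampling error
max_gap = max(abs(a - b) for a, b in zip(theoretical, empirical))
```

Rerunning with a smaller `n` typically widens `max_gap`, which matches the remark that accuracy improves with sample size.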

1.3 Difference Operators

The importance of stationary data in time series analysis is paramount, as many analytical methods and models rely on the assumption that the underlying data is stationary. However, not all time series data naturally exhibits stationary behavior. For instance, a time series with a noticeable linear trend is considered non-stationary, as it fails to meet the conditions that are required for stationarity (such as constant mean and variance over time). In such cases, difference operators play a vital role by transforming non-stationary data into stationary data, allowing for more effective analysis techniques to be used. The two key operators used for this transformation are the Lag operator and the (first) Difference operator. These operators help to remove trends and stabilize the data, making it suitable for various time series analysis techniques.
 
Lag Operator
The Lag operator, often denoted as $L$, is an operator that can be used to shift or “lag” a time series backwards by a specific number of time steps. For any time series $Y_t$, applying the lag operator once moves the observations at time $t$ back to time $t-1$. Mathematically, this is represented as:
$$L Y_t = Y_{t-1}$$
In general, applying the lag operator $k$ times shifts the series back by $k$ intervals:
$$L^k Y_t = Y_{t-k}$$
The lag operator is an essential operator in time series analysis, because it allows for easy manipulation of past values in models such as autoregressive, moving average and ARIMA processes.
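On a finite list of observations, the lag operator amounts to shifting the values and leaving the first positions undefined. The sketch below is one way to express this. Padding with `None` is a design choice made here, not something prescribed by the definition.

```python
def lag(series, k=1):
    """Apply the lag operator k times: position t of the result holds series[t - k].

    The first k positions have no predecessor and are filled with None.
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    n = len(series)
    k = min(k, n)  # lagging by more than the length leaves only padding
    return [None] * k + list(series[: n - k])


y = [10, 20, 30, 40]
lag_one = lag(y)             # [None, 10, 20, 30]
lag_two = lag(y, 2)          # [None, None, 10, 20]
lag_twice = lag(lag(y))      # applying the operator twice equals lagging by 2
```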
 
Difference Operator
The (first) Difference operator, denoted as $\Delta$, is closely related to the lag operator. This operator can be used to transform a non-stationary time series into a stationary one by removing trends or other forms of long-term dependencies. The first difference of a time series $Y_t$ is the difference between consecutive sample observations:
$$\Delta Y_t = Y_t - Y_{t-1}$$
In general, using the lag operator $L$, the difference operator can be written as follows:
$$\Delta Y_t = (1 - L) Y_t$$
For example, linear trends can be eliminated by applying $\Delta$ once. If a stationary process is then obtained, it is said that the time series is integrated of order $1$. In practice however, applying higher-order difference operators can remove more complex trends from a time series. The second difference operator, for example, measures the change in the first difference. This is useful for the removal of quadratic trends in the data. In general, the second difference operator is defined as:
$$\Delta^2 Y_t = \Delta(\Delta Y_t) = (1 - L)^2 Y_t = Y_t - 2 Y_{t-1} + Y_{t-2}$$
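The effect of first and second differencing on deterministic trends can be verified on small noise-free examples. The helper below is an illustrative sketch that applies the first difference repeatedly. The two toy series are assumptions chosen so that the results are exact.

```python
def difference(series, order=1):
    """Apply the first difference operator `order` times.

    Each application shortens the series by one observation.
    """
    out = list(series)
    for _ in range(order):
        out = [out[t] - out[t - 1] for t in range(1, len(out))]
    return out


linear = [3 + 2 * t for t in range(6)]   # 3, 5, 7, 9, 11, 13
quadratic = [t * t for t in range(6)]    # 0, 1, 4, 9, 16, 25

d1_linear = difference(linear)           # constant: linear trend removed
d1_quadratic = difference(quadratic)     # still trending linearly
d2_quadratic = difference(quadratic, 2)  # constant: quadratic trend removed

# The expanded form Y_t - 2 Y_{t-1} + Y_{t-2} gives the same second difference.
expanded = [quadratic[t] - 2 * quadratic[t - 1] + quadratic[t - 2]
            for t in range(2, len(quadratic))]
```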
 
Seasonal Difference Operator
Another operator is the Seasonal Difference operator $\Delta_s$. Seasonal differencing is applied to remove seasonal patterns from a time series. It subtracts the value from the same season in the previous cycle. For instance, for a seasonal period $s$, the seasonal difference operator is defined as:
$$\Delta_s Y_t = Y_t - Y_{t-s} = (1 - L^s) Y_t$$
Besides seasonal differencing and first/second order differencing, higher order differencing can be used to eliminate even more complex trends from the time series data. In general, the $d$-th order difference operator can be defined recursively as $\Delta^d Y_t = \Delta(\Delta^{d-1} Y_t)$, by repeatedly working out the lower difference operators that appear in the derivation.
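Seasonal differencing can be checked on a small series with a known repeating pattern. In the sketch below the quarterly pattern (period 4) and the added linear trend are hypothetical values chosen so that the result is exact.

```python
def seasonal_difference(series, s):
    """Subtract from each value the value observed s steps earlier."""
    if s <= 0:
        raise ValueError("seasonal period s must be positive")
    return [series[t] - series[t - s] for t in range(s, len(series))]


pattern = [5, -2, 0, 3]  # hypothetical quarterly effect, repeats every 4 steps

purely_seasonal = [10 + pattern[t % 4] for t in range(12)]
seasonal_with_trend = [t + pattern[t % 4] for t in range(12)]

no_season = seasonal_difference(purely_seasonal, s=4)       # pattern cancels
trend_left = seasonal_difference(seasonal_with_trend, s=4)  # constant remains
```

In the second case the seasonal pattern cancels and the linear trend turns into a constant equal to the trend slope times the period, here 4.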
 
Log Difference Operator
In some cases, a time series exhibits exponential growth or multiplicative relationships. To help stabilize variance and normalize the data, the Log Difference operator can be used. It is defined as:
$$\Delta \log Y_t = \log Y_t - \log Y_{t-1} = \log\frac{Y_t}{Y_{t-1}}$$
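For a series that grows by a fixed percentage per step, the log difference is constant. The sketch below uses a hypothetical series growing by 5% per step. The operator requires strictly positive values, since the logarithm is undefined otherwise.

```python
import math


def log_difference(series):
    """log(Y_t) - log(Y_{t-1}) for each consecutive pair of positive values."""
    return [math.log(series[t]) - math.log(series[t - 1])
            for t in range(1, len(series))]


growth = [100 * 1.05 ** t for t in range(6)]  # hypothetical 5% growth per step
log_diffs = log_difference(growth)            # each close to log(1.05)
expected = math.log(1.05)
```

Since $\log(1 + r) \approx r$ for small $r$, log differences are often read as approximate growth rates.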
Def. 1.5 - Difference Operators The two most important operators are
  • the lag operator $L$, with $L Y_t = Y_{t-1}$
  • the difference operator $\Delta$, with $\Delta Y_t = Y_t - Y_{t-1}$ (including the seasonal/higher order/log operators)
Example 1.4 - Difference Operator In this example, the time series data of a random walk with drift is visualized (top). More specifically, the time series is defined as $Y_t = \delta + Y_{t-1} + \varepsilon_t$ where $\delta$ is the drift and $\varepsilon_t$ is i.i.d. white noise. When visualized, a linear trend can be observed in the time series data. The strong upward trend is due to the drift present in the random walk. Applying the first difference operator removes this trend (bottom): the differenced series $\Delta Y_t = \delta + \varepsilon_t$ fluctuates around the constant level $\delta$.
[Figure: random walk with drift (top) and its first difference (bottom)]
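The example can be reproduced with a short simulation. The drift value, the sample size and the Gaussian noise below are assumptions, since the example does not state which values were used for the figure.

```python
import random
import statistics

rng = random.Random(3)
drift, n = 0.5, 1000  # hypothetical drift and sample size

# Random walk with drift: Y_t = drift + Y_{t-1} + e_t, starting from 0.
y, level = [], 0.0
for _ in range(n):
    level += drift + rng.gauss(0.0, 1.0)
    y.append(level)

# First difference: drift + e_t, which no longer trends.
dy = [y[t] - y[t - 1] for t in range(1, n)]

mean_dy = statistics.mean(dy)           # close to the drift
spread_dy = statistics.pstdev(dy)       # close to the noise standard deviation
```

The original series ends far from where it started, while the differenced series stays around the drift value with a roughly constant spread.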

1.4 Chapter Summary

  • A time series is a sequence of data points measured over successive time intervals. A time series contains both a temporal and data dimension. To interpret a time series, the frequency of samples and the sample size have to be known.
  • A stochastic process is a sequence of stochastic variables $Y_1, Y_2, Y_3, \ldots$. The observation of the process from $t = 1$ to $t = n$ yields a time series.
  • A stochastic process is (weakly) stationary if the expected value and variance are the same at every point in time and the covariance between two observations depends only on the lag between them.
  • An autocorrelation measures how similar a time series is to a “lagged” version of itself. Autocorrelations can be seen through a correlogram.
  • Difference operators such as the lag and (first) difference operators can be used to transform non-stationary data into stationary data. Specific varieties of the difference operator exist to deal with higher order or seasonal data.
 

Continue reading →
Chapter 2 - Estimators

Sources
No external sources except the original course slides were used to develop this chapter.

TODO
  • Add representative examples from economy, physics, process control, etc
  • Add pointer to ARMA model in SP example