\documentclass [12pt,fleqn] {article}
\setlength{\mathindent}{0.5cm} \setlength{\parindent}{1.1cm}
%\newcommand{\eqref}[1]{(\ref{#1})}
\def\NEG#1{\ensuremath{\slashed{#1}}}
\usepackage{graphicx}
%\usepackage{amssymb}
%\usepackage{amsmath}
%\usepackage{chicago}
%\usepackage{slashed}
%\usepackage{amsfonts}
\setlength{\paperwidth}{8.5in} \setlength{\paperheight}{11.0in}
\setlength{\topmargin}{0.0in} \setlength{\headheight}{0.4in}
\setlength{\headsep}{0.0in} \setlength{\textwidth}{6.7in}
\setlength{\textheight}{8.5in} \setlength{\oddsidemargin}{0.0in}
\setlength{\oddsidemargin}{-0.1in}
\setlength{\evensidemargin}{-0.1in}
\renewcommand{\baselinestretch}{1.5}
\renewcommand{\textfraction}{0.33}
\def\thepage{}
\def\eps{\varepsilon}
\begin{document}
\title{Handout 6}
\maketitle
\section{Some Basic Time-Series Concepts}
You can find many good introductions to time series. See, for example, chapters 1 to 3 and 5 of Hamilton's ``Time Series Analysis.''
\subsection{Recursive Substitution}
Consider the process
\begin{equation}
y_t = \rho y_{t-1} +\epsilon_t. \label{ar1}
\end{equation}
The equation above summarizes the evolution of $y_t$ through time using a first-order difference equation. The order is given by the fact that only the first lag of $y$ appears in the equation. Notice also that the equation is linear. For most of the course we'll be concerned with linear processes, only touching on non-linear stochastic processes towards the end of the course. Notice that the relationship in equation (\ref{ar1}) implies
\begin{eqnarray*}
y_1 & = & \rho y_{0} +\epsilon_1 \\
y_2 & = & \rho y_{1} +\epsilon_2 \\
y_3 & = & \rho y_{2} +\epsilon_3 \\
& \vdots & \\
y_T & = & \rho y_{T-1} +\epsilon_T
\end{eqnarray*}
Knowing the starting value of $y$, $y_0$, we could back out the path of $\epsilon$ through time. Notice also that, by repeated substitution, we can express $y_t$ as:
\[
y_t = \rho^{t}y_0 + \rho^{t-1} \epsilon_1 + \rho^{t-2} \epsilon_2 + ... + \rho \epsilon_{t-1} + \epsilon_t.
\]
From the equation above, we can also notice that if $\epsilon$ is uncorrelated through time, the partial effect of a change in $\epsilon_1$ on $y_t$ is given by:
\[
\frac{\partial y_t}{\partial \epsilon_1} = \rho^{t-1},
\]
which can also be interpreted as the dynamic multiplier of the innovation $\epsilon_1$ on $y_t$. Notice that tracing the dynamic multipliers for an innovation through time corresponds to computing an impulse response to the innovation. Thus, think of an impulse response function as a collection of dynamic multipliers:
\[
IRF(\epsilon_1) = \left\{ \frac{\partial y_1}{\partial \epsilon_1}, \frac{\partial y_2}{\partial \epsilon_1}, \frac{\partial y_3}{\partial \epsilon_1}, ... \right\}
\]
Notice that if $|\rho|<1$ then $\lim_{t \rightarrow \infty} \frac{\partial y_{t} }{\partial \epsilon_1} = 0$. In other words, the effect of any one innovation is temporary if $\rho$ lies within the unit circle. By contrast, if $\rho$ lies on or outside the unit circle, an innovation has non-zero effects that extend into the infinite future.
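The decay of the dynamic multipliers is easy to verify numerically. The Python sketch below (the value of $\rho$ and the horizon are arbitrary illustrative choices) computes the impulse response by perturbing $\epsilon_1$ and compares it with the analytic expression $\rho^{t-1}$:

```python
# Dynamic multipliers of an AR(1): d y_t / d eps_1 = rho^(t-1).
# Compare a finite-difference impulse response with the analytic formula.
rho, T, delta = 0.6, 10, 1e-6        # illustrative values

def simulate(eps1, T, rho):
    """Start from y_0 = 0 with eps_t = 0 for t > 1, so y_t isolates eps_1."""
    y, path = 0.0, []
    for t in range(1, T + 1):
        y = rho * y + (eps1 if t == 1 else 0.0)
        path.append(y)
    return path

base = simulate(0.0, T, rho)
bumped = simulate(delta, T, rho)
irf = [(b - a) / delta for a, b in zip(base, bumped)]

for t in range(1, T + 1):
    assert abs(irf[t - 1] - rho ** (t - 1)) < 1e-9
```

Because the process is linear, the finite-difference multipliers match the analytic ones up to floating-point error.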
\subsection{Higher order difference equations}
The method of repeated substitutions extends readily to higher-order processes. Consider, for example
\[
y_t= \rho_1 y_{t-1} + \rho_2 y_{t-2} + \rho_3 y_{t-3} + \epsilon_t.
\]
In this case, rewrite the equation in companion form:
\begin{equation}
\xi_t = \rho \xi_{t-1}+
\left( \begin{array}{c}
\epsilon_t \\
0 \\
0 \end{array}
\right),
\end{equation}
where
\begin{equation}\xi_t =
\left( \begin{array}{c}
y_t \\
y_{t-1}\\
y_{t-2} \end{array} \right) \,\,\,\, \mbox{and} \,\,\,\, \rho = \left(
\begin{array}{ccc}
\rho_1 & \rho_2 &\rho_3 \\
1 & 0 & 0 \\
0 & 1 & 0 \\
\end{array}
\right).
\end{equation}
Again, knowing the starting values for $y$, say $y_{-1}$ and $y_0$, allows us to back out a series for $\epsilon$ from date 1 till the end of the observed sample.
Notice that by repeated substitution,
\begin{equation}
\xi_t = \rho^{t}\xi_0 + \rho^{t-1} \left( \begin{array}{c} \epsilon_1 \\ 0 \\ 0 \end{array} \right) + \rho^{t-2} \left( \begin{array}{c} \epsilon_2 \\ 0 \\ 0 \end{array} \right) + ... + \rho \left( \begin{array}{c} \epsilon_{t-1} \\ 0 \\ 0 \end{array} \right) + \left( \begin{array}{c} \epsilon_t \\ 0 \\ 0 \end{array} \right).
\end{equation}
From the equation above, one can see that
\[
\frac{\partial y_t}{\partial \epsilon_1} = \rho^{t-1}_{1,1},
\]
where $\rho^{t-1}_{1,1}$ denotes the $1,1$ element of the matrix $\rho$ taken to the $t-1$ power. In this case, whether or not $\lim_{t \rightarrow \infty} \frac{\partial y_{t} }{\partial \epsilon_1}$ converges to zero depends on the eigenvalues of the matrix $\rho$. Let's consider this question a little more thoroughly.
Remember that if a square matrix $\rho$ of dimension $n$ has $n$ distinct eigenvalues, the Jordan canonical decomposition of $\rho$ takes the form
\begin{equation}
\rho = V D V^{-1}, \label{jordan_decomposition}
\end{equation}
where $D$ is a matrix with the eigenvalues of $\rho$ along its diagonal and zeros elsewhere, and $V$ is a matrix whose columns are the corresponding eigenvectors of $\rho$. From equation (\ref{jordan_decomposition}), one can see that:
\begin{equation}
\rho^{t-1} = V D^{t-1} V^{-1}.
\end{equation}
But since $D$ is diagonal, the $t-1$ power of $D$ obtains simply by raising each of its diagonal elements to the power $t-1$. Thus, one can see that if all the eigenvalues of $\rho$ are within the unit circle, $\lim_{t \rightarrow \infty} \frac{\partial y_{t} }{\partial \epsilon_1} = 0$.
For an extension of the argument above to the case in which $\rho$ does not have $n$ distinct eigenvalues see Chapter 1 of Hamilton's ``Time Series Analysis.''
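As a numerical illustration of the eigenvalue condition, the Python sketch below builds the companion matrix for an AR(3) with arbitrary illustrative coefficients, checks that its eigenvalues lie inside the unit circle, and confirms that the $1,1$ element of its powers, i.e.\ the dynamic multiplier, dies out:

```python
# Companion matrix of y_t = 0.5 y_{t-1} + 0.2 y_{t-2} + 0.1 y_{t-3} + eps_t
# (illustrative coefficients). Stationarity holds when all eigenvalues lie
# inside the unit circle, in which case F^(t-1)[0,0] shrinks toward zero.
import numpy as np

rho1, rho2, rho3 = 0.5, 0.2, 0.1
F = np.array([[rho1, rho2, rho3],
              [1.0,  0.0,  0.0],
              [0.0,  1.0,  0.0]])

eigvals = np.linalg.eigvals(F)
assert max(abs(eigvals)) < 1.0       # process is stationary

# d y_t / d eps_1 is the (1,1) element of F^(t-1)
mult = [np.linalg.matrix_power(F, t - 1)[0, 0] for t in (1, 10, 200)]
assert mult[0] == 1.0                # F^0 is the identity
assert abs(mult[-1]) < 1e-8          # multiplier has essentially died out
```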
\section{Unconditional Mean and Variance of AR Processes}
Extending the argument used in the solution of AR processes by repeated substitution, one can rewrite an AR process as a function of innovations from the infinite past; call this the MA($\infty$) representation. We can use this representation to find the mean and autocovariances as long as the roots of the process lie within the unit circle. To fix ideas we shall work with an AR(1) process, but the arguments presented readily extend to higher-order processes.
Consider again
\[
y_t = \rho y_{t-1} + \epsilon_t
\]
where $\epsilon_t$ is a process independently and identically distributed through time with mean zero and variance $\sigma^2$. By repeated substitution:
\[
y_t = \epsilon_t + \rho \epsilon_{t-1} + \rho^{2} \epsilon_{t-2} + ...
\]
Taking expectations:
\[
E [y_t] = E [\epsilon_t + \rho \epsilon_{t-1} + \rho^{2} \epsilon_{t-2} + ... ] .
\]
But given $|\rho|<1$, the series on the RHS is absolutely summable, so we can distribute the expectation operator to each term:
\[
E[y_t] = E[\epsilon_t] + \rho E[\epsilon_{t-1}] + \rho^2E[\epsilon_{t-2}] + ... = 0.
\]
One can also use the MA($\infty$) representation to facilitate computing the unconditional variance of the process. Thus,
\[
VAR(y_t) = VAR(\epsilon_t + \rho \epsilon_{t-1} + \rho^{2} \epsilon_{t-2} + ... )
\]
From the fact that the innovations are independently distributed through time
\[
VAR(y_t) = VAR(\epsilon_t) + VAR( \rho \epsilon_{t-1}) + VAR(\rho^{2} \epsilon_{t-2}) + ...
\]
With a little manipulation, as long as $|\rho|<1$,
\[
VAR(y_t) = \sigma^2 + \rho^2 \sigma^2 + \rho^4 \sigma^2 + ... = \frac{\sigma^2}{1-\rho^2}
\]
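A quick simulation check of the formula $VAR(y_t) = \sigma^2/(1-\rho^2)$; the values of $\rho$, $\sigma$, and the sample size below are arbitrary illustrative choices:

```python
# Simulate a long AR(1) path and compare the sample mean and variance
# with the analytic values 0 and sigma^2 / (1 - rho^2).
import random

random.seed(1)
rho, sigma, N = 0.7, 1.0, 200_000

y, draws = 0.0, []
for _ in range(N):
    y = rho * y + random.gauss(0.0, sigma)
    draws.append(y)
draws = draws[1000:]                 # burn-in so the start-up value washes out

mean = sum(draws) / len(draws)
var = sum((d - mean) ** 2 for d in draws) / len(draws)
theory = sigma ** 2 / (1 - rho ** 2)   # = 1/0.51, about 1.96, for these values

assert abs(mean) < 0.05              # unconditional mean is zero
assert abs(var - theory) < 0.15
```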
\subsection{Autocovariances, autocorrelations, and the correlogram}
Autocovariances are covariances of a process with its own lag. Trivially, the autocovariance at lag zero is the variance. The autocovariance for lag 1 is $COV(y_t,y_{t-1})$, and so on. The MA($\infty$) representation of an AR process can also come in handy in computing the autocovariances.
Again, taking an AR(1) process as an example:
\begin{eqnarray}
y_t &= & \epsilon_t + \rho \epsilon_{t-1} + \rho^{2} \epsilon_{t-2} + \rho^{3} \epsilon_{t-3}... \\
y_{t-1} &=& \epsilon_{t-1} + \rho \epsilon_{t-2} + \rho^2 \epsilon_{t-3}...
\end{eqnarray}
Then autocovariance at lag 1 can be found as
\[
COV(y_t, y_{t-1}) = E \left[ (\epsilon_t + \rho \epsilon_{t-1} + \rho^{2} \epsilon_{t-2} + ...)(\epsilon_{t-1} + \rho \epsilon_{t-2} + \rho^{2} \epsilon_{t-3} + ...) \right]
\]
Collecting terms, as long as $|\rho|<1$,
\[
COV(y_t,y_{t-1}) = \rho \sigma^2 + \rho^3 \sigma^2 + \rho^5 \sigma^2 + ... = \rho \frac{\sigma^2}{1-\rho^2}.
\]
By extension,
\[
COV(y_t,y_{t-j}) = \rho^j \frac{\sigma^2}{1-\rho^2}.
\]
Normalizing the covariance by the variance produces the autocorrelation. Thus, for an AR(1) process the autocorrelation at lag j is given by
\[
\frac{COV(y_t,y_{t-j})}{VAR(y_t)} = \frac{\rho^j \frac{\sigma^2}{1-\rho^2}}{\frac{\sigma^2}{1-\rho^2}}= \rho^j.
\]
A plot of the autocorrelation function with the value of the autocorrelation along the ordinate and the lag along the abscissa produces the correlogram, as demonstrated by the Matlab program plot\_ar1.m in the zipped file corresponding to this handout. The pattern of exponential decay of the correlogram is a feature that extends to autoregressive processes of higher orders, so it cannot be used to distinguish visually among autoregressive processes of different orders.
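A rough Python analogue of that exercise (illustrative parameter values; it checks the autocorrelations numerically rather than plotting them):

```python
# Sample autocorrelations of a simulated AR(1) decay roughly like rho^j.
import random

random.seed(2)
rho, N = 0.8, 100_000                # illustrative values
y, data = 0.0, []
for _ in range(N):
    y = rho * y + random.gauss(0.0, 1.0)
    data.append(y)
data = data[500:]                    # burn-in

mean = sum(data) / len(data)
dev = [d - mean for d in data]
var = sum(v * v for v in dev) / len(dev)

def autocorr(j):
    """Sample autocorrelation at lag j."""
    cov = sum(dev[t] * dev[t - j] for t in range(j, len(dev))) / len(dev)
    return cov / var

for j in (1, 2, 5):
    assert abs(autocorr(j) - rho ** j) < 0.05
```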
\subsection{Some definitions}
\begin{itemize}
\item \textbf{Weak Stationarity}. If the mean and the autocovariances at all lags do not depend on the date $t$, then a time series process is said to be (weakly) stationary.
\item \textbf{Strong stationarity}. If the joint distribution of $X(t_1),...,X(t_k)$ is the same as the joint distribution of $X(t_{1+\tau}),...,X(t_{k+\tau})$ for all $k$ and $\tau$, then the time series process governing $X$ is said to be strictly stationary.
\item \textbf{Gaussianity}.
If the joint distribution of $X(t_1),...,X(t_k)$ is jointly normal for any $k$, then the process is said to be Gaussian. It follows that a covariance-stationary Gaussian process is also strictly stationary.
\item \textbf{White Noise}.
Consider the sequence $\{\epsilon_t \}^\infty_{t=-\infty}$. If all elements have mean 0 and variance $\sigma^2$ and if $E(\epsilon_t \epsilon_{t+\tau})=0$ for all $\tau \neq 0$, then the process governing the $\epsilon$ terms is said to be white noise.
\end{itemize}
\section{Estimating autoregressive processes}
A typical assumption of the multiple regression model is that the regressors do not covary with the error term of the regression at any lag. That assumption is clearly violated in the case of an autoregressive process. As a consequence, the OLS estimator is not the best linear unbiased estimator (in fact, it will be biased) and there is no particular reason to stick with it. What other methods, then, are available for the estimation of an autoregressive process? Let's consider two alternatives: conditional and unconditional maximum likelihood estimation.
\subsection{Maximum likelihood estimation}
To fix ideas we shall consider how to set up the likelihood of an AR(1) process, but the procedure easily extends to higher-order processes. Consider, again, the process
\begin{equation}
y_t = \rho y_{t-1} +\epsilon_t, \label{ar1_repeat}
\end{equation}
where $\epsilon_t$ is NID with variance $\sigma^2$. Suppose that there are only two observations, then their joint density function can be written as
\begin{equation}
P(y_2,y_1) = P(y_2|y_1)P(y_1) \label{p2}
\end{equation}
Similarly for 3 observations
\begin{equation}
P(y_3,y_2,y_1) = P(y_3|y_2,y_1)P(y_2,y_1) \label{p3}
\end{equation}
Substituting (\ref{p2}) into (\ref{p3})
\begin{equation}
P(y_3,y_2,y_1) = P(y_3|y_2,y_1)P(y_2|y_1)P(y_1)
\end{equation}
Extending this reasoning to $T$ observations, you can see that the likelihood can be written as:
\begin{equation}
L(Y_T,\rho,\sigma^2) = P(y_1) \prod_{t=2}^T P(y_t|Y_{t-1})
\end{equation}
where $Y_{t-1}$ represents the vector of observations $y_1$ through $y_{t-1}$. Remember that if $\epsilon$ is distributed as Normal with mean $\mu$ and variance $\sigma^2$, then the probability density function for $\epsilon$ is given by:
\[
PDF(\epsilon) = \frac{1}{ \sqrt{2 \pi \sigma^2}} \exp \left(\frac{-(\epsilon-\mu)^2}{2\sigma^2} \right)
\]
In turn, for the AR(1) process in equation (\ref{ar1_repeat}), the log-likelihood function takes the form:
\begin{equation}
\log L(Y_T,\rho,\sigma^2) = -\frac{T-1}{2} \log (2 \pi) - \frac{T-1}{2} \log \sigma^2 - \sum_{t=2}^T \frac{(y_t - \rho y_{t-1})^2 }{2 \sigma^2} + \log P(y_1) \label{log_likel}
\end{equation}
While the evaluation of (\ref{log_likel}) seems straightforward, the term $\log P(y_1)$ deserves some further discussion. If this term can be thought of as being fixed in repeated draws of the observed sample, then it simply drops out of the likelihood:
\begin{equation}
\log L(Y_T,\rho,\sigma^2) = -\frac{T-1}{2} \log (2 \pi) - \frac{T-1}{2} \log \sigma^2 - \sum_{t=2}^T \frac{(y_t - \rho y_{t-1})^2 }{2 \sigma^2} \label{log_likel_cond}
\end{equation}
Maximizing the remaining terms yields the \textbf{conditional} maximum likelihood estimates for the AR(1) process. Conveniently, there is no need to numerically optimize the function above, as maximizing (\ref{log_likel_cond}) is equivalent to minimizing the residual sum of squares, which is exactly what the OLS estimator does.
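The equivalence with OLS can be checked directly: the maximizer in $\rho$ has the closed form $\hat{\rho} = \sum_t y_t y_{t-1} / \sum_t y_{t-1}^2$. A Python sketch with illustrative parameter values:

```python
# Conditional MLE of rho in an AR(1) equals the OLS slope of y_t on y_{t-1}:
# rho_hat = sum(y_t * y_{t-1}) / sum(y_{t-1}^2).
import random

random.seed(3)
rho_true, T = 0.5, 50_000            # illustrative values
y, data = 0.0, []
for _ in range(T):
    y = rho_true * y + random.gauss(0.0, 1.0)
    data.append(y)

num = sum(data[t] * data[t - 1] for t in range(1, T))
den = sum(data[t - 1] ** 2 for t in range(1, T))
rho_hat = num / den                  # closed-form maximizer: no numerical search

assert abs(rho_hat - rho_true) < 0.02
```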
An alternative way of proceeding for stationary series (in this case, when $|\rho|<1$) is to think that $y_1$ was drawn from the unconditional distribution for $y$. As seen above, $y_1$ then has mean 0 and variance $\frac{\sigma^2}{1-\rho^2}$. Thus, equation (\ref{log_likel}) becomes
\begin{equation}
\log L(Y_T,\rho,\sigma^2) = -\frac{T}{2} \log (2 \pi) - \frac{T}{2} \log \sigma^2 + \frac{1}{2} \log(1-\rho^2) - \sum_{t=2}^T \frac{(y_t - \rho y_{t-1})^2 }{2 \sigma^2} - \frac{(1-\rho^2) y_1^2}{2 \sigma^2} \label{log_likel_unc}
\end{equation}
which is referred to as the \textbf{unconditional} likelihood. Unfortunately, in this case, numerical optimization of the likelihood cannot be avoided.
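To make the numerical optimization concrete, the Python sketch below maximizes the unconditional log-likelihood over a simple parameter grid (a crude stand-in for a proper optimizer; all parameter values are illustrative):

```python
# Grid-search maximization of the unconditional AR(1) log-likelihood.
import math
import random

random.seed(4)
rho_true, T = 0.6, 5_000             # illustrative values
y, data = 0.0, []
for _ in range(T):
    y = rho_true * y + random.gauss(0.0, 1.0)
    data.append(y)

# Sufficient statistics make each likelihood evaluation cheap.
S_yy = sum(data[t] ** 2 for t in range(1, T))
S_xy = sum(data[t] * data[t - 1] for t in range(1, T))
S_xx = sum(data[t - 1] ** 2 for t in range(1, T))

def uncond_loglik(rho, sig2):
    rss = S_yy - 2 * rho * S_xy + rho ** 2 * S_xx
    return (-0.5 * T * math.log(2 * math.pi) - 0.5 * T * math.log(sig2)
            + 0.5 * math.log(1 - rho ** 2)          # from VAR(y_1) = sig2/(1-rho^2)
            - (1 - rho ** 2) * data[0] ** 2 / (2 * sig2)
            - rss / (2 * sig2))

grid_rho = [i / 100 for i in range(-95, 96)]
grid_sig2 = [0.5 + i / 50 for i in range(0, 51)]
_, rho_hat, sig2_hat = max((uncond_loglik(r, s), r, s)
                           for r in grid_rho for s in grid_sig2)

assert abs(rho_hat - rho_true) < 0.05
assert abs(sig2_hat - 1.0) < 0.15
```

In practice one would use a hill-climbing routine rather than a grid, but the grid keeps the sketch self-contained.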
\section{Bootstrap Standard Errors and Confidence Intervals}
So far, we have not considered the question of how to retrieve the standard errors for the parameter estimates of the AR model. The inversion of the information matrix is still a valid way of retrieving the variance-covariance matrix for the estimates and has asymptotic justification. An alternative is the bootstrap method, which in practice seems to have better properties in small samples, even though its justification is also asymptotic in nature.
Here is a cookbook recipe for the application of the bootstrap method to an AR(1) process:
\[
y_t = \rho y_{t-1} + \epsilon_t
\]
\begin{enumerate}
\item Estimate $\rho$ by $\hat{\rho}$
\item Form the residuals $\hat{\epsilon}_t = y_t - \hat{\rho} y_{t-1}$
\item Sample from the residuals with replacement, and form new data $\{y_{i,t}\}$
\item With the new data, form the estimate $\hat{\rho}_i$
\item Repeat the replication and re-estimation of $\rho$ an appropriately large number of times
\item The variance of $\hat{\rho}_i$ across replications can be taken to be the estimate of the variance of $\hat{\rho}$.
\end{enumerate}
The procedure above also yields a way to compute confidence intervals for the parameter estimates. Inverting the normal distribution, a 90\% confidence interval can be constructed as $\hat{\rho} - 1.645 \sqrt{VAR(\hat{\rho}_i)} < \rho < \hat{\rho} + 1.645 \sqrt{VAR(\hat{\rho}_i)}$.
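The cookbook recipe can be sketched in Python as follows (sample size, number of replications, and parameter values are illustrative choices):

```python
# Residual bootstrap for the AR(1) coefficient, following the recipe above.
import random

random.seed(5)
rho_true, T, B = 0.6, 500, 200       # illustrative values

def ols_rho(y):
    """OLS of y_t on y_{t-1} (used for both the original and bootstrap data)."""
    num = sum(y[t] * y[t - 1] for t in range(1, len(y)))
    den = sum(y[t - 1] ** 2 for t in range(1, len(y)))
    return num / den

# Original sample and point estimate
y = [0.0]
for _ in range(T):
    y.append(rho_true * y[-1] + random.gauss(0.0, 1.0))
rho_hat = ols_rho(y)

# Residuals, then B bootstrap resamples: rebuild the data and re-estimate
resid = [y[t] - rho_hat * y[t - 1] for t in range(1, len(y))]
boot = []
for _ in range(B):
    yb = [y[0]]
    for _ in range(T):
        yb.append(rho_hat * yb[-1] + random.choice(resid))
    boot.append(ols_rho(yb))

# Bootstrap standard error and a 90% normal-approximation interval
mean_b = sum(boot) / B
se = (sum((b - mean_b) ** 2 for b in boot) / (B - 1)) ** 0.5
lo, hi = rho_hat - 1.645 * se, rho_hat + 1.645 * se

assert 0.0 < se < 0.2
assert lo < rho_hat < hi
```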
\section{Moving Average Processes}
Consider the process
\begin{equation}
y_t = \mu + \epsilon_t + \rho \epsilon_{t-1},
\end{equation}
where $\mu$ and $\rho$ could be any constants and $\epsilon$ is governed by a white noise process. The process for $y$ is called a first-order moving average process, or MA(1).
\subsection{Mean, Variance, and Covariances}
The expectation of $y$ is given by:
\[
E(y_t) = \mu + E(\epsilon_t) + \rho E(\epsilon_{t-1}).
\]
Which leads to:
\[
E(y_t) = \mu.
\]
The unconditional variance of $y$ is given by:
\[
VAR(y_t) = E(y_t-\mu)^2 = E(\epsilon_t+\rho\epsilon_{t-1})^2.
\]
But if $\epsilon$ is governed by a white noise process with variance $\sigma^2$:
\[
VAR(y_t) = (1+\rho^2) \sigma^2.
\]
The covariance at lag one is given by:
\[
E[(y_t-\mu)(y_{t-1}-\mu)] = E[(\epsilon_t+\rho\epsilon_{t-1})(\epsilon_{t-1}+\rho\epsilon_{t-2})] = \rho \sigma^2
\]
For the MA(1) process, the covariances at all lags higher than 1 are zero.
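These MA(1) moments are easy to confirm by simulation; in the Python sketch below all parameter values are arbitrary illustrative choices:

```python
# Simulation check of the MA(1) moments: mean mu, variance (1 + rho^2) sigma^2,
# lag-1 covariance rho sigma^2, and zero covariance beyond lag 1.
import random

random.seed(6)
mu, rho, sigma, N = 2.0, 0.5, 1.0, 200_000
eps = [random.gauss(0.0, sigma) for _ in range(N + 1)]
y = [mu + eps[t] + rho * eps[t - 1] for t in range(1, N + 1)]

mean = sum(y) / len(y)
dev = [v - mean for v in y]
var = sum(d * d for d in dev) / len(dev)
cov1 = sum(dev[t] * dev[t - 1] for t in range(1, len(dev))) / len(dev)
cov2 = sum(dev[t] * dev[t - 2] for t in range(2, len(dev))) / len(dev)

assert abs(mean - mu) < 0.02
assert abs(var - (1 + rho ** 2) * sigma ** 2) < 0.02
assert abs(cov1 - rho * sigma ** 2) < 0.02
assert abs(cov2) < 0.02              # covariances vanish beyond lag 1
```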
\subsection{The $n^{th}$ order moving average process}
Consider the process
\begin{equation}
y_t = \mu + \epsilon_t + \sum_{i=1}^n \rho_i \epsilon_{t-i},
\end{equation}
where $\epsilon$ is governed by a white noise process. The process for $y$ is MA($n$).
The variance of the process is given by:
\[
VAR(y_t) = (1 + \sum_{i=1}^n \rho_i^2)\sigma^2.
\]
One can easily see that the covariance at lag $j$, for $0 < j \leq n$, takes the form:
\[
COV(y_t,y_{t-j}) = (\rho_j + \rho_{j+1}\rho_1 + \rho_{j+2}\rho_2 + ... + \rho_n \rho_{n-j}) \sigma^2.
\]
For all lags greater than $n$, the covariance is zero.
To give a concrete example, for an MA(2) process, the formulae above imply:
\begin{eqnarray*}
&& VAR(y_t) = (1 + \rho_1^2 + \rho_2^2)\sigma^2 \\
&& COV(y_t,y_{t-1}) = (\rho_1 + \rho_2\rho_1) \sigma^2 \\
&& COV(y_t,y_{t-2}) = \rho_2 \sigma^2 \\
&& COV(y_t,y_{t-3}) = 0.
\end{eqnarray*}
Notice from the formulas above that an MA process is covariance stationary for any values of its parameters: no restrictions are needed to ensure covariance stationarity.
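As a final check, the general covariance formula can be coded directly and compared against the MA(2) expressions above (the coefficients below are illustrative):

```python
# The general MA(n) covariance formula, checked against the MA(2) special case.
rho = [0.4, 0.3]                     # illustrative rho_1, rho_2 of an MA(2)
sigma2 = 1.0
n = len(rho)
coef = [1.0] + rho                   # rho_0 = 1 multiplies eps_t itself

def cov(j):
    """COV(y_t, y_{t-j}) = sigma^2 * sum_i rho_{j+i} rho_i; zero for j > n."""
    if j > n:
        return 0.0
    return sigma2 * sum(coef[i] * coef[i + j] for i in range(n - j + 1))

assert abs(cov(0) - (1 + 0.4 ** 2 + 0.3 ** 2) * sigma2) < 1e-12  # variance
assert abs(cov(1) - (0.4 + 0.3 * 0.4) * sigma2) < 1e-12          # rho_1 + rho_2 rho_1
assert abs(cov(2) - 0.3 * sigma2) < 1e-12                        # rho_2
assert cov(3) == 0.0
```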
\end{document}