Unconditional Variance for GARCH(1,1)

We use the term "conditional variance" in the time-series context to refer to our model's fit for the one-period-ahead variance of the invariant $X_t$. GARCH(1,1) is a special case of a linear recursion model where we consider only three terms: a constant, the lagged squared residual $\varepsilon_t=X_t-\mathbb E\left[X_t|\mathcal F_{t-1}\right]$, and the lagged forecast. \begin{equation} \operatorname{var}\left[X_t|\mathcal F_{t-1}\right]= \mathbb E\left[\varepsilon^2_t|\mathcal F_{t-1}\right]=\sigma^2_t=\alpha_0+\alpha_1\varepsilon^2_{t-1} +\beta_1\sigma^2_{t-1} \end{equation} with $\alpha_0>0$ and $\alpha_1,\beta_1\ge0$ to ensure positivity.
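As a concrete illustration, here is the one-step update as a minimal Python sketch; the parameter values are made up for the example, not fitted to any data:

```python
# Sketch of the GARCH(1,1) one-step conditional variance update.
# Parameter values are illustrative only (alpha0 > 0, alpha1, beta1 >= 0).
alpha0, alpha1, beta1 = 0.05, 0.10, 0.85

def next_sigma2(eps_prev_sq: float, sigma2_prev: float) -> float:
    """sigma^2_t = alpha0 + alpha1 * eps^2_{t-1} + beta1 * sigma^2_{t-1}."""
    return alpha0 + alpha1 * eps_prev_sq + beta1 * sigma2_prev

print(next_sigma2(1.2, 1.0))  # 0.05 + 0.10*1.2 + 0.85*1.0 = 1.02
```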

The index of the filtration, that is, the particular sigma-algebra on which we condition, is crucial. Let's gradually restrict the information and see what happens to this forecast. \begin{align} \mathbb E\left[\varepsilon^2_t|\mathcal F_{t-2}\right]&=\alpha_0+\alpha_1 \mathbb E\left[\varepsilon^2_{t-1}|\mathcal F_{t-2}\right] +\beta_1\sigma^2_{t-1}\\ &=\alpha_0+\left(\alpha_1+\beta_1\right)\sigma^2_{t-1}\\ &=\alpha_0+\left(\alpha_1+\beta_1\right)\left(\alpha_0+\alpha_1\varepsilon^2_{t-2} +\beta_1\sigma^2_{t-2}\right)\\ &=\alpha_0\left(1+\left(\alpha_1+\beta_1\right)\right) +\alpha_1\left(\alpha_1+\beta_1\right)\varepsilon^2_{t-2} +\beta_1\left(\alpha_1+\beta_1\right)\sigma^2_{t-2} \end{align} where we used $\mathbb E\left[\varepsilon^2_{t-1}|\mathcal F_{t-2}\right]=\sigma^2_{t-1}$ and the fact that $\sigma^2_{t-1}$ is $\mathcal F_{t-2}$-measurable. That's interesting. Let's take another step back. \begin{align} \mathbb E\left[\varepsilon^2_t|\mathcal F_{t-3}\right]&= \alpha_0\left(1+\left(\alpha_1+\beta_1\right)\right)+ \left(\alpha_1+\beta_1\right)^2\sigma^2_{t-2}\\ &=\alpha_0\left(1+\left(\alpha_1+\beta_1\right)+\left(\alpha_1+\beta_1\right)^2\right) +\alpha_1\left(\alpha_1+\beta_1\right)^2\varepsilon^2_{t-3} +\beta_1\left(\alpha_1+\beta_1\right)^2\sigma^2_{t-3} \end{align} By induction, we can prove that, for any natural $k>1$, \begin{equation} \mathbb E\left[\varepsilon^2_t|\mathcal F_{t-k}\right] =\alpha_0\left(1+\left(\alpha_1+\beta_1\right)+\cdots+\left(\alpha_1+\beta_1\right)^{k-2}\right) +\left(\alpha_1+\beta_1\right)^{k-1}\sigma^2_{t-k+1} \end{equation} Taking the limit $k\to\infty$, we see that, if and only if $\alpha_1+\beta_1<1$, \begin{equation} \mathbb E\left[\varepsilon^2_t|\mathcal F_{-\infty}\right] =\alpha_0\left(1+\left(\alpha_1+\beta_1\right)+\left(\alpha_1+\beta_1\right)^2+\cdots\right) \end{equation} exists and is finite.
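As a sanity check on the closed form, the sketch below (Python, with illustrative parameter values not taken from the text) compares it against simply applying the one-step-back map $v\mapsto\alpha_0+\left(\alpha_1+\beta_1\right)v$ a total of $k-1$ times:

```python
# Illustrative GARCH(1,1) parameters (not fitted to any data).
alpha0, alpha1, beta1 = 0.05, 0.10, 0.85
phi = alpha1 + beta1

def forecast_iterative(sigma2_start: float, k: int) -> float:
    # Apply v -> alpha0 + phi * v a total of (k-1) times,
    # starting from sigma^2_{t-k+1}, which is F_{t-k}-measurable.
    v = sigma2_start
    for _ in range(k - 1):
        v = alpha0 + phi * v
    return v

def forecast_closed_form(sigma2_start: float, k: int) -> float:
    # alpha0 * (1 + phi + ... + phi^{k-2}) + phi^{k-1} * sigma^2_{t-k+1}
    return alpha0 * sum(phi**j for j in range(k - 1)) + phi**(k - 1) * sigma2_start

# The two agree for every k > 1.
for k in (2, 5, 50):
    assert abs(forecast_iterative(1.3, k) - forecast_closed_form(1.3, k)) < 1e-12
```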

Since the conditioning sigma algebra above contains no information at all, we call the result the "unconditional variance", \begin{equation} \mathbb E\left[\varepsilon^2_t\right] =\frac{\alpha_0}{1-\alpha_1-\beta_1} \end{equation} [This is a little misleading, since $\mathbb E\left[X_t|\mathcal F_{t-1}\right]\ne\mathbb E\left[X_t\right]$ in general; hence $\mathbb E\left[\varepsilon^2_t\right]\ne\operatorname{var}\left[X_t\right]$ in general.]
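Numerically, repeatedly pushing the conditioning set back one step contracts any starting conditional variance toward this limit; a short sketch (Python, illustrative parameters) makes the convergence visible:

```python
# Illustrative parameters with alpha1 + beta1 < 1 (stationary case).
alpha0, alpha1, beta1 = 0.05, 0.10, 0.85
uncond = alpha0 / (1.0 - alpha1 - beta1)  # unconditional variance, = 1.0 here

# Each step back maps v -> alpha0 + (alpha1 + beta1) * v, a contraction
# whose fixed point is the unconditional variance.
v = 2.5  # arbitrary starting conditional variance
for _ in range(200):
    v = alpha0 + (alpha1 + beta1) * v
print(uncond, v)  # both approximately 1.0
```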

While $\alpha_1+\beta_1>1$ makes the geometric series diverge (naively plugging into the formula above would yield a meaningless negative variance), the edge case $\alpha_1+\beta_1=1$ is meaningful as long as $\alpha_0$ is also zero. In this case the unconditional variance is undefined and the model degenerates to the exponentially weighted moving average, \begin{equation} \sigma^2_t=\left(1-\beta_1\right)\varepsilon^2_{t-1}+\beta_1\sigma^2_{t-1} =\left(1-\beta_1\right)\left(\varepsilon^2_{t-1}+\beta_1\varepsilon^2_{t-2} +\beta_1^2\varepsilon^2_{t-3}+\cdots\right) \end{equation}
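A small sketch (Python; the decay $\beta_1=0.94$ and the squared residuals are illustrative choices, not from the text) confirms that the recursive and unrolled forms agree, up to a geometrically decayed seed term that the infinite expansion above omits:

```python
beta1 = 0.94                         # illustrative decay parameter
eps_sq = [1.1, 0.7, 1.9, 0.4, 1.3]   # hypothetical squared residuals, oldest first
seed = 1.0                           # starting value sigma^2_0 for the recursion

# Recursive form: sigma^2_t = (1 - beta1) * eps^2_{t-1} + beta1 * sigma^2_{t-1}
sigma2 = seed
for e2 in eps_sq:
    sigma2 = (1.0 - beta1) * e2 + beta1 * sigma2

# Unrolled form: (1 - beta1) * sum_j beta1^j * eps^2_{t-1-j}, plus the
# geometrically decayed seed (the infinite expansion has no seed term).
n = len(eps_sq)
unrolled = (1.0 - beta1) * sum(beta1**j * eps_sq[n - 1 - j] for j in range(n)) \
           + beta1**n * seed
assert abs(sigma2 - unrolled) < 1e-12
```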

The other meaningful edge case is $\alpha_1=0$, in which case the forecast ignores the data entirely: the deterministic recursion $\sigma^2_t=\alpha_0+\beta_1\sigma^2_{t-1}$ converges to the constant-volatility forecast $\sigma^2_t=\alpha_0/\left(1-\beta_1\right)$.