This exercise is part of the ECLR page.
In time-series modelling it is crucial to understand the properties of the time-series used. It is now well understood that (standard) statistical inference in regression models relies on the series used in regression models being stationary.
In this walkthrough we will demonstrate how to use appropriate statistical tests (here the Augmented Dickey-Fuller, ADF, tests), to establish whether a series, or some differences version of a series, is stationary.
In order to work on this problem we upload a range of libraries.
library(tidyverse)
library(ggplot2)
library(stargazer)
library(fbi) # for easy access to the FRED-MD database
library(tseries) # for ADF test
library(urca) # Another package for unit root testing
library(zoo) # to transform data into ts format
library(quantmod) # to download data from FRED
The data we shall use come from the FRED-MD database. This is a prepared database of macro variables for the U.S. (see McCracken and Ng, 2016). If you wish to learn more about this very useful dataset refer to this walkthrough which discusses the finer detail of this dataset.
filepath <- "https://files.stlouisfed.org/files/htdocs/fred-md/monthly/2024-10.csv"
# for the most up to date dataset use
# filepath <- "https://files.stlouisfed.org/files/htdocs/fred-md/monthly/current.csv"
data.md <- fredmd(filepath, date_start = NULL, date_end = NULL, transform = FALSE)
The dataset contains 126 macro variables and the variable
date
in which the dates are stored. When applying
time-series techniques, such as the ADF test below, to series, it is
often advantageous for R to know that the data are time-series data.
From data.md
R does not know that. Yes, there is a variable
called date
and that variable does have a Date
format (check str(data.md)
to confirm), but all other
series are merely numerical variables.
There are many different ways in which we can format all of the
series as a time-series. The simplest uses the zoo
package.
That is a package designed to deal with time-series data.
data.md.ts <- read.zoo(data.md) # create a new object with ts objects
head(data.md.ts,2)
## RPI W875RX1 DPCERA3M086SBEA CMRMTSPLx RETAILx INDPRO IPFPNSS
## 1959-01-01 2583.560 2426.0 15.188 276676.8 18235.77 21.9616 23.3868
## 1959-02-01 2593.596 2434.8 15.346 278714.0 18369.56 22.3917 23.7024
## IPFINAL IPCONGD IPDCONGD IPNCONGD IPBUSEQ IPMAT IPDMAT IPNMAT
## 1959-01-01 22.2620 31.6664 18.9498 38.1496 8.0538 20.0318 11.9870 30.6611
## 1959-02-01 22.4549 31.8987 19.0492 38.5142 8.1616 20.6662 12.5391 31.1980
## IPMANSICS IPB51222S IPFUELS CUMFNS HWI HWIURATIO CLF16OV CE16OV
## 1959-01-01 20.8325 19.9173 34.6443 80.1973 1357 0.3335792 67936 63868
## 1959-02-01 21.2154 19.8631 34.1724 81.4428 1421 0.3583859 67649 63684
## UNRATE UEMPMEAN UEMPLT5 UEMP5TO14 UEMP15OV UEMP15T26 UEMP27OV
## 1959-01-01 6.0 16.3 1574 1169 1396 594 802
## 1959-02-01 5.9 15.5 1554 1164 1277 545 732
## CLAIMSx PAYEMS USGOOD CES1021000001 USCONS MANEMP DMANEMP NDMANEMP
## 1959-01-01 291078 52478 18796 713.0 2993 14998 8740 6258
## 1959-02-01 282958 52688 18890 704.2 2980 15115 8839 6276
## SRVPRD USTPU USWTRADE USTRADE USFIRE USGOVT CES0600000007 AWOTMAN
## 1959-01-01 33682 10774 2568.7 5350.3 2418 8105 39.8 2.5
## 1959-02-01 33798 10816 2575.4 5381.3 2420 8116 39.7 2.6
## AWHMAN HOUST HOUSTNE HOUSTMW HOUSTS HOUSTW PERMIT PERMITNE PERMITMW
## 1959-01-01 40.2 1657 350 452 505 350 NA NA NA
## 1959-02-01 40.3 1667 346 469 508 344 NA NA NA
## PERMITS PERMITW ACOGNO AMDMNOx ANDENOx AMDMUOx BUSINVx ISRATIOx
## 1959-01-01 NA NA NA NA NA NA 84889.56 1.560000
## 1959-02-01 NA NA NA NA NA NA 85181.81 1.539338
## M1SL M2SL M2REAL BOGMBASE TOTRESNS NONBORRES BUSLOANS REALLN
## 1959-01-01 138.9 286.6 987.9 50.5 18.9 18.3 35.2130 24.9242
## 1959-02-01 139.4 287.7 992.1 49.8 18.6 18.1 35.2201 25.2270
## NONREVSL CONSPI S&P 500 S&P div yield S&P PE ratio FEDFUNDS CP3Mx
## 1959-01-01 48.96116 0.1249647 55.62 3.158342 18.44574 2.48 3.30
## 1959-02-01 49.51371 0.1257651 54.77 3.219518 18.41812 2.43 3.26
## TB3MS TB6MS GS1 GS5 GS10 AAA BAA COMPAPFFx TB3SMFFM TB6SMFFM
## 1959-01-01 2.82 3.09 3.36 4.01 4.02 4.12 4.87 0.82 0.34 0.61
## 1959-02-01 2.70 3.13 3.54 3.96 3.96 4.14 4.89 0.83 0.27 0.70
## T1YFFM T5YFFM T10YFFM AAAFFM BAAFFM TWEXAFEGSMTHx EXSZUSx EXJPUSx
## 1959-01-01 0.88 1.53 1.54 1.64 2.39 NA 4.3122 359.8417
## 1959-02-01 1.11 1.53 1.53 1.71 2.46 NA 4.3133 359.8417
## EXUSUKx EXCAUSx WPSFD49207 WPSFD49502 WPSID61 WPSID62 OILPRICEx
## 1959-01-01 2.8065 0.9671 33.1 33.4 30.6 31.6 3
## 1959-02-01 2.8093 0.9748 33.2 33.4 30.7 31.4 3
## PPICMM CPIAUCSL CPIAPPSL CPITRNSL CPIMEDSL CUSR0000SAC CUSR0000SAD
## 1959-01-01 32.5 29.01 44.8 29.3 21.1 33.3 38.1
## 1959-02-01 32.5 29.00 44.7 29.4 21.2 33.3 38.1
## CUSR0000SAS CPIULFSL CUSR0000SA0L2 CUSR0000SA0L5 PCEPI
## 1959-01-01 22.9 28.9 30.7 29.6 15.164
## 1959-02-01 23.0 28.9 30.7 29.6 15.179
## DDURRG3M086SBEA DNDGRG3M086SBEA DSERRG3M086SBEA CES0600000008
## 1959-01-01 63.517 18.294 10.152 2.13
## 1959-02-01 63.554 18.302 10.167 2.14
## CES2000000008 CES3000000008 UMCSENTx DTCOLNVHFNM DTCTHFNM INVEST
## 1959-01-01 2.45 2.04 NA 6476 12298 84.2043
## 1959-02-01 2.46 2.05 NA 6476 12298 83.5280
## VIXCLSx
## 1959-01-01 NA
## 1959-02-01 NA
In addition to the series in the FRED-MD dataset we also wish to look
at GDP. We use the series with id GDPC1
from FRED, which is an
inflation adjusted and seasonally adjusted real GDP measnured in
Billions of 2017 Dollars.
# Download U.S. Real GDP data from FRED
# after running just this line you will not see GDPC1 in your environment,
# but it is there, you can use it in the next line
getSymbols("GDPC1", src = "FRED")
## [1] "GDPC1"
# Let's restrict the series to be pre-Covid
GDPC1 <- GDPC1["1975/2019"]
# calculate log GDP
logGDPC1 <- log(GDPC1)
The GDPC1 object comes as a xts
time series object.
There are different types of time series formats and you will typically
have to check up how to restrict sample sizes depending on the
format.
One quick way to find out is to search for "R xts cheatsheet". You are likely to find this cheatsheet and on there you can find out how to restrict the sample for subperiods from the "Selecting, Subsetting & Indexing" section.
Let's plot the series.
plot(GDPC1)
Or as the logged series:
plot(logGDPC1)
In this workthrough we will not cover all details you may learn in yoru econometrics class about ADF tests. If that is what you are after you should refer to your Time-Series Econometrics textbook of your choice. However, recall that, if you wish to apply an ADF test to a series \(y\), you will be estimating the following regression:
\[\begin{equation} \Delta y_t = \mu + \beta t + \gamma y_{t-1} + \sum_{j=1}^p \theta_j \Delta y_{t-j} + w_t \end{equation}\]
It is the coefficient \(\gamma\) that will reveal if series \(y\) is stationary (\(\gamma < 0\)) or not (\(\gamma \geq 0\)). We use the t-test statistic
\[\begin{equation} t_{\gamma} = \frac{\hat{\gamma}}{se_{\gamma}} \end{equation}\]
to test \(H_0: \gamma = 0\) against \(H_A: \gamma <0\). If we can reject the null hypothesis we conclude that the series is stationary. However, we need to recognise that the distribution of this test statistic does not follow a t-distribution or asymptotic normal distribution. They follow what is often called a Dickey-Fuller distribution. This distribution also depends on whether a constant and a time trend are included.
We start by applying the ADF test procedure from the
tseries
package.
y <- logGDPC1
# Perform ADF test with default settings
adf_test_tseries <- adf.test(y, alternative = "stationary")
# Display the results
print(adf_test_tseries)
##
## Augmented Dickey-Fuller Test
##
## data: y
## Dickey-Fuller = -1.6632, Lag order = 5, p-value = 0.717
## alternative hypothesis: stationary
The lag length used in the adf.test
function is purely
dependent on the sample size,
lag order = trunc((n-1)^(1/3))
where n
is the
sample length, here 180.
The ADF test implemented in the adf.test
function
includes (check the help ?adf.test
)
This function does not give you control over which deterministic terms you want to include.
If you wish to be specific over the choice of deterministic terms,
then the unit root tests from the urca
package are a better
choice. Replicating the same test as above (i.e. including a constant
and a time trend) this is used as follows.
y <- logGDPC1
adf.urca.lG.trend <- ur.df(y, type = "trend", lags = 5)
summary(adf.urca.lG.trend)
##
## ###############################################
## # Augmented Dickey-Fuller Test Unit Root Test #
## ###############################################
##
## Test regression trend
##
##
## Call:
## lm(formula = z.diff ~ z.lag.1 + 1 + tt + z.diff.lag)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.0270361 -0.0031664 0.0005562 0.0039237 0.0306668
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.840e-01 1.075e-01 1.712 0.088736 .
## z.lag.1 -2.045e-02 1.229e-02 -1.663 0.098149 .
## tt 1.303e-04 8.633e-05 1.510 0.133002
## z.diff.lag1 3.051e-01 7.686e-02 3.970 0.000107 ***
## z.diff.lag2 8.514e-02 7.972e-02 1.068 0.287089
## z.diff.lag3 3.611e-02 8.002e-02 0.451 0.652420
## z.diff.lag4 1.667e-02 7.961e-02 0.209 0.834405
## z.diff.lag5 1.479e-03 7.671e-02 0.019 0.984641
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.006845 on 166 degrees of freedom
## Multiple R-squared: 0.1483, Adjusted R-squared: 0.1124
## F-statistic: 4.129 on 7 and 166 DF, p-value: 0.0003279
##
##
## Value of test-statistic is: -1.6632 6.9429 2.0179
##
## Critical values for test statistics:
## 1pct 5pct 10pct
## tau3 -3.99 -3.43 -3.13
## phi2 6.22 4.75 4.07
## phi3 8.43 6.49 5.47
The test statistic is identical -1.7308. You can find the 10%, 5% and
1% critical values in the tau3
row of the table for
critical values. The null hypothesis is that the series does have a unit
root. This null hypothesis is rejected in favour of the alternative
hypothesis of stationarity if the test statistic is smaller than the
critical value. Here the test statistic is not smaller even than the 10%
critical value and therefore we cannot reject the null hypothesis of a
unit root.
After seeing a plot of the time-series that is not a surprise.
When thinking of GDP, stationarity around a constant time trend is not very plausible and hence we may want to repeat the analysis with a ADF test that does not include a time trend.
y <- logGDPC1
adf.urca.lG.drift <- ur.df(y, type = "drift", lags = 5)
summary(adf.urca.lG.drift)
##
## ###############################################
## # Augmented Dickey-Fuller Test Unit Root Test #
## ###############################################
##
## Test regression drift
##
##
## Call:
## lm(formula = z.diff ~ z.lag.1 + 1 + z.diff.lag)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.0273025 -0.0032421 0.0004136 0.0035704 0.0312430
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.023267 0.014706 1.582 0.115509
## z.lag.1 -0.002032 0.001539 -1.320 0.188573
## z.diff.lag1 0.299831 0.077070 3.890 0.000144 ***
## z.diff.lag2 0.076333 0.079808 0.956 0.340226
## z.diff.lag3 0.025626 0.080023 0.320 0.749194
## z.diff.lag4 0.005025 0.079535 0.063 0.949700
## z.diff.lag5 -0.014943 0.076229 -0.196 0.844822
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.006872 on 167 degrees of freedom
## Multiple R-squared: 0.1366, Adjusted R-squared: 0.1056
## F-statistic: 4.403 on 6 and 167 DF, p-value: 0.0003691
##
##
## Value of test-statistic is: -1.3202 9.2041
##
## Critical values for test statistics:
## 1pct 5pct 10pct
## tau2 -3.46 -2.88 -2.57
## phi1 6.52 4.63 3.81
Note that the test statistic, but also the critical values have changed. The critical values do depend on the deterministic terms that are being included. Yet the conclusion remains the same.
We still set the lag length to five, but the ur.df
function has the option to let an information criterion decide how many
lag lengths should be included. Recall that we include lag terms to
ensure that the ADF regression error terms are uncorrelated.
y <- logGDPC1
adf.urca.lG.drift <- ur.df(y, type = "drift", selectlags = "BIC")
summary(adf.urca.lG.drift)
##
## ###############################################
## # Augmented Dickey-Fuller Test Unit Root Test #
## ###############################################
##
## Test regression drift
##
##
## Call:
## lm(formula = z.diff ~ z.lag.1 + 1 + z.diff.lag)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.0279167 -0.0032112 0.0004497 0.0037575 0.0305949
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.030355 0.013677 2.219 0.0277 *
## z.lag.1 -0.002728 0.001446 -1.887 0.0609 .
## z.diff.lag 0.325351 0.071178 4.571 9.15e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.006842 on 175 degrees of freedom
## Multiple R-squared: 0.1386, Adjusted R-squared: 0.1288
## F-statistic: 14.08 on 2 and 175 DF, p-value: 2.134e-06
##
##
## Value of test-statistic is: -1.8866 22.5329
##
## Critical values for test statistics:
## 1pct 5pct 10pct
## tau2 -3.46 -2.88 -2.57
## phi1 6.52 4.63 3.81
As we conclude that the series is certainly not ststionary (I(0)) we
now check whether it is I(1) or I(2). This is done by applying the same
test statistic to the difference of the series,
diff(logGDPC1)
. Judging by the plot for this series, there
seems to be no drift and hence we will not include any deterministic
term into the ADF regresison.
dy <- diff(logGDPC1)[2:length(logGDPC1)] # remove first obs which is NA
plot(dy)
adf.urca.dlG.drift <- ur.df(dy, type = "none", selectlags = "BIC")
summary(adf.urca.dlG.drift)
##
## ###############################################
## # Augmented Dickey-Fuller Test Unit Root Test #
## ###############################################
##
## Test regression none
##
##
## Call:
## lm(formula = z.diff ~ z.lag.1 - 1 + z.diff.lag)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.022998 -0.001764 0.001891 0.005741 0.036387
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## z.lag.1 -0.25092 0.05957 -4.212 4.04e-05 ***
## z.diff.lag -0.27091 0.07237 -3.743 0.000246 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.007328 on 175 degrees of freedom
## Multiple R-squared: 0.2362, Adjusted R-squared: 0.2275
## F-statistic: 27.06 on 2 and 175 DF, p-value: 5.775e-11
##
##
## Value of test-statistic is: -4.2125
##
## Critical values for test statistics:
## 1pct 5pct 10pct
## tau1 -2.58 -1.95 -1.62
The test statistic is now -6.545. We reject the null hypothesis that
diff(logGDPC1)
has a unit root (c.v. at 1% is -2.58). It is
therefore stationary, I(0), which in turn implies that
logGDPC1
is I(1).
Let us now return to the data series which are included in the
FRED-MD dataset, which we already loaded into the data.md
dataframe. We use the describe_md
function (part of the
fbi
package) to obtain information on the variables.
var.info <-describe_md(names(data.md),name.only = FALSE)
Let's look at the variable describing industrial production.
var.info[var.info$fred == "INDPRO",]
## id tcode ttype fred description
## 6 6 5 First difference of natural log: ln(x)-ln(x-1) INDPRO IP Index
## gsi gsi:description group edited
## 6 M_116460980 IP: total Output and Income FALSE
In the ttype
variable you see that this series'
transformation type is
First difference of natural log: ln(x)-ln(x-1)
. What this
says is that to obtain a stationary series for this variable, the first
difference of the series' log is calculated. This is exactly the same we
found for the GDP variable avove. We should therefore also expect to
find diff(log(INDPRO))
to be I(0) and
log(INDPRO)
to be I(1).
y = log(data.md.ts$INDPRO)
dy = diff(log(data.md.ts$INDPRO))
dy = dy[!is.na(dy)] # remove missing observations
plot(y)
plot(dy)
For now we are using the full available sample, but you could again think about restricting the sample. For instance you may want to consider whether your conclusions change if you exclude the Covid period.
Let's calculate the ADF tests.
adf.dip <- ur.df(dy, type = "none", selectlags = "BIC")
summary(adf.dip)
##
## ###############################################
## # Augmented Dickey-Fuller Test Unit Root Test #
## ###############################################
##
## Test regression none
##
##
## Call:
## lm(formula = z.diff ~ z.lag.1 - 1 + z.diff.lag)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.128448 -0.002651 0.001546 0.005458 0.060564
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## z.lag.1 -0.71521 0.04160 -17.191 <2e-16 ***
## z.diff.lag 0.04477 0.03559 1.258 0.209
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.009609 on 784 degrees of freedom
## Multiple R-squared: 0.3443, Adjusted R-squared: 0.3426
## F-statistic: 205.8 on 2 and 784 DF, p-value: < 2.2e-16
##
##
## Value of test-statistic is: -17.1911
##
## Critical values for test statistics:
## 1pct 5pct 10pct
## tau1 -2.58 -1.95 -1.62
So, diff(log(INDPRO))
is I(0).
adf.ip <- ur.df(y, type = "trend", selectlags = "BIC")
summary(adf.ip)
##
## ###############################################
## # Augmented Dickey-Fuller Test Unit Root Test #
## ###############################################
##
## Test regression trend
##
##
## Call:
## lm(formula = z.diff ~ z.lag.1 + 1 + tt + z.diff.lag)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.131509 -0.003865 0.000410 0.004118 0.057717
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.754e-02 9.530e-03 1.841 0.0661 .
## z.lag.1 -4.407e-03 2.831e-03 -1.557 0.1199
## tt 5.045e-06 5.661e-06 0.891 0.3731
## z.diff.lag 2.842e-01 3.419e-02 8.310 4.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.009489 on 783 degrees of freedom
## Multiple R-squared: 0.0941, Adjusted R-squared: 0.09063
## F-statistic: 27.11 on 3 and 783 DF, p-value: < 2.2e-16
##
##
## Value of test-statistic is: -1.5567 7.7862 3.8656
##
## Critical values for test statistics:
## 1pct 5pct 10pct
## tau3 -3.96 -3.41 -3.12
## phi2 6.09 4.68 4.03
## phi3 8.27 6.25 5.34
The ADF test statistic here is -2.634. So in fact, the null
hypothesis of a unit root in log(INDPRO)
can be rejected at
the 10% significance level, but not at the 5% one. Concluding that the
evidence to reject the null hypothesis of a unit root in
log(INDPRO)
is not strong enough, we conclude that the
series is I(1).
If you include a time trend into the ADF regression, do you find
log(INDPRO)
to have a unit root?
It is important to note that the conclusions from the ADF testing strategy is not always crystal clear.
Let's have a look at the different transformation types associated to the time-series in FRED-MD.
unique(var.info$ttype)
## [1] First difference of natural log: ln(x)-ln(x-1)
## [2] First difference: x(t)-x(t-1)
## [3] Level (i.e. no transformation): x(t)
## [4] Natural log: ln(x)
## [5] Second difference of natural log: (ln(x)-ln(x-1))-(ln(x-1)-ln(x-2))
## [6] First difference of percent change: (x(t)/x(t-1)-1)-(x(t-1)/x(t-2)-1)
## 6 Levels: First difference of natural log: ln(x)-ln(x-1) ...
Let's find a series which has no transformation
Level (i.e. no transformation)
applied, hence is deemed
stationary as is. Note that the variable name is saved in
varinfo$fred
.
var.info$fred[var.info$ttype == "Level (i.e. no transformation): x(t)"]
## [1] "CES0600000007" "AWHMAN" "COMPAPFFx" "TB3SMFFM"
## [5] "TB6SMFFM" "T1YFFM" "T5YFFM" "T10YFFM"
## [9] "AAAFFM" "BAAFFM"
Let's pick the AAAFFM
series.
var.info[var.info$fred == "AAAFFM",]
## id tcode ttype fred
## 99 99 1 Level (i.e. no transformation): x(t) AAAFFM
## description gsi gsi:description
## 99 Moody s Aaa Corporate Bond Minus FEDFUNDS <NA> Aaa-FF spread
## group edited
## 99 Interest and Exchange Rates FALSE
y <- data.md.ts$AAAFFM
It is the difference between the Corporate bond rate for AAA rated companies and the federal funds rate.
plot(y)
Given the transformation (or in this case absence of transformation) we should find this series to be stationary, I(0).
adf.aaa <- ur.df(y, type = "drift", selectlags = "BIC")
summary(adf.aaa)
##
## ###############################################
## # Augmented Dickey-Fuller Test Unit Root Test #
## ###############################################
##
## Test regression drift
##
##
## Call:
## lm(formula = z.diff ~ z.lag.1 + 1 + z.diff.lag)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.9283 -0.1455 -0.0001 0.1493 5.6849
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.078773 0.023485 3.354 0.000834 ***
## z.lag.1 -0.040378 0.008547 -4.724 2.73e-06 ***
## z.diff.lag 0.304927 0.034053 8.955 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.4518 on 784 degrees of freedom
## Multiple R-squared: 0.1061, Adjusted R-squared: 0.1039
## F-statistic: 46.55 on 2 and 784 DF, p-value: < 2.2e-16
##
##
## Value of test-statistic is: -4.7244 11.1678
##
## Critical values for test statistics:
## 1pct 5pct 10pct
## tau2 -3.43 -2.86 -2.57
## phi1 6.43 4.59 3.78
Indeed, the test statistic is -4.72, which is statistically significant even at the 1% level. Hence we reject the H0 of non-stationarity in favour of the alternative of stationarity.
This explains why no transformation is required to achieve stationarity.
Next we chose a series which, according to the FRED-MD guidance, requires differencing twice to achieve stationarity.
var.info$fred[var.info$ttype == "Second difference of natural log: (ln(x)-ln(x-1))-(ln(x-1)-ln(x-2))"]
## [1] "M1SL" "M2SL" "TOTRESNS" "BUSLOANS"
## [5] "REALLN" "NONREVSL" "WPSFD49207" "WPSFD49502"
## [9] "WPSID61" "WPSID62" "OILPRICEx" "PPICMM"
## [13] "CPIAUCSL" "CPIAPPSL" "CPITRNSL" "CPIMEDSL"
## [17] "CUSR0000SAC" "CUSR0000SAD" "CUSR0000SAS" "CPIULFSL"
## [21] "CUSR0000SA0L2" "CUSR0000SA0L5" "PCEPI" "DDURRG3M086SBEA"
## [25] "DNDGRG3M086SBEA" "DSERRG3M086SBEA" "CES0600000008" "CES2000000008"
## [29] "CES3000000008" "DTCOLNVHFNM" "DTCTHFNM" "INVEST"
You can see that there are many. Let's select OILPRICEx
,
which measures the oil price.
var.info[var.info$fred == "OILPRICEx",]
## id tcode
## 110 110 6
## ttype
## 110 Second difference of natural log: (ln(x)-ln(x-1))-(ln(x-1)-ln(x-2))
## fred description gsi gsi:description
## 110 OILPRICEx Crude Oil, spliced WTI and Cushing M_110157273 Spot market price
## group edited
## 110 Prices TRUE
y <- log(data.md.ts$OILPRICEx)
dy <- diff(y)
d2y <- diff(dy)
dy <- dy[!is.na(dy)] # remove NAs
d2y <- d2y[!is.na(d2y)]
Let's plot the three series.
plot(y)
plot(dy)
plot(d2y)
The first thing you notice when looking at these plots is that until 1980 oil prices moved in distinct discrete steps, mainly determined by OPEC. From 1980 onwards there are more continuous price changes. So, let's restrict the series to start in 1981.
y <- window(y, start = as.Date("1981-01-01"), end = as.Date("2024-09-01")) # cut sample
dy <- diff(y)
d2y <- diff(dy)
dy <- dy[!is.na(dy)] # remove NAs
d2y <- d2y[!is.na(d2y)]
Let's only look at the post 1980 oil prices
Just looking at the plots we would certainly entertain the
possibility that log(OILPRICEx)
could be I(1). Let's run
the ADF tests.
adf.oil <- ur.df(y, type = "drift", selectlags = "BIC")
summary(adf.oil)
##
## ###############################################
## # Augmented Dickey-Fuller Test Unit Root Test #
## ###############################################
##
## Test regression drift
##
##
## Call:
## lm(formula = z.diff ~ z.lag.1 + 1 + z.diff.lag)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.50843 -0.05175 0.00181 0.04958 0.69453
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.049946 0.023019 2.170 0.0305 *
## z.lag.1 -0.013548 0.006255 -2.166 0.0308 *
## z.diff.lag 0.283102 0.042150 6.717 4.89e-11 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.08922 on 520 degrees of freedom
## Multiple R-squared: 0.08407, Adjusted R-squared: 0.08055
## F-statistic: 23.86 on 2 and 520 DF, p-value: 1.215e-10
##
##
## Value of test-statistic is: -2.1661 2.3673
##
## Critical values for test statistics:
## 1pct 5pct 10pct
## tau2 -3.43 -2.86 -2.57
## phi1 6.43 4.59 3.78
adf.doil <- ur.df(dy, type = "drift", selectlags = "BIC")
summary(adf.doil)
##
## ###############################################
## # Augmented Dickey-Fuller Test Unit Root Test #
## ###############################################
##
## Test regression drift
##
##
## Call:
## lm(formula = z.diff ~ z.lag.1 + 1 + z.diff.lag)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.51415 -0.05171 -0.00024 0.04863 0.65174
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.000945 0.003896 0.243 0.80844
## z.lag.1 -0.815961 0.052502 -15.542 < 2e-16 ***
## z.diff.lag 0.126116 0.043593 2.893 0.00398 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.08899 on 519 degrees of freedom
## Multiple R-squared: 0.3719, Adjusted R-squared: 0.3695
## F-statistic: 153.7 on 2 and 519 DF, p-value: < 2.2e-16
##
##
## Value of test-statistic is: -15.5416 120.7723
##
## Critical values for test statistics:
## 1pct 5pct 10pct
## tau2 -3.43 -2.86 -2.57
## phi1 6.43 4.59 3.78
adf.d2oil <- ur.df(d2y, type = "drift", selectlags = "BIC")
summary(adf.d2oil)
##
## ###############################################
## # Augmented Dickey-Fuller Test Unit Root Test #
## ###############################################
##
## Test regression drift
##
##
## Call:
## lm(formula = z.diff ~ z.lag.1 + 1 + z.diff.lag)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.47654 -0.05619 -0.00483 0.05068 0.98676
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.0001996 0.0045266 -0.044 0.965
## z.lag.1 -1.6494538 0.0674624 -24.450 < 2e-16 ***
## z.diff.lag 0.2864183 0.0421338 6.798 2.93e-11 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1033 on 518 degrees of freedom
## Multiple R-squared: 0.6704, Adjusted R-squared: 0.6692
## F-statistic: 526.9 on 2 and 518 DF, p-value: < 2.2e-16
##
##
## Value of test-statistic is: -24.45 298.9007
##
## Critical values for test statistics:
## 1pct 5pct 10pct
## tau2 -3.43 -2.86 -2.57
## phi1 6.43 4.59 3.78
Judging from these results one would judge that
diff(log(OILPRICEx))
is actually stationary and hence that
differencing once would be sufficient to create a stationary series.
Prices are often seen as I(2) series. This implies that inflation is
then a non-stationary I(1) series. While the McCracken and Ng (2016)
paper does not discuss their choice of transformation for achieving
stationarity in detail, you can see from their Appendix Table "Group 7 -
Prices" that almost all proce series are being treated as I(2)
series.
Explore the stationarity properties of more series in the FRED-MD database and compare those against the transformation chosen in FRED-MD.
This exercise is part of the ECLR page.