This exercise is part of the ECLR page.

Introduction

In time-series modelling it is crucial to understand the properties of the time-series used. It is now well understood that (standard) statistical inference in regression models relies on the series used in regression models being stationary.

In this walkthrough we will demonstrate how to use appropriate statistical tests (here the Augmented Dickey-Fuller, ADF, tests), to establish whether a series, or some differences version of a series, is stationary.

Data and Library upload

In order to work on this problem we upload a range of libraries.

library(tidyverse)
library(ggplot2)
library(stargazer)
library(fbi)      # for easy access to the FRED-MD database
library(tseries)  # for ADF test
library(urca)     # Another package for unit root testing
library(zoo)      # to transform data into ts format
library(quantmod) # to download data from FRED

The data we shall use come from the FRED-MD database. This is a prepared database of macro variables for the U.S. (see McCracken and Ng, 2016). If you wish to learn more about this very useful dataset refer to this walkthrough which discusses the finer detail of this dataset.

filepath <- "https://files.stlouisfed.org/files/htdocs/fred-md/monthly/2024-10.csv"
# for the most up to date dataset use
# filepath <- "https://files.stlouisfed.org/files/htdocs/fred-md/monthly/current.csv"

data.md <- fredmd(filepath, date_start = NULL, date_end = NULL, transform = FALSE)

The dataset contains 126 macro variables and the variable date in which the dates are stored. When applying time-series techniques, such as the ADF test below, to series, it is often advantageous for R to know that the data are time-series data. From data.md R does not know that. Yes, there is a variable called date and that variable does have a Date format (check str(data.md) to confirm), but all other series are merely numerical variables.

There are many different ways in which we can format all of the series as a time-series. The simplest uses the zoo package. That is a package designed to deal with time-series data.

data.md.ts <- read.zoo(data.md) # create a new object with ts objects
head(data.md.ts,2)
##                 RPI W875RX1 DPCERA3M086SBEA CMRMTSPLx  RETAILx  INDPRO IPFPNSS
## 1959-01-01 2583.560  2426.0          15.188  276676.8 18235.77 21.9616 23.3868
## 1959-02-01 2593.596  2434.8          15.346  278714.0 18369.56 22.3917 23.7024
##            IPFINAL IPCONGD IPDCONGD IPNCONGD IPBUSEQ   IPMAT  IPDMAT  IPNMAT
## 1959-01-01 22.2620 31.6664  18.9498  38.1496  8.0538 20.0318 11.9870 30.6611
## 1959-02-01 22.4549 31.8987  19.0492  38.5142  8.1616 20.6662 12.5391 31.1980
##            IPMANSICS IPB51222S IPFUELS  CUMFNS  HWI HWIURATIO CLF16OV CE16OV
## 1959-01-01   20.8325   19.9173 34.6443 80.1973 1357 0.3335792   67936  63868
## 1959-02-01   21.2154   19.8631 34.1724 81.4428 1421 0.3583859   67649  63684
##            UNRATE UEMPMEAN UEMPLT5 UEMP5TO14 UEMP15OV UEMP15T26 UEMP27OV
## 1959-01-01    6.0     16.3    1574      1169     1396       594      802
## 1959-02-01    5.9     15.5    1554      1164     1277       545      732
##            CLAIMSx PAYEMS USGOOD CES1021000001 USCONS MANEMP DMANEMP NDMANEMP
## 1959-01-01  291078  52478  18796         713.0   2993  14998    8740     6258
## 1959-02-01  282958  52688  18890         704.2   2980  15115    8839     6276
##            SRVPRD USTPU USWTRADE USTRADE USFIRE USGOVT CES0600000007 AWOTMAN
## 1959-01-01  33682 10774   2568.7  5350.3   2418   8105          39.8     2.5
## 1959-02-01  33798 10816   2575.4  5381.3   2420   8116          39.7     2.6
##            AWHMAN HOUST HOUSTNE HOUSTMW HOUSTS HOUSTW PERMIT PERMITNE PERMITMW
## 1959-01-01   40.2  1657     350     452    505    350     NA       NA       NA
## 1959-02-01   40.3  1667     346     469    508    344     NA       NA       NA
##            PERMITS PERMITW ACOGNO AMDMNOx ANDENOx AMDMUOx  BUSINVx ISRATIOx
## 1959-01-01      NA      NA     NA      NA      NA      NA 84889.56 1.560000
## 1959-02-01      NA      NA     NA      NA      NA      NA 85181.81 1.539338
##             M1SL  M2SL M2REAL BOGMBASE TOTRESNS NONBORRES BUSLOANS  REALLN
## 1959-01-01 138.9 286.6  987.9     50.5     18.9      18.3  35.2130 24.9242
## 1959-02-01 139.4 287.7  992.1     49.8     18.6      18.1  35.2201 25.2270
##            NONREVSL    CONSPI S&P 500 S&P div yield S&P PE ratio FEDFUNDS CP3Mx
## 1959-01-01 48.96116 0.1249647   55.62      3.158342     18.44574     2.48  3.30
## 1959-02-01 49.51371 0.1257651   54.77      3.219518     18.41812     2.43  3.26
##            TB3MS TB6MS  GS1  GS5 GS10  AAA  BAA COMPAPFFx TB3SMFFM TB6SMFFM
## 1959-01-01  2.82  3.09 3.36 4.01 4.02 4.12 4.87      0.82     0.34     0.61
## 1959-02-01  2.70  3.13 3.54 3.96 3.96 4.14 4.89      0.83     0.27     0.70
##            T1YFFM T5YFFM T10YFFM AAAFFM BAAFFM TWEXAFEGSMTHx EXSZUSx  EXJPUSx
## 1959-01-01   0.88   1.53    1.54   1.64   2.39            NA  4.3122 359.8417
## 1959-02-01   1.11   1.53    1.53   1.71   2.46            NA  4.3133 359.8417
##            EXUSUKx EXCAUSx WPSFD49207 WPSFD49502 WPSID61 WPSID62 OILPRICEx
## 1959-01-01  2.8065  0.9671       33.1       33.4    30.6    31.6         3
## 1959-02-01  2.8093  0.9748       33.2       33.4    30.7    31.4         3
##            PPICMM CPIAUCSL CPIAPPSL CPITRNSL CPIMEDSL CUSR0000SAC CUSR0000SAD
## 1959-01-01   32.5    29.01     44.8     29.3     21.1        33.3        38.1
## 1959-02-01   32.5    29.00     44.7     29.4     21.2        33.3        38.1
##            CUSR0000SAS CPIULFSL CUSR0000SA0L2 CUSR0000SA0L5  PCEPI
## 1959-01-01        22.9     28.9          30.7          29.6 15.164
## 1959-02-01        23.0     28.9          30.7          29.6 15.179
##            DDURRG3M086SBEA DNDGRG3M086SBEA DSERRG3M086SBEA CES0600000008
## 1959-01-01          63.517          18.294          10.152          2.13
## 1959-02-01          63.554          18.302          10.167          2.14
##            CES2000000008 CES3000000008 UMCSENTx DTCOLNVHFNM DTCTHFNM  INVEST
## 1959-01-01          2.45          2.04       NA        6476    12298 84.2043
## 1959-02-01          2.46          2.05       NA        6476    12298 83.5280
##            VIXCLSx
## 1959-01-01      NA
## 1959-02-01      NA

In addition to the series in the FRED-MD dataset we also wish to look at GDP. We use the series with id GDPC1 from FRED, which is an inflation adjusted and seasonally adjusted real GDP measnured in Billions of 2017 Dollars.

# Download U.S. Real GDP data from FRED
# after running just this line you will not see GDPC1 in your environment,
# but it is there, you can use it in the next line
getSymbols("GDPC1", src = "FRED")
## [1] "GDPC1"
# Let's restrict the series to be pre-Covid
GDPC1 <- GDPC1["1975/2019"]
# calculate log GDP
logGDPC1 <- log(GDPC1)

The GDPC1 object comes as a xts time series object. There are different types of time series formats and you will typically have to check up how to restrict sample sizes depending on the format.

One quick way to find out is to search for "R xts cheatsheet". You are likely to find this cheatsheet and on there you can find out how to restrict the sample for subperiods from the "Selecting, Subsetting & Indexing" section.

Let's plot the series.

plot(GDPC1)

Or as the logged series:

plot(logGDPC1)

ADF test

In this workthrough we will not cover all details you may learn in yoru econometrics class about ADF tests. If that is what you are after you should refer to your Time-Series Econometrics textbook of your choice. However, recall that, if you wish to apply an ADF test to a series \(y\), you will be estimating the following regression:

\[\begin{equation} \Delta y_t = \mu + \beta t + \gamma y_{t-1} + \sum_{j=1}^p \theta_j \Delta y_{t-j} + w_t \end{equation}\]

It is the coefficient \(\gamma\) that will reveal if series \(y\) is stationary (\(\gamma < 0\)) or not (\(\gamma \geq 0\)). We use the t-test statistic

\[\begin{equation} t_{\gamma} = \frac{\hat{\gamma}}{se_{\gamma}} \end{equation}\]

to test \(H_0: \gamma = 0\) against \(H_A: \gamma <0\). If we can reject the null hypothesis we conclude that the series is stationary. However, we need to recognise that the distribution of this test statistic does not follow a t-distribution or asymptotic normal distribution. They follow what is often called a Dickey-Fuller distribution. This distribution also depends on whether a constant and a time trend are included.

We start by applying the ADF test procedure from the tseries package.

y <- logGDPC1
# Perform ADF test with default settings
adf_test_tseries <- adf.test(y, alternative = "stationary")

# Display the results
print(adf_test_tseries)
## 
##  Augmented Dickey-Fuller Test
## 
## data:  y
## Dickey-Fuller = -1.6632, Lag order = 5, p-value = 0.717
## alternative hypothesis: stationary

The lag length used in the adf.test function is purely dependent on the sample size, lag order = trunc((n-1)^(1/3)) where n is the sample length, here 180.

The ADF test implemented in the adf.test function includes (check the help ?adf.test)

This function does not give you control over which deterministic terms you want to include.

If you wish to be specific over the choice of deterministic terms, then the unit root tests from the urca package are a better choice. Replicating the same test as above (i.e. including a constant and a time trend) this is used as follows.

y <- logGDPC1
adf.urca.lG.trend <- ur.df(y, type = "trend",  lags = 5)
summary(adf.urca.lG.trend)
## 
## ############################################### 
## # Augmented Dickey-Fuller Test Unit Root Test # 
## ############################################### 
## 
## Test regression trend 
## 
## 
## Call:
## lm(formula = z.diff ~ z.lag.1 + 1 + tt + z.diff.lag)
## 
## Residuals:
##        Min         1Q     Median         3Q        Max 
## -0.0270361 -0.0031664  0.0005562  0.0039237  0.0306668 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  1.840e-01  1.075e-01   1.712 0.088736 .  
## z.lag.1     -2.045e-02  1.229e-02  -1.663 0.098149 .  
## tt           1.303e-04  8.633e-05   1.510 0.133002    
## z.diff.lag1  3.051e-01  7.686e-02   3.970 0.000107 ***
## z.diff.lag2  8.514e-02  7.972e-02   1.068 0.287089    
## z.diff.lag3  3.611e-02  8.002e-02   0.451 0.652420    
## z.diff.lag4  1.667e-02  7.961e-02   0.209 0.834405    
## z.diff.lag5  1.479e-03  7.671e-02   0.019 0.984641    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.006845 on 166 degrees of freedom
## Multiple R-squared:  0.1483, Adjusted R-squared:  0.1124 
## F-statistic: 4.129 on 7 and 166 DF,  p-value: 0.0003279
## 
## 
## Value of test-statistic is: -1.6632 6.9429 2.0179 
## 
## Critical values for test statistics: 
##       1pct  5pct 10pct
## tau3 -3.99 -3.43 -3.13
## phi2  6.22  4.75  4.07
## phi3  8.43  6.49  5.47

The test statistic is identical -1.7308. You can find the 10%, 5% and 1% critical values in the tau3 row of the table for critical values. The null hypothesis is that the series does have a unit root. This null hypothesis is rejected in favour of the alternative hypothesis of stationarity if the test statistic is smaller than the critical value. Here the test statistic is not smaller even than the 10% critical value and therefore we cannot reject the null hypothesis of a unit root.

After seeing a plot of the time-series that is not a surprise.

When thinking of GDP, stationarity around a constant time trend is not very plausible and hence we may want to repeat the analysis with a ADF test that does not include a time trend.

y <- logGDPC1
adf.urca.lG.drift <- ur.df(y, type = "drift",  lags = 5)
summary(adf.urca.lG.drift)
## 
## ############################################### 
## # Augmented Dickey-Fuller Test Unit Root Test # 
## ############################################### 
## 
## Test regression drift 
## 
## 
## Call:
## lm(formula = z.diff ~ z.lag.1 + 1 + z.diff.lag)
## 
## Residuals:
##        Min         1Q     Median         3Q        Max 
## -0.0273025 -0.0032421  0.0004136  0.0035704  0.0312430 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.023267   0.014706   1.582 0.115509    
## z.lag.1     -0.002032   0.001539  -1.320 0.188573    
## z.diff.lag1  0.299831   0.077070   3.890 0.000144 ***
## z.diff.lag2  0.076333   0.079808   0.956 0.340226    
## z.diff.lag3  0.025626   0.080023   0.320 0.749194    
## z.diff.lag4  0.005025   0.079535   0.063 0.949700    
## z.diff.lag5 -0.014943   0.076229  -0.196 0.844822    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.006872 on 167 degrees of freedom
## Multiple R-squared:  0.1366, Adjusted R-squared:  0.1056 
## F-statistic: 4.403 on 6 and 167 DF,  p-value: 0.0003691
## 
## 
## Value of test-statistic is: -1.3202 9.2041 
## 
## Critical values for test statistics: 
##       1pct  5pct 10pct
## tau2 -3.46 -2.88 -2.57
## phi1  6.52  4.63  3.81

Note that the test statistic, but also the critical values have changed. The critical values do depend on the deterministic terms that are being included. Yet the conclusion remains the same.

We still set the lag length to five, but the ur.df function has the option to let an information criterion decide how many lag lengths should be included. Recall that we include lag terms to ensure that the ADF regression error terms are uncorrelated.

y <- logGDPC1
adf.urca.lG.drift <- ur.df(y, type = "drift",  selectlags = "BIC")
summary(adf.urca.lG.drift)
## 
## ############################################### 
## # Augmented Dickey-Fuller Test Unit Root Test # 
## ############################################### 
## 
## Test regression drift 
## 
## 
## Call:
## lm(formula = z.diff ~ z.lag.1 + 1 + z.diff.lag)
## 
## Residuals:
##        Min         1Q     Median         3Q        Max 
## -0.0279167 -0.0032112  0.0004497  0.0037575  0.0305949 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.030355   0.013677   2.219   0.0277 *  
## z.lag.1     -0.002728   0.001446  -1.887   0.0609 .  
## z.diff.lag   0.325351   0.071178   4.571 9.15e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.006842 on 175 degrees of freedom
## Multiple R-squared:  0.1386, Adjusted R-squared:  0.1288 
## F-statistic: 14.08 on 2 and 175 DF,  p-value: 2.134e-06
## 
## 
## Value of test-statistic is: -1.8866 22.5329 
## 
## Critical values for test statistics: 
##       1pct  5pct 10pct
## tau2 -3.46 -2.88 -2.57
## phi1  6.52  4.63  3.81

As we conclude that the series is certainly not ststionary (I(0)) we now check whether it is I(1) or I(2). This is done by applying the same test statistic to the difference of the series, diff(logGDPC1). Judging by the plot for this series, there seems to be no drift and hence we will not include any deterministic term into the ADF regresison.

dy <- diff(logGDPC1)[2:length(logGDPC1)] # remove first obs which is NA
plot(dy)

adf.urca.dlG.drift <- ur.df(dy, type = "none",  selectlags = "BIC")
summary(adf.urca.dlG.drift)
## 
## ############################################### 
## # Augmented Dickey-Fuller Test Unit Root Test # 
## ############################################### 
## 
## Test regression none 
## 
## 
## Call:
## lm(formula = z.diff ~ z.lag.1 - 1 + z.diff.lag)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.022998 -0.001764  0.001891  0.005741  0.036387 
## 
## Coefficients:
##            Estimate Std. Error t value Pr(>|t|)    
## z.lag.1    -0.25092    0.05957  -4.212 4.04e-05 ***
## z.diff.lag -0.27091    0.07237  -3.743 0.000246 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.007328 on 175 degrees of freedom
## Multiple R-squared:  0.2362, Adjusted R-squared:  0.2275 
## F-statistic: 27.06 on 2 and 175 DF,  p-value: 5.775e-11
## 
## 
## Value of test-statistic is: -4.2125 
## 
## Critical values for test statistics: 
##       1pct  5pct 10pct
## tau1 -2.58 -1.95 -1.62

The test statistic is now -6.545. We reject the null hypothesis that diff(logGDPC1) has a unit root (c.v. at 1% is -2.58). It is therefore stationary, I(0), which in turn implies that logGDPC1 is I(1).

Revisiting FRED-MD stationarity properties

Let us now return to the data series which are included in the FRED-MD dataset, which we already loaded into the data.md dataframe. We use the describe_md function (part of the fbi package) to obtain information on the variables.

var.info <-describe_md(names(data.md),name.only = FALSE)

Industrial Production

Let's look at the variable describing industrial production.

var.info[var.info$fred == "INDPRO",]
##   id tcode                                          ttype   fred description
## 6  6     5 First difference of natural log: ln(x)-ln(x-1) INDPRO    IP Index
##           gsi gsi:description             group edited
## 6 M_116460980       IP: total Output and Income  FALSE

In the ttype variable you see that this series' transformation type is First difference of natural log: ln(x)-ln(x-1). What this says is that to obtain a stationary series for this variable, the first difference of the series' log is calculated. This is exactly the same we found for the GDP variable avove. We should therefore also expect to find diff(log(INDPRO)) to be I(0) and log(INDPRO) to be I(1).

y = log(data.md.ts$INDPRO)
dy = diff(log(data.md.ts$INDPRO))
dy = dy[!is.na(dy)]  # remove missing observations
plot(y)

plot(dy)

For now we are using the full available sample, but you could again think about restricting the sample. For instance you may want to consider whether your conclusions change if you exclude the Covid period.

Let's calculate the ADF tests.

adf.dip <- ur.df(dy, type = "none",  selectlags = "BIC")
summary(adf.dip)
## 
## ############################################### 
## # Augmented Dickey-Fuller Test Unit Root Test # 
## ############################################### 
## 
## Test regression none 
## 
## 
## Call:
## lm(formula = z.diff ~ z.lag.1 - 1 + z.diff.lag)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.128448 -0.002651  0.001546  0.005458  0.060564 
## 
## Coefficients:
##            Estimate Std. Error t value Pr(>|t|)    
## z.lag.1    -0.71521    0.04160 -17.191   <2e-16 ***
## z.diff.lag  0.04477    0.03559   1.258    0.209    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.009609 on 784 degrees of freedom
## Multiple R-squared:  0.3443, Adjusted R-squared:  0.3426 
## F-statistic: 205.8 on 2 and 784 DF,  p-value: < 2.2e-16
## 
## 
## Value of test-statistic is: -17.1911 
## 
## Critical values for test statistics: 
##       1pct  5pct 10pct
## tau1 -2.58 -1.95 -1.62

So, diff(log(INDPRO)) is I(0).

adf.ip <- ur.df(y, type = "trend",  selectlags = "BIC")
summary(adf.ip)
## 
## ############################################### 
## # Augmented Dickey-Fuller Test Unit Root Test # 
## ############################################### 
## 
## Test regression trend 
## 
## 
## Call:
## lm(formula = z.diff ~ z.lag.1 + 1 + tt + z.diff.lag)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.131509 -0.003865  0.000410  0.004118  0.057717 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  1.754e-02  9.530e-03   1.841   0.0661 .  
## z.lag.1     -4.407e-03  2.831e-03  -1.557   0.1199    
## tt           5.045e-06  5.661e-06   0.891   0.3731    
## z.diff.lag   2.842e-01  3.419e-02   8.310  4.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.009489 on 783 degrees of freedom
## Multiple R-squared:  0.0941, Adjusted R-squared:  0.09063 
## F-statistic: 27.11 on 3 and 783 DF,  p-value: < 2.2e-16
## 
## 
## Value of test-statistic is: -1.5567 7.7862 3.8656 
## 
## Critical values for test statistics: 
##       1pct  5pct 10pct
## tau3 -3.96 -3.41 -3.12
## phi2  6.09  4.68  4.03
## phi3  8.27  6.25  5.34

The ADF test statistic here is -2.634. So in fact, the null hypothesis of a unit root in log(INDPRO) can be rejected at the 10% significance level, but not at the 5% one. Concluding that the evidence to reject the null hypothesis of a unit root in log(INDPRO) is not strong enough, we conclude that the series is I(1).

If you include a time trend into the ADF regression, do you find log(INDPRO) to have a unit root?

It is important to note that the conclusions from the ADF testing strategy is not always crystal clear.

Corporate bond rates

Let's have a look at the different transformation types associated to the time-series in FRED-MD.

unique(var.info$ttype)
## [1] First difference of natural log: ln(x)-ln(x-1)                       
## [2] First difference: x(t)-x(t-1)                                        
## [3] Level (i.e. no transformation): x(t)                                 
## [4] Natural log: ln(x)                                                   
## [5] Second difference of natural log: (ln(x)-ln(x-1))-(ln(x-1)-ln(x-2))  
## [6] First difference of percent change: (x(t)/x(t-1)-1)-(x(t-1)/x(t-2)-1)
## 6 Levels: First difference of natural log: ln(x)-ln(x-1) ...

Let's find a series which has no transformation Level (i.e. no transformation) applied, hence is deemed stationary as is. Note that the variable name is saved in varinfo$fred.

var.info$fred[var.info$ttype == "Level (i.e. no transformation): x(t)"]
##  [1] "CES0600000007" "AWHMAN"        "COMPAPFFx"     "TB3SMFFM"     
##  [5] "TB6SMFFM"      "T1YFFM"        "T5YFFM"        "T10YFFM"      
##  [9] "AAAFFM"        "BAAFFM"

Let's pick the AAAFFM series.

var.info[var.info$fred == "AAAFFM",]
##    id tcode                                ttype   fred
## 99 99     1 Level (i.e. no transformation): x(t) AAAFFM
##                                  description  gsi gsi:description
## 99 Moody s Aaa Corporate Bond Minus FEDFUNDS <NA>   Aaa-FF spread
##                          group edited
## 99 Interest and Exchange Rates  FALSE
y <- data.md.ts$AAAFFM

It is the difference between the Corporate bond rate for AAA rated companies and the federal funds rate.

plot(y)

Given the transformation (or in this case absence of transformation) we should find this series to be stationary, I(0).

adf.aaa <- ur.df(y, type = "drift",  selectlags = "BIC")
summary(adf.aaa)
## 
## ############################################### 
## # Augmented Dickey-Fuller Test Unit Root Test # 
## ############################################### 
## 
## Test regression drift 
## 
## 
## Call:
## lm(formula = z.diff ~ z.lag.1 + 1 + z.diff.lag)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2.9283 -0.1455 -0.0001  0.1493  5.6849 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.078773   0.023485   3.354 0.000834 ***
## z.lag.1     -0.040378   0.008547  -4.724 2.73e-06 ***
## z.diff.lag   0.304927   0.034053   8.955  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.4518 on 784 degrees of freedom
## Multiple R-squared:  0.1061, Adjusted R-squared:  0.1039 
## F-statistic: 46.55 on 2 and 784 DF,  p-value: < 2.2e-16
## 
## 
## Value of test-statistic is: -4.7244 11.1678 
## 
## Critical values for test statistics: 
##       1pct  5pct 10pct
## tau2 -3.43 -2.86 -2.57
## phi1  6.43  4.59  3.78

Indeed, the test statistic is -4.72, which is statistically significant even at the 1% level. Hence we reject the H0 of non-stationarity in favour of the alternative of stationarity.

This explains why no transformation is required to achieve stationarity.

Oil prices

Next we chose a series which, according to the FRED-MD guidance, requires differencing twice to achieve stationarity.

var.info$fred[var.info$ttype == "Second difference of natural log: (ln(x)-ln(x-1))-(ln(x-1)-ln(x-2))"]
##  [1] "M1SL"            "M2SL"            "TOTRESNS"        "BUSLOANS"       
##  [5] "REALLN"          "NONREVSL"        "WPSFD49207"      "WPSFD49502"     
##  [9] "WPSID61"         "WPSID62"         "OILPRICEx"       "PPICMM"         
## [13] "CPIAUCSL"        "CPIAPPSL"        "CPITRNSL"        "CPIMEDSL"       
## [17] "CUSR0000SAC"     "CUSR0000SAD"     "CUSR0000SAS"     "CPIULFSL"       
## [21] "CUSR0000SA0L2"   "CUSR0000SA0L5"   "PCEPI"           "DDURRG3M086SBEA"
## [25] "DNDGRG3M086SBEA" "DSERRG3M086SBEA" "CES0600000008"   "CES2000000008"  
## [29] "CES3000000008"   "DTCOLNVHFNM"     "DTCTHFNM"        "INVEST"

You can see that there are many. Let's select OILPRICEx, which measures the oil price.

var.info[var.info$fred == "OILPRICEx",]
##      id tcode
## 110 110     6
##                                                                   ttype
## 110 Second difference of natural log: (ln(x)-ln(x-1))-(ln(x-1)-ln(x-2))
##          fred                        description         gsi   gsi:description
## 110 OILPRICEx Crude Oil, spliced WTI and Cushing M_110157273 Spot market price
##      group edited
## 110 Prices   TRUE
y <- log(data.md.ts$OILPRICEx)
dy <- diff(y)
d2y <- diff(dy)
dy <- dy[!is.na(dy)]  # remove NAs
d2y <- d2y[!is.na(d2y)]

Let's plot the three series.

plot(y)

plot(dy)

plot(d2y)

The first thing you notice when looking at these plots is that until 1980 oil prices moved in distinct discrete steps, mainly determined by OPEC. From 1980 onwards there are more continuous price changes. So, let's restrict the series to start in 1981.

y <- window(y, start = as.Date("1981-01-01"), end = as.Date("2024-09-01"))  # cut sample
dy <- diff(y)
d2y <- diff(dy)
dy <- dy[!is.na(dy)]  # remove NAs
d2y <- d2y[!is.na(d2y)]

Let's only look at the post 1980 oil prices

Just looking at the plots we would certainly entertain the possibility that log(OILPRICEx) could be I(1). Let's run the ADF tests.

adf.oil <- ur.df(y, type = "drift",  selectlags = "BIC")
summary(adf.oil)
## 
## ############################################### 
## # Augmented Dickey-Fuller Test Unit Root Test # 
## ############################################### 
## 
## Test regression drift 
## 
## 
## Call:
## lm(formula = z.diff ~ z.lag.1 + 1 + z.diff.lag)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.50843 -0.05175  0.00181  0.04958  0.69453 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.049946   0.023019   2.170   0.0305 *  
## z.lag.1     -0.013548   0.006255  -2.166   0.0308 *  
## z.diff.lag   0.283102   0.042150   6.717 4.89e-11 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.08922 on 520 degrees of freedom
## Multiple R-squared:  0.08407,    Adjusted R-squared:  0.08055 
## F-statistic: 23.86 on 2 and 520 DF,  p-value: 1.215e-10
## 
## 
## Value of test-statistic is: -2.1661 2.3673 
## 
## Critical values for test statistics: 
##       1pct  5pct 10pct
## tau2 -3.43 -2.86 -2.57
## phi1  6.43  4.59  3.78
adf.doil <- ur.df(dy, type = "drift",  selectlags = "BIC")
summary(adf.doil)
## 
## ############################################### 
## # Augmented Dickey-Fuller Test Unit Root Test # 
## ############################################### 
## 
## Test regression drift 
## 
## 
## Call:
## lm(formula = z.diff ~ z.lag.1 + 1 + z.diff.lag)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.51415 -0.05171 -0.00024  0.04863  0.65174 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.000945   0.003896   0.243  0.80844    
## z.lag.1     -0.815961   0.052502 -15.542  < 2e-16 ***
## z.diff.lag   0.126116   0.043593   2.893  0.00398 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.08899 on 519 degrees of freedom
## Multiple R-squared:  0.3719, Adjusted R-squared:  0.3695 
## F-statistic: 153.7 on 2 and 519 DF,  p-value: < 2.2e-16
## 
## 
## Value of test-statistic is: -15.5416 120.7723 
## 
## Critical values for test statistics: 
##       1pct  5pct 10pct
## tau2 -3.43 -2.86 -2.57
## phi1  6.43  4.59  3.78
adf.d2oil <- ur.df(d2y, type = "drift",  selectlags = "BIC")
summary(adf.d2oil)
## 
## ############################################### 
## # Augmented Dickey-Fuller Test Unit Root Test # 
## ############################################### 
## 
## Test regression drift 
## 
## 
## Call:
## lm(formula = z.diff ~ z.lag.1 + 1 + z.diff.lag)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.47654 -0.05619 -0.00483  0.05068  0.98676 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -0.0001996  0.0045266  -0.044    0.965    
## z.lag.1     -1.6494538  0.0674624 -24.450  < 2e-16 ***
## z.diff.lag   0.2864183  0.0421338   6.798 2.93e-11 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1033 on 518 degrees of freedom
## Multiple R-squared:  0.6704, Adjusted R-squared:  0.6692 
## F-statistic: 526.9 on 2 and 518 DF,  p-value: < 2.2e-16
## 
## 
## Value of test-statistic is: -24.45 298.9007 
## 
## Critical values for test statistics: 
##       1pct  5pct 10pct
## tau2 -3.43 -2.86 -2.57
## phi1  6.43  4.59  3.78

Judging from these results one would judge that diff(log(OILPRICEx)) is actually stationary and hence that differencing once would be sufficient to create a stationary series. Prices are often seen as I(2) series. This implies that inflation is then a non-stationary I(1) series. While the McCracken and Ng (2016) paper does not discuss their choice of transformation for achieving stationarity in detail, you can see from their Appendix Table "Group 7 - Prices" that almost all proce series are being treated as I(2) series.

Extension

Explore the stationarity properties of more series in the FRED-MD database and compare those against the transformation chosen in FRED-MD.

Reading

McCracken, M. and Ng, S. (2016) FRED-MD: A Monthly Database for Macroeconomic Research, Journal of Business & Economic Statistics, 34, 574-589.

This exercise is part of the ECLR page.