13 Panel Data Misspecification Tests

Please note that the information provided in this section (panel data misspecification tests) are only for those of you who are interested to explore further. You are not responsible from these tests in your 6036ECN exam.

In this section, we will continue with the examples from the previous section and test for Autocorrelation and Heteroscedasticity.

European Countries Gasoline Consumption data is obtained from Abay Mulatu 316ECN Applied Econometrics lecture material, Coventry University.

We will be using the all country sample of the gasoline demand data.

We start by loading the required libraries.

library(tidyverse)

── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.1     ✔ tibble    3.2.1
✔ lubridate 1.9.4     ✔ tidyr     1.3.1
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

library(Hmisc) # add labels to variables


Attaching package: 'Hmisc'

The following objects are masked from 'package:dplyr':

    src, summarize

The following objects are masked from 'package:base':

    format.pval, units

#library(ggplot2)
#library(dplyr) # for data manipulation
library(plm) # to estimate linear panel data models


Attaching package: 'plm'

The following objects are masked from 'package:dplyr':

    between, lag, lead

#library(fastDummies) # create dummies based on categorical (factor) variable

13.1 Testing for Autocorrelation

13.1.1 Task 1

Using the gasoline-demand-all-countries.csv data, estimate a one-way fixed effects gasoline demand model.

13.1.1.1 Guidance

This is a replication of what we have done in the previous section.

df <- read.csv("~/Desktop/R-workshops/assets/data/gasoline-demand-all-countries.csv", stringsAsFactors=TRUE)
#View(df)
# label variables
label(df$L_gas_cons_pcar) <- "Logarithm of gasoline consumption per car"
label(df$L_income_pc) <- "Logarithm of real income per capita"
label(df$L_gas_price) <- "Logarithm of real gasoline price per gallon"
label(df$L_cars_pc) <- "Logarithm of number of cars per capita"

We estimate a fixed effects model below:

# Panel Data Fixed Effects Model
fixed_1 <- plm(L_gas_cons_pcar ~ L_income_pc + L_gas_price + L_cars_pc, 
               data = df, index = c("country", "year"), 
               model = "within")
summary(fixed_1)

Oneway (individual) effect Within Model

Call:
plm(formula = L_gas_cons_pcar ~ L_income_pc + L_gas_price + L_cars_pc, 
    data = df, model = "within", index = c("country", "year"))

Balanced Panel: n = 18, T = 19, N = 342

Residuals:
     Min.   1st Qu.    Median   3rd Qu.      Max. 
-0.378774 -0.039758  0.004650  0.045412  0.362856 

Coefficients:
             Estimate Std. Error  t-value  Pr(>|t|)    
L_income_pc  0.662250   0.073386   9.0242 < 2.2e-16 ***
L_gas_price -0.321702   0.044099  -7.2950 2.355e-12 ***
L_cars_pc   -0.640483   0.029679 -21.5804 < 2.2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Total Sum of Squares:    17.061
Residual Sum of Squares: 2.7365
R-Squared:      0.8396
Adj. R-Squared: 0.82961
F-statistic: 560.093 on 3 and 321 DF, p-value: < 2.22e-16

13.1.2 Task 2

Test for the existence of Autocorrelation.

13.1.2.1 Guidance

Wooldridge Test for AR(1) Errors in FE Panel Models.

Could be used for short and long panels.
Could be used for fixed effects models only.
Tests for Autocorrelation of order 1.

pwartest(fixed_1)


    Wooldridge's test for serial correlation in FE panels

data:  fixed_1
F = 212, df1 = 1, df2 = 322, p-value < 2.2e-16
alternative hypothesis: serial correlation

According to the test results, there is autocorrelation of order 1.

Breusch-Godfrey Test for Panel Models

Suitable for long panels.
Allows to choose the order to test for.
It could be used for both the fixed effects and random effects models.

pbgtest(fixed_1)


    Breusch-Godfrey/Wooldridge test for serial correlation in panel models

data:  L_gas_cons_pcar ~ L_income_pc + L_gas_price + L_cars_pc
chisq = 185.08, df = 19, p-value < 2.2e-16
alternative hypothesis: serial correlation in idiosyncratic errors

There is autocorrelation of order 1.

Let’s say we want to test for Autocorrelation of order 3:

pbgtest(fixed_1, order = 3)


    Breusch-Godfrey/Wooldridge test for serial correlation in panel models

data:  L_gas_cons_pcar ~ L_income_pc + L_gas_price + L_cars_pc
chisq = 172.76, df = 3, p-value < 2.2e-16
alternative hypothesis: serial correlation in idiosyncratic errors

There is autocorrelation up to order 3.

13.1.3 Task 3

Estimate a random effects model

13.1.3.1 Guidance

We have this in the previous section. Here is a random effects version of the above fixed effects model.

# Panel Data Fixed Effects Model
random_1 <- plm(L_gas_cons_pcar ~ L_income_pc + L_gas_price + L_cars_pc, 
               data = df, index = c("country", "year"), 
               model = "random")
summary(random_1)

Oneway (individual) effect Random Effect Model 
   (Swamy-Arora's transformation)

Call:
plm(formula = L_gas_cons_pcar ~ L_income_pc + L_gas_price + L_cars_pc, 
    data = df, model = "random", index = c("country", "year"))

Balanced Panel: n = 18, T = 19, N = 342

Effects:
                   var  std.dev share
idiosyncratic 0.008525 0.092330 0.182
individual    0.038238 0.195545 0.818
theta: 0.8923

Residuals:
      Min.    1st Qu.     Median    3rd Qu.       Max. 
-0.3977058 -0.0520350  0.0050877  0.0582288  0.3763726 

Coefficients:
             Estimate Std. Error  z-value  Pr(>|z|)    
(Intercept)  1.996698   0.184326  10.8324 < 2.2e-16 ***
L_income_pc  0.554986   0.059128   9.3861 < 2.2e-16 ***
L_gas_price -0.420389   0.039978 -10.5155 < 2.2e-16 ***
L_cars_pc   -0.606840   0.025515 -23.7836 < 2.2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Total Sum of Squares:    18.054
Residual Sum of Squares: 3.0817
R-Squared:      0.82931
Adj. R-Squared: 0.8278
Chisq: 1642.2 on 3 DF, p-value: < 2.22e-16

13.1.4 Task 4

Test for autocorrelation in the above estimated random effects model.

13.1.4.1 Guidance

We can start by applying the Breusch-Godfrey test for panel models.

pbgtest(random_1)


    Breusch-Godfrey/Wooldridge test for serial correlation in panel models

data:  L_gas_cons_pcar ~ L_income_pc + L_gas_price + L_cars_pc
chisq = 193.42, df = 19, p-value < 2.2e-16
alternative hypothesis: serial correlation in idiosyncratic errors

There is autocorrelation of order 1.

Baltagi and Li Serial Dependence Test for Random Effect Models

The test can be used with random effects models only.

pbltest(random_1)


    Baltagi and Li two-sided LM test

data:  formula(x$formula)
chisq = 225.16, df = 1, p-value < 2.2e-16
alternative hypothesis: AR(1)/MA(1) errors in RE panel model

This test confirms the results of the previous one, in that, there is autocorrelation of order 1.

13.2 Heteroscedasticity

13.2.1 Task 5

Test for the existence of heteroscedasticity in each of the models estimated above.

13.2.1.1 Guidance

Lagrange FF Multiplier Tests for Panel Data

We will apply plmtest() with type = "bp".

plmtest(fixed_1, type = "bp")


    Lagrange Multiplier Test - (Breusch-Pagan)

data:  L_gas_cons_pcar ~ L_income_pc + L_gas_price + L_cars_pc
chisq = 1465.6, df = 1, p-value < 2.2e-16
alternative hypothesis: significant effects

plmtest(random_1, type = "bp")


    Lagrange Multiplier Test - (Breusch-Pagan)

data:  L_gas_cons_pcar ~ L_income_pc + L_gas_price + L_cars_pc
chisq = 1465.6, df = 1, p-value < 2.2e-16
alternative hypothesis: significant effects

There is heteroscedasticity.

13.2.2 Task 6

Report the fixed and random effects estimation results with autocorrelation and heteroscedasticity robust standard errors.

13.2.2.1 Guidance

Existence of autocorrelation and heteroscedasticity affects the standard error estimates. Although the coefficients will remain unbiased, the standard errors will be biased and hence all tests we perform on the model will be misleading. We may either try to resolve these issues though changing our model or estimation strategy or computer heteroskedasticity or autocorrelation adjusted standard errors.

Calculation of robust standard errors is common in Applied Econometrics. Since the models presented here are panel data estimation, we will be using plm package’s vcovHC function. Below is the information from R help about the arguments that this function can take:

vcovHC(
  x,
  method = c("arellano", "white1", "white2"),
  type = c("HC0", "sss", "HC1", "HC2", "HC3", "HC4"),
  cluster = c("group", "time"),
  ...
)

All methods assume no intragroup (serial) correlation between errors and allow for heteroskedasticity across groups (time periods).
white1 allows for general heteroskedasticity but no serial (cross–sectional) correlation;
white2 is white1 restricted to a common variance inside every group (time period)
arellano allows a fully general structure w.r.t. heteroskedasticity and serial (cross–sectional) correlation.

Hence, we will be using arellano here.

The type argument in vcovHC() allows you to specify different weighting schemes for the robust standard errors. The most common options include:

"HC0": The classic White (1980) estimator. It does not account for small sample bias.
"HC1": A finite-sample correction used in Eicker-Huber-White heteroskedasticity-consistent covariance matrix estimators.
"HC2": Corrects for leverage by dividing residuals by (\(1−h_{ii}\)), where \(h_{ii}\) is the leverage of observation \(i\).
"HC3": A further adjustment for leverage, dividing residuals by \((1−h_{ii})^2\), making it more robust in small samples.
"HC4": A more extreme correction for leverage, giving higher influence to high-leverage points.

R’s default:

summary(fixed_1, vcov = vcovHC)

Oneway (individual) effect Within Model

Note: Coefficient variance-covariance matrix supplied: vcovHC

Call:
plm(formula = L_gas_cons_pcar ~ L_income_pc + L_gas_price + L_cars_pc, 
    data = df, model = "within", index = c("country", "year"))

Balanced Panel: n = 18, T = 19, N = 342

Residuals:
     Min.   1st Qu.    Median   3rd Qu.      Max. 
-0.378774 -0.039758  0.004650  0.045412  0.362856 

Coefficients:
             Estimate Std. Error t-value  Pr(>|t|)    
L_income_pc  0.662250   0.153279  4.3205 2.078e-05 ***
L_gas_price -0.321702   0.122275 -2.6310  0.008925 ** 
L_cars_pc   -0.640483   0.096654 -6.6266 1.451e-10 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Total Sum of Squares:    17.061
Residual Sum of Squares: 2.7365
R-Squared:      0.8396
Adj. R-Squared: 0.82961
F-statistic: 17.005 on 3 and 17 DF, p-value: 2.3028e-05

Use of method = "arellano" with type = "HC1":

summary(fixed_1, function(x) vcovHC(x, method = "arellano", type = "HC1"))

Oneway (individual) effect Within Model

Note: Coefficient variance-covariance matrix supplied: function(x) vcovHC(x, method = "arellano", type = "HC1")

Call:
plm(formula = L_gas_cons_pcar ~ L_income_pc + L_gas_price + L_cars_pc, 
    data = df, model = "within", index = c("country", "year"))

Balanced Panel: n = 18, T = 19, N = 342

Residuals:
     Min.   1st Qu.    Median   3rd Qu.      Max. 
-0.378774 -0.039758  0.004650  0.045412  0.362856 

Coefficients:
            Estimate Std. Error t-value  Pr(>|t|)    
L_income_pc  0.66225    0.15396  4.3016 2.254e-05 ***
L_gas_price -0.32170    0.12281 -2.6194  0.009227 ** 
L_cars_pc   -0.64048    0.09708 -6.5975 1.726e-10 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Total Sum of Squares:    17.061
Residual Sum of Squares: 2.7365
R-Squared:      0.8396
Adj. R-Squared: 0.82961
F-statistic: 16.8559 on 3 and 17 DF, p-value: 2.4334e-05