Title: | Arellano-Bond LASSO Estimator for Dynamic Linear Panel Models |
Version: | 1.1 |
Maintainer: | Junyu Chen <junyu.chen@outlook.de> |
Description: | Implements the Arellano-Bond estimation method combined with LASSO for dynamic linear panel models. See Chernozhukov et al. (2024) "Arellano-Bond LASSO Estimator for Dynamic Linear Panel Models". arXiv preprint <doi:10.48550/arXiv.2402.00584>. |
License: | GPL (≥ 3) |
Encoding: | UTF-8 |
RoxygenNote: | 7.3.1 |
Imports: | hdm, matrixStats, mvtnorm, stats |
Depends: | R (≥ 2.10) |
LazyData: | true |
NeedsCompilation: | no |
Packaged: | 2025-02-02 12:25:17 UTC; junyuchen |
Author: | Victor Chernozhukov [aut], Ivan Fernandez-Val [aut], Chen Huang [aut], Weining Wang [aut], Junyu Chen [cre] |
Repository: | CRAN |
Date/Publication: | 2025-02-02 12:50:07 UTC |
AB-LASSO Estimator with Random Sample Splitting for Multivariate Models
Description
Implements the AB-LASSO estimation method for the multivariate model
Y_{it} = \alpha_{i} + \gamma_{t} + \sum_{j=1}^{L} \beta_{j} Y_{i,t-j} + \theta_{0} D_{it} + \theta_{1} C_{i,t-1} + \varepsilon_{it},
with random sample splitting. Note that D_{it} and C_{it} are predetermined with respect to \varepsilon_{it}.
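The lag terms Y_{i,t-j} in the model can be built from a P x N outcome matrix by shifting rows. The following is a minimal base-R sketch, not the package's internal code: `lag_matrix` is a hypothetical helper introduced only for illustration, and it assumes that rows index time periods and columns index individuals, as in the examples of this manual.

```r
# Hypothetical helper (not part of the package): shift a P x N matrix down
# by j rows so that row t holds the values from period t - j.
lag_matrix <- function(Y, j) {
  stopifnot(is.matrix(Y), j >= 1, j < nrow(Y))
  P <- nrow(Y)
  N <- ncol(Y)
  # The first j periods have no lagged value, so they are filled with NA.
  rbind(matrix(NA_real_, nrow = j, ncol = N),
        Y[seq_len(P - j), , drop = FALSE])
}

# Toy panel: 5 periods (rows) and 2 individuals (columns).
Y <- matrix(c(1, 2, 3, 4, 5,
              10, 20, 30, 40, 50), nrow = 5, ncol = 2)

# Lags 1..L of Y, one matrix per lag order.
L <- 2
Y_lags <- lapply(seq_len(L), function(j) lag_matrix(Y, j))
Y_lags[[2]]
```

Rows that contain NA correspond to periods for which not all L lags are observed, so an estimator would have to drop them.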
Usage
ablasso_mv_ss(Y, D, C, lag = 1, Kf = 2, nboot = 100, seed = 202302)
Arguments
Y |
A |
D |
A |
C |
A list of |
lag |
The lag order of |
Kf |
The number of folds for K-fold cross-validation, with options being |
nboot |
The number of random sample splits, default is 100. |
seed |
Seed for random number generation, default 202302. |
Value
A data frame that includes the estimated coefficients (\beta_{j}, \theta_{0}, \theta_{1}), their standard errors, and t-statistics.
Examples
# Use the COVID-19 data
N <- length(unique(covid_data$fips))  # number of counties
P <- length(unique(covid_data$week))  # number of weeks
# matrix() fills column by column, so this reshaping assumes that the rows
# are sorted by county and, within each county, by week
Y <- matrix(covid_data$logdc, nrow = P, ncol = N)
D <- matrix(covid_data$dlogtests, nrow = P, ncol = N)
# Additional covariates, one P x N matrix per variable
C <- list()
C[[1]] <- matrix(covid_data$school, nrow = P, ncol = N)
C[[2]] <- matrix(covid_data$college, nrow = P, ncol = N)
C[[3]] <- matrix(covid_data$pmask, nrow = P, ncol = N)
C[[4]] <- matrix(covid_data$pshelter, nrow = P, ncol = N)
C[[5]] <- matrix(covid_data$pgather50, nrow = P, ncol = N)
# Default number of folds (Kf = 2) with only two random splits
results.kf2 <- ablasso_mv_ss(Y = Y, D = D, C = C, lag = 4, nboot = 2)
print(results.kf2)
# Same model with five folds
results.kf5 <- ablasso_mv_ss(Y = Y, D = D, C = C, lag = 4, Kf = 5, nboot = 2)
print(results.kf5)
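Reshaping a long-format panel with `matrix()` is only correct when the rows are ordered by individual and then by time, because the matrix is filled column by column. A defensive sketch that sorts first is given below. `long_to_matrix` is a hypothetical helper, not a package function, and the toy data frame merely stands in for `covid_data`.

```r
# Hypothetical helper: turn one column of a balanced long-format panel into a
# P x N matrix (rows = time periods, columns = individuals).
long_to_matrix <- function(df, value, id = "fips", time = "week") {
  # Sort by individual, then by time, so that column-major filling is valid.
  df <- df[order(df[[id]], df[[time]]), ]
  ids <- sort(unique(df[[id]]))
  times <- sort(unique(df[[time]]))
  N <- length(ids)
  P <- length(times)
  stopifnot(nrow(df) == N * P)  # a balanced panel has exactly N * P rows
  matrix(df[[value]], nrow = P, ncol = N,
         dimnames = list(as.character(times), as.character(ids)))
}

# Toy balanced panel with deliberately shuffled rows.
toy <- data.frame(
  fips  = c(2, 1, 2, 1, 1, 2),
  week  = c(1, 3, 3, 1, 2, 2),
  logdc = c(21, 13, 23, 11, 12, 22)
)
Y_toy <- long_to_matrix(toy, "logdc")
Y_toy
```

The row and column names make it easy to confirm by eye that each column holds one individual's time series.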
AB-LASSO Estimator Without Sample Splitting
Description
Implements the AB-LASSO estimation method for the univariate model Y_{it} = \alpha_{i} + \gamma_{t} + \theta_{1} Y_{i,t-1} + \theta_{2} D_{it} + \varepsilon_{it}, without sample splitting. Note that D_{it} is predetermined with respect to \varepsilon_{it}.
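The Arellano-Bond approach that this estimator builds on removes the individual effect \alpha_{i} by first-differencing and then instruments the differenced regressors with lagged levels. The base-R sketch below illustrates that idea with a simple just-identified instrumental-variable estimator in the style of Anderson and Hsiao. It is a deliberately simpler stand-in, not the AB-LASSO procedure and not the package's code. The simulated data, the variable names, and the choice of instruments (Y_{i,t-2} and D_{i,t-1}) are assumptions made for illustration.

```r
set.seed(1)
N <- 500             # individuals
P <- 12              # time periods
theta <- c(0.5, 1)   # assumed true values of (theta_1, theta_2)

# Simulate a small panel (rows = periods, columns = individuals).
alpha <- rnorm(N)                    # individual effects
gamma <- rnorm(P)                    # time effects
eps <- matrix(rnorm(P * N), P, N)    # outcome errors
Y <- D <- matrix(0, P, N)
D[1, ] <- rnorm(N)
Y[1, ] <- alpha + gamma[1] + theta[2] * D[1, ] + eps[1, ]
for (tt in 2:P) {
  # D reacts to the previous outcome error, so it is predetermined
  # but not strictly exogenous (an assumed mechanism).
  D[tt, ] <- 0.5 * D[tt - 1, ] + 0.5 * eps[tt - 1, ] + rnorm(N)
  Y[tt, ] <- alpha + gamma[tt] + theta[1] * Y[tt - 1, ] +
    theta[2] * D[tt, ] + eps[tt, ]
}

# First differences remove alpha_i; rows correspond to periods 3..P.
dY  <- Y[3:P, ] - Y[2:(P - 1), ]        # Delta Y_{it}
dYl <- Y[2:(P - 1), ] - Y[1:(P - 2), ]  # Delta Y_{i,t-1}
dD  <- D[3:P, ] - D[2:(P - 1), ]        # Delta D_{it}
zY  <- Y[1:(P - 2), ]                   # instrument Y_{i,t-2}
zD  <- D[2:(P - 1), ]                   # instrument D_{i,t-1}

# Subtracting each period's cross-sectional mean removes the differenced
# time effects.
demean <- function(M) M - rowMeans(M)

y <- as.vector(demean(dY))
X <- cbind(as.vector(demean(dYl)), as.vector(demean(dD)))
Z <- cbind(as.vector(demean(zY)), as.vector(demean(zD)))

# Just-identified IV estimator: solve (Z'X) theta = Z'y.
theta_iv <- solve(crossprod(Z, X), crossprod(Z, y))
theta_iv
```

Arellano-Bond estimators use many more lagged levels as instruments than this sketch does, which is where a selection step such as LASSO becomes relevant.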
Usage
ablasso_uv(Y, D)
Arguments
Y |
A |
D |
A |
Value
A list with three elements:
theta.hat: Estimated coefficients.
std.hat: Estimated standard errors.
stat: t-statistics.
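A t-statistic for the null hypothesis that a coefficient is zero is conventionally the estimate divided by its standard error, and the three elements are presumably related in that way. The sketch below reproduces the arithmetic on made-up numbers. The values are hypothetical, and the two-sided p-values rely on a standard normal approximation, which is an assumption rather than something stated in this manual.

```r
# Hypothetical estimates and standard errors (not output from the package).
theta.hat <- c(theta1 = 0.80, theta2 = 1.00)
std.hat   <- c(theta1 = 0.04, theta2 = 0.25)

# t-statistic for the null hypothesis that a coefficient equals zero.
stat <- theta.hat / std.hat

# Two-sided p-values under a standard normal approximation (an assumption).
p.value <- 2 * pnorm(-abs(stat))

data.frame(theta.hat, std.hat, stat, p.value)
```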
Examples
# Generate data
data1 <- generate_data(N = 300, P = 40)
# You can use your own data by providing matrices `Y` and `D`
results <- ablasso_uv(Y = data1$Y, D = data1$D)
print(results)
AB-LASSO Estimator with Random Sample Splitting
Description
Implements the AB-LASSO estimation method for the univariate model Y_{it} = \alpha_{i} + \gamma_{t} + \theta_{1} Y_{i,t-1} + \theta_{2} D_{it} + \varepsilon_{it}, incorporating random sample splitting. Note that D_{it} is predetermined with respect to \varepsilon_{it}.
Usage
ablasso_uv_ss(Y, D, nboot = 100, Kf = 2, seed = 202304)
Arguments
Y |
A |
D |
A |
nboot |
The number of random sample splits, default is 100. |
Kf |
The number of folds for K-fold cross-validation, with options being |
seed |
Seed for random number generation, default 202304. |
Value
A list with three elements:
theta.hat: Estimated coefficients.
std.hat: Estimated standard errors.
stat: t-statistics.
Examples
# Generate data
data1 <- generate_data(N = 300, P = 40)
# You can use your own data by providing matrices `Y` and `D`
results.ss <- ablasso_uv_ss(Y = data1$Y, D = data1$D, nboot = 2)
print(results.ss)
results.ss2 <- ablasso_uv_ss(Y = data1$Y, D = data1$D, nboot = 2, Kf = 5)
print(results.ss2)
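The arguments `Kf`, `nboot`, and `seed` describe repeated random partitions of the N individuals into folds. The sketch below shows one way such partitions could be drawn reproducibly in base R. `draw_folds` is a hypothetical helper, and how the package itself forms the splits or combines the resulting estimates is not specified in this manual.

```r
# Hypothetical helper: assign each of N individuals to one of Kf folds at
# random, with fold sizes differing by at most one.
draw_folds <- function(N, Kf) {
  sample(rep(seq_len(Kf), length.out = N))
}

N <- 10      # number of individuals (columns of Y and D)
Kf <- 2      # number of folds
nboot <- 3   # number of random splits

set.seed(202304)  # fixing the seed makes the splits reproducible
# One column of fold labels per random split (an N x nboot matrix).
splits <- replicate(nboot, draw_folds(N, Kf))

# Individuals in fold 1 of the first split; the remaining ones form fold 2.
which(splits[, 1] == 1)
```

Splitting by individual rather than by observation keeps each unit's whole time series in a single fold.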
COVID-19 Spread and School Policy Effects Data
Description
A balanced panel data set for analyzing the impact of K-12 school openings and other policy measures on the spread of COVID-19 across U.S. counties. The data span 32 weeks from April 1st to December 2nd, 2020, and cover 2510 counties.
Usage
covid_data
Format
A data frame with 80320 (2510 counties times 32 weeks) rows and 9 columns. Each column represents a variable:
- fips
County FIPS
- week
Week
- school
A measure of visits to K-12 schools from SafeGraph foot traffic data
- logdc
Logarithm of the number of reported COVID-19 cases
- pmask
Policy indicators on mask mandates
- pgather50
Policy indicators on ban on gatherings of more than 50 persons
- college
Measure of visits to colleges
- pshelter
Policy indicators on stay-at-home orders
- dlogtests
A measure of the weekly growth rate in the number of tests
Source
Data initially provided by Victor Chernozhukov, Hiroyuki Kasahara, and Paul Schrimpf on the GitHub repository https://github.com/ubcecon/covid-schools. Counties with missing values are dropped to obtain a balanced panel dataset.
Examples
data(covid_data) # Access the dataset
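A panel is balanced when every county appears in every week exactly once. The check below is a base-R sketch on a toy data frame that mimics the `fips` and `week` columns. `is_balanced` is a hypothetical helper, not a package function.

```r
# Hypothetical helper: TRUE if every (id, time) pair occurs exactly once.
is_balanced <- function(df, id = "fips", time = "week") {
  counts <- table(df[[id]], df[[time]])
  all(counts == 1)
}

# Toy panel: 3 counties observed in each of 4 weeks.
toy <- expand.grid(fips = c(1001, 1003, 1005), week = 1:4)

is_balanced(toy)        # every county-week pair occurs once
is_balanced(toy[-1, ])  # dropping one row breaks the balance
```

Applied to the real data, the same call would be `is_balanced(covid_data)`.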
Generate a Dataset for Simulations
Description
Generates data according to the following process:
Y_{it} = \alpha_{i} + \gamma_{t} + \theta_{1} Y_{i,t-1} + \theta_{2} D_{it} + \varepsilon_{it}
and
D_{it} = \rho D_{i,t-1} + v_{it}.
Note that D_{it} is predetermined with respect to \varepsilon_{it}.
Usage
generate_data(
N,
P,
sigma_alpha = 1,
sigma_gamma = 1,
sigma_eps.d = 1,
sigma_eps.y = 1,
cov_eps = 0.5,
rho = 0.5,
theta = c(0.8, 1),
seed = 202304
)
Arguments
N |
An integer specifying the number of individuals. |
P |
An integer specifying the number of time periods. |
sigma_alpha |
Standard deviation for the normal distribution from which the individual effect |
sigma_gamma |
Standard deviation for the normal distribution from which the time effect |
sigma_eps.d |
Standard deviation for the error term associated with the policy variable/treatment ( |
sigma_eps.y |
Standard deviation for the error term associated with the outcome/response variable ( |
cov_eps |
Covariance between error terms of |
rho |
Autocorrelation coefficient for |
theta |
Regression coefficients for the univariate AR(1) dynamic panel model, default c(0.8, 1). |
seed |
Seed for random number generation, default 202304. |
Value
A list of two P x N matrices named Y (outcome/response variable) and D (policy variable/treatment).
Examples
# Generate data using default parameters
data1 <- generate_data(N = 300, P = 40)
str(data1)
data2 <- generate_data(N = 500, P = 20)
str(data2)