Econometrics Tools¶
OLS¶
To estimate an OLS regression, you pass the reg() function at least three arguments:
- The DataFrame that contains the data.
- The name of the dependent variable as a string.
- The name(s) of the independent variable(s) as a string (for one variable) or as a list.
Following these arguments, there are a number of keyword arguments for various other options. For example, the following code estimates a basic wage regression with state-level clustering and fixed effects, weighting by the variable sample_wt.
import pandas as pd
import econtools.metrics as mt
# Load a data file with columns 'wage', 'educ', 'age', 'male', 'state', and 'sample_wt'
df = pd.read_csv('my_data.csv')
y = 'wage'
X = ['educ', 'age', 'male']
fe_var = 'state'
cluster_var = 'state'
weights_var = 'sample_wt'
results = mt.reg(
    df,                     # DataFrame
    y,                      # Dependent var (string)
    X,                      # Independent var(s) (string or list of strings)
    fe_name=fe_var,         # Fixed-effects/absorb var (string)
    cluster=cluster_var,    # Cluster var (string)
    awt_name=weights_var    # Sample weights
)
Note that reg() does not automatically estimate a constant term. In order to have a constant/intercept in your model, you can (a) add a column of ones to your DataFrame, or (b) use the addcons keyword arg:
results = mt.reg(
    df,
    y,
    X,              # Does not include a constant/intercept
    addcons=True    # Adds a constant term
)
Instrumental Variables¶
Estimating an instrumental variables model is very similar, but is done using the ivreg() function. The order of arguments is also slightly different in order to differentiate between the instruments, endogenous regressors, and exogenous regressors. Other keyword options, such as addcons, cluster, and so forth, are exactly the same as with reg().
One additional keyword argument is method, which sets the IV method used to estimate the model. Currently supported values are '2sls' (the default) and 'liml'.
# <Imports and loading data>
y = 'wage'           # Dependent var
X = ['educ']         # Endogenous regressor(s)
Z = ['treatment']    # Instrumental variable(s)
W = ['age', 'male']  # Exogenous regressor(s)

results = mt.ivreg(df, y, X, Z, W)
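For example, the method keyword described above can be used to request LIML instead of the default 2SLS:
# Same model as above, estimated by LIML rather than 2SLS
results_liml = mt.ivreg(df, y, X, Z, W, method='liml')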
Returned Results¶
The regression functions reg() and ivreg() return a custom Results object that contains beta estimates, the variance-covariance matrix, and other relevant info.
The easiest way to see regression results is the summary attribute, but direct access to estimates is also possible.
import pandas as pd
import econtools.metrics as mt
df = pd.read_stata('some_data.dta')
results = mt.reg(df, 'ln_wage', ['educ', 'age'], addcons=True)
# Print a nice summary of the regression results (a string)
print(results)
# Print DataFrame w/ betas, se's, t-stats, etc.
print(results.summary)
# Print only betas
print(results.beta)
# Print std. err. for `educ` coefficient
print(results.se['educ'])
# Print full variance-covariance matrix
print(results.vce)
The full list of attributes is listed in the API reference.
F tests¶
econtools.metrics contains two functions for conducting F tests. The first, Ftest(), is for simple, Stata-like tests for joint significance or equality. It is a method on the Results object.
results = mt.reg(df, 'ln_wage', ['educ', 'age'], addcons=True)
# Test for joint significance
F1, pF1 = results.Ftest(['educ', 'age'])
# Test for equality
F2, pF2 = results.Ftest(['educ', 'age'], equal=True)
The second, f_test(), is for F tests of arbitrary linear combinations of coefficients. The tests are defined by an R matrix and an r vector such that the null hypothesis is \(R\beta = r\).
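As a point of reference, the following is a minimal numpy sketch of the statistic such a test evaluates, using the beta and vce attributes shown above and the regression from the previous example; see the API reference for f_test()'s exact call signature. The hypothesis and the coefficient ordering here are illustrative assumptions.
import numpy as np

# Illustrative null hypothesis: the 'educ' and 'age' coefficients sum to one,
# i.e. R @ beta = r with one restriction. The column order ('educ', 'age',
# constant) is an assumption; match it to results.beta in practice.
R = np.array([[1.0, 1.0, 0.0]])
r = np.array([1.0])

beta = np.asarray(results.beta)  # Point estimates
V = np.asarray(results.vce)      # Variance-covariance matrix

# Wald-style F statistic: (Rb - r)' [R V R']^{-1} (Rb - r) / (# restrictions)
diff = R @ beta - r
F = float(diff @ np.linalg.solve(R @ V @ R.T, diff)) / R.shape[0]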
Other Estimation Options¶
Save memory by not computing predicted values¶
The save_mem flag can be used to reduce the memory footprint of the Results object by not saving predicted values for the dependent variable (yhat) and the residuals (resid), as well as the sample flag (sample). Since these vectors are always size N (or bigger for sample), setting save_mem=True can be very useful when running many regressions on large samples.
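A minimal sketch, assuming save_mem is passed directly to reg() like the other keyword options above:
# Skip storing yhat, resid, and sample to keep the Results object small
results = mt.reg(df, 'ln_wage', ['educ', 'age'], addcons=True, save_mem=True)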
Check for colinear columns¶
The check_colinear flag can be used to check whether the list of regressors contains any colinear variables. More technically, when check_colinear is True, the regression function checks whether the regressor matrix X is full rank. If it is not full rank, it figures out which columns are colinear and prints the names of those columns to the screen. It does not automatically drop colinear columns.
Because these checks can be computationally expensive, check_colinear defaults to False.
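A minimal sketch, again assuming check_colinear is passed directly to reg():
# Report (but do not drop) any colinear columns among the regressors
results = mt.reg(df, 'ln_wage', ['educ', 'age'], addcons=True,
                 check_colinear=True)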
Spatial HAC (Conley errors)¶
Spatial HAC standard errors (as in Conley (1999), Kelejian and Prucha (2007), etc.) can be calculated by passing a dictionary with the relevant fields to the shac keyword:
shac_params = {
    'x': 'longitude',   # Column in `df`
    'y': 'latitude',    # Column in `df`
    'kern': 'unif',     # Kernel name
    'band': 2,          # Kernel bandwidth
}

df = pd.read_stata('reg_data.dta')
results = mt.reg(df, 'lnp', ['sqft', 'rooms'],
                 fe_name='state',
                 shac=shac_params)
Important
The band parameter is assumed to be in the same units as x and y. If x and y are degrees latitude/longitude, band should also be in degrees. econtools does not do any advanced geographic distance calculations here, just simple Euclidean distance.