Function Signatures¶
Data I/O¶
-
econtools.
load_or_build
(raw_filepath: str, copydta: bool = False, path_args: list = []) → Callable¶ Loads raw_filepath as a DataFrame if it exists, otherwise builds the data and saves it to raw_filepath.
Parameters: raw_filepath (str) – Path to saved DataFrame. If raw_filepath includes named replacement fields (e.g., '{arg_name}') with the same name as function arguments, passed values will be inserted into the file path.
Example
    @load_or_build('data_for_{year}.pkl')
    def make_data(year):
        <Make the data>

    df = make_data(2018)  # Saves to 'data_for_2018.pkl'
Keyword Arguments: - copydta (bool) – Defaults to False. If True, save a copy of the data in Stata DTA format if raw_filepath is not already a DTA file.
- path_args (list-like) – DEPRECATED: Use named replacement fields instead. A list of ints or strs that point to args or kwargs of the build function, respectively. The values of these arguments will then be used to format raw_filepath.
Example
    import pandas as pd
    from econtools import load_or_build

    @load_or_build('file_{}_{}.csv', path_args=[0, 'b'])
    def foo(a, b=None):
        return pd.DataFrame([a, b])

    if __name__ == '__main__':
        # Saves `df` to `file_infix_suffix.csv`
        foo('infix', 'suffix')
Other Parameters: These are additional kwargs that can be passed to the wrapped function and that affect the behavior of load_or_build.
- _rebuild (bool) – Defaults to False. If True, build the DataFrame and save it to raw_filepath even if that file already exists.
- _load (bool) – Defaults to True. If True, try loading the data before building it. If False, the building function is called and the result returned with no data written to disk.
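The load-or-build pattern described above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the econtools implementation: `load_or_build_sketch` is a hypothetical name, it pickles arbitrary Python objects instead of DataFrames, and it handles only the named-replacement-field form of the path.

```python
import functools
import inspect
import os
import pickle
import tempfile


def load_or_build_sketch(raw_filepath):
    """Hypothetical stand-in for a load-or-build decorator (stdlib only)."""
    def decorator(build_func):
        sig = inspect.signature(build_func)

        @functools.wraps(build_func)
        def wrapper(*args, _rebuild=False, _load=True, **kwargs):
            # Fill named replacement fields from the call's arguments.
            bound = sig.bind(*args, **kwargs)
            bound.apply_defaults()
            path = raw_filepath.format(**bound.arguments)
            # Load the saved copy unless a rebuild was requested.
            if _load and not _rebuild and os.path.exists(path):
                with open(path, "rb") as f:
                    return pickle.load(f)
            result = build_func(*args, **kwargs)
            # Assumption: with _load=False nothing is written to disk.
            if _load:
                with open(path, "wb") as f:
                    pickle.dump(result, f)
            return result
        return wrapper
    return decorator


build_calls = []
tmp_dir = tempfile.mkdtemp()


@load_or_build_sketch(os.path.join(tmp_dir, "data_for_{year}.pkl"))
def make_data(year):
    build_calls.append(year)  # record each real build
    return {"year": year, "rows": [1, 2, 3]}


first = make_data(2018)   # builds, then saves data_for_2018.pkl
second = make_data(2018)  # loads the saved copy; make_data's body is skipped
```

Passing `_rebuild=True` to `make_data` in this sketch forces the build step again and overwrites the saved file.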
-
econtools.
save_cli
() → bool¶ Add CLI boolean flag --save.
Returns: True if --save was entered on command line, else False.
Return type: bool
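A boolean command-line flag of this kind can be sketched with `argparse`. The function name `save_flag` and the `argv` parameter are hypothetical and exist only so the sketch can be run without a real command line; how econtools parses the flag internally is not shown in this reference.

```python
import argparse


def save_flag(argv=None):
    """Return True if --save appears in argv (defaults to sys.argv[1:])."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--save", action="store_true",
                        help="write results to disk")
    # parse_known_args ignores any other flags the script may define.
    args, _unknown = parser.parse_known_args(argv)
    return args.save


print(save_flag(["--save"]))  # True
print(save_flag([]))          # False
```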
-
econtools.
confirmer
(prompt_str: str, default_no: bool = True) → bool¶ Prompt user for yes/no answer.
Parameters: - prompt_str (str) – Prompt to show user.
- default_no (bool) – Defaults to True. If True, the default response is ‘No’.
Returns: True if user responded ‘Yes’, else False.
Return type: bool
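The yes/no logic with a default answer might look like the sketch below. `confirm_sketch` and its `ask` parameter are hypothetical; `ask` defaults to the built-in `input` and is injectable so the example can run non-interactively. The prompt suffixes are an assumption, not the wording econtools uses.

```python
def confirm_sketch(prompt_str, default_no=True, ask=input):
    """Ask a yes/no question; an empty reply selects the default."""
    suffix = " [y/N] " if default_no else " [Y/n] "
    response = ask(prompt_str + suffix).strip().lower()
    if response == "":
        return not default_no
    return response in ("y", "yes")


# Non-interactive usage: a lambda stands in for the user's reply.
overwrite = confirm_sketch("Overwrite file?", ask=lambda prompt: "yes")
kept_default = confirm_sketch("Overwrite file?", ask=lambda prompt: "")
```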
-
econtools.
read
(path: str, **kwargs) → pandas.core.frame.DataFrame¶ Read file to DataFrame by file’s extension.
Parameters: - path (str) – Path to read the file from. Supported file suffixes are: csv, pkl (pickle), hdf (HDF5), dta (Stata).
- **kwargs – Arbitrary keyword arguments to pass to the pandas read method.
Returns: The contents of the file as a DataFrame.
Return type: DataFrame
-
econtools.
write
(df: pandas.core.frame.DataFrame, path: str, **kwargs) → None¶ Write DataFrame to file by file’s extension.
Parameters: - df (DataFrame) – DataFrame to write to disk.
- path (str) – Path to write the file to. Supported file suffixes are: csv, pkl (pickle), hdf (HDF5), dta (Stata).
- **kwargs – Arbitrary keyword arguments to pass to the pandas write method.
Return type: None
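Dispatching on the file suffix, as read and write are described to do, can be sketched as a lookup table over the corresponding pandas functions. `read_sketch`, `write_sketch`, and the two tables are hypothetical names; the pandas readers and writers themselves are real, but whether econtools wires them up this way is an assumption.

```python
import os
import tempfile

import pandas as pd

# Suffix -> pandas reader function / DataFrame writer method name.
READERS = {".csv": pd.read_csv, ".pkl": pd.read_pickle,
           ".hdf": pd.read_hdf, ".dta": pd.read_stata}
WRITERS = {".csv": "to_csv", ".pkl": "to_pickle",
           ".hdf": "to_hdf", ".dta": "to_stata"}


def _suffix(path):
    suffix = os.path.splitext(path)[1].lower()
    if suffix not in READERS:
        raise ValueError(f"Unsupported file suffix: {suffix!r}")
    return suffix


def read_sketch(path, **kwargs):
    return READERS[_suffix(path)](path, **kwargs)


def write_sketch(df, path, **kwargs):
    getattr(df, WRITERS[_suffix(path)])(path, **kwargs)


df = pd.DataFrame({"a": [1, 2], "b": [3.5, 4.5]})
csv_path = os.path.join(tempfile.mkdtemp(), "demo.csv")
write_sketch(df, csv_path, index=False)  # kwargs go to DataFrame.to_csv
df_back = read_sketch(csv_path)          # dispatches to pandas.read_csv
```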
Data Manipulation¶
-
econtools.
stata_merge
(left: pandas.core.frame.DataFrame, right: pandas.core.frame.DataFrame, assertval: Union[int, NoneType] = None, gen: str = '_m', **kwargs) → pandas.core.frame.DataFrame¶ Merge two DataFrames via pandas.merge but with some additional features. Specifically, an additional column is added to the returned DataFrame with the default label '_m'. For each row of the returned DataFrame, '_m' equals 1 if the row existed only in left, 2 if the row existed only in right, and 3 if it exists in both, i.e., was successfully merged.
Parameters: - left (DataFrame) – Left DataFrame to merge.
- right (DataFrame) – Right DataFrame to merge.
Keyword Arguments: - assertval (int) – Assert that all values of '_m' are equal to assertval. Under the default (None), no assertion is made.
- gen (str) – Name of the merge status variable. Default is '_m'.
- kwargs – Any standard keyword arg for pandas.merge, such as on or how.
Returns: A DataFrame that is the merged output of left and right.
Return type: pandas.DataFrame
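The 1/2/3 merge-status coding can be reproduced with plain pandas by using the `indicator` option of `pandas.merge` and recoding its labels. This is a sketch of the idea, not the econtools code; the sample frames and the `codes` mapping are made up for illustration.

```python
import pandas as pd

left = pd.DataFrame({"id": [1, 2, 3], "x": [10, 20, 30]})
right = pd.DataFrame({"id": [2, 3, 4], "y": [200, 300, 400]})

# indicator=True adds a '_merge' column: left_only / right_only / both.
merged = pd.merge(left, right, on="id", how="outer", indicator=True)

# Recode to the Stata-style 1 / 2 / 3 convention described above.
codes = {"left_only": 1, "right_only": 2, "both": 3}
merged["_m"] = merged["_merge"].astype(str).map(codes)
merged = merged.drop(columns="_merge").sort_values("id")

# An assertval-style check would assert (merged["_m"] == 3).all(),
# which fails here because ids 1 and 4 appear on one side only.
```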
-
econtools.
group_id
(df: pandas.core.frame.DataFrame, cols: Union[list, NoneType] = None, name: str = 'group_id', merge: bool = False) → pandas.core.frame.DataFrame¶ Generate a unique integer ID from a DataFrame or columns of the DataFrame. Specifically, create a unique number for every combination of values in the chosen columns.
Parameters: df (DataFrame) – DataFrame of interest.
Keyword Arguments: - cols (list) – List of columns to use for ID generation. Default (None) uses all columns in df.
- name (str) – Name of the new ID variable. Default is 'group_id'.
- merge (bool) – Return the full input DataFrame df with the new group ID column merged on. Default is False.
Returns: A DataFrame with the new group ID and cols if merge=False, or if merge=True, the input DataFrame with group ID merged on as a new column.
Return type: pandas.DataFrame
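One way to obtain such an ID in plain pandas is `groupby(...).ngroup()`, which numbers each distinct combination of the grouping columns. The sketch below shows the idea on made-up data; the specific integers econtools assigns may differ.

```python
import pandas as pd

df = pd.DataFrame({
    "state": ["CA", "CA", "NY", "NY", "CA"],
    "year": [2000, 2001, 2000, 2000, 2000],
})

cols = ["state", "year"]
# Rows sharing the same (state, year) pair receive the same integer.
df["group_id"] = df.groupby(cols).ngroup()

# Analogue of merge=False: one row per combination plus its ID.
id_table = df[cols + ["group_id"]].drop_duplicates().reset_index(drop=True)
```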
Econometrics¶
-
econtools.metrics.
reg
(df: pandas.core.frame.DataFrame, y_name: str, x_name: Union[str, typing.List[str]], fe_name: Union[str, NoneType] = None, a_name: Union[str, NoneType] = None, nosingles: bool = True, vce_type: Union[str, NoneType] = None, cluster: Union[str, NoneType] = None, shac: Union[dict, NoneType] = None, addcons: Union[bool, NoneType] = None, nocons: bool = False, awt_name: Union[str, NoneType] = None, save_mem: bool = False, check_colinear: bool = False) → econtools.metrics.results.Results¶ OLS Regression.
Parameters: - df (DataFrame) – Data with any relevant variables.
- y_name (str) – Column name in df of the dependent variable.
- x_name (str or list) – Column name(s) in df of the independent variables/regressors.
Keyword Arguments: - vce_type (str) – Type of estimator to use for variance-covariance matrix of estimated coefficients. Default is standard OLS. Possible choices are:
    - 'robust' or 'hc1'
    - 'hc2'
    - 'hc3'
    - 'cluster' (requires kwarg cluster)
    - 'shac' (requires kwarg shac)
- cluster (str) – Column name in df used to cluster standard errors.
- shac (dict) – Arguments to pass to spatial HAC estimator. Requires:
    - x (str): Column name in df to serve as longitude.
    - y (str): Column name in df to serve as latitude.
    - kern (str): Kernel to use in estimation. May be triangle (tria) or uniform (unif).
    - band (float): Bandwidth for kernel.
- fe_name (str) – transformation (demeaning).
- a_name (str) –
- awt_name (str) – Column name in df to use for analytic weights in regression.
- addcons (bool) – Defaults to False. Add a constant to independent variables. Has no effect if a_name is passed.
- nocons (bool) – Defaults to False. Flag so estimators know that independent variables in df do not include a constant. Only affects degrees of freedom.
- nosingles (bool) – Defaults to True. Drop observations that are absorbed by the within transformation. Has no effect if a_name=None.
- save_mem (bool) – Defaults to False. If True, the returned Results object will not save large objects, specifically yhat, sample, and resid.
- check_colinear (bool) – Defaults to False. Checks rank of regressor matrix, X. If X is rank deficient, an error is raised that prints the collinear columns.
Returns: A Results object
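For orientation, the arithmetic behind an OLS fit with 'hc1' standard errors can be written in a few lines of NumPy. This is the textbook estimator, offered as a sketch; `ols_hc1` is a hypothetical helper, and the library's own code path (weights, fixed effects, degrees-of-freedom adjustments) is not reproduced here.

```python
import numpy as np


def ols_hc1(X, y):
    """OLS coefficients with heteroskedasticity-robust (HC1) std. errors."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    N, K = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    # Sandwich "meat": X' diag(resid^2) X
    meat = (X * resid[:, None] ** 2).T @ X
    vce = N / (N - K) * XtX_inv @ meat @ XtX_inv
    return beta, np.sqrt(np.diag(vce)), resid


x = np.arange(10, dtype=float)
noise = 0.1 * (-1.0) ** np.arange(10)      # small deterministic disturbance
y = 1.0 + 2.0 * x + noise
X = np.column_stack([np.ones_like(x), x])  # explicit constant (cf. addcons)
beta, se, resid = ols_hc1(X, y)
```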
-
econtools.metrics.
ivreg
(df: pandas.core.frame.DataFrame, y_name: str, x_name: Union[str, typing.List[str]], z_name: Union[str, typing.List[str]], w_name: Union[str, typing.List[str]], fe_name: Union[str, NoneType] = None, a_name: Union[str, NoneType] = None, nosingles: bool = True, iv_method: str = '2sls', _kappa_debug=None, vce_type: Union[str, NoneType] = None, cluster: Union[str, NoneType] = None, shac: Union[dict, NoneType] = None, addcons: bool = False, nocons: bool = False, awt_name: Union[str, NoneType] = None, save_mem: bool = False, check_colinear: bool = False) → econtools.metrics.results.Results¶ Instrumental Variables Regression
Parameters: - df (DataFrame) – Data with any relevant variables.
- y_name (str) – Column name in df of the dependent variable.
- x_name (str or list) – Column name(s) in df of the endogenous regressor(s).
- z_name (str or list) – Column name(s) in df of the excluded instrument(s).
- w_name (str or list) – Column name(s) in df of the included instruments/exogenous regressors.
Keyword Arguments: - fe_name (str) – transformation (demeaning). All other keyword args in reg() may also be used.
- iv_method (str) – Instrumental variables method to use. Options are:
    - '2sls', two-stage least squares (default)
    - 'liml', limited-information maximum likelihood.
Returns: A Results object with (a) no r-squared (r2 or r2_a attributes), and (b) a kappa attribute (always 1 if iv_method='2sls').
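The roles of x_name (endogenous), z_name (excluded instruments), and w_name (included exogenous regressors) are easiest to see in the standard two-stage least squares formula. The NumPy sketch below is that textbook formula on made-up data, not the econtools implementation; `two_stage_least_squares` is a hypothetical name.

```python
import numpy as np


def two_stage_least_squares(y, x_endog, z_excl, w_exog):
    """2SLS point estimates, ordered as [endogenous, exogenous]."""
    X = np.column_stack([x_endog, w_exog])  # regressors of interest
    Z = np.column_stack([z_excl, w_exog])   # full instrument set
    # First stage: project every regressor onto the instruments.
    first_stage = np.linalg.lstsq(Z, X, rcond=None)[0]
    X_hat = Z @ first_stage
    # Second stage: beta = (X_hat' X)^{-1} X_hat' y
    return np.linalg.solve(X_hat.T @ X, X_hat.T @ y)


z = np.arange(8, dtype=float)                          # excluded instrument
x = z + np.array([0, 1, 0, 1, 0, 1, 0, 1], dtype=float)  # endogenous regressor
w = np.ones(8)                                         # constant as exogenous
y = 1.0 + 2.0 * x                                      # exact relation
beta_iv = two_stage_least_squares(y, x, z, w)
```

Because `y` is an exact linear function of `x` here, the estimates recover the slope 2 and the constant 1 up to floating-point error.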
-
class
econtools.metrics.core.
Results
(**kwargs)¶ Regression Results container.
-
summary
¶ DataFrame – Summary of regression results.
-
beta
¶ Series – All beta coefficients. Index is regressor names.
-
se
¶ Series – Standard errors.
-
t_stat
¶ Series – t-stats.
-
pt
¶ Series – p-scores for t-stats.
-
ci_lo
¶ Series – Confidence interval, lower bound.
-
ci_hi
¶ Series – Confidence interval, upper bound.
-
r2
¶ float – R-squared
-
r2_a
¶ float – Adjusted R-squared.
-
K
¶ int – Number of regressors
-
N
¶ int – Number of observations
-
vce
¶ DataFrame – K-by-K variance-covariance matrix.
-
F
¶ float – F-stat of joint significance of beta coefficients.
-
pF
¶ float – p-score for F-stat.
-
df_m
¶ int – Model degrees of freedom (excluding constant).
-
df_r
¶ int – Residual degrees of freedom.
-
ssr
¶ float – Sum of squared residuals.
-
sst
¶ float – Total sum of squares.
-
yhat
¶ array – Fit values (\(X\hat{\beta}\))
-
resid
¶ array – Regression residuals (\(\hat{\varepsilon}\))
-
sample
¶ array – Boolean array the same length as the DataFrame passed to the original regression function. A row is True if the observation is included in the regression, False otherwise. The regression function will automatically drop observations where the outcome, regressor, weights, etc., are missing/null.
-
-
Results.
Ftest
(col_names, equal=False)¶ F test using regression results.
Parameters: col_names (str or list) – Regressor name(s) to test.
Keyword Arguments: equal (bool) – Defaults to False. If True, test if all coefficients in col_names are equal. If False, test if col_names are jointly significant.
Returns: A tuple containing:
- F (float): F-stat.
- pF (float): p-score for F.
Return type: tuple
-
econtools.metrics.
f_test
(V: numpy.ndarray, R: numpy.ndarray, beta: numpy.ndarray, r: int, df_d: int) → Tuple[float, float]¶ Arbitrary F test.
Parameters: - V (array) – K-by-K variance-covariance matrix.
- R (array) – K-by-K Test matrix.
- beta (array) – Length-K vector of coefficient estimates.
- r (array) – Length-K vector of null hypotheses.
- df_d (int) – Denominator degrees of freedom.
Returns: A tuple containing:
- F (float): F-stat.
- pF (float): p-score for F.
Return type: tuple
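The statistic itself is usually the Wald form F = (Rβ − r)' (R V R')⁻¹ (Rβ − r) / q, where q is the number of restrictions (rows of R). The sketch below computes only that statistic with NumPy, under the assumption that f_test follows this standard form; `wald_f_stat` is a hypothetical name, and the p-value (which would come from an F(q, df_d) distribution) is left out.

```python
import numpy as np


def wald_f_stat(V, R, beta, r):
    """Wald F statistic for the linear restrictions R @ beta = r."""
    V = np.asarray(V, dtype=float)
    R = np.atleast_2d(np.asarray(R, dtype=float))
    diff = R @ np.asarray(beta, dtype=float) - np.asarray(r, dtype=float)
    q = R.shape[0]  # number of restrictions being tested
    return float(diff @ np.linalg.solve(R @ V @ R.T, diff) / q)


V = np.diag([0.25, 1.0])         # variance-covariance of the estimates
R = np.eye(2)                    # test both coefficients jointly
beta_hat = np.array([1.0, 2.0])
r_null = np.zeros(2)             # null hypothesis: both equal zero
F_stat = wald_f_stat(V, R, beta_hat, r_null)  # (1/0.25 + 4/1) / 2 = 4.0
```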
-
econtools.metrics.
kdensity
(x: Union[numpy.ndarray, pandas.core.frame.DataFrame], x0: Union[float, numpy.ndarray, pandas.core.frame.DataFrame, NoneType] = None, N: Union[int, NoneType] = None, h: Union[str, float, NoneType] = None, wt: Union[numpy.ndarray, pandas.core.frame.DataFrame, NoneType] = None, kernel: str = 'epan') → Tuple[[Union[numpy.ndarray, pandas.core.frame.DataFrame], numpy.ndarray], dict]¶ Kernel density estimation.
Parameters: x (array-like) – Variable over which to estimate density.
Keyword Arguments: - x0 (float or array-like) – Default None. Values at which to calculate density. If None, these values will be calculated automatically. Default length of x0 is min([len(x), 50]). At least one of x0 and N must be None. x0 may also be a scalar.
- N (int) – Default None. Number of x0 values to calculate if x0 is not specified. At least one of x0 and N must be None.
- h (str or float) – Defaults to None (Silverman’s rule of thumb). Bandwidth for kernel. May pass a float or any of the following for Silverman’s rule of thumb: 'silverman', 'thumb', 'rot'.
- kernel (str) – Type of kernel to be used. Options are:
    - 'epan', Epanechnikov (default)
    - 'unif', Uniform
    - 'tria', Triangle
- wt (array-like) – Weights. Must be same length as x.
Returns: A tuple containing:
- x0 (float or array) - Points at which the kernel density is estimated. If x0 is passed explicitly, this will be the same.
- f_hat (float or array) - Estimated kernel density at point(s) x0.
- est_stats (dict) - Contains bandwidth and kernel name.
Return type: tuple
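The estimate returned as f_hat has the usual form f̂(x0) = (1 / (n·h)) Σ K((xᵢ − x0) / h). A minimal NumPy sketch with the Epanechnikov kernel follows; `kdensity_sketch` is a hypothetical name, the bandwidth is passed explicitly (no rule-of-thumb selection), and weights are omitted.

```python
import numpy as np


def epanechnikov(u):
    """K(u) = 0.75 * (1 - u^2) on [-1, 1], zero elsewhere."""
    return np.where(np.abs(u) <= 1, 0.75 * (1 - u ** 2), 0.0)


def kdensity_sketch(x, x0, h):
    """Unweighted kernel density of sample x, evaluated at points x0."""
    x = np.asarray(x, dtype=float)
    x0 = np.atleast_1d(np.asarray(x0, dtype=float))
    u = (x[None, :] - x0[:, None]) / h  # one row per evaluation point
    return epanechnikov(u).sum(axis=1) / (len(x) * h)


sample = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
grid = np.linspace(-2.0, 6.0, 801)
f_hat = kdensity_sketch(sample, grid, h=1.0)
total_mass = f_hat.sum() * (grid[1] - grid[0])  # should be close to 1
```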
-
econtools.metrics.
llr
(y: numpy.ndarray, x: numpy.ndarray, x0: Union[numpy.ndarray, pandas.core.frame.DataFrame, NoneType] = None, N: Union[int, NoneType] = None, h: Union[str, float, NoneType] = None, degree: int = 1, kernel: str = 'epan', ci: bool = False)¶ Local-linear Regression
Parameters: - y (array) – Dependent variable
- x (array) – Independent variable
Keyword Arguments: - x0 (float or array-like) – Default None. Values at which to calculate regression. If None, these values will be calculated automatically. Default length of x0 is min([len(x), 50]). At least one of x0 and N must be None.
- N (int) – Default None. Number of x0 values to calculate if x0 is not specified. At least one of x0 and N must be None.
- h (str or float) – Defaults to None (Silverman’s rule of thumb). Bandwidth for kernel. May pass a float or any of the following for Silverman’s rule of thumb: 'silverman', 'thumb', 'rot'.
- kernel (str) – Type of kernel to be used. Options are:
    - 'epan', Epanechnikov (default)
    - 'unif', Uniform
    - 'tria', Triangle
- degree (int) – Defaults to 1. Degree of polynomial to use in local regression.
- ci (bool) – Defaults to False. If True, also return confidence interval for each point.
Returns: Stuff.
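Local-linear regression fits a kernel-weighted least-squares line around each evaluation point and reports the intercept of that line as the fitted value. The NumPy sketch below shows the degree-1 case with an Epanechnikov kernel and an explicit bandwidth; `local_linear_sketch` is a hypothetical name, and its return value (fitted values only) is an assumption, since the return of llr is not documented above.

```python
import numpy as np


def local_linear_sketch(y, x, x0, h):
    """Kernel-weighted linear fit of y on x, evaluated at each point in x0."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    fits = []
    for point in np.atleast_1d(x0):
        u = (x - point) / h
        w = np.where(np.abs(u) <= 1, 0.75 * (1 - u ** 2), 0.0)
        # Center x at the evaluation point so the intercept is the fit.
        X = np.column_stack([np.ones_like(x), x - point])
        XtW = X.T * w
        coef = np.linalg.solve(XtW @ X, XtW @ y)
        fits.append(coef[0])
    return np.array(fits)


x_obs = np.linspace(0.0, 10.0, 21)
y_obs = 3.0 + 0.5 * x_obs                 # exactly linear test data
fitted = local_linear_sketch(y_obs, x_obs, [2.0, 5.0, 8.0], h=2.0)
```

On exactly linear data the local fit reproduces the line, so `fitted` matches 3 + 0.5·x0 at each point.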
LaTeX¶
-
econtools.
outreg
(regs: Union[econtools.metrics.results.Results, typing.Tuple[econtools.metrics.results.Results]], var_names: Union[list, NoneType] = None, var_labels: Union[list, NoneType] = None, digits: int = 4, stars: bool = True, se: str = '(', options: bool = False) → str¶ Create the guts of a LaTeX tabular environment from regression results.
Parameters: - regs (Results or iterable of Results) – Regressions to output to table.
- var_names (str or iterable of str) – Variable names to pull from regs. If none specified, by default uses the pandas DataFrame column names.
- var_labels (str or iterable of str) – Pretty names for variables in table. If none specified, will use var_names.
Keyword Arguments: - digits (int) – Defaults to 4. How many digits to include past decimal.
- stars (bool) – Defaults to True. If True, adds stars to mark statistical significance.
- se (str) – Defaults to '('. Marker for standard errors. May also choose brackets with se="[".
- options (bool) – Defaults to False. If True, return a dict with formatting options that were generated by outreg: name_just, stat_just, etc., for additional calls to table_mainrow and table_statrow.
Returns: LaTeX fragment meant to be wrapped in a tabular environment.
Return type: str
-
econtools.
table_mainrow
(rowname: str, varname: Union[int, str], regs: Union[econtools.metrics.results.Results, typing.Tuple[econtools.metrics.results.Results]], name_just: int = 24, stat_just: int = 12, digits: int = 3, se: str = '(', stars: bool = True) → str¶ Add a table row of regression coefficients with standard errors.
Parameters: - rowname (str) – First cell of table row, i.e., the row’s name.
- varname (str) – Name of variable to pull from Results object.
- regs (Results or iterable of Results) – Regressions from which to pull coefficients named varname.
Keyword Arguments: - name_just (int) –
- stat_just (int) –
- digits (int) –
- se (str) –
- stars (bool) –
Returns: String of table row.
Return type: str
-
econtools.
table_statrow
(rowname: str, vals: Iterable, name_just: int = 24, stat_just: int = 12, wrapnum: bool = False, sd: bool = False, digits: Union[int, NoneType] = None, empty_left: int = 0, empty_right: int = 0, empty_slots: list = [], **kwargs) → str¶ Make a table row. Useful for bottom rows of regression tables (e.g., R-squared) or tables of summary statistics.
Parameters: - rowname (str) – Row’s name.
- vals (iterable) – Values to fill cell rows. Can add empty cells with ''.
Keyword Arguments: - name_just (int) – Width/justification of the rowname column.
- stat_just (int) – Width/justification of the vals columns.
- wrapnum (bool) – If True, wrap cell values in LaTeX function num, which automatically adds commas as needed. Requires LaTeX package siunitx in LaTeX document.
- sd (bool or str) – If True, wrap cell value in parentheses as per convention. May also set sd="[" to wrap in brackets.
- digits (int) – How many digits to print after decimal. If None, prints contents of vals exactly as is.
- empty_left (int) – Adds empty_left empty cells to left side of row. Is mutually exclusive with empty_slots.
- empty_right (int) – See empty_left.
- empty_slots (list) – Make table row have empty cells at index values in empty_slots (zero-indexed). Mutually exclusive with empty_left and empty_right. For example, passing vals=(1, 2, 3) and empty_slots=(1, 3, 5) is the same as passing vals=(1, '', 2, '', 3, '').
Returns: LaTeX tabular row with rowname and vals with the specified formatting.
Return type: str
Example
    >>> table_str = table_statrow("Method", ['OLS', '2SLS', 'LIML'])
    >>> table_str += table_statrow("N", [100, 200, 300])
    >>> print(table_str)
    Method & OLS & 2SLS & LIML \\
    N & 100 & 200 & 300 \\
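The empty_slots rule described above (zero-indexed positions in the final row that stay blank) can be sketched in pure Python. `fill_empty_slots` and `statrow_sketch` are hypothetical helpers that ignore justification widths and number formatting; they only illustrate how values and blanks interleave.

```python
def fill_empty_slots(vals, empty_slots):
    """Interleave vals with '' at the given zero-indexed positions."""
    empties = set(empty_slots)
    total = len(vals) + len(empties)
    remaining = iter(vals)
    return ['' if i in empties else next(remaining) for i in range(total)]


def statrow_sketch(rowname, vals, empty_slots=()):
    """Join the row name and cells with ' & ' and end the LaTeX row."""
    cells = [str(v) for v in fill_empty_slots(list(vals), empty_slots)]
    return " & ".join([rowname] + cells) + " \\\\"


cells = fill_empty_slots([1, 2, 3], (1, 3, 5))  # [1, '', 2, '', 3, '']
row = statrow_sketch("N", [100, 200, 300])
```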
-
econtools.
write_notes
(notes: str, table_path: str) → None¶ Write notes for a table.
Parameters: - notes (str) – String to write to disk.
- table_path (str) – The filepath of the accompanying LaTeX table.
Returns: Writes notes to <table_path_root>_notes.tex. So if table_path=table_1.tex, notes will be written to table_1_notes.tex.
Return type: None
Example
    table_path = 'table_1.tex'
    notes = "Sample size is 277."
    write_notes(notes, table_path)  # str ``notes`` written to ``table_1_notes.tex``
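The naming rule for the notes file can be expressed with `os.path.splitext`. `notes_path_for` is a hypothetical helper that shows only the path derivation, assuming the notes file sits next to the table; it does not write anything to disk.

```python
import os


def notes_path_for(table_path):
    """Derive '<table_path_root>_notes.tex' from the table's path."""
    root, _ext = os.path.splitext(table_path)
    return f"{root}_notes.tex"


print(notes_path_for("table_1.tex"))  # table_1_notes.tex
```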
Plotting¶
-
econtools.
binscatter
(x: Union[str, numpy.ndarray], y: Union[str, numpy.ndarray], n: int = 20, data: Union[pandas.core.frame.DataFrame, NoneType] = None, discrete: bool = False, median: bool = False) → Tuple[numpy.ndarray, numpy.ndarray]¶ Binscatter.
Parameters: - x (array or str) – x-axis values. If type str, column in data.
- y (array or str) – y-axis values, same length as x. If type str, column in data.
Keyword Arguments: - n (int) – Default 20. Number of bins.
- discrete (bool) – Default False. If True, every unique value in x is given its own bin.
- median (bool) – Default False. Calculate the median for each bin instead of the mean. Only applies to y-axis values.
Returns: - x_bin_value (array) - Array of x bin values.
- y_bin_value (array) - Array of y bin values.
Return type: tuple
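A binscatter reduces a scatter plot to one point per bin of x. The NumPy sketch below uses equal-count bins after sorting on x, which is an assumption about the binning rule; `binscatter_sketch` is a hypothetical name and omits the data and discrete options.

```python
import numpy as np


def binscatter_sketch(x, y, n=20, median=False):
    """Mean of x and mean (or median) of y within n equal-count bins."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    order = np.argsort(x, kind="stable")
    x_groups = np.array_split(x[order], n)
    y_groups = np.array_split(y[order], n)
    y_stat = np.median if median else np.mean  # median applies to y only
    x_bins = np.array([g.mean() for g in x_groups])
    y_bins = np.array([y_stat(g) for g in y_groups])
    return x_bins, y_bins


x_raw = np.arange(100, dtype=float)
y_raw = 2.0 * x_raw
x_bins, y_bins = binscatter_sketch(x_raw, y_raw, n=4)
```

The two returned arrays can be passed directly to a scatter plot call of your choice.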
-
econtools.
legend_below
(ax, *args, **kwargs) → None¶ Create a legend below and outside the main axis object.
Parameters: - ax (Axis) – The main Axis object.
- *args – other args to pass to ax.legend
- **kwargs – other keyword args to pass to ax.legend
Keyword Arguments: - shrink (bool) – Default False. Should be True.
- anchor (tuple) – 2-tuple to pass to bbox_to_anchor. This aligns the legend to the rest of the Axis. If you need more space between the legend and your figure, make the second digit more negative.
Returns: Return type: None
Reference¶
-
econtools.
state_name_to_fips
(name: str) → int¶ Take state name and return fips as int
Parameters: name (str) – State name (e.g., Arizona).
Returns: FIPS code.
Return type: int
-
econtools.
state_fips_to_name
(fips: int) → str¶ Take fips as int and return state name
Parameters: fips (int) – State FIPS.
Returns: State name (e.g., Colorado).
Return type: str
-
econtools.
state_name_to_abbr
(name: str) → str¶ Take state name, return 2-letter state abbreviation
Parameters: name (str) – State name (e.g., Idaho).
Returns: 2-letter state abbreviation (e.g., ID).
Return type: str
-
econtools.
state_abbr_to_name
(abbr: str) → str¶ Take 2-letter state abbreviation, return name
Parameters: abbr (str) – 2-letter state abbreviation (e.g., CA).
Returns: State name (e.g., California).
Return type: str
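All four lookups are simple mappings over one table of states. The sketch below builds the four dictionaries from a four-state excerpt; the table and dictionary names are hypothetical, and the real functions cover every state.

```python
# (name, abbreviation, FIPS code) for a small excerpt of states.
STATES = [
    ("Arizona", "AZ", 4),
    ("California", "CA", 6),
    ("Colorado", "CO", 8),
    ("Idaho", "ID", 16),
]

NAME_TO_FIPS = {name: fips for name, _abbr, fips in STATES}
FIPS_TO_NAME = {fips: name for name, _abbr, fips in STATES}
NAME_TO_ABBR = {name: abbr for name, abbr, _fips in STATES}
ABBR_TO_NAME = {abbr: name for name, abbr, _fips in STATES}

print(NAME_TO_FIPS["Arizona"])  # 4
print(FIPS_TO_NAME[8])          # Colorado
print(NAME_TO_ABBR["Idaho"])    # ID
print(ABBR_TO_NAME["CA"])       # California
```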