Function Signatures

Data I/O

econtools.load_or_build(raw_filepath: str, copydta: bool = False, path_args: list = []) → Callable

Loads raw_filepath as a DataFrame if it exists, otherwise builds the data and saves it to raw_filepath.

Parameters: raw_filepath (str) – Path to the saved DataFrame. If raw_filepath includes named replacement fields (e.g., '{arg_name}') whose names match arguments of the decorated function, the values passed to those arguments are inserted into the file path.

Example

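import pandas as pd
from econtools import load_or_build

@load_or_build('data_for_{year}.pkl')
def make_data(year):
    # Build the data here; this toy DataFrame stands in for the real work
    return pd.DataFrame({'year': [year]})

df = make_data(2018)    # Saves to 'data_for_2018.pkl'
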
Keyword Arguments:
 
  • copydta (bool) – Defaults to False. If True, also save a copy of the data in Stata DTA format, unless raw_filepath is already a DTA file.
  • path_args (list-like) – DEPRECATED: Use named replacement fields instead. A list of ints or strs that point to positional args or kwargs of the build function, respectively. The values of these arguments are then used to format raw_filepath.

Example

import pandas as pd
from econtools import load_or_build

@load_or_build('file_{}_{}.csv', path_args=[0, 'b'])
def foo(a, b=None):
    return pd.DataFrame([a, b])

if __name__ == '__main__':
    # Saves the returned DataFrame to `file_infix_suffix.csv`
    foo('infix', 'suffix')

Other Parameters:
 
  • These are additional keyword arguments that can be passed to the wrapped function to change the behavior of load_or_build.
  • _rebuild (bool) – Defaults to False. If True, build the DataFrame and save it to raw_filepath even if raw_filepath already exists.
  • _load (bool) – Defaults to True. If True, try loading the data before building it. If False, call the building function and return the result without writing any data to disk.

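The load-or-build pattern can be sketched with the standard library alone. The decorator below is hypothetical (load_or_build_sketch is not part of econtools): it always pickles results instead of choosing a format by extension, and it fills named replacement fields by binding the call's arguments with inspect.signature. It only illustrates the documented behavior of raw_filepath, _rebuild, and _load; it is not the library's implementation.

```python
import functools
import inspect
import os
import pickle


def load_or_build_sketch(raw_filepath):
    """Hypothetical caching decorator illustrating the load-or-build pattern."""
    def decorator(build_func):
        sig = inspect.signature(build_func)

        @functools.wraps(build_func)
        def wrapper(*args, _rebuild=False, _load=True, **kwargs):
            if not _load:
                # Build and return without touching the disk
                return build_func(*args, **kwargs)
            # Bind passed values (plus defaults) to argument names, then fill
            # named replacement fields such as '{year}' in the path
            bound = sig.bind(*args, **kwargs)
            bound.apply_defaults()
            path = raw_filepath.format(**bound.arguments)
            if os.path.exists(path) and not _rebuild:
                with open(path, 'rb') as f:
                    return pickle.load(f)
            result = build_func(*args, **kwargs)
            with open(path, 'wb') as f:
                pickle.dump(result, f)
            return result
        return wrapper
    return decorator
```

Because _rebuild and _load are keyword-only parameters of the wrapper, they are consumed before the remaining arguments reach the build function.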
econtools.save_cli() → bool

Add the boolean command-line flag --save.

Returns: True if --save was entered on the command line, else False.
Return type: bool

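A flag like this is commonly implemented with argparse. The function below is a hypothetical stand-in, not econtools' code; it uses parse_known_args so that other command-line arguments do not cause an error, which is an assumption about how such a helper should behave.

```python
import argparse


def save_cli_sketch(argv=None):
    """Hypothetical stand-in: return True if --save appears on the command line."""
    parser = argparse.ArgumentParser()
    parser.add_argument('--save', action='store_true',
                        help='Write outputs to disk')
    # parse_known_args ignores any other flags instead of exiting with an error;
    # argv=None means argparse reads sys.argv[1:]
    args, _unknown = parser.parse_known_args(argv)
    return args.save
```

In a script, the returned value would typically gate whether output files are written.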
econtools.confirmer(prompt_str: str, default_no: bool = True) → bool

Prompt user for yes/no answer.

Parameters:
  • prompt_str (str) – Prompt to show user.
  • default_no (bool) – Defaults to True. If True, the default response is ‘No’.
Returns:

True if user responded ‘Yes’, else False.

Return type:

bool
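The prompt logic can be sketched as a loop that maps an empty answer to the default and re-asks on anything unrecognized. confirmer_sketch and its input_func parameter are hypothetical; input_func exists only so the prompt can be tested without a terminal.

```python
def confirmer_sketch(prompt_str, default_no=True, input_func=input):
    """Hypothetical yes/no prompt; input_func is injectable for testing."""
    choices = '[y/N]' if default_no else '[Y/n]'
    while True:
        answer = input_func(f'{prompt_str} {choices} ').strip().lower()
        if answer == '':
            # An empty response falls back to the default
            return not default_no
        if answer in ('y', 'yes'):
            return True
        if answer in ('n', 'no'):
            return False
        # Anything else: ask again
```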

econtools.read(path: str, **kwargs) → pandas.core.frame.DataFrame

Read a file into a DataFrame, choosing the pandas reader based on the file’s extension.

Parameters:
  • path (str) – Path to read the file from. Supported file suffixes are csv, pkl (pickle), hdf (HDF5), and dta (Stata).
  • **kwargs – Arbitrary keyword arguments to pass to the pandas read method.
Returns:

The contents of the file at path, as a DataFrame.

Return type:

DataFrame

econtools.write(df: pandas.core.frame.DataFrame, path: str, **kwargs) → None

Write a DataFrame to disk, choosing the pandas writer based on the file’s extension.

Parameters:
  • df (DataFrame) – DataFrame to write to disk.
  • path (str) – Path to write the file to. Supported file suffixes are csv, pkl (pickle), hdf (HDF5), and dta (Stata).
  • **kwargs – Arbitrary keyword arguments to pass to the pandas write method.
Returns:

Nothing; df is written to path.

Return type:

None
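Dispatching on the file extension can be sketched as a lookup from suffix to a pandas function name. read_sketch and write_sketch are hypothetical, and the suffix-to-function mapping (pandas.read_csv / DataFrame.to_csv, read_pickle / to_pickle, read_hdf / to_hdf, read_stata / to_stata) is an assumption about how econtools.read and econtools.write behave. Note that DataFrame.to_hdf requires a key argument, which would be passed through **kwargs.

```python
import os

import pandas as pd

# Assumed mapping from file suffix to pandas reader / DataFrame writer names
READERS = {'.csv': 'read_csv', '.pkl': 'read_pickle',
           '.hdf': 'read_hdf', '.dta': 'read_stata'}
WRITERS = {'.csv': 'to_csv', '.pkl': 'to_pickle',
           '.hdf': 'to_hdf', '.dta': 'to_stata'}


def _suffix(path):
    """Return the lowercase file suffix, rejecting unsupported ones."""
    suffix = os.path.splitext(path)[1].lower()
    if suffix not in READERS:
        raise ValueError(f'Unsupported file suffix: {suffix!r}')
    return suffix


def read_sketch(path, **kwargs):
    """Dispatch to the pandas reader that matches the file suffix."""
    return getattr(pd, READERS[_suffix(path)])(path, **kwargs)


def write_sketch(df, path, **kwargs):
    """Dispatch to the DataFrame writer that matches the file suffix."""
    getattr(df, WRITERS[_suffix(path)])(path, **kwargs)
```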

Data Manipulation

econtools.stata_merge(left: pandas.core.frame.DataFrame, right: pandas.core.frame.DataFrame, assertval: Union[int, NoneType] = None, gen: str = '_m', **kwargs) → pandas.core.frame.DataFrame

Merge two DataFrames via pandas.merge, with some additional features. Specifically, an additional column, labeled '_m' by default, is added to the returned DataFrame. For each row of the returned DataFrame, '_m' equals 1 if the row existed only in left, 2 if the row existed only in right, and 3 if it existed in both, i.e., was successfully merged.

Parameters:
  • left (DataFrame) – Left DataFrame to merge.
  • right (DataFrame) – Right DataFrame to merge.
Keyword Arguments:
 
  • assertval (int) – Assert that all values of '_m' are equal to assertval. Under the default (None), no assertion is made.
  • gen (str) – Name of the merge status variable. Default is '_m'.
  • **kwargs – Any standard keyword argument for pandas.merge, such as on or how.
Returns:

A DataFrame that is the merged output of left and right.

Return type:

pandas.DataFrame
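The merge status codes map directly onto the indicator column that pandas.merge can add (values 'left_only', 'right_only', 'both'). The sketch below is hypothetical: stata_merge_sketch recodes that column to 1/2/3 and checks assertval. It defaults to an outer join, which is an assumption (pandas itself defaults to an inner join, under which every row would be 3).

```python
import pandas as pd

# Recode pandas' merge indicator into Stata-style status codes
_MERGE_CODES = {'left_only': 1, 'right_only': 2, 'both': 3}


def stata_merge_sketch(left, right, assertval=None, gen='_m', **kwargs):
    """Hypothetical sketch: pandas.merge plus a Stata-style merge status column."""
    kwargs.setdefault('how', 'outer')  # keep unmatched rows from both sides
    merged = pd.merge(left, right, indicator=gen, **kwargs)
    # The indicator column is categorical: left_only / right_only / both
    merged[gen] = merged[gen].astype(str).map(_MERGE_CODES).astype(int)
    if assertval is not None and not (merged[gen] == assertval).all():
        raise AssertionError(f'Not all values of {gen!r} equal {assertval}')
    return merged
```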

econtools.group_id(df: pandas.core.frame.DataFrame, cols: Union[list, NoneType] = None, name: str = 'group_id', merge: bool = False) → pandas.core.frame.DataFrame

Generate a unique integer ID from a DataFrame or from columns of the DataFrame. Specifically, create a unique number for every unique combination of values in cols (or in all columns of df if cols is None).

Parameters:

df (DataFrame) – DataFrame of interest.

Keyword Arguments:
 
  • cols (list) – List of columns to use for ID generation. Default (None) uses all columns in df.
  • name (str) – Name of the new ID variable. Default is 'group_id'.
  • merge (bool) – Return the full input DataFrame df with the new group ID column merged on. Default is False.
Returns:

If merge=False, a DataFrame containing cols and the new group ID; if merge=True, the input DataFrame with the group ID added as a new column.

Return type:

pandas.DataFrame
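In pandas, one integer per unique combination of key values is what GroupBy.ngroup() returns. group_id_sketch is hypothetical; numbering from 0 in sorted key order is this sketch's choice and may not match econtools' numbering.

```python
import pandas as pd


def group_id_sketch(df, cols=None, name='group_id', merge=False):
    """Hypothetical sketch: one integer ID per unique combination of cols."""
    cols = list(df.columns) if cols is None else list(cols)
    # ngroup() numbers groups 0, 1, 2, ... in sorted key order
    ids = df.groupby(cols, sort=True).ngroup()
    if merge:
        # Attach the ID to every row of the input DataFrame
        return df.assign(**{name: ids})
    # Otherwise return one row per unique combination plus its ID
    out = df[cols].drop_duplicates().copy()
    out[name] = ids.loc[out.index]
    return out.reset_index(drop=True)
```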

Econometrics

econtools.metrics.reg(df: pandas.core.frame.DataFrame, y_name: str, x_name: Union[str, typing.List[str]], fe_name: Union[str, NoneType] = None, a_name: Union[str, NoneType] = None, nosingles: bool = True, vce_type: Union[str, NoneType] = None, cluster: Union[str, NoneType] = None, shac: Union[dict, NoneType] = None, addcons: Union[bool, NoneType] = None, nocons: bool = False, awt_name: Union[str, NoneType] = None, save_mem: bool = False, check_colinear: bool = False) → econtools.metrics.results.Results

OLS Regression.

Parameters:
  • df (DataFrame) – Data with any relevant variables.
  • y_name (str) – Column name in df of the dependent variable.
  • x_name (str or list) – Column name(s) in df of the independent variables/regressors.
Keyword Arguments:
 
  • vce_type (str) –

    Type of estimator to use for variance-covariance matrix of estimated coefficients. Default is standard OLS. Possible choices are:

    • 'robust' or 'hc1'
    • 'hc2'
    • 'hc3'
    • 'cluster' (requires kwarg cluster)
    • 'shac' (requires kwarg shac)
  • cluster (str) – Column name in df used to cluster standard errors.
  • shac (dict) –

    Arguments to pass to spatial HAC estimator. Requires:

    • x (str): Column name in df to serve as longitude.
    • y (str): Column name in df to serve as latitude.
    • kern (str): Kernel to use in estimation. May be triangle (tria) or uniform (unif).
    • band (float): Bandwidth for kernel.
  • fe_name (str) – transformation (demeaning).
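  • a_name (str) – Column name in df of the category variable whose fixed effects are absorbed via the within transformation.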
  • awt_name (str) – Column name in df to use for analytic weights in regression.
  • addcons (bool) – Defaults to False. Add a constant to independent variables. Has no effect if a_name is passed.
  • nocons (bool) – Defaults to False. Flag so estimators know that the independent variables in df do not include a constant. Only affects degrees of freedom.
  • nosingles (bool) – Defaults to True. Drop observations that are fully absorbed by the within transformation (singleton groups). Has no effect if a_name=None.
  • save_mem (bool) – Defaults to False. If True, the returned Results object will not save large objects, specifically yhat, sample, and resid.
  • check_colinear (bool) – Defaults to False. Check the rank of the regressor matrix, X. If X is rank deficient, raise an error that lists the collinear columns.
Returns:

A Results object
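The point estimates and the two simplest vce_type choices can be sketched in NumPy. ols_sketch is hypothetical and works on arrays rather than DataFrame column names; it assumes X already contains a constant column and implements the textbook classical and HC1 ('robust') variance estimators, which may differ in detail from econtools (for example, in degrees-of-freedom adjustments when fixed effects are absorbed).

```python
import numpy as np


def ols_sketch(y, X, vce_type=None):
    """Hypothetical OLS with classical or HC1 ('robust') standard errors.

    X is an N-by-K regressor matrix that already includes a constant column.
    """
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    N, K = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    if vce_type in ('robust', 'hc1'):
        # Sandwich estimator: (X'X)^-1 [sum e_i^2 x_i x_i'] (X'X)^-1,
        # with the HC1 small-sample scaling N / (N - K)
        meat = (X * resid[:, None] ** 2).T @ X
        vce = XtX_inv @ meat @ XtX_inv * N / (N - K)
    else:
        # Classical homoskedastic estimator: s^2 (X'X)^-1
        s2 = resid @ resid / (N - K)
        vce = s2 * XtX_inv
    se = np.sqrt(np.diag(vce))
    return beta, se, vce
```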

econtools.metrics.ivreg(df: pandas.core.frame.DataFrame, y_name: str, x_name: Union[str, typing.List[str]], z_name: Union[str, typing.List[str]], w_name: Union[str, typing.List[str]], fe_name: Union[str, NoneType] = None, a_name: Union[str, NoneType] = None, nosingles: bool = True, iv_method: str = '2sls', _kappa_debug=None, vce_type: Union[str, NoneType] = None, cluster: Union[str, NoneType] = None, shac: Union[dict, NoneType] = None, addcons: bool = False, nocons: bool = False, awt_name: Union[str, NoneType] = None, save_mem: bool = False, check_colinear: bool = False) → econtools.metrics.results.Results

Instrumental Variables Regression

Parameters:
  • df (DataFrame) – Data with any relevant variables.
  • y_name (str) – Column name in df of the dependent variable.
  • x_name (str or list) – Column name(s) in df of the endogenous regressor(s).
  • z_name (str or list) – Column name(s) in df of the excluded instrument(s).
  • w_name (str or list) – Column name(s) in df of the included instruments/exogenous regressors.
Keyword Arguments:
 
  • fe_name (str) – transformation (demeaning). All other keyword arguments accepted by reg() may also be used.
  • iv_method (str) –

    Instrumental variables method to use. Options are:

    • '2sls', two-stage least squares (default)
    • 'liml', limited-information maximum likelihood.
Returns:

A Results object with (a) no r-squared (r2 or r2_a attributes), and (b) a kappa attribute (always 1 if iv_method='2sls')
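The iv_method='2sls' point estimates can be sketched in NumPy: project the regressors onto the full instrument set, then regress y on the projections. tsls_sketch is hypothetical, takes arrays rather than column names, places the endogenous regressors first in the coefficient vector, and omits LIML and standard errors (naive second-stage OLS standard errors are not valid for 2SLS because they use the wrong residuals).

```python
import numpy as np


def tsls_sketch(y, X_endog, Z_excl, W_exog):
    """Hypothetical 2SLS point estimates.

    X_endog: endogenous regressors; Z_excl: excluded instruments;
    W_exog: included exogenous regressors (include a constant column yourself).
    Coefficients are ordered [endogenous..., exogenous...].
    """
    y = np.asarray(y, dtype=float)
    X = np.column_stack([X_endog, W_exog])  # all second-stage regressors
    Z = np.column_stack([Z_excl, W_exog])   # full instrument set
    # First stage: fitted values of X from a regression on Z
    X_hat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]
    # Second stage: regress y on the first-stage fitted values
    return np.linalg.lstsq(X_hat, y, rcond=None)[0]
```

In the just-identified case this reduces to the familiar formula (Z'X)^-1 Z'y, which the check below uses.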

class econtools.metrics.core.Results(**kwargs)

Regression Results container.

summary

DataFrame – Summary of regression results.

beta

Series – All beta coefficients. Index is regressor names.

se

Series – Standard errors.

t_stat

Series – t-stats.

pt

Series – p-values for t-stats.

ci_lo

Series – Confidence interval, lower bound.

ci_hi

Series – Confidence interval, upper bound.

r2

float – R-squared

r2_a

float – Adjusted R-squared.

K

int – Number of regressors

N

int – Number of observations

vce

DataFrame – K-by-K variance-covariance matrix.

F

float – F-stat of joint significance of beta coefficients.

pF

float – p-value for F-stat.

df_m

int – Model degrees of freedom (excluding constant).

df_r

int – Residual degrees of freedom.

ssr

float – Sum of squared residuals.

sst

float – Total sum of squares.

yhat

array – Fitted values (\(X\hat{\beta}\))

resid

array – Regression residuals (\(\hat{\varepsilon}\))

sample

array – Boolean array the same length as the DataFrame passed to the original regression function. An element is True if the observation is included in the regression, False otherwise. The regression function automatically drops observations where the outcome, regressors, weights, etc., are missing/null.

Results.Ftest(col_names, equal=False)

F test using regression results.

Parameters: col_names (str or list) – Regressor name(s) to test.
Keyword Arguments:
 equal (bool) – Defaults to False. If True, test whether all coefficients in col_names are equal. If False, test whether the coefficients in col_names are jointly significant.
Returns:
A tuple containing:
  • F (float): F-stat.
  • pF (float): p-value for F.
Return type: tuple

econtools.metrics.f_test(V: numpy.ndarray, R: numpy.ndarray, beta: numpy.ndarray, r: int, df_d: int) → Tuple[float, float]

Arbitrary F test.

Parameters:
  • V (array) – K-by-K variance-covariance matrix.
  • R (array) – K-by-K Test matrix.
  • beta (array) – Length-K vector of coefficient estimates.
  • r (array) – Length-K vector of null hypotheses.
  • df_d (int) – Denominator degrees of freedom.
Returns:

A tuple containing:
  • F (float): F-stat.
  • pF (float): p-value for F.

Return type:

tuple
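The textbook Wald-type F test of the null hypothesis R·beta = r can be sketched with NumPy and scipy.stats.f. f_test_sketch is hypothetical and uses the general form in which R has one row per restriction (q rows) and r has length q, rather than the K-by-K shapes listed above; a joint-significance test like Results.Ftest(col_names) presumably corresponds to R built from rows of the identity matrix and r = 0.

```python
import numpy as np
from scipy import stats


def f_test_sketch(V, R, beta, r, df_d):
    """Hypothetical Wald-type F test of H0: R @ beta = r.

    V: K-by-K VCE; R: q-by-K restriction matrix; r: length-q vector.
    """
    V = np.asarray(V, dtype=float)
    R = np.atleast_2d(np.asarray(R, dtype=float))
    diff = R @ np.asarray(beta, dtype=float) - np.asarray(r, dtype=float)
    q = R.shape[0]  # number of restrictions = numerator degrees of freedom
    F = diff @ np.linalg.solve(R @ V @ R.T, diff) / q
    pF = stats.f.sf(F, q, df_d)  # upper-tail p-value
    return float(F), float(pF)
```

With a single restriction, F equals the squared t-stat and pF equals the two-sided t-test p-value, which the check below relies on.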

econtools.metrics.kdensity(x: Union[numpy.ndarray, pandas.core.frame.DataFrame], x0: Union[float, numpy.ndarray, pandas.core.frame.DataFrame, NoneType] = None, N: Union[int, NoneType] = None, h: Union[str, float, NoneType] = None, wt: Union[numpy.ndarray, pandas.core.frame.DataFrame, NoneType] = None, kernel: str = 'epan') → Tuple[[Union[numpy.ndarray, pandas.core.frame.DataFrame], numpy.ndarray], dict]

Kernel density estimation.

Parameters:

x (array-like) – Variable over which to estimate density.

Keyword Arguments:
 
  • x0 (float or array-like) – Defaults to None. Values at which to calculate the density. If None, these values are calculated automatically, and the default length of x0 is min([len(x), 50]). At least one of x0 and N must be None. x0 may also be a scalar.
  • N (int) – Default None. Number of x0 values to calculate if x0 is not specified. At least one of x0 and N must be None.
  • h (str or float) – Defaults to None (Silverman’s rule of thumb). Bandwidth for kernel. May pass a float or any of the following for Silverman’s rule of thumb: 'silverman', 'thumb', 'rot'.
  • kernel (str) –

    Type of kernel to be used. Options are:

    • 'epan', Epanechnikov (default)
    • 'unif', Uniform
    • 'tria', Triangle
  • wt (array-like) – Weights. Must be same length as x.
Returns:

A tuple containing
  • x0 (float or array) - Points at which the density is estimated. If x0 was passed explicitly, this is the same as the input.
  • f_hat (float or array) - Estimated kernel density at point(s) x0.
  • est_stats (dict) - Contains bandwidth and kernel name.

Return type:

tuple
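An unweighted Epanechnikov kernel density estimate can be sketched in NumPy as f(x0) = (1/(n·h)) Σ K((x0 − x_i)/h) with K(u) = 0.75(1 − u²) for |u| ≤ 1. epan_kdensity_sketch is hypothetical; the Silverman variant used here (0.9 · min(sd, IQR/1.349) · n^(-1/5)) is one common form and is an assumption about what econtools uses, and the wt option is omitted.

```python
import numpy as np


def epan_kdensity_sketch(x, x0=None, N=None, h=None):
    """Hypothetical Epanechnikov kernel density estimate (unweighted)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    if h is None:
        # Silverman's rule of thumb (one common variant; an assumption here)
        iqr = np.subtract(*np.percentile(x, [75, 25]))
        h = 0.9 * min(x.std(ddof=1), iqr / 1.349) * n ** (-1 / 5)
    if x0 is None:
        N = min(n, 50) if N is None else N
        x0 = np.linspace(x.min(), x.max(), N)
    x0 = np.atleast_1d(np.asarray(x0, dtype=float))
    u = (x0[:, None] - x[None, :]) / h
    # Epanechnikov kernel: 0.75 * (1 - u^2) on |u| <= 1, zero outside
    k = np.where(np.abs(u) <= 1, 0.75 * (1 - u ** 2), 0.0)
    f_hat = k.sum(axis=1) / (n * h)
    return x0, f_hat, {'h': h, 'kernel': 'epan'}
```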

econtools.metrics.llr(y: numpy.ndarray, x: numpy.ndarray, x0: Union[numpy.ndarray, pandas.core.frame.DataFrame, NoneType] = None, N: Union[int, NoneType] = None, h: Union[str, float, NoneType] = None, degree: int = 1, kernel: str = 'epan', ci: bool = False)

Local-linear Regression

Parameters:
  • y (array) – Dependent variable
  • x (array) – Independent variable
Keyword Arguments:
 
  • x0 (float or array-like) – Defaults to None. Values at which to calculate the regression. If None, these values are calculated automatically, and the default length of x0 is min([len(x), 50]). At least one of x0 and N must be None.
  • N (int) – Default None. Number of x0 values to calculate if x0 is not specified. At least one of x0 and N must be None.
  • h (str or float) – Defaults to None (Silverman’s rule of thumb). Bandwidth for kernel. May pass a float or any of the following for Silverman’s rule of thumb: 'silverman', 'thumb', 'rot'.
  • kernel (str) –

    Type of kernel to be used. Options are:

    • 'epan', Epanechnikov (default)
    • 'unif', Uniform
    • 'tria', Triangle
  • degree (int) – Defaults to 1. Degree of polynomial to use in local regression.
  • ci (bool) – Defaults to False. If True, also return confidence interval for each point.
Returns:

Stuff.
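The idea behind local-linear regression (degree=1) can be sketched as a kernel-weighted least-squares fit of y on (x − x0) around each evaluation point; the fitted intercept is the estimate at x0. llr_sketch is hypothetical: it requires an explicit numeric bandwidth h, uses only the Epanechnikov kernel, returns only fitted values, and implies nothing about what econtools.llr itself returns.

```python
import numpy as np


def llr_sketch(y, x, x0, h):
    """Hypothetical local-linear regression with an Epanechnikov kernel."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    x0 = np.atleast_1d(np.asarray(x0, dtype=float))
    fitted = np.empty(len(x0))
    for i, point in enumerate(x0):
        u = (x - point) / h
        w = np.where(np.abs(u) <= 1, 0.75 * (1 - u ** 2), 0.0)
        # Weighted least squares of y on [1, x - point]; intercept = fit at point
        X = np.column_stack([np.ones_like(x), x - point])
        sw = np.sqrt(w)
        coef = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)[0]
        fitted[i] = coef[0]
    return fitted
```

Because the local model is linear, the fit reproduces any straight line exactly, provided at least two distinct x values fall inside each window.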

LaTeX

econtools.outreg(regs: Union[econtools.metrics.results.Results, typing.Tuple[econtools.metrics.results.Results]], var_names: Union[list, NoneType] = None, var_labels: Union[list, NoneType] = None, digits: int = 4, stars: bool = True, se: str = '(', options: bool = False) → str

Create the guts of a LaTeX tabular environment from regression results.

Parameters:
  • regs (Results or iterable of Results) – Regressions to output to table.
  • var_names (str or iterable of str) – Variable names to pull from regs. If none are specified, the pandas DataFrame column names are used by default.
  • var_labels (str or iterable of str) – Pretty names for variables in table. If none specified, will use var_names.
Keyword Arguments:
 
  • digits (int) – Defaults to 4. How many digits to include past decimal.
  • stars (bool) – Defaults to True. If True, adds stars to mark statistical significance.
  • se (str) – Defaults to '('. Marker for standard errors. Brackets may be chosen instead with se='['.
  • options (bool) – Defaults to False. If True, also return a dict of the formatting options generated by outreg (name_just, stat_just, etc.) for use in additional calls to table_mainrow and table_statrow.
Returns:

LaTeX fragment meant to be wrapped in a tabular environment.

Return type:

str

econtools.table_mainrow(rowname: str, varname: Union[int, str], regs: Union[econtools.metrics.results.Results, typing.Tuple[econtools.metrics.results.Results]], name_just: int = 24, stat_just: int = 12, digits: int = 3, se: str = '(', stars: bool = True) → str

Add a table row of regression coefficients with standard errors.

Parameters:
  • rowname (str) – First cell of table row, i.e., the row’s name.
  • varname (str) – Name of variable to pull from Results object.
  • regs (Results or iterable of Results) – Regressions from which to pull coefficients named varname.
Keyword Arguments:
 
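  • name_just (int) – Defaults to 24. Width/justification of the rowname column.
  • stat_just (int) – Defaults to 12. Width/justification of the coefficient columns.
  • digits (int) – Defaults to 3. How many digits to include past the decimal.
  • se (str) – Defaults to '('. Marker for standard errors. May also choose brackets with se='['.
  • stars (bool) – Defaults to True. If True, adds stars to mark statistical significance.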
Returns:

String of table row.

Return type:

str
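To show how digits, se, and stars interact, here is a hypothetical formatter for a coefficient row followed by a standard-error row. mainrow_sketch and star_marks are not econtools functions; the significance cutoffs (one star below 0.10, two below 0.05, three below 0.01) and the exact cell layout are assumptions following a common convention, and econtools' own output may differ.

```python
def star_marks(pvalue, cutoffs=(0.10, 0.05, 0.01)):
    """One star for each cutoff the p-value falls below (assumed convention)."""
    return '*' * sum(pvalue < c for c in cutoffs)


def mainrow_sketch(rowname, betas, ses, pvalues, name_just=24, stat_just=12,
                   digits=3, se='(', stars=True):
    """Hypothetical two-line LaTeX row: coefficients, then standard errors."""
    close = {'(': ')', '[': ']'}[se]
    coefs = [f'{b:.{digits}f}' + (star_marks(p) if stars else '')
             for b, p in zip(betas, pvalues)]
    errs = [f'{se}{s:.{digits}f}{close}' for s in ses]

    def row(first, cells):
        # Pad the name cell and each stat cell, then end the LaTeX row with \\
        return (first.ljust(name_just)
                + ''.join(f'& {c}'.ljust(stat_just) for c in cells)
                + '\\\\\n')

    return row(rowname, coefs) + row('', errs)
```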

econtools.table_statrow(rowname: str, vals: Iterable, name_just: int = 24, stat_just: int = 12, wrapnum: bool = False, sd: bool = False, digits: Union[int, NoneType] = None, empty_left: int = 0, empty_right: int = 0, empty_slots: list = [], **kwargs) → str

Make a table row. Useful for bottom rows of regression tables (e.g., R-squared) or tables of summary statistics.

Parameters:
  • rowname (str) – Row’s name.
  • vals (iterable) – Values to fill the row’s cells. Empty cells can be added with ''.
Keyword Arguments:
 
  • name_just (int) – Width/justification of the rowname column.
  • stat_just (int) – Width/justification of the vals columns.
  • wrapnum (bool) – If True, wrap cell values in LaTeX function num, which automatically adds commas as needed. Requires LaTeX package siunitx in LaTeX document.
  • sd (bool or str) – If True, wrap cell value in parentheses as per convention. May also set sd="[" to wrap in brackets.
  • digits (int) – How many digits to print after decimal. If None, prints contents of vals exactly as is.
  • empty_left (int) – Adds empty_left empty cells to left side of row. Is mutually exclusive with empty_slots.
  • empty_right (int) – See empty_left.
  • empty_slots (list) – Make table row have empty cells at index values in empty_slots (zero-indexed). Mutually exclusive with empty_left and empty_right. For example, passing vals=(1, 2, 3) and empty_slots=(1, 3, 5) is the same as passing vals=(1, '', 2, '', 3, '').
Returns:

LaTeX tabular row with rowname and vals with the specified formatting.

Return type:

str

Example

>>> from econtools import table_statrow
>>> table_str = table_statrow("Method", ['OLS', '2SLS', 'LIML'])
>>> table_str += table_statrow("N", [100, 200, 300])
>>> print(table_str)
Method      & OLS   & 2SLS   & LIML  \\
N           & 100   & 200    & 300   \\
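The cell arithmetic behind empty_left, empty_right, and empty_slots can be sketched as list manipulation performed before the cells are formatted. pad_row and fill_empty_slots are hypothetical helpers; the second reproduces the empty_slots example given in the keyword-argument list for table_statrow.

```python
def pad_row(vals, empty_left=0, empty_right=0):
    """Add empty cells ('') to the left and right of the row values."""
    return [''] * empty_left + list(vals) + [''] * empty_right


def fill_empty_slots(vals, empty_slots):
    """Place '' at each zero-indexed position in empty_slots; vals fill the rest."""
    vals = list(vals)
    slots = set(empty_slots)
    total = len(vals) + len(slots)
    if any(i < 0 or i >= total for i in slots):
        raise ValueError('empty_slots index out of range for this row')
    remaining = iter(vals)
    return ['' if i in slots else next(remaining) for i in range(total)]
```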
econtools.write_notes(notes: str, table_path: str) → None

Write notes for a table.

Parameters:
  • notes (str) – String to write to disk.
  • table_path (str) – The filepath of the accompanying LaTeX table.
Returns:

Writes notes to <table_path_root>_notes.tex. So if table_path=table_1.tex, notes will be written to table_1_notes.tex.

Return type:

None

Example

from econtools import write_notes

table_path = 'table_1.tex'
notes = "Sample size is 277."
write_notes(notes, table_path)
# `notes` is written to `table_1_notes.tex`
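The notes path follows from the table path by replacing the extension with the _notes.tex suffix, which os.path.splitext handles. notes_path and write_notes_sketch are hypothetical helpers illustrating the documented naming rule, not econtools' code.

```python
import os


def notes_path(table_path):
    """Hypothetical helper: 'table_1.tex' -> 'table_1_notes.tex'."""
    root, _ext = os.path.splitext(table_path)
    return root + '_notes.tex'


def write_notes_sketch(notes, table_path):
    """Write the notes string next to the table file."""
    with open(notes_path(table_path), 'w') as f:
        f.write(notes)
```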

Plotting

econtools.binscatter(x: Union[str, numpy.ndarray], y: Union[str, numpy.ndarray], n: int = 20, data: Union[pandas.core.frame.DataFrame, NoneType] = None, discrete: bool = False, median: bool = False) → Tuple[numpy.ndarray, numpy.ndarray]

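Binscatter: group the x values into bins and compute the mean (or, with median=True, the median) of y within each bin.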

Parameters:
  • x (array or str) – x-axis values. If type str, column in data.
  • y (array or str) – y-axis values, same length as x. If type str, column in data.
Keyword Arguments:
 
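  • n (int) – Defaults to 20. Number of bins.
  • data (DataFrame) – Defaults to None. DataFrame containing the columns named by x and y when they are passed as str.
  • discrete (bool) – Defaults to False. If True, every unique value in x is given its own bin.
  • median (bool) – Defaults to False. Calculate the median for each bin instead of the mean. Only applies to y-axis values.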
Returns:

  • x_bin_value (array) - Array of x bin values.
  • y_bin_value (array) - Array of y bin values.

Return type:

tuple
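A common way to compute binscatter values is to sort observations by x, split them into n bins of roughly equal size, and summarize x and y within each bin. binscatter_sketch is hypothetical: the equal-count binning rule and the use of the within-bin mean of x as the x bin value are assumptions, and the discrete and data options are omitted. It also assumes n is no larger than the number of observations, since empty bins would produce NaN.

```python
import numpy as np


def binscatter_sketch(x, y, n=20, median=False):
    """Hypothetical binscatter: equal-count bins of x, mean (or median) of y."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    order = np.argsort(x, kind='stable')
    # Split the x-sorted observations into n bins of (nearly) equal size
    bins = np.array_split(order, n)
    stat = np.median if median else np.mean
    x_bin = np.array([x[b].mean() for b in bins])
    y_bin = np.array([stat(y[b]) for b in bins])
    return x_bin, y_bin
```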

econtools.legend_below(ax, *args, **kwargs) → None

Create a legend below and outside the main axis object.

Parameters:
  • ax (Axes) – The main matplotlib Axes object.
  • *args – Other args to pass to ax.legend.
  • **kwargs – Other keyword args to pass to ax.legend.
Keyword Arguments:
 
  • shrink (bool) – Default False. Should be True.
  • anchor (tuple) – 2-tuple to pass to bbox_to_anchor. This positions the legend relative to the rest of the Axes. If you need more space between the legend and your figure, make the second element more negative.
Returns:

Return type:

None

Reference

econtools.state_name_to_fips(name: str) → int

Take a state name and return its FIPS code as an int.

Parameters: name (str) – State name (e.g., Arizona).
Returns: FIPS code.
Return type: int

econtools.state_fips_to_name(fips: int) → str

Take a state FIPS code as an int and return the state name.

Parameters: fips (int) – State FIPS code.
Returns: State name (e.g., Colorado).
Return type: str

econtools.state_name_to_abbr(name: str) → str

Take a state name and return the 2-letter state abbreviation.

Parameters: name (str) – State name (e.g., Idaho).
Returns: 2-letter state abbreviation (e.g., ID).
Return type: str

econtools.state_abbr_to_name(abbr: str) → str

Take a 2-letter state abbreviation and return the state name.

Parameters: abbr (str) – 2-letter state abbreviation (e.g., CA).
Returns: State name (e.g., California).
Return type: str
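The four conversions form two inverse pairs of dictionary lookups. The sketch below is hypothetical and covers only the states used as examples in this section; the FIPS codes shown (Arizona 04, California 06, Colorado 08, Idaho 16) are the standard state codes, stored as ints to match state_name_to_fips.

```python
# Hypothetical excerpt: only four states are included here
STATE_FIPS = {'Arizona': 4, 'California': 6, 'Colorado': 8, 'Idaho': 16}
STATE_ABBR = {'Arizona': 'AZ', 'California': 'CA', 'Colorado': 'CO',
              'Idaho': 'ID'}

# Reverse mappings serve the *_to_name direction
FIPS_TO_STATE = {fips: name for name, fips in STATE_FIPS.items()}
ABBR_TO_STATE = {abbr: name for name, abbr in STATE_ABBR.items()}


def name_to_fips(name: str) -> int:
    return STATE_FIPS[name]


def fips_to_name(fips: int) -> str:
    return FIPS_TO_STATE[fips]


def name_to_abbr(name: str) -> str:
    return STATE_ABBR[name]


def abbr_to_name(abbr: str) -> str:
    return ABBR_TO_STATE[abbr]
```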