Function Signatures¶

Data I/O
Data Manipulation
Econometrics
LaTeX
Plotting
Reference

Data I/O ¶

econtools.load_or_build(raw_filepath: str, copydta: bool = False, path_args: list = []) → Callable¶

Loads raw_filepath as a DataFrame if it exists, otherwise builds the data and saves it to raw_filepath.

Parameters:	raw_filepath (str) – Path to saved DataFrame. If `raw_filepath` includes named replacement fields (e.g., `'{arg_name}'`) with the same names as function arguments, the passed values will be inserted into the file path.

Example

@load_or_build('data_for_{year}.pkl')
def make_data(year):
    <Make the data>

df = make_data(2018)    # Saves to 'data_for_2018.pkl'

Keyword Arguments:

copydta (bool) – Defaults to False. If True, save a copy of the data in Stata DTA format if `raw_filepath` is not already a DTA file.
path_args (list-like) – DEPRECATED: Use named replacement fields instead. A list of ints or strs that point to args or kwargs of the build function, respectively. The values of these arguments are then used to format `raw_filepath`.

Example

@load_or_build('file_{}_{}.csv', path_args=[0, 'b'])
def foo(a, b=None):
    return pd.DataFrame([a, b])

if __name__ == '__main__':
    # Saves `df` to `file_infix_suffix.csv`
    foo('infix', 'suffix')

Other Parameters:

Additional kwargs that can be passed to the wrapped function to change the behavior of load_or_build.
_rebuild (bool) – Defaults to False. If True, build the DataFrame and save it to `raw_filepath` even if `raw_filepath` already exists.
_load (bool) – Defaults to True. If True, try loading the data before building it. If False, the building function is called and the result returned with no data written to disk.
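
The load-or-build pattern described above can be sketched with the standard library alone. This is a simplified illustration, not econtools' implementation: `load_or_build_sketch` is a hypothetical name, it only handles pickle files, and it ignores `copydta` and `path_args`.

```python
import functools
import inspect
import os
import pickle
import tempfile


def load_or_build_sketch(raw_filepath):
    """Hypothetical, simplified stand-in for `load_or_build` (pickle only)."""
    def decorator(build_func):
        @functools.wraps(build_func)
        def wrapper(*args, _rebuild=False, _load=True, **kwargs):
            # Bind the call's arguments to their names so that fields such
            # as '{year}' in the path can be filled in.
            bound = inspect.signature(build_func).bind(*args, **kwargs)
            bound.apply_defaults()
            path = raw_filepath.format(**bound.arguments)
            if not _load:
                # Build and return without touching the disk.
                return build_func(*args, **kwargs)
            if os.path.exists(path) and not _rebuild:
                with open(path, 'rb') as f:
                    return pickle.load(f)
            data = build_func(*args, **kwargs)
            with open(path, 'wb') as f:
                pickle.dump(data, f)
            return data
        return wrapper
    return decorator


calls = []
tmpdir = tempfile.mkdtemp()


@load_or_build_sketch(os.path.join(tmpdir, 'data_for_{year}.pkl'))
def make_data(year):
    calls.append(year)                    # Record each real build
    return {'year': year}


first = make_data(2018)                   # Builds, saves data_for_2018.pkl
second = make_data(2018)                  # Loads from disk, no rebuild
third = make_data(2018, _rebuild=True)    # Forces a rebuild
```

After these three calls, `calls` holds two entries, because the second call was served from the saved file.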

econtools.save_cli() → bool¶

Add CLI boolean flag --save

Returns:	True if `--save` was entered on command line, else False.
Return type:	bool
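
A boolean `--save` flag of this kind can be built with `argparse`. The sketch below is an assumption about how such a helper might look, not the library's code; `save_cli_sketch` and its `argv` parameter are hypothetical and exist only so the example can run without a real command line.

```python
import argparse


def save_cli_sketch(argv=None):
    """Return True if `--save` appears in `argv` (default: sys.argv[1:])."""
    parser = argparse.ArgumentParser()
    parser.add_argument('--save', action='store_true')
    # Ignore any other command-line arguments rather than raising an error.
    args, _unknown = parser.parse_known_args(argv)
    return args.save


with_flag = save_cli_sketch(['--save'])
without_flag = save_cli_sketch([])
```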

econtools.confirmer(prompt_str: str, default_no: bool = True) → bool¶

Prompt user for yes/no answer.

Parameters:

prompt_str (str) – Prompt to show user.
default_no (bool) – Defaults to True. If True, the default response is ‘No’.

Returns:	True if user responded ‘Yes’, else False.
Return type:	bool
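
A yes/no prompt with a default answer might look like the following sketch. The name `confirmer_sketch`, the `[y/N]` suffix, the accepted replies, and the injectable `ask` argument are all assumptions made for illustration; the real function may word its prompt and parse replies differently.

```python
def confirmer_sketch(prompt_str, default_no=True, ask=input):
    """Hypothetical yes/no prompt; `ask` is injectable for testing."""
    suffix = ' [y/N] ' if default_no else ' [Y/n] '
    answer = ask(prompt_str + suffix).strip().lower()
    if not answer:
        # An empty reply falls back to the default response.
        return not default_no
    return answer in ('y', 'yes')


said_yes = confirmer_sketch('Overwrite?', ask=lambda _prompt: 'y')
took_default = confirmer_sketch('Overwrite?', ask=lambda _prompt: '')
```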

econtools.read(path: str, **kwargs) → pandas.core.frame.DataFrame¶

Read file to DataFrame by file’s extension.

Parameters:

path (str) – Path to read the file from. Supported file suffixes are:
- csv
- pkl (pickle)
- hdf (HDF5)
- dta (Stata)
**kwargs – Arbitrary keyword arguments to pass to the `pandas` read method.

Returns:	The file's contents as a DataFrame.
Return type:	DataFrame
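
Reading by extension amounts to looking up a `pandas` reader from the file suffix. The mapping below is an assumption based on the supported suffixes listed above; `read_sketch` is a hypothetical name, not the library's code.

```python
import os
import tempfile

import pandas as pd

# Assumed suffix-to-reader mapping.
READERS = {
    '.csv': pd.read_csv,
    '.pkl': pd.read_pickle,
    '.hdf': pd.read_hdf,
    '.dta': pd.read_stata,
}


def read_sketch(path, **kwargs):
    """Pick the pandas reader from the file suffix and call it."""
    suffix = os.path.splitext(path)[1]
    return READERS[suffix](path, **kwargs)


# Create a small CSV file so the reader has something to load.
path = os.path.join(tempfile.mkdtemp(), 'example.csv')
pd.DataFrame({'a': [1, 2], 'b': [3.5, 4.5]}).to_csv(path, index=False)
df = read_sketch(path)
```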

econtools.write(df: pandas.core.frame.DataFrame, path: str, **kwargs) → None¶

Write DataFrame to file by file’s extension.

Parameters:

df (DataFrame) – DataFrame to write to disk.
path (str) – Path to write the file to. Supported file suffixes are:
- csv
- pkl (pickle)
- hdf (HDF5)
- dta (Stata)
**kwargs – Arbitrary keyword arguments to pass to the `pandas` write method.

Returns:
Return type:	None
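
Writing by extension is the mirror image: the suffix selects a `DataFrame.to_*` method. The mapping and the name `write_sketch` are assumptions for illustration only.

```python
import os
import tempfile

import pandas as pd

# Assumed suffix-to-method mapping.
WRITERS = {
    '.csv': 'to_csv',
    '.pkl': 'to_pickle',
    '.hdf': 'to_hdf',
    '.dta': 'to_stata',
}


def write_sketch(df, path, **kwargs):
    """Pick the DataFrame writer method from the file suffix and call it."""
    suffix = os.path.splitext(path)[1]
    getattr(df, WRITERS[suffix])(path, **kwargs)


df = pd.DataFrame({'a': [1, 2], 'b': [3.5, 4.5]})
path = os.path.join(tempfile.mkdtemp(), 'example.pkl')
write_sketch(df, path)                  # Dispatches to DataFrame.to_pickle
round_trip = pd.read_pickle(path)
```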

Data Manipulation ¶

econtools.stata_merge(left: pandas.core.frame.DataFrame, right: pandas.core.frame.DataFrame, assertval: Union[int, NoneType] = None, gen: str = '_m', **kwargs) → pandas.core.frame.DataFrame¶

Merge two DataFrames via pandas.merge but with some additional features. Specifically, an additional column is added to the returned DataFrame with the default label '_m'. For each row of the returned DataFrame, '_m' equals 1 if the row existed only in left, 2 if the row existed only in right, and 3 if it exists in both, i.e., was successfully merged.

Parameters:

left (DataFrame) – Left DataFrame to merge.
right (DataFrame) – Right DataFrame to merge.

Keyword Arguments:

assertval (int) – Assert that all values of `'_m'` are equal to `assertval`. Under the default (`None`), no assertion is made.
gen (str) – Name of the merge status variable. Default is `'_m'`.
kwargs – Any standard keyword arg for `pandas.merge`, such as `on` or `how`.

Returns:	A `DataFrame` that is the merged output of `left` and `right`.
Return type:	`pandas.DataFrame`
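
The 1/2/3 merge codes can be reproduced with plain `pandas` by using `indicator=True` and mapping its labels. This sketch shows the idea only; it is not how `stata_merge` is necessarily implemented, and the sample data are made up.

```python
import pandas as pd

left = pd.DataFrame({'id': [1, 2, 3], 'x': [10, 20, 30]})
right = pd.DataFrame({'id': [2, 3, 4], 'z': ['b', 'c', 'd']})

# `indicator=True` adds a `_merge` column holding the labels mapped below.
merged = pd.merge(left, right, on='id', how='outer', indicator=True)
codes = {'left_only': 1, 'right_only': 2, 'both': 3}
merged['_m'] = merged['_merge'].astype(str).map(codes)
merged = merged.drop(columns='_merge')

# Analogue of `assertval=3`: check whether every row matched.
all_matched = bool((merged['_m'] == 3).all())
```

Here `all_matched` is False, because `id` 1 appears only in `left` and `id` 4 only in `right`.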

econtools.group_id(df: pandas.core.frame.DataFrame, cols: Union[list, NoneType] = None, name: str = 'group_id', merge: bool = False) → pandas.core.frame.DataFrame¶

Generate a unique integer ID from a DataFrame or columns of the DataFrame. Specifically, create a unique number for every combination of values in `cols`.

Parameters:

df (DataFrame) – DataFrame of interest.

Keyword Arguments:

cols (list) – List of columns to use for ID generation. Default (`None`) uses all columns in `df`.
name (str) – Name of the new ID variable. Default is `'group_id'`.
merge (bool) – Return the full input DataFrame `df` with the new group ID column merged on. Default is `False`.

Returns:	A `DataFrame` with the new group ID and `cols` if `merge=False`, or if `merge=True`, the input `DataFrame` with group ID merged on as a new column.
Return type:	`pandas.DataFrame`
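
One way to get an integer per combination of column values in plain `pandas` is `groupby(...).ngroup()`. The sketch below illustrates both return shapes described above; the numbering scheme that `group_id` itself uses may differ.

```python
import pandas as pd

df = pd.DataFrame({
    'state': ['CA', 'CA', 'NY', 'CA'],
    'year': [2000, 2001, 2000, 2000],
})
cols = ['state', 'year']

# `ngroup` numbers each distinct combination of `cols` (0, 1, 2, ...).
with_id = df.assign(group_id=df.groupby(cols).ngroup())   # like merge=True
id_table = with_id.drop_duplicates(subset=cols)           # like merge=False
```

Rows 0 and 3 share the same state and year, so they receive the same ID.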

Econometrics ¶

econtools.metrics.reg(df: pandas.core.frame.DataFrame, y_name: str, x_name: Union[str, typing.List[str]], fe_name: Union[str, NoneType] = None, a_name: Union[str, NoneType] = None, nosingles: bool = True, vce_type: Union[str, NoneType] = None, cluster: Union[str, NoneType] = None, shac: Union[dict, NoneType] = None, addcons: Union[bool, NoneType] = None, nocons: bool = False, awt_name: Union[str, NoneType] = None, save_mem: bool = False, check_colinear: bool = False) → econtools.metrics.results.Results¶

OLS Regression.

Parameters:

df (DataFrame) – Data with any relevant variables.
y_name (str) – Column name in `df` of the dependent variable.
x_name (str or list) – Column name(s) in `df` of the independent variables/regressors.

Keyword Arguments:

vce_type (str) – Type of estimator to use for variance-covariance matrix of estimated coefficients. Default is standard OLS. Possible choices are:
- 'robust' or 'hc1'
- 'hc2'
- 'hc3'
- 'cluster' (requires kwarg `cluster`)
- 'shac' (requires kwarg `shac`)
cluster (str) – Column name in `df` used to cluster standard errors.
shac (dict) – Arguments to pass to spatial HAC estimator. Requires:
- x (str): Column name in `df` to serve as longitude.
- y (str): Column name in `df` to serve as latitude.
- kern (str): Kernel to use in estimation. May be triangle (`tria`) or uniform (`unif`).
- band (float): Bandwidth for kernel.
fe_name (str) – Column name in `df` of the variable used for the within transformation (demeaning).
a_name (str) –
awt_name (str) – Column name in `df` to use for analytic weights in regression.
addcons (bool) – Defaults to False. Add a constant to independent variables. Has no effect if `a_name` is passed.
nocons (bool) – Defaults to False. Flag so estimators know that the independent variables in `df` do not include a constant. Only affects degrees of freedom.
nosingles (bool) – Defaults to True. Drop observations that are absorbed by the within transformation. Has no effect if `a_name=None`.
save_mem (bool) – Defaults to False. If True, the returned `Results` object will not save large objects, specifically `yhat`, `sample`, and `resid`.
check_colinear (bool) – Defaults to False. Checks rank of regressor matrix, X. If X is rank deficient, an error is raised that prints the collinear columns.

Returns:	A `Results` object
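
To make the `vce_type` options concrete, the sketch below computes OLS coefficients with both the standard and the HC1 ("robust") variance estimators using NumPy. It uses the textbook formulas on simulated data and is not taken from the library's source.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n)

X = np.column_stack([np.ones(n), x])        # Constant plus one regressor
XtX_inv = np.linalg.inv(X.T @ X)
beta = XtX_inv @ X.T @ y                    # OLS coefficients
resid = y - X @ beta
k = X.shape[1]

# Standard OLS variance: s^2 (X'X)^{-1}
vce_ols = (resid @ resid) / (n - k) * XtX_inv

# HC1 variance: n/(n-k) * (X'X)^{-1} X' diag(e^2) X (X'X)^{-1}
meat = (X * resid[:, None] ** 2).T @ X
vce_hc1 = n / (n - k) * XtX_inv @ meat @ XtX_inv

se_ols = np.sqrt(np.diag(vce_ols))
se_hc1 = np.sqrt(np.diag(vce_hc1))
```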

econtools.metrics.ivreg(df: pandas.core.frame.DataFrame, y_name: str, x_name: Union[str, typing.List[str]], z_name: Union[str, typing.List[str]], w_name: Union[str, typing.List[str]], fe_name: Union[str, NoneType] = None, a_name: Union[str, NoneType] = None, nosingles: bool = True, iv_method: str = '2sls', _kappa_debug=None, vce_type: Union[str, NoneType] = None, cluster: Union[str, NoneType] = None, shac: Union[dict, NoneType] = None, addcons: bool = False, nocons: bool = False, awt_name: Union[str, NoneType] = None, save_mem: bool = False, check_colinear: bool = False) → econtools.metrics.results.Results¶

Instrumental Variables Regression

Parameters:

df (DataFrame) – Data with any relevant variables.
y_name (str) – Column name in `df` of the dependent variable.
x_name (str or list) – Column name(s) in `df` of the endogenous regressor(s).
z_name (str or list) – Column name(s) in `df` of the excluded instrument(s).
w_name (str or list) – Column name(s) in `df` of the included instruments/exogenous regressors.

Keyword Arguments:

fe_name (str) – Column name in `df` of the variable used for the within transformation (demeaning).
iv_method (str) – Instrumental variables method to use. Options are:
- `'2sls'`, two-stage least squares (default)
- `'liml'`, limited-information maximum likelihood.
All other keyword args in `reg()` may also be used.

Returns:	A `Results` object with (a) no r-squared (`r2` or `r2_a` attributes), and (b) a `kappa` attribute (always 1 if `iv_method='2sls'`)
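
The `'2sls'` method can be illustrated with two least-squares steps in NumPy. The simulated data, variable names, and true coefficient are assumptions chosen so that OLS is visibly biased; LIML is not covered by this sketch.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5000
z = rng.normal(size=n)                  # Excluded instrument
u = rng.normal(size=n)                  # Unobserved confounder
x = z + u + rng.normal(size=n)          # Endogenous regressor
y = 2.0 * x + u + rng.normal(size=n)    # True coefficient on x is 2

ones = np.ones(n)
X = np.column_stack([x, ones])          # Endogenous regressor + constant
Z = np.column_stack([z, ones])          # Excluded + included instruments

# First stage: project the regressors onto the instruments.
X_hat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]
# Second stage: regress y on the first-stage fitted values.
beta_2sls = np.linalg.lstsq(X_hat, y, rcond=None)[0]
# Plain OLS for comparison; it is biased upward by the confounder.
beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]
```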

class econtools.metrics.core.Results(**kwargs)¶

Regression Results container.

summary¶: DataFrame – Summary of regression results.

beta¶: Series – All beta coefficients. Index is regressor names.

se¶: Series – Standard errors.

t_stat¶: Series – t-stats.

pt¶: Series – p-scores for t-stats.

ci_lo¶: Series – Confidence interval, lower bound.

ci_hi¶: Series – Confidence interval, upper bound.

r2¶: float – R-squared

r2_a¶: float – Adjusted R-squared.

K¶: int – Number of regressors

N¶: int – Number of observations

vce¶: DataFrame – K-by-K variance-covariance matrix.

F¶: float – F-stat of joint significance of beta coefficients.

pF¶: float – p-score for F-stat.

df_m¶: int – Model degrees of freedom (excluding constant).

df_r¶: int – Residual degrees of freedom.

ssr¶: float – Sum of squared residuals.

sst¶: float – Total sum of squares.

yhat¶: array – Fit values (\(X\hat{\beta}\))

resid¶: array – Regression residuals (\(\hat{\varepsilon}\))

sample¶: array – Boolean array the same length as the DataFrame passed to the original regression function. A row is True if the observation is included in the regression, False otherwise. The regression function automatically drops observations where the outcome, regressor, weights, etc., are missing/null.

Results.Ftest(col_names, equal=False)¶

F test using regression results.

Parameters:

col_names (str or list) – Regressor name(s) to test.

Keyword Arguments:

equal (bool) – Defaults to False. If True, test if all coefficients in `col_names` are equal. If False, test if `col_names` are jointly significant.

Returns:

A tuple containing:

F (float): F-stat.
pF (float): p-score for `F`.

Return type:	tuple

econtools.metrics.f_test(V: numpy.ndarray, R: numpy.ndarray, beta: numpy.ndarray, r: int, df_d: int) → Tuple[float, float]¶

Arbitrary F test.

Parameters:

V (array) – K-by-K variance-covariance matrix.
R (array) – K-by-K Test matrix.
beta (array) – Length-K vector of coefficient estimates.
r (array) – Length-K vector of null hypotheses.
df_d (int) – Denominator degrees of freedom.

Returns:

A tuple containing:

F (float): F-stat.
pF (float): p-score for F.

Return type:

tuple
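
For orientation, the textbook Wald form of the F statistic for the null hypothesis R·beta = r is sketched below with NumPy. Here `R` has one row per restriction, which may differ from the shapes `f_test` expects, and the p-value is omitted because it needs an F-distribution CDF; `wald_f_stat` is a hypothetical helper.

```python
import numpy as np


def wald_f_stat(V, R, beta, r):
    """Textbook Wald F statistic for H0: R @ beta = r (no p-value)."""
    R = np.atleast_2d(R)
    diff = R @ beta - r
    q = R.shape[0]                              # Number of restrictions
    return float(diff @ np.linalg.solve(R @ V @ R.T, diff) / q)


beta = np.array([1.0, 2.0])
V = np.diag([0.25, 1.0])                        # Coefficient variances
# Joint significance: both coefficients equal zero.
F_joint = wald_f_stat(V, np.eye(2), beta, np.zeros(2))
# Equality: the two coefficients are equal to each other.
F_equal = wald_f_stat(V, np.array([[1.0, -1.0]]), beta, np.zeros(1))
```

The two calls correspond to the `equal=False` and `equal=True` cases of `Results.Ftest`.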

econtools.metrics.kdensity(x: Union[numpy.ndarray, pandas.core.frame.DataFrame], x0: Union[float, numpy.ndarray, pandas.core.frame.DataFrame, NoneType] = None, N: Union[int, NoneType] = None, h: Union[str, float, NoneType] = None, wt: Union[numpy.ndarray, pandas.core.frame.DataFrame, NoneType] = None, kernel: str = 'epan') → Tuple[[Union[numpy.ndarray, pandas.core.frame.DataFrame], numpy.ndarray], dict]¶

Kernel density estimation.

Parameters:

x (array-like) – Variable over which to estimate density.

Keyword Arguments:

x0 (float or array-like) – Default `None`. Values at which to calculate density. If `None`, these values will be calculated automatically. Default length of x0 is min([len(x), 50]). At least one of `x0` and `N` must be `None`. `x0` may also be a scalar.
N (int) – Default `None`. Number of `x0` values to calculate if `x0` is not specified. At least one of `x0` and `N` must be `None`.
h (str or float) – Defaults to None (Silverman’s rule of thumb). Bandwidth for kernel. May pass a float or any of the following for Silverman’s rule of thumb: `'silverman'`, `'thumb'`, `'rot'`.
kernel (str) – Type of kernel to be used. Options are:
- `'epan'`, Epanechnikov (default)
- `'unif'`, Uniform
- `'tria'`, Triangle
wt (array-like) – Weights. Must be same length as `x`.

Returns:

A tuple containing:

x0 (float or array) - Points at which the kernel density is estimated. If `x0` is passed explicitly, this will be the same.
f_hat (float or array) - Estimated kernel density at point(s) `x0`.
est_stats (dict) - Contains bandwidth and kernel name.

Return type:	tuple
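
A minimal Epanechnikov density estimate can be written in a few lines of NumPy. This sketch is unweighted and takes the bandwidth explicitly; the rule-of-thumb constant used below is the common normal-reference value and is an assumption, since the exact formula behind `'silverman'` is not stated here.

```python
import numpy as np


def kdensity_sketch(x, x0, h):
    """Epanechnikov kernel density of `x` evaluated at the points `x0`."""
    x = np.asarray(x, dtype=float)
    x0 = np.atleast_1d(np.asarray(x0, dtype=float))
    u = (x0[:, None] - x[None, :]) / h          # Scaled distances
    k = np.where(np.abs(u) <= 1, 0.75 * (1 - u ** 2), 0.0)
    return k.sum(axis=1) / (len(x) * h)


rng = np.random.default_rng(0)
x = rng.normal(size=2000)
# Normal-reference rule of thumb (assumed constant of 1.06).
h = 1.06 * x.std(ddof=1) * len(x) ** (-1 / 5)
grid = np.linspace(-3, 3, 50)                   # 50 evaluation points
f_hat = kdensity_sketch(x, grid, h)
```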

econtools.metrics.llr(y: numpy.ndarray, x: numpy.ndarray, x0: Union[numpy.ndarray, pandas.core.frame.DataFrame, NoneType] = None, N: Union[int, NoneType] = None, h: Union[str, float, NoneType] = None, degree: int = 1, kernel: str = 'epan', ci: bool = False)¶

Local-linear Regression

Parameters:

y (array) – Dependent variable.
x (array) – Independent variable.

Keyword Arguments:

x0 (float or array-like) – Default `None`. Values at which to calculate regression. If `None`, these values will be calculated automatically. Default length of x0 is min([len(x), 50]). At least one of `x0` and `N` must be `None`.
N (int) – Default `None`. Number of `x0` values to calculate if `x0` is not specified. At least one of `x0` and `N` must be `None`.
h (str or float) – Defaults to None (Silverman’s rule of thumb). Bandwidth for kernel. May pass a float or any of the following for Silverman’s rule of thumb: `'silverman'`, `'thumb'`, `'rot'`.
kernel (str) – Type of kernel to be used. Options are:
- `'epan'`, Epanechnikov (default)
- `'unif'`, Uniform
- `'tria'`, Triangle
degree (int) – Defaults to 1. Degree of polynomial to use in local regression.
ci (bool) – Defaults to False. If True, also return confidence interval for each point.

Returns:	Stuff.
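
Local-linear regression fits a kernel-weighted line around each evaluation point and reports the intercept. The sketch below shows the degree-1 case with an Epanechnikov kernel and an explicit bandwidth; `llr_sketch` is a hypothetical helper and does not reproduce the return format of `llr`.

```python
import numpy as np


def llr_sketch(y, x, x0, h):
    """Local-linear fit at each point in `x0` (Epanechnikov kernel)."""
    fitted = []
    for point in np.atleast_1d(x0):
        u = (x - point) / h
        w = np.where(np.abs(u) <= 1, 0.75 * (1 - u ** 2), 0.0)
        # Centering x at `point` makes the intercept the fitted value there.
        X = np.column_stack([np.ones_like(x), x - point])
        sw = np.sqrt(w)
        # Weighted least squares via square-root weights.
        coef = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)[0]
        fitted.append(coef[0])
    return np.array(fitted)


x = np.linspace(0, 1, 101)
y = 3.0 + 2.0 * x                   # Exactly linear, so the fit is exact
y_hat = llr_sketch(y, x, x0=[0.25, 0.5], h=0.2)
```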

LaTeX ¶

econtools.outreg(regs: Union[econtools.metrics.results.Results, typing.Tuple[econtools.metrics.results.Results]], var_names: Union[list, NoneType] = None, var_labels: Union[list, NoneType] = None, digits: int = 4, stars: bool = True, se: str = '(', options: bool = False) → str¶

Create the guts of a LaTeX tabular environment from regression results.

Parameters:

regs (Results or iterable of Results) – Regressions to output to table.
var_names (str or iterable of str) – Variable names to pull from regs. If none specified, by default uses the pandas DataFrame column names.
var_labels (str or iterable of str) – Pretty names for variables in table. If none specified, will use var_names.

Keyword Arguments:

digits (int) – Defaults to 4. How many digits to include past decimal.
stars (bool) – Defaults to True. If True, adds stars to mark statistical significance.
se (str) – Defaults to `"("`. Marker for standard errors. May also choose brackets with `se="["`.
options (bool) – Defaults to False. If True, return a `dict` with formatting options that were generated by `outreg`: `name_just`, `stat_just`, etc., for additional calls to `table_mainrow` and `table_statrow`.

Returns:	LaTeX fragment meant to be wrapped in a tabular environment.
Return type:	str

econtools.table_mainrow(rowname: str, varname: Union[int, str], regs: Union[econtools.metrics.results.Results, typing.Tuple[econtools.metrics.results.Results]], name_just: int = 24, stat_just: int = 12, digits: int = 3, se: str = '(', stars: bool = True) → str¶

Add a table row of regression coefficients with standard errors.

Parameters:

rowname (str) – First cell of table row, i.e., the row’s name.
varname (str) – Name of variable to pull from `Results` object.
regs (Results or iterable of Results) – Regressions from which to pull coefficients named `varname`.

Keyword Arguments:

name_just (int) –
stat_just (int) –
digits (int) –
se (str) –
stars (bool) –

Returns:	String of table row.
Return type:	str

econtools.table_statrow(rowname: str, vals: Iterable, name_just: int = 24, stat_just: int = 12, wrapnum: bool = False, sd: bool = False, digits: Union[int, NoneType] = None, empty_left: int = 0, empty_right: int = 0, empty_slots: list = [], **kwargs) → str¶

Make a table row. Useful for bottom rows of regression tables (e.g., R-squared) or tables of summary statistics.

Parameters:

rowname (str) – Row’s name.
vals (iterable) – Values to fill cell rows. Can add empty cells with `''`.

Keyword Arguments:

name_just (int) – Width/justification of the `rowname` column.
stat_just (int) – Width/justification of the `vals` columns.
wrapnum (bool) – If True, wrap cell values in the LaTeX command `\num`, which automatically adds commas as needed. Requires LaTeX package `siunitx` in LaTeX document.
sd (bool or str) – If True, wrap cell value in parentheses as per convention. May also set `sd="["` to wrap in brackets.
digits (int) – How many digits to print after decimal. If `None`, prints contents of `vals` exactly as is.
empty_left (int) – Adds `empty_left` empty cells to left side of row. Is mutually exclusive with `empty_slots`.
empty_right (int) – See `empty_left`.
empty_slots (list) – Make table row have empty cells at index values in `empty_slots` (zero-indexed). Mutually exclusive with `empty_left` and `empty_right`. For example, passing `vals=(1, 2, 3)` and `empty_slots=(1, 3, 5)` is the same as passing `vals=(1, '', 2, '', 3, '')`.

Returns:	LaTeX tabular row with `rowname` and `vals` with the specified formatting.
Return type:	str

Example

>>> table_str = table_statrow("Method", ['OLS', '2SLS', 'LIML'])
>>> table_str += table_statrow("N", [100, 200, 300])
>>> print(table_str)
Method      & OLS   & 2SLS   & LIML  \\
N           & 100   & 200    & 300   \\
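
The `empty_slots` rule and the basic row layout can be sketched in pure Python. Both helpers below are hypothetical and only approximate the spacing shown above; they ignore `wrapnum`, `sd`, `empty_left`, and `empty_right`.

```python
def fill_empty_slots(vals, empty_slots):
    """Insert '' at the zero-indexed positions listed in `empty_slots`."""
    total = len(vals) + len(empty_slots)
    remaining = iter(vals)
    return ['' if i in empty_slots else next(remaining)
            for i in range(total)]


def statrow_sketch(rowname, vals, name_just=12, stat_just=6, digits=None):
    """Hypothetical formatter for one LaTeX tabular row."""
    cells = []
    for val in vals:
        if digits is not None and isinstance(val, float):
            val = f'{val:.{digits}f}'       # Fixed number of decimals
        cells.append(str(val).ljust(stat_just))
    return rowname.ljust(name_just) + '& ' + ' & '.join(cells) + ' \\\\\n'


padded = fill_empty_slots((1, 2, 3), (1, 3, 5))
row = statrow_sketch('R-squared', [0.1234, 0.5], digits=2)
```

Here `padded` equals `[1, '', 2, '', 3, '']`, matching the `empty_slots` example in the keyword arguments.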

econtools.write_notes(notes: str, table_path: str) → None¶

Write notes for a table.

Parameters:	notes (str) – String to write to disk. table_path (str) – The filepath of the accompanying LaTeX table.
Returns:	Writes `notes` to `<table_path_root>_notes.tex`. So if `table_path=table_1.tex`, `notes` will be written to `table_1_notes.tex`.
Return type:	None

Example

table_path = 'table_1.tex'
notes = "Sample size is 277."
write_notes(notes, table_path)
# str ``notes`` written to ``table_1_notes.tex``
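
The path rule stated under Returns can be sketched with `os.path.splitext`. The helper names are hypothetical, and the sketch assumes the notes file always takes a `.tex` suffix, as in the example above.

```python
import os
import tempfile


def notes_path(table_path):
    """Map 'table_1.tex' to 'table_1_notes.tex'."""
    root, _ext = os.path.splitext(table_path)
    return root + '_notes.tex'


def write_notes_sketch(notes, table_path):
    """Write `notes` next to the table, using the derived file name."""
    with open(notes_path(table_path), 'w') as f:
        f.write(notes)


table_path = os.path.join(tempfile.mkdtemp(), 'table_1.tex')
write_notes_sketch('Sample size is 277.', table_path)
with open(notes_path(table_path)) as f:
    saved = f.read()
```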

Plotting ¶

econtools.binscatter(x: Union[str, numpy.ndarray], y: Union[str, numpy.ndarray], n: int = 20, data: Union[pandas.core.frame.DataFrame, NoneType] = None, discrete: bool = False, median: bool = False) → Tuple[numpy.ndarray, numpy.ndarray]¶

Binscatter.

Parameters:

x (array or str) – x-axis values. If type `str`, column in `data`.
y (array or str) – y-axis values, same length as `x`. If type `str`, column in `data`.

Keyword Arguments:

n (int) – Default 20. Number of bins.
discrete (bool) – Default False. If True, every unique value in `x` is given its own bin.
median (bool) – Default False. Calculate the median for each bin instead of the mean. Only applies to y-axis values.

Returns:

x_bin_value (array) - Array of x bin values.
y_bin_value (array) - Array of y bin values.

Return type:	tuple
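
The idea behind a binscatter is to sort observations by `x`, split them into `n` bins, and summarize each bin. The sketch below uses equal-count bins, which is an assumption; how `binscatter` itself chooses bin edges, and how it handles `discrete`, is not covered.

```python
import numpy as np


def binscatter_sketch(x, y, n=20, median=False):
    """Summarize `y` within `n` equal-count bins ordered by `x`."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    order = np.argsort(x)
    x_bins = np.array_split(x[order], n)
    y_bins = np.array_split(y[order], n)
    center = np.median if median else np.mean   # Only affects y values
    x_bin_value = np.array([b.mean() for b in x_bins])
    y_bin_value = np.array([center(b) for b in y_bins])
    return x_bin_value, y_bin_value


x = np.arange(100)
y = 2.0 * x
x_bin_value, y_bin_value = binscatter_sketch(x, y, n=4)
```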

econtools.legend_below(ax, *args, **kwargs) → None¶

Create a legend below and outside the main axis object.

Parameters:

ax (Axis) – The main `Axis` object.
*args – Other args to pass to `ax.legend`.
**kwargs – Other keyword args to pass to `ax.legend`.

Keyword Arguments:

shrink (bool) – Default False. Should be True.
anchor (tuple) – 2-tuple to pass to `bbox_to_anchor`. This aligns the legend to the rest of the Axis. If you need more space between the legend and your figure, make the second value more negative.

Returns:
Return type:	None

Reference ¶

econtools.state_name_to_fips(name: str) → int¶

Take state name and return fips as int

Parameters:	name (str) – State name (e.g., Arizona).
Returns:	FIPS code.
Return type:	int

econtools.state_fips_to_name(fips: int) → str¶

Take fips as int and return state name

Parameters:	fips (int) – State FIPS.
Returns:	State name (e.g., Colorado)
Return type:	str

econtools.state_name_to_abbr(name: str) → str¶

Take state name, return 2-letter state abbreviation

Parameters:	name (str) – State name (e.g., Idaho)
Returns:	2-letter state abbreviation (e.g., ID)
Return type:	str

econtools.state_abbr_to_name(abbr: str) → str¶

Take 2-letter state abbreviation, return name

Parameters:	abbr (str) – 2-letter state abbreviation (e.g., CA)
Returns:	State name (e.g., California)
Return type:	str
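
These four converters are lookups between state names, abbreviations, and FIPS codes. The sketch below shows the idea with a four-state excerpt; the table and dictionary names are hypothetical, and the real functions cover every state.

```python
# Four-state excerpt for illustration, with the standard state FIPS codes.
STATES = [
    ('Arizona', 'AZ', 4),
    ('California', 'CA', 6),
    ('Colorado', 'CO', 8),
    ('Idaho', 'ID', 16),
]

NAME_TO_FIPS = {name: fips for name, _abbr, fips in STATES}
FIPS_TO_NAME = {fips: name for name, fips in NAME_TO_FIPS.items()}
NAME_TO_ABBR = {name: abbr for name, abbr, _fips in STATES}
ABBR_TO_NAME = {abbr: name for name, abbr in NAME_TO_ABBR.items()}

fips = NAME_TO_FIPS['Arizona']              # Name -> FIPS
abbr = NAME_TO_ABBR[FIPS_TO_NAME[16]]       # FIPS -> name -> abbreviation
```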
