delphi package

Subpackages

Submodules

delphi.db module

delphi.evaluation module

exception Error

Bases: Exception

Base class for exceptions in this module.

exception InputError(expression, message)

Bases: delphi.evaluation.Error

Exception raised for errors in the input.

Parameters:
  • expression – Input expression in which the error occurred.

  • message – Explanation of the error.

calculate_timestep(start_year: int, start_month: int, end_year: int, end_month: int) -> int

Utility function that converts a time range, given by a start date and an end date, into an integer time step.

Parameters:
  • start_year – The starting year (e.g., 2012)

  • start_month – Starting month (1-12)

  • end_year – Ending year

  • end_month – Ending month

Returns:

The computed time step.
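The module's Error/InputError hierarchy and calculate_timestep can be illustrated together. The sketch below assumes that one time step is one month (so January to December of the same year gives 11) and that a start date after the end date is rejected with InputError; both behaviours are assumptions, not confirmed by this reference, and calculate_timestep_sketch is a hypothetical name.

```python
class Error(Exception):
    """Base class for exceptions in this module (mirrors the documented hierarchy)."""


class InputError(Error):
    """Exception raised for errors in the input."""

    def __init__(self, expression, message):
        super().__init__(message)
        self.expression = expression  # input expression in which the error occurred
        self.message = message  # explanation of the error


def calculate_timestep_sketch(
    start_year: int, start_month: int, end_year: int, end_month: int
) -> int:
    """Hypothetical helper: number of months from the start date to the end date.

    ASSUMPTION: one time step == one month; the real calculate_timestep
    may define the step differently.
    """
    for month in (start_month, end_month):
        if not 1 <= month <= 12:
            raise InputError(month, "Months must be in the range 1-12.")
    steps = (end_year - start_year) * 12 + (end_month - start_month)
    if steps < 0:
        raise InputError(
            (start_year, start_month, end_year, end_month),
            "Starting date cannot exceed ending date.",
        )
    return steps


print(calculate_timestep_sketch(2012, 1, 2012, 12))  # 11
print(calculate_timestep_sketch(2012, 6, 2015, 1))  # 31
```

Because InputError subclasses Error, callers can catch either the specific input problem or any exception from this module.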

get_data_value(indicator: str, country: Optional[str] = 'South Sudan', state: Optional[str] = None, county: Optional[str] = None, year: int = 2015, month: int = 1, unit: Optional[str] = None, use_heuristic: bool = False) -> List[float]

Get an indicator value from the delphi database.

Parameters:
  • indicator – Name of the target indicator variable.

  • country – Specified country to get a value for.

  • state – Specified state to get a value for.

  • county – Specified county to get a value for.

  • year – Specified year to get a value for.

  • month – Specified month to get a value for.

  • unit – Specified units to get a value for.

  • use_heuristic – A boolean that indicates whether or not to use a built-in heuristic for partially missing data. In cases where data for a given year exists but no monthly data, setting this to true divides the yearly value by 12 for any month.

Returns:

The indicator values matching the specified parameters, as a list of floats.
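The use_heuristic fallback can be sketched against a toy in-memory table. Here data is a hypothetical dict keyed by (indicator, year, month), with month None standing for a yearly value; the real function queries the delphi database, and returning an empty list when nothing is found is an assumption.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical stand-in for the delphi database: (indicator, year, month) -> values.
# month=None marks a yearly value.
Key = Tuple[str, int, Optional[int]]


def lookup_with_heuristic(
    data: Dict[Key, List[float]],
    indicator: str,
    year: int,
    month: int,
    use_heuristic: bool = False,
) -> List[float]:
    """Return monthly values; optionally fall back to the yearly value / 12."""
    monthly = data.get((indicator, year, month))
    if monthly:
        return monthly
    if use_heuristic:
        yearly = data.get((indicator, year, None), [])
        # Documented heuristic: spread the yearly value evenly over 12 months.
        return [value / 12 for value in yearly]
    return []  # ASSUMED behaviour when no data exists


toy = {
    ("Rainfall", 2015, 1): [30.0],
    ("Rainfall", 2016, None): [1200.0],
}
print(lookup_with_heuristic(toy, "Rainfall", 2015, 1))  # [30.0]
print(lookup_with_heuristic(toy, "Rainfall", 2016, 3, use_heuristic=True))  # [100.0]
print(lookup_with_heuristic(toy, "Rainfall", 2016, 3))  # []
```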

data_to_list(indicator: str, start_year: int, start_month: int, end_year: int, end_month: int, use_heuristic: bool = False, **kwargs) -> Tuple[List[str], List[List[int]]]

Get the true values of the indicator variable given a start date and end date. Allows for other specifications as well.

Parameters:
  • indicator – Name of the target indicator variable.

  • start_year – An integer, designates the starting year (e.g., 2012).

  • start_month – An integer, starting month (1-12).

  • end_year – An integer, ending year.

  • end_month – An integer, ending month.

  • use_heuristic – A boolean that indicates whether or not to use a built-in heuristic for partially missing data. In cases where data for a given year exists but no monthly data, setting this to true divides the yearly value by 12 for any month.

  • **kwargs – These are options with which you can specify country, state, and units.

Returns:

Returns a tuple where the first element is a list of the specified dates in year-month format and the second element is a list of lists of the true data for a given indicator. Each element of the outer list represents a time step and the inner lists contain the data for that time point.
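The shape of this return value can be illustrated with a sketch that enumerates the months in the requested range and pairs each with the values found for it. The "YYYY-MM" label format, the fetch callback, and the name data_to_list_sketch are assumptions; the reference only says the dates are in year-month format.

```python
from typing import Callable, Iterator, List, Tuple


def month_range(
    start_year: int, start_month: int, end_year: int, end_month: int
) -> Iterator[Tuple[int, int]]:
    """Yield (year, month) pairs from the start date to the end date, inclusive."""
    year, month = start_year, start_month
    while (year, month) <= (end_year, end_month):
        yield year, month
        month += 1
        if month > 12:
            year, month = year + 1, 1


def data_to_list_sketch(
    fetch: Callable[[int, int], List[float]],
    start_year: int,
    start_month: int,
    end_year: int,
    end_month: int,
) -> Tuple[List[str], List[List[float]]]:
    """Hypothetical: build (dates, values) where values[i] holds all data for dates[i]."""
    dates, values = [], []
    for year, month in month_range(start_year, start_month, end_year, end_month):
        dates.append(f"{year}-{month:02d}")  # ASSUMED label format
        values.append(fetch(year, month))
    return dates, values


# Toy fetch returning two observations per month.
dates, values = data_to_list_sketch(lambda y, m: [float(m), float(m) + 1], 2012, 11, 2013, 2)
print(dates)  # ['2012-11', '2012-12', '2013-01', '2013-02']
print(values)  # [[11.0, 12.0], [12.0, 13.0], [1.0, 2.0], [2.0, 3.0]]
```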

mean_data_to_df(indicator: str, start_year: int, start_month: int, end_year: int, end_month: int, use_heuristic: bool = False, ci: Optional[float] = None, **kwargs) -> pandas.core.frame.DataFrame

Get the true values of the indicator variable given a start date and end date. Allows for other specifications as well.

Parameters:
  • indicator – Name of the target indicator variable.

  • start_year – An integer, designates the starting year (e.g., 2012).

  • start_month – An integer, starting month (1-12).

  • end_year – An integer, ending year.

  • end_month – An integer, ending month.

  • use_heuristic – A boolean that indicates whether or not to use a built-in heuristic for partially missing data. In cases where data for a given year exists but no monthly data, setting this to true divides the yearly value by 12 for any month.

  • ci – Confidence level. Only the mean is reported if left as None.

  • **kwargs – These are options with which you can specify country, state, and units.

Returns:

pandas DataFrame containing the true values for the target node’s indicator variable, indexed by date.
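To show how ci changes the output, the sketch below reduces data_to_list-style input (one list of values per date) to a per-date mean and, when ci is given, a confidence interval for that mean. The normal approximation, the column names, and returning plain dicts instead of a pandas DataFrame are assumptions made to keep the example dependency-free.

```python
from statistics import NormalDist, mean, stdev
from typing import Dict, List, Optional


def mean_with_ci(
    dates: List[str], values: List[List[float]], ci: Optional[float] = None
) -> Dict[str, Dict[str, float]]:
    """Hypothetical: per-date mean, plus lower/upper bounds when ci is set."""
    rows = {}
    for date, vals in zip(dates, values):
        if not vals:
            continue  # no data for this date
        row = {"mean": mean(vals)}
        # Bounds need at least two values to estimate a standard deviation.
        if ci is not None and len(vals) > 1:
            # Two-sided z critical value (about 1.96 for ci=0.95).
            z = NormalDist().inv_cdf(0.5 + ci / 2)
            half_width = z * stdev(vals) / len(vals) ** 0.5
            row["lower"] = row["mean"] - half_width
            row["upper"] = row["mean"] + half_width
        rows[date] = row
    return rows


table = mean_with_ci(["2015-01", "2015-02"], [[1.0, 3.0], [4.0]], ci=0.95)
print(table["2015-01"]["mean"])  # 2.0
print(table["2015-02"])  # {'mean': 4.0}
```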

pred_to_array(preds: Tuple[Tuple[Tuple[int, int], Tuple[int, int]], List[str], List[List[Dict[str, Dict[str, float]]]]], indicator: str) -> numpy.ndarray

Outputs raw predictions for a given indicator that were generated by generate_prediction(). Each column is a time step and the rows are the samples for that time step.

Parameters:
  • preds – This is the entire prediction set returned by the generate_prediction() method in AnalysisGraph.cpp.

  • indicator – A string representing the indicator variable for which we want predictions.

Returns:

np.ndarray
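The nesting of preds is easiest to see with a toy value. The sketch below assumes that preds[2] is indexed as [sample][time step][concept][indicator] and that the requested indicator may sit under any concept; the indexing order, the toy concept and indicator names, and the helper name pred_to_array_sketch are assumptions inferred from the type annotation. The result follows the documented layout: rows are samples, columns are time steps.

```python
from typing import Dict, List, Tuple

import numpy as np

Preds = Tuple[
    Tuple[Tuple[int, int], Tuple[int, int]],
    List[str],
    List[List[Dict[str, Dict[str, float]]]],
]


def pred_to_array_sketch(preds: Preds, indicator: str) -> np.ndarray:
    """Hypothetical: samples x time steps array of one indicator's predictions."""
    _, _, samples = preds
    rows = []
    for sample in samples:  # ASSUMED: outer list = samples
        row = []
        for step in sample:  # ASSUMED: inner list = time steps
            # Find the indicator under whichever concept it is attached to.
            found = [inds[indicator] for inds in step.values() if indicator in inds]
            if not found:
                raise KeyError(f"Indicator {indicator!r} not found")
            row.append(found[0])
        rows.append(row)
    return np.array(rows, dtype=float)


toy_preds: Preds = (
    ((2015, 1), (2015, 2)),
    ["2015-01", "2015-02"],
    [
        [{"rain": {"Rainfall": 1.0}}, {"rain": {"Rainfall": 2.0}}],
        [{"rain": {"Rainfall": 1.5}}, {"rain": {"Rainfall": 2.5}}],
    ],
)
arr = pred_to_array_sketch(toy_preds, "Rainfall")
print(arr.shape)  # (2, 2)
print(arr[:, 1].tolist())  # [2.0, 2.5]
```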

mean_pred_to_df(preds: Tuple[Tuple[Tuple[int, int], Tuple[int, int]], List[str], List[List[Dict[str, Dict[str, float]]]]], indicator: str, ci: Optional[float] = 0.95, true_vals: bool = False, use_heuristic_for_true: bool = False, **kwargs) -> pandas.core.frame.DataFrame

Outputs mean predictions for a given indicator that were generated by generate_prediction(). The rows are indexed by date. Other output includes the confidence intervals for the mean predictions and, with true_vals = True, the true data values, residual errors, and error bounds. Setting true_vals = True assumes that real data exists for the given prediction range; a heuristic estimate is calculated for each missing data value in the true dataset.

Parameters:
  • preds – This is the entire prediction set returned by the generate_prediction() method in AnalysisGraph.cpp.

  • indicator – A string representing the indicator variable for which we want the mean predictions and related output.

  • ci – Confidence level (as a decimal). Default is 0.95, or 95%.

  • true_vals – A boolean. If set to True, the true data values, residual errors, and error bounds are returned in the dataframe. If set to False, only the mean predictions and the confidence intervals for the mean predictions are returned.

  • **kwargs – Here country, state, and units can be specified. The same kwargs (excluding k) …

Returns:

pandas DataFrame
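When true_vals is set, the residual errors and a root-mean-square error (as the name calculate_prediction_rmse suggests) can be sketched from per-date mean predictions and true values. Defining the residual as true minus predicted is an assumption, and this sketch simply skips dates without a true value, whereas the text above says missing true values are filled with a heuristic estimate.

```python
import math
from typing import List, Optional


def residuals(
    mean_preds: List[float], true_vals: List[Optional[float]]
) -> List[Optional[float]]:
    """Residual per date (ASSUMED sign: true - predicted); None where no true value exists."""
    return [None if t is None else t - p for p, t in zip(mean_preds, true_vals)]


def rmse(mean_preds: List[float], true_vals: List[Optional[float]]) -> float:
    """Root-mean-square error over the dates that have true values."""
    errs = [e for e in residuals(mean_preds, true_vals) if e is not None]
    if not errs:
        raise ValueError("No true values to compare against.")
    return math.sqrt(sum(e * e for e in errs) / len(errs))


mean_preds_toy = [1.0, 2.0, 4.0]
truth_toy = [2.0, None, 1.0]
print(residuals(mean_preds_toy, truth_toy))  # [1.0, None, -3.0]
print(rmse(mean_preds_toy, truth_toy))  # sqrt((1 + 9) / 2), about 2.236
```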

calculate_prediction_rmse(preds: Tuple[Tuple[Tuple[int, int], Tuple[int, int]], List[str], List[List[Dict[str, Dict[str, float]]]]], indicator: str, **kwargs) -> float

pred_plot(preds: Tuple[Tuple[Tuple[int, int], Tuple[int, int]], List[str], List[List[Dict[str, Dict[str, float]]]]], indicator: str, ci: Optional[float] = 0.95, plot_type: str = 'Prediction', show_rmse: bool = False, show_training_data: bool = False, save_as: Optional[str] = None, use_heuristic_for_true: bool = False, **kwargs) -> None

Creates a line plot of the mean predictions for a given indicator that were generated by generate_prediction(). The y-axis shows the indicator values (or errors) and the x-axis shows the prediction dates. Certain settings assume that true data exists for the given prediction range.

There are three plot types:

- Prediction (default): Plots just the mean prediction with confidence bounds.

- Comparison: Plots the same as Prediction, but includes a line representing the true data values for the given prediction range.

- Error: Plots the residual errors between the mean prediction and true values, along with error bounds. A reference line is included at 0.

Parameters:
  • preds – This is the entire prediction set returned by the generate_prediction() method in AnalysisGraph.cpp.

  • indicator – A string representing the indicator variable for which we want the mean predictions and related output.

  • ci – Confidence level (as a decimal). Default is 0.95, or 95%.

  • plot_type – A string that specifies the plot type. Set as ‘Prediction’ (default), ‘Comparison’, or ‘Error’.

  • save_as – A string representing the path and file name in which to save the plot; it must include the file extension. If None, no figure is saved to file.

  • **kwargs – Here country, state, and units can be specified. The same kwargs (excluding k) …

Returns:

None
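The three plot types differ only in which series are drawn. The sketch below maps each documented plot_type to the series it would plot, using plain lists so it runs without a plotting library; the series names, the error-bound definition (true value minus each confidence bound), and raising ValueError for an unknown type are assumptions.

```python
from typing import Dict, List


def series_for_plot(
    plot_type: str,
    mean: List[float],
    lower: List[float],
    upper: List[float],
    true: List[float],
) -> Dict[str, List[float]]:
    """Hypothetical: which lines each documented plot_type would draw."""
    if plot_type == "Prediction":
        return {"mean": mean, "lower": lower, "upper": upper}
    if plot_type == "Comparison":
        # Same as Prediction, plus the true data values.
        return {"mean": mean, "lower": lower, "upper": upper, "true": true}
    if plot_type == "Error":
        # ASSUMED: residual = true - mean; bounds = true minus each confidence bound.
        return {
            "residual": [t - m for t, m in zip(true, mean)],
            "lower_error": [t - u for t, u in zip(true, upper)],
            "upper_error": [t - lo for t, lo in zip(true, lower)],
            "zero": [0.0] * len(mean),  # reference line at 0
        }
    raise ValueError(
        f"Unknown plot_type {plot_type!r}; use 'Prediction', 'Comparison', or 'Error'."
    )


errors = series_for_plot("Error", [1.0, 2.0], [0.5, 1.5], [1.5, 2.5], [1.25, 1.75])
print(errors["residual"])  # [0.25, -0.25]
```

In an actual plot, each returned series would be passed to a plotting library as one line against the prediction dates.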

walk_forward_val(initial_training_window: Tuple[Tuple[int, int], Tuple[int, int]], end_prediction_date: Tuple[int, int], burn: int = 10000, res: int = 200, **kwargs) -> pandas.core.frame.DataFrame

estimate_deltas(G, intervened_node: str, n_timesteps: int, start_year: int, start_month: int, country: Optional[str] = 'South Sudan', state: Optional[str] = None)

Utility function that estimates the rate of change (deltas) of the intervened node per time step. This uses the units that the CAG was parameterized with. WARNING: The state and country should be the same as what was passed to G.parameterize(), or else you could get mismatched data.

Deltas are estimated as the percent change between consecutive time steps, i.e., (next - current)/current. Heuristics are in place to handle NaN and inf values. If the value changes from 0 to 0 (NaN case), then delta = 0. If it increases from 0 (+inf case), then delta = the positive absolute mean of all finite deltas. If it decreases from 0 (-inf case), then delta = the negative absolute mean of all finite deltas.

See the function get_true_values for how the data is aggregated to fill in values for missing time points while calculating the deltas.

Parameters:
  • G – A completely parameterized and quantified CAG with indicators, an estimated transition matrix, and indicator values.

  • intervened_node – A string of the full name of the node on which we are intervening.

  • n_timesteps – Number of time steps.

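  • start_year – The starting year (e.g., 2012).

  • start_month – The starting month (1-12).

  • country – The country; should match what was passed to G.parameterize().

  • state – The state; should match what was passed to G.parameterize().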

Returns:

1D numpy array of deltas.
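The delta heuristics can be sketched on a plain list of per-time-step values. The sketch computes (next - current)/current, the convention under which an increase from 0 produces the +inf case described above. How the real function obtains the values from the CAG, its use of n_timesteps, returning a list rather than a NumPy array, and the 0.0 fallback when no finite deltas exist are assumptions; estimate_deltas_sketch is a hypothetical name.

```python
import math
from typing import List


def estimate_deltas_sketch(values: List[float]) -> List[float]:
    """Hypothetical: percent change between consecutive values with NaN/inf handling."""
    raw = []
    for current, nxt in zip(values, values[1:]):
        if current == 0:
            if nxt == 0:
                raw.append(math.nan)  # 0 -> 0: NaN case
            else:
                # +inf if increasing from 0, -inf if decreasing from 0.
                raw.append(math.copysign(math.inf, nxt))
        else:
            raw.append((nxt - current) / current)

    finite = [d for d in raw if math.isfinite(d)]
    # Absolute mean of all finite deltas (ASSUMED 0.0 if there are none).
    fill = sum(abs(d) for d in finite) / len(finite) if finite else 0.0

    deltas = []
    for d in raw:
        if math.isnan(d):
            deltas.append(0.0)
        elif math.isinf(d):
            deltas.append(fill if d > 0 else -fill)
        else:
            deltas.append(d)
    return deltas


print(estimate_deltas_sketch([0, 0, 10, 15, 0, 0]))  # [0.0, 0.75, 0.5, -1.0, 0.0]
```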

intervention(target_node: str, intervened_node: str, G=None, input=None, start_year=2012, start_month=None, end_year=2017, end_month=None, plot=False, plot_type='Compare', **kwargs)

This is the main function of this module. It parameterizes a given CAG (see the requirements under Parameters) and calls other functions within this module to predict values for a specified target node’s indicator variable given a start date and end date. It returns a pandas DataFrame containing predicted values, true values, and error.

Parameters:
  • target_node – A string of the full name of the node whose attached indicator variable we wish to predict values for.

  • intervened_node – A string of the full name of the node upon which we are intervening.

  • G – A CAG. It must have mapped indicators with indicator values and an estimated transition matrix.

  • input – This allows you to upload a CAG from a pickle file instead of passing it directly as an argument. The CAG must have mapped indicators and an estimated transition matrix.

  • start_year – An integer, designates the starting year (e.g., 2012).

  • start_month – An integer, starting month (1-12).

  • end_year – An integer, ending year.

  • end_month – An integer, ending month.

  • plot – Set to true to display a plot according to the plot type.

  • plot_type – By default, setting plot to true displays the “Compare” type plot, which plots the predictions and true values of the target node's indicator variable on one plot labeled by time steps. There is also the “Error” type, which plots the errors or residuals with a reference line at 0.

  • **kwargs – These are options for parameterize() and get_true_values(). The appropriate arguments are country, state, units, fallback_aggaxes (fallback aggregation axes), and aggfunc (aggregation function). Country and state also get passed into estimate_deltas().

Returns:

A pandas DataFrame containing the predicted values, true values, and error.
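Two details of this interface can be sketched: obtaining the CAG either directly through G or from a pickle file through input, and forwarding only country and state from **kwargs to estimate_deltas(). The helper names resolve_cag and deltas_kwargs are hypothetical, the rule that G takes precedence over input is an assumption, and a plain dict stands in for a real CAG object. As with any pickle file, only load CAGs from trusted sources.

```python
import os
import pickle
import tempfile
from typing import Any, Dict, Optional


def resolve_cag(G: Any = None, input: Optional[str] = None) -> Any:
    """Hypothetical: use G if given, else unpickle the CAG stored at path `input`.

    The parameter name `input` mirrors the documented signature (it shadows the builtin).
    """
    if G is not None:
        return G  # ASSUMED: an explicitly passed CAG wins over `input`
    if input is None:
        raise ValueError("Provide either G or a pickle file path via input.")
    with open(input, "rb") as f:
        return pickle.load(f)


def deltas_kwargs(kwargs: Dict[str, Any]) -> Dict[str, Any]:
    """Keep only the kwargs that are also forwarded to estimate_deltas()."""
    return {k: v for k, v in kwargs.items() if k in ("country", "state")}


# Round-trip a stand-in "CAG" through a temporary pickle file.
path = os.path.join(tempfile.mkdtemp(), "cag.pkl")
with open(path, "wb") as f:
    pickle.dump({"nodes": ["rainfall", "crop production"]}, f)

print(resolve_cag(input=path))  # {'nodes': ['rainfall', 'crop production']}
print(deltas_kwargs({"country": "South Sudan", "units": "mm", "state": None}))
# {'country': 'South Sudan', 'state': None}
```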

delphi.jupyter_tools module

delphi.paths module

delphi.plot_delphi_results_offline module

delphi.plotter module

class LegendTitle(text_props=None)

Bases: object

legend_artist(legend, orig_handle, fontsize, handlebox)

delphi_plotter(model_state, num_bins=400, rotation=45, out_dir='plots', file_name_prefix='', month_year=False, num_distinct_derivative=25, save_csv=False)

Module contents