ExplainableBoostingClassifier

Link to Algorithm description: EBM

Greediness and smoothing in the EBM training algorithm were enabled by default in release v0.5.1 on Feb 8, 2024. Greediness and smoothing can improve accuracy but may change the learned feature function graphs compared to those learned using the earlier pure round-robin (cyclic boosting) algorithm without smoothing. To train EBMs similar to the defaults in v0.5.0 and earlier, set the greedy_ratio parameter to 0.
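A minimal sketch of the note above. Only greedy_ratio=0 is stated there; setting the two smoothing parameters to 0 as well is an assumption about how to reproduce the earlier "without smoothing" behavior. The name legacy_kwargs is hypothetical.

```python
# Keyword arguments intended to approximate the pre-v0.5.1 training defaults.
legacy_kwargs = {
    "greedy_ratio": 0.0,                # stated above: disables greedy boosting
    "smoothing_rounds": 0,              # assumption: turns off main-effect smoothing
    "interaction_smoothing_rounds": 0,  # assumption: turns off interaction smoothing
}

try:
    from interpret.glassbox import ExplainableBoostingClassifier

    ebm = ExplainableBoostingClassifier(**legacy_kwargs)
except ImportError:
    ebm = None  # interpret is not installed; the kwargs above are the point
```

Graphs learned this way may still differ from those of an older release for reasons unrelated to these parameters.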

class interpret.glassbox.ExplainableBoostingClassifier(feature_names=None, feature_types=None, max_bins=1024, max_interaction_bins=64, interactions='3x', exclude=None, validation_size=0.15, outer_bags=14, inner_bags=0, learning_rate=0.015, greedy_ratio=10.0, cyclic_progress=False, smoothing_rounds=75, interaction_smoothing_rounds=75, max_rounds=50000, early_stopping_rounds=100, early_stopping_tolerance=1e-05, callback=None, min_samples_leaf=4, min_hessian=0.0001, reg_alpha=0.0, reg_lambda=0.0, max_delta_step=0.0, gain_scale=5.0, min_cat_samples=10, cat_smooth=10.0, missing='separate', max_leaves=2, monotone_constraints=None, objective='log_loss', n_jobs=-2, random_state=42)#

An Explainable Boosting Classifier.

Parameters:

feature_names (list of str, default=None) – List of feature names.
feature_types (list of FeatureType, default=None) –
List of feature types. FeatureType can be:
- 'auto': Auto-detect
- 'quantile': Continuous with equal density bins
- 'rounded_quantile': Continuous with quantile bins, but the cut values are rounded when possible
- 'uniform': Continuous with equal width bins
- 'winsorized': Continuous with equal width bins, but the leftmost and rightmost cuts are chosen by quantiles
- 'continuous': Use the default binning for continuous features, which is currently 'quantile'
- [List of float]: Continuous with specified cut values. E.g.: [5.5, 8.75]
- [List of str]: Ordinal categorical where the order has meaning. E.g.: ["low", "medium", "high"]
- 'nominal': Categorical where the order has no meaning. E.g.: country names
max_bins (int, default=1024) – Max number of bins per feature for the main effects stage.
max_interaction_bins (int, default=64) – Max number of bins per feature for interaction terms.
interactions (int, float, str, or list of tuples of feature indices, default="3x") –
Interaction terms to be included in the model. Options are:
- Integer (1 <= interactions): Count of interactions to be automatically selected
- Percentage (interactions < 1.0): Determine the integer count of interactions by multiplying the number of features by this percentage
- String with format (float + “x”): Determine the integer count of interactions by multiplying the number of features by the float value.
- List of tuples: The tuples contain the indices of the features within each additive term. In addition to pairs, the interactions parameter accepts higher order interactions. It also accepts univariate terms which will cause the algorithm to boost the main terms at the same time as the interactions. When boosting mains at the same time as interactions, the exclude parameter should be set to ‘mains’ and currently max_bins needs to be equal to max_interaction_bins.
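The options above can be read as a rule for turning an interactions value into a term count. The helper below is a hypothetical illustration, not part of interpret; in particular, rounding fractional results up is an assumption, since the text does not say how they are rounded.

```python
import math


def resolve_interaction_count(interactions, n_features):
    """Illustrative helper: map an `interactions` value to a term count."""
    if isinstance(interactions, str):
        # "3x" means 3.0 * n_features.
        if not interactions.endswith("x"):
            raise ValueError("expected a string such as '3x'")
        return math.ceil(float(interactions[:-1]) * n_features)  # rounding assumed
    if isinstance(interactions, (list, tuple)):
        # Explicit terms: the count is the number of tuples supplied.
        return len(interactions)
    if interactions < 1.0:
        # Percentage of the feature count.
        return math.ceil(interactions * n_features)  # rounding assumed
    return int(interactions)


print(resolve_interaction_count("3x", 10))              # 30
print(resolve_interaction_count(0.5, 10))               # 5
print(resolve_interaction_count(5, 10))                 # 5
print(resolve_interaction_count([(0, 1), (2, 3)], 10))  # 2
```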
exclude ('mains' or list of tuples of feature indices|names, default=None) – Features or terms to be excluded.
validation_size (int or float, default=0.15) –
Validation set size. Used for early stopping during boosting, and is needed to create outer bags.
- Integer (1 <= validation_size): Count of samples to put in the validation sets
- Percentage (validation_size < 1.0): Percentage of the data to put in the validation sets
- 0: Turns off early stopping. Outer bags have no utility. Error bounds will be eliminated
outer_bags (int, default=14) – Number of outer bags. Outer bags are used to generate error bounds and help with smoothing the graphs.
inner_bags (int, default=0) – Number of inner bags. 0 turns off inner bagging.
learning_rate (float, default=0.015) – Learning rate for boosting.
greedy_ratio (float, default=10.0) – The proportion of greedy boosting steps relative to cyclic boosting steps. A value of 0 disables greedy boosting, effectively turning it off.
cyclic_progress (bool or float, default=False) – This parameter specifies the proportion of the boosting cycles that will actively contribute to improving the model’s performance. It is expressed as a bool or a float between 0 and 1, where True (1.0) means 100% of the cycles are expected to make forward progress; the default, False, corresponds to 0.0. If forward progress is not achieved during a cycle, that cycle will not be wasted; instead, it will be used to update internal gain calculations related to how effective each feature is in predicting the target variable. Setting this parameter to a value less than 1.0 can be useful for preventing overfitting.
smoothing_rounds (int, default=75) – Number of initial highly regularized rounds to set the basic shape of the main effect feature graphs.
interaction_smoothing_rounds (int, default=75) – Number of initial highly regularized rounds to set the basic shape of the interaction effect feature graphs during fitting.
max_rounds (int, default=50000) – Total number of boosting rounds with n_terms boosting steps per round.
early_stopping_rounds (int, default=100) – Number of rounds with no improvement to trigger early stopping. 0 turns off early stopping and boosting will occur for exactly max_rounds.
early_stopping_tolerance (float, default=1e-5) – Tolerance that dictates the smallest delta required to be considered an improvement which prevents the algorithm from early stopping. early_stopping_tolerance is expressed as a percentage of the early stopping metric. Negative values indicate that the individual models should be overfit before stopping. EBMs are a bagged ensemble of models. Setting the early_stopping_tolerance to zero (or even negative), allows learning to overfit each of the individual models a little, which can improve the accuracy of the ensemble as a whole. Overfitting each of the individual models reduces the bias of each model at the expense of increasing the variance (due to overfitting) of the individual models. But averaging the models in the ensemble reduces variance without much change in bias. Since the goal is to find the optimum bias-variance tradeoff for the ensemble of models — not the individual models — a small amount of overfitting of the individual models can improve the accuracy of the ensemble as a whole.
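The interplay between early_stopping_rounds and early_stopping_tolerance can be sketched as follows. This is a simplified illustration, not interpret's implementation. It assumes a lower-is-better validation metric, and it assumes a round counts as an improvement only when it beats the best metric so far by more than tolerance × |best|. The function name is hypothetical.

```python
def rounds_until_stop(validation_metrics, early_stopping_rounds=100,
                      early_stopping_tolerance=1e-5):
    """Return the 1-based round at which this simplified rule stops, or None."""
    best = None
    stale_rounds = 0
    for round_number, metric in enumerate(validation_metrics, start=1):
        if best is None:
            best = metric
            continue
        # Assumed rule: the required delta is a fraction of the best metric.
        required_delta = early_stopping_tolerance * abs(best)
        if best - metric > required_delta:
            best = min(best, metric)
            stale_rounds = 0
        else:
            stale_rounds += 1
            # early_stopping_rounds == 0 means early stopping is turned off.
            if early_stopping_rounds > 0 and stale_rounds >= early_stopping_rounds:
                return round_number
    return None


slowly_worsening = [1.0, 1.001, 1.002, 1.003]
print(rounds_until_stop(slowly_worsening, 2, 0.0))    # 3
print(rounds_until_stop(slowly_worsening, 2, -0.01))  # None
```

Under these assumptions, a negative tolerance lets slightly worse rounds pass as progress, so boosting continues into the mild overfitting described above.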
callback (Optional[Callable[[int, int, bool, float], bool]], default=None) – A user-defined function that is invoked at the end of each boosting step to determine whether to terminate boosting or continue. If it returns True, the boosting loop is stopped immediately. By default, no callback is used and training proceeds according to the early stopping settings. The callback function receives: (1) the bag index, (2) the number of boosting steps completed, (3) a boolean indicating whether progress was made in the current step, and (4) the current best score.
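As an illustration of the callback signature described above, the function below asks boosting to stop once a fixed number of steps has been completed. The name stop_after_max_steps and the MAX_STEPS budget are hypothetical.

```python
MAX_STEPS = 2000  # hypothetical budget chosen for illustration


def stop_after_max_steps(bag_index, step_index, progress, metric):
    """Return True (stop boosting) once MAX_STEPS boosting steps are done.

    bag_index:  index of the bag being boosted
    step_index: number of boosting steps completed so far
    progress:   whether the current step made progress
    metric:     current best score
    """
    return step_index >= MAX_STEPS


print(stop_after_max_steps(0, 10, True, 0.5))     # False
print(stop_after_max_steps(3, 2000, False, 0.4))  # True
```

Such a function would be passed as callback=stop_after_max_steps when constructing the classifier.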
min_samples_leaf (int, default=4) – Minimum number of samples allowed in the leaves.
min_hessian (float, default=1e-4) – Minimum hessian required to consider a potential split valid.
reg_alpha (float, default=0.0) – L1 regularization.
reg_lambda (float, default=0.0) – L2 regularization.
max_delta_step (float, default=0.0) – Used to limit the max output of tree leaves. <=0.0 means no constraint.
gain_scale (float, default=5.0) – Scale factor to apply to nominal categoricals. A scale factor above 1.0 will cause the algorithm to focus more on the nominal categoricals.
min_cat_samples (int, default=10) – Minimum number of samples in order to treat a category separately. If lower than this threshold the category is combined with other categories that have low numbers of samples.
cat_smooth (float, default=10.0) – Used for the categorical features. This can reduce the effect of noise in categorical features, especially for categories with limited data.
missing (str, default="separate") –
Method for handling missing values during boosting. The placement of the missing value bin can influence the resulting model graphs. For example, placing the bin on the “low” side may cause missing values to affect lower bins, and vice versa. This parameter does not affect the final placement of the missing bin in the model (the missing bin will remain at index 0 in the term_scores_ attribute). Possible values for missing are:
- 'low': Place the missing bin on the left side of the graphs.
- 'high': Place the missing bin on the right side of the graphs.
- 'separate': Place the missing bin in its own leaf during each boosting step, effectively making it location-agnostic. This can lead to overfitting, especially when the proportion of missing values is small.
- 'gain': Choose the best leaf for the missing value contribution at each boosting step, based on gain.
max_leaves (int, default=2) – Maximum number of leaves allowed in each tree.
monotone_constraints (list of int, default=None) –
This parameter allows you to specify monotonic constraints for each feature’s relationship with the target variable during model fitting. However, it is generally recommended to apply monotonic constraints post-fit using the monotonize function rather than setting them during the fitting process. This recommendation is based on the observation that, during fitting, the boosting algorithm may compensate for a monotone constraint on one feature by utilizing another correlated feature, potentially obscuring any monotonic violations.

If you choose to define monotone constraints, monotone_constraints should be a list with a length equal to the number of features. Each element in the list corresponds to a feature and should take one of the following values:
- 0: No monotonic constraint is imposed on the corresponding feature’s partial response.
- +1: The partial response of the corresponding feature should be monotonically increasing with respect to the target.
- -1: The partial response of the corresponding feature should be monotonically decreasing with respect to the target.
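A small sketch of building such a list, assuming the features are known by name. The feature names and the directions mapping are hypothetical.

```python
feature_names = ["age", "income", "tenure"]  # hypothetical features
directions = {"income": +1, "tenure": -1}    # unconstrained features are omitted

# One entry per feature, in feature order; 0 means no constraint.
monotone_constraints = [directions.get(name, 0) for name in feature_names]
print(monotone_constraints)  # [0, 1, -1]
```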
objective (str, default="log_loss") – The objective to optimize.
n_jobs (int, default=-2) – Number of jobs to run in parallel. Negative integers are interpreted as following joblib’s formula (n_cpus + 1 + n_jobs), just like scikit-learn. Eg: -2 means using all threads except 1.
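The joblib formula quoted above can be checked with a few lines of arithmetic. The helper name is hypothetical, and it does not cover n_jobs=0.

```python
import os


def effective_n_jobs(n_jobs, n_cpus=None):
    """Apply the convention that negative values mean n_cpus + 1 + n_jobs."""
    if n_cpus is None:
        n_cpus = os.cpu_count() or 1
    if n_jobs < 0:
        # Never drop below one worker.
        return max(1, n_cpus + 1 + n_jobs)
    return n_jobs


print(effective_n_jobs(-2, n_cpus=8))  # 7
print(effective_n_jobs(-1, n_cpus=8))  # 8
```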
random_state (int or None, default=42) – Random state. None uses device_random and generates non-repeatable sequences.

Variables:

classes_ (array of bool, int, or unicode with shape (n_classes,)) – The class labels.
n_features_in_ (int) – Number of features.
feature_names_in_ (List of str) – Resolved feature names. Names can come from feature_names, X, or be auto-generated.
feature_types_in_ (List of str) – Resolved feature types. Can be: ‘continuous’, ‘nominal’, or ‘ordinal’.
bins_ (List[Union[List[Dict[str, int]], List[array of float with shape (n_cuts,)]]]) – Per-feature list that defines how to bin each feature. Each feature in the list contains a list of binning resolutions. The first item in the binning resolution list is for binning main effect features. If there are more items in the binning resolution list, they define the binning for successive levels of resolutions. The item at index 1, if it exists, defines the binning for pairs. The last binning resolution defines the bins for all successive interaction levels. If the binning resolution list contains dictionaries, then the feature is either a ‘nominal’ or ‘ordinal’ categorical. If the binning resolution list contains arrays, then the feature is ‘continuous’ and the arrays will contain float cut points that separate continuous values into bins.
feature_bounds_ (array of float with shape (n_features, 2)) – min/max bounds for each feature. feature_bounds_[feature_index, 0] is the min value of the feature and feature_bounds_[feature_index, 1] is the max value of the feature. Categoricals have min & max values of NaN.
histogram_edges_ (List of None or array of float with shape (n_hist_edges,)) – Per-feature list of the histogram edges. Categorical features contain None within the List at their feature index.
histogram_weights_ (List of array of float with shape (n_hist_bins,)) – Per-feature list of the total sample weights within each feature’s histogram bins.
unique_val_counts_ (array of int with shape (n_features,)) – Per-feature count of the number of unique feature values.
term_features_ (List of tuples of feature indices) – Additive terms used in the model and their component feature indices.
term_names_ (List of str) – List of term names.
bin_weights_ (List of array of float with shape (n_feature0_bins, ..., n_featureN_bins)) – Per-term list of the total sample weights in each term’s tensor bins.
bagged_scores_ (List of array of float with shape (n_outer_bags, n_feature0_bins, ..., n_featureN_bins, n_classes) or (n_outer_bags, n_feature0_bins, ..., n_featureN_bins)) – Per-term list of the bagged model scores. The last dimension of length n_classes is dropped for binary classification.
term_scores_ (List of array of float with shape (n_feature0_bins, ..., n_featureN_bins, n_classes) or (n_feature0_bins, ..., n_featureN_bins)) – Per-term list of the model scores. The last dimension of length n_classes is dropped for binary classification.
standard_deviations_ (List of array of float with shape (n_feature0_bins, ..., n_featureN_bins, n_classes) or (n_feature0_bins, ..., n_featureN_bins)) – Per-term list of the standard deviations of the bagged model scores. The last dimension of length n_classes is dropped for binary classification.
link_ (str) – Link function used to convert the predictions or targets into linear space additive scores and vice versa via the inverse link. Possible values include: “monoclassification”, “custom_binary”, “custom_ovr”, “custom_multinomial”, “mlogit”, “vlogit”, “logit”, “probit”, “cloglog”, “loglog”, “cauchit”
link_param_ (float) – Float value that can be used by the link function. For classification it is only used by “custom_classification”.
bag_weights_ (array of float with shape (n_outer_bags,)) – Per-bag record of the total weight within each bag.
best_iteration_ (array of int with shape (n_stages, n_outer_bags)) – The number of boosting iterations performed within each stage until either early stopping, or the max_rounds was reached. Normally, the count of main effects boosting iterations will be in best_iteration_[0], and the count of interaction boosting iterations will be in best_iteration_[1].
intercept_ (array of float with shape (n_classes,) or (1,)) – Intercept of the model. Binary classification is shape (1,), and multiclass is shape (n_classes,).
bagged_intercept_ (array of float with shape (n_outer_bags, n_classes) or (n_outer_bags,)) – Bagged intercept of the model. Binary classification is shape (n_outer_bags,), and multiclass is shape (n_outer_bags, n_classes).

copy()#

Make a deepcopy of the EBM.

Returns: The new copy.

decision_function(X, init_score=None)#

Predict scores from model before calling the link function.

Parameters:

X – NumPy array for samples.
init_score – Optional. Either a model that can generate scores or per-sample initialization scores. If per-sample scores are given, they should have the same length as X.

Returns:

The sum of the additive term contributions.

estimate_mem(X)#

Estimate memory usage of the model.

Parameters: X – dataset
Returns: Estimated memory usage in bytes. The estimate does not include the memory from the caller’s copy of X, nor the process’s code or other data. The estimate will be more accurate for larger datasets.

eval_terms(X)#

Returns term scores identical to the local explanation values obtained by calling ebm.explain_local(X).

Calling interpret.utils.inv_link(ebm.eval_terms(X).sum(axis=1) + ebm.intercept_, ebm.link_) is equivalent to calling ebm.predict(X) for regression or ebm.predict_proba(X) for classification.

Parameters: X – NumPy array for samples.
Returns: Local explanation scores for each term of each sample.
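The equivalence stated above can be illustrated in plain Python for one sample. This sketch assumes a binary classifier whose link_ is "logit" and treats the result as the positive-class probability. The function name and the scores are hypothetical.

```python
import math


def binary_probability(term_scores, intercept):
    """Sum per-term contributions, add the intercept, apply the inverse logit."""
    logit = sum(term_scores) + intercept
    return 1.0 / (1.0 + math.exp(-logit))


# Hypothetical per-term scores for one sample, as one row of eval_terms(X) might hold.
row = [1.0, -0.25, -0.75]
print(binary_probability(row, intercept=0.0))  # 0.5
```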

explain_global(name=None)#

Provide global explanation for model.

Parameters: name – User-defined explanation name.
Returns: An explanation object, visualizing feature-value pairs as a horizontal bar chart.

explain_local(X, y=None, name=None, init_score=None)#

Provide local explanations for provided samples.

Parameters:

X – NumPy array for X to explain.
y – NumPy vector for y to explain.
name – User-defined explanation name.
init_score – Optional. Either a model that can generate scores or per-sample initialization scores. If per-sample scores are given, they should have the same length as X.

Returns:

An explanation object, visualizing feature-value pairs for each sample as horizontal bar charts.

fit(X, y, sample_weight=None, bags=None, init_score=None)#

Fit model to provided samples.

Parameters:

X – NumPy array for training samples.
y – NumPy array as training labels.
sample_weight – Optional array of weights per sample. Should be same length as X and y.
bags – Optional bag definitions. The first dimension should have length equal to the number of outer_bags. The second dimension should have length equal to the number of samples. The contents should be +1 for training, -1 for validation, and 0 if not included in the bag. Numbers other than 1 indicate how many times to include the sample in the training or validation sets.
init_score – Optional. Either a model that can generate scores or per-sample initialization scores. If per-sample scores are given, they should have the same length as X.

Returns:

Itself.

monotonize(term, increasing='auto', passthrough=0.0)#

Adjust a term to be monotone using isotonic regression.

An important consideration is that this function only adjusts a single term and will not modify pairwise terms. When a feature needs to be globally monotonic, any pairwise terms that include the feature should be excluded from the model.

Parameters:

term – Index or name of the term to monotonize
increasing – ‘auto’ or bool. ‘auto’ decides direction based on Spearman correlation estimate.
passthrough – the process of monotonization can result in a change to the mean response of the model. If passthrough is set to 0.0 then the model’s mean response to the training set will not change. If passthrough is set to 1.0 then any change to the mean response made by monotonization will be passed through to self.intercept_. Values between 0 and 1 will result in that percentage being passed through.

Returns:

Itself.

predict(X, init_score=None)#

Predict on provided samples.

Parameters:

X – NumPy array for samples.
init_score – Optional. Either a model that can generate scores or per-sample initialization scores. If per-sample scores are given, they should have the same length as X.

Returns:

Predicted class label per sample.

predict_proba(X, init_score=None)#

Probability estimates on provided samples.

Parameters:

X – NumPy array for samples.
init_score – Optional. Either a model that can generate scores or per-sample initialization scores. If per-sample scores are given, they should have the same length as X.

Returns:

Probability estimate of sample for each class.

predict_with_uncertainty(X, init_score=None)#

Gets raw scores and uncertainties from the bagged base models. Generates predictions by averaging outputs across all bagged models, and estimates uncertainty using the standard deviation of predictions across bags.

Parameters:

X – ndarray of shape (n_samples, n_features). The input samples to predict on.
init_score – Optional. Either a model that can generate scores or per-sample initialization scores. If per-sample scores are given, they should have the same length as X.

Returns:

ndarray of shape (n_samples, 2): the first column contains the mean predictions and the second column contains the uncertainties.
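The averaging described above can be sketched with the standard library. This is an illustration, not the library's code: the function name and the scores are hypothetical, and using the population standard deviation (rather than the sample one) is an assumption.

```python
from statistics import fmean, pstdev


def mean_and_spread(per_bag_scores):
    """per_bag_scores[b][i] is the raw score of bag b for sample i.

    Returns one [mean, standard deviation] pair per sample.
    """
    per_sample = zip(*per_bag_scores)  # regroup the scores by sample
    return [[fmean(scores), pstdev(scores)] for scores in per_sample]


bag_scores = [
    [1.0, 2.0],  # bag 0: scores for samples 0 and 1
    [3.0, 2.0],  # bag 1
]
print(mean_and_spread(bag_scores))  # [[2.0, 1.0], [2.0, 0.0]]
```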

remove_features(features)#

Remove features (and their associated components) from a fitted EBM.

Note that this will change the structure (i.e., by removing the specified indices) of the following components of self: histogram_edges_, histogram_weights_, unique_val_counts_, bins_, feature_names_in_, feature_types_in_, and feature_bounds_. Also, any terms that use the features being deleted will be deleted. The following attributes that the caller passed to the __init__ function are not modified: feature_names, and feature_types.

Parameters: features – A list or enumerable of feature names or indices or booleans indicating which features to remove.
Returns: Itself.

remove_terms(terms)#

Remove terms (and their associated components) from a fitted EBM.

Note that this will change the structure (i.e., by removing the specified indices) of the following components of self: term_features_, term_names_, term_scores_, bagged_scores_, standard_deviations_, and bin_weights_.

Parameters: terms – A list (or other enumerable object) of term names or indices or booleans.
Returns: Itself.

reorder_classes(classes)#

Re-order the class positions in a classification EBM.

Parameters: classes – The new class order.
Returns: Itself.

scale(term, factor)#

Scale the individual term contribution by a constant factor.

For example, you can nullify the contribution of specific terms by setting their corresponding weights to zero; this would cause the associated global explanations (e.g., variable importance) to also be zero. A couple of things are worth noting: 1) this method has no effect on the fitted intercept and users will have to change that attribute directly (if desired), and 2) reweighting specific term contributions will also reweight their related components in a similar manner (e.g., variable importance scores, standard deviations, etc.).

Parameters:

term – term index or name of the term to be scaled.
factor – The amount to scale the term by.

Returns:

Itself.

score(X, y, sample_weight=None)#

Return the mean accuracy on the given test data and labels.

In multi-label classification, this is the subset accuracy which is a harsh metric since you require for each sample that each label set be correctly predicted.

Parameters:

X (array-like of shape (n_samples, n_features)) – Test samples.
y (array-like of shape (n_samples,) or (n_samples, n_outputs)) – True labels for X.
sample_weight (array-like of shape (n_samples,), default=None) – Sample weights.

Returns:

score – Mean accuracy of self.predict(X) w.r.t. y.

Return type:

float

sweep(terms=True, bins=True, features=False)#

Purge unused elements from a fitted EBM.

Parameters:

terms – Boolean indicating if zeroed terms that do not affect the output should be purged from the model.
bins – Boolean indicating if unused bin levels that do not affect the output should be purged from the model.
features – Boolean indicating if features that are not used in any terms and therefore do not affect the output should be purged from the model.

Returns:

Itself.

term_importances(importance_type='avg_weight')#

Provide the term importances.

Parameters: importance_type – The type of term importance requested (‘avg_weight’, ‘min_max’).
Returns: An array of term importances, with one importance per additive term.

to_excel(file)#

Exports the model to an Excel workbook.

Parameters: file – a path-like object (str or os.PathLike), or a file-like object implementing .write().

to_excel_exportable(file)#

Converts the model to an Excel exportable representation.

Returns: An xlsxwriter.Workbook object with an Excel representation of the model. This Workbook can be modified and then exported as any xlsxwriter object for advanced usages when custom export is required.

to_json(file, detail='all', indent=2)#

Export the model to a JSON text file.

Parameters:

file – a path-like object (str or os.PathLike), or a file-like object implementing .write().
detail – ‘minimal’, ‘interpretable’, ‘mergeable’, ‘all’
indent – If indent is a non-negative integer or string, then JSON array elements and object members will be pretty-printed with that indent level. An indent level of 0, negative, or “” will only insert newlines. None selects the most compact representation. Using a positive integer indent indents that many spaces per level; the default here is 2. If indent is a string (such as “\t”), that string is used to indent each level.

to_jsonable(detail='all')#

Convert the model to a JSONable representation.

Parameters: detail – ‘minimal’, ‘interpretable’, ‘mergeable’, ‘all’
Returns: JSONable object