EBM Internals - Multiclass
This is part 3 of a 3 part series describing EBM internals and how to make predictions. For part 1, click here. For part 2, click here.
In this part 3 we’ll cover multiclass, specified bin cuts, term exclusion, and unknown values. Before reading this part, you should be familiar with the information in part 1 and part 2.
# boilerplate
from interpret import show
from interpret.glassbox import ExplainableBoostingClassifier
import numpy as np
from interpret import set_visualize_provider
from interpret.provider import InlineProvider
set_visualize_provider(InlineProvider())
# make a dataset composed of a nominal, an unused feature, and a continuous
X = [["Peru", "", 7], ["Fiji", "", 8], ["Peru", "", 9], [None, "", None]]
y = [6000, 5000, 4000, 6000] # integer classes
# Fit a classification EBM without interactions
# Specify exact bin cuts for the continuous feature
# Exclude the middle feature during fitting
# Eliminate the validation set to handle the small dataset
ebm = ExplainableBoostingClassifier(
    interactions=0,
    feature_types=['nominal', 'nominal', [7.25, 9.0]],
    exclude=[(1,)],
    validation_size=0, outer_bags=1, min_samples_leaf=1, min_hessian=1e-9)
ebm.fit(X, y)
show(ebm.explain_global())
/opt/hostedtoolcache/Python/3.9.20/x64/lib/python3.9/site-packages/interpret/glassbox/_ebm/_ebm.py:751: UserWarning: Missing values detected. Our visualizations do not currently display missing values. To retain the glassbox nature of the model you need to either set the missing values to an extreme value like -1000 that will be visible on the graphs, or manually examine the missing value score in ebm.term_scores_[term_index][0]
warn(
print(ebm.classes_)
[4000 5000 6000]
Like all scikit-learn classifiers, the EBM stores the list of classes in the ebm.classes_ attribute as a sorted array. In this example our classes are integers, but we also accept strings, as seen in part 2.
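Since the scores and probabilities produced by the model are ordered to match ebm.classes_, the predicted label for a sample is simply the class at the argmax position. A minimal sketch, reusing the ebm fitted above:
# minimal sketch: predict_proba columns follow the ordering of ebm.classes_,
# so the argmax over the per-class probabilities recovers the predicted label
probabilities = ebm.predict_proba(X)
print(ebm.classes_[np.argmax(probabilities, axis=1)])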
print(ebm.feature_types)
['nominal', 'nominal', [7.25, 9.0]]
In this example we passed feature_types into the __init__ function of the ExplainableBoostingClassifier. Per scikit-learn convention, this was recorded unmodified in the ebm object.
print(ebm.feature_types_in_)
['nominal', 'nominal', 'continuous']
The feature_types passed into __init__ were actualized into the base feature types of [‘nominal’, ‘nominal’, ‘continuous’].
print(ebm.feature_names)
None
feature_names was not specified in the call to the __init__ function of the ExplainableBoostingClassifier, so it was set to None, following the scikit-learn convention of recording __init__ parameters unmodified.
print(ebm.feature_names_in_)
['feature_0000', 'feature_0001', 'feature_0002']
Since we did not specify feature names, some default names were created for the model. If we had passed feature_names to the __init__ function of the ExplainableBoostingClassifier, or if we had used a Pandas dataframe with column names, then ebm.feature_names_in_ would have contained those names. Following scikit-learn’s SLEP007 convention, we recorded this in ebm.feature_names_in_.
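As an illustration, here is a hedged sketch of refitting the same model with explicit names (the names 'country', 'unused', and 'rating' are hypothetical choices, not part of the example above), which would populate ebm.feature_names_in_ with those names instead of the defaults:
# sketch with hypothetical names: passing feature_names (or a Pandas dataframe
# with column names) fills ebm.feature_names_in_ with those names
ebm_named = ExplainableBoostingClassifier(
    feature_names=['country', 'unused', 'rating'],  # hypothetical names
    interactions=0,
    feature_types=['nominal', 'nominal', [7.25, 9.0]],
    exclude=[(1,)],
    validation_size=0, outer_bags=1, min_samples_leaf=1, min_hessian=1e-9)
ebm_named.fit(X, y)
print(ebm_named.feature_names_in_)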
print(ebm.term_features_)
[(0,), (2,)]
In the call to the __init__ function of the ExplainableBoostingClassifier, we specified exclude=[(1,)], which means we excluded the middle feature from the list of terms in the model. The middle feature is thus missing from the list of terms in ebm.term_features_.
print(ebm.term_names_)
['feature_0000', 'feature_0002']
ebm.term_names_ is also missing the middle feature, since ebm.term_features_ is missing that feature.
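The relationship can be seen directly by mapping the feature indices in ebm.term_features_ back to ebm.feature_names_in_. A minimal sketch:
# sketch: each term is a tuple of feature indices into feature_names_in_;
# main effect terms contain one index, pair terms would contain two
for features in ebm.term_features_:
    print(features, [ebm.feature_names_in_[i] for i in features])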
print(ebm.bins_)
[[{'Fiji': 1, 'Peru': 2}], [], [array([7.25, 9. ])]]
ebm.bins_ is a per-feature attribute, so the excluded middle feature still has an entry here. That entry is empty, however, since the middle feature has no binning definition and is not considered when making predictions with the model.
These bins are structured as described in part 1 and part 2. One change to note, though, is that the continuous feature’s bin cuts are identical to the cuts [7.25, 9.0] specified in the feature_types parameter to the __init__ function of the ExplainableBoostingClassifier.
It is also noteworthy that the last bin cut specified is exactly equal to the largest feature value of 9.0. In this situation where a feature value is identical to the cut value, the feature gets placed into the upper bin.
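This cut behavior can be checked with np.digitize, which is also what the sample code further below uses for binning. A minimal sketch:
# sketch: a value equal to a cut (like 9.0 here) falls into the upper bin;
# add 1 because bin index 0 is reserved for missing values
cuts = ebm.bins_[2][0]  # array([7.25, 9.])
for value in [7.0, 7.25, 8.0, 9.0, 10.0]:
    print(value, np.digitize(value, cuts) + 1)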
print(ebm.intercept_)
[-5.64276703 -5.7845831 0. ]
For multiclass, ebm.intercept_ is an array containing a logit for each of the predicted classes in ebm.classes_. This behavior is identical to how other scikit-learn multiclass classifiers generate one logit per class.
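A quick sketch that pairs each class with its intercept logit:
# sketch: one intercept logit per class, in the same order as ebm.classes_
for cls, logit in zip(ebm.classes_, ebm.intercept_):
    print(cls, logit)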
print(ebm.term_scores_[0])
[[-0.64405827 -0.3771541 0.83955602]
[-0.67298196 1.62669751 -1.10655707]
[ 0.65852012 -0.62477171 0.13350052]
[ 0. 0. 0. ]]
ebm.term_scores_[0] is the lookup table for the nominal categorical feature containing country names. For multiclass, each bin consists of an array of logits with 1 logit per class being predicted. In this example, each row corresponds to a bin. There are 4 bins in the outer index and 3 class logits in the inner index.
Missing values are once again placed in the 0th bin index, shown above as the first row of 3 logits. The unknown bin is the last row of zeros.
Since this feature is a nominal categorical, we use the dictionary {‘Fiji’: 1, ‘Peru’: 2} to look up which row of logits to use for each categorical string.
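Putting that together, a minimal sketch of looking up the row of per-class logits for a single categorical value:
# sketch: map a category string to its bin index, then to its row of class logits
mapping = ebm.bins_[0][0]             # {'Fiji': 1, 'Peru': 2}
bin_idx = mapping.get('Peru', -1)     # unknown strings fall into the last bin
print(ebm.term_scores_[0][bin_idx])   # 3 logits, one per class in ebm.classes_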
print(ebm.term_scores_[1])
[[ -5.27627531 -5.0917026 10.09717302]
[ -5.33648687 -4.73258872 9.99901419]
[ -4.39671867 14.36023723 -9.68596077]
[ 15.00948085 -4.53594591 -10.41022644]
[ 0. 0. 0. ]]
ebm.term_scores_[1] is the lookup table for the continuous feature. Once again, the 0th and last index are for missing values, and unknown values respectively. This particular example has 5 bins consisting of the 0th missing bin index, the three partitions from the 2 cuts, and the unknown bin index. Each row is a single bin that contains 3 class logits.
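A matching sketch for the continuous feature, using np.digitize to find the bin for a single value:
# sketch: bin a continuous value, shifting by 1 since bin 0 holds missing values
cuts = ebm.bins_[2][0]                # array([7.25, 9.])
bin_idx = np.digitize(8.0, cuts) + 1  # 8.0 lands between the two cuts
print(ebm.term_scores_[1][bin_idx])   # 3 logits, one per class in ebm.classes_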
Sample code
This sample code incorporates everything discussed in all 3 parts. It could be used as a drop-in replacement for the existing EBM predict function of the ExplainableBoostingRegressor, or as the predict_proba function of the ExplainableBoostingClassifier.
from sklearn.utils.extmath import softmax

sample_scores = []
for sample in X:
    # start from the intercept for each sample
    score = ebm.intercept_.copy()
    if isinstance(score, float) or len(score) == 1:
        # regression or binary classification
        score = float(score)

    # we have 2 terms, so add their score contributions
    for term_idx, features in enumerate(ebm.term_features_):
        # indexing into a tensor requires a multi-dimensional index
        tensor_index = []

        # main effects will have 1 feature, and pairs will have 2 features
        for feature_idx in features:
            feature_val = sample[feature_idx]
            bin_idx = 0  # if missing value, use bin index 0

            if feature_val is not None and feature_val is not np.nan:
                # we bin differently for main effects and pairs, so first
                # get the list containing the bins for different resolutions
                bin_levels = ebm.bins_[feature_idx]

                # what resolution do we need for this term (main resolution, pair
                # resolution, etc.), but limit to the last resolution available
                bins = bin_levels[min(len(bin_levels), len(features)) - 1]

                if isinstance(bins, dict):
                    # categorical feature
                    # 'unknown' category strings are in the last bin (-1)
                    bin_idx = bins.get(feature_val, -1)
                else:
                    # continuous feature
                    try:
                        # try converting to a float, if that fails it's 'unknown'
                        feature_val = float(feature_val)
                        # add 1 because the 0th bin is reserved for 'missing'
                        bin_idx = np.digitize(feature_val, bins) + 1
                    except ValueError:
                        # non-floats are 'unknown', which is in the last bin (-1)
                        bin_idx = -1

            tensor_index.append(bin_idx)

        # local_score is also the local feature importance
        local_score = ebm.term_scores_[term_idx][tuple(tensor_index)]
        score += local_score
    sample_scores.append(score)

predictions = np.array(sample_scores)

if hasattr(ebm, 'classes_'):
    # classification
    if len(ebm.classes_) == 2:
        # binary classification
        # softmax expects two logits for binary classification
        # the first logit is always equivalent to 0 for binary classification
        predictions = [[0, x] for x in predictions]
    predictions = softmax(predictions)

if hasattr(ebm, 'classes_'):
    print("probabilities for classes " + str(ebm.classes_))
    print("")
    print(ebm.predict_proba(X))
else:
    print(ebm.predict(X))
print("")
print(predictions)
probabilities for classes [4000 5000 6000]
[[1.30998715e-09 5.76262262e-10 9.99999998e-01]
[8.25675475e-10 9.99999998e-01 7.62156284e-10]
[9.99999998e-01 7.80930732e-10 1.52395050e-09]
[1.69218613e-10 2.30638665e-10 1.00000000e+00]]
[[1.30998715e-09 5.76262262e-10 9.99999998e-01]
[8.25675475e-10 9.99999998e-01 7.62156284e-10]
[9.99999998e-01 7.80930732e-10 1.52395050e-09]
[1.69218613e-10 2.30638665e-10 1.00000000e+00]]