EBM Internals - Multiclass

This is part 3 of a 3-part series describing EBM internals and how to make predictions. For part 1, click here. For part 2, click here.

In this part 3 we’ll cover multiclass, specified bin cuts, term exclusion, and unknown values. Before reading this part you should be familiar with the information in part 1 and part 2.

# boilerplate
from interpret import show
from interpret.glassbox import ExplainableBoostingClassifier
import numpy as np

from interpret import set_visualize_provider
from interpret.provider import InlineProvider
set_visualize_provider(InlineProvider())
# make a dataset composed of a nominal, an unused feature, and a continuous 
X = [["Peru", "", 7], ["Fiji", "", 8], ["Peru", "", 9], [None, "", None]]
y = [6000, 5000, 4000, 6000] # integer classes

# Fit a classification EBM without interactions
# Specify exact bin cuts for the continuous feature
# Exclude the middle feature during fitting
# Eliminate the validation set to handle the small dataset
ebm = ExplainableBoostingClassifier(
    interactions=0, 
    feature_types=['nominal', 'nominal', [7.25, 9.0]], 
    exclude=[(1,)],
    validation_size=0, outer_bags=1, min_samples_leaf=1, min_hessian=1e-9)
ebm.fit(X, y)
show(ebm.explain_global())
/opt/hostedtoolcache/Python/3.9.19/x64/lib/python3.9/site-packages/interpret/glassbox/_ebm/_ebm.py:738: UserWarning: Missing values detected. Our visualizations do not currently display missing values. To retain the glassbox nature of the model you need to either set the missing values to an extreme value like -1000 that will be visible on the graphs, or manually examine the missing value score in ebm.term_scores_[term_index][0]
  warn(





print(ebm.classes_)
[4000 5000 6000]

Like all scikit-learn classifiers, we store the list of classes in the ebm.classes_ attribute as a sorted array. In this example our classes are integers, but we also accept strings, as seen in part 2.
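Because the per-class logits throughout the model are ordered to match ebm.classes_, converting a row of logits back into a class label is just an argmax followed by an index into ebm.classes_. A minimal sketch (the logits below are made up for illustration):

logits = np.array([-1.2, 0.3, 2.5])       # hypothetical per-class logits, ordered like ebm.classes_
print(ebm.classes_[np.argmax(logits)])    # 6000, the class with the highest logit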

print(ebm.feature_types)
['nominal', 'nominal', [7.25, 9.0]]

In this example we passed feature_types into the __init__ function of the ExplainableBoostingClassifier. Per scikit-learn convention, this was recorded unmodified in the ebm object.

print(ebm.feature_types_in_)
['nominal', 'nominal', 'continuous']

The feature_types passed into __init__ were actualized into the base feature types of [‘nominal’, ‘nominal’, ‘continuous’].

print(ebm.feature_names)
None

feature_names was not specified in the call to the __init__ function of the ExplainableBoostingClassifier, so it was set to None, following the scikit-learn convention of recording __init__ parameters unmodified.

print(ebm.feature_names_in_)
['feature_0000', 'feature_0001', 'feature_0002']

Since we did not specify feature names, some default names were created for the model. If we had passed feature_names to the __init__ function of the ExplainableBoostingClassifier, or if we had used a Pandas dataframe with column names, then ebm.feature_names_in_ would have contained those names. Following scikit-learn’s SLEP007 convention, we recorded this in ebm.feature_names_in_.
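A minimal sketch of the dataframe route is shown below; the column names "country", "unused", and "hours" are invented for illustration and are not part of this example:

import pandas as pd

X_df = pd.DataFrame(X, columns=["country", "unused", "hours"])
ebm_named = ExplainableBoostingClassifier(
    interactions=0,
    feature_types=['nominal', 'nominal', [7.25, 9.0]],
    exclude=[(1,)],
    validation_size=0, outer_bags=1, min_samples_leaf=1, min_hessian=1e-9)
ebm_named.fit(X_df, y)
print(ebm_named.feature_names_in_)  # expected: ['country', 'unused', 'hours']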

print(ebm.term_features_)
[(0,), (2,)]

In the call to the __init__ function of the ExplainableBoostingClassifier, we specified exclude=[(1,)], which excluded the middle feature from the list of terms for the model. The middle feature is thus missing from the list of terms in ebm.term_features_.

print(ebm.term_names_)
['feature_0000', 'feature_0002']

ebm.term_names_ is also missing the middle feature since ebm.term_features_ is missing that feature.
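The term names can be reconstructed from ebm.term_features_ and ebm.feature_names_in_. A minimal sketch of that relationship (interaction terms, if the model had any, would join their feature names):

names = [" & ".join(ebm.feature_names_in_[i] for i in term) for term in ebm.term_features_]
print(names)  # ['feature_0000', 'feature_0002']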

print(ebm.bins_)
[[{'Fiji': 1, 'Peru': 2}], [], [array([7.25, 9.  ])]]

ebm.bins_ is a per-feature attribute, so the middle feature is listed here. We see however that the middle feature does not have a binning definition since it is not considered when making predictions with the model.

These bins are structured as described in part 1 and part 2. One change to note though is that the continuous feature bin cuts are the same as the bin cuts [7.25, 9.0] specified in the feature_types parameter to the __init__ function of the ExplainableBoostingClassifier.

It is also noteworthy that the last bin cut specified is exactly equal to the largest feature value of 9.0. In this situation where a feature value is identical to the cut value, the feature gets placed into the upper bin.
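This matches numpy’s np.digitize behavior with its default right=False, which the sample code below also relies on. A quick check using the stored cuts:

cuts = ebm.bins_[2][0]           # array([7.25, 9.  ])
print(np.digitize(8.0, cuts))    # 1 -> between the 7.25 and 9.0 cuts
print(np.digitize(9.0, cuts))    # 2 -> equal to the 9.0 cut, so the upper bin
# the sample code adds 1 to these results since bin index 0 is reserved for missing values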

print(ebm.intercept_)
[-5.42986413 -5.92206271  0.77374731]

For multiclass, ebm.intercept_ is an array containing a logit for each of the predicted classes in ebm.classes_. This behavior is identical to how other scikit-learn multiclass classifiers generate one logit per class.
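As a quick sanity check, the intercept has one logit per class in ebm.classes_, and passing it through a softmax gives the model’s baseline class probabilities before any term contributions are added. This is just an illustrative sketch, not part of the library API:

print(ebm.intercept_.shape)  # (3,) -> one logit per class in ebm.classes_
baseline = np.exp(ebm.intercept_) / np.sum(np.exp(ebm.intercept_))
print(baseline)              # baseline class probabilities before any terms are added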

print(ebm.term_scores_[0])
[[-2.07530571 -1.36670925  3.06520036]
 [-2.11402785  5.55937827 -3.78473271]
 [ 2.09466678 -2.09633451  0.35976618]
 [ 0.          0.          0.        ]]

ebm.term_scores_[0] is the lookup table for the nominal categorical feature containing country names. For multiclass, each bin consists of an array of logits with 1 logit per class being predicted. In this example, each row corresponds to a bin. There are 4 bins in the outer index and 3 class logits in the inner index.

Missing values are once again placed in the 0th bin index, shown above as the first row of 3 logits. The unknown bin is the last row of zeros.

Since this feature is a nominal categorical, we use the dictionary {‘Fiji’: 1, ‘Peru’: 2} to look up which row of logits to use for each categorical string.
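Putting that together, fetching the per-class logits for the country feature of a single sample is a dictionary lookup followed by a row index. A minimal sketch for the value "Peru":

country_bins = ebm.bins_[0][0]           # {'Fiji': 1, 'Peru': 2}
bin_idx = country_bins.get("Peru", -1)   # unseen strings would fall into the last (unknown) bin
print(ebm.term_scores_[0][bin_idx])      # the row of 3 class logits for the 'Peru' bin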

print(ebm.term_scores_[1])
[[-5.03133718 -4.23311636  8.54818126]
 [-5.1281371  -3.35698149  8.32481619]
 [-2.6758276  10.87715787 -7.32914533]
 [12.83530188 -3.28706002 -9.54385213]
 [ 0.          0.          0.        ]]

ebm.term_scores_[1] is the lookup table for the continuous feature. Once again, the 0th and last indexes are for missing and unknown values, respectively. This particular example has 5 bins: the 0th missing bin index, the three partitions created by the 2 cuts, and the last unknown bin index. Each row is a single bin that contains 3 class logits.
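The equivalent lookup for the continuous feature bins the value with np.digitize and shifts by 1 to skip the reserved missing bin. A minimal sketch for the feature value 8:

cuts = ebm.bins_[2][0]                  # array([7.25, 9.  ])
bin_idx = np.digitize(8.0, cuts) + 1    # 1 + 1 = 2, the middle partition
print(ebm.term_scores_[1][bin_idx])     # the row of 3 class logits for values between the cuts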

Sample code

This sample code incorporates everything discussed in all 3 parts of this series. It could be used as a drop-in replacement for the existing EBM predict function of the ExplainableBoostingRegressor, or as the predict_proba function of the ExplainableBoostingClassifier.

from sklearn.utils.extmath import softmax

sample_scores = []
for sample in X:
    # start from the intercept for each sample
    score = ebm.intercept_.copy()
    if isinstance(score, float) or len(score) == 1:
        # regression or binary classification
        score = float(score)

    # we have 2 terms, so add their score contributions
    for term_idx, features in enumerate(ebm.term_features_):
        # indexing into a tensor requires a multi-dimensional index
        tensor_index = []

        # main effects will have 1 feature, and pairs will have 2 features
        for feature_idx in features:
            feature_val = sample[feature_idx]
            bin_idx = 0  # if missing value, use bin index 0

            if feature_val is not None and feature_val is not np.nan:
                # we bin differently for main effects and pairs, so first 
                # get the list containing the bins for different resolutions
                bin_levels = ebm.bins_[feature_idx]

                # what resolution do we need for this term (main resolution, pair
                # resolution, etc.), but limit to the last resolution available
                bins = bin_levels[min(len(bin_levels), len(features)) - 1]

                if isinstance(bins, dict):
                    # categorical feature
                    # 'unknown' category strings are in the last bin (-1)
                    bin_idx = bins.get(feature_val, -1)
                else:
                    # continuous feature
                    try:
                        # try converting to a float, if that fails it's 'unknown'
                        feature_val = float(feature_val)
                        # add 1 because the 0th bin is reserved for 'missing'
                        bin_idx = np.digitize(feature_val, bins) + 1
                    except ValueError:
                        # non-floats are 'unknown', which is in the last bin (-1)
                        bin_idx = -1
        
            tensor_index.append(bin_idx)
        # local_score is also the local feature importance
        local_score = ebm.term_scores_[term_idx][tuple(tensor_index)]
        score += local_score
    sample_scores.append(score)

predictions = np.array(sample_scores)

if hasattr(ebm, 'classes_'):
    # classification
    if len(ebm.classes_) == 2:
        # binary classification

        # softmax expects two logits for binary classification
        # the first logit is always equivalent to 0 for binary classification
        predictions = [[0, x] for x in predictions]
    predictions = softmax(predictions)

if hasattr(ebm, 'classes_'):
    print("probabilities for classes " + str(ebm.classes_))
    print("")
    print(ebm.predict_proba(X))
else:
    print(ebm.predict(X))
print("")
print(predictions)
probabilities for classes [4000 5000 6000]

[[1.64710071e-08 8.95437756e-10 9.99999983e-01]
 [9.89132306e-10 9.99999998e-01 8.76921033e-10]
 [9.99999982e-01 9.20999343e-10 1.66568683e-08]
 [1.49900277e-11 4.13471649e-11 1.00000000e+00]]

[[1.64710071e-08 8.95437756e-10 9.99999983e-01]
 [9.89132306e-10 9.99999998e-01 8.76921033e-10]
 [9.99999982e-01 9.20999343e-10 1.66568683e-08]
 [1.49900277e-11 4.13471649e-11 1.00000000e+00]]
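
As a final check, the hand-computed probabilities should agree with the library’s own predict_proba output to within floating point tolerance:

assert np.allclose(predictions, ebm.predict_proba(X))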