EBM Internals - Multiclass

This is part 3 of a 3-part series describing EBM internals and how to make predictions. For part 1, click here. For part 2, click here.

In part 3 we'll cover multiclass classification, specified bin cuts, term exclusion, and unknown values. Before reading this part, you should be familiar with the material in part 1 and part 2.

# boilerplate
from interpret import show
from interpret.glassbox import ExplainableBoostingClassifier
import numpy as np

from interpret import set_visualize_provider
from interpret.provider import InlineProvider
set_visualize_provider(InlineProvider())

# make a dataset composed of a nominal feature, an unused feature, and a continuous feature
X = [["Peru", "", 7.0], ["Fiji", "", 8.0], ["Peru", "", 9.0], [None, "", None]]
y = [6000, 5000, 4000, 6000] # integer classes

# Fit a classification EBM without interactions
# Specify exact bin cuts for the continuous feature
# Exclude the middle feature during fitting
# Eliminate the validation set to handle the small dataset
ebm = ExplainableBoostingClassifier(
    feature_types=['nominal', 'nominal', [7.25, 9.0]], 
    mains=[0, 2], # this excludes the middle feature
    validation_size=0, early_stopping_rounds=1000, min_samples_leaf=1)
ebm.fit(X, y)
/opt/hostedtoolcache/Python/3.8.16/x64/lib/python3.8/site-packages/interpret/glassbox/ebm/ebm.py:467: UserWarning: Missing values detected. Our visualizations do not currently display missing values. To retain the glassbox nature of the model you need to either set the missing values to an extreme value like -1000 that will be visible on the graphs, or manually examine the missing value score in ebm.term_scores_[term_index][0]
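The warning above refers to index 0 of a term's scores as the slot for missing values. To make that concrete, here is a minimal pure-Python sketch of how the specified cuts `[7.25, 9.0]` could map the continuous feature's values to bin indexes. It assumes the convention from the earlier parts of this series: index 0 is reserved for missing values, and a value equal to a cut falls into the bin above that cut. The helper `continuous_bin_index` is hypothetical and is not part of the interpret API.

```python
from bisect import bisect_right
import math

# Cut points copied from the feature_types argument in the example above.
CUTS = [7.25, 9.0]

# Third column of X from the example above (the continuous feature).
values = [7.0, 8.0, 9.0, None]


def continuous_bin_index(value, cuts):
    """Hypothetical helper: map a raw continuous value to a bin index.

    Assumptions (not confirmed against the library source):
      - index 0 is reserved for missing values (None or NaN)
      - a value equal to a cut belongs to the bin above that cut
    """
    if value is None or (isinstance(value, float) and math.isnan(value)):
        return 0
    # bisect_right counts the cuts that are <= value; add 1 to skip
    # the reserved missing-value slot at index 0.
    return bisect_right(cuts, value) + 1


bin_indexes = [continuous_bin_index(v, CUTS) for v in values]
print(bin_indexes)
```

Under these assumptions, two cuts give three regular bins (indexes 1 to 3) plus the missing-value slot at index 0, so the sample with `None` in the continuous column lands in the slot the warning describes.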