EBM Internals - Binary classification

This is part 2 of a three-part series describing EBM internals and how to make predictions. For part 1, click here. For part 3, click here.

In part 2 we’ll cover binary classification, interactions, missing values, ordinals, and the reduced discretization resolution used for interactions. Before reading this part, you should be familiar with part 1.
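As a quick orientation before the fitted example, here is a minimal sketch of how a binary classification EBM is generally understood to turn additive scores into probabilities: the intercept and per-term contributions are summed into a logit, which then passes through the logistic (sigmoid) function. The numbers and variable names below are hypothetical and do not come from any fitted model.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function mapping a logit to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


# Hypothetical values for illustration only (not from a fitted EBM).
intercept = -0.25
term_contributions = [0.75, -0.5, 1.0]  # one looked-up score per term

# The additive score is a logit for the positive class.
logit = intercept + sum(term_contributions)

# Probability of the positive class, and the [negative, positive] pair.
p_positive = sigmoid(logit)
probabilities = [1.0 - p_positive, p_positive]

print(logit, probabilities)
```

The positive class here would correspond to the second class label; which label that is depends on how the model orders its classes.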

# boilerplate
from interpret import show
from interpret.glassbox import ExplainableBoostingClassifier
import numpy as np

from interpret import set_visualize_provider
from interpret.provider import InlineProvider
set_visualize_provider(InlineProvider())

# make a dataset composed of an ordinal categorical, and a continuous feature
X = [["low", 8.0], ["medium", 7.0], ["high", 9.0], [None, None]]
y = ["apples", "apples", "oranges", "oranges"]

# Fit a classification EBM with 1 interaction
# Define an ordinal feature with specified ordering
# Limit the number of interaction bins to force a lower resolution
# Eliminate the validation set to handle the small dataset
ebm = ExplainableBoostingClassifier(
    interactions=1,
    feature_types=[["low", "medium", "high"], "continuous"],
    max_interaction_bins=4,  # small value chosen for illustration
    validation_size=0, early_stopping_rounds=1000, min_samples_leaf=1)
ebm.fit(X, y)
/opt/hostedtoolcache/Python/3.8.16/x64/lib/python3.8/site-packages/interpret/glassbox/ebm/ebm.py:467: UserWarning: Missing values detected. Our visualizations do not currently display missing values. To retain the glassbox nature of the model you need to either set the missing values to an extreme value like -1000 that will be visible on the graphs, or manually examine the missing value score in ebm.term_scores_[term_index][0]
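The warning above points at index 0 of a term's scores as the place where the missing value score lives. The sketch below illustrates that idea for the ordinal feature without needing a fitted model. The score values, the helper names `to_bin` and `score_for`, and the trailing bin for unseen categories are assumptions made for illustration.

```python
# Ordering taken from the feature_types definition in the example above.
ORDER = ["low", "medium", "high"]

# Bin 0 is reserved for missing values, so categories start at bin 1.
category_to_bin = {category: i + 1 for i, category in enumerate(ORDER)}

# Assumption: one extra trailing bin collects categories not seen in training.
UNSEEN_BIN = len(ORDER) + 1


def to_bin(value):
    """Map a raw ordinal value to a bin index (hypothetical helper)."""
    if value is None:
        return 0  # missing value bin
    return category_to_bin.get(value, UNSEEN_BIN)


# Made-up scores laid out as [missing, low, medium, high, unseen].
scores = [0.4, -0.3, -0.1, 0.6, 0.0]


def score_for(value):
    """Look up the score contribution for a raw value (hypothetical helper)."""
    return scores[to_bin(value)]


print(score_for(None))    # score stored at index 0, the missing bin
print(score_for("high"))  # score stored at index 3
```

Because the missing value score is not drawn on the graphs, reading index 0 directly like this is one way to keep that part of the model inspectable.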