EBM Internals - Regression

This is part 1 of a 3-part series describing EBM internals and how to make predictions. For part 2, click here. For part 3, click here.

In this first part, we’ll cover the simplest useful EBM: a regression model without interactions, missing values, or other complications.

At their core, EBMs are generalized additive models where the score contributions from individual features and interactions are added together to make a prediction. Each individual score contribution is determined using a lookup table. Before doing the lookup, we first need to discretize continuous features and assign bin indexes to categorical features.
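The discretize, look up, and add steps can be sketched in a few lines of NumPy. This is a minimal illustration, not interpret's implementation: the category-to-bin mapping, the cut points, the score tables, and the intercept below are hypothetical values chosen by hand, and predict_one is a name made up for this example. Categorical values map straight to a bin index, while continuous values are placed into intervals between cut points with np.digitize (whose convention sends a value equal to a cut point into the upper interval; interpret's own tie handling is not assumed here).

```python
import numpy as np

# Hypothetical lookup structures (illustrative values, not from a fitted model)
country_bins = {"Fiji": 0, "Peru": 1}          # category -> bin index
country_scores = np.array([100.0, -50.0])      # one score per country bin

size_cuts = np.array([7.5, 8.5])               # cut points for the continuous feature
size_scores = np.array([20.0, 10.0, -30.0])    # one score per interval: len(cuts) + 1

intercept = 450.0                              # hypothetical baseline score


def predict_one(country, size):
    # Categorical feature: the category itself selects the bin
    country_bin = country_bins[country]
    # Continuous feature: find the interval between cut points containing the value
    size_bin = int(np.digitize(size, size_cuts))
    # Additive model: intercept plus one looked-up score per feature
    return float(intercept + country_scores[country_bin] + size_scores[size_bin])


print(predict_one("Fiji", 8.0))  # 560.0
```

Because every feature contributes through its own table, each term's effect can be read off directly, which is what makes the model a glassbox.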

Regression is the simplest kind of EBM because the final sum is the prediction itself; no inverse link function needs to be applied.
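To make the role of the link function concrete, here is a hedged sketch contrasting the two cases. It assumes the logit link that binary classifiers typically use; the function names regression_prediction and binary_probability are hypothetical, and the scores passed to them are made-up numbers.

```python
import math


def regression_prediction(intercept, term_scores):
    # Identity link: the additive sum is returned unchanged as the prediction
    return intercept + sum(term_scores)


def binary_probability(intercept, term_scores):
    # For contrast: with a logit link the sum is a log-odds value, so the
    # inverse link (the logistic function) is needed to get a probability
    log_odds = intercept + sum(term_scores)
    return 1.0 / (1.0 + math.exp(-log_odds))


print(regression_prediction(450.0, [100.0, 10.0]))  # 560.0
print(binary_probability(0.0, [0.0, 0.0]))          # 0.5
```

In the regression case the summed score is already in the units of the target, which is why this part of the series needs no extra transformation step.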

# boilerplate
from interpret import show
from interpret.glassbox import ExplainableBoostingRegressor
import numpy as np

from interpret import set_visualize_provider
from interpret.provider import InlineProvider
set_visualize_provider(InlineProvider())
# make a dataset composed of a nominal categorical feature and a continuous feature
X = [["Peru", 7.0], ["Fiji", 8.0], ["Peru", 9.0]]
y = [450.0, 550.0, 350.0]

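# Fit a regression EBM without interactions.
# With only 3 samples, eliminate the validation set and relax the
# split constraints so that the model can still make splits.
ebm = ExplainableBoostingRegressor(
    interactions=0,       # no pairwise interaction terms
    validation_size=0,    # train on all samples (no validation holdout)
    outer_bags=1,         # a single bag instead of an ensemble of bags
    min_samples_leaf=1,   # allow leaves containing a single sample
    min_hessian=1e-9)     # accept splits even when the hessian total is tiny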
ebm.fit(X, y)
show(ebm.explain_global())