# EBM Internals - Regression
In part 1 we'll cover the simplest useful EBM: a regression model without interactions, missing values, or other complications.
At their core, EBMs are generalized additive models where the score contributions from individual features and interactions are added together to make a prediction. Each individual score contribution is determined using a lookup table. Before doing the lookup, we first need to discretize continuous features and assign bin indexes to categorical features.
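To make that concrete, here is a minimal sketch of the bin-and-lookup step. The cut points and score tables (`cuts`, `continuous_scores`, `categorical_scores`) are made-up values for illustration, not the fitted attributes of an actual EBM:

```python
import numpy as np

# hypothetical cut points that split the continuous feature into 3 bins
cuts = np.array([7.5, 8.5])
# hypothetical score contribution for each continuous bin
continuous_scores = np.array([-50.0, 25.0, 30.0])
# hypothetical score contribution for each category of the nominal feature
categorical_scores = {"Peru": -10.0, "Fiji": 15.0}

sample = ["Peru", 8.0]

# continuous feature: discretize to a bin index, then look up its score
bin_idx = np.digitize(sample[1], cuts)   # 8.0 falls into bin index 1
score = continuous_scores[bin_idx]

# categorical feature: the category itself acts as the bin index
score += categorical_scores[sample[0]]

print(score)  # combined additive contribution of both features
```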
Regression is the simplest form of EBM model because the final sum is the actual prediction without requiring an inverse link function.
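Written with generic symbols (not the library's own notation), the regression prediction is just the intercept plus the per-term contributions:

$$\hat{y} = \beta_0 + \sum_{j} f_j(x_j)$$

where each $f_j$ is the lookup table for feature or interaction $j$. A classifier would pass this sum through an inverse link function such as the logistic function, but for regression the sum is used directly.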
```python
# boilerplate
from interpret import show
from interpret.glassbox import ExplainableBoostingRegressor
import numpy as np
from interpret import set_visualize_provider
from interpret.provider import InlineProvider

set_visualize_provider(InlineProvider())
```
```python
# make a dataset composed of a nominal categorical and a continuous feature
X = [["Peru", 7.0], ["Fiji", 8.0], ["Peru", 9.0]]
y = [450.0, 550.0, 350.0]

# Fit a regression EBM without interactions
# Eliminate the validation set to handle the small dataset
ebm = ExplainableBoostingRegressor(
    interactions=0,
    validation_size=0,
    outer_bags=1,
    min_samples_leaf=1,
)
ebm.fit(X, y)
show(ebm.explain_global())
```
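As a quick check, the value returned by `predict` for a regression EBM is this raw additive sum; a minimal usage sketch on a new sample (the exact output depends on the fitted model):

```python
# For regression, the prediction is the intercept plus the summed per-term
# scores, with no inverse link function applied.
print(ebm.predict([["Fiji", 7.5]]))
```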