Interpretable Regression
In this notebook we fit three glassbox regression models: an Explainable Boosting Machine (EBM), a LinearRegression, and a RegressionTree. After fitting them, we use their glassbox nature to inspect their global and local explanations.
This notebook can be found in our examples folder on GitHub.
# install interpret if not already installed
try:
    import interpret
except ModuleNotFoundError:
    !pip install --quiet interpret scikit-learn
import numpy as np
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from interpret import show
from interpret.perf import RegressionPerf
from interpret import set_visualize_provider
from interpret.provider import InlineProvider
set_visualize_provider(InlineProvider())
dataset = fetch_california_housing()
X = dataset.data
y = dataset.target
names = dataset.feature_names
seed = 42
np.random.seed(seed)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=seed)
Explore the dataset
from interpret import show
from interpret.data import Marginal
marginal = Marginal(names).explain_data(X_train, y_train, name='Train Data')
show(marginal)
Train the Explainable Boosting Machine (EBM)
from interpret.glassbox import ExplainableBoostingRegressor, LinearRegression, RegressionTree
ebm = ExplainableBoostingRegressor(names, interactions=3)
ebm.fit(X_train, y_train)
ExplainableBoostingRegressor(feature_names=['MedInc', 'HouseAge', 'AveRooms', 'AveBedrms', 'Population', 'AveOccup', 'Latitude', 'Longitude'], interactions=3)
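Under the hood, an EBM is a generalized additive model: a prediction is an intercept plus a sum of per-feature contributions, each looked up from a learned, binned shape function (plus any interaction terms). A minimal numpy sketch of that additive structure, using invented bin edges and scores rather than values from the fitted model above:

```python
import numpy as np

# Hypothetical per-feature shape functions, stored as bin edges
# plus one score per bin (the numbers here are made up).
bin_edges = {
    "MedInc":   np.array([2.0, 4.0, 6.0]),   # 4 bins
    "HouseAge": np.array([10.0, 30.0]),      # 3 bins
}
bin_scores = {
    "MedInc":   np.array([-0.8, -0.2, 0.4, 1.1]),
    "HouseAge": np.array([-0.1, 0.0, 0.2]),
}
intercept = 2.0

def additive_predict(row):
    """Score = intercept + sum of each feature's binned contribution."""
    total = intercept
    for name, value in row.items():
        idx = np.searchsorted(bin_edges[name], value)  # which bin?
        total += bin_scores[name][idx]
    return total

print(additive_predict({"MedInc": 5.0, "HouseAge": 15.0}))  # 2.0 + 0.4 + 0.0 = 2.4
```

Because each feature's contribution is just a table lookup that gets summed, the model can be plotted feature by feature, which is exactly what the explanations below visualize.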
EBMs are glassbox models, so we can edit them
# post-process monotonize the MedInc feature
ebm.monotonize("MedInc", increasing=True)
ExplainableBoostingRegressor(feature_names=['MedInc', 'HouseAge', 'AveRooms', 'AveBedrms', 'Population', 'AveOccup', 'Latitude', 'Longitude'], interactions=3)
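`monotonize` post-processes the learned shape function so it never decreases. A standard way to perform that kind of adjustment is isotonic regression, which finds the closest non-decreasing curve to a set of scores. A sketch of the idea with scikit-learn, on invented bin scores (an illustration of the concept, not interpret's exact implementation):

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression

# Invented per-bin scores for a shape function that mostly rises
# but dips in the middle.
scores = np.array([-1.0, -0.3, 0.1, -0.2, 0.5, 0.9])
bins = np.arange(len(scores))

# Fit the closest non-decreasing curve to the scores; the violating
# pair (0.1, -0.2) gets pooled to their average, -0.05.
iso = IsotonicRegression(increasing=True)
monotone = iso.fit_transform(bins, scores)

print(monotone)
```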
Global Explanations: What the model learned overall
ebm_global = ebm.explain_global(name='EBM')
show(ebm_global)
Local Explanations: How an individual prediction was made
ebm_local = ebm.explain_local(X_test[:5], y_test[:5], name='EBM')
show(ebm_local, 0)
Evaluate EBM performance
ebm_perf = RegressionPerf(ebm, names).explain_perf(X_test, y_test, name='EBM')
show(ebm_perf)
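`RegressionPerf` summarizes fit quality with statistics such as R². The same numbers can be checked by hand from their definitions; a sketch with plain numpy on made-up targets and predictions:

```python
import numpy as np

# Made-up targets and predictions, just to show the metric
# definitions a regression performance report is built on.
y_true = np.array([3.0, 1.0, 2.0, 7.0])
y_pred = np.array([2.5, 0.5, 2.0, 8.0])

# Root mean squared error.
rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))

# R^2 = 1 - (residual sum of squares / total sum of squares).
ss_res = np.sum((y_true - y_pred) ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2 = 1.0 - ss_res / ss_tot

print(f"RMSE = {rmse:.4f}, R^2 = {r2:.4f}")
```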
Let's test out a few other Explainable Models
from interpret.glassbox import LinearRegression, RegressionTree
lr = LinearRegression(names)
lr.fit(X_train, y_train)
rt = RegressionTree(names, random_state=seed)
rt.fit(X_train, y_train)
<interpret.glassbox._decisiontree.RegressionTree at 0x7f8debb88490>
Compare performance using the Dashboard
lr_perf = RegressionPerf(lr, names).explain_perf(X_test, y_test, name='Linear Regression')
show(lr_perf)
rt_perf = RegressionPerf(rt, names).explain_perf(X_test, y_test, name='Regression Tree')
show(rt_perf)
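The dashboard lets you eyeball the two performance reports side by side. For a quick numeric comparison you can also score estimators directly; a sketch using plain scikit-learn models (which interpret's glassbox LinearRegression and RegressionTree are built on) against synthetic data with a known linear signal:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import train_test_split

# Synthetic data: a linear target plus a little noise, so the
# linear model should clearly win.
rng = np.random.default_rng(42)
X = rng.normal(size=(500, 4))
y = X @ np.array([1.5, -2.0, 0.0, 0.5]) + rng.normal(scale=0.1, size=500)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=42)

models = {
    "linear": LinearRegression().fit(X_tr, y_tr),
    "tree":   DecisionTreeRegressor(max_depth=4, random_state=42).fit(X_tr, y_tr),
}
for name, model in models.items():
    print(f"{name}: R^2 = {model.score(X_te, y_te):.3f}")
```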
Glassbox: All of our models have global and local explanations
lr_global = lr.explain_global(name='Linear Regression')
show(lr_global)
rt_global = rt.explain_global(name='Regression Tree')
show(rt_global)