Shapley Additive Explanations#
Link to API Reference: ShapKernel
See the backing repository for SHAP here.
SHAP is a framework that explains the output of any model using Shapley values, a game theoretic approach often used for optimal credit allocation. While this can be used on any blackbox models, SHAP can compute more efficiently on specific model classes (like tree ensembles). These optimizations become important at scale – calculating many SHAP values is feasible on optimized model classes, but can be comparatively slow in the model-agnostic setting. Due to their additive nature, individual (local) SHAP values can be aggregated and also used for global explanations. SHAP can be used as a foundation for deeper ML analysis such as model monitoring, fairness and cohort analysis.
How it Works#
The following code will train a blackbox pipeline for the breast cancer dataset. Aftewards it will interpret the pipeline and its decisions with SHAP. The visualizations provided will be for local explanations.
from interpret import set_visualize_provider from interpret.provider import InlineProvider set_visualize_provider(InlineProvider())
import numpy as np from sklearn.datasets import load_breast_cancer from sklearn.model_selection import train_test_split from sklearn.ensemble import RandomForestClassifier from sklearn.decomposition import PCA from sklearn.pipeline import Pipeline from interpret import show from interpret.blackbox import ShapKernel seed = 42 np.random.seed(seed) X, y = load_breast_cancer(return_X_y=True, as_frame=True) X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=seed) pca = PCA() rf = RandomForestClassifier(random_state=seed) blackbox_model = Pipeline([('pca', pca), ('rf', rf)]) blackbox_model.fit(X_train, y_train) shap = ShapKernel(blackbox_model, X_train) shap_local = shap.explain_local(X_test[:5], y_test[:5]) show(shap_local, 0)
Using `tqdm.autonotebook.tqdm` in notebook mode. Use `tqdm.tqdm` instead to force console mode (e.g. in jupyter console)
Using 455 background data samples could cause slower run times. Consider using shap.sample(data, K) or shap.kmeans(data, K) to summarize the background as K samples.