Partial Dependence Plot#
Link to API Reference: PartialDependence
Partial dependence plots visualize the dependence between the response and a set of target features (usually one or two), marginalizing over all the other features. For a perturbation-based interpretability method, it is relatively quick. PDP assumes independence between the features, and can be misleading interpretability-wise when this is not met (e.g. when the model has many high order interactions).
How it Works#
The conceiving paper “Greedy Function Approximation: A Gradient Boosting Machine”  provides a good motivation and definition.
The following code will train a blackbox pipeline for the breast cancer dataset. Aftewards it will interpret the pipeline and its decisions with Partial Dependence Plots. The visualizations provided will be for global explanations.
from interpret import set_visualize_provider from interpret.provider import InlineProvider set_visualize_provider(InlineProvider())
import numpy as np from sklearn.datasets import load_breast_cancer from sklearn.model_selection import train_test_split from sklearn.ensemble import RandomForestClassifier from sklearn.decomposition import PCA from sklearn.pipeline import Pipeline from interpret import show from interpret.blackbox import PartialDependence seed = 42 np.random.seed(seed) X, y = load_breast_cancer(return_X_y=True, as_frame=True) X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=seed) pca = PCA() rf = RandomForestClassifier(random_state=seed) blackbox_model = Pipeline([('pca', pca), ('rf', rf)]) blackbox_model.fit(X_train, y_train) pdp = PartialDependence(blackbox_model, X_train) show(pdp.explain_global(), 0)