Partial Dependence Plot


Partial dependence plots visualize the dependence between the response and a set of target features (usually one or two), marginalizing over all the other features. For a perturbation-based interpretability method, it is relatively quick. PDP assumes independence between the features, and can be misleading interpretability-wise when this is not met (e.g. when the model has many high order interactions).

How it Works

The PDP module for scikit-learn [2] provides a succinct description of the algorithm here.

Christoph Molnar’s “Interpretable Machine Learning” e-book [1] has an excellent overview on partial dependence that can be found here.

The conceiving paper “Greedy Function Approximation: A Gradient Boosting Machine” [3] provides a good motivation and definition.

Code Example

The following code will train a blackbox pipeline for the breast cancer dataset. Aftewards it will interpret the pipeline and its decisions with Partial Dependence Plots. The visualizations provided will be for global explanations.

from interpret import set_visualize_provider
from interpret.provider import InlineProvider
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

from sklearn.ensemble import RandomForestClassifier
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline

from interpret import show
from interpret.blackbox import PartialDependence

seed = 1
X, y = load_breast_cancer(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=seed)

pca = PCA()
rf = RandomForestClassifier(n_estimators=100, n_jobs=-1)

blackbox_model = Pipeline([('pca', pca), ('rf', rf)]), y_train)

pdp = PartialDependence(predict_fn=blackbox_model.predict_proba, data=X_train)
pdp_global = pdp.explain_global()