# Partial Dependence Plot

Link to API Reference: [PartialDependence](./python/api/PartialDependence.ipynb)

<h2>Summary</h2>

Partial dependence plots (PDPs) visualize the dependence between the response and a set of target features (usually one or two), marginalizing over all other features. For a perturbation-based interpretability method, PDP is relatively quick to compute. It assumes that the target features are independent of the remaining features, and the resulting plots can be misleading when this assumption does not hold (e.g. when features are strongly correlated or the model has many high-order interactions).
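As a sketch of what "marginalizing" means here, the partial dependence function and its usual sample estimate can be written as follows. The notation is assumed for illustration and does not come from this page: $x_S$ denotes the target features, $x_C$ the remaining features, $\hat f$ the fitted model, and $n$ the number of rows in the data.

$$
\hat f_S(x_S) = \mathbb{E}_{X_C}\left[\hat f(x_S, X_C)\right] \approx \frac{1}{n} \sum_{i=1}^{n} \hat f\left(x_S, x_C^{(i)}\right)
$$

The average runs over the observed values of the other features regardless of the value of $x_S$, which is where the independence assumption enters.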

<h2>How it Works</h2>

The PDP module for `scikit-learn` [[1](pedregosa2011scikit_pdp)] provides a succinct description of the algorithm [here](https://scikit-learn.org/stable/modules/partial_dependence.html).

Christoph Molnar's "Interpretable Machine Learning" e-book [[2](molnar2020interpretable_pdp)] has an excellent overview of partial dependence that can be found [here](https://christophm.github.io/interpretable-ml-book/pdp.html).

The original paper, "Greedy Function Approximation: A Gradient Boosting Machine" [[3](friedman2001greedy_pdp)], provides a good motivation and definition.
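The estimate described in these references can be sketched in a few lines of NumPy. This is a minimal brute-force sketch, not the implementation used by `interpret` or `scikit-learn`. The helper `partial_dependence_1d` and the toy models `additive` and `interaction` are hypothetical names introduced for illustration. For each grid value, the target feature is overwritten in every row and the predictions are averaged.

```python
import numpy as np


def partial_dependence_1d(predict_fn, X, feature, grid):
    """Brute-force partial dependence of predict_fn on one column of X.

    Hypothetical helper for illustration only.
    """
    X = np.asarray(X, dtype=float)
    averages = []
    for value in grid:
        X_mod = X.copy()
        X_mod[:, feature] = value  # overwrite the target feature in every row
        averages.append(predict_fn(X_mod).mean())  # average over the other features
    return np.array(averages)


def additive(X):
    # Toy model without interactions: feature 0 has a constant slope of 2.
    return 2.0 * X[:, 0] + X[:, 1]


def interaction(X):
    # Toy model that is a pure interaction between features 0 and 1.
    return X[:, 0] * X[:, 1]


# Toy data: two independent features, both symmetric around zero.
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(2000, 2))
grid = np.linspace(-1.0, 1.0, 5)

pd_additive = partial_dependence_1d(additive, X, feature=0, grid=grid)
pd_interaction = partial_dependence_1d(interaction, X, feature=0, grid=grid)

print("additive:   ", np.round(pd_additive, 2))
print("interaction:", np.round(pd_interaction, 2))
```

For the additive model, the curve rises with slope 2, matching the true effect of feature 0. For the pure interaction model, the curve stays close to zero even though feature 0 changes every individual prediction, because positive and negative effects cancel in the average. This is one way a PDP can mislead when the model relies on interactions.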

<h2>Code Example</h2>

The following code trains a blackbox pipeline on the breast cancer dataset. Afterwards, it interprets the pipeline and its decisions with partial dependence plots. The visualizations shown are global explanations.

In [None]:
from interpret import set_visualize_provider
from interpret.provider import InlineProvider

# Render visualizations inline in the notebook output
set_visualize_provider(InlineProvider())

In [None]:
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

from sklearn.ensemble import RandomForestClassifier
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline

from interpret import show
from interpret.blackbox import PartialDependence

# Fix the random seed for reproducibility
seed = 42
np.random.seed(seed)

# Load the data and hold out 20% as a test set
X, y = load_breast_cancer(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=seed)

# Blackbox model: PCA followed by a random forest
pca = PCA()
rf = RandomForestClassifier(random_state=seed)

blackbox_model = Pipeline([('pca', pca), ('rf', rf)])
blackbox_model.fit(X_train, y_train)

# Build the partial dependence explainer from the fitted pipeline and the training data
pdp = PartialDependence(blackbox_model, X_train)

# Show the global explanation, opening on the feature at index 0
show(pdp.explain_global(), 0)

<h2>Further Resources</h2>

- [Link to the original paper](https://projecteuclid.org/download/pdf_1/euclid.aos/1013203451)
- [scikit-learn documentation on its PDP module](https://scikit-learn.org/stable/modules/partial_dependence.html)

<h2>Bibliography</h2>

(pedregosa2011scikit_pdp)=
[1] Fabian Pedregosa, Gaël Varoquaux, Alexandre Gramfort, Vincent Michel, Bertrand Thirion, Olivier Grisel, Mathieu Blondel, Peter Prettenhofer, Ron Weiss, Vincent Dubourg, and others. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830, 2011.

(molnar2020interpretable_pdp)=
[2] Christoph Molnar. Interpretable Machine Learning. Lulu.com, 2020.

(friedman2001greedy_pdp)=
[3] Jerome H. Friedman. Greedy function approximation: A gradient boosting machine. Annals of Statistics, pages 1189–1232, 2001.