Decision Tree
Links to API References: ClassificationTree, RegressionTree
See the backing repository for Decision Tree here.
Summary
A supervised decision tree. This is a recursive partitioning method in which the feature space is repeatedly split into smaller partitions according to a split criterion. A predicted value is learned for each partition and stored in the corresponding leaf node of the tree. This is a light wrapper around the decision trees exposed in scikit-learn. Single decision trees often have weak predictive performance, but they are fast to train and good at identifying associations. Shallow decision trees are easy to interpret, but trees quickly become complex and unintelligible as their depth increases.
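To make the idea of recursive partitioning concrete, here is a minimal pure-Python sketch. It is not the scikit-learn implementation that this wrapper relies on. It assumes Gini impurity as the split criterion, thresholds taken directly from observed feature values, and a majority-class prediction in each leaf. The helper names (`gini`, `best_split`, `build`, `predict`) and the toy data are hypothetical and exist only for illustration.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means the partition is pure)."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(rows, labels):
    """Return the (feature, threshold) that most reduces weighted Gini, or None."""
    best, best_score = None, gini(labels)
    n = len(rows)
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best_score:
                best, best_score = (f, t), score
    return best


def build(rows, labels, depth=0, max_depth=2):
    """Recursively split the data; each leaf stores the value predicted for its partition."""
    split = None if depth >= max_depth else best_split(rows, labels)
    if split is None:
        # Leaf node: predict the majority class of the samples in this partition.
        return {"predict": Counter(labels).most_common(1)[0][0]}
    f, t = split
    left = [i for i, r in enumerate(rows) if r[f] <= t]
    right = [i for i, r in enumerate(rows) if r[f] > t]
    return {
        "feature": f,
        "threshold": t,
        "left": build([rows[i] for i in left], [labels[i] for i in left], depth + 1, max_depth),
        "right": build([rows[i] for i in right], [labels[i] for i in right], depth + 1, max_depth),
    }


def predict(node, row):
    """Walk from the root to a leaf and return the value stored there."""
    while "predict" not in node:
        node = node["left"] if row[node["feature"]] <= node["threshold"] else node["right"]
    return node["predict"]


# Toy data (hypothetical): two features, two classes.
rows = [[1.0, 5.0], [2.0, 4.0], [3.0, 7.0], [6.0, 1.0], [7.0, 3.0], [8.0, 2.0]]
labels = [0, 0, 0, 1, 1, 1]
tree = build(rows, labels)
print(tree)
print(predict(tree, [2.5, 6.0]), predict(tree, [6.5, 2.0]))
```

Each additional level of depth can double the number of partitions, which is why deeper trees become hard to read.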
How it Works
Christoph Molnar’s “Interpretable Machine Learning” e-book [1] has an excellent overview of decision trees that can be found here.
For implementation-specific details, scikit-learn’s user guide [2] on decision trees is solid and can be found here.
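As a small complement to those references, the sketch below fits a scikit-learn `DecisionTreeClassifier` directly and prints its learned splits as text with `sklearn.tree.export_text`. It uses the underlying scikit-learn estimator rather than the `ClassificationTree` wrapper, and `max_depth=2` is an illustrative assumption chosen only to keep the printed rules short.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.tree import DecisionTreeClassifier, export_text

data = load_breast_cancer()
X, y = data.data, data.target

# max_depth=2 is an illustrative choice so the printed tree stays readable.
clf = DecisionTreeClassifier(max_depth=2, random_state=42).fit(X, y)

# Text rendering of each split and of the class predicted in each leaf node.
rules = export_text(clf, feature_names=list(data.feature_names))
print(rules)
print("depth:", clf.get_depth(), "leaf nodes:", clf.get_n_leaves())
```

Raising `max_depth` in this sketch shows how quickly the printed rule set grows.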
Code Example
The following code trains a decision tree classifier on the breast cancer dataset. The visualizations provided cover both global and local explanations.
from interpret import set_visualize_provider
from interpret.provider import InlineProvider
set_visualize_provider(InlineProvider())
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score
from interpret.glassbox import ClassificationTree
from interpret import show
seed = 42
np.random.seed(seed)
X, y = load_breast_cancer(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=seed)
dt = ClassificationTree(random_state=seed)
dt.fit(X_train, y_train)
auc = roc_auc_score(y_test, dt.predict_proba(X_test)[:, 1])
print("AUC: {:.3f}".format(auc))
AUC: 0.957
show(dt.explain_global())