Linear Model

Linear Model#

Links to API References: LogisticRegression, LinearRegression

See the backing repository for Linear Model here.

Summary

Linear / logistic regression, where the relationship between the response and its explanatory variables are modeled with linear predictor functions. This is one of the foundational models in statistical modeling, has quick training time and offers good interpretability, but has varying model performance. The implementation is a light wrapper to the linear / logistic regression exposed in scikit-learn.

How it Works

Christoph Molnar’s “Interpretable Machine Learning” e-book [1] has an excellent overview on linear and regression models that can be found here and here respectively.

For implementation specific details, scikit-learn’s user guide [2] on linear and regression models are solid and can be found here.

Code Example

The following code will train a logistic regression for the breast cancer dataset. The visualizations provided will be for both global and local explanations.

from interpret import set_visualize_provider
from interpret.provider import InlineProvider
set_visualize_provider(InlineProvider())

import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

from interpret.glassbox import LogisticRegression
from interpret import show

seed = 42
np.random.seed(seed)
X, y = load_breast_cancer(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=seed)

lr = LogisticRegression(max_iter=3000, random_state=seed)
lr.fit(X_train, y_train)

auc = roc_auc_score(y_test, lr.predict_proba(X_test)[:, 1])
print("AUC: {:.3f}".format(auc))

AUC: 0.998

/opt/hostedtoolcache/Python/3.9.23/x64/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:465: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(

show(lr.explain_global())

show(lr.explain_local(X_test[:5], y_test[:5]), 0)

Further Resources

Bibliography

[1] Christoph Molnar. Interpretable machine learning. Lulu. com, 2020.

[2] Fabian Pedregosa, Gaël Varoquaux, Alexandre Gramfort, Vincent Michel, Bertrand Thirion, Olivier Grisel, Mathieu Blondel, Peter Prettenhofer, Ron Weiss, Vincent Dubourg, and others. Scikit-learn: machine learning in python. the Journal of machine Learning research, 12:2825–2830, 2011.