API Reference

High-Level API

TalkToEBM: A Natural Language Interface to Explainable Boosting Machines

t2ebm.describe_ebm(llm: AbstractChatModel | str, ebm: ExplainableBoostingClassifier | ExplainableBoostingRegressor, num_sentences: int = 30, **kwargs)

Ask the LLM to describe an EBM.

The function accepts additional keyword arguments that are passed to extract_graph, graph_to_text, and describe_graph_cot.

Parameters:
  • llm (Union[AbstractChatModel, str]) – The LLM.

  • ebm (Union[ExplainableBoostingClassifier, ExplainableBoostingRegressor]) – The EBM.

  • num_sentences (int, optional) – The desired number of sentences for the description. Defaults to 30.

Returns:

The description of the EBM.

Return type:

str

t2ebm.describe_graph(llm: AbstractChatModel | str, ebm: ExplainableBoostingClassifier | ExplainableBoostingRegressor, feature_index: int, num_sentences: int = 7, **kwargs)

Ask the LLM to describe a graph. Uses chain-of-thought reasoning.

The function accepts additional keyword arguments that are passed to extract_graph, graph_to_text, and describe_graph_cot.

Parameters:
  • llm (Union[AbstractChatModel, str]) – The LLM.

  • ebm (Union[ExplainableBoostingClassifier, ExplainableBoostingRegressor]) – The EBM.

  • feature_index (int) – The index of the feature to describe.

  • num_sentences (int, optional) – The desired number of sentences for the description. Defaults to 7.

Returns:

The description of the graph.

Return type:

str

t2ebm.feature_importances_to_text(ebm: ExplainableBoostingClassifier | ExplainableBoostingRegressor)

Convert the feature importances of an EBM to text.

Parameters:

ebm (Union[ExplainableBoostingClassifier, ExplainableBoostingRegressor]) – The EBM.

Returns:

Textual representation of the feature importances.

Return type:

str

Extract graphs from EBMs and convert them to text

class t2ebm.graphs.EBMGraph(feature_name: str, feature_type: str, x_vals: List[Tuple[float, float]], scores: List[float], stds: List[float])

Bases: object

A data structure for the graphs of Explainable Boosting Machines.
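The constructor fields can be illustrated with a self-contained stand-in (a sketch using a Python dataclass, not the library class itself):

```python
from dataclasses import dataclass
from typing import List, Tuple

# Stand-in mirroring the documented fields of t2ebm.graphs.EBMGraph
# (illustrative only; the real class lives in the library).
@dataclass
class GraphSketch:
    feature_name: str
    feature_type: str
    x_vals: List[Tuple[float, float]]  # bin boundaries as (lower, upper) pairs
    scores: List[float]                # per-bin contribution to the prediction
    stds: List[float]                  # per-bin standard deviations

g = GraphSketch(
    feature_name="Age",
    feature_type="continuous",
    x_vals=[(18.0, 30.0), (30.0, 50.0), (50.0, 80.0)],
    scores=[-0.4, 0.1, 0.6],
    stds=[0.05, 0.03, 0.08],
)
```

Each bin pairs an x-axis interval with a score and a standard deviation, so the three lists always have the same length.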

t2ebm.graphs.extract_graph(ebm: ExplainableBoostingClassifier | ExplainableBoostingRegressor, feature_index: int, normalization='none', use_feature_bounds=True)

Extract the graph of a feature from an Explainable Boosting Machine.

This is a low-level function. It does not return the final format in which the graph is presented to the LLM.

The purpose of this function is to extract the graph from the internals of the EBM and return it in a format that is easy to work with.

Parameters:
  • ebm (Union[ExplainableBoostingClassifier, ExplainableBoostingRegressor]) – The EBM.

  • feature_index (int) – The index of the feature in the EBM.

  • normalization (str, optional) – How to normalize the graph (shift on the y-axis). Possible values are: ‘mean’, ‘min’, ‘none’. Defaults to “none”.

  • use_feature_bounds (bool, optional) – If True, the first and last x-axis bins are bounded by the min and max values of the feature stored in the EBM. If False, the first and last bin boundaries are -inf and inf, respectively. Defaults to True.

Raises:

Exception – If an error occurs.

Returns:

The graph.

Return type:

EBMGraph
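The normalization options amount to a vertical shift of the graph's scores. A standalone sketch of the idea (an unweighted version for illustration; the library may, for example, weight the mean by bin density):

```python
def normalize_scores(scores, normalization="none"):
    """Shift graph scores on the y-axis, mirroring the documented options."""
    if normalization == "none":
        return list(scores)
    if normalization == "mean":
        shift = sum(scores) / len(scores)   # center the graph around zero
    elif normalization == "min":
        shift = min(scores)                 # make the smallest score zero
    else:
        raise ValueError(f"unknown normalization: {normalization!r}")
    return [s - shift for s in scores]

normalize_scores([1.0, 2.0, 3.0], "mean")  # shifts so the mean is 0
```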

t2ebm.graphs.graph_to_text(graph: EBMGraph, include_description=True, feature_format=None, x_axis_precision=None, y_axis_precision='auto', confidence_bounds=True, confidence_level=0.95, max_tokens=3000)

Convert an EBMGraph to text. This is the text that we then pass to the LLM.

The function takes care of a variety of different formatting issues that can arise in the process of converting a graph to text.

Parameters:
  • graph (EBMGraph) – The graph.

  • include_description (bool, optional) – Whether to include a short descriptive preamble that describes the graph to the LLM. Defaults to True.

  • feature_format (str, optional) – The format of the feature (continuous, categorical, boolean). Defaults to None (auto-detect).

  • x_axis_precision (int, optional) – The precision of the values on the x-axis. Defaults to None.

  • y_axis_precision (str, optional) – The precision of the values on the y-axis. Defaults to “auto”.

  • confidence_bounds (bool, optional) – Whether to include confidence bounds. Defaults to True.

  • confidence_level (float, optional) – The desired confidence level of the bounds. Defaults to 0.95.

  • max_tokens (int, optional) – The maximum number of tokens that the textual description of the graph can have. The function simplifies the graph so that it fits within this token limit. Defaults to 3000.

Raises:

Exception – If an error occurs.

Returns:

The textual representation of the graph.

Return type:

str
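The confidence bounds are derived from the per-bin standard deviations. A self-contained sketch of a normal-approximation interval at a given confidence level (an assumption for illustration; the library's exact procedure may differ):

```python
from statistics import NormalDist

def confidence_bounds(scores, stds, confidence_level=0.95):
    """Per-bin normal-approximation interval: score +/- z * std."""
    z = NormalDist().inv_cdf(0.5 + confidence_level / 2)  # ~1.96 for 95%
    return [(s - z * sd, s + z * sd) for s, sd in zip(scores, stds)]

bounds = confidence_bounds([0.5], [0.1])  # roughly (0.304, 0.696)
```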

t2ebm.graphs.plot_graph(graph: EBMGraph)

Plot a graph.

Parameters:

graph (EBMGraph) – The graph.

t2ebm.graphs.simplify_graph(graph: EBMGraph, min_variation_per_cent: float = 0.0)

Simplify a graph by removing redundant (flat) bins.

Parameters:
  • graph (EBMGraph) – The graph.

  • min_variation_per_cent (float, optional) – Controls the degree of simplification. If min_variation_per_cent > 0, the function simplifies the graph by removing bins that correspond to a change of less than min_variation_per_cent in the score, where the overall min/max score difference for the feature counts as 100%. Defaults to 0.0.

Returns:

The simplified graph.

Return type:

EBMGraph
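The removal criterion can be sketched in plain Python: merge a bin into its predecessor when the score change between them is below min_variation_per_cent of the overall score range (a standalone illustration of the idea, not the library code; standard deviations omitted for brevity):

```python
def simplify_bins(x_vals, scores, min_variation_per_cent=0.0):
    """Merge adjacent bins whose score change is below the threshold."""
    score_range = max(scores) - min(scores)
    if score_range == 0:  # completely flat graph collapses to one bin
        return [(x_vals[0][0], x_vals[-1][1])], [scores[0]]
    new_x, new_scores = [x_vals[0]], [scores[0]]
    for (lo, hi), s in zip(x_vals[1:], scores[1:]):
        variation = abs(s - new_scores[-1]) / score_range * 100
        if variation <= min_variation_per_cent:
            new_x[-1] = (new_x[-1][0], hi)  # extend the previous bin
        else:
            new_x.append((lo, hi))
            new_scores.append(s)
    return new_x, new_scores
```

With the default threshold of 0.0, only exactly flat neighboring bins are merged.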

t2ebm.graphs.text_to_graph(graph: str)

Convert the textual representation of a graph back to an EBMGraph.

Prompt templates

Prompts that ask the LLM to perform tasks with Graphs and EBMs.

Functions either return a string or a sequence of messages / desired responses in the OpenAI message format.
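For reference, "the OpenAI message format" means a list of role/content dictionaries, for example:

```python
# A conversation in the OpenAI message format: each turn is a dict
# with a "role" (system, user, or assistant) and a "content" string.
messages = [
    {"role": "system", "content": "You are an expert statistician and data scientist."},
    {"role": "user", "content": "Here is the graph of a feature: ..."},
    {"role": "assistant", "content": "The graph shows ..."},
]
```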

t2ebm.prompts.describe_graph(graph: str, graph_description='', dataset_description='', task_description='Please describe the general pattern of the graph.')

Prompt the LLM to describe a graph. This is intended to be the first prompt in a conversation about a graph.

Parameters:
  • graph (str) – The graph to describe (in JSON format, obtained from graph_to_text).

  • graph_description (str, optional) – Additional description of the graph (e.g. “The y-axis of the graph depicts the probability of success.”). Defaults to “”.

  • dataset_description (str, optional) – Additional description of the dataset (e.g. “The dataset is a Pneumonia dataset collected by […]”). Defaults to “”.

  • task_description (str, optional) – A final prompt to instruct the LLM. Defaults to “Please describe the general pattern of the graph.”.

Returns:

The prompt to describe the graph.

Return type:

str

t2ebm.prompts.describe_graph_cot(graph, num_sentences=7, **kwargs)

Use chain-of-thought reasoning to elicit a description of a graph in at most {num_sentences} sentences.

Returns:

Messages in OpenAI format.

t2ebm.prompts.graph_system_msg(expert_description='an expert statistician and data scientist')

A system message that instructs the LLM to work with the graphs of an EBM.

Parameters:

expert_description (str, optional) – Description of the expert that we want the LLM to be. Defaults to “an expert statistician and data scientist”.

Returns:

The system message.

Return type:

str

t2ebm.prompts.summarize_ebm(feature_importances: str, graph_descriptions: str, expert_description='an expert statistician and data scientist', dataset_description='', num_sentences: int = None)

Prompt the LLM to summarize a Generalized Additive Model (GAM).

Returns:

Messages in OpenAI format.

Interface to the LLM

TalkToEBM structures conversations in a generic OpenAI message format that can be executed with different LLMs.

We interface with the LLM via the simple class AbstractChatModel. To use your own LLM, implement the chat_completion method in a subclass.

class t2ebm.llm.AbstractChatModel

Bases: object

chat_completion(messages, temperature: float, max_tokens: int)

Send a query to a chat model.

Parameters:
  • messages – The messages to send to the model. We use the OpenAI format.

  • temperature – The sampling temperature.

  • max_tokens – The maximum number of tokens to generate.

Returns:

The model response.

Return type:

str
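Implementing a custom model only requires the chat_completion method. A self-contained sketch using a stand-in base class (in practice you would subclass t2ebm.llm.AbstractChatModel; only the documented interface is assumed here):

```python
class AbstractChatModel:  # stand-in for t2ebm.llm.AbstractChatModel
    def chat_completion(self, messages, temperature: float, max_tokens: int) -> str:
        raise NotImplementedError

class EchoChatModel(AbstractChatModel):
    """Toy model that returns the last user message; useful for testing pipelines."""
    def chat_completion(self, messages, temperature: float, max_tokens: int) -> str:
        user_msgs = [m["content"] for m in messages if m["role"] == "user"]
        return user_msgs[-1] if user_msgs else ""

reply = EchoChatModel().chat_completion(
    [{"role": "user", "content": "Describe the graph."}],
    temperature=0.0,
    max_tokens=100,
)
```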

class t2ebm.llm.OpenAIChatModel(client, model)

Bases: AbstractChatModel

chat_completion(messages, temperature, max_tokens)

Send a query to a chat model.

Parameters:
  • messages – The messages to send to the model. We use the OpenAI format.

  • temperature – The sampling temperature.

  • max_tokens – The maximum number of tokens to generate.

Returns:

The model response.

Return type:

str

t2ebm.llm.chat_completion(llm: str | AbstractChatModel, messages)

Execute a sequence of user and assistant messages with an AbstractChatModel.

Sends the individual messages to the AbstractChatModel in sequence.

t2ebm.llm.openai_setup(model: str, azure: bool = False, *args, **kwargs)

Set up an OpenAI language model.

Parameters:
  • model – The name of the model (e.g. “gpt-3.5-turbo-0613”).

  • azure – If True, use a model deployed on Azure.

This function uses the following environment variables:

  • OPENAI_API_KEY

  • OPENAI_API_ORG

  • AZURE_OPENAI_ENDPOINT

  • AZURE_OPENAI_KEY

  • AZURE_OPENAI_VERSION

Returns:

An LLM to work with!

Return type:

LLM_Interface
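For the non-Azure path, setup typically only requires the first one or two variables (values below are placeholders; which variables are needed depends on your account and deployment):

```shell
export OPENAI_API_KEY="sk-..."    # API key for the OpenAI endpoint
export OPENAI_API_ORG="org-..."   # organization ID, if your account uses one
```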