API Reference
Tests for tabular datasets (based on csv files)
- tabmemcheck.dataset_name_test(csv_file: str, llm: LLM_Interface | str, few_shot_csv_files=['iris.csv', 'adult-train.csv', 'openml-diabetes.csv', 'uci-wine.csv', 'california-housing.csv'], few_shot_dataset_names=None, num_rows=5, header=True, random_rows=False, system_prompt: str = 'default', rng=None)
Test if the model knows the name of the dataset. The model is shown a sample of rows from the csv file and asked for the name of the dataset.
- Parameters:
csv_file – The path to the csv file.
llm – The language model to be tested.
few_shot_csv_files – A list of other csv files to be used as few-shot examples.
few_shot_dataset_names – A list of dataset names to be used as few-shot examples. If None, the dataset names are the file names of the few-shot csv files.
system_prompt – The system prompt to be used.
num_rows – The number of dataset rows to be given to the model as part of the prompt.
header – If True, the first row of the csv file is included in the prompt (it usually contains the feature names).
random_rows – If True, the rows are selected at random from the dataset.
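A minimal usage sketch (the csv path is a placeholder for any local dataset file; the model may be passed as a name string per the signature above):

```python
import tabmemcheck

# "titanic.csv" is a hypothetical local file, not shipped with the package
tabmemcheck.dataset_name_test(
    "titanic.csv",
    "gpt-3.5-turbo-0613",  # or an LLM_Interface instance
    num_rows=5,
    header=True,
)
```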
- tabmemcheck.feature_completion_test(csv_file: str, llm: LLM_Interface | str, feature_name: str = None, num_queries=25, few_shot=5, out_file=None, system_prompt: str = 'default', rng=None)
Feature completion test for memorization. The test reports the number of correctly completed features.
- Parameters:
csv_file – The path to the csv file.
llm – The language model to be tested.
feature_name – The name of the feature to be used for the test.
num_queries – The number of feature values that we test the model on.
few_shot – The number of few-shot examples to be used.
out_file – Optionally save all queries and responses to a csv file.
system_prompt – The system prompt to be used.
- Returns:
the feature values, the model responses.
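A minimal sketch using the documented return values (the csv path and column name are placeholders; if feature_name is left as None, the library defaults to the most unique feature, see find_most_unique_feature below):

```python
import tabmemcheck

# "Name" is a hypothetical column with many unique values
feature_values, responses = tabmemcheck.feature_completion_test(
    "titanic.csv",
    "gpt-3.5-turbo-0613",
    feature_name="Name",
    num_queries=25,
)
```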
- tabmemcheck.feature_names_test(csv_file: str, llm: LLM_Interface | str, num_prefix_features: int = None, few_shot_csv_files=['iris.csv', 'adult-train.csv', 'openml-diabetes.csv', 'uci-wine.csv', 'california-housing.csv'], system_prompt: str = 'default', verbose: bool = True, return_result=True)
Test if the model knows the names of the features in a csv file.
- Parameters:
csv_file – The path to the csv file.
llm – The language model to be tested.
num_prefix_features – The number of features given to the model as part of the prompt (defaults to 1/4 of the features).
few_shot_csv_files – A list of other csv files to be used as few-shot examples.
system_prompt – The system prompt to be used.
- tabmemcheck.feature_values_test(csv_file: str, llm: LLM_Interface | str, few_shot_csv_files=['iris.csv', 'adult-train.csv', 'openml-diabetes.csv', 'uci-wine.csv', 'california-housing.csv'], system_prompt: str = 'default')
Test if the model knows valid feature values for the features in a csv file. Asks the model to provide samples, then compares the sampled feature values to the values in the csv file.
- Parameters:
csv_file – The path to the csv file.
llm – The language model to be tested.
few_shot_csv_files – A list of other csv files to be used as few-shot examples.
system_prompt – The system prompt to be used.
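feature_names_test and feature_values_test share the same calling pattern; a minimal sketch with a placeholder csv path:

```python
import tabmemcheck

# reveal the default 1/4 of the features and ask the model for the rest
tabmemcheck.feature_names_test("titanic.csv", "gpt-3.5-turbo-0613")

# ask the model for samples, then compare the feature values to the csv file
tabmemcheck.feature_values_test("titanic.csv", "gpt-3.5-turbo-0613")
```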
- tabmemcheck.first_token_test(csv_file: str, llm: LLM_Interface | str, num_prefix_rows=10, num_queries=25, few_shot=7, out_file=None, system_prompt: str = 'default', rng=None)
First token test for memorization. We ask the model to complete the first token of the next row of the csv file, given the previous rows. The test reports the number of correctly completed tokens.
Note that the “first token” is not actually the first token produced by the LLM, but consists of the first n digits of the row. The number of digits is determined by the function build_first_token.
- Parameters:
csv_file – The path to the csv file.
llm – The language model to be tested.
num_prefix_rows – The number of rows given to the model as part of the prompt.
num_queries – The number of rows that we test the model on.
few_shot – The number of few-shot examples to be used.
out_file – Optionally save all queries and responses to a csv file.
system_prompt – The system prompt to be used.
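A minimal sketch (the csv path and output file are placeholders):

```python
import tabmemcheck

tabmemcheck.first_token_test(
    "titanic.csv",
    "gpt-3.5-turbo-0613",
    num_prefix_rows=10,
    num_queries=25,
    out_file="first_token_queries.csv",  # hypothetical output path
)
```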
- tabmemcheck.header_test(csv_file: str, llm: LLM_Interface | str, split_rows: list[int] = [2, 4, 6, 8], completion_length: int = 500, few_shot_csv_files: list[str] = ['iris.csv', 'adult-train.csv', 'openml-diabetes.csv', 'uci-wine.csv', 'california-housing.csv'], system_prompt: str = 'default', verbose: bool = True, return_result=True, rng=None)
Header test for memorization.
We split the csv file at random positions within the rows given in split_rows and perform one query for each split. Then we compare the best completion with the actual header.
- Parameters:
csv_file – The path to the csv file.
llm – The language model to be tested.
split_rows – The rows at which the csv file is split for the test.
completion_length – The length of the completions in the few-shot examples (reduce for LLMs with small context windows).
few_shot_csv_files – A list of other csv files to be used as few-shot examples.
system_prompt – The system prompt to be used.
- Returns:
The header prompt, the actual header completion, and the model response.
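A minimal sketch using the documented return values (the csv path is a placeholder):

```python
import tabmemcheck

header_prompt, header_completion, response = tabmemcheck.header_test(
    "titanic.csv",
    "gpt-3.5-turbo-0613",
    split_rows=[2, 4, 6, 8],
)
```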
- tabmemcheck.row_completion_test(csv_file: str, llm: LLM_Interface | str, num_prefix_rows=10, num_queries=25, few_shot=7, out_file=None, system_prompt: str = 'default', print_levenshtein: bool = True, return_result=True, rng=None)
Row completion test for memorization. The test reports the number of correctly completed rows.
- Parameters:
csv_file – The path to the csv file.
llm – The language model to be tested.
num_prefix_rows – The number of rows given to the model as part of the prompt.
num_queries – The number of rows that we test the model on.
few_shot – The number of few-shot examples to be used.
out_file – Optionally save all queries and responses to a csv file.
system_prompt – The system prompt to be used.
print_levenshtein – Print a visualization of the Levenshtein distance between the model responses and the actual rows.
- Returns:
the rows, the model responses.
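A minimal sketch; the exact-match count at the end is a naive post-processing step, not part of the API:

```python
import tabmemcheck

rows, responses = tabmemcheck.row_completion_test(
    "titanic.csv",  # hypothetical local file
    "gpt-3.5-turbo-0613",
    num_prefix_rows=10,
    num_queries=25,
)

# naive count of verbatim completions
num_exact = sum(row == response for row, response in zip(rows, responses))
```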
- tabmemcheck.run_all_tests(csv_file: str, llm: LLM_Interface | str, few_shot_csv_files=['iris.csv', 'adult-train.csv', 'openml-diabetes.csv', 'uci-wine.csv', 'california-housing.csv'], unique_feature: str = None)
Run different tests for memorization and prior experience with the content of the csv file.
- Parameters:
csv_file – The path to the csv file.
llm – The language model to be tested.
few_shot_csv_files – A list of other csv files to be used as few-shot examples.
unique_feature – The name of the feature to be used for the feature completion test.
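A minimal sketch (the csv path and unique_feature column are placeholders):

```python
import tabmemcheck

tabmemcheck.run_all_tests(
    "titanic.csv",
    "gpt-3.5-turbo-0613",
    unique_feature="Name",  # hypothetical column for the feature completion test
)
```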
- tabmemcheck.sample(csv_file: str, llm: LLM_Interface | str, num_queries: int, temperature: float = 0.7, few_shot_csv_files: list[str] = ['iris.csv', 'adult-train.csv', 'openml-diabetes.csv', 'uci-wine.csv', 'california-housing.csv'], cond_feature_names: list[str] = [], drop_invalid_responses: bool = True, print_invalid_responses: bool = False, out_file=None, system_prompt: str = 'default')
Ask the model to provide random samples from the csv file.
- Parameters:
csv_file – The path to the csv file.
llm – The language model to be tested.
num_queries – The desired number of samples.
few_shot_csv_files – A list of other csv files to be used as few-shot examples.
out_file – Optionally save all queries and responses to a csv file.
system_prompt – The system prompt to be used.
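A minimal sketch (the csv path and output file are placeholders):

```python
import tabmemcheck

tabmemcheck.sample(
    "titanic.csv",
    "gpt-3.5-turbo-0613",
    num_queries=10,
    temperature=0.7,
    out_file="samples.csv",  # hypothetical output path
)
```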
Generic chat-completion
- tabmemcheck.prefix_suffix_chat_completion(llm: LLM_Interface, prefixes: list[str], suffixes: list[str], system_prompt: str, few_shot=None, num_queries=100, print_levenshtein=False, out_file=None, rng=None)
A general-purpose chat completion function. Given prefixes, suffixes, and few-shot examples, this function sends {num_queries} LLM queries of the format
System: <system_prompt>
User: <prefix>          |
Assistant: <suffix>     |  {few_shot} times, or one example from each
...                     |  (prefixes, suffixes) pair if few_shot is a list
User: <prefix>          |  of the form [([prefixes], [suffixes]), ..., ([prefixes], [suffixes])]
Assistant: <suffix>     |
User: <prefix>
Assistant: <response>   (the response is compared with the test suffix)
The prefixes, suffixes, and few-shot examples are randomly selected.
This function guarantees that the test suffix (as a complete string) is not contained in any of the few-shot prefixes or suffixes (a useful sanity check: we don’t want to provide the desired response anywhere in the context).
- Parameters:
llm (LLM_Interface) – The LLM.
prefixes (list[str]) – A list of prefixes.
suffixes (list[str]) – A list of suffixes.
system_prompt (str) – The system prompt.
few_shot (int or list, optional) – Either an integer, to select the given number of few-shot examples from the lists of prefixes and suffixes, or a list [([prefixes], [suffixes]), …, ([prefixes], [suffixes])] to select one few-shot example from each pair of lists. Defaults to None.
num_queries (int, optional) – The number of queries. Defaults to 100.
print_levenshtein (bool, optional) – Visualize the Levenshtein string distance between test suffixes and LLM responses. Defaults to False.
out_file (str, optional) – Save all queries to a CSV file. Defaults to None.
rng (numpy.random.Generator, optional) – The random number generator to be used. Defaults to None.
- Raises:
Exception – If an error occurs.
- Returns:
A tuple of test prefixes, test suffixes, and responses.
- Return type:
tuple
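A sketch of direct use with invented toy data (the country/capital pairs are illustrative only):

```python
import tabmemcheck

llm = tabmemcheck.openai_setup("gpt-3.5-turbo-0613")

# invented toy data: the model should complete each country with its capital
prefixes = ["France:", "Japan:", "Kenya:", "Chile:", "Norway:"]
suffixes = ["Paris", "Tokyo", "Nairobi", "Santiago", "Oslo"]

test_prefixes, test_suffixes, responses = tabmemcheck.prefix_suffix_chat_completion(
    llm,
    prefixes,
    suffixes,
    system_prompt="Complete the user's text.",
    few_shot=3,     # three (prefix, suffix) pairs as in-context examples
    num_queries=2,
)
```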
Tabular dataset loading (original, perturbed, task, statistical)
- tabmemcheck.datasets.load_adult(csv_file: str = 'adult-train.csv', *args, **kwargs)
Load the Adult Income dataset (http://www.cs.toronto.edu/~delve/data/adult/adultDetail.html).
- tabmemcheck.datasets.load_dataset(csv_file: str, yaml_config: str = None, transform: str = 'plain', permute_columns=False, print_stats=False, seed=None)
Load a dataset from a CSV file and apply transformations as specified in a YAML configuration file.
- Parameters:
csv_file (str) – The path to the CSV file.
yaml_config (str, optional) – The path to the YAML configuration file. Defaults to None.
transform (str, optional) – The type of transformation to apply (‘original’, ‘perturbed’, ‘task’, ‘statistical’).
permute_columns (bool, optional) – Whether to permute the columns in the perturbed version. Defaults to False.
print_stats (bool, optional) – Whether to print statistics about the transformation. Defaults to False.
seed (optional) – The seed for the numpy random number generator. Defaults to None.
- Returns:
The transformed dataset.
- Return type:
pandas.DataFrame
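A minimal sketch (the YAML path is a placeholder for a configuration file you provide):

```python
from tabmemcheck.datasets import load_dataset

# load a perturbed version of the data, as specified in a hypothetical config
df = load_dataset(
    "adult-train.csv",
    yaml_config="adult.yaml",  # hypothetical configuration file
    transform="perturbed",
    seed=42,
)
```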
- tabmemcheck.datasets.load_housing(csv_file: str = 'california-housing.csv', *args, **kwargs)
Load the California Housing dataset (https://inria.github.io/scikit-learn-mooc/python_scripts/datasets_california_housing.html).
- tabmemcheck.datasets.load_iris(csv_file: str = 'iris.csv', *args, **kwargs)
Load the Iris dataset (https://archive.ics.uci.edu/ml/datasets/iris).
- tabmemcheck.datasets.load_openml_diabetes(csv_file: str = 'openml-diabetes.csv', *args, **kwargs)
Load the OpenML Diabetes dataset (https://www.openml.org/d/37).
- tabmemcheck.datasets.load_wine(csv_file: str = 'uci-wine.csv', *args, **kwargs)
Load the UCI Wine dataset (https://archive.ics.uci.edu/dataset/109/wine).
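The convenience loaders take the same keyword arguments, which (judging from the *args, **kwargs signatures) appear to be forwarded to load_dataset:

```python
from tabmemcheck.datasets import load_iris

iris_df = load_iris()  # original data
# assumption: transform is forwarded to load_dataset
iris_perturbed = load_iris(transform="perturbed")
```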
LLM
- class tabmemcheck.LLM_Interface
Bases:
object
Generic interface to a language model.
- chat_completion(messages, temperature: float, max_tokens: int)
Send a query to a chat model.
- Parameters:
messages – The messages to send to the model. We use the OpenAI format.
temperature – The sampling temperature.
max_tokens – The maximum number of tokens to generate.
- Returns:
The model response.
- Return type:
str
- completion(prompt: str, temperature: float, max_tokens: int)
Send a query to a language model.
- Parameters:
prompt – The prompt (string) to send to the model.
temperature – The sampling temperature.
max_tokens – The maximum number of tokens to generate.
- Returns:
The model response.
- Return type:
str
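A toy subclass implementing the two documented methods; a real implementation would forward the calls to an actual model:

```python
import tabmemcheck

class EchoLLM(tabmemcheck.LLM_Interface):
    """Placeholder model that echoes its input (for illustration only)."""

    def chat_completion(self, messages, temperature: float, max_tokens: int):
        # messages follow the OpenAI format:
        # [{"role": "system", "content": ...}, {"role": "user", "content": ...}, ...]
        return messages[-1]["content"]

    def completion(self, prompt: str, temperature: float, max_tokens: int):
        return prompt[-max_tokens:]
```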
- tabmemcheck.openai_setup(model: str, azure: bool = False, *args, **kwargs)
Setup an OpenAI language model.
- Parameters:
model – The name of the model (e.g. “gpt-3.5-turbo-0613”).
azure – If True, use a model deployed on Azure.
This function uses the following environment variables:
OPENAI_API_KEY
OPENAI_API_ORG
AZURE_OPENAI_ENDPOINT
AZURE_OPENAI_KEY
AZURE_OPENAI_VERSION
- Returns:
An LLM to work with!
- Return type:
LLM_Interface
- tabmemcheck.send_chat_completion(llm: LLM_Interface, messages, max_tokens=None, logfile=None)
Ask the LLM to perform a chat_completion, with additional bells and whistles (logging, printing).
- tabmemcheck.send_completion(llm: LLM_Interface, prompt, max_tokens=None, logfile=None)
Ask the LLM to perform a completion, with additional bells and whistles (logging, printing).
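A minimal sketch (requires OPENAI_API_KEY in the environment; whether the send_* helpers return the response string is not documented above, so the assignment is an assumption):

```python
import tabmemcheck

llm = tabmemcheck.openai_setup("gpt-3.5-turbo-0613")

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"},
]
# logfile is the documented logging option; the path is a placeholder
response = tabmemcheck.send_chat_completion(llm, messages, max_tokens=50, logfile="queries.log")
```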
Analysis
- tabmemcheck.analysis.build_first_token(csv_file, verbose=False)
Given a csv file, build a first token that can be used in the first token test.
The first token is constructed by taking the first n digits of every row in the csv file (that is, this function determines n). Using the first n digits improves upon using only the first digit on datasets where the first digit is always the same or takes only a few distinct values.
Note: This function does NOT check if the constructed first token is random.
- Parameters:
csv_file – the path to the csv file.
verbose – if True, print the first tokens and their counts.
- Returns:
the number of digits that make up the first token.
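A minimal sketch (the csv path is a placeholder):

```python
from tabmemcheck.analysis import build_first_token

# number of leading digits that make up the "first token" for this file
num_digits = build_first_token("titanic.csv", verbose=True)
```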
- tabmemcheck.analysis.find_matches(df: pandas.DataFrame, x, string_dist_fn=levenshtein_distances, match_floating_point=True, strip_quotation_marks=True)
Find the closest matches between a row x and all rows in the dataframe df. By default, we use the Levenshtein distance as the distance metric.
This function can handle some formatting differences between the values in the original data and LLM responses that should still be counted as equal.
- Parameters:
df – a pandas dataframe.
x – a string, a pandas dataframe or a pandas Series.
string_dist_fn – a function that computes the distance between two strings. By default, this is the Levenshtein distance.
match_floating_point – if True, handles floating point formatting differences, e.g. 0.28 vs. .280 or 172 vs. 172.0 (default: True).
strip_quotation_marks – if True, strips quotation marks from the values in df and x (to handle the case where a model responds with “23853” and the value in the data is 23853) (default: True).
- Returns:
the minimum distance and the matching rows in df.
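A minimal sketch with an invented model response (the csv path is a placeholder):

```python
import pandas as pd
from tabmemcheck.analysis import find_matches

df = pd.read_csv("titanic.csv")  # hypothetical dataset
response = "3, Braund Mr. Owen, male, 22.0, 7.25"  # invented LLM response
min_dist, matching_rows = find_matches(df, response)
```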
- tabmemcheck.analysis.find_most_unique_feature(csv_file)
Given a csv file, find the feature that has the most unique values. This is the default feature used for the feature completion test.
- Parameters:
csv_file – the path to the csv file.
- Returns:
the name of the most unique feature and the fraction of unique values.
Utilities
- tabmemcheck.utils.get_dataset_name(csv_file)
Returns the name of the dataset.
- tabmemcheck.utils.get_delimiter(csv_file)
Returns the delimiter of a csv file.
- tabmemcheck.utils.get_feature_names(csv_file)
Returns the names of the features in a csv file (a list of strings).
- tabmemcheck.utils.levenshtein_cmd(a: str, b: str)
Visualization of the Levenshtein distance between a and b, using color codes to be printed in the console.
- tabmemcheck.utils.levenshtein_html(a: str, b: str)
HTML visualization of the Levenshtein distance between a and b.
- tabmemcheck.utils.load_csv_array(csv_file, add_feature_names=False)
Load a csv file as a 2d numpy array where each entry is a string.
- Parameters:
add_feature_names – If True, each entry will have the format “feature_name = feature_value”.
- Returns:
a 2d numpy array of strings
- tabmemcheck.utils.load_csv_rows(csv_file, header=True)
Load a csv file as a list of strings, with one string per row.
- tabmemcheck.utils.load_csv_string(csv_file, header=True, size=10000000)
Load a csv file as a single string.
- tabmemcheck.utils.load_samples(csv_file, add_feature_names=True)
Load a csv file as a list of “Feature name = Feature value” strings.
- Returns:
description, samples
- tabmemcheck.utils.parse_feature_stings(strings, feature_names, **kwargs)
Parse a list of features strings into a pandas dataframe.
- tabmemcheck.utils.parse_feature_string(s, feature_names, as_dict=False, in_list=False, final_delimiter=',')
Parse a string (model response) of the form “feature_name = feature_value, feature_name = feature_value, …” into a pandas dataframe.
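A minimal sketch with an invented model response (the csv path is a placeholder):

```python
from tabmemcheck.utils import get_feature_names, parse_feature_string

feature_names = get_feature_names("titanic.csv")
response = "Age = 22, Fare = 7.25"  # invented model response
df = parse_feature_string(response, feature_names)
```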