Tabmemcheck

Tabmemcheck is an open-source Python library to test language models for memorization of tabular datasets. It provides four different tests for verbatim memorization of a tabular dataset (header test, row completion test, feature completion test, first token test).

The submodule tabmemcheck.datasets loads tabular datasets in one of four versions (original, perturbed, task, statistical). The perturbations are specified in a YAML file for each dataset. Examples are contained in tabmemcheck.resources.config.transform.
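This page does not spell out what a perturbation does, so the following is only an illustration of the general idea: changing values slightly so that a memorized row no longer matches character for character. The function name `perturb_last_digit` and the last-digit rule are assumptions made for this sketch. They are not the library's transformation and not its YAML schema.

```python
import random


def perturb_last_digit(value: str, rng: random.Random) -> str:
    """Shift the last digit of a numeric cell by one, keeping its format.

    Hypothetical helper for illustration only; not part of tabmemcheck.
    """
    if not value or not value[-1].isdigit():
        return value  # leave non-numeric cells (e.g. class labels) untouched
    digit = int(value[-1])
    step = rng.choice([-1, 1])
    if not 0 <= digit + step <= 9:
        step = -step  # flip direction so the result stays a single digit
    return value[:-1] + str(digit + step)


rng = random.Random(0)
original_row = ["1", "14.23", "1.71", "Iris-setosa"]
perturbed_row = [perturb_last_digit(cell, rng) for cell in original_row]
```

Each numeric cell in `perturbed_row` differs from the original in exactly one character, which is enough to break a verbatim match while leaving the data close to the original.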

import tabmemcheck

Header Test – Asks the LLM to complete the initial rows of a csv file

The example provides evidence of memorization of the UCI Wine dataset in gpt-3.5-turbo-0613.

header_prompt, header_completion, response = tabmemcheck.header_test('uci-wine.csv', 'gpt-3.5-turbo-0613', completion_length=350)

target,alcohol,malic_acid,[...],proline
1,14.23,1.71,2.43,15.6,127,2.8,3.06,.28,2.29,5.64,1.04,3.92,1065
1,13.2,1.78,2.14,11.2,100,2.65,2.76,.26,1.28,4.38,1.05,3.4,1050
1,13.16,2.36,2.67,18.6,101,2.8,3.24,.39,2.81,2.29,5.68,1.03,3.17,1185
1,14.37,1.95,2.5,16.8,113,3.85,3.49,.24,2.18,7.8,0.86,3.45,1480
1,13.24,2.59,2.87,21,118,2.8,2.69,.39,1.82,4.32,1.04,2.93,735
1,14.2,1.76,2.45,15.2,112,3.27,3.39,.34,1.97,6.75,1.05,2.85,1450
1,14.39,1.87,2.45,14.6,96,2.5,2.52,.3,1.98,5.25,1.02,3.58,1290
1,14.06,2.15,2.61,17.6,121,2.6,2.51,.31,1.25,5.05,1.06,3.58,1295
1,14.83,1.64,2.17,14,97,2.8,2.98,.29,1.98,5.2,1.08,2.85,1045
1,13.86,1.35,2.27,16,98,2.98,3.15,.22,1.85,7.22,1.01,3.55,1045
1,14.1,2.16,2.3,18,105,2.95,3.32,.22,2.38,5.75,1.25,3.17,1510
1,14.12,1.48,2.32,16.8,95,2.2,2.43,.26,1.57,5,1.17,2.82,1280
1,13.7

Legend: Prompt, Correct, Incorrect, Missing

The function visualizes the Levenshtein string distance between the actual header and the model completion.
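For readers unfamiliar with the metric, here is a minimal sketch of the Levenshtein distance using the standard dynamic-programming recurrence. It is a generic implementation written for this page, not the code the library uses internally, and the two example strings are illustrative.

```python
def levenshtein(a: str, b: str) -> int:
    """Return the minimum number of single-character edits turning a into b."""
    # previous[j] holds the distance between a[:i-1] and b[:j]
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # turning a[:i] into the empty string takes i deletions
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute, or keep on a match
            ))
        previous = current
    return previous[-1]


actual = "1,13.16,2.36,2.67"
completion = "1,13.16,2.86,2.67"  # one digit differs from the actual text
distance = levenshtein(actual, completion)
```

A distance of zero means the completion reproduces the file exactly; small distances indicate a near-verbatim completion with a few wrong or missing characters.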

Row Completion Test – Asks the LLM to complete random rows of a csv file

The example provides evidence of memorization of the Iris dataset in gpt-4-0125-preview.

rows, responses = tabmemcheck.row_completion_test('iris.csv', 'gpt-4-0125-preview', num_queries=25)

5,3.5,1.3,0.3,Iris-setosa
5.9,3.2,4.8,1.8,Iris-versicolor
6.9,3.2,5.7,2.3,Iris-virginica
5.7,3.8,1.7,0.3,Iris-setosa
6.7,3.1,5.6,2.4,Iris-virginica
5.5,2.5,4.9,1.3,Iris-versicolor
6.3,2.8,5.1,1.5,Iris-virginica
6.4,3.2,4.5,1.5,Iris-versicolor
7.3,2.9,6.3,1.8,Iris-virginica
6,2.2,5,1.5,Iris-virginica
6.1,2.6,5.6,1.4,Iris-virginica
4.8,3.4,1.9,0.2,Iris-setosa
6.3,2.7,4.9,1.8,Iris-virginica
6.8,3.2,5.9,2.3,Iris-virginica
6.3,3.3,4.7,1.6,Iris-versicolor
5.9,3,4.2,1.5,Iris-versicolor
4.4,3.2,1.3,0.2,Iris-setosa
6.3,2.9,5.6,1.8,Iris-virginica
5.2,4.1,1.5,0.1,Iris-setosa
6.7,3,5,1.7,Iris-versicolor
5.7,4.4,1.5,0.4,Iris-setosa
5,3.5,1.6,0.6,Iris-setosa
7.1,3,5.9,2.1,Iris-virginica
6,2.7,5.1,6,1.6,Iris-versicolor
5.5,2.6,4.4,1.2,Iris-versicolor

Legend: Prompt, Correct, Incorrect, Missing
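The idea behind the test can be sketched as follows, assuming that each query shows the model a few consecutive rows and asks for the row that follows. The helper names `build_row_completion_query` and `exact_match_rate`, the number of prefix rows, and the plain newline-joined prompt are assumptions made for this sketch; the library's actual prompt format may differ.

```python
import random


def build_row_completion_query(rows, num_prefix_rows, rng):
    """Pick a random row and return (prompt, expected_completion)."""
    # choose a target index that leaves room for the prefix rows
    target_idx = rng.randrange(num_prefix_rows, len(rows))
    prefix = rows[target_idx - num_prefix_rows:target_idx]
    prompt = "\n".join(prefix) + "\n"
    return prompt, rows[target_idx]


def exact_match_rate(expected, responses):
    """Fraction of responses equal to the expected row, ignoring outer whitespace."""
    matches = sum(e.strip() == r.strip() for e, r in zip(expected, responses))
    return matches / len(expected)


# Stand-in for the lines of a csv file (rows taken from the output above).
rows = [
    "5,3.5,1.3,0.3,Iris-setosa",
    "5.9,3.2,4.8,1.8,Iris-versicolor",
    "6.9,3.2,5.7,2.3,Iris-virginica",
    "5.7,3.8,1.7,0.3,Iris-setosa",
    "6.7,3.1,5.6,2.4,Iris-virginica",
]
prompt, expected = build_row_completion_query(rows, 2, random.Random(0))
```

A model that has not seen the file can guess plausible values but is unlikely to reproduce every digit of a row, so a high exact-match rate is hard to explain without memorization.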

Feature Completion Test – Asks the LLM to complete the value of a specific feature in a csv file

The example provides evidence of memorization of the Kaggle Titanic dataset in gpt-3.5-turbo-0125.

feature_values, responses = tabmemcheck.feature_completion_test('titanic-train.csv', 'gpt-3.5-turbo-0125', feature_name='Name', num_queries=25)

Lester, Mr. James
Meanwell, Miss. (Marion Ogden)
Funk, Miss. Annie Clemmer
McGovern, Miss. Mary
Tikkanen, Mr. Juho
Goodwin, Master. Sidney Leonard
Vander Planke, Mr. Leo Edmondus
Vovk, Mr. Janko
Elsbury, Mr. William James
Goodwin, Master. Harold Victor
Abbott, Mrs. Stanton (Rosa Hunt)
Marvin, Mr. Daniel Warner
Ilmakangas, Miss. Pieta Sofia
Cameron, Miss. Clear Annie
Chambers, Mr. Norman Campbell
Culumovic, Mr. Jeso
Fox, Mr. Stanley Hubert
Palsson, Miss. Stina Viola
Brown, Mrs. James Joseph (Margaret Tobin)
Williams, Mr. Charles Duane
Phillips, Miss. Kate Florence ("Mrs Kate Louise Phillips Marshall")
Sage, Miss. Dorothy Edith "Dolly"
Eklund, Mr. Hans Linus
Bowerman, Miss. Elsie Edith
Landergren, Miss. Aurora Adelia

Legend: Prompt, Correct, Incorrect, Missing
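As a rough sketch of what a single query might look like, the snippet below holds out one feature of a record and builds a prompt from the remaining ones. The helper `build_feature_prompt` and the `name = value` prompt layout are assumptions for illustration, not the library's prompt format. The record is a subset of the first row of the Kaggle Titanic training file.

```python
def build_feature_prompt(record: dict, feature_name: str):
    """Return (prompt, expected_value) with the target feature held out."""
    known = [f"{key} = {value}" for key, value in record.items()
             if key != feature_name]
    # the prompt ends where the model is expected to fill in the value
    prompt = ", ".join(known) + f", {feature_name} ="
    return prompt, str(record[feature_name])


record = {
    "PassengerId": 1,
    "Pclass": 3,
    "Name": "Braund, Mr. Owen Harris",
    "Sex": "male",
    "Age": 22,
}
prompt, expected = build_feature_prompt(record, "Name")
```

A feature such as Name works well for this test because its values are close to unique and cannot be inferred from the other features, so correct completions point to memorization rather than prediction.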

First Token Test – Asks the LLM to complete the value of the first token in the next row of a csv file

The example provides no evidence of memorization of the Adult Income dataset in gpt-3.5-turbo-0125.

tabmemcheck.first_token_test('adult-train.csv', 'gpt-3.5-turbo-0125', num_queries=100)

First Token Test: 37/100 exact matches.
First Token Test Baseline (Matches of most common first token): 50/100.
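The baseline in the output can be understood as the score of always guessing the most frequent first token. Below is a minimal sketch of that computation, assuming for simplicity that the first token is the first character of a row; the library may define the token differently (for example, several leading digits). The helper names and the sample rows (borrowed from the Iris output above) are illustrative.

```python
from collections import Counter


def first_token(row: str, length: int = 1) -> str:
    """Return the leading characters of a row (assumed token definition)."""
    return row[:length]


def mode_baseline(rows, length: int = 1):
    """Return (most_common_token, matches, total) for the guess-the-mode strategy."""
    tokens = [first_token(row, length) for row in rows]
    most_common, matches = Counter(tokens).most_common(1)[0]
    return most_common, matches, len(tokens)


rows = [
    "5,3.5,1.3,0.3,Iris-setosa",
    "5.9,3.2,4.8,1.8,Iris-versicolor",
    "6.9,3.2,5.7,2.3,Iris-virginica",
    "5.7,3.8,1.7,0.3,Iris-setosa",
    "6.7,3.1,5.6,2.4,Iris-virginica",
]
token, matches, total = mode_baseline(rows)
```

In the output above, the model's 37 exact matches fall below the baseline of 50, which is why the example is read as no evidence of memorization.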

To see all prompts that are sent to the model, along with the raw responses, enable prompt printing:

tabmemcheck.config.print_prompts = True

Contents:

API Reference