Generating counterfactual explanations without access to training data

If only the trained model is available but not the training data, DiCE can still be used to generate counterfactual explanations. Below we show an example where DiCE uses only basic metadata about each feature used in the ML model.

[1]:

# import DiCE
import pandas as pd
import dice_ml
from dice_ml.utils import helpers  # helper functions

[2]:

%load_ext autoreload
%autoreload 2

Defining metadata

We simulate the “adult” income dataset from the UCI Machine Learning Repository (https://archive.ics.uci.edu/ml/datasets/adult) by providing only meta information about the data: a [min, max] range for each continuous feature and the list of categories for each categorical feature. Please note that for Python <= 3.6, the “features” parameter should be provided as an OrderedDict, in the same order that was used to train the ML model.

[3]:

d = dice_ml.Data(features={'age': [17, 90],
                           'workclass': ['Government', 'Other/Unknown', 'Private', 'Self-Employed'],
                           'education': ['Assoc', 'Bachelors', 'Doctorate', 'HS-grad', 'Masters',
                                         'Prof-school', 'School', 'Some-college'],
                           'marital_status': ['Divorced', 'Married', 'Separated', 'Single', 'Widowed'],
                           'occupation': ['Blue-Collar', 'Other/Unknown', 'Professional', 'Sales', 'Service', 'White-Collar'],
                           'race': ['Other', 'White'],
                           'gender': ['Female', 'Male'],
                           'hours_per_week': [1, 99]},
                 outcome_name='income')
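On Python <= 3.6, plain dicts do not preserve insertion order, which is why an OrderedDict is needed there. The sketch below builds the same metadata as an OrderedDict and, to make the two kinds of metadata explicit, splits the features into continuous ones (a numeric [min, max] range) and categorical ones (a list of category names). split_features is a hypothetical helper for illustration, not a DiCE function, and the rule it applies is our assumption about how to read the metadata.

```python
from collections import OrderedDict

# Metadata from the cell above, as an OrderedDict so that key order (which must
# match the column order used to train the model) is preserved on Python <= 3.6.
features = OrderedDict([
    ('age', [17, 90]),
    ('workclass', ['Government', 'Other/Unknown', 'Private', 'Self-Employed']),
    ('education', ['Assoc', 'Bachelors', 'Doctorate', 'HS-grad', 'Masters',
                   'Prof-school', 'School', 'Some-college']),
    ('marital_status', ['Divorced', 'Married', 'Separated', 'Single', 'Widowed']),
    ('occupation', ['Blue-Collar', 'Other/Unknown', 'Professional', 'Sales',
                    'Service', 'White-Collar']),
    ('race', ['Other', 'White']),
    ('gender', ['Female', 'Male']),
    ('hours_per_week', [1, 99]),
])


def split_features(features):
    """Hypothetical helper (not part of DiCE): a list of numbers is read as the
    [min, max] range of a continuous feature; anything else as its categories."""
    continuous, categorical = [], []
    for name, values in features.items():
        if all(isinstance(v, (int, float)) for v in values):
            continuous.append(name)
        else:
            categorical.append(name)
    return continuous, categorical


continuous, categorical = split_features(features)
print(continuous)   # ['age', 'hours_per_week']
print(categorical)  # ['workclass', 'education', 'marital_status', 'occupation', 'race', 'gender']
```

The resulting OrderedDict can be passed as dice_ml.Data(features=features, outcome_name='income'), exactly as the plain dict is in the cell above.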

Explaining pre-trained sklearn models

We first explain a RandomForest model that has been pre-trained on the Adult dataset.

[4]:

backend = 'sklearn'
sk_modelpath = helpers.get_adult_income_modelpath(backend=backend)  # pretrained model
m = dice_ml.Model(model_path=sk_modelpath, backend=backend)

The next two steps are the same as when using DiCE with training data: we specify the genetic algorithm and provide an input query instance.

[5]:

# initiate DiCE
exp = dice_ml.Dice(d, m, method="genetic")

(In the recorded run, this cell failed. The pretrained model was pickled with scikit-learn 1.1.1; unpickling it under scikit-learn 1.6.1 emitted an InconsistentVersionWarning for each estimator (OneHotEncoder, Pipeline, ColumnTransformer, DecisionTreeClassifier) and then raised "ValueError: node array from the pickle has an incompatible dtype" inside sklearn.tree. Running this example therefore likely requires a scikit-learn version compatible with the one used to pickle the model.)

[6]:

# query instance as a one-row DataFrame; columns: feature names, values: feature values
query_instance = pd.DataFrame({'age': 22,
                               'workclass': 'Private',
                               'education': 'HS-grad',
                               'marital_status': 'Single',
                               'occupation': 'Service',
                               'race': 'White',
                               'gender': 'Female',
                               'hours_per_week': 45}, index=[0])
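Since the metadata is all that describes the feature space here, it can be worth checking that a query instance actually falls inside it before generating counterfactuals. The sketch below is a hypothetical helper (validate_query is our name, not a DiCE function) working on plain Python dicts; a one-row DataFrame such as query_instance can be turned into such a dict with query_instance.iloc[0].to_dict(). Treating a list of numbers as a [min, max] range is our assumption.

```python
from numbers import Real

# Metadata from the dice_ml.Data cell above.
features = {
    'age': [17, 90],
    'workclass': ['Government', 'Other/Unknown', 'Private', 'Self-Employed'],
    'education': ['Assoc', 'Bachelors', 'Doctorate', 'HS-grad', 'Masters',
                  'Prof-school', 'School', 'Some-college'],
    'marital_status': ['Divorced', 'Married', 'Separated', 'Single', 'Widowed'],
    'occupation': ['Blue-Collar', 'Other/Unknown', 'Professional', 'Sales',
                   'Service', 'White-Collar'],
    'race': ['Other', 'White'],
    'gender': ['Female', 'Male'],
    'hours_per_week': [1, 99],
}

# The query instance from the cell above, as a plain dict.
query = {'age': 22, 'workclass': 'Private', 'education': 'HS-grad',
         'marital_status': 'Single', 'occupation': 'Service', 'race': 'White',
         'gender': 'Female', 'hours_per_week': 45}


def is_range(allowed):
    """A list of numbers is treated as a [min, max] range (our assumption)."""
    return all(isinstance(v, Real) for v in allowed)


def validate_query(query, features):
    """Hypothetical check: return a list of problems (empty if the query fits)."""
    problems = []
    for name, allowed in features.items():
        if name not in query:
            problems.append(f"missing feature: {name}")
            continue
        value = query[name]
        if is_range(allowed):
            low, high = allowed
            # Real also accepts numpy integer/float scalars coming from pandas
            if not isinstance(value, Real) or not low <= value <= high:
                problems.append(f"{name}={value!r} outside [{low}, {high}]")
        elif value not in allowed:
            problems.append(f"{name}={value!r} is not a known category")
    return problems


print(validate_query(query, features))                # []
print(validate_query(dict(query, age=95), features))  # ['age=95 outside [17, 90]']
```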

Generate diverse counterfactuals

The initialization parameter must be set to "random": the default, kdtree, needs access to the entire training data and is therefore not supported with the private (metadata-only) data interface.
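As we understand it, kdtree initialization seeds the search with points taken from the training data, which is why it is unavailable here. The sketch below shows, in simplified form, what starting from random candidates can look like when only metadata is known: integers drawn uniformly from each continuous range and categories drawn uniformly from each list. It is an illustration under our own assumptions (uniform sampling, integer-valued ranges), not DiCE's genetic initialization code; random_candidate is a hypothetical name.

```python
import random

# Metadata from the dice_ml.Data cell above.
features = {
    'age': [17, 90],
    'workclass': ['Government', 'Other/Unknown', 'Private', 'Self-Employed'],
    'education': ['Assoc', 'Bachelors', 'Doctorate', 'HS-grad', 'Masters',
                  'Prof-school', 'School', 'Some-college'],
    'marital_status': ['Divorced', 'Married', 'Separated', 'Single', 'Widowed'],
    'occupation': ['Blue-Collar', 'Other/Unknown', 'Professional', 'Sales',
                   'Service', 'White-Collar'],
    'race': ['Other', 'White'],
    'gender': ['Female', 'Male'],
    'hours_per_week': [1, 99],
}


def random_candidate(features, rng):
    """Draw one starting candidate from the metadata alone (simplified sketch)."""
    candidate = {}
    for name, allowed in features.items():
        if all(isinstance(v, (int, float)) for v in allowed):
            low, high = allowed
            # both continuous features here are integer-valued, so sample integers
            candidate[name] = rng.randint(low, high)
        else:
            candidate[name] = rng.choice(allowed)
    return candidate


rng = random.Random(42)  # seeded so the population is reproducible
population = [random_candidate(features, rng) for _ in range(10)]
print(population[0])
```

A search algorithm would then evolve such a population toward the desired class; that part is not sketched here.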

[7]:

# generate counterfactuals
dice_exp = exp.generate_counterfactuals(query_instance, total_CFs=4, desired_class="opposite",
                                        initialization="random")
# visualize the results
dice_exp.visualize_as_dataframe(show_only_changes=True)

(In the recorded run, this cell raised "NameError: name 'exp' is not defined", because the dice_ml.Dice(...) call above failed before exp was created.)

Explaining pre-trained deep learning models

We can also explain a model trained with TensorFlow or PyTorch. Below, we use a sample trained model that comes built in with our package; it achieves high accuracy on test data, comparable to other popular baselines.

The variable backend below indicates which implementation of DiCE we want to use. We use TensorFlow 2 in these notebooks, with backend='TF2'. You can set backend to 'TF1' or 'PYT' to use DiCE with TensorFlow 1.x or PyTorch, respectively. Note that finding counterfactuals takes significantly longer with TensorFlow 2.x's eager execution than with TensorFlow 1.x's graph execution.
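The expression 'TF' + tf.__version__[0] in the next cell takes the first character of the TensorFlow version string. The hypothetical helper below makes the same mapping explicit for the three backend strings named above ('TF1', 'TF2', 'PYT') and parses the full major version rather than a single character; the function name and its arguments are ours, not part of DiCE.

```python
def backend_from_version(framework, version=""):
    """Hypothetical helper mapping a framework and version to a DiCE backend string."""
    if framework == "pytorch":
        return "PYT"
    if framework == "tensorflow":
        # parse the whole major version instead of only its first character
        major = int(version.split(".")[0])
        if major not in (1, 2):
            raise ValueError(f"unsupported TensorFlow version: {version}")
        return f"TF{major}"
    raise ValueError(f"unknown framework: {framework}")


print(backend_from_version("tensorflow", "2.19.0"))  # TF2
print(backend_from_version("tensorflow", "1.15.5"))  # TF1
print(backend_from_version("pytorch"))               # PYT
```

With TensorFlow installed, backend_from_version("tensorflow", tf.__version__) gives the same string as the cell below for TensorFlow 1.x and 2.x.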

[8]:

import tensorflow as tf  # noqa

backend = 'TF' + tf.__version__[0]  # TF2
ML_modelpath = helpers.get_adult_income_modelpath(backend=backend)
m = dice_ml.Model(model_path=ML_modelpath, backend=backend, func="ohe-min-max")

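The model above is created with func="ohe-min-max". As far as we understand, this asks DiCE to one-hot encode the categorical features and min-max scale the continuous ones before passing them to the network, which is possible without training data because the metadata supplies both the categories and the [min, max] ranges. The sketch below applies that kind of transformation to this notebook's query instance in plain Python; ohe_min_max is a hypothetical name, and the column naming (feature_category) and ordering are illustrative assumptions that may not match DiCE's internal layout.

```python
# Metadata from the dice_ml.Data cell above.
features = {
    'age': [17, 90],
    'workclass': ['Government', 'Other/Unknown', 'Private', 'Self-Employed'],
    'education': ['Assoc', 'Bachelors', 'Doctorate', 'HS-grad', 'Masters',
                  'Prof-school', 'School', 'Some-college'],
    'marital_status': ['Divorced', 'Married', 'Separated', 'Single', 'Widowed'],
    'occupation': ['Blue-Collar', 'Other/Unknown', 'Professional', 'Sales',
                   'Service', 'White-Collar'],
    'race': ['Other', 'White'],
    'gender': ['Female', 'Male'],
    'hours_per_week': [1, 99],
}

query = {'age': 22, 'workclass': 'Private', 'education': 'HS-grad',
         'marital_status': 'Single', 'occupation': 'Service', 'race': 'White',
         'gender': 'Female', 'hours_per_week': 45}


def ohe_min_max(row, features):
    """One-hot encode categoricals and min-max scale continuous features,
    using only the metadata (sketch)."""
    encoded = {}
    for name, allowed in features.items():
        value = row[name]
        if all(isinstance(v, (int, float)) for v in allowed):
            low, high = allowed
            encoded[name] = (value - low) / (high - low)  # scale into [0, 1]
        else:
            for category in allowed:
                encoded[f"{name}_{category}"] = 1.0 if value == category else 0.0
    return encoded


encoded = ohe_min_max(query, features)
print(len(encoded))                         # 29
print(round(encoded['age'], 4))             # 0.0685
print(round(encoded['hours_per_week'], 4))  # 0.449
```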

[9]:

# initiate DiCE
exp = dice_ml.Dice(d, m, method="gradient")


[10]:

# query instance as a one-row DataFrame; columns: feature names, values: feature values
query_instance = pd.DataFrame({'age': 22,
                               'workclass': 'Private',
                               'education': 'HS-grad',
                               'marital_status': 'Single',
                               'occupation': 'Service',
                               'race': 'White',
                               'gender': 'Female',
                               'hours_per_week': 45}, index=[0])

Generate diverse counterfactuals

[11]:

# generate counterfactuals
dice_exp = exp.generate_counterfactuals(query_instance, total_CFs=4, desired_class="opposite")
# visualize the results
dice_exp.visualize_as_dataframe(show_only_changes=True)


Diverse Counterfactuals found! total time taken: 01 min 10 sec
Query instance (original outcome : 0.01899999938905239)

	age	workclass	education	marital_status	occupation	race	gender	hours_per_week	income
0	22	Private	HS-grad	Single	Service	White	Female	45	0.019


Diverse Counterfactual set without sparsity correction since only metadata about each feature is available (new outcome: 1.0)

	age	workclass	education	marital_status	occupation	race	gender	hours_per_week	income
0	60	Self-Employed	Prof-school	Married	Professional	-	-	43	0.911
1	38	Other/Unknown	Assoc	Married	-	-	-	55	0.74
2	90	-	Doctorate	-	-	-	-	99	0.755
3	70	-	-	-	White-Collar	Other	Male	73	0.525
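With show_only_changes=True, a "-" means the counterfactual keeps the query's value for that feature. The sketch below copies the four rows printed above, fills in the "-" entries from the query instance, lists the changed features, and checks that each predicted income lands in the opposite class. The 0.5 decision threshold is our assumption for this binary model, and the helper names are hypothetical.

```python
FEATURES = ['age', 'workclass', 'education', 'marital_status', 'occupation',
            'race', 'gender', 'hours_per_week']

query = {'age': 22, 'workclass': 'Private', 'education': 'HS-grad',
         'marital_status': 'Single', 'occupation': 'Service', 'race': 'White',
         'gender': 'Female', 'hours_per_week': 45}

# Rows copied from the output above; '-' marks a value equal to the query's.
shown = [
    [60, 'Self-Employed', 'Prof-school', 'Married', 'Professional', '-', '-', 43, 0.911],
    [38, 'Other/Unknown', 'Assoc', 'Married', '-', '-', '-', 55, 0.74],
    [90, '-', 'Doctorate', '-', '-', '-', '-', 99, 0.755],
    [70, '-', '-', '-', 'White-Collar', 'Other', 'Male', 73, 0.525],
]

THRESHOLD = 0.5  # assumed decision threshold for the binary income model


def expand(row):
    """Replace '-' placeholders with the query's value; keep the predicted income."""
    full = {f: (query[f] if v == '-' else v) for f, v in zip(FEATURES, row)}
    full['income'] = row[-1]
    return full


def changed_features(cf):
    """Names of the features whose value differs from the query."""
    return [f for f in FEATURES if cf[f] != query[f]]


original_outcome = 0.019
desired = 0 if original_outcome >= THRESHOLD else 1  # the "opposite" class

results = []
for cf in map(expand, shown):
    predicted = 1 if cf['income'] >= THRESHOLD else 0
    results.append((changed_features(cf), predicted == desired))

for changes, valid in results:
    print(len(changes), changes, valid)
```

The third counterfactual is the sparsest: it changes only age, education and hours_per_week.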

Note on weighting different features: When the training data is available, by default the distance between two values of a continuous feature is scaled by the inverse median absolute deviation (MAD) of that feature, computed from the training data. This captures how common it is to observe a continuous feature at a particular value, as discussed in our paper. When there is no access to the training data, as in the case above, no such scaling is done, so all features are weighted equally in their normalized form. As a result, the counterfactuals generated above differ from those in the DiCE_getting_started notebook, where the training data was available. Nonetheless, you can manually provide the scaling constants through the feature_weights parameter of the generate_counterfactuals() method, as shown in this advanced notebook, or provide the MADs directly to the data interface dice_ml.Data if you know them.
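The inverse-MAD weighting described above can be sketched as follows: compute each continuous feature's median absolute deviation from data, use 1/MAD as its weight, and sum the weighted absolute differences between the query and a counterfactual. The sample values are made up for illustration (they are not the Adult data), distances are in raw units for simplicity, and how DiCE applies such weights internally (for example, on normalized values) is not reproduced here; all function names are hypothetical.

```python
from statistics import median


def mad(values):
    """Median absolute deviation: the median of |v - median(values)|."""
    m = median(values)
    return median(abs(v - m) for v in values)


def inverse_mad_weights(columns):
    """Map each feature name to 1 / MAD of its values."""
    weights = {}
    for name, values in columns.items():
        d = mad(values)
        weights[name] = 1.0 / d if d > 0 else 1.0  # fall back to 1 when MAD is 0
    return weights


def weighted_l1(x, cf, weights):
    """Sum of per-feature absolute differences, each multiplied by its weight."""
    return sum(w * abs(x[f] - cf[f]) for f, w in weights.items())


# Made-up sample values for illustration only (not the Adult data).
sample = {'age': [22, 30, 38, 45, 60], 'hours_per_week': [40, 40, 45, 50, 60]}

query = {'age': 22, 'hours_per_week': 45}
cf = {'age': 38, 'hours_per_week': 55}  # continuous values of counterfactual 1 above

mad_weights = inverse_mad_weights(sample)
print(mad_weights)                          # {'age': 0.125, 'hours_per_week': 0.2}
print(weighted_l1(query, cf, mad_weights))  # 4.0
```

A dictionary like mad_weights is the kind of per-feature scaling the note suggests passing through feature_weights; check the advanced notebook for the exact format generate_counterfactuals() expects.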