Reproducibility

Reproducible ML results are an important aspect of research. Unfortunately, little time is often spent on them. The following shows how randomness is controlled to make experiments reproducible.

Setting a random seed

Reproducibility is achieved by fixing random seeds. First, we consider the input function package.

  1. Shuffle, repeat and prefetch function:
import tensorflow as tf


def shuffle_repeat_prefetch(dataset, buffer_size, n_epochs, random_seed):
    # Shuffle and repeat with a fixed seed so the element order is the same in every run.
    dataset = dataset.apply(
        tf.contrib.data.shuffle_and_repeat(
            buffer_size=buffer_size,
            count=n_epochs,
            seed=random_seed))
    dataset = dataset.prefetch(buffer_size)

    return dataset
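  2. Dataset from records function:
import os

import tensorflow as tf


def dataset_from_records(split, record_dir, record_pattern, random_seed):
    # Build the glob pattern matching all record files of the given split.
    record_pattern = os.path.join(record_dir,
                                  record_pattern.format(split=split, idx="*"))

    # list_files shuffles the file names by default; the seed fixes that order.
    dataset = tf.data.Dataset.list_files(record_pattern, seed=random_seed)

    return dataset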

Passing a fixed random seed to these two functions makes the input pipeline of the experiment reproducible.
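
The effect of a fixed seed on shuffling can be sketched without TensorFlow. The snippet below is a minimal illustration using only Python's standard `random` module; the helper `shuffled_epochs` is hypothetical and is not part of the input function package. It only mimics the idea of a seeded shuffle-and-repeat step.

```python
import random


def shuffled_epochs(items, n_epochs, random_seed):
    """Return n_epochs reshuffled copies of items, concatenated in order.

    Hypothetical helper: a plain-Python analogue of a seeded
    shuffle-and-repeat step, not the tf.data implementation.
    """
    rng = random.Random(random_seed)  # private generator with a fixed seed
    out = []
    for _ in range(n_epochs):
        epoch = list(items)
        rng.shuffle(epoch)  # order is drawn from the seeded generator
        out.extend(epoch)
    return out


first = shuffled_epochs(range(10), n_epochs=2, random_seed=123)
second = shuffled_epochs(range(10), n_epochs=2, random_seed=123)
print(first == second)  # True: same seed, same order
```

Because the generator is created inside the function from the given seed, the result depends only on the arguments and not on any global random state.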

These two functions are then called within the generic input function:

def input_fn(split,
             batch_size,
             buffer_size,
             num_parallel,
             compression,
             random_seed):
   def fn():
      ...

      dataset = dataset_from_records(split, random_seed=random_seed)

      ...

      dataset = shuffle_repeat_prefetch(dataset, random_seed=random_seed)

The second part is controlling the randomness within the experiment itself. Let's see how this works for the MNIST GAN (exps.mnist package).

  1. The extended input function:
def extended_input_fn(noise_dims, mnist_feeder, gan_type):

    def add_noise_and_swap(feature_dict, label_dict):
        # Scale the images from [0, 255] to roughly [-1, 1].
        X = (tf.to_float(feature_dict.pop("X")) - 128.0) / 128.0

        # Sample the generator noise with a fixed seed.
        feature_dict["noise"] = tf.random_normal(
            [mnist_feeder["batch_size"], noise_dims], seed=mnist_feeder["random_seed"])

        feature_dict["labels"] = label_dict["y"]

        return feature_dict, X

    def fn():
        dataset = mnist_input_fn()()

        dataset = dataset.map(add_noise_and_swap,
                              num_parallel_calls=mnist_feeder["num_parallel"])

        return dataset

    return fn
  2. The main function in the MNIST GAN experiment:
def main(model_dir, save_summary_steps, save_checkpoints_steps,
         log_step_count_steps, gen_lr, crit_lr, max_train_steps, gan_type, mnist_feeder):

    config = tf.estimator.RunConfig(
        model_dir=model_dir,
        tf_random_seed=mnist_feeder["random_seed"],
        save_summary_steps=save_summary_steps,
        save_checkpoints_steps=save_checkpoints_steps,
        log_step_count_steps=log_step_count_steps
    )
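
What a fixed seed means for the sampled noise can be sketched in plain Python. The example below is an illustration under stated assumptions, not the TensorFlow op used in the experiment: the generator `noise_batches` is hypothetical and uses the standard `random` module. It shows that two runs started with the same seed draw the same sequence of noise batches.

```python
import random


def noise_batches(n_batches, batch_size, noise_dims, random_seed):
    """Yield batches of standard-normal noise from one seeded generator.

    Hypothetical helper: a plain-Python analogue of seeded noise sampling,
    not the TensorFlow op used in the experiment.
    """
    rng = random.Random(random_seed)
    for _ in range(n_batches):
        yield [[rng.gauss(0.0, 1.0) for _ in range(noise_dims)]
               for _ in range(batch_size)]


# Two independent "runs" with the same seed.
run_a = list(noise_batches(3, batch_size=2, noise_dims=4, random_seed=123))
run_b = list(noise_batches(3, batch_size=2, noise_dims=4, random_seed=123))
print(run_a == run_b)  # True: both runs draw the same sequence of batches
```

The seed fixes the whole sequence of draws across runs; successive batches within one run are still different samples.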

Results

To show the reproducibility of an experiment, we ran the unconditional MNIST GAN experiment twice on CPUs and compared the loss values.

python -m exps.mnist.gan with mnist_feeder.random_seed=123
[Figure: ../_images/exp.png]

Loss 1 and Loss 2 are identical, so the two runs are exactly reproducible.
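
Such a comparison can also be done programmatically. The sketch below is not the experiment code: `toy_training_losses` is a hypothetical stand-in for a training run (noisy gradient descent on a one-dimensional quadratic), and `first_mismatch` is a hypothetical helper that compares two loss curves step by step for exact equality.

```python
import random


def toy_training_losses(random_seed, n_steps=50):
    """Hypothetical stand-in for a training run: noisy gradient descent on w**2."""
    rng = random.Random(random_seed)
    w = rng.uniform(-1.0, 1.0)  # seeded "weight initialisation"
    losses = []
    for _ in range(n_steps):
        noise = rng.gauss(0.0, 0.1)  # seeded "mini-batch noise"
        grad = 2.0 * w + noise
        w -= 0.1 * grad
        losses.append(w * w)
    return losses


def first_mismatch(losses_a, losses_b):
    """Return the index of the first step whose losses differ, or None."""
    for step, (a, b) in enumerate(zip(losses_a, losses_b)):
        if a != b:  # exact comparison on purpose: no tolerance
            return step
    if len(losses_a) != len(losses_b):
        return min(len(losses_a), len(losses_b))  # one run is longer
    return None


loss_1 = toy_training_losses(random_seed=123)
loss_2 = toy_training_losses(random_seed=123)
print(first_mismatch(loss_1, loss_2))  # None: the two runs agree at every step
```

The comparison deliberately uses `!=` rather than a tolerance, since the claim being checked is that the two runs are identical, not merely close.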

Note

If you run these experiments on a GPU, reproducible results cannot be achieved, because some operations on a GPU are non-deterministic.
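
The note does not say where the non-determinism comes from. One commonly cited source, offered here as an assumption rather than as a statement about this experiment, is that parallel reductions may add floating-point numbers in a varying order, and floating-point addition is not associative. The plain-Python snippet below shows only that last point: the same three numbers summed in two different orders give two different results.

```python
# The same three values, added in two different orders.
left_to_right = (0.1 + 0.2) + 0.3
right_to_left = 0.1 + (0.2 + 0.3)

print(left_to_right == right_to_left)  # False: the rounding differs with the order
```

If the order of such additions changes from run to run, the last bits of a loss value can change as well, even with identical seeds.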