Getting Started with Reproducible-ML

We recommend setting up Miniconda or Conda and using it to create a virtual environment.

Creating a virtual environment

  1. Create a virtual environment named reproducible from the environment file, which can be found in the root directory of this project:
conda env create -f <environment file> -n reproducible
  2. Activate the new environment:
source activate reproducible

Set up MongoDB

We recommend MongoDB because it is a NoSQL database that makes it easy to store arbitrary JSON documents. Sacred [14] itself recommends MongoDB.

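Start by downloading and unpacking the binaries for Ubuntu 16.04 in your home directory:

mkdir mongodb
cd mongodb
wget https://fastdl.mongodb.org/linux/mongodb-linux-x86_64-ubuntu1604-4.0.3.tgz
tar -xzf mongodb-linux-x86_64-ubuntu1604-4.0.3.tgz --strip-components=1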

Add the MongoDB binaries to your PATH by adding the following line to your .bashrc file:

export PATH="$HOME/mongodb/bin:$PATH"

Then you can start the MongoDB daemon:

mkdir reproducible-db
mongod --dbpath reproducible-db --fork --logpath reproducible-db/mongod.log

More information about MongoDB can be found in the official MongoDB documentation.
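Before tracking experiments, it can help to confirm that the daemon is actually accepting connections. The following is a minimal sketch using only the Python standard library. It assumes mongod listens on its default port 27017 on localhost; the function name `mongod_is_reachable` is a hypothetical helper, not part of this project. It only checks that the TCP port is open, not that the database is healthy.

```python
import socket


def mongod_is_reachable(host="localhost", port=27017, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and opens a TCP socket.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


if __name__ == "__main__":
    # Assumption: mongod runs locally on the default port 27017.
    status = "reachable" if mongod_is_reachable() else "not reachable"
    print(f"mongod on localhost:27017 is {status}")
```

If the check fails, the log file passed to --logpath is usually the first place to look.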

How to use Sacredboard

With MongoDB set up, you can track your experiments in the database by adding -m sacred:

python <train.py with params> -m sacred

This will create a new entry in the Sacred MongoDB database. If you access a server via SSH, make sure to forward a port, e.g.:

ssh username@someserver.ch -L 10000:localhost:10000

Then you can start Sacredboard on the server and open it in your local browser on the forwarded port:

sacredboard -m sacred --port 10000 --no-browser

Workflow

Now that everything is set up, you can run some experiments and tests.

From the root folder /reproducible-ml you can run experiments as modules:

python -m exps.brain.gan

Preferably, run your code on a GPU. If you want to run it on CPUs, replace the tensorflow-gpu package with the tensorflow package in your environment.
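After swapping packages, you may want to confirm which TensorFlow distribution the active environment actually contains. The sketch below is one way to do that with the standard library; it assumes Python 3.8 or newer (for `importlib.metadata`), and the helper name `installed_tensorflow_packages` is hypothetical. It inspects installed package metadata only and does not import TensorFlow or query any GPU.

```python
from importlib import metadata


def installed_tensorflow_packages(candidates=("tensorflow-gpu", "tensorflow")):
    """Map each installed candidate distribution name to its version string."""
    found = {}
    for name in candidates:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # This distribution is not installed in the active environment.
            continue
    return found


if __name__ == "__main__":
    packages = installed_tensorflow_packages()
    if packages:
        for name, version in packages.items():
            print(f"{name} {version}")
    else:
        print("No TensorFlow distribution found in this environment")
```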

There are several tests you can run:

python -m unittest datasets.brain.test_serialize

Or run all tests at once:

python -m unittest
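When run without arguments, python -m unittest performs test discovery and by default collects modules whose file names match test*.py. As an illustration of what such a module can look like, here is a minimal sketch. The `serialize` and `deserialize` functions are hypothetical JSON-based stand-ins and the sample record is made up; the project's actual datasets.brain.test_serialize may test something quite different.

```python
import json
import unittest


def serialize(record):
    """Hypothetical stand-in: encode a record as a JSON string."""
    return json.dumps(record, sort_keys=True)


def deserialize(payload):
    """Hypothetical stand-in: decode a JSON string back into a record."""
    return json.loads(payload)


class TestSerialize(unittest.TestCase):
    def test_round_trip(self):
        # A record should survive serialization followed by deserialization.
        record = {"subject": "s01", "shape": [64, 64, 32]}
        self.assertEqual(deserialize(serialize(record)), record)
```

Saved as a file matching test*.py inside an importable package, a module like this would be picked up by discovery, or could be run on its own by passing its dotted module path to python -m unittest.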