Introducing experiments

An experiment consists of a collection of runs whose state progresses through a chained sequence of steps. We can define, execute, persist and load experiments.

Creating experiments

Let's create an empty experiment (an experiment with no runs):

Creating an experiment

import mltraq

session = mltraq.create_session()
experiment = session.create_experiment()
print(experiment)
print("Before", session)
experiment.persist()
print("After", session)

Output

Experiment(name="akeut6", runs.count=0, id="d65df69e-1175-44a5-be2f-2232765703b8")
Before Session(db="sqlite:///:memory:", experiments(0)=[])
After Session(db="sqlite:///:memory:", experiments(1)=["akeut6"])

The statistics reported by the session refer only to persisted experiments, which is why the experiment appears in the session only after calling persist(). An empty experiment is not very useful, as there are no runs to execute.

Adding and executing runs

Let's add a run to the experiment using the context manager. We introduce three state attributes with different semantics:

run.vars: stores any object type; accessible only within steps
run.state: stores any object type; accessible within steps and at runtime, but not after reloading
run.fields: stores a limited set of object types; supports reloading from the database and is always accessible

Creating a run

import mltraq

session = mltraq.create_session()
experiment = session.create_experiment()

with experiment.run() as run:
    run.vars.a = 1
    run.state.b = 2
    run.fields.c = 3

print(experiment)
print(experiment.runs)

run = experiment.runs.first()

print("run.vars", run.vars)
print("run.state", run.state)
print("run.fields", run.fields)

Output

Experiment(name="akeut6", runs.count=1, id="d65df69e-1175-44a5-be2f-2232765703b8")
Runs(keys(1)=["d65df69e-1175-44a5-be2f-2232765703b9"])
run.vars {}
run.state {'b': 2}
run.fields {'c': 3}

The first print shows the state of the experiment, reporting the count of runs with runs.count. The second print shows the contents of the experiment.runs object, a dictionary whose keys are the run IDs.
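To make the dictionary view concrete, here is a minimal sketch in plain Python. ToyRuns and the run payloads are hypothetical stand-ins, not MLtraq classes. The sketch assumes only what the text states (runs are keyed by their ID) plus one further assumption, that first() returns the earliest run added.

```python
import uuid


class ToyRuns(dict):
    """Hypothetical stand-in for experiment.runs; not MLtraq's own class."""

    def first(self):
        # Dictionaries preserve insertion order, so this is the first run added
        # (an assumption about how first() behaves).
        return next(iter(self.values()))


runs = ToyRuns()
for label in ("baseline", "variant"):
    run_id = str(uuid.uuid4())  # run IDs are UUID-like strings in the output above
    runs[run_id] = {"id": run_id, "label": label}

print(len(runs))              # 2
print(runs.first()["label"])  # baseline
```

Because the container is keyed by ID, looking up a specific run is a plain dictionary access, while first() is a convenience for the common single-run case.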

In the last block of prints, we see that only the contents of run.state and run.fields are accessible. run.vars remains available, but it is always empty by design after execution.

Tip

Which state attribute should you use? In principle, prefer the least powerful semantics that still gets the job done.
You can use regular local variables if their intended scope is within the step: they are the cheapest option, as they are stateless and discarded as soon as the step function returns.
The semantics of the state of experiments, including run.vars, run.state and run.fields, are presented in detail in the sections Model of computation and State management.
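As an illustration of this advice, the sketch below keeps intermediate data in a local variable and stores only the final result. The run object here is a hypothetical stand-in built with SimpleNamespace, not an MLtraq Run, and the field name mean is made up for the example.

```python
from types import SimpleNamespace

# Hypothetical stand-in for a run, exposing only a `fields` attribute.
run = SimpleNamespace(fields=SimpleNamespace())


def step(run):
    # Local variable: exists only while the step runs, nothing to clean up.
    samples = [1, 2, 3, 6]
    # Only the result worth keeping is written to a state attribute.
    run.fields.mean = sum(samples) / len(samples)


step(run)
print(vars(run.fields))  # {'mean': 3.0}
```

The intermediate list never reaches the run, so there is nothing extra to serialize or keep in memory once the step has finished.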

Persistence of experiments

In the next example, we add a run and execute a step function on it. We then persist and reload the experiment, and inspect its reloaded internal state.

Persisting and reloading an experiment

from mltraq import Run, create_session

session = create_session()
experiment = session.create_experiment("example")


def step(run: Run):
    run.vars.a = 1
    run.state.b = 2
    run.fields.c = 3


experiment.add_run()
experiment.execute(step)
experiment.persist()

experiment = session.load_experiment("example")
run = experiment.runs.first()

print("run.vars", run.vars)
print("run.state", run.state)
print("run.fields", run.fields)

Output

run.vars {}
run.state {}
run.fields {'c': 3}

Only run.fields is accessible, as expected by design.
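The difference between what survives execution and what survives a reload can be mimicked with a toy model in plain Python. This is a sketch of the semantics only, not MLtraq's implementation: ToyRun, its methods, and the use of JSON in place of the database are all assumptions made for illustration.

```python
import json
from types import SimpleNamespace


class ToyRun:
    """Hypothetical stand-in for a run; not part of MLtraq."""

    def __init__(self):
        self.vars = SimpleNamespace()    # scratch space, cleared after execution
        self.state = SimpleNamespace()   # kept in memory, never persisted
        self.fields = SimpleNamespace()  # persisted, limited to storable types

    def execute(self, step):
        step(self)
        self.vars = SimpleNamespace()  # vars do not outlive the steps

    def persist(self):
        # Only fields are written out; JSON stands in for the database here.
        return json.dumps(vars(self.fields))

    @classmethod
    def load(cls, payload):
        run = cls()
        run.fields = SimpleNamespace(**json.loads(payload))
        return run


def step(run):
    run.vars.a = 1
    run.state.b = 2
    run.fields.c = 3


run = ToyRun()
run.execute(step)
after_execution = (vars(run.vars), vars(run.state), vars(run.fields))

reloaded = ToyRun.load(run.persist())
after_reload = (vars(reloaded.vars), vars(reloaded.state), vars(reloaded.fields))

print(after_execution)  # ({}, {'b': 2}, {'c': 3})
print(after_reload)     # ({}, {}, {'c': 3})
```

The two tuples match the outputs shown in this section: after execution the in-memory state and the fields are both present, while after a reload only the fields remain.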

Tip

In run.fields, you can store NumPy arrays, Pandas and PyArrow objects, as well as lists, dictionaries and more.

Congratulations!

You have created your first experiment and run, playing with state attributes and persistence.