MLtraq

Track and Collaborate on ML & AI Experiments.

The open-source Python library for ML & AI developers to design, execute and share experiments. Track anything, stream, reproduce, collaborate, and resume the computation state anywhere.

Motivations & benefits#

Key features#

  • Immediate: Design and execute experiments with a few lines of code, stream your metrics.
  • Collaborative: Backup, merge, share, and reload experiments with their computation state anywhere.
  • Interoperable: Access your experiments with Python, Pandas, and SQL with native database types and open formats - no vendor lock-in.
  • Flexible: Track native Python data types and structures, as well as NumPy, Pandas, and PyArrow objects.
  • Lightweight: Thin layer with minimal dependencies that can run anywhere and complement other components/services.

Design choices#

  • Computation: The chained execution of steps is implemented with joblib.Parallel using process-based parallelism. The cluster-specific backends of Dask, Ray, Spark, and custom ones can be used. The step functions and run objects must be serializable with cloudpickle. You can directly handle the evaluation of your runs without joblib, with less automation and more flexibility.

  • Persistence: The default database is SQLite, and its limits do apply. You can connect to any SQL database supported by SQLAlchemy. Database persistence supports a wide range of types, including bool, int, float, string, uuid.UUID, bytes, dict, list, tuple, set, NumPy, Pandas and PyArrow objects. The Data store interface is designed to handle out-of-database large objects. Compression is available and disabled by default.
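The persistence idea above can be illustrated with a minimal sketch that uses only the Python standard library. It is not MLtraq's storage format, which this page does not describe: the `fields` table layout, the `kind` tag, and the `encode`/`decode` helpers are all hypothetical. Values that SQLite stores natively stay queryable with plain SQL, other objects are serialized to bytes, and compression is opt-in.

```python
import pickle
import sqlite3
import uuid
import zlib


def encode(value, compress=False):
    """Return (kind, stored) where `stored` is something SQLite can hold."""
    # int, float, str and bytes map to native SQLite types (bool is a subclass of int).
    if value is None or isinstance(value, (int, float, str, bytes)):
        return "native", value
    blob = pickle.dumps(value)
    if compress:
        return "pickle+zlib", zlib.compress(blob)
    return "pickle", blob


def decode(kind, stored):
    """Invert encode() using the kind tag saved next to the value."""
    if kind == "native":
        return stored
    if kind == "pickle+zlib":
        stored = zlib.decompress(stored)
    return pickle.loads(stored)


conn = sqlite3.connect(":memory:")
# Hypothetical layout: one row per tracked field of a run.
conn.execute("CREATE TABLE fields (id_run TEXT, name TEXT, kind TEXT, value)")

id_run = str(uuid.uuid4())
tracked = {
    "accuracy": 0.96,
    "epochs": 5,
    "tags": {"a", "b"},
    "history": [0.5, 0.8, 0.96],
}
for name, value in tracked.items():
    # Compress only the (potentially large) history in this example.
    kind, stored = encode(value, compress=(name == "history"))
    conn.execute(
        "INSERT INTO fields VALUES (?, ?, ?, ?)", (id_run, name, kind, stored)
    )

# Native values can be read back directly with SQL.
best = conn.execute(
    "SELECT value FROM fields WHERE name = 'accuracy'"
).fetchone()[0]

# Serialized values need the decode step.
restored = {
    name: decode(kind, stored)
    for name, kind, stored in conn.execute("SELECT name, kind, value FROM fields")
}
```

Tagging each value with how it was stored keeps the native columns usable from any SQL client while still allowing arbitrary Python objects to round-trip.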


Requirements#

  • Python 3.9+
  • SQLAlchemy 2.0+, Pandas 1.5.3+, and Joblib 1.3.2+ (installed as dependencies)


To install MLtraq:

pip install mltraq --upgrade

How to integrate MLtraq in your projects?

MLtraq is progressing rapidly and its interfaces might change at any time. Pin its exact version in your project so that it keeps working. Cover your project with tests, and upgrade MLtraq only after verifying that they still pass.
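One way to enforce the pin is a small check in the project's own test suite. The sketch below uses only the standard library; the helper names `installed_version` and `check_pins` are hypothetical, and the pinned version is simply the one named in this page's footer, so adjust it to the release you have validated.

```python
from importlib.metadata import PackageNotFoundError, version

# Assumption: pin to the version named in this page's footer.
PINNED = {"mltraq": "0.1.145"}


def installed_version(package):
    """Return the installed version of a package, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def check_pins(pins):
    """Return {package: (expected, installed)} for every mismatch."""
    mismatches = {}
    for package, expected in pins.items():
        found = installed_version(package)
        if found != expected:
            mismatches[package] = (expected, found)
    return mismatches
```

In a test, `assert check_pins(PINNED) == {}` fails as soon as the environment drifts from the pin. The pin itself would typically live in a requirements file as `mltraq==0.1.145`.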

Example 1: Define, execute and query an experiment with SQL#

from mltraq import create_experiment

# Create a new experiment, bound to an in-memory SQLite database by default.
experiment = create_experiment("example")

# Add a run and work directly on it.
with experiment.run() as run:
    run.fields.tracked = 5

# Persist experiment to database.
experiment.persist()

# Query experiment with SQL.
print(experiment.db.query("SELECT id_run, tracked FROM experiment_example"))
                                 id_run  tracked
0  d65df69e-1175-44a5-be2f-2232765703b9        5

Example 2: Parameter grids, parallel and resumed execution#

from mltraq import Run, create_experiment

def f1(run: Run):
    """Store inputs as fields and compute AB."""
    run.fields.A = run.params.A
    run.fields.B = run.params.B
    run.fields.C = run.config.C
    run.fields.AB = run.fields.A + run.fields.B

def f2(run: Run):
    """Compute ABC."""
    run.fields.ABC = run.fields.AB + run.fields.C

def f3(run: Run):
    """Compute ABCD."""
    run.fields.ABCD = run.fields.ABC + run.config.D

experiment = (
    create_experiment("example")
    .add_runs(A=[1, 2], B=[3, 4])  # Parameters grid
    .execute([f1, f2], config={"C": 5})  # Execute steps
    .persist()  # Persistence to database
    .reload()  # Reload experiment from database
    .execute(f3, config={"D": 6})  # Continue execution
    .persist(if_exists="replace")  # Persist to database
)

print(
    experiment.db.query(
        "SELECT A, B, C, AB, ABC, ABCD FROM experiment_example"
    )  # SQL query
)
   A  B  C  AB  ABC  ABCD
0  2  3  5   5   10    16
1  2  4  5   6   11    17
2  1  4  5   5   10    16
3  1  3  5   4    9    15
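To make the mechanics of the example above concrete, here is a minimal standard-library sketch of the same flow: a parameter grid expanded into runs, steps chained over each run, a serialization round trip standing in for persist and reload, and a resumed execution. It does not use MLtraq and makes no claim about its internals; `make_runs` and `execute` are hypothetical helpers, `pickle` stands in for the cloudpickle and database layers, and the steps run sequentially rather than in parallel.

```python
import pickle
from itertools import product
from types import SimpleNamespace


def make_runs(**grid):
    """Expand a parameter grid into one run per combination."""
    names = list(grid)
    return [
        SimpleNamespace(params=dict(zip(names, values)), fields={})
        for values in product(*grid.values())
    ]


def execute(runs, steps, config):
    """Apply each step, in order, to every run."""
    for run in runs:
        for step in steps:
            step(run, config)
    return runs


def f1(run, config):
    # Store inputs as fields and compute AB.
    run.fields.update(A=run.params["A"], B=run.params["B"], C=config["C"])
    run.fields["AB"] = run.fields["A"] + run.fields["B"]


def f2(run, config):
    # Compute ABC.
    run.fields["ABC"] = run.fields["AB"] + run.fields["C"]


def f3(run, config):
    # Compute ABCD.
    run.fields["ABCD"] = run.fields["ABC"] + config["D"]


# Four runs: every combination of A in [1, 2] and B in [3, 4].
runs = execute(make_runs(A=[1, 2], B=[3, 4]), [f1, f2], {"C": 5})

# Stand-in for persist + reload: the state survives serialization.
runs = pickle.loads(pickle.dumps(runs))

# Resume with a further step and a new config value.
runs = execute(runs, [f3], {"D": 6})

columns = ("A", "B", "C", "AB", "ABC", "ABCD")
rows = sorted(tuple(run.fields[c] for c in columns) for run in runs)
for row in rows:
    print(row)
```

The sorted rows contain the same values as the query output above, in a different order.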

Example 3: IRIS Flowers Classification#

from functools import partial

import mltraq
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.dummy import DummyClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.utils import shuffle

def load(run: mltraq.Run):
    # Load the IRIS dataset, taking care of shuffling the samples.
    # We use run.vars, accessible only within the execution of the runs.
    run.vars.X, run.vars.y = shuffle(
        *load_iris(return_X_y=True), random_state=run.params.seed
    )

def train_predict(run: mltraq.Run):
    # Instantiate and train classifier on 100 samples (50 random samples left for evaluation).
    model = run.params.classifier(random_state=run.params.seed).fit(
        run.vars.X[:100], run.vars.y[:100]
    )

    # Track the classifier name on run.fields, persisted to database.
    run.fields.model_name = model.__class__.__name__

    # Use trained model to make predictions.
    run.vars.y_pred = model.predict(run.vars.X[100:])
    run.vars.y_true = run.vars.y[100:]

def evaluate(run: mltraq.Run):
    # Track accuracy score from previously determined predictions.
    run.fields.accuracy = accuracy_score(run.vars.y_true, run.vars.y_pred)

# Connect to the MLtraq session and create an experiment.
session = mltraq.create_session()
experiment = session.create_experiment()

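# Use a parameter grid to define the experiment's runs.
# Note: the seed values are an assumption (ten seeds, 0 to 9).
experiment.add_runs(
    classifier=[
        partial(DummyClassifier, strategy="most_frequent"),
        partial(LogisticRegression, max_iter=1000),
        partial(KMeans, n_clusters=3, n_init="auto"),
        RandomForestClassifier,
        DecisionTreeClassifier,
    ],
    seed=range(10),
)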

# Execute experiment, running in parallel the step functions on each run.
experiment.execute(steps=[load, train_predict, evaluate])

# Query the results and report the ML models leaderboard.
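df_leaderboard = (
    experiment.runs.df()
    .groupby("model_name")
    .mean(numeric_only=True)
    .sort_values(by="accuracy", ascending=False)
)
print(df_leaderboard)
                        accuracy
model_name
LogisticRegression         0.960
RandomForestClassifier     0.952
DecisionTreeClassifier     0.938
KMeans                     0.336
DummyClassifier            0.288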
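The leaderboard step boils down to averaging each model's accuracy over its runs and sorting the averages. A minimal standard-library sketch of that aggregation follows; the `leaderboard` helper is hypothetical, and the records are made-up illustrative values, not the results reported above.

```python
from collections import defaultdict
from statistics import mean

# Illustrative per-run records (made-up values, one record per model and seed).
records = [
    {"model_name": "ModelA", "seed": 0, "accuracy": 0.90},
    {"model_name": "ModelA", "seed": 1, "accuracy": 0.94},
    {"model_name": "ModelB", "seed": 0, "accuracy": 0.50},
    {"model_name": "ModelB", "seed": 1, "accuracy": 0.54},
]


def leaderboard(records):
    """Average accuracy per model, best model first."""
    scores = defaultdict(list)
    for record in records:
        scores[record["model_name"]].append(record["accuracy"])
    ranked = [(name, mean(values)) for name, values in scores.items()]
    return sorted(ranked, key=lambda item: item[1], reverse=True)


board = leaderboard(records)
for name, accuracy in board:
    print(f"{name:<10} {accuracy:.3f}")
```

Averaging over several seeds reduces the influence of any single shuffle of the dataset on the ranking.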


This project is licensed under the terms of the BSD 3-Clause License.

Latest update: 2024-07-03 using mltraq==0.1.145