Track and Collaborate on ML & AI Experiments.
The open-source Python library for ML & AI developers to design, execute and share experiments.
Track anything, stream, reproduce, collaborate, and resume the computation state anywhere.
Motivations & benefits
- Blazing fast: The fastest experiment tracking solution in the industry.
- Extreme tracking and interoperability: With native database types, NumPy and PyArrow serialization, and a safe subset of opcodes for Python pickles.
- Promoting collaboration: Work seamlessly with your team by creating, storing, reloading, mixing, resuming, and sharing experiments using any local or remote SQL database.
- Flexible and open: Interact with your experiments using Python, Pandas, and SQL from Python scripts, Jupyter notebooks, and dashboards without vendor lock-in.
Key features
- Immediate: Design and execute experiments with a few lines of code, stream your metrics.
- Collaborative: Backup, merge, share, and reload experiments with their computation state anywhere.
- Interoperable: Access your experiments with Python, Pandas, and SQL with native database types and open formats - no vendor lock-in.
- Flexible: Track native Python data types and structures, as well as NumPy, Pandas, and PyArrow objects.
- Lightweight: Thin layer with minimal dependencies that can run anywhere and complement other components/services.
Design choices
- Computation: The chained execution of steps is implemented with joblib.Parallel using process-based parallelism. The cluster-specific backends of Dask, Ray, Spark, and custom ones can be used. Step functions and run objects must be serializable with cloudpickle. You can also handle the evaluation of your runs directly, without joblib, trading some automation for more flexibility.
- Persistence: The default database is SQLite, and its limits do apply. You can connect to any SQL database supported by SQLAlchemy. Database persistence supports a wide range of types, including bool, int, float, str, uuid.UUID, bytes, dict, list, tuple, set, and NumPy, Pandas and PyArrow objects. The data store interface is designed to handle large objects outside the database. Compression is available but disabled by default.
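As a quick illustration of these two choices, the sketch below connects a session to a file-backed SQLite database and persists a run carrying native Python and NumPy fields. It is a minimal sketch, not the authoritative API: it assumes that create_session accepts an SQLAlchemy connection URL as its first argument and that session.create_experiment accepts an experiment name (mirroring create_experiment("example") in Example 1 below); check the current documentation before relying on it.

import numpy as np
import mltraq

# Assumption: create_session accepts an SQLAlchemy connection URL and
# session.create_experiment accepts an experiment name; verify against
# the current MLtraq API.
session = mltraq.create_session("sqlite:///mltraq.db")
experiment = session.create_experiment("types")

with experiment.run() as run:
    # Native Python types and NumPy objects are handled by the persistence layer.
    run.fields.flag = True
    run.fields.params = {"lr": 0.01, "layers": [64, 32]}
    run.fields.weights = np.arange(5, dtype=float)

# Write the experiment to the configured database.
experiment.persist()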
Requirements
- Python 3.9+
- SQLAlchemy 2.0+, Pandas 1.5.3+, and Joblib 1.3.2+ (installed as dependencies)
Installation
To install MLtraq:
pip install mltraq --upgrade
How to integrate MLtraq in your projects?
MLtraq is progressing rapidly and its interfaces might change at any time.
Pin its exact version in your project to make sure everything keeps working.
Add tests to your project, and bump the pinned version only once you have verified that everything still works correctly.
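For example, a pinned dependency in requirements.txt (the version below matches the one used for this page; replace it with the release you have verified):

mltraq==0.1.158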
Example 1: Define, execute and query an experiment with SQL
from mltraq import create_experiment

# Create a new experiment, bound to an in-memory SQLite database by default.
experiment = create_experiment("example")

# Add a run and work directly on it.
with experiment.run() as run:
    run.fields.tracked = 5

# Persist experiment to database.
experiment.persist()

# Query experiment with SQL.
print(
    experiment.db.query("SELECT id_run, tracked FROM experiment_example")
)
Output:
                                 id_run  tracked
0  d65df69e-1175-44a5-be2f-2232765703b9        5
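Since run fields are plain columns, the same data can also be read back without SQL. A minimal sketch, continuing from the code above and assuming experiment.runs.df() exposes the tracked fields as it does in Example 3 below:

# Continuing from Example 1: access the tracked fields as a Pandas DataFrame
# instead of SQL (runs.df() is also used in Example 3 below).
df = experiment.runs.df()
print(df)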
Example 2: Parameter grids, parallel and resumed execution
from mltraq import Run, create_experiment


def f1(run: Run):
    """
    Store inputs as fields and compute AB.
    """
    run.fields.A = run.params.A
    run.fields.B = run.params.B
    run.fields.C = run.config.C
    run.fields.AB = run.fields.A + run.fields.B


def f2(run: Run):
    """
    Compute ABC.
    """
    run.fields.ABC = run.fields.AB + run.fields.C


def f3(run: Run):
    """
    Compute ABCD.
    """
    run.fields.ABCD = run.fields.ABC + run.config.D


print(
    create_experiment("example")
    .add_runs(A=[1, 2], B=[3, 4])  # Parameter grid
    .execute([f1, f2], config={"C": 5})  # Execute steps
    .persist()  # Persist to database
    .reload()  # Reload experiment from database
    .execute(f3, config={"D": 6})  # Continue execution
    .persist(if_exists="replace")  # Persist to database
    .db.query("SELECT A, B, C, AB, ABC, ABCD FROM experiment_example")  # SQL query
)
Output:
   A  B  C  AB  ABC  ABCD
0  2  3  5   5   10    16
1  2  4  5   6   11    17
2  1  4  5   5   10    16
3  1  3  5   4    9    15
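As noted under Design choices, step execution relies on joblib.Parallel with process-based parallelism by default, and cluster backends such as Dask can be selected through joblib. The sketch below is hypothetical: it assumes that MLtraq's internal joblib.Parallel calls honor the active parallel_config context and that a Dask cluster is reachable; verify this against the current documentation before relying on it.

from dask.distributed import Client
from joblib import parallel_config

from mltraq import Run, create_experiment


def step(run: Run):
    # Same kind of step function as f1/f2 above.
    run.fields.AB = run.params.A + run.params.B


# Start or connect to a Dask cluster; importing dask.distributed registers
# the "dask" joblib backend.
client = Client()

with parallel_config(backend="dask"):
    # Assumption: MLtraq's joblib.Parallel calls pick up the active backend context.
    create_experiment("example").add_runs(A=[1, 2], B=[3, 4]).execute(step)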
Example 3: IRIS Flowers Classification
from functools import partial

from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.dummy import DummyClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.utils import shuffle

import mltraq


def load(run: mltraq.Run):
    # Load the IRIS dataset, taking care of shuffling the samples.
    # We use run.vars, accessible only within the execution of the runs.
    run.vars.X, run.vars.y = shuffle(
        *load_iris(return_X_y=True), random_state=run.params.seed
    )


def train_predict(run: mltraq.Run):
    # Instantiate and train the classifier on 100 samples
    # (the remaining 50 samples are left for evaluation).
    model = run.params.classifier(random_state=run.params.seed).fit(
        run.vars.X[:100], run.vars.y[:100]
    )

    # Track the classifier name on run.fields, persisted to database.
    run.fields.model_name = model.__class__.__name__

    # Use the trained model to make predictions.
    run.vars.y_pred = model.predict(run.vars.X[100:])
    run.vars.y_true = run.vars.y[100:]


def evaluate(run: mltraq.Run):
    # Track the accuracy score of the previously computed predictions.
    run.fields.accuracy = accuracy_score(run.vars.y_true, run.vars.y_pred)


# Connect to the MLtraq session and create an experiment.
session = mltraq.create_session()
experiment = session.create_experiment()

# Use a parameter grid to define the experiment's runs.
experiment.add_runs(
    classifier=[
        partial(DummyClassifier, strategy="most_frequent"),
        partial(LogisticRegression, max_iter=1000),
        partial(KMeans, n_clusters=3, n_init="auto"),
        DecisionTreeClassifier,
        RandomForestClassifier,
    ],
    seed=range(10),
)

# Execute the experiment, running the step functions on each run in parallel.
experiment.execute(steps=[load, train_predict, evaluate])

# Query the results and report the ML models leaderboard.
df_leaderboard = (
    experiment.runs.df()
    .groupby("model_name")
    .mean(numeric_only=True)
    .sort_values(by="accuracy", ascending=False)
)
print(df_leaderboard)
Output:
                        accuracy
model_name
LogisticRegression         0.960
RandomForestClassifier     0.952
DecisionTreeClassifier     0.938
KMeans                     0.336
DummyClassifier            0.288
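The leaderboard above is computed in memory; the experiment can also be persisted and queried with SQL as in Example 1. A minimal sketch, assuming the experiment was created with a name, e.g. session.create_experiment("iris"), so that its table follows the experiment_<name> pattern seen in Example 1; verify the table name in your setup.

# Continuing from Example 3, assuming a named experiment (e.g. "iris") so that
# its table is "experiment_iris"; the naming pattern follows Example 1.
experiment.persist()
print(
    experiment.db.query(
        "SELECT model_name, AVG(accuracy) AS accuracy "
        "FROM experiment_iris GROUP BY model_name ORDER BY accuracy DESC"
    )
)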
License
This project is licensed under the terms of the BSD 3-Clause License.
Latest update: 2025-03-03 using mltraq==0.1.158