Track and Collaborate on ML & AI Experiments.
The open-source Python library for ML & AI developers to design, execute and share experiments. Track anything, stream, reproduce, collaborate, and resume the computation state anywhere.
- Documentation: https://www.mltraq.com
- Source code: https://github.com/elehcimd/mltraq (License: BSD 3-Clause)
- Discussions: Ask questions, share ideas, engage
- Funding: You can star the project on GitHub and hire me to make your experiments run faster
Motivations & benefits
- Blazing fast: The fastest experiment tracking solution in the industry.
- Extreme tracking and interoperability: With native database types, NumPy and PyArrow serialization, and a safe subset of opcodes for Python pickles.
- Promoting collaboration: Work seamlessly with your team by creating, storing, reloading, mixing, resuming, and sharing experiments using any local or remote SQL database.
- Flexible and open: Interact with your experiments using Python, Pandas, and SQL from Python scripts, Jupyter notebooks, and dashboards without vendor lock-in.
Key features
- Immediate: Design and execute experiments with a few lines of code, stream your metrics.
- Collaborative: Backup, merge, share, and reload experiments with their computation state anywhere.
- Interoperable: Access your experiments with Python, Pandas, and SQL with native database types and open formats - no vendor lock-in.
- Flexible: Track native Python data types and structures, as well as NumPy, Pandas, and PyArrow objects.
- Lightweight: Thin layer with minimal dependencies that can run anywhere and complement other components/services.
Design choices
- Computation: The chained execution of steps is implemented with joblib.Parallel using process-based parallelism. The cluster-specific backends of Dask, Ray, and Spark, as well as custom ones, can be used. The step functions and run objects must be serializable with cloudpickle. You can also handle the evaluation of your runs directly, without joblib, trading automation for flexibility.
- Persistence: The default database is SQLite, and its limits apply. You can connect to any SQL database supported by SQLAlchemy. Database persistence supports a wide range of types, including bool, int, float, string, uuid.UUID, bytes, dict, list, tuple, set, NumPy, Pandas, and PyArrow objects. The Data store interface is designed to handle large objects outside the database. Compression is available and disabled by default.
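The chained execution described under Computation can be pictured with a minimal standard-library sketch. It is not MLtraq's implementation: the step names (load, summarize), the dict-based run state, and the execute helper are hypothetical, the steps run sequentially rather than through joblib.Parallel, and the stdlib pickle module stands in for cloudpickle to show why the run state has to be serializable between steps.

```python
import pickle


def load(run):
    # Hypothetical first step: derive some data from the run's parameter.
    run["data"] = list(range(run["n"]))
    return run


def summarize(run):
    # Hypothetical second step: consume what the previous step produced.
    run["total"] = sum(run["data"])
    return run


def execute(run, steps):
    # Apply the steps in order. The pickle round trip after each step mimics
    # handing the state to another process: unpicklable state would fail here.
    for step in steps:
        run = pickle.loads(pickle.dumps(step(run)))
    return run


# Each run starts from its own parameters and flows through the same chain.
runs = [{"n": 3}, {"n": 4}]
results = [execute(run, [load, summarize]) for run in runs]
print([r["total"] for r in results])
```

Because every run carries its own state through the chain, the runs are independent of each other, which is what makes it possible to distribute them across processes.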
Requirements
- Python 3.9+
- SQLAlchemy 2.0+, Pandas 1.5.3+, and Joblib 1.3.2+ (installed as dependencies)
Installation
To install MLtraq, run: pip install mltraq
How to integrate MLtraq in your projects?
MLtraq is progressing rapidly and its interfaces might change at any time. Pin its exact version in your project so that everything keeps working. Maintain tests for your project, and upgrade MLtraq only after verifying that they still pass.
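One way to act on this advice is a small check that fails when the installed version drifts from the pin. This is a minimal sketch, assuming the project pins mltraq==0.1.156 (the version this page was last updated with). The helper name matches_pin and the PINNED dict are hypothetical, while importlib.metadata.version is part of the Python standard library.

```python
from importlib.metadata import PackageNotFoundError, version

PINNED = {"mltraq": "0.1.156"}  # assumption: the exact pin used by your project


def matches_pin(package, expected):
    # True only if the package is installed at exactly the pinned version.
    try:
        return version(package) == expected
    except PackageNotFoundError:
        return False


for package, expected in PINNED.items():
    status = "ok" if matches_pin(package, expected) else "mismatch or not installed"
    print(f"{package}=={expected}: {status}")
```

Running a check like this as part of the test suite turns an accidental upgrade into an explicit failure instead of a silent change in behavior.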
Example 1: Define, execute and query an experiment with SQL
Define, execute and query an experiment with SQL
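As a rough, standard-library-only illustration of this workflow (not MLtraq's API), the sketch below defines a toy experiment, executes it, stores one row per run in an in-memory SQLite database, and queries the result with SQL. The table name, column names, and the evaluate function are hypothetical.

```python
import sqlite3
import uuid


def evaluate(x):
    # Hypothetical "experiment": score a parameter value.
    return x * x


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE runs (id_run TEXT PRIMARY KEY, x INTEGER, score REAL)")

# Execute one run per parameter value and persist its tracked fields.
for x in (1, 2, 3):
    conn.execute(
        "INSERT INTO runs VALUES (?, ?, ?)",
        (str(uuid.uuid4()), x, float(evaluate(x))),
    )
conn.commit()

# Query the tracked fields with plain SQL, best score first.
rows = conn.execute("SELECT x, score FROM runs ORDER BY score DESC").fetchall()
print(rows)
```

Keeping tracked values in native column types is what allows ordering and filtering directly in SQL, without deserializing anything in Python first.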
Example 2: Parameter grids, parallel and resumed execution
Parameter grids, parallel and resumed execution
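The ideas named in this example can be sketched without MLtraq: a parameter grid is the Cartesian product of the candidate values, and resuming means skipping the combinations that already have a result. The snippet below is a sequential, standard-library-only illustration. The parameter names, the train function, and the completed dict are hypothetical, and parallel execution is left out.

```python
from itertools import product

grid = {"lr": [0.1, 0.01], "depth": [2, 4]}


def expand(grid):
    # Cartesian product of all parameter values, one dict per combination.
    keys = list(grid)
    return [dict(zip(keys, values)) for values in product(*grid.values())]


def train(params):
    # Hypothetical stand-in for an expensive training run.
    return params["depth"] / (1 + params["lr"])


def run_grid(grid, completed):
    # Evaluate only the combinations missing from `completed` (resume).
    executed = 0
    for params in expand(grid):
        key = tuple(sorted(params.items()))
        if key not in completed:
            completed[key] = train(params)
            executed += 1
    return executed


completed = {}
first = run_grid(grid, completed)   # evaluates every combination
second = run_grid(grid, completed)  # finds nothing left to evaluate
print(first, second, len(completed))
```

Keying results by the sorted parameter items makes the lookup independent of the order in which the parameters were declared.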
Example 3: IRIS Flowers Classification
IRIS Flowers Classification
License
This project is licensed under the terms of the BSD 3-Clause License.
Latest update: 2024-10-24, using mltraq==0.1.156