Archive files


The ArchiveStore interface lets you manage TAR archives transparently. In the example below, the directory datasets is archived and stored as a regular DataStore asset. The archive is created only when the experiment is persisted. When the experiment is loaded, the archive is extracted into the directory mltraq.archivestore, which is organized by experiment ID in the same way as mltraq.datastore.


Persisting an experiment is equivalent to removing and saving it, triggering the deletion and recreation of its associated datastore assets, including its archives. You can implement different behaviors with ArchiveStoreIO and DataStoreIO.

ArchiveStore example

from os import mkdir

import pandas as pd

from mltraq import create_session
from mltraq.storage.archivestore import ArchiveStore  # assumed module path
from mltraq.utils.fs import glob, tmpdir_ctx

with tmpdir_ctx():
    # Work in a temporary directory

    # Create a directory with two files
    mkdir("datasets")
    pd.Series([1, 2, 3]).to_csv("datasets/first.csv")
    pd.Series([4, 5, 6]).to_csv("datasets/second.csv")

    # Create an experiment
    s = create_session()
    e = s.create_experiment("test")

    # Define an archive (no tar file created!)
    e.fields.archived = ArchiveStore(src_dir="datasets", arc_dir="e")

    # Persist the experiment, creating the tar file
    e.persist()

    # Load the experiment, unarchiving the tar file
    e = s.load_experiment("test")

    print(f"Destination directory: '{e.fields.archived.get_target()}'")

    # Print contents of current directory
    print("Contents of current directory:")
    for idx, name in enumerate(glob("**", root_dir=".", recursive=True)):
        print(f"[{idx:2d}] {name}")

Output

Destination directory: 'mltraq.archivestore/d65df69e-1175-44a5-be2f-2232765703b8'
Contents of current directory:
[ 0] datasets
[ 1] datasets/first.csv
[ 2] datasets/second.csv
[ 3] mltraq.archivestore
[ 4] mltraq.archivestore/d65df69e-1175-44a5-be2f-2232765703b8
[ 5] mltraq.archivestore/d65df69e-1175-44a5-be2f-2232765703b8/e
[ 6] mltraq.archivestore/d65df69e-1175-44a5-be2f-2232765703b8/e/first.csv
[ 7] mltraq.archivestore/d65df69e-1175-44a5-be2f-2232765703b8/e/second.csv
[ 8] mltraq.datastore
[ 9] mltraq.datastore/d65df69e-1175-44a5-be2f-2232765703b8
[10] mltraq.datastore/d65df69e-1175-44a5-be2f-2232765703b8/d65df69e117544a5be2f2232765703ba
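The round trip shown above (pack a directory under an archive-internal name, then unpack it under a per-experiment directory) can be sketched with the standard library alone. This is a minimal illustration, not MLtraq's implementation: the helpers create_archive, extract_archive and demo, and the "experiment-id" directory name, are hypothetical.

```python
import io
import tarfile
import tempfile
from pathlib import Path


def create_archive(src_dir: str, arc_dir: str) -> bytes:
    """Pack src_dir into an in-memory TAR, rooted at arc_dir inside the archive."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        tar.add(src_dir, arcname=arc_dir)  # recursive by default
    return buf.getvalue()


def extract_archive(data: bytes, target: str) -> list:
    """Unpack the TAR bytes under target and return the extracted file paths."""
    Path(target).mkdir(parents=True, exist_ok=True)
    with tarfile.open(fileobj=io.BytesIO(data), mode="r") as tar:
        # Use the safer "data" extraction filter where this Python supports it
        kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
        tar.extractall(target, **kwargs)
    return sorted(
        p.relative_to(target).as_posix()
        for p in Path(target).rglob("*")
        if p.is_file()
    )


def demo() -> list:
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "datasets"
        src.mkdir()
        (src / "first.csv").write_text("1,2,3\n")
        (src / "second.csv").write_text("4,5,6\n")
        data = create_archive(str(src), arc_dir="e")
        # Hypothetical layout: <archive root>/<experiment id>/<arc_dir>/...
        target = Path(tmp) / "archivestore" / "experiment-id"
        return extract_archive(data, str(target))


print(demo())  # ['e/first.csv', 'e/second.csv']
```

Keeping the archive as bytes mirrors the idea of storing it as a regular asset: nothing is written to disk until the caller decides where the bytes go.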


The class ArchiveStoreIO provides a lower-level interface to manage archives, bypassing the organization by experiment IDs. Its implementation relies on the glob and tarfile modules from the standard library. You can pass patterns selecting the files to include or exclude, and optionally include hidden files.

ArchiveStoreIO example

from os import mkdir

import pandas as pd

from mltraq.opts import options
from mltraq.storage.archivestore import ArchiveStoreIO  # assumed module path
from mltraq.utils.fs import glob, tmpdir_ctx

with tmpdir_ctx():
    # Work in a temporary directory

    # Create a directory with two files
    mkdir("datasets")
    pd.Series([1, 2, 3]).to_csv("datasets/first.csv")
    pd.Series([4, 5, 6]).to_csv("datasets/second.csv")

    with options().ctx(
        {
            "datastore.relative_path_prefix": "archives",
            "archivestore.relative_path_prefix": "all",
        }
    ):
        # Create an archive and extract it
        archive = ArchiveStoreIO.create(
            src_dir="datasets", arc_dir="assets"
        )
        archive.extract()  # assumed method name, per the comment above

    # Print contents of current directory
    print("Contents of current directory:")
    for idx, name in enumerate(glob("**", root_dir=".", recursive=True)):
        print(f"[{idx:2d}] {name}")

Output

Contents of current directory:
[ 0] datasets
[ 1] datasets/first.csv
[ 2] datasets/second.csv
[ 3] mltraq.archivestore
[ 4] mltraq.archivestore/all
[ 5] mltraq.archivestore/all/assets
[ 6] mltraq.archivestore/all/assets/first.csv
[ 7] mltraq.archivestore/all/assets/second.csv
[ 8] mltraq.datastore
[ 9] mltraq.datastore/archives
[10] mltraq.datastore/archives/d65df69e117544a5be2f2232765703b8
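The include and exclude patterns and the hidden-file option mentioned for ArchiveStoreIO can be illustrated with a small standard-library sketch. It is an assumption-laden illustration rather than MLtraq's code: it uses pathlib and fnmatch for matching instead of the glob module, and the names select_files, archive_selected, demo and their parameters are hypothetical.

```python
import fnmatch
import io
import tarfile
import tempfile
from pathlib import Path


def select_files(src_dir, include=("*",), exclude=(), include_hidden=False):
    """Return sorted relative paths under src_dir that pass the filters."""
    root = Path(src_dir)
    selected = []
    for path in root.rglob("*"):
        if not path.is_file():
            continue
        rel = path.relative_to(root)
        # A path counts as hidden if any of its components starts with a dot
        if not include_hidden and any(p.startswith(".") for p in rel.parts):
            continue
        name = rel.as_posix()
        if not any(fnmatch.fnmatchcase(name, pat) for pat in include):
            continue
        if any(fnmatch.fnmatchcase(name, pat) for pat in exclude):
            continue
        selected.append(name)
    return sorted(selected)


def archive_selected(src_dir, arc_dir, **filters):
    """Archive the selected files under arc_dir and return the member names."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name in select_files(src_dir, **filters):
            tar.add(str(Path(src_dir) / name), arcname=f"{arc_dir}/{name}")
    with tarfile.open(fileobj=io.BytesIO(buf.getvalue()), mode="r") as tar:
        return tar.getnames()


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "datasets"
        src.mkdir()
        for name in ("first.csv", "second.csv", "notes.txt", ".cache"):
            (src / name).write_text("x\n")
        return archive_selected(
            src, "assets", include=("*.csv",), exclude=("second*",)
        )


print(demo())  # ['assets/first.csv']
```

Selecting the files first and adding them one by one keeps the filtering logic separate from the archiving step, so either part can be swapped out independently.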