In-memory archive files

Sometimes, serializing the contents of an entire directory as a bytes field in your experiment is a convenient way to share code and other small files across different environments.

The Archive interface simplifies the creation and extraction of in-memory TAR archives. The example below stores the src directory in an archive under the name src_archived, so extracting the archive recreates its contents in a new src_archived directory.
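To show what an in-memory TAR archive involves, here is a minimal sketch that uses only the Python standard library (tarfile and io.BytesIO). It is an illustration under stated assumptions, not MLtraq's implementation of Archive. create_tar_bytes and extract_tar_bytes are hypothetical helpers, and their arcname argument plays a role similar to arc_dir in the example.

```python
import io
import os
import tarfile
import tempfile


def create_tar_bytes(src_dir: str, arc_dir: str) -> bytes:
    """Pack src_dir into an in-memory TAR archive, stored under the name arc_dir."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as tar:
        # arcname renames the top-level directory inside the archive
        tar.add(src_dir, arcname=arc_dir)
    return buffer.getvalue()


def extract_tar_bytes(blob: bytes, dest: str = ".") -> None:
    """Extract an in-memory TAR archive into the directory dest."""
    with tarfile.open(fileobj=io.BytesIO(blob), mode="r") as tar:
        if hasattr(tarfile, "data_filter"):
            # Safer extraction where supported (rejects absolute paths, links escaping dest, ...)
            tar.extractall(path=dest, filter="data")
        else:
            tar.extractall(path=dest)


# Usage: mirror the Archive example in a throwaway directory
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "src")
    os.mkdir(src)
    with open(os.path.join(src, "simple_print.py"), "w") as f:
        f.write("print(1 + 2)\n")

    blob = create_tar_bytes(src, arc_dir="src_archived")
    extract_tar_bytes(blob, dest=tmp)

    listing = sorted(os.listdir(tmp))
    with open(os.path.join(tmp, "src_archived", "simple_print.py")) as f:
        restored = f.read()

print(listing)  # ['src', 'src_archived']
```

Keeping the archive as a bytes object, rather than a file on disk, is what makes it possible to store it directly as a field value.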

Warning

Archives smaller than 100 MB can easily be stored as a binary blob in a field with Archive. To persist and move larger archives, we recommend relying on the DataStore interface instead.
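A minimal sketch of applying this size guideline before storing a blob in a field, again using only the standard library. make_tar_bytes, fits_in_field and FIELD_LIMIT_BYTES are hypothetical names, not part of MLtraq, and reading "100 MB" as 100 MiB is an assumption. The DataStore branch is only a message here, since its API is not shown in this post.

```python
import io
import tarfile

# Assumption: "100 MB" is read as 100 MiB (100 * 1024 * 1024 bytes)
FIELD_LIMIT_BYTES = 100 * 1024 * 1024


def make_tar_bytes(files: dict[str, bytes]) -> bytes:
    """Build an in-memory TAR archive from a {archive path: file contents} mapping."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)  # addfile reads exactly info.size bytes
            tar.addfile(info, io.BytesIO(data))
    return buffer.getvalue()


def fits_in_field(blob: bytes, limit: int = FIELD_LIMIT_BYTES) -> bool:
    """Return True if the blob is below the size limit for storing it inline in a field."""
    return len(blob) < limit


blob = make_tar_bytes({"src_archived/simple_print.py": b"print(1 + 2)\n"})
if fits_in_field(blob):
    print(f"{len(blob)} bytes: small enough to store inline in a field")
else:
    print(f"{len(blob)} bytes: consider persisting it with a DataStore instead")
```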

Archive example

from os import mkdir

from mltraq import create_session
from mltraq.storage.archivestore import Archive
from mltraq.utils.fs import glob, tmpdir_ctx

with tmpdir_ctx():
    # Work in a temporary directory

    # Create a directory with a file
    mkdir("src")
    with open("src/simple_print.py", "w") as f:
        f.write("print(1 + 2)\n")

    # Create an experiment
    s = create_session()
    e = s.create_experiment("test")

    # Create the archive
    e.fields.src = Archive.create(src_dir="src", arc_dir="src_archived")

    # Persist the experiment, including the binary TAR blob
    e.persist()

    # Load the experiment
    e = s.load_experiment("test")

    # Extract the contents of the archive
    e.fields.src.extract()

    # Print contents of current directory
    print("Contents of current directory:")
    for idx, name in enumerate(glob("**", root_dir=".", recursive=True)):
        print(f"[{idx:2d}] {name}")
Output
Contents of current directory:
[ 0] src_archived
[ 1] src_archived/simple_print.py
[ 2] src
[ 3] src/simple_print.py