Options management#
MLtraq manages global preferences with the object `mltraq.options`, whose class follows the Singleton pattern and is transparently replicated in read-only mode to the other processes that handle the execution of runs with `step` functions.
Options are organized in a tree-like structure with values at their leaves and indexed by dot-separated strings. The data is stored as a nested Python dictionary. If the query string does not reach a leaf, a dictionary is returned, with the matching sub-tree.
The options tree#
Diagram of the options tree, query strings and returned values:

```mermaid
flowchart LR
    X(((Options)))
    X --> xa(a)
    xa --> ab(b) --> abc(c) --> v1[12]
    xa --> ad(d) --> v2['hello']
    X --> xe(e) --> v3["#123;'k':3#125;"]
    X --> xf(f)
    xf(f) --> xfg(g) --> v4[46]
    xf(f) --> xfh(h) --> v5[Object]
    style X stroke-width:3px
    style v1 stroke-width:3px
    style v2 stroke-width:3px
    style v3 stroke-width:3px
    style v4 stroke-width:3px
    style v5 stroke-width:3px
```
- `options.get("a.b.c")` returns the int `12`
- `options.get("a.d")` returns the string `'hello'`
- `options.get("e")` returns the dictionary `{'k': 3}`
- `options.get("f")` returns the dictionary `{'g': 46, 'h': Object}`
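The lookup behaviour above can be sketched with a plain nested dictionary and a dot-separated key walk. This is a conceptual sketch, not MLtraq's implementation; the `tree` contents and the `get` helper are illustrative.

```python
# Conceptual sketch: a dot-indexed nested dictionary that returns the
# matching sub-tree when the query string does not reach a leaf.
tree = {"a": {"b": {"c": 12}, "d": "hello"}, "e": {"k": 3}, "f": {"g": 46}}


def get(tree: dict, key: str):
    """Walk the nested dict following the dot-separated key."""
    node = tree
    for part in key.split("."):
        node = node[part]
    return node


print(get(tree, "a.b.c"))  # 12 (leaf value)
print(get(tree, "a.d"))    # 'hello' (leaf value)
print(get(tree, "f"))      # {'g': 46} (matching sub-tree)
```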
Context manager#
You can use the context manager `options.ctx` to temporarily modify the configuration.
Using the context manager with options
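The idea behind such a context manager can be sketched in plain Python: save the current values, apply the overrides, and restore the saved values on exit. This is a minimal sketch, not MLtraq's `options.ctx` implementation; the `Options` class, its flat key/value store, and the override interface are assumptions for illustration.

```python
from contextlib import contextmanager


class Options:
    """Toy options store with a ctx() context manager for temporary overrides."""

    def __init__(self):
        self.values = {"tqdm.disable": False, "execution.n_jobs": -1}

    def get(self, key):
        return self.values[key]

    @contextmanager
    def ctx(self, overrides: dict):
        # Save the current values of the overridden keys.
        saved = {k: self.values[k] for k in overrides}
        self.values.update(overrides)
        try:
            yield self
        finally:
            # Restore the previous values on exit, even on exceptions.
            self.values.update(saved)


options = Options()
with options.ctx({"tqdm.disable": True}):
    assert options.get("tqdm.disable") is True  # override active
assert options.get("tqdm.disable") is False     # restored on exit
```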
Nesting options#
You can define a new group of options by extending the class `BaseOptions` defined in `mltraq.utils.base_options` and requesting its singleton instance. Options are stored in the `.options` attribute and can be nested into other existing option groups, as we demonstrate in the following example.
Nesting options
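The singleton pattern behind such option groups can be sketched as follows. This is a hypothetical illustration: the class name `AppOptions`, the `instance()` accessor, and the default values are assumptions, not MLtraq's `BaseOptions` API.

```python
class AppOptions:
    """Toy option group: a singleton whose defaults live in .options."""

    _instance = None

    @classmethod
    def instance(cls):
        # Return the single shared instance, creating it on first use.
        if cls._instance is None:
            cls._instance = cls()
        return cls._instance

    def __init__(self):
        # Defaults are stored as a nested dict in the .options attribute.
        self.options = {"app": {"verbose": False, "retries": 3}}


a = AppOptions.instance()
b = AppOptions.instance()
assert a is b  # every caller sees the same shared instance
```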
Default options#
Overview#
Listing the default values of the options. The generation of the documentation (which relies on the `options.ctx` context manager) alters two values:

- `"tqdm.disable"` is set to `True` to improve readability in the docs.
- `"sequential_uuids"` is set to `True` to avoid random UUIDs in the documentation.
Default option values
```python
{'app': {},
 'archivestore': {'format': 1,
                  'mode': 'x',
                  'relative_path_prefix': 'undefined',
                  'url': 'file:///mltraq.archivestore'},
 'bunchstore': {'pathname': 'bunchstore.data'},
 'cli': {'logging': {'format': '%(levelname)-9s %(asctime)s %(message)s',
                     'level': 'INFO'},
         'tabulate': {'maxcolwidths': 70}},
 'codelog': {'disable': True, 'field_name': 'codelog'},
 'database': {'ask_password': False,
              'echo': False,
              'experiment_tableprefix': 'experiment_',
              'experiments_tablename': 'experiments',
              'pool_pre_ping': True,
              'query_read_chunk_size': 1000,
              'query_write_chunk_size': 1000,
              'url': 'sqlite:///:memory:'},
 'datastore': {'relative_path_prefix': 'undefined',
               'url': 'file:///mltraq.datastore'},
 'datastream': {'cli_address': 'mltraq.sock',
                'cli_throttle_send': 0.001,
                'disable': True,
                'kind': 'UNIX',
                'srv_address': 'mltraq.sock',
                'srv_throttle_persist': 1,
                'srv_throttle_recv': 0.0001},
 'execution': {'args_field': False,
               'backend': 'loky',
               'backend_params': {},
               'exceptions': {'compact_message': False,
                              'report_basenames': False},
               'loky_chdir': True,
               'n_jobs': -1,
               'return_as': 'list'},
 'reproducibility': {'random_seed': 123, 'sequential_uuids': True},
 'serialization': {'compression': {'codec': 'uncompressed'},
                   'serializer': 'DataPakSerializer',
                   'store_unsafe_pickle': False},
 'sysmon': {'disable': True,
            'field_name': 'sysmon',
            'interval': 1,
            'path': '/',
            'percpu': False},
 'tqdm': {'delay': 0.5, 'disable': True, 'leave': False}}
```
Reference documentation#
- The prefix `"app.*"` is reserved for the application, is empty by default, and can be used by the application to customize the behaviour of steps.
- Options `"database.*"` control the behaviour of the connection to the database, chunking, and table names/prefixes.
    - I/O operations are chunked by number of rows, `"database.query_read_chunk_size"` and `"database.query_write_chunk_size"`, to implement progress bar reporting.
    - If `"tqdm.disable"` is set to `True`, there is no chunking.
    - If `"database.ask_password"` is set to `True`, the password of the connection string is requested interactively.
    - `"database.echo"`, `"database.pool_pre_ping"` and `"database.url"` are passed to SQLAlchemy.
    - `"experiments_tablename"` defines the table name used to index the experiments and their metadata.
    - `"experiment_tableprefix"` is the table prefix used for individual experiment tables.
- Options `"execution.*"` cover how experiments (and their runs) are executed.
    - If `"execution.exceptions.compact_message"` is set to `True`, exceptions raised within runs are reported in a compact, friendly format. This might hide useful context to debug errors, so it is `False` by default.
- Options `"reproducibility.*"` control how accurately outputs can be reproduced.
    - The random seed of the Python `random` and `numpy` packages is reset to `"reproducibility.random_seed"` before executing runs, ensuring reproducibility.
    - If `"reproducibility.sequential_uuids"` is set to `True`, the UUIDs generated for experiment and run IDs are not random, simplifying tests and avoiding unnecessary changes in the documentation.
- Options `"serialization.*"` set defaults for the compression and storage of experiments.
- Options `"tqdm.*"` are parameters passed to `tqdm` to render the progress bars used in the evaluation of runs and SQL queries.
- Options `"datastore.*"` define how objects are serialized outside the database, e.g., on the filesystem.
    - `"datastore.url"` defines the storage location. Three slashes indicate a relative path.
    - `"datastore.relative_path_prefix"` is appended to `"datastore.url"` and defines the relative directory used to store the file(s). `DataStore` objects manage it transparently, temporarily setting it to the experiment ID.
- Options `"datastream.*"` handle all things streaming.
    - `"datastream.cli_address"`: Address the client sends messages to.
    - `"datastream.cli_throttle_send"`: Delay (s) introduced after each sent message.
    - `"datastream.kind"`: Type of socket, either `"UNIX"` or `"INET"`.
    - `"datastream.srv_address"`: Address to listen on.
    - `"datastream.srv_throttle_recv"`: Delay (s) introduced after each received message.
    - `"datastream.srv_throttle_persist"`: Delay (s) introduced after persisting to the database.
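The row chunking described for the `"database.*"` options can be sketched as follows: rows are split into fixed-size chunks so that progress can be reported after each chunk is written or read. This is a conceptual sketch of the chunking idea, not MLtraq's database layer; the `chunked` helper and the chunk size of 1000 (the default of `"database.query_write_chunk_size"`) are illustrative.

```python
def chunked(rows, chunk_size):
    """Yield consecutive chunks of at most chunk_size rows."""
    for i in range(0, len(rows), chunk_size):
        yield rows[i : i + chunk_size]


rows = list(range(2500))
chunks = list(chunked(rows, 1000))
# A progress bar would advance once per chunk instead of once per row.
print([len(c) for c in chunks])  # [1000, 1000, 500]
```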