Logging the code of steps
You can log the code and the parameters of the executed steps by turning on the codelog feature.
In the example below, we execute the step created by the step function factory init_fields and log its code, location, and runtime parameters.
Step 0: ../../../../src/mltraq/steps/chdir.py:8 @ step_chdir(*[], **{'path': '/Users/michele/dev/mltraq/mkdocs/blog/posts/004'})
--
def step_chdir(run: Run, path: Optional[str] = None):
    """
    Change current directory to `path`.
    """
    if path:
        os.chdir(path)
--
Step 1: ../../../../src/mltraq/steps/init_fields.py:6 @ step_init_fields(*[], **{'a': 1})
--
def step_init_fields(run: Run, **fields):
    """
    Initialize fields in the run.
    """
    if fields is None:
        fields = {}
    for name, value in fields.items():
        run.fields[name] = value
--
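To make the structure of each log entry concrete, here is a minimal sketch of how a record with the same three parts (location, runtime parameters, source code) could be assembled with the standard library. It is not MLtraq's implementation: the helper `describe_step` and the dict-based stand-in for `Run` are hypothetical names introduced only for illustration.

```python
import inspect


def describe_step(func, *args, **kwargs):
    """Build a codelog-style record for one step call (hypothetical helper)."""
    # File name and first line number are stored on the function's code object.
    code = func.__code__
    location = f"{code.co_filename}:{code.co_firstlineno}"
    header = f"{location} @ {func.__name__}(*{list(args)}, **{kwargs})"
    try:
        source = inspect.getsource(func)
    except (OSError, TypeError):
        # The source is unavailable, e.g. for functions defined interactively.
        source = "<source unavailable>\n"
    return f"{header}\n--\n{source}--"


def step_init_fields(run, **fields):
    """Simplified stand-in for the logged step; `run` is a plain dict here."""
    for name, value in fields.items():
        run[name] = value


print(describe_step(step_init_fields, a=1))
```

The header mirrors the layout of the log above: where the function is defined, followed by the positional and keyword arguments it was called with.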
Why are two steps being logged, and not just one?
The MLtraq executor implicitly calls the function step_chdir at step #0 if the joblib backend is loky, to ensure that the steps running in the pool worker processes have their current directory aligned with the primary process. This behavior is managed by the option "execution.loky_chdir".
The pool of worker processes is reused when multiple executions are triggered close together in time. This improves efficiency, because creating new processes is an expensive task for the operating system, but it also means that worker processes retain state, such as the current working directory, from previously executed jobs.
Besides being a security issue, this can break your experiments if you use chdir in your steps.
The implicit step #0 resolves this issue by resetting the working directory before your own steps run.
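The pattern can be sketched without any worker pool. The example below is an illustration under stated assumptions, not MLtraq's executor: `with_implicit_chdir`, `execute`, `step_record_cwd`, and the dict-based run are hypothetical, and the stale worker state is simulated by changing directory in the same process.

```python
import os
import tempfile


def step_chdir(run, path=None):
    """Change the current directory to `path`, if given."""
    if path:
        os.chdir(path)


def step_record_cwd(run):
    """Example user step: store the directory the step runs in."""
    run["cwd"] = os.getcwd()


def with_implicit_chdir(steps, backend, loky_chdir=True):
    """Prepend a chdir step (hypothetical helper) when the backend is loky."""
    if backend == "loky" and loky_chdir:
        primary_cwd = os.getcwd()  # captured in the primary process
        return [lambda run: step_chdir(run, path=primary_cwd)] + list(steps)
    return list(steps)


def execute(steps, run):
    """Run the steps in order on the same run object."""
    for step in steps:
        step(run)
    return run


primary_cwd = os.getcwd()
steps = with_implicit_chdir([step_record_cwd], backend="loky")

with tempfile.TemporaryDirectory() as stale_dir:
    os.chdir(stale_dir)  # simulate a directory left behind by an earlier job
    run = execute(steps, {})
    os.chdir(primary_cwd)  # make sure we are outside the directory before cleanup

assert run["cwd"] == primary_cwd
```

Because the primary directory is captured when the step list is built, the user step sees it regardless of where the previous job left the worker.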