🐛 Bug
Summary: When metrics are logged using only the epoch parameter, the step value is assigned incrementally in the order the calls happen. When the calls happen out of order (for example, asynchronous evaluation on a batch system), displaying the metrics in an epoch/value chart connects the lines incorrectly, because step, not the selected x-axis, determines the order in which data points are connected.
To reproduce
Pseudo-code:
from aim import Run

run = Run()

run.track(float(train_loss), name='train_loss', epoch=1)
run.track(float(eval_loss), name='eval_loss', epoch=1)
run.track(float(train_loss), name='train_loss', epoch=2)
run.track(float(train_loss), name='train_loss', epoch=3)
run.track(float(train_loss), name='train_loss', epoch=4)
run.track(float(eval_loss), name='eval_loss', epoch=3)  # Out of order due to scheduling
run.track(float(eval_loss), name='eval_loss', epoch=2)  # Out of order due to scheduling
In my specific setting, eval_loss is calculated by a separate process and saved to disk. Periodically, the main process (which also runs the training and the Aim logger) picks up the value from disk and logs it.
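Roughly, that setup looks like the sketch below. The result directory, file names, and JSON format are placeholders for illustration only; the only real Aim calls are Run and run.track:

import glob
import json
import os

from aim import Run

run = Run()

def poll_eval_results(result_dir):
    # Hypothetical helper: the evaluation process is assumed to write one JSON file
    # per epoch, e.g. {"epoch": 3, "eval_loss": 0.42}. Files can appear in any
    # order, depending on when the batch jobs finish.
    for path in sorted(glob.glob(os.path.join(result_dir, 'eval_*.json'))):
        with open(path) as f:
            yield json.load(f)

for result in poll_eval_results('eval_results'):
    # Because the batch jobs finish out of order, the auto-assigned step values
    # do not follow the epoch order, which breaks the epoch/value chart.
    run.track(float(result['eval_loss']), name='eval_loss', epoch=result['epoch'])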
Expected behavior
The order in which data points are connected in the chart should be dictated by whatever is selected as the x-axis (here, epoch), not by the auto-assigned step.
Environment
Aim Version: v3.27.0
Python version: latest
pip version: latest
OS: Linux
Additional context
Workaround: always also calculate and log an explicit step value:
run.track(float(eval_loss), name='eval_loss', epoch=epoch, step=len(train_dataloader) * epoch)
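For reference, the same workaround applied to the asynchronous case; steps_per_epoch, train_dataloader, and poll_eval_results come from the sketch above and are assumptions, not part of Aim's API:

steps_per_epoch = len(train_dataloader)  # assumes a constant number of batches per epoch

for result in poll_eval_results('eval_results'):  # hypothetical helper from the sketch above
    epoch = result['epoch']
    # Logging an explicit step derived from the epoch keeps the connection order
    # consistent with the epoch axis, even when results arrive out of order.
    run.track(float(result['eval_loss']), name='eval_loss',
              epoch=epoch, step=steps_per_epoch * epoch)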