[NMT] Add support for multiple validation and test dataloaders #2113

Merged: 19 commits, Apr 27, 2021

ericharper (Collaborator) commented Apr 24, 2021

Adds support for multiple validation (or test) dataloaders: pass a list of source files, and a matching list of target files, instead of a single path.

Usage

Replace the single src_file_name and tgt_file_name values with lists of file paths.

    model.validation_ds.src_file_name=/data/wmt14-en-de.src \
    model.validation_ds.tgt_file_name=/data/wmt14-en-de.ref \

Becomes:

    model.validation_ds.src_file_name=[/data/wmt13-en-de.src,/data/wmt14-en-de.src] \
    model.validation_ds.tgt_file_name=[/data/wmt13-en-de.ref,/data/wmt14-en-de.ref] \
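
To illustrate how the single-path and list forms above can share one code path, here is a minimal Python sketch. The helper name `pair_eval_files` is hypothetical and is not taken from NeMo; it also assumes that source and target files are matched by position in their lists, which the example above suggests but does not state.

```python
from typing import List, Sequence, Tuple, Union

PathOrPaths = Union[str, Sequence[str]]


def pair_eval_files(src_file_name: PathOrPaths, tgt_file_name: PathOrPaths) -> List[Tuple[str, str]]:
    """Return one (src, tgt) pair per dataloader, in list order.

    Hypothetical helper for illustration only; not part of NeMo.
    """
    # A bare string is the original single-dataset form: wrap it in a list
    # so that both forms are handled identically from here on.
    src_files = [src_file_name] if isinstance(src_file_name, str) else list(src_file_name)
    tgt_files = [tgt_file_name] if isinstance(tgt_file_name, str) else list(tgt_file_name)

    # Assumption: files are paired by position, so the lists must be equally long.
    if len(src_files) != len(tgt_files):
        raise ValueError(
            f"Got {len(src_files)} source files but {len(tgt_files)} target files"
        )
    return list(zip(src_files, tgt_files))


pairs = pair_eval_files(
    ["/data/wmt13-en-de.src", "/data/wmt14-en-de.src"],
    ["/data/wmt13-en-de.ref", "/data/wmt14-en-de.ref"],
)
for dl_index, (src, tgt) in enumerate(pairs):
    print(dl_index, src, tgt)
```

Under this sketch, the position of a pair in the list is the dataloader index, so the first pair corresponds to index 0.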

When exp_manager.checkpoint_callback_params.monitor is set to val_loss or val_sacreBLEU, the dataloader at index 0 is used as the monitor. This preserves backwards compatibility.

Additionally, any validation dataloader can now be used as a monitor by appending the index as below:

    exp_manager.checkpoint_callback_params.monitor=val_sacreBLEU_dl_index_1
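
The naming convention described here can be sketched as a small Python function that maps a monitor string to the dataloader index it refers to. This is an illustrative sketch only: the function name `monitored_dataloader_index` is hypothetical, and NeMo's actual implementation may resolve the name differently.

```python
import re

# Matches a trailing "_dl_index_<N>" suffix, as in "val_sacreBLEU_dl_index_1".
_DL_INDEX_SUFFIX = re.compile(r"_dl_index_(\d+)$")


def monitored_dataloader_index(monitor: str) -> int:
    """Return the dataloader index that a monitor name points at.

    Hypothetical helper for illustration only; not part of NeMo.
    A name without the suffix refers to dataloader 0, matching the
    backwards-compatible behavior described in the PR.
    """
    match = _DL_INDEX_SUFFIX.search(monitor)
    return int(match.group(1)) if match else 0


for name in ("val_loss", "val_sacreBLEU", "val_sacreBLEU_dl_index_1"):
    print(name, "->", monitored_dataloader_index(name))
```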

Multiple test datasets work exactly the same way as validation datasets: replace validation_ds with test_ds in the examples above.

okuchaiev (Member) left a comment

lgtm, let's wait a little to see what the PTL team says

Signed-off-by: ericharper <[email protected]>
Signed-off-by: ericharper <[email protected]>
Signed-off-by: ericharper <[email protected]>
ericharper marked this pull request as ready for review April 27, 2021 05:01
ericharper merged commit dddf87e into main Apr 27, 2021
ericharper deleted the nmt_multi_val branch April 27, 2021 16:25
karpnv pushed a commit to karpnv/NeMo that referenced this pull request May 21, 2021
…A#2113)

* add multiple validation dataloaders

Signed-off-by: ericharper <[email protected]>

* loop over outputs in eval_epoch_end

Signed-off-by: ericharper <[email protected]>

* looping over val dataloaders in eval epoch end

Signed-off-by: ericharper <[email protected]>

* style

Signed-off-by: ericharper <[email protected]>

* add sync_dist

Signed-off-by: ericharper <[email protected]>

* rearrange name

Signed-off-by: ericharper <[email protected]>

* compute eval loss for each data loader

Signed-off-by: ericharper <[email protected]>

* set loss attr for each val dl

Signed-off-by: ericharper <[email protected]>

* set loss attr for each val dl

Signed-off-by: ericharper <[email protected]>

* default to dl index 0

Signed-off-by: ericharper <[email protected]>

* default to dl index 0

Signed-off-by: ericharper <[email protected]>

* clean up

Signed-off-by: ericharper <[email protected]>

* normalize one val dataset outputs

Signed-off-by: ericharper <[email protected]>

* add jenkins test for multi-val

Signed-off-by: ericharper <[email protected]>

* add multi-test

Signed-off-by: ericharper <[email protected]>

* add multi-test

Signed-off-by: ericharper <[email protected]>

* add multi-test

Signed-off-by: ericharper <[email protected]>

* add testing to jenkins tests

Signed-off-by: ericharper <[email protected]>
Signed-off-by: Карпов Николай Вячеславович <[email protected]>
michalivne pushed a commit to michalivne/NeMo that referenced this pull request Jun 23, 2021
…A#2113)

Signed-off-by: Micha Livne <[email protected]>
mousebaiker pushed a commit to mousebaiker/NeMo that referenced this pull request Jul 8, 2021
…A#2113)
