Move ConcatDataset to common #2237

aklife97 · 2021-05-19T18:57:48Z

Move ConcatDataset (formerly ConcatTranslationDataset) to common.
The dataset was introduced in #2160.

Signed-off-by: Abhinav Khattar <[email protected]>

titu1994

Looks fantastic; it could just use a bit of refactoring of self.N.

titu1994 · 2021-05-19T20:16:25Z

nemo/collections/common/data/dataset.py

+            self.index_generator = ConcatDataset.round_robin_generator
+        else:
+            raise ValueError(f"Currently we only support sampling techniques in {supported_sampling_techniques}.")
+        self.N = 0


What is this variable?
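For context on the round-robin sampling technique selected in the diff (`ConcatDataset.round_robin_generator`), here is a minimal sketch of the general idea. It is not NeMo's implementation: `round_robin` and `concat_iter` are hypothetical helpers, and restarting a dataset's iterator once it is exhausted is an assumption made for illustration.

```python
import itertools
from typing import Iterable, Iterator, List, Sequence


def round_robin(num_datasets: int) -> Iterator[int]:
    # Hypothetical index generator: cycle through dataset indices forever,
    # i.e. 0, 1, ..., n-1, 0, 1, ...
    return itertools.cycle(range(num_datasets))


def concat_iter(datasets: Sequence[Iterable], total: int) -> Iterator:
    # Hypothetical concatenation loop: choose a dataset via round robin and
    # yield its next sample, stopping after `total` samples.
    # Assumption: an exhausted dataset is restarted from the beginning.
    # Assumption: every dataset is non-empty.
    iterators: List[Iterator] = [iter(d) for d in datasets]
    chooser = round_robin(len(datasets))
    for _ in range(total):
        idx = next(chooser)
        try:
            yield next(iterators[idx])
        except StopIteration:
            iterators[idx] = iter(datasets[idx])
            yield next(iterators[idx])
```

With two datasets `[1, 2, 3]` and `["a", "b"]`, drawing six samples interleaves them as `1, "a", 2, "b", 3, "a"`; the second dataset wraps around because it is shorter.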

titu1994 · 2021-05-19T20:17:34Z

nemo/collections/common/data/dataset.py

+            if self.kind == 'map':
+                self.N += len(dataset) // world_size
+            else:
+                self.N += len(dataset)


Can you give this variable a more descriptive name than N?

Done, renamed it to length.
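The renamed variable accumulates the total number of samples across the concatenated datasets, following the branch in the diff: map-style datasets contribute `len(dataset) // world_size`, while other datasets contribute `len(dataset)`. A minimal standalone sketch of that computation, where `concat_length` and the accepted `kind` values are hypothetical and chosen for illustration:

```python
from typing import Sequence, Sized


def concat_length(datasets: Sequence[Sized], kind: str, world_size: int = 1) -> int:
    # Hypothetical helper mirroring the accumulation in the diff.
    if kind not in ("map", "iterable"):
        raise ValueError(f"Unsupported dataset kind: {kind}")
    length = 0
    for dataset in datasets:
        if kind == "map":
            # Map-style: divide each dataset's size by the number of ranks
            # (integer division, so any remainder is dropped).
            length += len(dataset) // world_size
        else:
            # Otherwise: count the full dataset size, as in the diff.
            length += len(dataset)
    return length
```

Note that the floor division is applied per dataset, so with two ranks, datasets of sizes 10 and 7 give `5 + 3 = 8` rather than `17 // 2 = 8`; the two happen to agree here but can differ in general (e.g. sizes 7 and 7 give `3 + 3 = 6` versus `14 // 2 = 7`).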

Signed-off-by: Abhinav Khattar <[email protected]>

* move concatdataset to common
  Signed-off-by: Abhinav Khattar <[email protected]>
* var name change
  Signed-off-by: Abhinav Khattar <[email protected]>

* move concatdataset to common
  Signed-off-by: Abhinav Khattar <[email protected]>
* var name change
  Signed-off-by: Abhinav Khattar <[email protected]>
  Signed-off-by: Micha Livne <[email protected]>


* Itn add classes (#2141) * move do_training flag to config Signed-off-by: Yang Zhang <[email protected]> * added telephone to itn Signed-off-by: Yang Zhang <[email protected]> * add telephone and email to itn Signed-off-by: Yang Zhang <[email protected]> Signed-off-by: Micha Livne <[email protected]> * ASR + NLP Doc Fixes (#2136) * Preserve the tokenizer config for ASR Signed-off-by: smajumdar <[email protected]> * Correct nlp docs Signed-off-by: smajumdar <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Removing graphsurgeon optional dependency, improving import error rep… (#2144) * Removing graphsurgeon optional dependency, improving import error reporting Signed-off-by: Boris Fomitchev <[email protected]> * Fixing scope error Signed-off-by: Boris Fomitchev <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Fix FilterbankFeatures eval nondeterminism. (#2146) Signed-off-by: PiotrDabkowski <[email protected]> Signed-off-by: Micha Livne <[email protected]> * fix the docs. 
(#2148) Signed-off-by: Micha Livne <[email protected]> * Text processing refactor (#2149) * removed graphutils, suppletive, data_loader_utils from itn to be reused from tn Signed-off-by: Yang Zhang <[email protected]> * inheriting itn from tn, thus removing redundancy Signed-off-by: Yang Zhang <[email protected]> * cleaned whitelist Signed-off-by: Yang Zhang <[email protected]> * lgtm fix Signed-off-by: Yang Zhang <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Update how artifacts work (#2138) * Update how artifacts work Signed-off-by: Oleksii Kuchaiev <[email protected]> * fixing some tests Signed-off-by: Oleksii Kuchaiev <[email protected]> * fix more tests Signed-off-by: Oleksii Kuchaiev <[email protected]> * add __init__ to tests to make them discoverable Signed-off-by: Oleksii Kuchaiev <[email protected]> * empty src support Signed-off-by: Oleksii Kuchaiev <[email protected]> * updates plust unittest Signed-off-by: Oleksii Kuchaiev <[email protected]> * add copyright check Signed-off-by: Oleksii Kuchaiev <[email protected]> * copyright header Signed-off-by: Oleksii Kuchaiev <[email protected]> * fix style Signed-off-by: Oleksii Kuchaiev <[email protected]> * handle hashed megatron checkpoint version in nlp restore_from Signed-off-by: ericharper <[email protected]> * add _MODEL_RESTORE_PATH to AppState Signed-off-by: ericharper <[email protected]> * get rid of global folder caching Signed-off-by: Oleksii Kuchaiev <[email protected]> * double register - warning instead of exception Signed-off-by: Oleksii Kuchaiev <[email protected]> * Add asr spe tests Signed-off-by: smajumdar <[email protected]> * Pop out asr wpe pre-registered value Signed-off-by: smajumdar <[email protected]> * Correct ASR tests and paths Signed-off-by: smajumdar <[email protected]> * Correct tokenizer saving Signed-off-by: smajumdar <[email protected]> * Correct ASR tests Signed-off-by: smajumdar <[email protected]> * Correct ASR bpe mixin Signed-off-by: smajumdar 
<[email protected]> * Patch up backward compatibility Signed-off-by: smajumdar <[email protected]> * update register_bert_model Signed-off-by: ericharper <[email protected]> * update all get_lm_model calls Signed-off-by: ericharper <[email protected]> * return None if src not found Signed-off-by: ericharper <[email protected]> * handle case with no tokenizer Signed-off-by: ericharper <[email protected]> * do not add another hash is using tarfile_artifacts Signed-off-by: ericharper <[email protected]> * add return_none flag, update doc string Signed-off-by: ericharper <[email protected]> * update default behavior of register_artifact for NLPModel Signed-off-by: ericharper <[email protected]> * change kwarg name to verify_src_exists Signed-off-by: ericharper <[email protected]> * use cfg instead of _cfg Signed-off-by: Oleksii Kuchaiev <[email protected]> * some cleanups Signed-off-by: Oleksii Kuchaiev <[email protected]> Co-authored-by: ericharper <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Language model refactoring (#2120) * fixed branch in IR tutorial Signed-off-by: AlexGrinch <[email protected]> * bucketing tarred dataset for lm training Signed-off-by: AlexGrinch <[email protected]> * updated global rank Signed-off-by: AlexGrinch <[email protected]> * perplexity update Signed-off-by: AlexGrinch <[email protected]> * refactor lm to be campatible with latest nmt Signed-off-by: AlexGrinch <[email protected]> * perplexity change Signed-off-by: AlexGrinch <[email protected]> * removed obsolete config Signed-off-by: AlexGrinch <[email protected]> * added sequence perplexity Signed-off-by: AlexGrinch <[email protected]> * added non-smoothed CE loss for validation Signed-off-by: AlexGrinch <[email protected]> * unified sentence dataset, torchmetrics for sequence perplexity Signed-off-by: AlexGrinch <[email protected]> * translate_ddp refactor Signed-off-by: AlexGrinch <[email protected]> 
Co-authored-by: Oleksii Kuchaiev <[email protected]> Co-authored-by: Sandeep Subramanian <[email protected]> Signed-off-by: Micha Livne <[email protected]> * [NMT] Multi-validation Patch (#2150) * rename dl index 0 loss and sacrebleu for backwards compatibility Signed-off-by: ericharper <[email protected]> * eval -> val/tst Signed-off-by: ericharper <[email protected]> * instantiate torchmetrics after instantiating dataloaders Signed-off-by: ericharper <[email protected]> * bug Signed-off-by: ericharper <[email protected]> * remove debugging log Signed-off-by: ericharper <[email protected]> * remove debugging log Signed-off-by: ericharper <[email protected]> Co-authored-by: Oleksii Kuchaiev <[email protected]> Signed-off-by: Micha Livne <[email protected]> * bumping version to 1.0.0 Signed-off-by: Oleksii Kuchaiev <[email protected]> Signed-off-by: Micha Livne <[email protected]> * fixed the num_samples of text classification model. (#2152) Signed-off-by: Vahid <[email protected]> Signed-off-by: Micha Livne <[email protected]> * fix for electronic (#2153) * fix for electronic Signed-off-by: ekmb <[email protected]> * special symbols added Signed-off-by: ekmb <[email protected]> * restrict symbols list Signed-off-by: ekmb <[email protected]> Signed-off-by: Micha Livne <[email protected]> * FastSpeech 2 Test & Docs (#2143) * Add FS2 data loading test Signed-off-by: Jocelyn Huang <[email protected]> * TTS docs update for FastSpeech 2 Signed-off-by: Jocelyn Huang <[email protected]> * Style fix for FS2 dataset test Signed-off-by: Jocelyn Huang <[email protected]> * Fix transpose typo Signed-off-by: Jocelyn Huang <[email protected]> Co-authored-by: Oleksii Kuchaiev <[email protected]> Co-authored-by: Jason <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Minor patch for translate_ddp (#2155) * Patch for backtranslation in lm dataset Signed-off-by: MaximumEntropy <[email protected]> * One more fix Signed-off-by: MaximumEntropy <[email protected]> 
Signed-off-by: Micha Livne <[email protected]> * Entity linking (#2050) * Started adding SAP dataset Signed-off-by: Virginia Adams <[email protected]> * Delete .lm_bert_dataset.py.swp Signed-off-by: Virginia Adams <[email protected]> * Added dataset and loss Signed-off-by: Virginia Adams <[email protected]> * Added entity linking encoder model Signed-off-by: Virginia Adams <[email protected]> * Can build and use index from pubmedbert model Signed-off-by: Virginia Adams <[email protected]> * checked boolean logic in build_index.py Signed-off-by: Virginia Adams <[email protected]> * End to end tested all functionality Signed-off-by: Virginia Adams <[email protected]> * fixed val loss none at end of validation Signed-off-by: Virginia Adams <[email protected]> * Started adding demo entity linking notebook Signed-off-by: Virginia Adams <[email protected]> * adding in notebook demo Signed-off-by: Virginia Adams <[email protected]> * added call to entitylinking classes in __init__.py files Signed-off-by: Virginia Adams <[email protected]> * Added eval code to notebook Signed-off-by: Virginia Adams <[email protected]> * Adding unfinished notebook Signed-off-by: Virginia Adams <[email protected]> * Cleaned up example dir Signed-off-by: Virginia Adams <[email protected]> * Fixed recap commands Signed-off-by: Virginia Adams <[email protected]> * added model typing and tiny data tar Signed-off-by: Virginia Adams <[email protected]> * Adding tiny data zip Signed-off-by: Virginia Adams <[email protected]> * updated tiny example config data path Signed-off-by: Virginia Adams <[email protected]> * Notebook demo works Signed-off-by: Virginia Adams <[email protected]> * Changed training epochs Signed-off-by: Virginia Adams <[email protected]> * Removed output from training and install cells Signed-off-by: Virginia Adams <[email protected]> * changed code formatting Signed-off-by: Virginia Adams <[email protected]> * Started doc string for new functions Signed-off-by: Virginia Adams 
<[email protected]> * Updated data_preprocessing to save to data_dir Signed-off-by: Virginia Adams <[email protected]> * fixed comment in notebook demo Signed-off-by: Virginia Adams <[email protected]> * Update data_preprocessing.py Signed-off-by: Virginia Adams <[email protected]> * updated nemo typing imports Signed-off-by: Virginia Adams <[email protected]> * about to rebase Signed-off-by: Virginia Adams <[email protected]> * added back umls_dataset_processing.py Signed-off-by: Virginia Adams <[email protected]> * Removed example data Signed-off-by: Virginia Adams <[email protected]> * Fixed typos in notebook demo Signed-off-by: Virginia Adams <[email protected]> * fixed lgtm-com issues Signed-off-by: Virginia Adams <[email protected]> * added copyright headers Signed-off-by: Virginia Adams <[email protected]> * fixed import and copyright headers Signed-off-by: Virginia Adams <[email protected]> * Fixed formatting Signed-off-by: Virginia Adams <[email protected]> * Fixed formatting changes 2 Signed-off-by: Virginia Adams <[email protected]> * fixed test formatting Signed-off-by: Virginia Adams <[email protected]> * Added __init__.py for model and dataset Signed-off-by: Virginia Adams <[email protected]> * loading newline file returns data_dir now Signed-off-by: Virginia Adams <[email protected]> * Removed conf notebook and deleted comment Signed-off-by: Virginia Adams <[email protected]> * Added jenkins test Signed-off-by: Virginia Adams <[email protected]> * Updated Jenkins test Signed-off-by: Virginia Adams <[email protected]> * fixed file path Signed-off-by: Virginia Adams <[email protected]> * Changed Jenkins pipeline order Signed-off-by: Virginia Adams <[email protected]> * Fixed Jenkins datapath... again... 
Signed-off-by: Virginia Adams <[email protected]> * Made most review changes Signed-off-by: Virginia Adams <[email protected]> * fixed copy right Signed-off-by: Virginia Adams <[email protected]> * updated unit test to wget config Signed-off-by: Virginia Adams <[email protected]> * reverted test file back Signed-off-by: Virginia Adams <[email protected]> * Added project dir to jenkins test Signed-off-by: Virginia Adams <[email protected]> * defined config in unit test Signed-off-by: Virginia Adams <[email protected]> Co-authored-by: Sandeep Subramanian <[email protected]> Co-authored-by: Oleksii Kuchaiev <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Correct branch version for v1.0.0 (#2157) * Correct branch version Signed-off-by: smajumdar <[email protected]> * Correct Jenkinsfile Signed-off-by: smajumdar <[email protected]> * Update rst files Signed-off-by: smajumdar <[email protected]> Signed-off-by: Micha Livne <[email protected]> * switch CI back to main Signed-off-by: Oleksii Kuchaiev <[email protected]> Signed-off-by: Micha Livne <[email protected]> * fixed the docs. 
(#2156) Signed-off-by: Micha Livne <[email protected]> * Make Hifigan jittable (#2159) * FastSpeech 2 Test & Docs (#2143) * Add FS2 data loading test Signed-off-by: Jocelyn Huang <[email protected]> * TTS docs update for FastSpeech 2 Signed-off-by: Jocelyn Huang <[email protected]> * Style fix for FS2 dataset test Signed-off-by: Jocelyn Huang <[email protected]> * Fix transpose typo Signed-off-by: Jocelyn Huang <[email protected]> Co-authored-by: Oleksii Kuchaiev <[email protected]> Co-authored-by: Jason <[email protected]> * Entity linking (#2050) * Started adding SAP dataset Signed-off-by: Virginia Adams <[email protected]> * Delete .lm_bert_dataset.py.swp Signed-off-by: Virginia Adams <[email protected]> * Added dataset and loss Signed-off-by: Virginia Adams <[email protected]> * Added entity linking encoder model Signed-off-by: Virginia Adams <[email protected]> * Can build and use index from pubmedbert model Signed-off-by: Virginia Adams <[email protected]> * checked boolean logic in build_index.py Signed-off-by: Virginia Adams <[email protected]> * End to end tested all functionality Signed-off-by: Virginia Adams <[email protected]> * fixed val loss none at end of validation Signed-off-by: Virginia Adams <[email protected]> * Started adding demo entity linking notebook Signed-off-by: Virginia Adams <[email protected]> * adding in notebook demo Signed-off-by: Virginia Adams <[email protected]> * added call to entitylinking classes in __init__.py files Signed-off-by: Virginia Adams <[email protected]> * Added eval code to notebook Signed-off-by: Virginia Adams <[email protected]> * Adding unfinished notebook Signed-off-by: Virginia Adams <[email protected]> * Cleaned up example dir Signed-off-by: Virginia Adams <[email protected]> * Fixed recap commands Signed-off-by: Virginia Adams <[email protected]> * added model typing and tiny data tar Signed-off-by: Virginia Adams <[email protected]> * Adding tiny data zip Signed-off-by: Virginia Adams <[email protected]> 
* updated tiny example config data path Signed-off-by: Virginia Adams <[email protected]> * Notebook demo works Signed-off-by: Virginia Adams <[email protected]> * Changed training epochs Signed-off-by: Virginia Adams <[email protected]> * Removed output from training and install cells Signed-off-by: Virginia Adams <[email protected]> * changed code formatting Signed-off-by: Virginia Adams <[email protected]> * Started doc string for new functions Signed-off-by: Virginia Adams <[email protected]> * Updated data_preprocessing to save to data_dir Signed-off-by: Virginia Adams <[email protected]> * fixed comment in notebook demo Signed-off-by: Virginia Adams <[email protected]> * Update data_preprocessing.py Signed-off-by: Virginia Adams <[email protected]> * updated nemo typing imports Signed-off-by: Virginia Adams <[email protected]> * about to rebase Signed-off-by: Virginia Adams <[email protected]> * added back umls_dataset_processing.py Signed-off-by: Virginia Adams <[email protected]> * Removed example data Signed-off-by: Virginia Adams <[email protected]> * Fixed typos in notebook demo Signed-off-by: Virginia Adams <[email protected]> * fixed lgtm-com issues Signed-off-by: Virginia Adams <[email protected]> * added copyright headers Signed-off-by: Virginia Adams <[email protected]> * fixed import and copyright headers Signed-off-by: Virginia Adams <[email protected]> * Fixed formatting Signed-off-by: Virginia Adams <[email protected]> * Fixed formatting changes 2 Signed-off-by: Virginia Adams <[email protected]> * fixed test formatting Signed-off-by: Virginia Adams <[email protected]> * Added __init__.py for model and dataset Signed-off-by: Virginia Adams <[email protected]> * loading newline file returns data_dir now Signed-off-by: Virginia Adams <[email protected]> * Removed conf notebook and deleted comment Signed-off-by: Virginia Adams <[email protected]> * Added jenkins test Signed-off-by: Virginia Adams <[email protected]> * Updated Jenkins test 
Signed-off-by: Virginia Adams <[email protected]> * fixed file path Signed-off-by: Virginia Adams <[email protected]> * Changed Jenkins pipeline order Signed-off-by: Virginia Adams <[email protected]> * Fixed Jenkins datapath... again... Signed-off-by: Virginia Adams <[email protected]> * Made most review changes Signed-off-by: Virginia Adams <[email protected]> * fixed copy right Signed-off-by: Virginia Adams <[email protected]> * updated unit test to wget config Signed-off-by: Virginia Adams <[email protected]> * reverted test file back Signed-off-by: Virginia Adams <[email protected]> * Added project dir to jenkins test Signed-off-by: Virginia Adams <[email protected]> * defined config in unit test Signed-off-by: Virginia Adams <[email protected]> Co-authored-by: Sandeep Subramanian <[email protected]> Co-authored-by: Oleksii Kuchaiev <[email protected]> * switch CI back to main Signed-off-by: Oleksii Kuchaiev <[email protected]> * Make Hifigan jittable Signed-off-by: Ryan Leary <[email protected]> * Remove vestigial debugging printout Signed-off-by: Ryan Leary <[email protected]> * Add export forward and fix style Signed-off-by: Ryan Leary <[email protected]> * Fix load_state_dict override for arbitrary layers Signed-off-by: Ryan Leary <[email protected]> Co-authored-by: Jocelyn <[email protected]> Co-authored-by: Oleksii Kuchaiev <[email protected]> Co-authored-by: Jason <[email protected]> Co-authored-by: vadam5 <[email protected]> Co-authored-by: Sandeep Subramanian <[email protected]> Co-authored-by: Oleksii Kuchaiev <[email protected]> Co-authored-by: Ryan Leary <[email protected]> Signed-off-by: Micha Livne <[email protected]> * fix version (#2162) Signed-off-by: Jason <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Megatron nb size reduced (#2163) * notebook size reduced Signed-off-by: ekmb <[email protected]> * notebook size reduced Signed-off-by: ekmb <[email protected]> Signed-off-by: Micha Livne <[email protected]> * update 
spectral clustering method (#2158) * update spectral clustering method Signed-off-by: nithinraok <[email protected]> * update Jenkins File Signed-off-by: nithinraok <[email protected]> * threshold fix by reducing window length for shorter embs Signed-off-by: nithinraok <[email protected]> * grammar fixes Signed-off-by: nithinraok <[email protected]> * CR update Signed-off-by: nithinraok <[email protected]> * paper reference Signed-off-by: nithinraok <[email protected]> * improve docstring for yaml Signed-off-by: nithinraok <[email protected]> * Doc fixes Signed-off-by: nithinraok <[email protected]> Signed-off-by: Micha Livne <[email protected]> * revert (#2167) Signed-off-by: Jason <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Limit Pytorch lightning release (#2170) Signed-off-by: smajumdar <[email protected]> Signed-off-by: Micha Livne <[email protected]> * token classification models artifacts update (#2169) * artifacts update Signed-off-by: ekmb <[email protected]> * artifacts update Signed-off-by: ekmb <[email protected]> * jenkins Signed-off-by: ekmb <[email protected]> * fix for model restoration Signed-off-by: ekmb <[email protected]> * typos fix + jenkins dir update Signed-off-by: ekmb <[email protected]> * typo Signed-off-by: ericharper <[email protected]> * update jenkins branch Signed-off-by: ericharper <[email protected]> * add && Signed-off-by: ericharper <[email protected]> * jenkins Signed-off-by: ekmb <[email protected]> * jenkins disable Signed-off-by: ekmb <[email protected]> * revert jenkins Signed-off-by: ekmb <[email protected]> * jenkins disable Signed-off-by: ekmb <[email protected]> * revert jenkins Signed-off-by: ekmb <[email protected]> Co-authored-by: ericharper <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Fix to always_save_nemo (#2174) * Initial attempt at always_save_nemo fix Signed-off-by: MaximumEntropy <[email protected]> * updated path before saving in exp manager, fixed bug when 
handling tarfile artifacts Signed-off-by: ericharper <[email protected]> * Add test with always_save_nemo to exp_manager Signed-off-by: MaximumEntropy <[email protected]> * Style fixes Signed-off-by: MaximumEntropy <[email protected]> * update jenkins branch Signed-off-by: ericharper <[email protected]> * check for nemo: Signed-off-by: ericharper <[email protected]> * check for nemo: Signed-off-by: ericharper <[email protected]> * check for nemo: Signed-off-by: ericharper <[email protected]> Co-authored-by: ericharper <[email protected]> Co-authored-by: Jason <[email protected]> Signed-off-by: Micha Livne <[email protected]> * fix typo (#2179) Signed-off-by: Yang Zhang <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Make itn tests optional (#2173) * Limit Pytorch lightning release Signed-off-by: smajumdar <[email protected]> * Add final two checks Signed-off-by: smajumdar <[email protected]> Co-authored-by: Jason <[email protected]> Signed-off-by: Micha Livne <[email protected]> * First Revision of TTS Docs and Notebooks Update for 1.0 (#2166) * squash Signed-off-by: Jason <[email protected]> * notebook fixes Signed-off-by: Jason <[email protected]> * notebook fixes Signed-off-by: Jason <[email protected]> * typos Signed-off-by: Jason <[email protected]> Signed-off-by: Micha Livne <[email protected]> * add more alternatives of 0 for telephone (#2171) Signed-off-by: Yang Zhang <[email protected]> Co-authored-by: Oleksii Kuchaiev <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Acc tn (#2180) * make tn cardinal faster Signed-off-by: Yang Zhang <[email protected]> * add number far Signed-off-by: Yang Zhang <[email protected]> * add test Signed-off-by: Yang Zhang <[email protected]> * fix lgtm Signed-off-by: Yang Zhang <[email protected]> * fix lgtm Signed-off-by: Yang Zhang <[email protected]> Signed-off-by: Micha Livne <[email protected]> * [DOCS] NLP Model parallel, NMT multi-val, CORE register artifacts (#2168) * update docs 
Signed-off-by: ericharper <[email protected]> * update docs Signed-off-by: ericharper <[email protected]> * update docs Signed-off-by: ericharper <[email protected]> * update docs Signed-off-by: ericharper <[email protected]> * update docs Signed-off-by: ericharper <[email protected]> * update docs Signed-off-by: ericharper <[email protected]> * update docs Signed-off-by: ericharper <[email protected]> * update docs Signed-off-by: ericharper <[email protected]> * update docs Signed-off-by: ericharper <[email protected]> * update docs Signed-off-by: ericharper <[email protected]> * update docs Signed-off-by: ericharper <[email protected]> * update docs Signed-off-by: ericharper <[email protected]> Co-authored-by: Sandeep Subramanian <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Change label smoothing prob to reduce chance of test failure (#2184) Signed-off-by: MaximumEntropy <[email protected]> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Add FS2 checkpoint links to docs and inference notebook (#2181) * Add FS2 checkpoint links to docs and inference notebook Signed-off-by: Jocelyn Huang <[email protected]> * Remove empty cell from TTS notebook Signed-off-by: Jocelyn Huang <[email protected]> Co-authored-by: Oleksii Kuchaiev <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Update ptl to 1.3 on main branch (#2178) * Update PTL Signed-off-by: smajumdar <[email protected]> * Begin update to Pytorch Lightning 1.3.x Signed-off-by: smajumdar <[email protected]> * Formatting Signed-off-by: smajumdar <[email protected]> * style Signed-off-by: ericharper <[email protected]> * Formatting Signed-off-by: smajumdar <[email protected]> * minor fix Signed-off-by: Jason <[email protected]> * minor fix Signed-off-by: Jason <[email protected]> * get testing attribute from trainer Signed-off-by: ericharper <[email protected]> * update init_ddp_connection override Signed-off-by: ericharper 
<[email protected]> * update attribute Signed-off-by: ericharper <[email protected]> * add barrier after load checkpoint in megatron Signed-off-by: ericharper <[email protected]> * remove barrier Signed-off-by: ericharper <[email protected]> * update last naming Signed-off-by: Jason <[email protected]> Co-authored-by: ericharper <[email protected]> Co-authored-by: Jason <[email protected]> Signed-off-by: Micha Livne <[email protected]> * SDE updates (#2187) * Added updates to SDE: - support for external vocabulary (to detect OOV words) - support for offset field (for segmented long recordings) - UI improvements Signed-off-by: Vitaly Lavrukhin <[email protected]> * Refactored diff in SDE Signed-off-by: Vitaly Lavrukhin <[email protected]> Co-authored-by: Oleksii Kuchaiev <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Add TTS aligner and improved version of g2p for vocabs.Phonemes, small improvement in TalkNet (#2189) * add first version of aligner Signed-off-by: Oktai Tatanov <[email protected]> * aligner docs, new g2p version, fix bugs in talknet Signed-off-by: Oktai Tatanov <[email protected]> * update docs and remove lj related code Signed-off-by: Oktai Tatanov <[email protected]> * fix style Signed-off-by: Oktai Tatanov <[email protected]> * fix import Signed-off-by: Oktai Tatanov <[email protected]> Co-authored-by: Oleksii Kuchaiev <[email protected]> Signed-off-by: Micha Livne <[email protected]> * set the default of nodessplitter to None. 
(#2190) Signed-off-by: Vahid <[email protected]> Signed-off-by: Micha Livne <[email protected]> * NMT fixes (#2194) * minor fixes Signed-off-by: Oleksii Kuchaiev <[email protected]> * minor bugfixes Signed-off-by: Oleksii Kuchaiev <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Store mappings file in .nemo for FS2 model (#2196) * Store mappings file in .nemo for FS2 model Signed-off-by: Jocelyn Huang <[email protected]> * Add error enforcing mappings file during training (FS2) Signed-off-by: Jocelyn Huang <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Add support to change the SE context window of ConvASREncoder (#2193) * Add support for changing context window on the fly Signed-off-by: smajumdar <[email protected]> * Add support to change the SE context window of ConvASREncoder Signed-off-by: smajumdar <[email protected]> * Add ability to skip config updating Signed-off-by: smajumdar <[email protected]> * Switch to mixin based API Signed-off-by: smajumdar <[email protected]> * Update docs and api for ASRModuleMixin Signed-off-by: smajumdar <[email protected]> * Change print to logging.info Signed-off-by: smajumdar <[email protected]> * Correct stride level when computing context window Signed-off-by: smajumdar <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Add a CI test for doing inference with an NMT model trained with Pre-LN (#2198) * Change label smoothing prob to reduce chance of test failure Signed-off-by: MaximumEntropy <[email protected]> * Add Pre-LN inference test to Jenkinsfile Signed-off-by: MaximumEntropy <[email protected]> * Separate tests for training and NMT inference Signed-off-by: MaximumEntropy <[email protected]> Co-authored-by: Oleksii Kuchaiev <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Fix ipywidgets error in asr notebook (#2199) Added `ipywidgets` to avoid `ImportError: IProgress not found. Please update jupyter and ipywidgets. 
See https://ipywidgets.readthedocs.io/en/stable/user_install.html` error. Signed-off-by: Derek Chia <[email protected]> Signed-off-by: Micha Livne <[email protected]> * metrics fix (#2202) * metrics fix Signed-off-by: ekmb <[email protected]> * metrics reset for punct model Signed-off-by: ekmb <[email protected]> Signed-off-by: Micha Livne <[email protected]> * readme and minor improvements (#2203) * readme and minor improvements Signed-off-by: nithinraok <[email protected]> * vad threshold update Signed-off-by: nithinraok <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Fix text processing docs (#2195) * fix text processing docs Signed-off-by: Yang Zhang <[email protected]> * fix name Signed-off-by: Yang Zhang <[email protected]> * add guard to pynini import Signed-off-by: Yang Zhang <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Fix bug in SpecCutout (#2201) Signed-off-by: Robert Bracco <[email protected]> Co-authored-by: Oleksii Kuchaiev <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Fix bug in SpecCutout (#2201) (#2205) Signed-off-by: Robert Bracco <[email protected]> Co-authored-by: Oleksii Kuchaiev <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Signed-off-by: smajumdar <[email protected]> Co-authored-by: Robert Bracco <[email protected]> Co-authored-by: Oleksii Kuchaiev <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Set seed before generating random tensors in NMT test (#2206) * Change label smoothing prob to reduce chance of test failure Signed-off-by: MaximumEntropy <[email protected]> * Set seed before generating tensors Signed-off-by: MaximumEntropy <[email protected]> Signed-off-by: Micha Livne <[email protected]> * ASR patches for v1.0.0 (#2207) * Multiple updates to RNNT add initialization Signed-off-by: smajumdar <[email protected]> * Correct name of initilization Signed-off-by: 
smajumdar <[email protected]> * Update dockerignore Signed-off-by: smajumdar <[email protected]> * Fix RNNT WER calculation Signed-off-by: smajumdar <[email protected]> * Address comments Signed-off-by: smajumdar <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Multilingual training for NMT (#2160) * mnmt on fresh main Signed-off-by: Abhinav Khattar <[email protected]> * push for test Signed-off-by: Abhinav Khattar <[email protected]> * debug Signed-off-by: Abhinav Khattar <[email protected]> * check Signed-off-by: Abhinav Khattar <[email protected]> * cleanup Signed-off-by: Abhinav Khattar <[email protected]> * minor fix Signed-off-by: Abhinav Khattar <[email protected]> * more minor fixes Signed-off-by: Abhinav Khattar <[email protected]> * fix for test Signed-off-by: Abhinav Khattar <[email protected]> * fix list size error Signed-off-by: Abhinav Khattar <[email protected]> * multilingual in infer Signed-off-by: Abhinav Khattar <[email protected]> * changes Signed-off-by: Abhinav Khattar <[email protected]> * tar creation with multilingual Signed-off-by: Abhinav Khattar <[email protected]> * fix Signed-off-by: Abhinav Khattar <[email protected]> * changes + parallelism + bug fix Signed-off-by: Abhinav Khattar <[email protected]> * small fix Signed-off-by: Abhinav Khattar <[email protected]> * multilingual preprocessor fix Signed-off-by: Abhinav Khattar <[email protected]> * globally unique fragment names in tarred dataset Signed-off-by: Abhinav Khattar <[email protected]> * minor changes Signed-off-by: Abhinav Khattar <[email protected]> * rm load_from_cached_dataset Signed-off-by: Abhinav Khattar <[email protected]> * minor config change Signed-off-by: Abhinav Khattar <[email protected]> * rm unsued import Signed-off-by: Abhinav Khattar <[email protected]> Co-authored-by: Oleksii Kuchaiev <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Remove memory leak from ASR notebook + update model notebook (#2213) * ASR patches for 
v1.0.0 (#2207) * Multiple updates to RNNT add initialization Signed-off-by: smajumdar <[email protected]> * Correct name of initilization Signed-off-by: smajumdar <[email protected]> * Update dockerignore Signed-off-by: smajumdar <[email protected]> * Fix RNNT WER calculation Signed-off-by: smajumdar <[email protected]> * Address comments Signed-off-by: smajumdar <[email protected]> * Correct model notebook to log the loss and correctly assign keys Signed-off-by: smajumdar <[email protected]> Signed-off-by: Micha Livne <[email protected]> * replace names in vad tutorials (#2220) Signed-off-by: fayejf <[email protected]> Signed-off-by: Micha Livne <[email protected]> * fix the versioning name. (#2209) * fix the versioning name. Signed-off-by: Vahid <[email protected]> * Made version None. Signed-off-by: Vahid <[email protected]> Co-authored-by: Oleksii Kuchaiev <[email protected]> Co-authored-by: Jason <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Enabled passing kwargs to export() (#2175) * Enabled passing kwargs to export() Signed-off-by: Boris Fomitchev <[email protected]> * Fixing style; changed Classifier input_example to new extended syntax Signed-off-by: Boris Fomitchev <[email protected]> * Fixed order of forward() call in export Signed-off-by: Boris Fomitchev <[email protected]> * Fixing style Signed-off-by: Boris Fomitchev <[email protected]> Co-authored-by: Oleksii Kuchaiev <[email protected]> Signed-off-by: Micha Livne <[email protected]> * update g2p: ambigious ignore, flag for skipping seq2seq (#2223) Signed-off-by: Oktai Tatanov <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Update TTS notebook with TalkNet inference (#2133) * Update TTS notebook with TalkNet inference. Signed-off-by: Stanislav Beliaev <[email protected]> * Update TTS Notebook. Signed-off-by: Stanislav Beliaev <[email protected]> * Update TTS TN Training Notebook. Signed-off-by: Stanislav Beliaev <[email protected]> * Fix TN paper link. 
Signed-off-by: Stanislav Beliaev <[email protected]> * Remove branch updaing TODOs. Signed-off-by: Stanislav Beliaev <[email protected]> Signed-off-by: Micha Livne <[email protected]> * update speaker notebooks (#2224) Signed-off-by: nithinraok <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Support symlinked files (#2216) Signed-off-by: Anas Abou Allaban <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Set strict=True everywhere by default. (#2225) Signed-off-by: Oleksii Kuchaiev <[email protected]> Signed-off-by: Micha Livne <[email protected]> * set strict=True in nlp_model (#2227) Signed-off-by: Oleksii Kuchaiev <[email protected]> Signed-off-by: Micha Livne <[email protected]> * set strict=False for model parallel examples Signed-off-by: Oleksii Kuchaiev <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Make Text processing installation optional via reinstall.sh (#2226) * Make Text processing installation optional via reinstall.sh Signed-off-by: smajumdar <[email protected]> * Support both success and failure states Signed-off-by: smajumdar <[email protected]> Co-authored-by: Oleksii Kuchaiev <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Transformer final norm preln (#2197) * fix pre_ln final norm Signed-off-by: fayejf <[email protected]> * style fix Signed-off-by: fayejf <[email protected]> * bug fixed Signed-off-by: fayejf <[email protected]> * bugfix post_ln Signed-off-by: fayejf <[email protected]> * update and add pre_ln_final_norm Signed-off-by: fayejf <[email protected]> * style fix Signed-off-by: fayejf <[email protected]> * fix for unit test Signed-off-by: fayejf <[email protected]> * rename final_norm to final_layer_norm Signed-off-by: fayejf <[email protected]> * bug fix Signed-off-by: fayejf <[email protected]> * tiny fix Signed-off-by: fayejf <[email protected]> * fix and improve Signed-off-by: fayejf <[email 
protected]> * tiny fix Signed-off-by: fayejf <[email protected]> * Patch for NMT to allow loading old modlels trained with pre-LN Signed-off-by: MaximumEntropy <[email protected]> Co-authored-by: Sandeep Subramanian <[email protected]> Co-authored-by: Oleksii Kuchaiev <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Update models and notebook for 1.0 (#2211) * update models Signed-off-by: Jason <[email protected]> * updates Signed-off-by: Jason <[email protected]> * fix Signed-off-by: Jason <[email protected]> * add links Signed-off-by: Jason <[email protected]> * fixes Signed-off-by: Jason <[email protected]> * style Signed-off-by: Jason <[email protected]> * update checkpoints Signed-off-by: Jason <[email protected]> * rename Signed-off-by: Jason <[email protected]> * fixes Signed-off-by: Jason <[email protected]> * lgtm Signed-off-by: Jason <[email protected]> * fix loading waveglow Signed-off-by: Jason <[email protected]> * typo Signed-off-by: Jason <[email protected]> Co-authored-by: Oleksii Kuchaiev <[email protected]> Signed-off-by: Micha Livne <[email protected]> * update_metrics_classification_models (#2228) Signed-off-by: nithinraok <[email protected]> Co-authored-by: fayejf <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Data loader for seq of label model (#2084) * feature to seq label data loader Signed-off-by: fayejf <[email protected]> * style fix Signed-off-by: fayejf <[email protected]> * small fix Signed-off-by: fayejf <[email protected]> * update tl to be length of seq label Signed-off-by: fayejf <[email protected]> * style fix Signed-off-by: fayejf <[email protected]> * tiny bug fix Signed-off-by: fayejf <[email protected]> * small updates Signed-off-by: fayejf <[email protected]> * updates for review feedback Signed-off-by: fayejf <[email protected]> * style fix Signed-off-by: fayejf <[email protected]> * explain seq_label Signed-off-by: fayejf <[email protected]> * fix lgtm Signed-off-by: fayejf <[email 
protected]> * small updates Signed-off-by: fayejf <[email protected]> * improve as discussed Signed-off-by: fayejf <[email protected]> * add docstring Signed-off-by: fayejf <[email protected]> Co-authored-by: Nithin Rao <[email protected]> Signed-off-by: Micha Livne <[email protected]> * fix comments (#2236) * fix comments Signed-off-by: Yang Zhang <[email protected]> * fix typo Signed-off-by: Yang Zhang <[email protected]> Signed-off-by: Micha Livne <[email protected]> * add paper ref to sgdqa model doc (#2233) * add paper ref to sgdqa model doc Signed-off-by: Yang Zhang <[email protected]> * fix comments Signed-off-by: Yang Zhang <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Move ConcatDataset to common (#2237) * move concatdataset to common Signed-off-by: Abhinav Khattar <[email protected]> * var name change Signed-off-by: Abhinav Khattar <[email protected]> Signed-off-by: Micha Livne <[email protected]> * audio based normalization (#2231) * squash norm_audio Signed-off-by: ekmb <[email protected]> * add missing files Signed-off-by: ekmb <[email protected]> * style Signed-off-by: ekmb <[email protected]> * unit tests added, docstrings fixed Signed-off-by: ekmb <[email protected]> * fix lgtm errors Signed-off-by: ekmb <[email protected]> * debug jenkins Signed-off-by: ekmb <[email protected]> * debug jenkins Signed-off-by: ekmb <[email protected]> * signature update Signed-off-by: ekmb <[email protected]> * set deterministic default Signed-off-by: ekmb <[email protected]> * add more test cases Signed-off-by: ekmb <[email protected]> Signed-off-by: Micha Livne <[email protected]> * bug fix config (#2232) Signed-off-by: fayejf <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Alias Swish to SiLU (#2239) * Alias Swish to SiLU and move activations to inplace execution if possible Signed-off-by: smajumdar <[email protected]> * Remove unused import Signed-off-by: smajumdar 
<[email protected]> Signed-off-by: Micha Livne <[email protected]> * Update README.rst Signed-off-by: Micha Livne <[email protected]> * Offline asr notebook bug fix (#2242) * fix Signed-off-by: fayejf <[email protected]> * install Signed-off-by: fayejf <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Fix docstring (#2244) * fix comments Signed-off-by: Yang Zhang <[email protected]> * fix doc string Signed-off-by: Yang Zhang <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Update "last" Checkpoint (#2241) * fix Signed-off-by: Jason <[email protected]> * change Signed-off-by: Jason <[email protected]> * fix Signed-off-by: Jason <[email protected]> Co-authored-by: Sandeep Subramanian <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Add pretrained model stt_es_citrinet_512 (#2247) Signed-off-by: Elena Rastorgueva <[email protected]> Signed-off-by: Micha Livne <[email protected]> * [BUGFIX] Only process tarfile artifacts when model was restored from tarfile (#2250) * process tarfile artifacts only if model is being restored Signed-off-by: ericharper <[email protected]> * process tarfile artifacts only if model was restored from a tarfile Signed-off-by: ericharper <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Log average metrics for Multi-validation in NMT (#2251) * add avg metrics NMT Signed-off-by: Abhinav Khattar <[email protected]> * name change Signed-off-by: Abhinav Khattar <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Update Primer notebook (#2258) Signed-off-by: smajumdar <[email protected]> Signed-off-by: Micha Livne <[email protected]> * fixed Bug 3310780 and 3310799 (#2264) Signed-off-by: Virginia Adams <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Support multiple models being instantiated in same execution scope (#2245) * Support multiple models being instantiated in same execution scope Signed-off-by: smajumdar <[email protected]> 
* Fix tests Signed-off-by: smajumdar <[email protected]> * Add locks to methods in appstate Signed-off-by: smajumdar <[email protected]> * Perform locks only on write operations Signed-off-by: smajumdar <[email protected]> * Correct deadlock issue Signed-off-by: smajumdar <[email protected]> * Add more tests Signed-off-by: smajumdar <[email protected]> * Add test for multi save and remove patch to change save type Signed-off-by: smajumdar <[email protected]> * Update app state to preserve gidx of previous token Signed-off-by: smajumdar <[email protected]> * Correct restoration logic for tarfiles Signed-off-by: smajumdar <[email protected]> Signed-off-by: Micha Livne <[email protected]> * ASR Refactoring (#2240) * Refactor out the preprocessing from ASR into common Signed-off-by: smajumdar <[email protected]> * Correct nltk issue with vocabs.py for clusters Signed-off-by: smajumdar <[email protected]> * Add typing information to SpecAugment and SpecCutout Signed-off-by: smajumdar <[email protected]> * Reorganize parts directory Signed-off-by: smajumdar <[email protected]> * Refactor parts submodules, add __init__ to few important parts Signed-off-by: smajumdar <[email protected]> * Update docs for new path to parts Signed-off-by: smajumdar <[email protected]> * Cherry pick PR https://github.com/NVIDIA/NeMo/pull/2219 Signed-off-by: smajumdar <[email protected]> * Add header for preprocessing commons Signed-off-by: smajumdar <[email protected]> * Fix style of tests Signed-off-by: smajumdar <[email protected]> * Add forced update of configs for train-val-test ds to new labels tests Signed-off-by: smajumdar <[email protected]> * Update path to FilterbankFeatures for TTS Signed-off-by: smajumdar <[email protected]> * Add an alias file for backward compatibility Signed-off-by: smajumdar <[email protected]> * Add an alias file for backward compatibility Signed-off-by: smajumdar <[email protected]> * Update training scripts of ASR to support finetuning Signed-off-by: 
smajumdar <[email protected]> * Update Finetuning step to be ModelPT level Signed-off-by: smajumdar <[email protected]> * Update docs for finetuning for ASR Signed-off-by: smajumdar <[email protected]> * Fix style Signed-off-by: smajumdar <[email protected]> * Update docs and scripts with fine-tuning info Signed-off-by: smajumdar <[email protected]> * Update docs and scripts with fine-tuning info Signed-off-by: smajumdar <[email protected]> * Fix style Signed-off-by: smajumdar <[email protected]> * Update scripts Signed-off-by: smajumdar <[email protected]> * Add comment for weight initialization Signed-off-by: smajumdar <[email protected]> Signed-off-by: Micha Livne <[email protected]> * TTS Doc Fix and Remove TTS Test (#2272) * bug fix and remove test Signed-off-by: Jason <[email protected]> * syntax Signed-off-by: Jason <[email protected]> * syntax Signed-off-by: Jason <[email protected]> * syntax Signed-off-by: Jason <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Talknet training Fix (#2273) * TalkNet Training notebook fix. Signed-off-by: Stanislav Beliaev <[email protected]> * Remove debug stuff. 
Signed-off-by: Stanislav Beliaev <[email protected]> Signed-off-by: Micha Livne <[email protected]> * update (#2274) Signed-off-by: Jason <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Add links (#2275) * update Signed-off-by: Jason <[email protected]> * link Signed-off-by: Jason <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Delete 3_TTS_TalkNet_Training.ipynb (#2276) Signed-off-by: Jason <[email protected]> Signed-off-by: Micha Livne <[email protected]> * tune down logging (#2277) * tune down logging Signed-off-by: Oleksii Kuchaiev <[email protected]> * debug message instead of removing it completely Signed-off-by: Oleksii Kuchaiev <[email protected]> Signed-off-by: Micha Livne <[email protected]> * minor bugfix Signed-off-by: Oleksii Kuchaiev <[email protected]> Signed-off-by: Micha Livne <[email protected]> * remove confusing message Signed-off-by: Oleksii Kuchaiev <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Restore TalkNet training notebook (#2281) * Restore TalkNet training notebook. Signed-off-by: Stanislav Beliaev <[email protected]> * Remove torchaudio dep. 
Signed-off-by: Stanislav Beliaev <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Fix ExpManager Issues and FastPitch (#2283) * backport exp_manager fixes to v1 Signed-off-by: Jason <[email protected]> * fix fastpitch Signed-off-by: Jason <[email protected]> * fix tests Signed-off-by: Jason <[email protected]> * update prefix Signed-off-by: Jason <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Organize asr config folders (#2284) Signed-off-by: Micha Livne <[email protected]> * Fix and enable DALI tests (#2077) * Fix and enable DALI tests Signed-off-by: Joaquin Anton <[email protected]> * remove unused import Signed-off-by: Joaquin Anton <[email protected]> * Move DALI tests to a separate Jenkins stage Signed-off-by: Joaquin Anton <[email protected]> * Remove DALI tests from the main jenkins ASR stage Signed-off-by: Joaquin Anton <[email protected]> * Comment out MFCC test Signed-off-by: Joaquin Anton <[email protected]> * Working version Signed-off-by: Joaquin Anton <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Added unit test for hifigan export, fixed hifigan export (#2279) * Added unit test for hifigan export, Removed runtime test from waveglow test (now in export) Signed-off-by: Boris Fomitchev <[email protected]> * Fixed style Signed-off-by: Boris Fomitchev <[email protected]> * Fixed style Signed-off-by: Boris Fomitchev <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Update conformer recipes (#2265) * updated readme asr. Signed-off-by: Vahid <[email protected]> * added models. Signed-off-by: Vahid <[email protected]> * fixed docs. Signed-off-by: Vahid <[email protected]> * fixed docs. Signed-off-by: Vahid <[email protected]> * fixed docs. Signed-off-by: Vahid <[email protected]> * fixed docs. Signed-off-by: Vahid <[email protected]> * fixed the docs. Signed-off-by: Vahid <[email protected]> * fixed the docs. 
Signed-off-by: Vahid <[email protected]> * fixed the docs. Signed-off-by: Vahid <[email protected]> * disabled test. Signed-off-by: Vahid <[email protected]> * Updated the config files. Signed-off-by: Vahid <[email protected]> * Updated the config files. Signed-off-by: Vahid <[email protected]> * Updated the config files. Signed-off-by: Vahid <[email protected]> * dropped the wers. Signed-off-by: Vahid <[email protected]> * dropped the wers. Signed-off-by: Vahid <[email protected]> * dropped new models and reverted to old versions. Signed-off-by: Vahid <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Co-authored-by: Oleksii Kuchaiev <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Adding neural rescorer and its documentations (#2287) * Added intial neural rescorer scripts. Signed-off-by: Vahid <[email protected]> * Added intial neural rescorer scripts. Signed-off-by: Vahid <[email protected]> * Added intial neural rescorer scripts. Signed-off-by: Vahid <[email protected]> * Added intial neural rescorer scripts. Signed-off-by: Vahid <[email protected]> * Added intial neural rescorer scripts. Signed-off-by: Vahid <[email protected]> * Added intial neural rescorer scripts. Signed-off-by: Vahid <[email protected]> * Added intial neural rescorer scripts. Signed-off-by: Vahid <[email protected]> * Added intial neural rescorer scripts. Signed-off-by: Vahid <[email protected]> * Added intial neural rescorer scripts. Signed-off-by: Vahid <[email protected]> * added more docs, figures, and output file. Signed-off-by: Vahid <[email protected]> * added more docs, figures, and output file. Signed-off-by: Vahid <[email protected]> * added more docs, figures, and output file. Signed-off-by: Vahid <[email protected]> * added more docs, figures, and output file. Signed-off-by: Vahid <[email protected]> * added more docs, figures, and output file. 
Signed-off-by: Vahid <[email protected]> * fixed style Signed-off-by: Vahid <[email protected]> * add a note to asr notebook. Signed-off-by: Vahid <[email protected]> * add a note to asr notebook. Signed-off-by: Vahid <[email protected]> * add a note to asr notebook. Signed-off-by: Vahid <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Adjust warning messages Signed-off-by: Oleksii Kuchaiev <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Revert "Adjust warning messages" This reverts commit df046ec55754d0136a2a28451435068f32409f30. Signed-off-by: Micha Livne <[email protected]> * Adjust warning messages (#2294) Signed-off-by: Oleksii Kuchaiev <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Adding new Models releases on NGC. (#2295) * added new models. Signed-off-by: Vahid <[email protected]> * added tests for asr lm. Signed-off-by: Vahid <[email protected]> * added tests for asr lm. Signed-off-by: Vahid <[email protected]> * dropped the test. 
Signed-off-by: Vahid <[email protected]> Signed-off-by: Micha Livne <[email protected]> * update quantization (#2298) Signed-off-by: slyned <[email protected]> Co-authored-by: slyned <[email protected]> Signed-off-by: Micha Livne <[email protected]> * ASR improvements (#2293) * Update numba messages and citrinet configs Signed-off-by: smajumdar <[email protected]> * Remove support for weight init scale and hidden hidden bias scale for layer normalized lstm Signed-off-by: smajumdar <[email protected]> * Add support for multiple filetypes in tarred datasets, correct rnn LN-lstm inputs, fix OmegaConf compat issue Signed-off-by: smajumdar <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Time quarter to (#2292) * fix comments Signed-off-by: Yang Zhang <[email protected]> * fix doc string Signed-off-by: Yang Zhang <[email protected]> * adding quarter to to time class Signed-off-by: Yang Zhang <[email protected]> * fix typo Signed-off-by: Yang Zhang <[email protected]> Signed-off-by: Micha Livne <[email protected]> * fixed paths. 
(#2301) Signed-off-by: Vahid <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Added onnxruntime check of exported ONNX, bumped up default ONNX opset (#2278) * Added onnxruntime check of exported ONNX, bumped up default ONNX opset Signed-off-by: Boris Fomitchev <[email protected]> * Made TS export to accept ONNX-style input example, removed unused param to export Signed-off-by: Boris Fomitchev <[email protected]> * check_trace default made False Signed-off-by: Boris Fomitchev <[email protected]> * Fixed for updated export signature Signed-off-by: Boris Fomitchev <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Update readmes Signed-off-by: Oleksii Kuchaiev <[email protected]> Signed-off-by: Micha Livne <[email protected]> * update readme Signed-off-by: Oleksii Kuchaiev <[email protected]> Signed-off-by: Micha Livne <[email protected]> * update readme Signed-off-by: Oleksii Kuchaiev <[email protected]> Signed-off-by: Micha Livne <[email protected]> * fix docs table Signed-off-by: Oleksii Kuchaiev <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Add support for Numba CUDA optimized SpecAugment (#2269) * Initial implementation Signed-off-by: smajumdar <[email protected]> * Initial implementation Signed-off-by: smajumdar <[email protected]> * Finish initial implementation of numba spec augment Signed-off-by: smajumdar <[email protected]> * Correct mask propagataion Signed-off-by: smajumdar <[email protected]> * Parallelize kernel over batch instead of over masks Signed-off-by: smajumdar <[email protected]> * Finish tests and update to signature of spectrogramaugmentation calls Signed-off-by: smajumdar <[email protected]> * Finish tests and update to signature of spectrogramaugmentation calls Signed-off-by: smajumdar <[email protected]> * Add header Signed-off-by: smajumdar <[email protected]> * Fix style Signed-off-by: smajumdar <[email protected]> * Add heuristics Signed-off-by: smajumdar <[email protected]> * 
Correct inclusive range of padding Signed-off-by: smajumdar <[email protected]> * Correct typing for spec aug numba Signed-off-by: smajumdar <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Added JSON manifest's support to transcribe_speech.py (#2304) * Added JSON manifest's support to transcribe_speech.py Signed-off-by: Vitaly Lavrukhin <[email protected]> * Dropped unused import Signed-off-by: Vitaly Lavrukhin <[email protected]> Co-authored-by: Oleksii Kuchaiev <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Signed-off-by: Micha Livne <[email protected]> * get embedding for a single file (#2310) * get embedding for a single file Signed-off-by: nithinraok <[email protected]> * fixes Signed-off-by: nithinraok <[email protected]> * sr update Signed-off-by: nithinraok <[email protected]> * regain train mode Signed-off-by: nithinraok <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Update FastPitch (#2249) * wip Signed-off-by: Jason <[email protected]> * c1 Signed-off-by: Jason <[email protected]> * bug fixes Signed-off-by: Jason <[email protected]> * bug fixes Signed-off-by: Jason <[email protected]> * v2 Signed-off-by: Jason <[email protected]> * changes Signed-off-by: Jason <[email protected]> * add types, old model working Signed-off-by: Jason <[email protected]> * pitch Signed-off-by: Jason <[email protected]> * update Signed-off-by: Jason <[email protected]> * update Signed-off-by: Jason <[email protected]> * let it work Signed-off-by: Jason <[email protected]> * fixes Signed-off-by: Jason <[email protected]> * add oktai comments Signed-off-by: Jason <[email protected]> * debug Signed-off-by: Jason <[email protected]> * scale Signed-off-by: Jason <[email protected]> * wip Signed-off-by: Jason <[email protected]> * fix test for v1 Signed-off-by: Jason <[email protected]> * merge train and val Signed-off-by: Jason <[email protected]> * back to par bin att, add correct encoder settings 
Signed-off-by: Jason <[email protected]> * try Signed-off-by: Jason <[email protected]> * undo Signed-off-by: Jason <[email protected]> * lgtm: Signed-off-by: Jason <[email protected]> * style Signed-off-by: Jason <[email protected]> * default to ljs Signed-off-by: Jason <[email protected]> Signed-off-by: Micha Livne <[email protected]> * patch quantization (#2314) * update quantization Signed-off-by: slyned <[email protected]> * update quant infer trt Signed-off-by: slyned <[email protected]> * fix style Signed-off-by: slyned <[email protected]> Co-authored-by: slyned <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Pin OmegaConf version for 1.0.0 (#2316) * Update OmegaConf compatibility Signed-off-by: smajumdar <[email protected]> * Correct OmegaConf.pretty() Signed-off-by: smajumdar <[email protected]> * Upper bound omegaconf Signed-off-by: smajumdar <[email protected]> * Revert "Correct OmegaConf.pretty()" This reverts commit 6ebae2ef Signed-off-by: smajumdar <[email protected]> * Revert "Update OmegaConf compatibility" This reverts commit 83b2cf35a07a742552082e80e6ca34c9b8203cbc. 
Signed-off-by: smajumdar <[email protected]> Signed-off-by: Micha Livne <[email protected]> * [BUGFIX] OmegaConf forward compatibility (#2319) * Update OmegaConf compatibility Signed-off-by: smajumdar <[email protected]> Signed-off-by: ericharper <[email protected]> * Correct OmegaConf.pretty() Signed-off-by: smajumdar <[email protected]> Signed-off-by: ericharper <[email protected]> * upper bound omegaconf Signed-off-by: ericharper <[email protected]> * add if,else back Signed-off-by: ericharper <[email protected]> * typo Signed-off-by: ericharper <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Signed-off-by: Micha Livne <[email protected]> * bumping version to 1.0.1 Signed-off-by: Oleksii Kuchaiev <[email protected]> Signed-off-by: Micha Livne <[email protected]> * fix_cluster_small_sample (#2303) * fix_cluster_small_sample Signed-off-by: nithinraok <[email protected]> * for smaller samples Signed-off-by: nithinraok <[email protected]> * remove type Signed-off-by: nithinraok <[email protected]> * similarity matrix Signed-off-by: nithinraok <[email protected]> * est num of speakers add Signed-off-by: nithinraok <[email protected]> * comment update Signed-off-by: nithinraok <[email protected]> * style fix Signed-off-by: nithinraok <[email protected]> * MIN_SAMPLES passed through func arg Signed-off-by: nithinraok <[email protected]> * doc string update Signed-off-by: nithinraok <[email protected]> * spell mistake Signed-off-by: nithinraok <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Fastpitch export (#2300) * wip Signed-off-by: Jason <[email protected]> * c1 Signed-off-by: Jason <[email protected]> * bug fixes Signed-off-by: Jason <[email protected]> * bug fixes Signed-off-by: Jason <[email protected]> * v2 Signed-off-by: Jason <[email protected]> * changes Signed-off-by: Jason <[email protected]> * add types, old model working Signed-off-by: Jason <[email protected]> * pitch Signed-off-by: Jason <[email 
protected]> * update Signed-off-by: Jason <[email protected]> * update Signed-off-by: Jason <[email protected]> * let it work Signed-off-by: Jason <[email protected]> * fixes Signed-off-by: Jason <[email protected]> * add oktai comments Signed-off-by: Jason <[email protected]> * debug Signed-off-by: Jason <[email protected]> * scale Signed-off-by: Jason <[email protected]> * wip Signed-off-by: Jason <[email protected]> * fix test for v1 Signed-off-by: Jason <[email protected]> …

move concatdataset to common

5c2a20c

Signed-off-by: Abhinav Khattar <[email protected]>

aklife97 requested review from titu1994 and okuchaiev May 19, 2021 18:58

titu1994 approved these changes May 19, 2021


var name change

a00c298

Signed-off-by: Abhinav Khattar <[email protected]>
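In review, `self.N` was renamed to `self.length`. The sketch below is a rough illustration of how a descriptively named total-length counter and a round-robin index generator might fit together in a concatenated dataset. It is not NeMo's `ConcatDataset`. Only `kind`, `world_size`, `length`, `index_generator`, `round_robin_generator` and the error message come from the diff excerpts in this PR. The following are hypothetical: the class name `ConcatDatasetSketch`, the `sampling_technique` parameter, the `__iter__` logic, and the assumption that map-style datasets are split evenly across `world_size` ranks.

```python
from typing import Iterator, List, Sequence


class ConcatDatasetSketch:
    """Simplified, hypothetical stand-in for a concatenated dataset.

    Only `kind`, `world_size`, `length`, `index_generator` and the
    round-robin idea come from the PR diff; the rest is illustrative.
    """

    supported_sampling_techniques = ["round-robin"]  # assumption for this sketch

    def __init__(
        self,
        datasets: List[Sequence],
        kind: str = "map",
        sampling_technique: str = "round-robin",  # hypothetical parameter name
        world_size: int = 1,
    ):
        self.datasets = datasets
        self.kind = kind
        if sampling_technique == "round-robin":
            self.index_generator = ConcatDatasetSketch.round_robin_generator
        else:
            raise ValueError(
                f"Currently we only support sampling techniques in {self.supported_sampling_techniques}."
            )
        # Descriptive replacement for the old `self.N`: total samples this rank sees.
        self.length = 0
        for dataset in datasets:
            if self.kind == "map":
                # Map-style data is assumed to be split evenly across ranks.
                self.length += len(dataset) // world_size
            else:
                self.length += len(dataset)

    def __len__(self) -> int:
        return self.length

    @staticmethod
    def round_robin_generator(datasets: List[Sequence]) -> Iterator[int]:
        # Cycle through dataset indices forever: 0, 1, ..., n-1, 0, 1, ...
        while True:
            for i in range(len(datasets)):
                yield i

    def __iter__(self) -> Iterator:
        # Single-rank iteration only; no per-rank sharding is done here.
        iterators = [iter(d) for d in self.datasets]
        indices = self.index_generator(self.datasets)
        produced = 0
        while produced < self.length:
            idx = next(indices)
            try:
                yield next(iterators[idx])
                produced += 1
            except StopIteration:
                # Restart an exhausted dataset so it can be revisited on a later turn.
                iterators[idx] = iter(self.datasets[idx])
```

With two datasets of sizes 3 and 2 on a single rank, `length` is 5 and iteration alternates between the two sources. Naming the counter `length` makes its link to `__len__` self-explanatory, which was the point of the review comment.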

aklife97 merged commit 25014e0 into main May 19, 2021

aklife97 deleted the concatdataset branch May 20, 2021 15:57

karpnv pushed a commit to karpnv/NeMo that referenced this pull request May 21, 2021

Move ConcatDataset to common (NVIDIA#2237)

f5b93d9

* move concatdataset to common

Signed-off-by: Abhinav Khattar <[email protected]>

* var name change

Signed-off-by: Abhinav Khattar <[email protected]>
