Move ConcatDataset to common #2237
Signed-off-by: Abhinav Khattar <[email protected]>
Looks fantastic, just could use a bit of refactoring of self.N
    self.index_generator = ConcatDataset.round_robin_generator
else:
    raise ValueError(f"Currently we only support sampling techniques in {supported_sampling_techniques}.")
self.N = 0
What is this variable?
if self.kind == 'map':
    self.N += len(dataset) // world_size
else:
    self.N += len(dataset)
Can you give this variable a useful name rather than N?
Done, changed it to length
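For readers following the rename, here is a minimal sketch of the accumulation discussed above, with `length` in place of `N`. The standalone function `concat_length` and its signature are hypothetical and exist only for illustration; in the PR the logic lives inside `ConcatDataset`, whose full constructor is not shown in this thread.

```python
from typing import Sequence


def concat_length(datasets: Sequence[Sequence], kind: str, world_size: int) -> int:
    """Hypothetical helper mirroring the reviewed snippet.

    For map-style datasets each dataset's size is divided by world_size
    (integer division), as in the diff; otherwise the full size is counted.
    """
    length = 0  # descriptive replacement for the original `N`
    for dataset in datasets:
        if kind == 'map':
            length += len(dataset) // world_size
        else:
            length += len(dataset)
    return length


# Two toy datasets of 10 and 7 samples, split across 2 workers.
print(concat_length([range(10), range(7)], kind='map', world_size=2))       # 10 // 2 + 7 // 2 = 8
print(concat_length([range(10), range(7)], kind='iterable', world_size=2))  # 10 + 7 = 17
```

Note that the integer division drops any remainder per dataset, so the map-style total can be slightly smaller than the combined size divided by `world_size`.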
Signed-off-by: Abhinav Khattar <[email protected]>
* move concatdataset to common
  Signed-off-by: Abhinav Khattar <[email protected]>
* var name change
  Signed-off-by: Abhinav Khattar <[email protected]>
* move concatdataset to common
  Signed-off-by: Abhinav Khattar <[email protected]>
* var name change
  Signed-off-by: Abhinav Khattar <[email protected]>
  Signed-off-by: Micha Livne <[email protected]>
* Itn add classes (#2141) * move do_training flag to config Signed-off-by: Yang Zhang <[email protected]> * added telephone to itn Signed-off-by: Yang Zhang <[email protected]> * add telephone and email to itn Signed-off-by: Yang Zhang <[email protected]> Signed-off-by: Micha Livne <[email protected]> * ASR + NLP Doc Fixes (#2136) * Preserve the tokenizer config for ASR Signed-off-by: smajumdar <[email protected]> * Correct nlp docs Signed-off-by: smajumdar <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Removing graphsurgeon optional dependency, improving import error rep… (#2144) * Removing graphsurgeon optional dependency, improving import error reporting Signed-off-by: Boris Fomitchev <[email protected]> * Fixing scope error Signed-off-by: Boris Fomitchev <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Fix FilterbankFeatures eval nondeterminism. (#2146) Signed-off-by: PiotrDabkowski <[email protected]> Signed-off-by: Micha Livne <[email protected]> * fix the docs. 
(#2148) Signed-off-by: Micha Livne <[email protected]> * Text processing refactor (#2149) * removed graphutils, suppletive, data_loader_utils from itn to be reused from tn Signed-off-by: Yang Zhang <[email protected]> * inheriting itn from tn, thus removing redundancy Signed-off-by: Yang Zhang <[email protected]> * cleaned whitelist Signed-off-by: Yang Zhang <[email protected]> * lgtm fix Signed-off-by: Yang Zhang <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Update how artifacts work (#2138) * Update how artifacts work Signed-off-by: Oleksii Kuchaiev <[email protected]> * fixing some tests Signed-off-by: Oleksii Kuchaiev <[email protected]> * fix more tests Signed-off-by: Oleksii Kuchaiev <[email protected]> * add __init__ to tests to make them discoverable Signed-off-by: Oleksii Kuchaiev <[email protected]> * empty src support Signed-off-by: Oleksii Kuchaiev <[email protected]> * updates plust unittest Signed-off-by: Oleksii Kuchaiev <[email protected]> * add copyright check Signed-off-by: Oleksii Kuchaiev <[email protected]> * copyright header Signed-off-by: Oleksii Kuchaiev <[email protected]> * fix style Signed-off-by: Oleksii Kuchaiev <[email protected]> * handle hashed megatron checkpoint version in nlp restore_from Signed-off-by: ericharper <[email protected]> * add _MODEL_RESTORE_PATH to AppState Signed-off-by: ericharper <[email protected]> * get rid of global folder caching Signed-off-by: Oleksii Kuchaiev <[email protected]> * double register - warning instead of exception Signed-off-by: Oleksii Kuchaiev <[email protected]> * Add asr spe tests Signed-off-by: smajumdar <[email protected]> * Pop out asr wpe pre-registered value Signed-off-by: smajumdar <[email protected]> * Correct ASR tests and paths Signed-off-by: smajumdar <[email protected]> * Correct tokenizer saving Signed-off-by: smajumdar <[email protected]> * Correct ASR tests Signed-off-by: smajumdar <[email protected]> * Correct ASR bpe mixin Signed-off-by: smajumdar 
<[email protected]> * Patch up backward compatibility Signed-off-by: smajumdar <[email protected]> * update register_bert_model Signed-off-by: ericharper <[email protected]> * update all get_lm_model calls Signed-off-by: ericharper <[email protected]> * return None if src not found Signed-off-by: ericharper <[email protected]> * handle case with no tokenizer Signed-off-by: ericharper <[email protected]> * do not add another hash is using tarfile_artifacts Signed-off-by: ericharper <[email protected]> * add return_none flag, update doc string Signed-off-by: ericharper <[email protected]> * update default behavior of register_artifact for NLPModel Signed-off-by: ericharper <[email protected]> * change kwarg name to verify_src_exists Signed-off-by: ericharper <[email protected]> * use cfg instead of _cfg Signed-off-by: Oleksii Kuchaiev <[email protected]> * some cleanups Signed-off-by: Oleksii Kuchaiev <[email protected]> Co-authored-by: ericharper <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Language model refactoring (#2120) * fixed branch in IR tutorial Signed-off-by: AlexGrinch <[email protected]> * bucketing tarred dataset for lm training Signed-off-by: AlexGrinch <[email protected]> * updated global rank Signed-off-by: AlexGrinch <[email protected]> * perplexity update Signed-off-by: AlexGrinch <[email protected]> * refactor lm to be campatible with latest nmt Signed-off-by: AlexGrinch <[email protected]> * perplexity change Signed-off-by: AlexGrinch <[email protected]> * removed obsolete config Signed-off-by: AlexGrinch <[email protected]> * added sequence perplexity Signed-off-by: AlexGrinch <[email protected]> * added non-smoothed CE loss for validation Signed-off-by: AlexGrinch <[email protected]> * unified sentence dataset, torchmetrics for sequence perplexity Signed-off-by: AlexGrinch <[email protected]> * translate_ddp refactor Signed-off-by: AlexGrinch <[email protected]> 
Co-authored-by: Oleksii Kuchaiev <[email protected]> Co-authored-by: Sandeep Subramanian <[email protected]> Signed-off-by: Micha Livne <[email protected]> * [NMT] Multi-validation Patch (#2150) * rename dl index 0 loss and sacrebleu for backwards compatibility Signed-off-by: ericharper <[email protected]> * eval -> val/tst Signed-off-by: ericharper <[email protected]> * instantiate torchmetrics after instantiating dataloaders Signed-off-by: ericharper <[email protected]> * bug Signed-off-by: ericharper <[email protected]> * remove debugging log Signed-off-by: ericharper <[email protected]> * remove debugging log Signed-off-by: ericharper <[email protected]> Co-authored-by: Oleksii Kuchaiev <[email protected]> Signed-off-by: Micha Livne <[email protected]> * bumping version to 1.0.0 Signed-off-by: Oleksii Kuchaiev <[email protected]> Signed-off-by: Micha Livne <[email protected]> * fixed the num_samples of text classification model. (#2152) Signed-off-by: Vahid <[email protected]> Signed-off-by: Micha Livne <[email protected]> * fix for electronic (#2153) * fix for electronic Signed-off-by: ekmb <[email protected]> * special symbols added Signed-off-by: ekmb <[email protected]> * restrict symbols list Signed-off-by: ekmb <[email protected]> Signed-off-by: Micha Livne <[email protected]> * FastSpeech 2 Test & Docs (#2143) * Add FS2 data loading test Signed-off-by: Jocelyn Huang <[email protected]> * TTS docs update for FastSpeech 2 Signed-off-by: Jocelyn Huang <[email protected]> * Style fix for FS2 dataset test Signed-off-by: Jocelyn Huang <[email protected]> * Fix transpose typo Signed-off-by: Jocelyn Huang <[email protected]> Co-authored-by: Oleksii Kuchaiev <[email protected]> Co-authored-by: Jason <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Minor patch for translate_ddp (#2155) * Patch for backtranslation in lm dataset Signed-off-by: MaximumEntropy <[email protected]> * One more fix Signed-off-by: MaximumEntropy <[email protected]> 
Signed-off-by: Micha Livne <[email protected]> * Entity linking (#2050) * Started adding SAP dataset Signed-off-by: Virginia Adams <[email protected]> * Delete .lm_bert_dataset.py.swp Signed-off-by: Virginia Adams <[email protected]> * Added dataset and loss Signed-off-by: Virginia Adams <[email protected]> * Added entity linking encoder model Signed-off-by: Virginia Adams <[email protected]> * Can build and use index from pubmedbert model Signed-off-by: Virginia Adams <[email protected]> * checked boolean logic in build_index.py Signed-off-by: Virginia Adams <[email protected]> * End to end tested all functionality Signed-off-by: Virginia Adams <[email protected]> * fixed val loss none at end of validation Signed-off-by: Virginia Adams <[email protected]> * Started adding demo entity linking notebook Signed-off-by: Virginia Adams <[email protected]> * adding in notebook demo Signed-off-by: Virginia Adams <[email protected]> * added call to entitylinking classes in __init__.py files Signed-off-by: Virginia Adams <[email protected]> * Added eval code to notebook Signed-off-by: Virginia Adams <[email protected]> * Adding unfinished notebook Signed-off-by: Virginia Adams <[email protected]> * Cleaned up example dir Signed-off-by: Virginia Adams <[email protected]> * Fixed recap commands Signed-off-by: Virginia Adams <[email protected]> * added model typing and tiny data tar Signed-off-by: Virginia Adams <[email protected]> * Adding tiny data zip Signed-off-by: Virginia Adams <[email protected]> * updated tiny example config data path Signed-off-by: Virginia Adams <[email protected]> * Notebook demo works Signed-off-by: Virginia Adams <[email protected]> * Changed training epochs Signed-off-by: Virginia Adams <[email protected]> * Removed output from training and install cells Signed-off-by: Virginia Adams <[email protected]> * changed code formatting Signed-off-by: Virginia Adams <[email protected]> * Started doc string for new functions Signed-off-by: Virginia Adams 
<[email protected]> * Updated data_preprocessing to save to data_dir Signed-off-by: Virginia Adams <[email protected]> * fixed comment in notebook demo Signed-off-by: Virginia Adams <[email protected]> * Update data_preprocessing.py Signed-off-by: Virginia Adams <[email protected]> * updated nemo typing imports Signed-off-by: Virginia Adams <[email protected]> * about to rebase Signed-off-by: Virginia Adams <[email protected]> * added back umls_dataset_processing.py Signed-off-by: Virginia Adams <[email protected]> * Removed example data Signed-off-by: Virginia Adams <[email protected]> * Fixed typos in notebook demo Signed-off-by: Virginia Adams <[email protected]> * fixed lgtm-com issues Signed-off-by: Virginia Adams <[email protected]> * added copyright headers Signed-off-by: Virginia Adams <[email protected]> * fixed import and copyright headers Signed-off-by: Virginia Adams <[email protected]> * Fixed formatting Signed-off-by: Virginia Adams <[email protected]> * Fixed formatting changes 2 Signed-off-by: Virginia Adams <[email protected]> * fixed test formatting Signed-off-by: Virginia Adams <[email protected]> * Added __init__.py for model and dataset Signed-off-by: Virginia Adams <[email protected]> * loading newline file returns data_dir now Signed-off-by: Virginia Adams <[email protected]> * Removed conf notebook and deleted comment Signed-off-by: Virginia Adams <[email protected]> * Added jenkins test Signed-off-by: Virginia Adams <[email protected]> * Updated Jenkins test Signed-off-by: Virginia Adams <[email protected]> * fixed file path Signed-off-by: Virginia Adams <[email protected]> * Changed Jenkins pipeline order Signed-off-by: Virginia Adams <[email protected]> * Fixed Jenkins datapath... again... 
Signed-off-by: Virginia Adams <[email protected]> * Made most review changes Signed-off-by: Virginia Adams <[email protected]> * fixed copy right Signed-off-by: Virginia Adams <[email protected]> * updated unit test to wget config Signed-off-by: Virginia Adams <[email protected]> * reverted test file back Signed-off-by: Virginia Adams <[email protected]> * Added project dir to jenkins test Signed-off-by: Virginia Adams <[email protected]> * defined config in unit test Signed-off-by: Virginia Adams <[email protected]> Co-authored-by: Sandeep Subramanian <[email protected]> Co-authored-by: Oleksii Kuchaiev <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Correct branch version for v1.0.0 (#2157) * Correct branch version Signed-off-by: smajumdar <[email protected]> * Correct Jenkinsfile Signed-off-by: smajumdar <[email protected]> * Update rst files Signed-off-by: smajumdar <[email protected]> Signed-off-by: Micha Livne <[email protected]> * switch CI back to main Signed-off-by: Oleksii Kuchaiev <[email protected]> Signed-off-by: Micha Livne <[email protected]> * fixed the docs. 
(#2156) Signed-off-by: Micha Livne <[email protected]> * Make Hifigan jittable (#2159) * FastSpeech 2 Test & Docs (#2143) * Add FS2 data loading test Signed-off-by: Jocelyn Huang <[email protected]> * TTS docs update for FastSpeech 2 Signed-off-by: Jocelyn Huang <[email protected]> * Style fix for FS2 dataset test Signed-off-by: Jocelyn Huang <[email protected]> * Fix transpose typo Signed-off-by: Jocelyn Huang <[email protected]> Co-authored-by: Oleksii Kuchaiev <[email protected]> Co-authored-by: Jason <[email protected]> * Entity linking (#2050) * Started adding SAP dataset Signed-off-by: Virginia Adams <[email protected]> * Delete .lm_bert_dataset.py.swp Signed-off-by: Virginia Adams <[email protected]> * Added dataset and loss Signed-off-by: Virginia Adams <[email protected]> * Added entity linking encoder model Signed-off-by: Virginia Adams <[email protected]> * Can build and use index from pubmedbert model Signed-off-by: Virginia Adams <[email protected]> * checked boolean logic in build_index.py Signed-off-by: Virginia Adams <[email protected]> * End to end tested all functionality Signed-off-by: Virginia Adams <[email protected]> * fixed val loss none at end of validation Signed-off-by: Virginia Adams <[email protected]> * Started adding demo entity linking notebook Signed-off-by: Virginia Adams <[email protected]> * adding in notebook demo Signed-off-by: Virginia Adams <[email protected]> * added call to entitylinking classes in __init__.py files Signed-off-by: Virginia Adams <[email protected]> * Added eval code to notebook Signed-off-by: Virginia Adams <[email protected]> * Adding unfinished notebook Signed-off-by: Virginia Adams <[email protected]> * Cleaned up example dir Signed-off-by: Virginia Adams <[email protected]> * Fixed recap commands Signed-off-by: Virginia Adams <[email protected]> * added model typing and tiny data tar Signed-off-by: Virginia Adams <[email protected]> * Adding tiny data zip Signed-off-by: Virginia Adams <[email protected]> 
* updated tiny example config data path Signed-off-by: Virginia Adams <[email protected]> * Notebook demo works Signed-off-by: Virginia Adams <[email protected]> * Changed training epochs Signed-off-by: Virginia Adams <[email protected]> * Removed output from training and install cells Signed-off-by: Virginia Adams <[email protected]> * changed code formatting Signed-off-by: Virginia Adams <[email protected]> * Started doc string for new functions Signed-off-by: Virginia Adams <[email protected]> * Updated data_preprocessing to save to data_dir Signed-off-by: Virginia Adams <[email protected]> * fixed comment in notebook demo Signed-off-by: Virginia Adams <[email protected]> * Update data_preprocessing.py Signed-off-by: Virginia Adams <[email protected]> * updated nemo typing imports Signed-off-by: Virginia Adams <[email protected]> * about to rebase Signed-off-by: Virginia Adams <[email protected]> * added back umls_dataset_processing.py Signed-off-by: Virginia Adams <[email protected]> * Removed example data Signed-off-by: Virginia Adams <[email protected]> * Fixed typos in notebook demo Signed-off-by: Virginia Adams <[email protected]> * fixed lgtm-com issues Signed-off-by: Virginia Adams <[email protected]> * added copyright headers Signed-off-by: Virginia Adams <[email protected]> * fixed import and copyright headers Signed-off-by: Virginia Adams <[email protected]> * Fixed formatting Signed-off-by: Virginia Adams <[email protected]> * Fixed formatting changes 2 Signed-off-by: Virginia Adams <[email protected]> * fixed test formatting Signed-off-by: Virginia Adams <[email protected]> * Added __init__.py for model and dataset Signed-off-by: Virginia Adams <[email protected]> * loading newline file returns data_dir now Signed-off-by: Virginia Adams <[email protected]> * Removed conf notebook and deleted comment Signed-off-by: Virginia Adams <[email protected]> * Added jenkins test Signed-off-by: Virginia Adams <[email protected]> * Updated Jenkins test 
Signed-off-by: Virginia Adams <[email protected]> * fixed file path Signed-off-by: Virginia Adams <[email protected]> * Changed Jenkins pipeline order Signed-off-by: Virginia Adams <[email protected]> * Fixed Jenkins datapath... again... Signed-off-by: Virginia Adams <[email protected]> * Made most review changes Signed-off-by: Virginia Adams <[email protected]> * fixed copy right Signed-off-by: Virginia Adams <[email protected]> * updated unit test to wget config Signed-off-by: Virginia Adams <[email protected]> * reverted test file back Signed-off-by: Virginia Adams <[email protected]> * Added project dir to jenkins test Signed-off-by: Virginia Adams <[email protected]> * defined config in unit test Signed-off-by: Virginia Adams <[email protected]> Co-authored-by: Sandeep Subramanian <[email protected]> Co-authored-by: Oleksii Kuchaiev <[email protected]> * switch CI back to main Signed-off-by: Oleksii Kuchaiev <[email protected]> * Make Hifigan jittable Signed-off-by: Ryan Leary <[email protected]> * Remove vestigial debugging printout Signed-off-by: Ryan Leary <[email protected]> * Add export forward and fix style Signed-off-by: Ryan Leary <[email protected]> * Fix load_state_dict override for arbitrary layers Signed-off-by: Ryan Leary <[email protected]> Co-authored-by: Jocelyn <[email protected]> Co-authored-by: Oleksii Kuchaiev <[email protected]> Co-authored-by: Jason <[email protected]> Co-authored-by: vadam5 <[email protected]> Co-authored-by: Sandeep Subramanian <[email protected]> Co-authored-by: Oleksii Kuchaiev <[email protected]> Co-authored-by: Ryan Leary <[email protected]> Signed-off-by: Micha Livne <[email protected]> * fix version (#2162) Signed-off-by: Jason <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Megatron nb size reduced (#2163) * notebook size reduced Signed-off-by: ekmb <[email protected]> * notebook size reduced Signed-off-by: ekmb <[email protected]> Signed-off-by: Micha Livne <[email protected]> * update 
spectral clustering method (#2158) * update spectral clustering method Signed-off-by: nithinraok <[email protected]> * update Jenkins File Signed-off-by: nithinraok <[email protected]> * threshold fix by reducing window length for shorter embs Signed-off-by: nithinraok <[email protected]> * grammar fixes Signed-off-by: nithinraok <[email protected]> * CR update Signed-off-by: nithinraok <[email protected]> * paper reference Signed-off-by: nithinraok <[email protected]> * improve docstring for yaml Signed-off-by: nithinraok <[email protected]> * Doc fixes Signed-off-by: nithinraok <[email protected]> Signed-off-by: Micha Livne <[email protected]> * revert (#2167) Signed-off-by: Jason <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Limit Pytorch lightning release (#2170) Signed-off-by: smajumdar <[email protected]> Signed-off-by: Micha Livne <[email protected]> * token classification models artifacts update (#2169) * artifacts update Signed-off-by: ekmb <[email protected]> * artifacts update Signed-off-by: ekmb <[email protected]> * jenkins Signed-off-by: ekmb <[email protected]> * fix for model restoration Signed-off-by: ekmb <[email protected]> * typos fix + jenkins dir update Signed-off-by: ekmb <[email protected]> * typo Signed-off-by: ericharper <[email protected]> * update jenkins branch Signed-off-by: ericharper <[email protected]> * add && Signed-off-by: ericharper <[email protected]> * jenkins Signed-off-by: ekmb <[email protected]> * jenkins disable Signed-off-by: ekmb <[email protected]> * revert jenkins Signed-off-by: ekmb <[email protected]> * jenkins disable Signed-off-by: ekmb <[email protected]> * revert jenkins Signed-off-by: ekmb <[email protected]> Co-authored-by: ericharper <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Fix to always_save_nemo (#2174) * Initial attempt at always_save_nemo fix Signed-off-by: MaximumEntropy <[email protected]> * updated path before saving in exp manager, fixed bug when 
handling tarfile artifacts Signed-off-by: ericharper <[email protected]> * Add test with always_save_nemo to exp_manager Signed-off-by: MaximumEntropy <[email protected]> * Style fixes Signed-off-by: MaximumEntropy <[email protected]> * update jenkins branch Signed-off-by: ericharper <[email protected]> * check for nemo: Signed-off-by: ericharper <[email protected]> * check for nemo: Signed-off-by: ericharper <[email protected]> * check for nemo: Signed-off-by: ericharper <[email protected]> Co-authored-by: ericharper <[email protected]> Co-authored-by: Jason <[email protected]> Signed-off-by: Micha Livne <[email protected]> * fix typo (#2179) Signed-off-by: Yang Zhang <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Make itn tests optional (#2173) * Limit Pytorch lightning release Signed-off-by: smajumdar <[email protected]> * Add final two checks Signed-off-by: smajumdar <[email protected]> Co-authored-by: Jason <[email protected]> Signed-off-by: Micha Livne <[email protected]> * First Revision of TTS Docs and Notebooks Update for 1.0 (#2166) * squash Signed-off-by: Jason <[email protected]> * notebook fixes Signed-off-by: Jason <[email protected]> * notebook fixes Signed-off-by: Jason <[email protected]> * typos Signed-off-by: Jason <[email protected]> Signed-off-by: Micha Livne <[email protected]> * add more alternatives of 0 for telephone (#2171) Signed-off-by: Yang Zhang <[email protected]> Co-authored-by: Oleksii Kuchaiev <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Acc tn (#2180) * make tn cardinal faster Signed-off-by: Yang Zhang <[email protected]> * add number far Signed-off-by: Yang Zhang <[email protected]> * add test Signed-off-by: Yang Zhang <[email protected]> * fix lgtm Signed-off-by: Yang Zhang <[email protected]> * fix lgtm Signed-off-by: Yang Zhang <[email protected]> Signed-off-by: Micha Livne <[email protected]> * [DOCS] NLP Model parallel, NMT multi-val, CORE register artifacts (#2168) * update docs 
Signed-off-by: ericharper <[email protected]> * update docs Signed-off-by: ericharper <[email protected]> * update docs Signed-off-by: ericharper <[email protected]> * update docs Signed-off-by: ericharper <[email protected]> * update docs Signed-off-by: ericharper <[email protected]> * update docs Signed-off-by: ericharper <[email protected]> * update docs Signed-off-by: ericharper <[email protected]> * update docs Signed-off-by: ericharper <[email protected]> * update docs Signed-off-by: ericharper <[email protected]> * update docs Signed-off-by: ericharper <[email protected]> * update docs Signed-off-by: ericharper <[email protected]> * update docs Signed-off-by: ericharper <[email protected]> Co-authored-by: Sandeep Subramanian <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Change label smoothing prob to reduce chance of test failure (#2184) Signed-off-by: MaximumEntropy <[email protected]> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Add FS2 checkpoint links to docs and inference notebook (#2181) * Add FS2 checkpoint links to docs and inference notebook Signed-off-by: Jocelyn Huang <[email protected]> * Remove empty cell from TTS notebook Signed-off-by: Jocelyn Huang <[email protected]> Co-authored-by: Oleksii Kuchaiev <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Update ptl to 1.3 on main branch (#2178) * Update PTL Signed-off-by: smajumdar <[email protected]> * Begin update to Pytorch Lightning 1.3.x Signed-off-by: smajumdar <[email protected]> * Formatting Signed-off-by: smajumdar <[email protected]> * style Signed-off-by: ericharper <[email protected]> * Formatting Signed-off-by: smajumdar <[email protected]> * minor fix Signed-off-by: Jason <[email protected]> * minor fix Signed-off-by: Jason <[email protected]> * get testing attribute from trainer Signed-off-by: ericharper <[email protected]> * update init_ddp_connection override Signed-off-by: ericharper 
<[email protected]> * update attribute Signed-off-by: ericharper <[email protected]> * add barrier after load checkpoint in megatron Signed-off-by: ericharper <[email protected]> * remove barrier Signed-off-by: ericharper <[email protected]> * update last naming Signed-off-by: Jason <[email protected]> Co-authored-by: ericharper <[email protected]> Co-authored-by: Jason <[email protected]> Signed-off-by: Micha Livne <[email protected]> * SDE updates (#2187) * Added updates to SDE: - support for external vocabulary (to detect OOV words) - support for offset field (for segmented long recordings) - UI improvements Signed-off-by: Vitaly Lavrukhin <[email protected]> * Refactored diff in SDE Signed-off-by: Vitaly Lavrukhin <[email protected]> Co-authored-by: Oleksii Kuchaiev <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Add TTS aligner and improved version of g2p for vocabs.Phonemes, small improvement in TalkNet (#2189) * add first version of aligner Signed-off-by: Oktai Tatanov <[email protected]> * aligner docs, new g2p version, fix bugs in talknet Signed-off-by: Oktai Tatanov <[email protected]> * update docs and remove lj related code Signed-off-by: Oktai Tatanov <[email protected]> * fix style Signed-off-by: Oktai Tatanov <[email protected]> * fix import Signed-off-by: Oktai Tatanov <[email protected]> Co-authored-by: Oleksii Kuchaiev <[email protected]> Signed-off-by: Micha Livne <[email protected]> * set the default of nodessplitter to None. 
(#2190) Signed-off-by: Vahid <[email protected]> Signed-off-by: Micha Livne <[email protected]> * NMT fixes (#2194) * minor fixes Signed-off-by: Oleksii Kuchaiev <[email protected]> * minor bugfixes Signed-off-by: Oleksii Kuchaiev <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Store mappings file in .nemo for FS2 model (#2196) * Store mappings file in .nemo for FS2 model Signed-off-by: Jocelyn Huang <[email protected]> * Add error enforcing mappings file during training (FS2) Signed-off-by: Jocelyn Huang <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Add support to change the SE context window of ConvASREncoder (#2193) * Add support for changing context window on the fly Signed-off-by: smajumdar <[email protected]> * Add support to change the SE context window of ConvASREncoder Signed-off-by: smajumdar <[email protected]> * Add ability to skip config updating Signed-off-by: smajumdar <[email protected]> * Switch to mixin based API Signed-off-by: smajumdar <[email protected]> * Update docs and api for ASRModuleMixin Signed-off-by: smajumdar <[email protected]> * Change print to logging.info Signed-off-by: smajumdar <[email protected]> * Correct stride level when computing context window Signed-off-by: smajumdar <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Add a CI test for doing inference with an NMT model trained with Pre-LN (#2198) * Change label smoothing prob to reduce chance of test failure Signed-off-by: MaximumEntropy <[email protected]> * Add Pre-LN inference test to Jenkinsfile Signed-off-by: MaximumEntropy <[email protected]> * Separate tests for training and NMT inference Signed-off-by: MaximumEntropy <[email protected]> Co-authored-by: Oleksii Kuchaiev <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Fix ipywidgets error in asr notebook (#2199) Added `ipywidgets` to avoid `ImportError: IProgress not found. Please update jupyter and ipywidgets. 
See https://ipywidgets.readthedocs.io/en/stable/user_install.html` error. Signed-off-by: Derek Chia <[email protected]> Signed-off-by: Micha Livne <[email protected]> * metrics fix (#2202) * metrics fix Signed-off-by: ekmb <[email protected]> * metrics reset for punct model Signed-off-by: ekmb <[email protected]> Signed-off-by: Micha Livne <[email protected]> * readme and minor improvements (#2203) * readme and minor improvements Signed-off-by: nithinraok <[email protected]> * vad threshold update Signed-off-by: nithinraok <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Fix text processing docs (#2195) * fix text processing docs Signed-off-by: Yang Zhang <[email protected]> * fix name Signed-off-by: Yang Zhang <[email protected]> * add guard to pynini import Signed-off-by: Yang Zhang <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Fix bug in SpecCutout (#2201) Signed-off-by: Robert Bracco <[email protected]> Co-authored-by: Oleksii Kuchaiev <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Fix bug in SpecCutout (#2201) (#2205) Signed-off-by: Robert Bracco <[email protected]> Co-authored-by: Oleksii Kuchaiev <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Signed-off-by: smajumdar <[email protected]> Co-authored-by: Robert Bracco <[email protected]> Co-authored-by: Oleksii Kuchaiev <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Set seed before generating random tensors in NMT test (#2206) * Change label smoothing prob to reduce chance of test failure Signed-off-by: MaximumEntropy <[email protected]> * Set seed before generating tensors Signed-off-by: MaximumEntropy <[email protected]> Signed-off-by: Micha Livne <[email protected]> * ASR patches for v1.0.0 (#2207) * Multiple updates to RNNT add initialization Signed-off-by: smajumdar <[email protected]> * Correct name of initilization Signed-off-by: 
smajumdar <[email protected]> * Update dockerignore Signed-off-by: smajumdar <[email protected]> * Fix RNNT WER calculation Signed-off-by: smajumdar <[email protected]> * Address comments Signed-off-by: smajumdar <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Multilingual training for NMT (#2160) * mnmt on fresh main Signed-off-by: Abhinav Khattar <[email protected]> * push for test Signed-off-by: Abhinav Khattar <[email protected]> * debug Signed-off-by: Abhinav Khattar <[email protected]> * check Signed-off-by: Abhinav Khattar <[email protected]> * cleanup Signed-off-by: Abhinav Khattar <[email protected]> * minor fix Signed-off-by: Abhinav Khattar <[email protected]> * more minor fixes Signed-off-by: Abhinav Khattar <[email protected]> * fix for test Signed-off-by: Abhinav Khattar <[email protected]> * fix list size error Signed-off-by: Abhinav Khattar <[email protected]> * multilingual in infer Signed-off-by: Abhinav Khattar <[email protected]> * changes Signed-off-by: Abhinav Khattar <[email protected]> * tar creation with multilingual Signed-off-by: Abhinav Khattar <[email protected]> * fix Signed-off-by: Abhinav Khattar <[email protected]> * changes + parallelism + bug fix Signed-off-by: Abhinav Khattar <[email protected]> * small fix Signed-off-by: Abhinav Khattar <[email protected]> * multilingual preprocessor fix Signed-off-by: Abhinav Khattar <[email protected]> * globally unique fragment names in tarred dataset Signed-off-by: Abhinav Khattar <[email protected]> * minor changes Signed-off-by: Abhinav Khattar <[email protected]> * rm load_from_cached_dataset Signed-off-by: Abhinav Khattar <[email protected]> * minor config change Signed-off-by: Abhinav Khattar <[email protected]> * rm unsued import Signed-off-by: Abhinav Khattar <[email protected]> Co-authored-by: Oleksii Kuchaiev <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Remove memory leak from ASR notebook + update model notebook (#2213) * ASR patches for 
v1.0.0 (#2207) * Multiple updates to RNNT add initialization Signed-off-by: smajumdar <[email protected]> * Correct name of initilization Signed-off-by: smajumdar <[email protected]> * Update dockerignore Signed-off-by: smajumdar <[email protected]> * Fix RNNT WER calculation Signed-off-by: smajumdar <[email protected]> * Address comments Signed-off-by: smajumdar <[email protected]> * Correct model notebook to log the loss and correctly assign keys Signed-off-by: smajumdar <[email protected]> Signed-off-by: Micha Livne <[email protected]> * replace names in vad tutorials (#2220) Signed-off-by: fayejf <[email protected]> Signed-off-by: Micha Livne <[email protected]> * fix the versioning name. (#2209) * fix the versioning name. Signed-off-by: Vahid <[email protected]> * Made version None. Signed-off-by: Vahid <[email protected]> Co-authored-by: Oleksii Kuchaiev <[email protected]> Co-authored-by: Jason <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Enabled passing kwargs to export() (#2175) * Enabled passing kwargs to export() Signed-off-by: Boris Fomitchev <[email protected]> * Fixing style; changed Classifier input_example to new extended syntax Signed-off-by: Boris Fomitchev <[email protected]> * Fixed order of forward() call in export Signed-off-by: Boris Fomitchev <[email protected]> * Fixing style Signed-off-by: Boris Fomitchev <[email protected]> Co-authored-by: Oleksii Kuchaiev <[email protected]> Signed-off-by: Micha Livne <[email protected]> * update g2p: ambigious ignore, flag for skipping seq2seq (#2223) Signed-off-by: Oktai Tatanov <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Update TTS notebook with TalkNet inference (#2133) * Update TTS notebook with TalkNet inference. Signed-off-by: Stanislav Beliaev <[email protected]> * Update TTS Notebook. Signed-off-by: Stanislav Beliaev <[email protected]> * Update TTS TN Training Notebook. Signed-off-by: Stanislav Beliaev <[email protected]> * Fix TN paper link. 
Signed-off-by: Stanislav Beliaev <[email protected]> * Remove branch updaing TODOs. Signed-off-by: Stanislav Beliaev <[email protected]> Signed-off-by: Micha Livne <[email protected]> * update speaker notebooks (#2224) Signed-off-by: nithinraok <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Support symlinked files (#2216) Signed-off-by: Anas Abou Allaban <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Set strict=True everywhere by default. (#2225) Signed-off-by: Oleksii Kuchaiev <[email protected]> Signed-off-by: Micha Livne <[email protected]> * set strict=True in nlp_model (#2227) Signed-off-by: Oleksii Kuchaiev <[email protected]> Signed-off-by: Micha Livne <[email protected]> * set strict=False for model parallel examples Signed-off-by: Oleksii Kuchaiev <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Make Text processing installation optional via reinstall.sh (#2226) * Make Text processing installation optional via reinstall.sh Signed-off-by: smajumdar <[email protected]> * Support both success and failure states Signed-off-by: smajumdar <[email protected]> Co-authored-by: Oleksii Kuchaiev <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Transformer final norm preln (#2197) * fix pre_ln final norm Signed-off-by: fayejf <[email protected]> * style fix Signed-off-by: fayejf <[email protected]> * bug fixed Signed-off-by: fayejf <[email protected]> * bugfix post_ln Signed-off-by: fayejf <[email protected]> * update and add pre_ln_final_norm Signed-off-by: fayejf <[email protected]> * style fix Signed-off-by: fayejf <[email protected]> * fix for unit test Signed-off-by: fayejf <[email protected]> * rename final_norm to final_layer_norm Signed-off-by: fayejf <[email protected]> * bug fix Signed-off-by: fayejf <[email protected]> * tiny fix Signed-off-by: fayejf <[email protected]> * fix and improve Signed-off-by: fayejf <[email 
protected]> * tiny fix Signed-off-by: fayejf <[email protected]> * Patch for NMT to allow loading old modlels trained with pre-LN Signed-off-by: MaximumEntropy <[email protected]> Co-authored-by: Sandeep Subramanian <[email protected]> Co-authored-by: Oleksii Kuchaiev <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Update models and notebook for 1.0 (#2211) * update models Signed-off-by: Jason <[email protected]> * updates Signed-off-by: Jason <[email protected]> * fix Signed-off-by: Jason <[email protected]> * add links Signed-off-by: Jason <[email protected]> * fixes Signed-off-by: Jason <[email protected]> * style Signed-off-by: Jason <[email protected]> * update checkpoints Signed-off-by: Jason <[email protected]> * rename Signed-off-by: Jason <[email protected]> * fixes Signed-off-by: Jason <[email protected]> * lgtm Signed-off-by: Jason <[email protected]> * fix loading waveglow Signed-off-by: Jason <[email protected]> * typo Signed-off-by: Jason <[email protected]> Co-authored-by: Oleksii Kuchaiev <[email protected]> Signed-off-by: Micha Livne <[email protected]> * update_metrics_classification_models (#2228) Signed-off-by: nithinraok <[email protected]> Co-authored-by: fayejf <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Data loader for seq of label model (#2084) * feature to seq label data loader Signed-off-by: fayejf <[email protected]> * style fix Signed-off-by: fayejf <[email protected]> * small fix Signed-off-by: fayejf <[email protected]> * update tl to be length of seq label Signed-off-by: fayejf <[email protected]> * style fix Signed-off-by: fayejf <[email protected]> * tiny bug fix Signed-off-by: fayejf <[email protected]> * small updates Signed-off-by: fayejf <[email protected]> * updates for review feedback Signed-off-by: fayejf <[email protected]> * style fix Signed-off-by: fayejf <[email protected]> * explain seq_label Signed-off-by: fayejf <[email protected]> * fix lgtm Signed-off-by: fayejf <[email 
protected]> * small updates Signed-off-by: fayejf <[email protected]> * improve as discussed Signed-off-by: fayejf <[email protected]> * add docstring Signed-off-by: fayejf <[email protected]> Co-authored-by: Nithin Rao <[email protected]> Signed-off-by: Micha Livne <[email protected]> * fix comments (#2236) * fix comments Signed-off-by: Yang Zhang <[email protected]> * fix typo Signed-off-by: Yang Zhang <[email protected]> Signed-off-by: Micha Livne <[email protected]> * add paper ref to sgdqa model doc (#2233) * add paper ref to sgdqa model doc Signed-off-by: Yang Zhang <[email protected]> * fix comments Signed-off-by: Yang Zhang <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Move ConcatDataset to common (#2237) * move concatdataset to common Signed-off-by: Abhinav Khattar <[email protected]> * var name change Signed-off-by: Abhinav Khattar <[email protected]> Signed-off-by: Micha Livne <[email protected]> * audio based normalization (#2231) * squash norm_audio Signed-off-by: ekmb <[email protected]> * add missing files Signed-off-by: ekmb <[email protected]> * style Signed-off-by: ekmb <[email protected]> * unit tests added, docstrings fixed Signed-off-by: ekmb <[email protected]> * fix lgtm errors Signed-off-by: ekmb <[email protected]> * debug jenkins Signed-off-by: ekmb <[email protected]> * debug jenkins Signed-off-by: ekmb <[email protected]> * signature update Signed-off-by: ekmb <[email protected]> * set deterministic default Signed-off-by: ekmb <[email protected]> * add more test cases Signed-off-by: ekmb <[email protected]> Signed-off-by: Micha Livne <[email protected]> * bug fix config (#2232) Signed-off-by: fayejf <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Alias Swish to SiLU (#2239) * Alias Swish to SiLU and move activations to inplace execution if possible Signed-off-by: smajumdar <[email protected]> * Remove unused import Signed-off-by: smajumdar 
<[email protected]> Signed-off-by: Micha Livne <[email protected]> * Update README.rst Signed-off-by: Micha Livne <[email protected]> * Offline asr notebook bug fix (#2242) * fix Signed-off-by: fayejf <[email protected]> * install Signed-off-by: fayejf <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Fix docstring (#2244) * fix comments Signed-off-by: Yang Zhang <[email protected]> * fix doc string Signed-off-by: Yang Zhang <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Update "last" Checkpoint (#2241) * fix Signed-off-by: Jason <[email protected]> * change Signed-off-by: Jason <[email protected]> * fix Signed-off-by: Jason <[email protected]> Co-authored-by: Sandeep Subramanian <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Add pretrained model stt_es_citrinet_512 (#2247) Signed-off-by: Elena Rastorgueva <[email protected]> Signed-off-by: Micha Livne <[email protected]> * [BUGFIX] Only process tarfile artifacts when model was restored from tarfile (#2250) * process tarfile artifacts only if model is being restored Signed-off-by: ericharper <[email protected]> * process tarfile artifacts only if model was restored from a tarfile Signed-off-by: ericharper <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Log average metrics for Multi-validation in NMT (#2251) * add avg metrics NMT Signed-off-by: Abhinav Khattar <[email protected]> * name change Signed-off-by: Abhinav Khattar <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Update Primer notebook (#2258) Signed-off-by: smajumdar <[email protected]> Signed-off-by: Micha Livne <[email protected]> * fixed Bug 3310780 and 3310799 (#2264) Signed-off-by: Virginia Adams <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Support multiple models being instantiated in same execution scope (#2245) * Support multiple models being instantiated in same execution scope Signed-off-by: smajumdar <[email protected]> 
* Fix tests Signed-off-by: smajumdar <[email protected]> * Add locks to methods in appstate Signed-off-by: smajumdar <[email protected]> * Perform locks only on write operations Signed-off-by: smajumdar <[email protected]> * Correct deadlock issue Signed-off-by: smajumdar <[email protected]> * Add more tests Signed-off-by: smajumdar <[email protected]> * Add test for multi save and remove patch to change save type Signed-off-by: smajumdar <[email protected]> * Update app state to preserve gidx of previous token Signed-off-by: smajumdar <[email protected]> * Correct restoration logic for tarfiles Signed-off-by: smajumdar <[email protected]> Signed-off-by: Micha Livne <[email protected]> * ASR Refactoring (#2240) * Refactor out the preprocessing from ASR into common Signed-off-by: smajumdar <[email protected]> * Correct nltk issue with vocabs.py for clusters Signed-off-by: smajumdar <[email protected]> * Add typing information to SpecAugment and SpecCutout Signed-off-by: smajumdar <[email protected]> * Reorganize parts directory Signed-off-by: smajumdar <[email protected]> * Refactor parts submodules, add __init__ to few important parts Signed-off-by: smajumdar <[email protected]> * Update docs for new path to parts Signed-off-by: smajumdar <[email protected]> * Cherry pick PR https://github.com/NVIDIA/NeMo/pull/2219 Signed-off-by: smajumdar <[email protected]> * Add header for preprocessing commons Signed-off-by: smajumdar <[email protected]> * Fix style of tests Signed-off-by: smajumdar <[email protected]> * Add forced update of configs for train-val-test ds to new labels tests Signed-off-by: smajumdar <[email protected]> * Update path to FilterbankFeatures for TTS Signed-off-by: smajumdar <[email protected]> * Add an alias file for backward compatibility Signed-off-by: smajumdar <[email protected]> * Add an alias file for backward compatibility Signed-off-by: smajumdar <[email protected]> * Update training scripts of ASR to support finetuning Signed-off-by: 
smajumdar <[email protected]> * Update Finetuning step to be ModelPT level Signed-off-by: smajumdar <[email protected]> * Update docs for finetuning for ASR Signed-off-by: smajumdar <[email protected]> * Fix style Signed-off-by: smajumdar <[email protected]> * Update docs and scripts with fine-tuning info Signed-off-by: smajumdar <[email protected]> * Update docs and scripts with fine-tuning info Signed-off-by: smajumdar <[email protected]> * Fix style Signed-off-by: smajumdar <[email protected]> * Update scripts Signed-off-by: smajumdar <[email protected]> * Add comment for weight initialization Signed-off-by: smajumdar <[email protected]> Signed-off-by: Micha Livne <[email protected]> * TTS Doc Fix and Remove TTS Test (#2272) * bug fix and remove test Signed-off-by: Jason <[email protected]> * syntax Signed-off-by: Jason <[email protected]> * syntax Signed-off-by: Jason <[email protected]> * syntax Signed-off-by: Jason <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Talknet training Fix (#2273) * TalkNet Training notebook fix. Signed-off-by: Stanislav Beliaev <[email protected]> * Remove debug stuff. 
Signed-off-by: Stanislav Beliaev <[email protected]> Signed-off-by: Micha Livne <[email protected]> * update (#2274) Signed-off-by: Jason <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Add links (#2275) * update Signed-off-by: Jason <[email protected]> * link Signed-off-by: Jason <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Delete 3_TTS_TalkNet_Training.ipynb (#2276) Signed-off-by: Jason <[email protected]> Signed-off-by: Micha Livne <[email protected]> * tune down logging (#2277) * tune down logging Signed-off-by: Oleksii Kuchaiev <[email protected]> * debug message instead of removing it completely Signed-off-by: Oleksii Kuchaiev <[email protected]> Signed-off-by: Micha Livne <[email protected]> * minor bugfix Signed-off-by: Oleksii Kuchaiev <[email protected]> Signed-off-by: Micha Livne <[email protected]> * remove confusing message Signed-off-by: Oleksii Kuchaiev <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Restore TalkNet training notebook (#2281) * Restore TalkNet training notebook. Signed-off-by: Stanislav Beliaev <[email protected]> * Remove torchaudio dep. 
Signed-off-by: Stanislav Beliaev <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Fix ExpManager Issues and FastPitch (#2283) * backport exp_manager fixes to v1 Signed-off-by: Jason <[email protected]> * fix fastpitch Signed-off-by: Jason <[email protected]> * fix tests Signed-off-by: Jason <[email protected]> * update prefix Signed-off-by: Jason <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Organize asr config folders (#2284) Signed-off-by: Micha Livne <[email protected]> * Fix and enable DALI tests (#2077) * Fix and enable DALI tests Signed-off-by: Joaquin Anton <[email protected]> * remove unused import Signed-off-by: Joaquin Anton <[email protected]> * Move DALI tests to a separate Jenkins stage Signed-off-by: Joaquin Anton <[email protected]> * Remove DALI tests from the main jenkins ASR stage Signed-off-by: Joaquin Anton <[email protected]> * Comment out MFCC test Signed-off-by: Joaquin Anton <[email protected]> * Working version Signed-off-by: Joaquin Anton <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Added unit test for hifigan export, fixed hifigan export (#2279) * Added unit test for hifigan export, Removed runtime test from waveglow test (now in export) Signed-off-by: Boris Fomitchev <[email protected]> * Fixed style Signed-off-by: Boris Fomitchev <[email protected]> * Fixed style Signed-off-by: Boris Fomitchev <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Update conformer recipes (#2265) * updated readme asr. Signed-off-by: Vahid <[email protected]> * added models. Signed-off-by: Vahid <[email protected]> * fixed docs. Signed-off-by: Vahid <[email protected]> * fixed docs. Signed-off-by: Vahid <[email protected]> * fixed docs. Signed-off-by: Vahid <[email protected]> * fixed docs. Signed-off-by: Vahid <[email protected]> * fixed the docs. Signed-off-by: Vahid <[email protected]> * fixed the docs. 
Signed-off-by: Vahid <[email protected]> * fixed the docs. Signed-off-by: Vahid <[email protected]> * disabled test. Signed-off-by: Vahid <[email protected]> * Updated the config files. Signed-off-by: Vahid <[email protected]> * Updated the config files. Signed-off-by: Vahid <[email protected]> * Updated the config files. Signed-off-by: Vahid <[email protected]> * dropped the wers. Signed-off-by: Vahid <[email protected]> * dropped the wers. Signed-off-by: Vahid <[email protected]> * dropped new models and reverted to old versions. Signed-off-by: Vahid <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Co-authored-by: Oleksii Kuchaiev <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Adding neural rescorer and its documentations (#2287) * Added intial neural rescorer scripts. Signed-off-by: Vahid <[email protected]> * Added intial neural rescorer scripts. Signed-off-by: Vahid <[email protected]> * Added intial neural rescorer scripts. Signed-off-by: Vahid <[email protected]> * Added intial neural rescorer scripts. Signed-off-by: Vahid <[email protected]> * Added intial neural rescorer scripts. Signed-off-by: Vahid <[email protected]> * Added intial neural rescorer scripts. Signed-off-by: Vahid <[email protected]> * Added intial neural rescorer scripts. Signed-off-by: Vahid <[email protected]> * Added intial neural rescorer scripts. Signed-off-by: Vahid <[email protected]> * Added intial neural rescorer scripts. Signed-off-by: Vahid <[email protected]> * added more docs, figures, and output file. Signed-off-by: Vahid <[email protected]> * added more docs, figures, and output file. Signed-off-by: Vahid <[email protected]> * added more docs, figures, and output file. Signed-off-by: Vahid <[email protected]> * added more docs, figures, and output file. Signed-off-by: Vahid <[email protected]> * added more docs, figures, and output file. 
Signed-off-by: Vahid <[email protected]> * fixed style Signed-off-by: Vahid <[email protected]> * add a note to asr notebook. Signed-off-by: Vahid <[email protected]> * add a note to asr notebook. Signed-off-by: Vahid <[email protected]> * add a note to asr notebook. Signed-off-by: Vahid <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Adjust warning messages Signed-off-by: Oleksii Kuchaiev <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Revert "Adjust warning messages" This reverts commit df046ec55754d0136a2a28451435068f32409f30. Signed-off-by: Micha Livne <[email protected]> * Adjust warning messages (#2294) Signed-off-by: Oleksii Kuchaiev <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Adding new Models releases on NGC. (#2295) * added new models. Signed-off-by: Vahid <[email protected]> * added tests for asr lm. Signed-off-by: Vahid <[email protected]> * added tests for asr lm. Signed-off-by: Vahid <[email protected]> * dropped the test. 
Signed-off-by: Vahid <[email protected]> Signed-off-by: Micha Livne <[email protected]> * update quantization (#2298) Signed-off-by: slyned <[email protected]> Co-authored-by: slyned <[email protected]> Signed-off-by: Micha Livne <[email protected]> * ASR improvements (#2293) * Update numba messages and citrinet configs Signed-off-by: smajumdar <[email protected]> * Remove support for weight init scale and hidden hidden bias scale for layer normalized lstm Signed-off-by: smajumdar <[email protected]> * Add support for multiple filetypes in tarred datasets, correct rnn LN-lstm inputs, fix OmegaConf compat issue Signed-off-by: smajumdar <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Time quarter to (#2292) * fix comments Signed-off-by: Yang Zhang <[email protected]> * fix doc string Signed-off-by: Yang Zhang <[email protected]> * adding quarter to to time class Signed-off-by: Yang Zhang <[email protected]> * fix typo Signed-off-by: Yang Zhang <[email protected]> Signed-off-by: Micha Livne <[email protected]> * fixed paths. 
(#2301) Signed-off-by: Vahid <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Added onnxruntime check of exported ONNX, bumped up default ONNX opset (#2278) * Added onnxruntime check of exported ONNX, bumped up default ONNX opset Signed-off-by: Boris Fomitchev <[email protected]> * Made TS export to accept ONNX-style input example, removed unused param to export Signed-off-by: Boris Fomitchev <[email protected]> * check_trace default made False Signed-off-by: Boris Fomitchev <[email protected]> * Fixed for updated export signature Signed-off-by: Boris Fomitchev <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Update readmes Signed-off-by: Oleksii Kuchaiev <[email protected]> Signed-off-by: Micha Livne <[email protected]> * update readme Signed-off-by: Oleksii Kuchaiev <[email protected]> Signed-off-by: Micha Livne <[email protected]> * update readme Signed-off-by: Oleksii Kuchaiev <[email protected]> Signed-off-by: Micha Livne <[email protected]> * fix docs table Signed-off-by: Oleksii Kuchaiev <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Add support for Numba CUDA optimized SpecAugment (#2269) * Initial implementation Signed-off-by: smajumdar <[email protected]> * Initial implementation Signed-off-by: smajumdar <[email protected]> * Finish initial implementation of numba spec augment Signed-off-by: smajumdar <[email protected]> * Correct mask propagataion Signed-off-by: smajumdar <[email protected]> * Parallelize kernel over batch instead of over masks Signed-off-by: smajumdar <[email protected]> * Finish tests and update to signature of spectrogramaugmentation calls Signed-off-by: smajumdar <[email protected]> * Finish tests and update to signature of spectrogramaugmentation calls Signed-off-by: smajumdar <[email protected]> * Add header Signed-off-by: smajumdar <[email protected]> * Fix style Signed-off-by: smajumdar <[email protected]> * Add heuristics Signed-off-by: smajumdar <[email protected]> * 
Correct inclusive range of padding Signed-off-by: smajumdar <[email protected]> * Correct typing for spec aug numba Signed-off-by: smajumdar <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Added JSON manifest's support to transcribe_speech.py (#2304) * Added JSON manifest's support to transcribe_speech.py Signed-off-by: Vitaly Lavrukhin <[email protected]> * Dropped unused import Signed-off-by: Vitaly Lavrukhin <[email protected]> Co-authored-by: Oleksii Kuchaiev <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Signed-off-by: Micha Livne <[email protected]> * get embedding for a single file (#2310) * get embedding for a single file Signed-off-by: nithinraok <[email protected]> * fixes Signed-off-by: nithinraok <[email protected]> * sr update Signed-off-by: nithinraok <[email protected]> * regain train mode Signed-off-by: nithinraok <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Update FastPitch (#2249) * wip Signed-off-by: Jason <[email protected]> * c1 Signed-off-by: Jason <[email protected]> * bug fixes Signed-off-by: Jason <[email protected]> * bug fixes Signed-off-by: Jason <[email protected]> * v2 Signed-off-by: Jason <[email protected]> * changes Signed-off-by: Jason <[email protected]> * add types, old model working Signed-off-by: Jason <[email protected]> * pitch Signed-off-by: Jason <[email protected]> * update Signed-off-by: Jason <[email protected]> * update Signed-off-by: Jason <[email protected]> * let it work Signed-off-by: Jason <[email protected]> * fixes Signed-off-by: Jason <[email protected]> * add oktai comments Signed-off-by: Jason <[email protected]> * debug Signed-off-by: Jason <[email protected]> * scale Signed-off-by: Jason <[email protected]> * wip Signed-off-by: Jason <[email protected]> * fix test for v1 Signed-off-by: Jason <[email protected]> * merge train and val Signed-off-by: Jason <[email protected]> * back to par bin att, add correct encoder settings 
Signed-off-by: Jason <[email protected]> * try Signed-off-by: Jason <[email protected]> * undo Signed-off-by: Jason <[email protected]> * lgtm: Signed-off-by: Jason <[email protected]> * style Signed-off-by: Jason <[email protected]> * default to ljs Signed-off-by: Jason <[email protected]> Signed-off-by: Micha Livne <[email protected]> * patch quantization (#2314) * update quantization Signed-off-by: slyned <[email protected]> * update quant infer trt Signed-off-by: slyned <[email protected]> * fix style Signed-off-by: slyned <[email protected]> Co-authored-by: slyned <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Pin OmegaConf version for 1.0.0 (#2316) * Update OmegaConf compatibility Signed-off-by: smajumdar <[email protected]> * Correct OmegaConf.pretty() Signed-off-by: smajumdar <[email protected]> * Upper bound omegaconf Signed-off-by: smajumdar <[email protected]> * Revert "Correct OmegaConf.pretty()" This reverts commit 6ebae2ef Signed-off-by: smajumdar <[email protected]> * Revert "Update OmegaConf compatibility" This reverts commit 83b2cf35a07a742552082e80e6ca34c9b8203cbc. 
Signed-off-by: smajumdar <[email protected]> Signed-off-by: Micha Livne <[email protected]> * [BUGFIX] OmegaConf forward compatibility (#2319) * Update OmegaConf compatibility Signed-off-by: smajumdar <[email protected]> Signed-off-by: ericharper <[email protected]> * Correct OmegaConf.pretty() Signed-off-by: smajumdar <[email protected]> Signed-off-by: ericharper <[email protected]> * upper bound omegaconf Signed-off-by: ericharper <[email protected]> * add if,else back Signed-off-by: ericharper <[email protected]> * typo Signed-off-by: ericharper <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Signed-off-by: Micha Livne <[email protected]> * bumping version to 1.0.1 Signed-off-by: Oleksii Kuchaiev <[email protected]> Signed-off-by: Micha Livne <[email protected]> * fix_cluster_small_sample (#2303) * fix_cluster_small_sample Signed-off-by: nithinraok <[email protected]> * for smaller samples Signed-off-by: nithinraok <[email protected]> * remove type Signed-off-by: nithinraok <[email protected]> * similarity matrix Signed-off-by: nithinraok <[email protected]> * est num of speakers add Signed-off-by: nithinraok <[email protected]> * comment update Signed-off-by: nithinraok <[email protected]> * style fix Signed-off-by: nithinraok <[email protected]> * MIN_SAMPLES passed through func arg Signed-off-by: nithinraok <[email protected]> * doc string update Signed-off-by: nithinraok <[email protected]> * spell mistake Signed-off-by: nithinraok <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Fastpitch export (#2300) * wip Signed-off-by: Jason <[email protected]> * c1 Signed-off-by: Jason <[email protected]> * bug fixes Signed-off-by: Jason <[email protected]> * bug fixes Signed-off-by: Jason <[email protected]> * v2 Signed-off-by: Jason <[email protected]> * changes Signed-off-by: Jason <[email protected]> * add types, old model working Signed-off-by: Jason <[email protected]> * pitch Signed-off-by: Jason <[email 
protected]> * update Signed-off-by: Jason <[email protected]> * update Signed-off-by: Jason <[email protected]> * let it work Signed-off-by: Jason <[email protected]> * fixes Signed-off-by: Jason <[email protected]> * add oktai comments Signed-off-by: Jason <[email protected]> * debug Signed-off-by: Jason <[email protected]> * scale Signed-off-by: Jason <[email protected]> * wip Signed-off-by: Jason <[email protected]> * fix test for v1 Signed-off-by: Jason <[email protected]> …
* Itn add classes (#2141) * move do_training flag to config Signed-off-by: Yang Zhang <[email protected]> * added telephone to itn Signed-off-by: Yang Zhang <[email protected]> * add telephone and email to itn Signed-off-by: Yang Zhang <[email protected]> Signed-off-by: Micha Livne <[email protected]> * ASR + NLP Doc Fixes (#2136) * Preserve the tokenizer config for ASR Signed-off-by: smajumdar <[email protected]> * Correct nlp docs Signed-off-by: smajumdar <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Removing graphsurgeon optional dependency, improving import error rep… (#2144) * Removing graphsurgeon optional dependency, improving import error reporting Signed-off-by: Boris Fomitchev <[email protected]> * Fixing scope error Signed-off-by: Boris Fomitchev <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Fix FilterbankFeatures eval nondeterminism. (#2146) Signed-off-by: PiotrDabkowski <[email protected]> Signed-off-by: Micha Livne <[email protected]> * fix the docs. 
(#2148) Signed-off-by: Micha Livne <[email protected]> * Text processing refactor (#2149) * removed graphutils, suppletive, data_loader_utils from itn to be reused from tn Signed-off-by: Yang Zhang <[email protected]> * inheriting itn from tn, thus removing redundancy Signed-off-by: Yang Zhang <[email protected]> * cleaned whitelist Signed-off-by: Yang Zhang <[email protected]> * lgtm fix Signed-off-by: Yang Zhang <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Update how artifacts work (#2138) * Update how artifacts work Signed-off-by: Oleksii Kuchaiev <[email protected]> * fixing some tests Signed-off-by: Oleksii Kuchaiev <[email protected]> * fix more tests Signed-off-by: Oleksii Kuchaiev <[email protected]> * add __init__ to tests to make them discoverable Signed-off-by: Oleksii Kuchaiev <[email protected]> * empty src support Signed-off-by: Oleksii Kuchaiev <[email protected]> * updates plust unittest Signed-off-by: Oleksii Kuchaiev <[email protected]> * add copyright check Signed-off-by: Oleksii Kuchaiev <[email protected]> * copyright header Signed-off-by: Oleksii Kuchaiev <[email protected]> * fix style Signed-off-by: Oleksii Kuchaiev <[email protected]> * handle hashed megatron checkpoint version in nlp restore_from Signed-off-by: ericharper <[email protected]> * add _MODEL_RESTORE_PATH to AppState Signed-off-by: ericharper <[email protected]> * get rid of global folder caching Signed-off-by: Oleksii Kuchaiev <[email protected]> * double register - warning instead of exception Signed-off-by: Oleksii Kuchaiev <[email protected]> * Add asr spe tests Signed-off-by: smajumdar <[email protected]> * Pop out asr wpe pre-registered value Signed-off-by: smajumdar <[email protected]> * Correct ASR tests and paths Signed-off-by: smajumdar <[email protected]> * Correct tokenizer saving Signed-off-by: smajumdar <[email protected]> * Correct ASR tests Signed-off-by: smajumdar <[email protected]> * Correct ASR bpe mixin Signed-off-by: smajumdar 
<[email protected]> * Patch up backward compatibility Signed-off-by: smajumdar <[email protected]> * update register_bert_model Signed-off-by: ericharper <[email protected]> * update all get_lm_model calls Signed-off-by: ericharper <[email protected]> * return None if src not found Signed-off-by: ericharper <[email protected]> * handle case with no tokenizer Signed-off-by: ericharper <[email protected]> * do not add another hash is using tarfile_artifacts Signed-off-by: ericharper <[email protected]> * add return_none flag, update doc string Signed-off-by: ericharper <[email protected]> * update default behavior of register_artifact for NLPModel Signed-off-by: ericharper <[email protected]> * change kwarg name to verify_src_exists Signed-off-by: ericharper <[email protected]> * use cfg instead of _cfg Signed-off-by: Oleksii Kuchaiev <[email protected]> * some cleanups Signed-off-by: Oleksii Kuchaiev <[email protected]> Co-authored-by: ericharper <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Language model refactoring (#2120) * fixed branch in IR tutorial Signed-off-by: AlexGrinch <[email protected]> * bucketing tarred dataset for lm training Signed-off-by: AlexGrinch <[email protected]> * updated global rank Signed-off-by: AlexGrinch <[email protected]> * perplexity update Signed-off-by: AlexGrinch <[email protected]> * refactor lm to be campatible with latest nmt Signed-off-by: AlexGrinch <[email protected]> * perplexity change Signed-off-by: AlexGrinch <[email protected]> * removed obsolete config Signed-off-by: AlexGrinch <[email protected]> * added sequence perplexity Signed-off-by: AlexGrinch <[email protected]> * added non-smoothed CE loss for validation Signed-off-by: AlexGrinch <[email protected]> * unified sentence dataset, torchmetrics for sequence perplexity Signed-off-by: AlexGrinch <[email protected]> * translate_ddp refactor Signed-off-by: AlexGrinch <[email protected]> 
Co-authored-by: Oleksii Kuchaiev <[email protected]> Co-authored-by: Sandeep Subramanian <[email protected]> Signed-off-by: Micha Livne <[email protected]> * [NMT] Multi-validation Patch (#2150) * rename dl index 0 loss and sacrebleu for backwards compatibility Signed-off-by: ericharper <[email protected]> * eval -> val/tst Signed-off-by: ericharper <[email protected]> * instantiate torchmetrics after instantiating dataloaders Signed-off-by: ericharper <[email protected]> * bug Signed-off-by: ericharper <[email protected]> * remove debugging log Signed-off-by: ericharper <[email protected]> * remove debugging log Signed-off-by: ericharper <[email protected]> Co-authored-by: Oleksii Kuchaiev <[email protected]> Signed-off-by: Micha Livne <[email protected]> * bumping version to 1.0.0 Signed-off-by: Oleksii Kuchaiev <[email protected]> Signed-off-by: Micha Livne <[email protected]> * fixed the num_samples of text classification model. (#2152) Signed-off-by: Vahid <[email protected]> Signed-off-by: Micha Livne <[email protected]> * fix for electronic (#2153) * fix for electronic Signed-off-by: ekmb <[email protected]> * special symbols added Signed-off-by: ekmb <[email protected]> * restrict symbols list Signed-off-by: ekmb <[email protected]> Signed-off-by: Micha Livne <[email protected]> * FastSpeech 2 Test & Docs (#2143) * Add FS2 data loading test Signed-off-by: Jocelyn Huang <[email protected]> * TTS docs update for FastSpeech 2 Signed-off-by: Jocelyn Huang <[email protected]> * Style fix for FS2 dataset test Signed-off-by: Jocelyn Huang <[email protected]> * Fix transpose typo Signed-off-by: Jocelyn Huang <[email protected]> Co-authored-by: Oleksii Kuchaiev <[email protected]> Co-authored-by: Jason <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Minor patch for translate_ddp (#2155) * Patch for backtranslation in lm dataset Signed-off-by: MaximumEntropy <[email protected]> * One more fix Signed-off-by: MaximumEntropy <[email protected]> 
Signed-off-by: Micha Livne <[email protected]> * Entity linking (#2050) * Started adding SAP dataset Signed-off-by: Virginia Adams <[email protected]> * Delete .lm_bert_dataset.py.swp Signed-off-by: Virginia Adams <[email protected]> * Added dataset and loss Signed-off-by: Virginia Adams <[email protected]> * Added entity linking encoder model Signed-off-by: Virginia Adams <[email protected]> * Can build and use index from pubmedbert model Signed-off-by: Virginia Adams <[email protected]> * checked boolean logic in build_index.py Signed-off-by: Virginia Adams <[email protected]> * End to end tested all functionality Signed-off-by: Virginia Adams <[email protected]> * fixed val loss none at end of validation Signed-off-by: Virginia Adams <[email protected]> * Started adding demo entity linking notebook Signed-off-by: Virginia Adams <[email protected]> * adding in notebook demo Signed-off-by: Virginia Adams <[email protected]> * added call to entitylinking classes in __init__.py files Signed-off-by: Virginia Adams <[email protected]> * Added eval code to notebook Signed-off-by: Virginia Adams <[email protected]> * Adding unfinished notebook Signed-off-by: Virginia Adams <[email protected]> * Cleaned up example dir Signed-off-by: Virginia Adams <[email protected]> * Fixed recap commands Signed-off-by: Virginia Adams <[email protected]> * added model typing and tiny data tar Signed-off-by: Virginia Adams <[email protected]> * Adding tiny data zip Signed-off-by: Virginia Adams <[email protected]> * updated tiny example config data path Signed-off-by: Virginia Adams <[email protected]> * Notebook demo works Signed-off-by: Virginia Adams <[email protected]> * Changed training epochs Signed-off-by: Virginia Adams <[email protected]> * Removed output from training and install cells Signed-off-by: Virginia Adams <[email protected]> * changed code formatting Signed-off-by: Virginia Adams <[email protected]> * Started doc string for new functions Signed-off-by: Virginia Adams 
<[email protected]> * Updated data_preprocessing to save to data_dir Signed-off-by: Virginia Adams <[email protected]> * fixed comment in notebook demo Signed-off-by: Virginia Adams <[email protected]> * Update data_preprocessing.py Signed-off-by: Virginia Adams <[email protected]> * updated nemo typing imports Signed-off-by: Virginia Adams <[email protected]> * about to rebase Signed-off-by: Virginia Adams <[email protected]> * added back umls_dataset_processing.py Signed-off-by: Virginia Adams <[email protected]> * Removed example data Signed-off-by: Virginia Adams <[email protected]> * Fixed typos in notebook demo Signed-off-by: Virginia Adams <[email protected]> * fixed lgtm-com issues Signed-off-by: Virginia Adams <[email protected]> * added copyright headers Signed-off-by: Virginia Adams <[email protected]> * fixed import and copyright headers Signed-off-by: Virginia Adams <[email protected]> * Fixed formatting Signed-off-by: Virginia Adams <[email protected]> * Fixed formatting changes 2 Signed-off-by: Virginia Adams <[email protected]> * fixed test formatting Signed-off-by: Virginia Adams <[email protected]> * Added __init__.py for model and dataset Signed-off-by: Virginia Adams <[email protected]> * loading newline file returns data_dir now Signed-off-by: Virginia Adams <[email protected]> * Removed conf notebook and deleted comment Signed-off-by: Virginia Adams <[email protected]> * Added jenkins test Signed-off-by: Virginia Adams <[email protected]> * Updated Jenkins test Signed-off-by: Virginia Adams <[email protected]> * fixed file path Signed-off-by: Virginia Adams <[email protected]> * Changed Jenkins pipeline order Signed-off-by: Virginia Adams <[email protected]> * Fixed Jenkins datapath... again... 
Signed-off-by: Virginia Adams <[email protected]> * Made most review changes Signed-off-by: Virginia Adams <[email protected]> * fixed copy right Signed-off-by: Virginia Adams <[email protected]> * updated unit test to wget config Signed-off-by: Virginia Adams <[email protected]> * reverted test file back Signed-off-by: Virginia Adams <[email protected]> * Added project dir to jenkins test Signed-off-by: Virginia Adams <[email protected]> * defined config in unit test Signed-off-by: Virginia Adams <[email protected]> Co-authored-by: Sandeep Subramanian <[email protected]> Co-authored-by: Oleksii Kuchaiev <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Correct branch version for v1.0.0 (#2157) * Correct branch version Signed-off-by: smajumdar <[email protected]> * Correct Jenkinsfile Signed-off-by: smajumdar <[email protected]> * Update rst files Signed-off-by: smajumdar <[email protected]> Signed-off-by: Micha Livne <[email protected]> * switch CI back to main Signed-off-by: Oleksii Kuchaiev <[email protected]> Signed-off-by: Micha Livne <[email protected]> * fixed the docs. 
(#2156) Signed-off-by: Micha Livne <[email protected]> * Make Hifigan jittable (#2159) * FastSpeech 2 Test & Docs (#2143) * Add FS2 data loading test Signed-off-by: Jocelyn Huang <[email protected]> * TTS docs update for FastSpeech 2 Signed-off-by: Jocelyn Huang <[email protected]> * Style fix for FS2 dataset test Signed-off-by: Jocelyn Huang <[email protected]> * Fix transpose typo Signed-off-by: Jocelyn Huang <[email protected]> Co-authored-by: Oleksii Kuchaiev <[email protected]> Co-authored-by: Jason <[email protected]> * Entity linking (#2050) * Started adding SAP dataset Signed-off-by: Virginia Adams <[email protected]> * Delete .lm_bert_dataset.py.swp Signed-off-by: Virginia Adams <[email protected]> * Added dataset and loss Signed-off-by: Virginia Adams <[email protected]> * Added entity linking encoder model Signed-off-by: Virginia Adams <[email protected]> * Can build and use index from pubmedbert model Signed-off-by: Virginia Adams <[email protected]> * checked boolean logic in build_index.py Signed-off-by: Virginia Adams <[email protected]> * End to end tested all functionality Signed-off-by: Virginia Adams <[email protected]> * fixed val loss none at end of validation Signed-off-by: Virginia Adams <[email protected]> * Started adding demo entity linking notebook Signed-off-by: Virginia Adams <[email protected]> * adding in notebook demo Signed-off-by: Virginia Adams <[email protected]> * added call to entitylinking classes in __init__.py files Signed-off-by: Virginia Adams <[email protected]> * Added eval code to notebook Signed-off-by: Virginia Adams <[email protected]> * Adding unfinished notebook Signed-off-by: Virginia Adams <[email protected]> * Cleaned up example dir Signed-off-by: Virginia Adams <[email protected]> * Fixed recap commands Signed-off-by: Virginia Adams <[email protected]> * added model typing and tiny data tar Signed-off-by: Virginia Adams <[email protected]> * Adding tiny data zip Signed-off-by: Virginia Adams <[email protected]> 
* updated tiny example config data path Signed-off-by: Virginia Adams <[email protected]> * Notebook demo works Signed-off-by: Virginia Adams <[email protected]> * Changed training epochs Signed-off-by: Virginia Adams <[email protected]> * Removed output from training and install cells Signed-off-by: Virginia Adams <[email protected]> * changed code formatting Signed-off-by: Virginia Adams <[email protected]> * Started doc string for new functions Signed-off-by: Virginia Adams <[email protected]> * Updated data_preprocessing to save to data_dir Signed-off-by: Virginia Adams <[email protected]> * fixed comment in notebook demo Signed-off-by: Virginia Adams <[email protected]> * Update data_preprocessing.py Signed-off-by: Virginia Adams <[email protected]> * updated nemo typing imports Signed-off-by: Virginia Adams <[email protected]> * about to rebase Signed-off-by: Virginia Adams <[email protected]> * added back umls_dataset_processing.py Signed-off-by: Virginia Adams <[email protected]> * Removed example data Signed-off-by: Virginia Adams <[email protected]> * Fixed typos in notebook demo Signed-off-by: Virginia Adams <[email protected]> * fixed lgtm-com issues Signed-off-by: Virginia Adams <[email protected]> * added copyright headers Signed-off-by: Virginia Adams <[email protected]> * fixed import and copyright headers Signed-off-by: Virginia Adams <[email protected]> * Fixed formatting Signed-off-by: Virginia Adams <[email protected]> * Fixed formatting changes 2 Signed-off-by: Virginia Adams <[email protected]> * fixed test formatting Signed-off-by: Virginia Adams <[email protected]> * Added __init__.py for model and dataset Signed-off-by: Virginia Adams <[email protected]> * loading newline file returns data_dir now Signed-off-by: Virginia Adams <[email protected]> * Removed conf notebook and deleted comment Signed-off-by: Virginia Adams <[email protected]> * Added jenkins test Signed-off-by: Virginia Adams <[email protected]> * Updated Jenkins test 
Signed-off-by: Virginia Adams <[email protected]> * fixed file path Signed-off-by: Virginia Adams <[email protected]> * Changed Jenkins pipeline order Signed-off-by: Virginia Adams <[email protected]> * Fixed Jenkins datapath... again... Signed-off-by: Virginia Adams <[email protected]> * Made most review changes Signed-off-by: Virginia Adams <[email protected]> * fixed copy right Signed-off-by: Virginia Adams <[email protected]> * updated unit test to wget config Signed-off-by: Virginia Adams <[email protected]> * reverted test file back Signed-off-by: Virginia Adams <[email protected]> * Added project dir to jenkins test Signed-off-by: Virginia Adams <[email protected]> * defined config in unit test Signed-off-by: Virginia Adams <[email protected]> Co-authored-by: Sandeep Subramanian <[email protected]> Co-authored-by: Oleksii Kuchaiev <[email protected]> * switch CI back to main Signed-off-by: Oleksii Kuchaiev <[email protected]> * Make Hifigan jittable Signed-off-by: Ryan Leary <[email protected]> * Remove vestigial debugging printout Signed-off-by: Ryan Leary <[email protected]> * Add export forward and fix style Signed-off-by: Ryan Leary <[email protected]> * Fix load_state_dict override for arbitrary layers Signed-off-by: Ryan Leary <[email protected]> Co-authored-by: Jocelyn <[email protected]> Co-authored-by: Oleksii Kuchaiev <[email protected]> Co-authored-by: Jason <[email protected]> Co-authored-by: vadam5 <[email protected]> Co-authored-by: Sandeep Subramanian <[email protected]> Co-authored-by: Oleksii Kuchaiev <[email protected]> Co-authored-by: Ryan Leary <[email protected]> Signed-off-by: Micha Livne <[email protected]> * fix version (#2162) Signed-off-by: Jason <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Megatron nb size reduced (#2163) * notebook size reduced Signed-off-by: ekmb <[email protected]> * notebook size reduced Signed-off-by: ekmb <[email protected]> Signed-off-by: Micha Livne <[email protected]> * update 
spectral clustering method (#2158) * update spectral clustering method Signed-off-by: nithinraok <[email protected]> * update Jenkins File Signed-off-by: nithinraok <[email protected]> * threshold fix by reducing window length for shorter embs Signed-off-by: nithinraok <[email protected]> * grammar fixes Signed-off-by: nithinraok <[email protected]> * CR update Signed-off-by: nithinraok <[email protected]> * paper reference Signed-off-by: nithinraok <[email protected]> * improve docstring for yaml Signed-off-by: nithinraok <[email protected]> * Doc fixes Signed-off-by: nithinraok <[email protected]> Signed-off-by: Micha Livne <[email protected]> * revert (#2167) Signed-off-by: Jason <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Limit Pytorch lightning release (#2170) Signed-off-by: smajumdar <[email protected]> Signed-off-by: Micha Livne <[email protected]> * token classification models artifacts update (#2169) * artifacts update Signed-off-by: ekmb <[email protected]> * artifacts update Signed-off-by: ekmb <[email protected]> * jenkins Signed-off-by: ekmb <[email protected]> * fix for model restoration Signed-off-by: ekmb <[email protected]> * typos fix + jenkins dir update Signed-off-by: ekmb <[email protected]> * typo Signed-off-by: ericharper <[email protected]> * update jenkins branch Signed-off-by: ericharper <[email protected]> * add && Signed-off-by: ericharper <[email protected]> * jenkins Signed-off-by: ekmb <[email protected]> * jenkins disable Signed-off-by: ekmb <[email protected]> * revert jenkins Signed-off-by: ekmb <[email protected]> * jenkins disable Signed-off-by: ekmb <[email protected]> * revert jenkins Signed-off-by: ekmb <[email protected]> Co-authored-by: ericharper <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Fix to always_save_nemo (#2174) * Initial attempt at always_save_nemo fix Signed-off-by: MaximumEntropy <[email protected]> * updated path before saving in exp manager, fixed bug when 
handling tarfile artifacts Signed-off-by: ericharper <[email protected]> * Add test with always_save_nemo to exp_manager Signed-off-by: MaximumEntropy <[email protected]> * Style fixes Signed-off-by: MaximumEntropy <[email protected]> * update jenkins branch Signed-off-by: ericharper <[email protected]> * check for nemo: Signed-off-by: ericharper <[email protected]> * check for nemo: Signed-off-by: ericharper <[email protected]> * check for nemo: Signed-off-by: ericharper <[email protected]> Co-authored-by: ericharper <[email protected]> Co-authored-by: Jason <[email protected]> Signed-off-by: Micha Livne <[email protected]> * fix typo (#2179) Signed-off-by: Yang Zhang <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Make itn tests optional (#2173) * Limit Pytorch lightning release Signed-off-by: smajumdar <[email protected]> * Add final two checks Signed-off-by: smajumdar <[email protected]> Co-authored-by: Jason <[email protected]> Signed-off-by: Micha Livne <[email protected]> * First Revision of TTS Docs and Notebooks Update for 1.0 (#2166) * squash Signed-off-by: Jason <[email protected]> * notebook fixes Signed-off-by: Jason <[email protected]> * notebook fixes Signed-off-by: Jason <[email protected]> * typos Signed-off-by: Jason <[email protected]> Signed-off-by: Micha Livne <[email protected]> * add more alternatives of 0 for telephone (#2171) Signed-off-by: Yang Zhang <[email protected]> Co-authored-by: Oleksii Kuchaiev <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Acc tn (#2180) * make tn cardinal faster Signed-off-by: Yang Zhang <[email protected]> * add number far Signed-off-by: Yang Zhang <[email protected]> * add test Signed-off-by: Yang Zhang <[email protected]> * fix lgtm Signed-off-by: Yang Zhang <[email protected]> * fix lgtm Signed-off-by: Yang Zhang <[email protected]> Signed-off-by: Micha Livne <[email protected]> * [DOCS] NLP Model parallel, NMT multi-val, CORE register artifacts (#2168) * update docs 
Signed-off-by: ericharper <[email protected]> * update docs Signed-off-by: ericharper <[email protected]> * update docs Signed-off-by: ericharper <[email protected]> * update docs Signed-off-by: ericharper <[email protected]> * update docs Signed-off-by: ericharper <[email protected]> * update docs Signed-off-by: ericharper <[email protected]> * update docs Signed-off-by: ericharper <[email protected]> * update docs Signed-off-by: ericharper <[email protected]> * update docs Signed-off-by: ericharper <[email protected]> * update docs Signed-off-by: ericharper <[email protected]> * update docs Signed-off-by: ericharper <[email protected]> * update docs Signed-off-by: ericharper <[email protected]> Co-authored-by: Sandeep Subramanian <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Change label smoothing prob to reduce chance of test failure (#2184) Signed-off-by: MaximumEntropy <[email protected]> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Add FS2 checkpoint links to docs and inference notebook (#2181) * Add FS2 checkpoint links to docs and inference notebook Signed-off-by: Jocelyn Huang <[email protected]> * Remove empty cell from TTS notebook Signed-off-by: Jocelyn Huang <[email protected]> Co-authored-by: Oleksii Kuchaiev <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Update ptl to 1.3 on main branch (#2178) * Update PTL Signed-off-by: smajumdar <[email protected]> * Begin update to Pytorch Lightning 1.3.x Signed-off-by: smajumdar <[email protected]> * Formatting Signed-off-by: smajumdar <[email protected]> * style Signed-off-by: ericharper <[email protected]> * Formatting Signed-off-by: smajumdar <[email protected]> * minor fix Signed-off-by: Jason <[email protected]> * minor fix Signed-off-by: Jason <[email protected]> * get testing attribute from trainer Signed-off-by: ericharper <[email protected]> * update init_ddp_connection override Signed-off-by: ericharper 
<[email protected]> * update attribute Signed-off-by: ericharper <[email protected]> * add barrier after load checkpoint in megatron Signed-off-by: ericharper <[email protected]> * remove barrier Signed-off-by: ericharper <[email protected]> * update last naming Signed-off-by: Jason <[email protected]> Co-authored-by: ericharper <[email protected]> Co-authored-by: Jason <[email protected]> Signed-off-by: Micha Livne <[email protected]> * SDE updates (#2187) * Added updates to SDE: - support for external vocabulary (to detect OOV words) - support for offset field (for segmented long recordings) - UI improvements Signed-off-by: Vitaly Lavrukhin <[email protected]> * Refactored diff in SDE Signed-off-by: Vitaly Lavrukhin <[email protected]> Co-authored-by: Oleksii Kuchaiev <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Add TTS aligner and improved version of g2p for vocabs.Phonemes, small improvement in TalkNet (#2189) * add first version of aligner Signed-off-by: Oktai Tatanov <[email protected]> * aligner docs, new g2p version, fix bugs in talknet Signed-off-by: Oktai Tatanov <[email protected]> * update docs and remove lj related code Signed-off-by: Oktai Tatanov <[email protected]> * fix style Signed-off-by: Oktai Tatanov <[email protected]> * fix import Signed-off-by: Oktai Tatanov <[email protected]> Co-authored-by: Oleksii Kuchaiev <[email protected]> Signed-off-by: Micha Livne <[email protected]> * set the default of nodessplitter to None. 
(#2190) Signed-off-by: Vahid <[email protected]> Signed-off-by: Micha Livne <[email protected]> * NMT fixes (#2194) * minor fixes Signed-off-by: Oleksii Kuchaiev <[email protected]> * minor bugfixes Signed-off-by: Oleksii Kuchaiev <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Store mappings file in .nemo for FS2 model (#2196) * Store mappings file in .nemo for FS2 model Signed-off-by: Jocelyn Huang <[email protected]> * Add error enforcing mappings file during training (FS2) Signed-off-by: Jocelyn Huang <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Add support to change the SE context window of ConvASREncoder (#2193) * Add support for changing context window on the fly Signed-off-by: smajumdar <[email protected]> * Add support to change the SE context window of ConvASREncoder Signed-off-by: smajumdar <[email protected]> * Add ability to skip config updating Signed-off-by: smajumdar <[email protected]> * Switch to mixin based API Signed-off-by: smajumdar <[email protected]> * Update docs and api for ASRModuleMixin Signed-off-by: smajumdar <[email protected]> * Change print to logging.info Signed-off-by: smajumdar <[email protected]> * Correct stride level when computing context window Signed-off-by: smajumdar <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Add a CI test for doing inference with an NMT model trained with Pre-LN (#2198) * Change label smoothing prob to reduce chance of test failure Signed-off-by: MaximumEntropy <[email protected]> * Add Pre-LN inference test to Jenkinsfile Signed-off-by: MaximumEntropy <[email protected]> * Separate tests for training and NMT inference Signed-off-by: MaximumEntropy <[email protected]> Co-authored-by: Oleksii Kuchaiev <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Fix ipywidgets error in asr notebook (#2199) Added `ipywidgets` to avoid `ImportError: IProgress not found. Please update jupyter and ipywidgets. 
See https://ipywidgets.readthedocs.io/en/stable/user_install.html` error. Signed-off-by: Derek Chia <[email protected]> Signed-off-by: Micha Livne <[email protected]> * metrics fix (#2202) * metrics fix Signed-off-by: ekmb <[email protected]> * metrics reset for punct model Signed-off-by: ekmb <[email protected]> Signed-off-by: Micha Livne <[email protected]> * readme and minor improvements (#2203) * readme and minor improvements Signed-off-by: nithinraok <[email protected]> * vad threshold update Signed-off-by: nithinraok <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Fix text processing docs (#2195) * fix text processing docs Signed-off-by: Yang Zhang <[email protected]> * fix name Signed-off-by: Yang Zhang <[email protected]> * add guard to pynini import Signed-off-by: Yang Zhang <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Fix bug in SpecCutout (#2201) Signed-off-by: Robert Bracco <[email protected]> Co-authored-by: Oleksii Kuchaiev <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Fix bug in SpecCutout (#2201) (#2205) Signed-off-by: Robert Bracco <[email protected]> Co-authored-by: Oleksii Kuchaiev <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Signed-off-by: smajumdar <[email protected]> Co-authored-by: Robert Bracco <[email protected]> Co-authored-by: Oleksii Kuchaiev <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Set seed before generating random tensors in NMT test (#2206) * Change label smoothing prob to reduce chance of test failure Signed-off-by: MaximumEntropy <[email protected]> * Set seed before generating tensors Signed-off-by: MaximumEntropy <[email protected]> Signed-off-by: Micha Livne <[email protected]> * ASR patches for v1.0.0 (#2207) * Multiple updates to RNNT add initialization Signed-off-by: smajumdar <[email protected]> * Correct name of initilization Signed-off-by: 
smajumdar <[email protected]> * Update dockerignore Signed-off-by: smajumdar <[email protected]> * Fix RNNT WER calculation Signed-off-by: smajumdar <[email protected]> * Address comments Signed-off-by: smajumdar <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Multilingual training for NMT (#2160) * mnmt on fresh main Signed-off-by: Abhinav Khattar <[email protected]> * push for test Signed-off-by: Abhinav Khattar <[email protected]> * debug Signed-off-by: Abhinav Khattar <[email protected]> * check Signed-off-by: Abhinav Khattar <[email protected]> * cleanup Signed-off-by: Abhinav Khattar <[email protected]> * minor fix Signed-off-by: Abhinav Khattar <[email protected]> * more minor fixes Signed-off-by: Abhinav Khattar <[email protected]> * fix for test Signed-off-by: Abhinav Khattar <[email protected]> * fix list size error Signed-off-by: Abhinav Khattar <[email protected]> * multilingual in infer Signed-off-by: Abhinav Khattar <[email protected]> * changes Signed-off-by: Abhinav Khattar <[email protected]> * tar creation with multilingual Signed-off-by: Abhinav Khattar <[email protected]> * fix Signed-off-by: Abhinav Khattar <[email protected]> * changes + parallelism + bug fix Signed-off-by: Abhinav Khattar <[email protected]> * small fix Signed-off-by: Abhinav Khattar <[email protected]> * multilingual preprocessor fix Signed-off-by: Abhinav Khattar <[email protected]> * globally unique fragment names in tarred dataset Signed-off-by: Abhinav Khattar <[email protected]> * minor changes Signed-off-by: Abhinav Khattar <[email protected]> * rm load_from_cached_dataset Signed-off-by: Abhinav Khattar <[email protected]> * minor config change Signed-off-by: Abhinav Khattar <[email protected]> * rm unsued import Signed-off-by: Abhinav Khattar <[email protected]> Co-authored-by: Oleksii Kuchaiev <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Remove memory leak from ASR notebook + update model notebook (#2213) * ASR patches for 
v1.0.0 (#2207) * Multiple updates to RNNT add initialization Signed-off-by: smajumdar <[email protected]> * Correct name of initilization Signed-off-by: smajumdar <[email protected]> * Update dockerignore Signed-off-by: smajumdar <[email protected]> * Fix RNNT WER calculation Signed-off-by: smajumdar <[email protected]> * Address comments Signed-off-by: smajumdar <[email protected]> * Correct model notebook to log the loss and correctly assign keys Signed-off-by: smajumdar <[email protected]> Signed-off-by: Micha Livne <[email protected]> * replace names in vad tutorials (#2220) Signed-off-by: fayejf <[email protected]> Signed-off-by: Micha Livne <[email protected]> * fix the versioning name. (#2209) * fix the versioning name. Signed-off-by: Vahid <[email protected]> * Made version None. Signed-off-by: Vahid <[email protected]> Co-authored-by: Oleksii Kuchaiev <[email protected]> Co-authored-by: Jason <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Enabled passing kwargs to export() (#2175) * Enabled passing kwargs to export() Signed-off-by: Boris Fomitchev <[email protected]> * Fixing style; changed Classifier input_example to new extended syntax Signed-off-by: Boris Fomitchev <[email protected]> * Fixed order of forward() call in export Signed-off-by: Boris Fomitchev <[email protected]> * Fixing style Signed-off-by: Boris Fomitchev <[email protected]> Co-authored-by: Oleksii Kuchaiev <[email protected]> Signed-off-by: Micha Livne <[email protected]> * update g2p: ambigious ignore, flag for skipping seq2seq (#2223) Signed-off-by: Oktai Tatanov <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Update TTS notebook with TalkNet inference (#2133) * Update TTS notebook with TalkNet inference. Signed-off-by: Stanislav Beliaev <[email protected]> * Update TTS Notebook. Signed-off-by: Stanislav Beliaev <[email protected]> * Update TTS TN Training Notebook. Signed-off-by: Stanislav Beliaev <[email protected]> * Fix TN paper link. 
Signed-off-by: Stanislav Beliaev <[email protected]> * Remove branch updaing TODOs. Signed-off-by: Stanislav Beliaev <[email protected]> Signed-off-by: Micha Livne <[email protected]> * update speaker notebooks (#2224) Signed-off-by: nithinraok <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Support symlinked files (#2216) Signed-off-by: Anas Abou Allaban <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Set strict=True everywhere by default. (#2225) Signed-off-by: Oleksii Kuchaiev <[email protected]> Signed-off-by: Micha Livne <[email protected]> * set strict=True in nlp_model (#2227) Signed-off-by: Oleksii Kuchaiev <[email protected]> Signed-off-by: Micha Livne <[email protected]> * set strict=False for model parallel examples Signed-off-by: Oleksii Kuchaiev <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Make Text processing installation optional via reinstall.sh (#2226) * Make Text processing installation optional via reinstall.sh Signed-off-by: smajumdar <[email protected]> * Support both success and failure states Signed-off-by: smajumdar <[email protected]> Co-authored-by: Oleksii Kuchaiev <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Transformer final norm preln (#2197) * fix pre_ln final norm Signed-off-by: fayejf <[email protected]> * style fix Signed-off-by: fayejf <[email protected]> * bug fixed Signed-off-by: fayejf <[email protected]> * bugfix post_ln Signed-off-by: fayejf <[email protected]> * update and add pre_ln_final_norm Signed-off-by: fayejf <[email protected]> * style fix Signed-off-by: fayejf <[email protected]> * fix for unit test Signed-off-by: fayejf <[email protected]> * rename final_norm to final_layer_norm Signed-off-by: fayejf <[email protected]> * bug fix Signed-off-by: fayejf <[email protected]> * tiny fix Signed-off-by: fayejf <[email protected]> * fix and improve Signed-off-by: fayejf <[email 
protected]> * tiny fix Signed-off-by: fayejf <[email protected]> * Patch for NMT to allow loading old modlels trained with pre-LN Signed-off-by: MaximumEntropy <[email protected]> Co-authored-by: Sandeep Subramanian <[email protected]> Co-authored-by: Oleksii Kuchaiev <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Update models and notebook for 1.0 (#2211) * update models Signed-off-by: Jason <[email protected]> * updates Signed-off-by: Jason <[email protected]> * fix Signed-off-by: Jason <[email protected]> * add links Signed-off-by: Jason <[email protected]> * fixes Signed-off-by: Jason <[email protected]> * style Signed-off-by: Jason <[email protected]> * update checkpoints Signed-off-by: Jason <[email protected]> * rename Signed-off-by: Jason <[email protected]> * fixes Signed-off-by: Jason <[email protected]> * lgtm Signed-off-by: Jason <[email protected]> * fix loading waveglow Signed-off-by: Jason <[email protected]> * typo Signed-off-by: Jason <[email protected]> Co-authored-by: Oleksii Kuchaiev <[email protected]> Signed-off-by: Micha Livne <[email protected]> * update_metrics_classification_models (#2228) Signed-off-by: nithinraok <[email protected]> Co-authored-by: fayejf <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Data loader for seq of label model (#2084) * feature to seq label data loader Signed-off-by: fayejf <[email protected]> * style fix Signed-off-by: fayejf <[email protected]> * small fix Signed-off-by: fayejf <[email protected]> * update tl to be length of seq label Signed-off-by: fayejf <[email protected]> * style fix Signed-off-by: fayejf <[email protected]> * tiny bug fix Signed-off-by: fayejf <[email protected]> * small updates Signed-off-by: fayejf <[email protected]> * updates for review feedback Signed-off-by: fayejf <[email protected]> * style fix Signed-off-by: fayejf <[email protected]> * explain seq_label Signed-off-by: fayejf <[email protected]> * fix lgtm Signed-off-by: fayejf <[email 
protected]> * small updates Signed-off-by: fayejf <[email protected]> * improve as discussed Signed-off-by: fayejf <[email protected]> * add docstring Signed-off-by: fayejf <[email protected]> Co-authored-by: Nithin Rao <[email protected]> Signed-off-by: Micha Livne <[email protected]> * fix comments (#2236) * fix comments Signed-off-by: Yang Zhang <[email protected]> * fix typo Signed-off-by: Yang Zhang <[email protected]> Signed-off-by: Micha Livne <[email protected]> * add paper ref to sgdqa model doc (#2233) * add paper ref to sgdqa model doc Signed-off-by: Yang Zhang <[email protected]> * fix comments Signed-off-by: Yang Zhang <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Move ConcatDataset to common (#2237) * move concatdataset to common Signed-off-by: Abhinav Khattar <[email protected]> * var name change Signed-off-by: Abhinav Khattar <[email protected]> Signed-off-by: Micha Livne <[email protected]> * audio based normalization (#2231) * squash norm_audio Signed-off-by: ekmb <[email protected]> * add missing files Signed-off-by: ekmb <[email protected]> * style Signed-off-by: ekmb <[email protected]> * unit tests added, docstrings fixed Signed-off-by: ekmb <[email protected]> * fix lgtm errors Signed-off-by: ekmb <[email protected]> * debug jenkins Signed-off-by: ekmb <[email protected]> * debug jenkins Signed-off-by: ekmb <[email protected]> * signature update Signed-off-by: ekmb <[email protected]> * set deterministic default Signed-off-by: ekmb <[email protected]> * add more test cases Signed-off-by: ekmb <[email protected]> Signed-off-by: Micha Livne <[email protected]> * bug fix config (#2232) Signed-off-by: fayejf <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Alias Swish to SiLU (#2239) * Alias Swish to SiLU and move activations to inplace execution if possible Signed-off-by: smajumdar <[email protected]> * Remove unused import Signed-off-by: smajumdar 
<[email protected]> Signed-off-by: Micha Livne <[email protected]> * Update README.rst Signed-off-by: Micha Livne <[email protected]> * Offline asr notebook bug fix (#2242) * fix Signed-off-by: fayejf <[email protected]> * install Signed-off-by: fayejf <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Fix docstring (#2244) * fix comments Signed-off-by: Yang Zhang <[email protected]> * fix doc string Signed-off-by: Yang Zhang <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Update "last" Checkpoint (#2241) * fix Signed-off-by: Jason <[email protected]> * change Signed-off-by: Jason <[email protected]> * fix Signed-off-by: Jason <[email protected]> Co-authored-by: Sandeep Subramanian <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Add pretrained model stt_es_citrinet_512 (#2247) Signed-off-by: Elena Rastorgueva <[email protected]> Signed-off-by: Micha Livne <[email protected]> * [BUGFIX] Only process tarfile artifacts when model was restored from tarfile (#2250) * process tarfile artifacts only if model is being restored Signed-off-by: ericharper <[email protected]> * process tarfile artifacts only if model was restored from a tarfile Signed-off-by: ericharper <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Log average metrics for Multi-validation in NMT (#2251) * add avg metrics NMT Signed-off-by: Abhinav Khattar <[email protected]> * name change Signed-off-by: Abhinav Khattar <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Update Primer notebook (#2258) Signed-off-by: smajumdar <[email protected]> Signed-off-by: Micha Livne <[email protected]> * fixed Bug 3310780 and 3310799 (#2264) Signed-off-by: Virginia Adams <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Support multiple models being instantiated in same execution scope (#2245) * Support multiple models being instantiated in same execution scope Signed-off-by: smajumdar <[email protected]> 
* Fix tests Signed-off-by: smajumdar <[email protected]> * Add locks to methods in appstate Signed-off-by: smajumdar <[email protected]> * Perform locks only on write operations Signed-off-by: smajumdar <[email protected]> * Correct deadlock issue Signed-off-by: smajumdar <[email protected]> * Add more tests Signed-off-by: smajumdar <[email protected]> * Add test for multi save and remove patch to change save type Signed-off-by: smajumdar <[email protected]> * Update app state to preserve gidx of previous token Signed-off-by: smajumdar <[email protected]> * Correct restoration logic for tarfiles Signed-off-by: smajumdar <[email protected]> Signed-off-by: Micha Livne <[email protected]> * ASR Refactoring (#2240) * Refactor out the preprocessing from ASR into common Signed-off-by: smajumdar <[email protected]> * Correct nltk issue with vocabs.py for clusters Signed-off-by: smajumdar <[email protected]> * Add typing information to SpecAugment and SpecCutout Signed-off-by: smajumdar <[email protected]> * Reorganize parts directory Signed-off-by: smajumdar <[email protected]> * Refactor parts submodules, add __init__ to few important parts Signed-off-by: smajumdar <[email protected]> * Update docs for new path to parts Signed-off-by: smajumdar <[email protected]> * Cherry pick PR https://github.com/NVIDIA/NeMo/pull/2219 Signed-off-by: smajumdar <[email protected]> * Add header for preprocessing commons Signed-off-by: smajumdar <[email protected]> * Fix style of tests Signed-off-by: smajumdar <[email protected]> * Add forced update of configs for train-val-test ds to new labels tests Signed-off-by: smajumdar <[email protected]> * Update path to FilterbankFeatures for TTS Signed-off-by: smajumdar <[email protected]> * Add an alias file for backward compatibility Signed-off-by: smajumdar <[email protected]> * Add an alias file for backward compatibility Signed-off-by: smajumdar <[email protected]> * Update training scripts of ASR to support finetuning Signed-off-by: 
smajumdar <[email protected]> * Update Finetuning step to be ModelPT level Signed-off-by: smajumdar <[email protected]> * Update docs for finetuning for ASR Signed-off-by: smajumdar <[email protected]> * Fix style Signed-off-by: smajumdar <[email protected]> * Update docs and scripts with fine-tuning info Signed-off-by: smajumdar <[email protected]> * Update docs and scripts with fine-tuning info Signed-off-by: smajumdar <[email protected]> * Fix style Signed-off-by: smajumdar <[email protected]> * Update scripts Signed-off-by: smajumdar <[email protected]> * Add comment for weight initialization Signed-off-by: smajumdar <[email protected]> Signed-off-by: Micha Livne <[email protected]> * TTS Doc Fix and Remove TTS Test (#2272) * bug fix and remove test Signed-off-by: Jason <[email protected]> * syntax Signed-off-by: Jason <[email protected]> * syntax Signed-off-by: Jason <[email protected]> * syntax Signed-off-by: Jason <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Talknet training Fix (#2273) * TalkNet Training notebook fix. Signed-off-by: Stanislav Beliaev <[email protected]> * Remove debug stuff. 
Signed-off-by: Stanislav Beliaev <[email protected]> Signed-off-by: Micha Livne <[email protected]> * update (#2274) Signed-off-by: Jason <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Add links (#2275) * update Signed-off-by: Jason <[email protected]> * link Signed-off-by: Jason <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Delete 3_TTS_TalkNet_Training.ipynb (#2276) Signed-off-by: Jason <[email protected]> Signed-off-by: Micha Livne <[email protected]> * tune down logging (#2277) * tune down logging Signed-off-by: Oleksii Kuchaiev <[email protected]> * debug message instead of removing it completely Signed-off-by: Oleksii Kuchaiev <[email protected]> Signed-off-by: Micha Livne <[email protected]> * minor bugfix Signed-off-by: Oleksii Kuchaiev <[email protected]> Signed-off-by: Micha Livne <[email protected]> * remove confusing message Signed-off-by: Oleksii Kuchaiev <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Restore TalkNet training notebook (#2281) * Restore TalkNet training notebook. Signed-off-by: Stanislav Beliaev <[email protected]> * Remove torchaudio dep. 
Signed-off-by: Stanislav Beliaev <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Fix ExpManager Issues and FastPitch (#2283) * backport exp_manager fixes to v1 Signed-off-by: Jason <[email protected]> * fix fastpitch Signed-off-by: Jason <[email protected]> * fix tests Signed-off-by: Jason <[email protected]> * update prefix Signed-off-by: Jason <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Organize asr config folders (#2284) Signed-off-by: Micha Livne <[email protected]> * Fix and enable DALI tests (#2077) * Fix and enable DALI tests Signed-off-by: Joaquin Anton <[email protected]> * remove unused import Signed-off-by: Joaquin Anton <[email protected]> * Move DALI tests to a separate Jenkins stage Signed-off-by: Joaquin Anton <[email protected]> * Remove DALI tests from the main jenkins ASR stage Signed-off-by: Joaquin Anton <[email protected]> * Comment out MFCC test Signed-off-by: Joaquin Anton <[email protected]> * Working version Signed-off-by: Joaquin Anton <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Added unit test for hifigan export, fixed hifigan export (#2279) * Added unit test for hifigan export, Removed runtime test from waveglow test (now in export) Signed-off-by: Boris Fomitchev <[email protected]> * Fixed style Signed-off-by: Boris Fomitchev <[email protected]> * Fixed style Signed-off-by: Boris Fomitchev <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Update conformer recipes (#2265) * updated readme asr. Signed-off-by: Vahid <[email protected]> * added models. Signed-off-by: Vahid <[email protected]> * fixed docs. Signed-off-by: Vahid <[email protected]> * fixed docs. Signed-off-by: Vahid <[email protected]> * fixed docs. Signed-off-by: Vahid <[email protected]> * fixed docs. Signed-off-by: Vahid <[email protected]> * fixed the docs. Signed-off-by: Vahid <[email protected]> * fixed the docs. 
Signed-off-by: Vahid <[email protected]> * fixed the docs. Signed-off-by: Vahid <[email protected]> * disabled test. Signed-off-by: Vahid <[email protected]> * Updated the config files. Signed-off-by: Vahid <[email protected]> * Updated the config files. Signed-off-by: Vahid <[email protected]> * Updated the config files. Signed-off-by: Vahid <[email protected]> * dropped the wers. Signed-off-by: Vahid <[email protected]> * dropped the wers. Signed-off-by: Vahid <[email protected]> * dropped new models and reverted to old versions. Signed-off-by: Vahid <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Co-authored-by: Oleksii Kuchaiev <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Adding neural rescorer and its documentation (#2287) * Added initial neural rescorer scripts. Signed-off-by: Vahid <[email protected]> * Added initial neural rescorer scripts. Signed-off-by: Vahid <[email protected]> * Added initial neural rescorer scripts. Signed-off-by: Vahid <[email protected]> * Added initial neural rescorer scripts. Signed-off-by: Vahid <[email protected]> * Added initial neural rescorer scripts. Signed-off-by: Vahid <[email protected]> * Added initial neural rescorer scripts. Signed-off-by: Vahid <[email protected]> * Added initial neural rescorer scripts. Signed-off-by: Vahid <[email protected]> * Added initial neural rescorer scripts. Signed-off-by: Vahid <[email protected]> * Added initial neural rescorer scripts. Signed-off-by: Vahid <[email protected]> * added more docs, figures, and output file. Signed-off-by: Vahid <[email protected]> * added more docs, figures, and output file. Signed-off-by: Vahid <[email protected]> * added more docs, figures, and output file. Signed-off-by: Vahid <[email protected]> * added more docs, figures, and output file. Signed-off-by: Vahid <[email protected]> * added more docs, figures, and output file. 
Signed-off-by: Vahid <[email protected]> * fixed style Signed-off-by: Vahid <[email protected]> * add a note to asr notebook. Signed-off-by: Vahid <[email protected]> * add a note to asr notebook. Signed-off-by: Vahid <[email protected]> * add a note to asr notebook. Signed-off-by: Vahid <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Adjust warning messages Signed-off-by: Oleksii Kuchaiev <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Revert "Adjust warning messages" This reverts commit df046ec55754d0136a2a28451435068f32409f30. Signed-off-by: Micha Livne <[email protected]> * Adjust warning messages (#2294) Signed-off-by: Oleksii Kuchaiev <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Adding new Models releases on NGC. (#2295) * added new models. Signed-off-by: Vahid <[email protected]> * added tests for asr lm. Signed-off-by: Vahid <[email protected]> * added tests for asr lm. Signed-off-by: Vahid <[email protected]> * dropped the test. 
Signed-off-by: Vahid <[email protected]> Signed-off-by: Micha Livne <[email protected]> * update quantization (#2298) Signed-off-by: slyned <[email protected]> Co-authored-by: slyned <[email protected]> Signed-off-by: Micha Livne <[email protected]> * ASR improvements (#2293) * Update numba messages and citrinet configs Signed-off-by: smajumdar <[email protected]> * Remove support for weight init scale and hidden hidden bias scale for layer normalized lstm Signed-off-by: smajumdar <[email protected]> * Add support for multiple filetypes in tarred datasets, correct rnn LN-lstm inputs, fix OmegaConf compat issue Signed-off-by: smajumdar <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Time quarter to (#2292) * fix comments Signed-off-by: Yang Zhang <[email protected]> * fix doc string Signed-off-by: Yang Zhang <[email protected]> * adding quarter to to time class Signed-off-by: Yang Zhang <[email protected]> * fix typo Signed-off-by: Yang Zhang <[email protected]> Signed-off-by: Micha Livne <[email protected]> * fixed paths. 
(#2301) Signed-off-by: Vahid <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Added onnxruntime check of exported ONNX, bumped up default ONNX opset (#2278) * Added onnxruntime check of exported ONNX, bumped up default ONNX opset Signed-off-by: Boris Fomitchev <[email protected]> * Made TS export to accept ONNX-style input example, removed unused param to export Signed-off-by: Boris Fomitchev <[email protected]> * check_trace default made False Signed-off-by: Boris Fomitchev <[email protected]> * Fixed for updated export signature Signed-off-by: Boris Fomitchev <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Update readmes Signed-off-by: Oleksii Kuchaiev <[email protected]> Signed-off-by: Micha Livne <[email protected]> * update readme Signed-off-by: Oleksii Kuchaiev <[email protected]> Signed-off-by: Micha Livne <[email protected]> * update readme Signed-off-by: Oleksii Kuchaiev <[email protected]> Signed-off-by: Micha Livne <[email protected]> * fix docs table Signed-off-by: Oleksii Kuchaiev <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Add support for Numba CUDA optimized SpecAugment (#2269) * Initial implementation Signed-off-by: smajumdar <[email protected]> * Initial implementation Signed-off-by: smajumdar <[email protected]> * Finish initial implementation of numba spec augment Signed-off-by: smajumdar <[email protected]> * Correct mask propagation Signed-off-by: smajumdar <[email protected]> * Parallelize kernel over batch instead of over masks Signed-off-by: smajumdar <[email protected]> * Finish tests and update to signature of spectrogramaugmentation calls Signed-off-by: smajumdar <[email protected]> * Finish tests and update to signature of spectrogramaugmentation calls Signed-off-by: smajumdar <[email protected]> * Add header Signed-off-by: smajumdar <[email protected]> * Fix style Signed-off-by: smajumdar <[email protected]> * Add heuristics Signed-off-by: smajumdar <[email protected]> * 
Correct inclusive range of padding Signed-off-by: smajumdar <[email protected]> * Correct typing for spec aug numba Signed-off-by: smajumdar <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Added JSON manifest's support to transcribe_speech.py (#2304) * Added JSON manifest's support to transcribe_speech.py Signed-off-by: Vitaly Lavrukhin <[email protected]> * Dropped unused import Signed-off-by: Vitaly Lavrukhin <[email protected]> Co-authored-by: Oleksii Kuchaiev <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Signed-off-by: Micha Livne <[email protected]> * get embedding for a single file (#2310) * get embedding for a single file Signed-off-by: nithinraok <[email protected]> * fixes Signed-off-by: nithinraok <[email protected]> * sr update Signed-off-by: nithinraok <[email protected]> * regain train mode Signed-off-by: nithinraok <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Update FastPitch (#2249) * wip Signed-off-by: Jason <[email protected]> * c1 Signed-off-by: Jason <[email protected]> * bug fixes Signed-off-by: Jason <[email protected]> * bug fixes Signed-off-by: Jason <[email protected]> * v2 Signed-off-by: Jason <[email protected]> * changes Signed-off-by: Jason <[email protected]> * add types, old model working Signed-off-by: Jason <[email protected]> * pitch Signed-off-by: Jason <[email protected]> * update Signed-off-by: Jason <[email protected]> * update Signed-off-by: Jason <[email protected]> * let it work Signed-off-by: Jason <[email protected]> * fixes Signed-off-by: Jason <[email protected]> * add oktai comments Signed-off-by: Jason <[email protected]> * debug Signed-off-by: Jason <[email protected]> * scale Signed-off-by: Jason <[email protected]> * wip Signed-off-by: Jason <[email protected]> * fix test for v1 Signed-off-by: Jason <[email protected]> * merge train and val Signed-off-by: Jason <[email protected]> * back to par bin att, add correct encoder settings 
Signed-off-by: Jason <[email protected]> * try Signed-off-by: Jason <[email protected]> * undo Signed-off-by: Jason <[email protected]> * lgtm: Signed-off-by: Jason <[email protected]> * style Signed-off-by: Jason <[email protected]> * default to ljs Signed-off-by: Jason <[email protected]> Signed-off-by: Micha Livne <[email protected]> * patch quantization (#2314) * update quantization Signed-off-by: slyned <[email protected]> * update quant infer trt Signed-off-by: slyned <[email protected]> * fix style Signed-off-by: slyned <[email protected]> Co-authored-by: slyned <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Pin OmegaConf version for 1.0.0 (#2316) * Update OmegaConf compatibility Signed-off-by: smajumdar <[email protected]> * Correct OmegaConf.pretty() Signed-off-by: smajumdar <[email protected]> * Upper bound omegaconf Signed-off-by: smajumdar <[email protected]> * Revert "Correct OmegaConf.pretty()" This reverts commit 6ebae2ef Signed-off-by: smajumdar <[email protected]> * Revert "Update OmegaConf compatibility" This reverts commit 83b2cf35a07a742552082e80e6ca34c9b8203cbc. 
Signed-off-by: smajumdar <[email protected]> Signed-off-by: Micha Livne <[email protected]> * [BUGFIX] OmegaConf forward compatibility (#2319) * Update OmegaConf compatibility Signed-off-by: smajumdar <[email protected]> Signed-off-by: ericharper <[email protected]> * Correct OmegaConf.pretty() Signed-off-by: smajumdar <[email protected]> Signed-off-by: ericharper <[email protected]> * upper bound omegaconf Signed-off-by: ericharper <[email protected]> * add if,else back Signed-off-by: ericharper <[email protected]> * typo Signed-off-by: ericharper <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Signed-off-by: Micha Livne <[email protected]> * bumping version to 1.0.1 Signed-off-by: Oleksii Kuchaiev <[email protected]> Signed-off-by: Micha Livne <[email protected]> * fix_cluster_small_sample (#2303) * fix_cluster_small_sample Signed-off-by: nithinraok <[email protected]> * for smaller samples Signed-off-by: nithinraok <[email protected]> * remove type Signed-off-by: nithinraok <[email protected]> * similarity matrix Signed-off-by: nithinraok <[email protected]> * est num of speakers add Signed-off-by: nithinraok <[email protected]> * comment update Signed-off-by: nithinraok <[email protected]> * style fix Signed-off-by: nithinraok <[email protected]> * MIN_SAMPLES passed through func arg Signed-off-by: nithinraok <[email protected]> * doc string update Signed-off-by: nithinraok <[email protected]> * spell mistake Signed-off-by: nithinraok <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Fastpitch export (#2300) * wip Signed-off-by: Jason <[email protected]> * c1 Signed-off-by: Jason <[email protected]> * bug fixes Signed-off-by: Jason <[email protected]> * bug fixes Signed-off-by: Jason <[email protected]> * v2 Signed-off-by: Jason <[email protected]> * changes Signed-off-by: Jason <[email protected]> * add types, old model working Signed-off-by: Jason <[email protected]> * pitch Signed-off-by: Jason <[email 
protected]> * update Signed-off-by: Jason <[email protected]> * update Signed-off-by: Jason <[email protected]> * let it work Signed-off-by: Jason <[email protected]> * fixes Signed-off-by: Jason <[email protected]> * add oktai comments Signed-off-by: Jason <[email protected]> * debug Signed-off-by: Jason <[email protected]> * scale Signed-off-by: Jason <[email protected]> * wip Signed-off-by: Jason <[email protected]> * fix test for v1 Signed-off-by: Jason <[email protected]> …
* Itn add classes (#2141) * move do_training flag to config Signed-off-by: Yang Zhang <[email protected]> * added telephone to itn Signed-off-by: Yang Zhang <[email protected]> * add telephone and email to itn Signed-off-by: Yang Zhang <[email protected]> Signed-off-by: Micha Livne <[email protected]> * ASR + NLP Doc Fixes (#2136) * Preserve the tokenizer config for ASR Signed-off-by: smajumdar <[email protected]> * Correct nlp docs Signed-off-by: smajumdar <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Removing graphsurgeon optional dependency, improving import error rep… (#2144) * Removing graphsurgeon optional dependency, improving import error reporting Signed-off-by: Boris Fomitchev <[email protected]> * Fixing scope error Signed-off-by: Boris Fomitchev <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Fix FilterbankFeatures eval nondeterminism. (#2146) Signed-off-by: PiotrDabkowski <[email protected]> Signed-off-by: Micha Livne <[email protected]> * fix the docs. 
(#2148) Signed-off-by: Micha Livne <[email protected]> * Text processing refactor (#2149) * removed graphutils, suppletive, data_loader_utils from itn to be reused from tn Signed-off-by: Yang Zhang <[email protected]> * inheriting itn from tn, thus removing redundancy Signed-off-by: Yang Zhang <[email protected]> * cleaned whitelist Signed-off-by: Yang Zhang <[email protected]> * lgtm fix Signed-off-by: Yang Zhang <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Update how artifacts work (#2138) * Update how artifacts work Signed-off-by: Oleksii Kuchaiev <[email protected]> * fixing some tests Signed-off-by: Oleksii Kuchaiev <[email protected]> * fix more tests Signed-off-by: Oleksii Kuchaiev <[email protected]> * add __init__ to tests to make them discoverable Signed-off-by: Oleksii Kuchaiev <[email protected]> * empty src support Signed-off-by: Oleksii Kuchaiev <[email protected]> * updates plus unittest Signed-off-by: Oleksii Kuchaiev <[email protected]> * add copyright check Signed-off-by: Oleksii Kuchaiev <[email protected]> * copyright header Signed-off-by: Oleksii Kuchaiev <[email protected]> * fix style Signed-off-by: Oleksii Kuchaiev <[email protected]> * handle hashed megatron checkpoint version in nlp restore_from Signed-off-by: ericharper <[email protected]> * add _MODEL_RESTORE_PATH to AppState Signed-off-by: ericharper <[email protected]> * get rid of global folder caching Signed-off-by: Oleksii Kuchaiev <[email protected]> * double register - warning instead of exception Signed-off-by: Oleksii Kuchaiev <[email protected]> * Add asr spe tests Signed-off-by: smajumdar <[email protected]> * Pop out asr wpe pre-registered value Signed-off-by: smajumdar <[email protected]> * Correct ASR tests and paths Signed-off-by: smajumdar <[email protected]> * Correct tokenizer saving Signed-off-by: smajumdar <[email protected]> * Correct ASR tests Signed-off-by: smajumdar <[email protected]> * Correct ASR bpe mixin Signed-off-by: smajumdar 
<[email protected]> * Patch up backward compatibility Signed-off-by: smajumdar <[email protected]> * update register_bert_model Signed-off-by: ericharper <[email protected]> * update all get_lm_model calls Signed-off-by: ericharper <[email protected]> * return None if src not found Signed-off-by: ericharper <[email protected]> * handle case with no tokenizer Signed-off-by: ericharper <[email protected]> * do not add another hash if using tarfile_artifacts Signed-off-by: ericharper <[email protected]> * add return_none flag, update doc string Signed-off-by: ericharper <[email protected]> * update default behavior of register_artifact for NLPModel Signed-off-by: ericharper <[email protected]> * change kwarg name to verify_src_exists Signed-off-by: ericharper <[email protected]> * use cfg instead of _cfg Signed-off-by: Oleksii Kuchaiev <[email protected]> * some cleanups Signed-off-by: Oleksii Kuchaiev <[email protected]> Co-authored-by: ericharper <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Language model refactoring (#2120) * fixed branch in IR tutorial Signed-off-by: AlexGrinch <[email protected]> * bucketing tarred dataset for lm training Signed-off-by: AlexGrinch <[email protected]> * updated global rank Signed-off-by: AlexGrinch <[email protected]> * perplexity update Signed-off-by: AlexGrinch <[email protected]> * refactor lm to be compatible with latest nmt Signed-off-by: AlexGrinch <[email protected]> * perplexity change Signed-off-by: AlexGrinch <[email protected]> * removed obsolete config Signed-off-by: AlexGrinch <[email protected]> * added sequence perplexity Signed-off-by: AlexGrinch <[email protected]> * added non-smoothed CE loss for validation Signed-off-by: AlexGrinch <[email protected]> * unified sentence dataset, torchmetrics for sequence perplexity Signed-off-by: AlexGrinch <[email protected]> * translate_ddp refactor Signed-off-by: AlexGrinch <[email protected]> 
Co-authored-by: Oleksii Kuchaiev <[email protected]> Co-authored-by: Sandeep Subramanian <[email protected]> Signed-off-by: Micha Livne <[email protected]> * [NMT] Multi-validation Patch (#2150) * rename dl index 0 loss and sacrebleu for backwards compatibility Signed-off-by: ericharper <[email protected]> * eval -> val/tst Signed-off-by: ericharper <[email protected]> * instantiate torchmetrics after instantiating dataloaders Signed-off-by: ericharper <[email protected]> * bug Signed-off-by: ericharper <[email protected]> * remove debugging log Signed-off-by: ericharper <[email protected]> * remove debugging log Signed-off-by: ericharper <[email protected]> Co-authored-by: Oleksii Kuchaiev <[email protected]> Signed-off-by: Micha Livne <[email protected]> * bumping version to 1.0.0 Signed-off-by: Oleksii Kuchaiev <[email protected]> Signed-off-by: Micha Livne <[email protected]> * fixed the num_samples of text classification model. (#2152) Signed-off-by: Vahid <[email protected]> Signed-off-by: Micha Livne <[email protected]> * fix for electronic (#2153) * fix for electronic Signed-off-by: ekmb <[email protected]> * special symbols added Signed-off-by: ekmb <[email protected]> * restrict symbols list Signed-off-by: ekmb <[email protected]> Signed-off-by: Micha Livne <[email protected]> * FastSpeech 2 Test & Docs (#2143) * Add FS2 data loading test Signed-off-by: Jocelyn Huang <[email protected]> * TTS docs update for FastSpeech 2 Signed-off-by: Jocelyn Huang <[email protected]> * Style fix for FS2 dataset test Signed-off-by: Jocelyn Huang <[email protected]> * Fix transpose typo Signed-off-by: Jocelyn Huang <[email protected]> Co-authored-by: Oleksii Kuchaiev <[email protected]> Co-authored-by: Jason <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Minor patch for translate_ddp (#2155) * Patch for backtranslation in lm dataset Signed-off-by: MaximumEntropy <[email protected]> * One more fix Signed-off-by: MaximumEntropy <[email protected]> 
Signed-off-by: Micha Livne <[email protected]> * Entity linking (#2050) * Started adding SAP dataset Signed-off-by: Virginia Adams <[email protected]> * Delete .lm_bert_dataset.py.swp Signed-off-by: Virginia Adams <[email protected]> * Added dataset and loss Signed-off-by: Virginia Adams <[email protected]> * Added entity linking encoder model Signed-off-by: Virginia Adams <[email protected]> * Can build and use index from pubmedbert model Signed-off-by: Virginia Adams <[email protected]> * checked boolean logic in build_index.py Signed-off-by: Virginia Adams <[email protected]> * End to end tested all functionality Signed-off-by: Virginia Adams <[email protected]> * fixed val loss none at end of validation Signed-off-by: Virginia Adams <[email protected]> * Started adding demo entity linking notebook Signed-off-by: Virginia Adams <[email protected]> * adding in notebook demo Signed-off-by: Virginia Adams <[email protected]> * added call to entitylinking classes in __init__.py files Signed-off-by: Virginia Adams <[email protected]> * Added eval code to notebook Signed-off-by: Virginia Adams <[email protected]> * Adding unfinished notebook Signed-off-by: Virginia Adams <[email protected]> * Cleaned up example dir Signed-off-by: Virginia Adams <[email protected]> * Fixed recap commands Signed-off-by: Virginia Adams <[email protected]> * added model typing and tiny data tar Signed-off-by: Virginia Adams <[email protected]> * Adding tiny data zip Signed-off-by: Virginia Adams <[email protected]> * updated tiny example config data path Signed-off-by: Virginia Adams <[email protected]> * Notebook demo works Signed-off-by: Virginia Adams <[email protected]> * Changed training epochs Signed-off-by: Virginia Adams <[email protected]> * Removed output from training and install cells Signed-off-by: Virginia Adams <[email protected]> * changed code formatting Signed-off-by: Virginia Adams <[email protected]> * Started doc string for new functions Signed-off-by: Virginia Adams 
<[email protected]> * Updated data_preprocessing to save to data_dir Signed-off-by: Virginia Adams <[email protected]> * fixed comment in notebook demo Signed-off-by: Virginia Adams <[email protected]> * Update data_preprocessing.py Signed-off-by: Virginia Adams <[email protected]> * updated nemo typing imports Signed-off-by: Virginia Adams <[email protected]> * about to rebase Signed-off-by: Virginia Adams <[email protected]> * added back umls_dataset_processing.py Signed-off-by: Virginia Adams <[email protected]> * Removed example data Signed-off-by: Virginia Adams <[email protected]> * Fixed typos in notebook demo Signed-off-by: Virginia Adams <[email protected]> * fixed lgtm-com issues Signed-off-by: Virginia Adams <[email protected]> * added copyright headers Signed-off-by: Virginia Adams <[email protected]> * fixed import and copyright headers Signed-off-by: Virginia Adams <[email protected]> * Fixed formatting Signed-off-by: Virginia Adams <[email protected]> * Fixed formatting changes 2 Signed-off-by: Virginia Adams <[email protected]> * fixed test formatting Signed-off-by: Virginia Adams <[email protected]> * Added __init__.py for model and dataset Signed-off-by: Virginia Adams <[email protected]> * loading newline file returns data_dir now Signed-off-by: Virginia Adams <[email protected]> * Removed conf notebook and deleted comment Signed-off-by: Virginia Adams <[email protected]> * Added jenkins test Signed-off-by: Virginia Adams <[email protected]> * Updated Jenkins test Signed-off-by: Virginia Adams <[email protected]> * fixed file path Signed-off-by: Virginia Adams <[email protected]> * Changed Jenkins pipeline order Signed-off-by: Virginia Adams <[email protected]> * Fixed Jenkins datapath... again... 
Signed-off-by: Virginia Adams <[email protected]> * Made most review changes Signed-off-by: Virginia Adams <[email protected]> * fixed copy right Signed-off-by: Virginia Adams <[email protected]> * updated unit test to wget config Signed-off-by: Virginia Adams <[email protected]> * reverted test file back Signed-off-by: Virginia Adams <[email protected]> * Added project dir to jenkins test Signed-off-by: Virginia Adams <[email protected]> * defined config in unit test Signed-off-by: Virginia Adams <[email protected]> Co-authored-by: Sandeep Subramanian <[email protected]> Co-authored-by: Oleksii Kuchaiev <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Correct branch version for v1.0.0 (#2157) * Correct branch version Signed-off-by: smajumdar <[email protected]> * Correct Jenkinsfile Signed-off-by: smajumdar <[email protected]> * Update rst files Signed-off-by: smajumdar <[email protected]> Signed-off-by: Micha Livne <[email protected]> * switch CI back to main Signed-off-by: Oleksii Kuchaiev <[email protected]> Signed-off-by: Micha Livne <[email protected]> * fixed the docs. 
(#2156) Signed-off-by: Micha Livne <[email protected]> * Make Hifigan jittable (#2159) * FastSpeech 2 Test & Docs (#2143) * Add FS2 data loading test Signed-off-by: Jocelyn Huang <[email protected]> * TTS docs update for FastSpeech 2 Signed-off-by: Jocelyn Huang <[email protected]> * Style fix for FS2 dataset test Signed-off-by: Jocelyn Huang <[email protected]> * Fix transpose typo Signed-off-by: Jocelyn Huang <[email protected]> Co-authored-by: Oleksii Kuchaiev <[email protected]> Co-authored-by: Jason <[email protected]> * Entity linking (#2050) * Started adding SAP dataset Signed-off-by: Virginia Adams <[email protected]> * Delete .lm_bert_dataset.py.swp Signed-off-by: Virginia Adams <[email protected]> * Added dataset and loss Signed-off-by: Virginia Adams <[email protected]> * Added entity linking encoder model Signed-off-by: Virginia Adams <[email protected]> * Can build and use index from pubmedbert model Signed-off-by: Virginia Adams <[email protected]> * checked boolean logic in build_index.py Signed-off-by: Virginia Adams <[email protected]> * End to end tested all functionality Signed-off-by: Virginia Adams <[email protected]> * fixed val loss none at end of validation Signed-off-by: Virginia Adams <[email protected]> * Started adding demo entity linking notebook Signed-off-by: Virginia Adams <[email protected]> * adding in notebook demo Signed-off-by: Virginia Adams <[email protected]> * added call to entitylinking classes in __init__.py files Signed-off-by: Virginia Adams <[email protected]> * Added eval code to notebook Signed-off-by: Virginia Adams <[email protected]> * Adding unfinished notebook Signed-off-by: Virginia Adams <[email protected]> * Cleaned up example dir Signed-off-by: Virginia Adams <[email protected]> * Fixed recap commands Signed-off-by: Virginia Adams <[email protected]> * added model typing and tiny data tar Signed-off-by: Virginia Adams <[email protected]> * Adding tiny data zip Signed-off-by: Virginia Adams <[email protected]> 
* updated tiny example config data path Signed-off-by: Virginia Adams <[email protected]> * Notebook demo works Signed-off-by: Virginia Adams <[email protected]> * Changed training epochs Signed-off-by: Virginia Adams <[email protected]> * Removed output from training and install cells Signed-off-by: Virginia Adams <[email protected]> * changed code formatting Signed-off-by: Virginia Adams <[email protected]> * Started doc string for new functions Signed-off-by: Virginia Adams <[email protected]> * Updated data_preprocessing to save to data_dir Signed-off-by: Virginia Adams <[email protected]> * fixed comment in notebook demo Signed-off-by: Virginia Adams <[email protected]> * Update data_preprocessing.py Signed-off-by: Virginia Adams <[email protected]> * updated nemo typing imports Signed-off-by: Virginia Adams <[email protected]> * about to rebase Signed-off-by: Virginia Adams <[email protected]> * added back umls_dataset_processing.py Signed-off-by: Virginia Adams <[email protected]> * Removed example data Signed-off-by: Virginia Adams <[email protected]> * Fixed typos in notebook demo Signed-off-by: Virginia Adams <[email protected]> * fixed lgtm-com issues Signed-off-by: Virginia Adams <[email protected]> * added copyright headers Signed-off-by: Virginia Adams <[email protected]> * fixed import and copyright headers Signed-off-by: Virginia Adams <[email protected]> * Fixed formatting Signed-off-by: Virginia Adams <[email protected]> * Fixed formatting changes 2 Signed-off-by: Virginia Adams <[email protected]> * fixed test formatting Signed-off-by: Virginia Adams <[email protected]> * Added __init__.py for model and dataset Signed-off-by: Virginia Adams <[email protected]> * loading newline file returns data_dir now Signed-off-by: Virginia Adams <[email protected]> * Removed conf notebook and deleted comment Signed-off-by: Virginia Adams <[email protected]> * Added jenkins test Signed-off-by: Virginia Adams <[email protected]> * Updated Jenkins test 
Signed-off-by: Virginia Adams <[email protected]> * fixed file path Signed-off-by: Virginia Adams <[email protected]> * Changed Jenkins pipeline order Signed-off-by: Virginia Adams <[email protected]> * Fixed Jenkins datapath... again... Signed-off-by: Virginia Adams <[email protected]> * Made most review changes Signed-off-by: Virginia Adams <[email protected]> * fixed copy right Signed-off-by: Virginia Adams <[email protected]> * updated unit test to wget config Signed-off-by: Virginia Adams <[email protected]> * reverted test file back Signed-off-by: Virginia Adams <[email protected]> * Added project dir to jenkins test Signed-off-by: Virginia Adams <[email protected]> * defined config in unit test Signed-off-by: Virginia Adams <[email protected]> Co-authored-by: Sandeep Subramanian <[email protected]> Co-authored-by: Oleksii Kuchaiev <[email protected]> * switch CI back to main Signed-off-by: Oleksii Kuchaiev <[email protected]> * Make Hifigan jittable Signed-off-by: Ryan Leary <[email protected]> * Remove vestigial debugging printout Signed-off-by: Ryan Leary <[email protected]> * Add export forward and fix style Signed-off-by: Ryan Leary <[email protected]> * Fix load_state_dict override for arbitrary layers Signed-off-by: Ryan Leary <[email protected]> Co-authored-by: Jocelyn <[email protected]> Co-authored-by: Oleksii Kuchaiev <[email protected]> Co-authored-by: Jason <[email protected]> Co-authored-by: vadam5 <[email protected]> Co-authored-by: Sandeep Subramanian <[email protected]> Co-authored-by: Oleksii Kuchaiev <[email protected]> Co-authored-by: Ryan Leary <[email protected]> Signed-off-by: Micha Livne <[email protected]> * fix version (#2162) Signed-off-by: Jason <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Megatron nb size reduced (#2163) * notebook size reduced Signed-off-by: ekmb <[email protected]> * notebook size reduced Signed-off-by: ekmb <[email protected]> Signed-off-by: Micha Livne <[email protected]> * update 
spectral clustering method (#2158) * update spectral clustering method Signed-off-by: nithinraok <[email protected]> * update Jenkins File Signed-off-by: nithinraok <[email protected]> * threshold fix by reducing window length for shorter embs Signed-off-by: nithinraok <[email protected]> * grammar fixes Signed-off-by: nithinraok <[email protected]> * CR update Signed-off-by: nithinraok <[email protected]> * paper reference Signed-off-by: nithinraok <[email protected]> * improve docstring for yaml Signed-off-by: nithinraok <[email protected]> * Doc fixes Signed-off-by: nithinraok <[email protected]> Signed-off-by: Micha Livne <[email protected]> * revert (#2167) Signed-off-by: Jason <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Limit Pytorch lightning release (#2170) Signed-off-by: smajumdar <[email protected]> Signed-off-by: Micha Livne <[email protected]> * token classification models artifacts update (#2169) * artifacts update Signed-off-by: ekmb <[email protected]> * artifacts update Signed-off-by: ekmb <[email protected]> * jenkins Signed-off-by: ekmb <[email protected]> * fix for model restoration Signed-off-by: ekmb <[email protected]> * typos fix + jenkins dir update Signed-off-by: ekmb <[email protected]> * typo Signed-off-by: ericharper <[email protected]> * update jenkins branch Signed-off-by: ericharper <[email protected]> * add && Signed-off-by: ericharper <[email protected]> * jenkins Signed-off-by: ekmb <[email protected]> * jenkins disable Signed-off-by: ekmb <[email protected]> * revert jenkins Signed-off-by: ekmb <[email protected]> * jenkins disable Signed-off-by: ekmb <[email protected]> * revert jenkins Signed-off-by: ekmb <[email protected]> Co-authored-by: ericharper <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Fix to always_save_nemo (#2174) * Initial attempt at always_save_nemo fix Signed-off-by: MaximumEntropy <[email protected]> * updated path before saving in exp manager, fixed bug when 
handling tarfile artifacts Signed-off-by: ericharper <[email protected]> * Add test with always_save_nemo to exp_manager Signed-off-by: MaximumEntropy <[email protected]> * Style fixes Signed-off-by: MaximumEntropy <[email protected]> * update jenkins branch Signed-off-by: ericharper <[email protected]> * check for nemo: Signed-off-by: ericharper <[email protected]> * check for nemo: Signed-off-by: ericharper <[email protected]> * check for nemo: Signed-off-by: ericharper <[email protected]> Co-authored-by: ericharper <[email protected]> Co-authored-by: Jason <[email protected]> Signed-off-by: Micha Livne <[email protected]> * fix typo (#2179) Signed-off-by: Yang Zhang <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Make itn tests optional (#2173) * Limit Pytorch lightning release Signed-off-by: smajumdar <[email protected]> * Add final two checks Signed-off-by: smajumdar <[email protected]> Co-authored-by: Jason <[email protected]> Signed-off-by: Micha Livne <[email protected]> * First Revision of TTS Docs and Notebooks Update for 1.0 (#2166) * squash Signed-off-by: Jason <[email protected]> * notebook fixes Signed-off-by: Jason <[email protected]> * notebook fixes Signed-off-by: Jason <[email protected]> * typos Signed-off-by: Jason <[email protected]> Signed-off-by: Micha Livne <[email protected]> * add more alternatives of 0 for telephone (#2171) Signed-off-by: Yang Zhang <[email protected]> Co-authored-by: Oleksii Kuchaiev <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Acc tn (#2180) * make tn cardinal faster Signed-off-by: Yang Zhang <[email protected]> * add number far Signed-off-by: Yang Zhang <[email protected]> * add test Signed-off-by: Yang Zhang <[email protected]> * fix lgtm Signed-off-by: Yang Zhang <[email protected]> * fix lgtm Signed-off-by: Yang Zhang <[email protected]> Signed-off-by: Micha Livne <[email protected]> * [DOCS] NLP Model parallel, NMT multi-val, CORE register artifacts (#2168) * update docs 
Signed-off-by: ericharper <[email protected]> * update docs Signed-off-by: ericharper <[email protected]> * update docs Signed-off-by: ericharper <[email protected]> * update docs Signed-off-by: ericharper <[email protected]> * update docs Signed-off-by: ericharper <[email protected]> * update docs Signed-off-by: ericharper <[email protected]> * update docs Signed-off-by: ericharper <[email protected]> * update docs Signed-off-by: ericharper <[email protected]> * update docs Signed-off-by: ericharper <[email protected]> * update docs Signed-off-by: ericharper <[email protected]> * update docs Signed-off-by: ericharper <[email protected]> * update docs Signed-off-by: ericharper <[email protected]> Co-authored-by: Sandeep Subramanian <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Change label smoothing prob to reduce chance of test failure (#2184) Signed-off-by: MaximumEntropy <[email protected]> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Add FS2 checkpoint links to docs and inference notebook (#2181) * Add FS2 checkpoint links to docs and inference notebook Signed-off-by: Jocelyn Huang <[email protected]> * Remove empty cell from TTS notebook Signed-off-by: Jocelyn Huang <[email protected]> Co-authored-by: Oleksii Kuchaiev <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Update ptl to 1.3 on main branch (#2178) * Update PTL Signed-off-by: smajumdar <[email protected]> * Begin update to Pytorch Lightning 1.3.x Signed-off-by: smajumdar <[email protected]> * Formatting Signed-off-by: smajumdar <[email protected]> * style Signed-off-by: ericharper <[email protected]> * Formatting Signed-off-by: smajumdar <[email protected]> * minor fix Signed-off-by: Jason <[email protected]> * minor fix Signed-off-by: Jason <[email protected]> * get testing attribute from trainer Signed-off-by: ericharper <[email protected]> * update init_ddp_connection override Signed-off-by: ericharper 
<[email protected]> * update attribute Signed-off-by: ericharper <[email protected]> * add barrier after load checkpoint in megatron Signed-off-by: ericharper <[email protected]> * remove barrier Signed-off-by: ericharper <[email protected]> * update last naming Signed-off-by: Jason <[email protected]> Co-authored-by: ericharper <[email protected]> Co-authored-by: Jason <[email protected]> Signed-off-by: Micha Livne <[email protected]> * SDE updates (#2187) * Added updates to SDE: - support for external vocabulary (to detect OOV words) - support for offset field (for segmented long recordings) - UI improvements Signed-off-by: Vitaly Lavrukhin <[email protected]> * Refactored diff in SDE Signed-off-by: Vitaly Lavrukhin <[email protected]> Co-authored-by: Oleksii Kuchaiev <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Add TTS aligner and improved version of g2p for vocabs.Phonemes, small improvement in TalkNet (#2189) * add first version of aligner Signed-off-by: Oktai Tatanov <[email protected]> * aligner docs, new g2p version, fix bugs in talknet Signed-off-by: Oktai Tatanov <[email protected]> * update docs and remove lj related code Signed-off-by: Oktai Tatanov <[email protected]> * fix style Signed-off-by: Oktai Tatanov <[email protected]> * fix import Signed-off-by: Oktai Tatanov <[email protected]> Co-authored-by: Oleksii Kuchaiev <[email protected]> Signed-off-by: Micha Livne <[email protected]> * set the default of nodessplitter to None. 
(#2190) Signed-off-by: Vahid <[email protected]> Signed-off-by: Micha Livne <[email protected]> * NMT fixes (#2194) * minor fixes Signed-off-by: Oleksii Kuchaiev <[email protected]> * minor bugfixes Signed-off-by: Oleksii Kuchaiev <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Store mappings file in .nemo for FS2 model (#2196) * Store mappings file in .nemo for FS2 model Signed-off-by: Jocelyn Huang <[email protected]> * Add error enforcing mappings file during training (FS2) Signed-off-by: Jocelyn Huang <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Add support to change the SE context window of ConvASREncoder (#2193) * Add support for changing context window on the fly Signed-off-by: smajumdar <[email protected]> * Add support to change the SE context window of ConvASREncoder Signed-off-by: smajumdar <[email protected]> * Add ability to skip config updating Signed-off-by: smajumdar <[email protected]> * Switch to mixin based API Signed-off-by: smajumdar <[email protected]> * Update docs and api for ASRModuleMixin Signed-off-by: smajumdar <[email protected]> * Change print to logging.info Signed-off-by: smajumdar <[email protected]> * Correct stride level when computing context window Signed-off-by: smajumdar <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Add a CI test for doing inference with an NMT model trained with Pre-LN (#2198) * Change label smoothing prob to reduce chance of test failure Signed-off-by: MaximumEntropy <[email protected]> * Add Pre-LN inference test to Jenkinsfile Signed-off-by: MaximumEntropy <[email protected]> * Separate tests for training and NMT inference Signed-off-by: MaximumEntropy <[email protected]> Co-authored-by: Oleksii Kuchaiev <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Fix ipywidgets error in asr notebook (#2199) Added `ipywidgets` to avoid `ImportError: IProgress not found. Please update jupyter and ipywidgets. 
See https://ipywidgets.readthedocs.io/en/stable/user_install.html` error. Signed-off-by: Derek Chia <[email protected]> Signed-off-by: Micha Livne <[email protected]> * metrics fix (#2202) * metrics fix Signed-off-by: ekmb <[email protected]> * metrics reset for punct model Signed-off-by: ekmb <[email protected]> Signed-off-by: Micha Livne <[email protected]> * readme and minor improvements (#2203) * readme and minor improvements Signed-off-by: nithinraok <[email protected]> * vad threshold update Signed-off-by: nithinraok <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Fix text processing docs (#2195) * fix text processing docs Signed-off-by: Yang Zhang <[email protected]> * fix name Signed-off-by: Yang Zhang <[email protected]> * add guard to pynini import Signed-off-by: Yang Zhang <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Fix bug in SpecCutout (#2201) Signed-off-by: Robert Bracco <[email protected]> Co-authored-by: Oleksii Kuchaiev <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Fix bug in SpecCutout (#2201) (#2205) Signed-off-by: Robert Bracco <[email protected]> Co-authored-by: Oleksii Kuchaiev <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Signed-off-by: smajumdar <[email protected]> Co-authored-by: Robert Bracco <[email protected]> Co-authored-by: Oleksii Kuchaiev <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Set seed before generating random tensors in NMT test (#2206) * Change label smoothing prob to reduce chance of test failure Signed-off-by: MaximumEntropy <[email protected]> * Set seed before generating tensors Signed-off-by: MaximumEntropy <[email protected]> Signed-off-by: Micha Livne <[email protected]> * ASR patches for v1.0.0 (#2207) * Multiple updates to RNNT add initialization Signed-off-by: smajumdar <[email protected]> * Correct name of initialization Signed-off-by: 
smajumdar <[email protected]> * Update dockerignore Signed-off-by: smajumdar <[email protected]> * Fix RNNT WER calculation Signed-off-by: smajumdar <[email protected]> * Address comments Signed-off-by: smajumdar <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Multilingual training for NMT (#2160) * mnmt on fresh main Signed-off-by: Abhinav Khattar <[email protected]> * push for test Signed-off-by: Abhinav Khattar <[email protected]> * debug Signed-off-by: Abhinav Khattar <[email protected]> * check Signed-off-by: Abhinav Khattar <[email protected]> * cleanup Signed-off-by: Abhinav Khattar <[email protected]> * minor fix Signed-off-by: Abhinav Khattar <[email protected]> * more minor fixes Signed-off-by: Abhinav Khattar <[email protected]> * fix for test Signed-off-by: Abhinav Khattar <[email protected]> * fix list size error Signed-off-by: Abhinav Khattar <[email protected]> * multilingual in infer Signed-off-by: Abhinav Khattar <[email protected]> * changes Signed-off-by: Abhinav Khattar <[email protected]> * tar creation with multilingual Signed-off-by: Abhinav Khattar <[email protected]> * fix Signed-off-by: Abhinav Khattar <[email protected]> * changes + parallelism + bug fix Signed-off-by: Abhinav Khattar <[email protected]> * small fix Signed-off-by: Abhinav Khattar <[email protected]> * multilingual preprocessor fix Signed-off-by: Abhinav Khattar <[email protected]> * globally unique fragment names in tarred dataset Signed-off-by: Abhinav Khattar <[email protected]> * minor changes Signed-off-by: Abhinav Khattar <[email protected]> * rm load_from_cached_dataset Signed-off-by: Abhinav Khattar <[email protected]> * minor config change Signed-off-by: Abhinav Khattar <[email protected]> * rm unused import Signed-off-by: Abhinav Khattar <[email protected]> Co-authored-by: Oleksii Kuchaiev <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Remove memory leak from ASR notebook + update model notebook (#2213) * ASR patches for 
v1.0.0 (#2207) * Multiple updates to RNNT add initialization Signed-off-by: smajumdar <[email protected]> * Correct name of initialization Signed-off-by: smajumdar <[email protected]> * Update dockerignore Signed-off-by: smajumdar <[email protected]> * Fix RNNT WER calculation Signed-off-by: smajumdar <[email protected]> * Address comments Signed-off-by: smajumdar <[email protected]> * Correct model notebook to log the loss and correctly assign keys Signed-off-by: smajumdar <[email protected]> Signed-off-by: Micha Livne <[email protected]> * replace names in vad tutorials (#2220) Signed-off-by: fayejf <[email protected]> Signed-off-by: Micha Livne <[email protected]> * fix the versioning name. (#2209) * fix the versioning name. Signed-off-by: Vahid <[email protected]> * Made version None. Signed-off-by: Vahid <[email protected]> Co-authored-by: Oleksii Kuchaiev <[email protected]> Co-authored-by: Jason <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Enabled passing kwargs to export() (#2175) * Enabled passing kwargs to export() Signed-off-by: Boris Fomitchev <[email protected]> * Fixing style; changed Classifier input_example to new extended syntax Signed-off-by: Boris Fomitchev <[email protected]> * Fixed order of forward() call in export Signed-off-by: Boris Fomitchev <[email protected]> * Fixing style Signed-off-by: Boris Fomitchev <[email protected]> Co-authored-by: Oleksii Kuchaiev <[email protected]> Signed-off-by: Micha Livne <[email protected]> * update g2p: ambiguous ignore, flag for skipping seq2seq (#2223) Signed-off-by: Oktai Tatanov <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Update TTS notebook with TalkNet inference (#2133) * Update TTS notebook with TalkNet inference. Signed-off-by: Stanislav Beliaev <[email protected]> * Update TTS Notebook. Signed-off-by: Stanislav Beliaev <[email protected]> * Update TTS TN Training Notebook. Signed-off-by: Stanislav Beliaev <[email protected]> * Fix TN paper link. 
Signed-off-by: Stanislav Beliaev <[email protected]> * Remove branch updating TODOs. Signed-off-by: Stanislav Beliaev <[email protected]> Signed-off-by: Micha Livne <[email protected]> * update speaker notebooks (#2224) Signed-off-by: nithinraok <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Support symlinked files (#2216) Signed-off-by: Anas Abou Allaban <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Set strict=True everywhere by default. (#2225) Signed-off-by: Oleksii Kuchaiev <[email protected]> Signed-off-by: Micha Livne <[email protected]> * set strict=True in nlp_model (#2227) Signed-off-by: Oleksii Kuchaiev <[email protected]> Signed-off-by: Micha Livne <[email protected]> * set strict=False for model parallel examples Signed-off-by: Oleksii Kuchaiev <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Make Text processing installation optional via reinstall.sh (#2226) * Make Text processing installation optional via reinstall.sh Signed-off-by: smajumdar <[email protected]> * Support both success and failure states Signed-off-by: smajumdar <[email protected]> Co-authored-by: Oleksii Kuchaiev <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Transformer final norm preln (#2197) * fix pre_ln final norm Signed-off-by: fayejf <[email protected]> * style fix Signed-off-by: fayejf <[email protected]> * bug fixed Signed-off-by: fayejf <[email protected]> * bugfix post_ln Signed-off-by: fayejf <[email protected]> * update and add pre_ln_final_norm Signed-off-by: fayejf <[email protected]> * style fix Signed-off-by: fayejf <[email protected]> * fix for unit test Signed-off-by: fayejf <[email protected]> * rename final_norm to final_layer_norm Signed-off-by: fayejf <[email protected]> * bug fix Signed-off-by: fayejf <[email protected]> * tiny fix Signed-off-by: fayejf <[email protected]> * fix and improve Signed-off-by: fayejf <[email 
protected]> * tiny fix Signed-off-by: fayejf <[email protected]> * Patch for NMT to allow loading old models trained with pre-LN Signed-off-by: MaximumEntropy <[email protected]> Co-authored-by: Sandeep Subramanian <[email protected]> Co-authored-by: Oleksii Kuchaiev <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Update models and notebook for 1.0 (#2211) * update models Signed-off-by: Jason <[email protected]> * updates Signed-off-by: Jason <[email protected]> * fix Signed-off-by: Jason <[email protected]> * add links Signed-off-by: Jason <[email protected]> * fixes Signed-off-by: Jason <[email protected]> * style Signed-off-by: Jason <[email protected]> * update checkpoints Signed-off-by: Jason <[email protected]> * rename Signed-off-by: Jason <[email protected]> * fixes Signed-off-by: Jason <[email protected]> * lgtm Signed-off-by: Jason <[email protected]> * fix loading waveglow Signed-off-by: Jason <[email protected]> * typo Signed-off-by: Jason <[email protected]> Co-authored-by: Oleksii Kuchaiev <[email protected]> Signed-off-by: Micha Livne <[email protected]> * update_metrics_classification_models (#2228) Signed-off-by: nithinraok <[email protected]> Co-authored-by: fayejf <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Data loader for seq of label model (#2084) * feature to seq label data loader Signed-off-by: fayejf <[email protected]> * style fix Signed-off-by: fayejf <[email protected]> * small fix Signed-off-by: fayejf <[email protected]> * update tl to be length of seq label Signed-off-by: fayejf <[email protected]> * style fix Signed-off-by: fayejf <[email protected]> * tiny bug fix Signed-off-by: fayejf <[email protected]> * small updates Signed-off-by: fayejf <[email protected]> * updates for review feedback Signed-off-by: fayejf <[email protected]> * style fix Signed-off-by: fayejf <[email protected]> * explain seq_label Signed-off-by: fayejf <[email protected]> * fix lgtm Signed-off-by: fayejf <[email 
protected]> * small updates Signed-off-by: fayejf <[email protected]> * improve as discussed Signed-off-by: fayejf <[email protected]> * add docstring Signed-off-by: fayejf <[email protected]> Co-authored-by: Nithin Rao <[email protected]> Signed-off-by: Micha Livne <[email protected]> * fix comments (#2236) * fix comments Signed-off-by: Yang Zhang <[email protected]> * fix typo Signed-off-by: Yang Zhang <[email protected]> Signed-off-by: Micha Livne <[email protected]> * add paper ref to sgdqa model doc (#2233) * add paper ref to sgdqa model doc Signed-off-by: Yang Zhang <[email protected]> * fix comments Signed-off-by: Yang Zhang <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Move ConcatDataset to common (#2237) * move concatdataset to common Signed-off-by: Abhinav Khattar <[email protected]> * var name change Signed-off-by: Abhinav Khattar <[email protected]> Signed-off-by: Micha Livne <[email protected]> * audio based normalization (#2231) * squash norm_audio Signed-off-by: ekmb <[email protected]> * add missing files Signed-off-by: ekmb <[email protected]> * style Signed-off-by: ekmb <[email protected]> * unit tests added, docstrings fixed Signed-off-by: ekmb <[email protected]> * fix lgtm errors Signed-off-by: ekmb <[email protected]> * debug jenkins Signed-off-by: ekmb <[email protected]> * debug jenkins Signed-off-by: ekmb <[email protected]> * signature update Signed-off-by: ekmb <[email protected]> * set deterministic default Signed-off-by: ekmb <[email protected]> * add more test cases Signed-off-by: ekmb <[email protected]> Signed-off-by: Micha Livne <[email protected]> * bug fix config (#2232) Signed-off-by: fayejf <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Alias Swish to SiLU (#2239) * Alias Swish to SiLU and move activations to inplace execution if possible Signed-off-by: smajumdar <[email protected]> * Remove unused import Signed-off-by: smajumdar 
<[email protected]> Signed-off-by: Micha Livne <[email protected]> * Update README.rst Signed-off-by: Micha Livne <[email protected]> * Offline asr notebook bug fix (#2242) * fix Signed-off-by: fayejf <[email protected]> * install Signed-off-by: fayejf <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Fix docstring (#2244) * fix comments Signed-off-by: Yang Zhang <[email protected]> * fix doc string Signed-off-by: Yang Zhang <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Update "last" Checkpoint (#2241) * fix Signed-off-by: Jason <[email protected]> * change Signed-off-by: Jason <[email protected]> * fix Signed-off-by: Jason <[email protected]> Co-authored-by: Sandeep Subramanian <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Add pretrained model stt_es_citrinet_512 (#2247) Signed-off-by: Elena Rastorgueva <[email protected]> Signed-off-by: Micha Livne <[email protected]> * [BUGFIX] Only process tarfile artifacts when model was restored from tarfile (#2250) * process tarfile artifacts only if model is being restored Signed-off-by: ericharper <[email protected]> * process tarfile artifacts only if model was restored from a tarfile Signed-off-by: ericharper <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Log average metrics for Multi-validation in NMT (#2251) * add avg metrics NMT Signed-off-by: Abhinav Khattar <[email protected]> * name change Signed-off-by: Abhinav Khattar <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Update Primer notebook (#2258) Signed-off-by: smajumdar <[email protected]> Signed-off-by: Micha Livne <[email protected]> * fixed Bug 3310780 and 3310799 (#2264) Signed-off-by: Virginia Adams <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Support multiple models being instantiated in same execution scope (#2245) * Support multiple models being instantiated in same execution scope Signed-off-by: smajumdar <[email protected]> 
* Fix tests Signed-off-by: smajumdar <[email protected]> * Add locks to methods in appstate Signed-off-by: smajumdar <[email protected]> * Perform locks only on write operations Signed-off-by: smajumdar <[email protected]> * Correct deadlock issue Signed-off-by: smajumdar <[email protected]> * Add more tests Signed-off-by: smajumdar <[email protected]> * Add test for multi save and remove patch to change save type Signed-off-by: smajumdar <[email protected]> * Update app state to preserve gidx of previous token Signed-off-by: smajumdar <[email protected]> * Correct restoration logic for tarfiles Signed-off-by: smajumdar <[email protected]> Signed-off-by: Micha Livne <[email protected]> * ASR Refactoring (#2240) * Refactor out the preprocessing from ASR into common Signed-off-by: smajumdar <[email protected]> * Correct nltk issue with vocabs.py for clusters Signed-off-by: smajumdar <[email protected]> * Add typing information to SpecAugment and SpecCutout Signed-off-by: smajumdar <[email protected]> * Reorganize parts directory Signed-off-by: smajumdar <[email protected]> * Refactor parts submodules, add __init__ to few important parts Signed-off-by: smajumdar <[email protected]> * Update docs for new path to parts Signed-off-by: smajumdar <[email protected]> * Cherry pick PR https://github.com/NVIDIA/NeMo/pull/2219 Signed-off-by: smajumdar <[email protected]> * Add header for preprocessing commons Signed-off-by: smajumdar <[email protected]> * Fix style of tests Signed-off-by: smajumdar <[email protected]> * Add forced update of configs for train-val-test ds to new labels tests Signed-off-by: smajumdar <[email protected]> * Update path to FilterbankFeatures for TTS Signed-off-by: smajumdar <[email protected]> * Add an alias file for backward compatibility Signed-off-by: smajumdar <[email protected]> * Add an alias file for backward compatibility Signed-off-by: smajumdar <[email protected]> * Update training scripts of ASR to support finetuning Signed-off-by: 
smajumdar <[email protected]> * Update Finetuning step to be ModelPT level Signed-off-by: smajumdar <[email protected]> * Update docs for finetuning for ASR Signed-off-by: smajumdar <[email protected]> * Fix style Signed-off-by: smajumdar <[email protected]> * Update docs and scripts with fine-tuning info Signed-off-by: smajumdar <[email protected]> * Update docs and scripts with fine-tuning info Signed-off-by: smajumdar <[email protected]> * Fix style Signed-off-by: smajumdar <[email protected]> * Update scripts Signed-off-by: smajumdar <[email protected]> * Add comment for weight initialization Signed-off-by: smajumdar <[email protected]> Signed-off-by: Micha Livne <[email protected]> * TTS Doc Fix and Remove TTS Test (#2272) * bug fix and remove test Signed-off-by: Jason <[email protected]> * syntax Signed-off-by: Jason <[email protected]> * syntax Signed-off-by: Jason <[email protected]> * syntax Signed-off-by: Jason <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Talknet training Fix (#2273) * TalkNet Training notebook fix. Signed-off-by: Stanislav Beliaev <[email protected]> * Remove debug stuff. 
Signed-off-by: Stanislav Beliaev <[email protected]> Signed-off-by: Micha Livne <[email protected]> * update (#2274) Signed-off-by: Jason <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Add links (#2275) * update Signed-off-by: Jason <[email protected]> * link Signed-off-by: Jason <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Delete 3_TTS_TalkNet_Training.ipynb (#2276) Signed-off-by: Jason <[email protected]> Signed-off-by: Micha Livne <[email protected]> * tune down logging (#2277) * tune down logging Signed-off-by: Oleksii Kuchaiev <[email protected]> * debug message instead of removing it completely Signed-off-by: Oleksii Kuchaiev <[email protected]> Signed-off-by: Micha Livne <[email protected]> * minor bugfix Signed-off-by: Oleksii Kuchaiev <[email protected]> Signed-off-by: Micha Livne <[email protected]> * remove confusing message Signed-off-by: Oleksii Kuchaiev <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Restore TalkNet training notebook (#2281) * Restore TalkNet training notebook. Signed-off-by: Stanislav Beliaev <[email protected]> * Remove torchaudio dep. 
Signed-off-by: Stanislav Beliaev <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Fix ExpManager Issues and FastPitch (#2283) * backport exp_manager fixes to v1 Signed-off-by: Jason <[email protected]> * fix fastpitch Signed-off-by: Jason <[email protected]> * fix tests Signed-off-by: Jason <[email protected]> * update prefix Signed-off-by: Jason <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Organize asr config folders (#2284) Signed-off-by: Micha Livne <[email protected]> * Fix and enable DALI tests (#2077) * Fix and enable DALI tests Signed-off-by: Joaquin Anton <[email protected]> * remove unused import Signed-off-by: Joaquin Anton <[email protected]> * Move DALI tests to a separate Jenkins stage Signed-off-by: Joaquin Anton <[email protected]> * Remove DALI tests from the main jenkins ASR stage Signed-off-by: Joaquin Anton <[email protected]> * Comment out MFCC test Signed-off-by: Joaquin Anton <[email protected]> * Working version Signed-off-by: Joaquin Anton <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Added unit test for hifigan export, fixed hifigan export (#2279) * Added unit test for hifigan export, Removed runtime test from waveglow test (now in export) Signed-off-by: Boris Fomitchev <[email protected]> * Fixed style Signed-off-by: Boris Fomitchev <[email protected]> * Fixed style Signed-off-by: Boris Fomitchev <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Update conformer recipes (#2265) * updated readme asr. Signed-off-by: Vahid <[email protected]> * added models. Signed-off-by: Vahid <[email protected]> * fixed docs. Signed-off-by: Vahid <[email protected]> * fixed docs. Signed-off-by: Vahid <[email protected]> * fixed docs. Signed-off-by: Vahid <[email protected]> * fixed docs. Signed-off-by: Vahid <[email protected]> * fixed the docs. Signed-off-by: Vahid <[email protected]> * fixed the docs. 
Signed-off-by: Vahid <[email protected]> * fixed the docs. Signed-off-by: Vahid <[email protected]> * disabled test. Signed-off-by: Vahid <[email protected]> * Updated the config files. Signed-off-by: Vahid <[email protected]> * Updated the config files. Signed-off-by: Vahid <[email protected]> * Updated the config files. Signed-off-by: Vahid <[email protected]> * dropped the wers. Signed-off-by: Vahid <[email protected]> * dropped the wers. Signed-off-by: Vahid <[email protected]> * dropped new models and reverted to old versions. Signed-off-by: Vahid <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Co-authored-by: Oleksii Kuchaiev <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Adding neural rescorer and its documentations (#2287) * Added intial neural rescorer scripts. Signed-off-by: Vahid <[email protected]> * Added intial neural rescorer scripts. Signed-off-by: Vahid <[email protected]> * Added intial neural rescorer scripts. Signed-off-by: Vahid <[email protected]> * Added intial neural rescorer scripts. Signed-off-by: Vahid <[email protected]> * Added intial neural rescorer scripts. Signed-off-by: Vahid <[email protected]> * Added intial neural rescorer scripts. Signed-off-by: Vahid <[email protected]> * Added intial neural rescorer scripts. Signed-off-by: Vahid <[email protected]> * Added intial neural rescorer scripts. Signed-off-by: Vahid <[email protected]> * Added intial neural rescorer scripts. Signed-off-by: Vahid <[email protected]> * added more docs, figures, and output file. Signed-off-by: Vahid <[email protected]> * added more docs, figures, and output file. Signed-off-by: Vahid <[email protected]> * added more docs, figures, and output file. Signed-off-by: Vahid <[email protected]> * added more docs, figures, and output file. Signed-off-by: Vahid <[email protected]> * added more docs, figures, and output file. 
Signed-off-by: Vahid <[email protected]> * fixed style Signed-off-by: Vahid <[email protected]> * add a note to asr notebook. Signed-off-by: Vahid <[email protected]> * add a note to asr notebook. Signed-off-by: Vahid <[email protected]> * add a note to asr notebook. Signed-off-by: Vahid <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Adjust warning messages Signed-off-by: Oleksii Kuchaiev <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Revert "Adjust warning messages" This reverts commit df046ec55754d0136a2a28451435068f32409f30. Signed-off-by: Micha Livne <[email protected]> * Adjust warning messages (#2294) Signed-off-by: Oleksii Kuchaiev <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Adding new Models releases on NGC. (#2295) * added new models. Signed-off-by: Vahid <[email protected]> * added tests for asr lm. Signed-off-by: Vahid <[email protected]> * added tests for asr lm. Signed-off-by: Vahid <[email protected]> * dropped the test. 
Signed-off-by: Vahid <[email protected]> Signed-off-by: Micha Livne <[email protected]> * update quantization (#2298) Signed-off-by: slyned <[email protected]> Co-authored-by: slyned <[email protected]> Signed-off-by: Micha Livne <[email protected]> * ASR improvements (#2293) * Update numba messages and citrinet configs Signed-off-by: smajumdar <[email protected]> * Remove support for weight init scale and hidden hidden bias scale for layer normalized lstm Signed-off-by: smajumdar <[email protected]> * Add support for multiple filetypes in tarred datasets, correct rnn LN-lstm inputs, fix OmegaConf compat issue Signed-off-by: smajumdar <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Time quarter to (#2292) * fix comments Signed-off-by: Yang Zhang <[email protected]> * fix doc string Signed-off-by: Yang Zhang <[email protected]> * adding quarter to to time class Signed-off-by: Yang Zhang <[email protected]> * fix typo Signed-off-by: Yang Zhang <[email protected]> Signed-off-by: Micha Livne <[email protected]> * fixed paths. 
(#2301) Signed-off-by: Vahid <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Added onnxruntime check of exported ONNX, bumped up default ONNX opset (#2278) * Added onnxruntime check of exported ONNX, bumped up default ONNX opset Signed-off-by: Boris Fomitchev <[email protected]> * Made TS export to accept ONNX-style input example, removed unused param to export Signed-off-by: Boris Fomitchev <[email protected]> * check_trace default made False Signed-off-by: Boris Fomitchev <[email protected]> * Fixed for updated export signature Signed-off-by: Boris Fomitchev <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Update readmes Signed-off-by: Oleksii Kuchaiev <[email protected]> Signed-off-by: Micha Livne <[email protected]> * update readme Signed-off-by: Oleksii Kuchaiev <[email protected]> Signed-off-by: Micha Livne <[email protected]> * update readme Signed-off-by: Oleksii Kuchaiev <[email protected]> Signed-off-by: Micha Livne <[email protected]> * fix docs table Signed-off-by: Oleksii Kuchaiev <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Add support for Numba CUDA optimized SpecAugment (#2269) * Initial implementation Signed-off-by: smajumdar <[email protected]> * Initial implementation Signed-off-by: smajumdar <[email protected]> * Finish initial implementation of numba spec augment Signed-off-by: smajumdar <[email protected]> * Correct mask propagataion Signed-off-by: smajumdar <[email protected]> * Parallelize kernel over batch instead of over masks Signed-off-by: smajumdar <[email protected]> * Finish tests and update to signature of spectrogramaugmentation calls Signed-off-by: smajumdar <[email protected]> * Finish tests and update to signature of spectrogramaugmentation calls Signed-off-by: smajumdar <[email protected]> * Add header Signed-off-by: smajumdar <[email protected]> * Fix style Signed-off-by: smajumdar <[email protected]> * Add heuristics Signed-off-by: smajumdar <[email protected]> * 
Correct inclusive range of padding Signed-off-by: smajumdar <[email protected]> * Correct typing for spec aug numba Signed-off-by: smajumdar <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Added JSON manifest's support to transcribe_speech.py (#2304) * Added JSON manifest's support to transcribe_speech.py Signed-off-by: Vitaly Lavrukhin <[email protected]> * Dropped unused import Signed-off-by: Vitaly Lavrukhin <[email protected]> Co-authored-by: Oleksii Kuchaiev <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Signed-off-by: Micha Livne <[email protected]> * get embedding for a single file (#2310) * get embedding for a single file Signed-off-by: nithinraok <[email protected]> * fixes Signed-off-by: nithinraok <[email protected]> * sr update Signed-off-by: nithinraok <[email protected]> * regain train mode Signed-off-by: nithinraok <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Update FastPitch (#2249) * wip Signed-off-by: Jason <[email protected]> * c1 Signed-off-by: Jason <[email protected]> * bug fixes Signed-off-by: Jason <[email protected]> * bug fixes Signed-off-by: Jason <[email protected]> * v2 Signed-off-by: Jason <[email protected]> * changes Signed-off-by: Jason <[email protected]> * add types, old model working Signed-off-by: Jason <[email protected]> * pitch Signed-off-by: Jason <[email protected]> * update Signed-off-by: Jason <[email protected]> * update Signed-off-by: Jason <[email protected]> * let it work Signed-off-by: Jason <[email protected]> * fixes Signed-off-by: Jason <[email protected]> * add oktai comments Signed-off-by: Jason <[email protected]> * debug Signed-off-by: Jason <[email protected]> * scale Signed-off-by: Jason <[email protected]> * wip Signed-off-by: Jason <[email protected]> * fix test for v1 Signed-off-by: Jason <[email protected]> * merge train and val Signed-off-by: Jason <[email protected]> * back to par bin att, add correct encoder settings 
Signed-off-by: Jason <[email protected]> * try Signed-off-by: Jason <[email protected]> * undo Signed-off-by: Jason <[email protected]> * lgtm: Signed-off-by: Jason <[email protected]> * style Signed-off-by: Jason <[email protected]> * default to ljs Signed-off-by: Jason <[email protected]> Signed-off-by: Micha Livne <[email protected]> * patch quantization (#2314) * update quantization Signed-off-by: slyned <[email protected]> * update quant infer trt Signed-off-by: slyned <[email protected]> * fix style Signed-off-by: slyned <[email protected]> Co-authored-by: slyned <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Pin OmegaConf version for 1.0.0 (#2316) * Update OmegaConf compatibility Signed-off-by: smajumdar <[email protected]> * Correct OmegaConf.pretty() Signed-off-by: smajumdar <[email protected]> * Upper bound omegaconf Signed-off-by: smajumdar <[email protected]> * Revert "Correct OmegaConf.pretty()" This reverts commit 6ebae2ef Signed-off-by: smajumdar <[email protected]> * Revert "Update OmegaConf compatibility" This reverts commit 83b2cf35a07a742552082e80e6ca34c9b8203cbc. 
Signed-off-by: smajumdar <[email protected]> Signed-off-by: Micha Livne <[email protected]> * [BUGFIX] OmegaConf forward compatibility (#2319) * Update OmegaConf compatibility Signed-off-by: smajumdar <[email protected]> Signed-off-by: ericharper <[email protected]> * Correct OmegaConf.pretty() Signed-off-by: smajumdar <[email protected]> Signed-off-by: ericharper <[email protected]> * upper bound omegaconf Signed-off-by: ericharper <[email protected]> * add if,else back Signed-off-by: ericharper <[email protected]> * typo Signed-off-by: ericharper <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Signed-off-by: Micha Livne <[email protected]> * bumping version to 1.0.1 Signed-off-by: Oleksii Kuchaiev <[email protected]> Signed-off-by: Micha Livne <[email protected]> * fix_cluster_small_sample (#2303) * fix_cluster_small_sample Signed-off-by: nithinraok <[email protected]> * for smaller samples Signed-off-by: nithinraok <[email protected]> * remove type Signed-off-by: nithinraok <[email protected]> * similarity matrix Signed-off-by: nithinraok <[email protected]> * est num of speakers add Signed-off-by: nithinraok <[email protected]> * comment update Signed-off-by: nithinraok <[email protected]> * style fix Signed-off-by: nithinraok <[email protected]> * MIN_SAMPLES passed through func arg Signed-off-by: nithinraok <[email protected]> * doc string update Signed-off-by: nithinraok <[email protected]> * spell mistake Signed-off-by: nithinraok <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Fastpitch export (#2300) * wip Signed-off-by: Jason <[email protected]> * c1 Signed-off-by: Jason <[email protected]> * bug fixes Signed-off-by: Jason <[email protected]> * bug fixes Signed-off-by: Jason <[email protected]> * v2 Signed-off-by: Jason <[email protected]> * changes Signed-off-by: Jason <[email protected]> * add types, old model working Signed-off-by: Jason <[email protected]> * pitch Signed-off-by: Jason <[email 
protected]> * update Signed-off-by: Jason <[email protected]> * update Signed-off-by: Jason <[email protected]> * let it work Signed-off-by: Jason <[email protected]> * fixes Signed-off-by: Jason <[email protected]> * add oktai comments Signed-off-by: Jason <[email protected]> * debug Signed-off-by: Jason <[email protected]> * scale Signed-off-by: Jason <[email protected]> * wip Signed-off-by: Jason <[email protected]> * fix test for v1 Signed-off-by: Jason <[email protected]> …
* Itn add classes (#2141) * move do_training flag to config Signed-off-by: Yang Zhang <[email protected]> * added telephone to itn Signed-off-by: Yang Zhang <[email protected]> * add telephone and email to itn Signed-off-by: Yang Zhang <[email protected]> Signed-off-by: Micha Livne <[email protected]> * ASR + NLP Doc Fixes (#2136) * Preserve the tokenizer config for ASR Signed-off-by: smajumdar <[email protected]> * Correct nlp docs Signed-off-by: smajumdar <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Removing graphsurgeon optional dependency, improving import error rep… (#2144) * Removing graphsurgeon optional dependency, improving import error reporting Signed-off-by: Boris Fomitchev <[email protected]> * Fixing scope error Signed-off-by: Boris Fomitchev <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Fix FilterbankFeatures eval nondeterminism. (#2146) Signed-off-by: PiotrDabkowski <[email protected]> Signed-off-by: Micha Livne <[email protected]> * fix the docs. 
(#2148) Signed-off-by: Micha Livne <[email protected]> * Text processing refactor (#2149) * removed graphutils, suppletive, data_loader_utils from itn to be reused from tn Signed-off-by: Yang Zhang <[email protected]> * inheriting itn from tn, thus removing redundancy Signed-off-by: Yang Zhang <[email protected]> * cleaned whitelist Signed-off-by: Yang Zhang <[email protected]> * lgtm fix Signed-off-by: Yang Zhang <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Update how artifacts work (#2138) * Update how artifacts work Signed-off-by: Oleksii Kuchaiev <[email protected]> * fixing some tests Signed-off-by: Oleksii Kuchaiev <[email protected]> * fix more tests Signed-off-by: Oleksii Kuchaiev <[email protected]> * add __init__ to tests to make them discoverable Signed-off-by: Oleksii Kuchaiev <[email protected]> * empty src support Signed-off-by: Oleksii Kuchaiev <[email protected]> * updates plust unittest Signed-off-by: Oleksii Kuchaiev <[email protected]> * add copyright check Signed-off-by: Oleksii Kuchaiev <[email protected]> * copyright header Signed-off-by: Oleksii Kuchaiev <[email protected]> * fix style Signed-off-by: Oleksii Kuchaiev <[email protected]> * handle hashed megatron checkpoint version in nlp restore_from Signed-off-by: ericharper <[email protected]> * add _MODEL_RESTORE_PATH to AppState Signed-off-by: ericharper <[email protected]> * get rid of global folder caching Signed-off-by: Oleksii Kuchaiev <[email protected]> * double register - warning instead of exception Signed-off-by: Oleksii Kuchaiev <[email protected]> * Add asr spe tests Signed-off-by: smajumdar <[email protected]> * Pop out asr wpe pre-registered value Signed-off-by: smajumdar <[email protected]> * Correct ASR tests and paths Signed-off-by: smajumdar <[email protected]> * Correct tokenizer saving Signed-off-by: smajumdar <[email protected]> * Correct ASR tests Signed-off-by: smajumdar <[email protected]> * Correct ASR bpe mixin Signed-off-by: smajumdar 
<[email protected]> * Patch up backward compatibility Signed-off-by: smajumdar <[email protected]> * update register_bert_model Signed-off-by: ericharper <[email protected]> * update all get_lm_model calls Signed-off-by: ericharper <[email protected]> * return None if src not found Signed-off-by: ericharper <[email protected]> * handle case with no tokenizer Signed-off-by: ericharper <[email protected]> * do not add another hash is using tarfile_artifacts Signed-off-by: ericharper <[email protected]> * add return_none flag, update doc string Signed-off-by: ericharper <[email protected]> * update default behavior of register_artifact for NLPModel Signed-off-by: ericharper <[email protected]> * change kwarg name to verify_src_exists Signed-off-by: ericharper <[email protected]> * use cfg instead of _cfg Signed-off-by: Oleksii Kuchaiev <[email protected]> * some cleanups Signed-off-by: Oleksii Kuchaiev <[email protected]> Co-authored-by: ericharper <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Language model refactoring (#2120) * fixed branch in IR tutorial Signed-off-by: AlexGrinch <[email protected]> * bucketing tarred dataset for lm training Signed-off-by: AlexGrinch <[email protected]> * updated global rank Signed-off-by: AlexGrinch <[email protected]> * perplexity update Signed-off-by: AlexGrinch <[email protected]> * refactor lm to be campatible with latest nmt Signed-off-by: AlexGrinch <[email protected]> * perplexity change Signed-off-by: AlexGrinch <[email protected]> * removed obsolete config Signed-off-by: AlexGrinch <[email protected]> * added sequence perplexity Signed-off-by: AlexGrinch <[email protected]> * added non-smoothed CE loss for validation Signed-off-by: AlexGrinch <[email protected]> * unified sentence dataset, torchmetrics for sequence perplexity Signed-off-by: AlexGrinch <[email protected]> * translate_ddp refactor Signed-off-by: AlexGrinch <[email protected]> 
Co-authored-by: Oleksii Kuchaiev <[email protected]> Co-authored-by: Sandeep Subramanian <[email protected]> Signed-off-by: Micha Livne <[email protected]> * [NMT] Multi-validation Patch (#2150) * rename dl index 0 loss and sacrebleu for backwards compatibility Signed-off-by: ericharper <[email protected]> * eval -> val/tst Signed-off-by: ericharper <[email protected]> * instantiate torchmetrics after instantiating dataloaders Signed-off-by: ericharper <[email protected]> * bug Signed-off-by: ericharper <[email protected]> * remove debugging log Signed-off-by: ericharper <[email protected]> * remove debugging log Signed-off-by: ericharper <[email protected]> Co-authored-by: Oleksii Kuchaiev <[email protected]> Signed-off-by: Micha Livne <[email protected]> * bumping version to 1.0.0 Signed-off-by: Oleksii Kuchaiev <[email protected]> Signed-off-by: Micha Livne <[email protected]> * fixed the num_samples of text classification model. (#2152) Signed-off-by: Vahid <[email protected]> Signed-off-by: Micha Livne <[email protected]> * fix for electronic (#2153) * fix for electronic Signed-off-by: ekmb <[email protected]> * special symbols added Signed-off-by: ekmb <[email protected]> * restrict symbols list Signed-off-by: ekmb <[email protected]> Signed-off-by: Micha Livne <[email protected]> * FastSpeech 2 Test & Docs (#2143) * Add FS2 data loading test Signed-off-by: Jocelyn Huang <[email protected]> * TTS docs update for FastSpeech 2 Signed-off-by: Jocelyn Huang <[email protected]> * Style fix for FS2 dataset test Signed-off-by: Jocelyn Huang <[email protected]> * Fix transpose typo Signed-off-by: Jocelyn Huang <[email protected]> Co-authored-by: Oleksii Kuchaiev <[email protected]> Co-authored-by: Jason <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Minor patch for translate_ddp (#2155) * Patch for backtranslation in lm dataset Signed-off-by: MaximumEntropy <[email protected]> * One more fix Signed-off-by: MaximumEntropy <[email protected]> 
Signed-off-by: Micha Livne <[email protected]> * Entity linking (#2050) * Started adding SAP dataset Signed-off-by: Virginia Adams <[email protected]> * Delete .lm_bert_dataset.py.swp Signed-off-by: Virginia Adams <[email protected]> * Added dataset and loss Signed-off-by: Virginia Adams <[email protected]> * Added entity linking encoder model Signed-off-by: Virginia Adams <[email protected]> * Can build and use index from pubmedbert model Signed-off-by: Virginia Adams <[email protected]> * checked boolean logic in build_index.py Signed-off-by: Virginia Adams <[email protected]> * End to end tested all functionality Signed-off-by: Virginia Adams <[email protected]> * fixed val loss none at end of validation Signed-off-by: Virginia Adams <[email protected]> * Started adding demo entity linking notebook Signed-off-by: Virginia Adams <[email protected]> * adding in notebook demo Signed-off-by: Virginia Adams <[email protected]> * added call to entitylinking classes in __init__.py files Signed-off-by: Virginia Adams <[email protected]> * Added eval code to notebook Signed-off-by: Virginia Adams <[email protected]> * Adding unfinished notebook Signed-off-by: Virginia Adams <[email protected]> * Cleaned up example dir Signed-off-by: Virginia Adams <[email protected]> * Fixed recap commands Signed-off-by: Virginia Adams <[email protected]> * added model typing and tiny data tar Signed-off-by: Virginia Adams <[email protected]> * Adding tiny data zip Signed-off-by: Virginia Adams <[email protected]> * updated tiny example config data path Signed-off-by: Virginia Adams <[email protected]> * Notebook demo works Signed-off-by: Virginia Adams <[email protected]> * Changed training epochs Signed-off-by: Virginia Adams <[email protected]> * Removed output from training and install cells Signed-off-by: Virginia Adams <[email protected]> * changed code formatting Signed-off-by: Virginia Adams <[email protected]> * Started doc string for new functions Signed-off-by: Virginia Adams 
<[email protected]> * Updated data_preprocessing to save to data_dir Signed-off-by: Virginia Adams <[email protected]> * fixed comment in notebook demo Signed-off-by: Virginia Adams <[email protected]> * Update data_preprocessing.py Signed-off-by: Virginia Adams <[email protected]> * updated nemo typing imports Signed-off-by: Virginia Adams <[email protected]> * about to rebase Signed-off-by: Virginia Adams <[email protected]> * added back umls_dataset_processing.py Signed-off-by: Virginia Adams <[email protected]> * Removed example data Signed-off-by: Virginia Adams <[email protected]> * Fixed typos in notebook demo Signed-off-by: Virginia Adams <[email protected]> * fixed lgtm-com issues Signed-off-by: Virginia Adams <[email protected]> * added copyright headers Signed-off-by: Virginia Adams <[email protected]> * fixed import and copyright headers Signed-off-by: Virginia Adams <[email protected]> * Fixed formatting Signed-off-by: Virginia Adams <[email protected]> * Fixed formatting changes 2 Signed-off-by: Virginia Adams <[email protected]> * fixed test formatting Signed-off-by: Virginia Adams <[email protected]> * Added __init__.py for model and dataset Signed-off-by: Virginia Adams <[email protected]> * loading newline file returns data_dir now Signed-off-by: Virginia Adams <[email protected]> * Removed conf notebook and deleted comment Signed-off-by: Virginia Adams <[email protected]> * Added jenkins test Signed-off-by: Virginia Adams <[email protected]> * Updated Jenkins test Signed-off-by: Virginia Adams <[email protected]> * fixed file path Signed-off-by: Virginia Adams <[email protected]> * Changed Jenkins pipeline order Signed-off-by: Virginia Adams <[email protected]> * Fixed Jenkins datapath... again... 
Signed-off-by: Virginia Adams <[email protected]> * Made most review changes Signed-off-by: Virginia Adams <[email protected]> * fixed copy right Signed-off-by: Virginia Adams <[email protected]> * updated unit test to wget config Signed-off-by: Virginia Adams <[email protected]> * reverted test file back Signed-off-by: Virginia Adams <[email protected]> * Added project dir to jenkins test Signed-off-by: Virginia Adams <[email protected]> * defined config in unit test Signed-off-by: Virginia Adams <[email protected]> Co-authored-by: Sandeep Subramanian <[email protected]> Co-authored-by: Oleksii Kuchaiev <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Correct branch version for v1.0.0 (#2157) * Correct branch version Signed-off-by: smajumdar <[email protected]> * Correct Jenkinsfile Signed-off-by: smajumdar <[email protected]> * Update rst files Signed-off-by: smajumdar <[email protected]> Signed-off-by: Micha Livne <[email protected]> * switch CI back to main Signed-off-by: Oleksii Kuchaiev <[email protected]> Signed-off-by: Micha Livne <[email protected]> * fixed the docs. 
(#2156) Signed-off-by: Micha Livne <[email protected]> * Make Hifigan jittable (#2159) * Make Hifigan jittable Signed-off-by: Ryan Leary <[email protected]> * Remove vestigial debugging printout Signed-off-by: Ryan Leary <[email protected]> * Add export forward and fix style Signed-off-by: Ryan Leary <[email protected]> * Fix load_state_dict override for arbitrary layers Signed-off-by: Ryan Leary <[email protected]> Co-authored-by: Jocelyn <[email protected]> Co-authored-by: Oleksii Kuchaiev <[email protected]> Co-authored-by: Jason <[email protected]> Co-authored-by: vadam5 <[email protected]> Co-authored-by: Sandeep Subramanian <[email protected]> Co-authored-by: Oleksii Kuchaiev <[email protected]> Co-authored-by: Ryan Leary <[email protected]> Signed-off-by: Micha Livne <[email protected]> * fix version (#2162) Signed-off-by: Jason <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Megatron nb size reduced (#2163) * notebook size reduced Signed-off-by: ekmb <[email protected]> * notebook size reduced Signed-off-by: ekmb <[email protected]> Signed-off-by: Micha Livne <[email protected]> * update 
spectral clustering method (#2158) * update spectral clustering method Signed-off-by: nithinraok <[email protected]> * update Jenkins File Signed-off-by: nithinraok <[email protected]> * threshold fix by reducing window length for shorter embs Signed-off-by: nithinraok <[email protected]> * grammar fixes Signed-off-by: nithinraok <[email protected]> * CR update Signed-off-by: nithinraok <[email protected]> * paper reference Signed-off-by: nithinraok <[email protected]> * improve docstring for yaml Signed-off-by: nithinraok <[email protected]> * Doc fixes Signed-off-by: nithinraok <[email protected]> Signed-off-by: Micha Livne <[email protected]> * revert (#2167) Signed-off-by: Jason <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Limit Pytorch lightning release (#2170) Signed-off-by: smajumdar <[email protected]> Signed-off-by: Micha Livne <[email protected]> * token classification models artifacts update (#2169) * artifacts update Signed-off-by: ekmb <[email protected]> * artifacts update Signed-off-by: ekmb <[email protected]> * jenkins Signed-off-by: ekmb <[email protected]> * fix for model restoration Signed-off-by: ekmb <[email protected]> * typos fix + jenkins dir update Signed-off-by: ekmb <[email protected]> * typo Signed-off-by: ericharper <[email protected]> * update jenkins branch Signed-off-by: ericharper <[email protected]> * add && Signed-off-by: ericharper <[email protected]> * jenkins Signed-off-by: ekmb <[email protected]> * jenkins disable Signed-off-by: ekmb <[email protected]> * revert jenkins Signed-off-by: ekmb <[email protected]> * jenkins disable Signed-off-by: ekmb <[email protected]> * revert jenkins Signed-off-by: ekmb <[email protected]> Co-authored-by: ericharper <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Fix to always_save_nemo (#2174) * Initial attempt at always_save_nemo fix Signed-off-by: MaximumEntropy <[email protected]> * updated path before saving in exp manager, fixed bug when 
handling tarfile artifacts Signed-off-by: ericharper <[email protected]> * Add test with always_save_nemo to exp_manager Signed-off-by: MaximumEntropy <[email protected]> * Style fixes Signed-off-by: MaximumEntropy <[email protected]> * update jenkins branch Signed-off-by: ericharper <[email protected]> * check for nemo: Signed-off-by: ericharper <[email protected]> * check for nemo: Signed-off-by: ericharper <[email protected]> * check for nemo: Signed-off-by: ericharper <[email protected]> Co-authored-by: ericharper <[email protected]> Co-authored-by: Jason <[email protected]> Signed-off-by: Micha Livne <[email protected]> * fix typo (#2179) Signed-off-by: Yang Zhang <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Make itn tests optional (#2173) * Limit Pytorch lightning release Signed-off-by: smajumdar <[email protected]> * Add final two checks Signed-off-by: smajumdar <[email protected]> Co-authored-by: Jason <[email protected]> Signed-off-by: Micha Livne <[email protected]> * First Revision of TTS Docs and Notebooks Update for 1.0 (#2166) * squash Signed-off-by: Jason <[email protected]> * notebook fixes Signed-off-by: Jason <[email protected]> * notebook fixes Signed-off-by: Jason <[email protected]> * typos Signed-off-by: Jason <[email protected]> Signed-off-by: Micha Livne <[email protected]> * add more alternatives of 0 for telephone (#2171) Signed-off-by: Yang Zhang <[email protected]> Co-authored-by: Oleksii Kuchaiev <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Acc tn (#2180) * make tn cardinal faster Signed-off-by: Yang Zhang <[email protected]> * add number far Signed-off-by: Yang Zhang <[email protected]> * add test Signed-off-by: Yang Zhang <[email protected]> * fix lgtm Signed-off-by: Yang Zhang <[email protected]> * fix lgtm Signed-off-by: Yang Zhang <[email protected]> Signed-off-by: Micha Livne <[email protected]> * [DOCS] NLP Model parallel, NMT multi-val, CORE register artifacts (#2168) * update docs 
Signed-off-by: ericharper <[email protected]> * update docs Signed-off-by: ericharper <[email protected]> * update docs Signed-off-by: ericharper <[email protected]> * update docs Signed-off-by: ericharper <[email protected]> * update docs Signed-off-by: ericharper <[email protected]> * update docs Signed-off-by: ericharper <[email protected]> * update docs Signed-off-by: ericharper <[email protected]> * update docs Signed-off-by: ericharper <[email protected]> * update docs Signed-off-by: ericharper <[email protected]> * update docs Signed-off-by: ericharper <[email protected]> * update docs Signed-off-by: ericharper <[email protected]> * update docs Signed-off-by: ericharper <[email protected]> Co-authored-by: Sandeep Subramanian <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Change label smoothing prob to reduce chance of test failure (#2184) Signed-off-by: MaximumEntropy <[email protected]> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Add FS2 checkpoint links to docs and inference notebook (#2181) * Add FS2 checkpoint links to docs and inference notebook Signed-off-by: Jocelyn Huang <[email protected]> * Remove empty cell from TTS notebook Signed-off-by: Jocelyn Huang <[email protected]> Co-authored-by: Oleksii Kuchaiev <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Update ptl to 1.3 on main branch (#2178) * Update PTL Signed-off-by: smajumdar <[email protected]> * Begin update to Pytorch Lightning 1.3.x Signed-off-by: smajumdar <[email protected]> * Formatting Signed-off-by: smajumdar <[email protected]> * style Signed-off-by: ericharper <[email protected]> * Formatting Signed-off-by: smajumdar <[email protected]> * minor fix Signed-off-by: Jason <[email protected]> * minor fix Signed-off-by: Jason <[email protected]> * get testing attribute from trainer Signed-off-by: ericharper <[email protected]> * update init_ddp_connection override Signed-off-by: ericharper 
<[email protected]> * update attribute Signed-off-by: ericharper <[email protected]> * add barrier after load checkpoint in megatron Signed-off-by: ericharper <[email protected]> * remove barrier Signed-off-by: ericharper <[email protected]> * update last naming Signed-off-by: Jason <[email protected]> Co-authored-by: ericharper <[email protected]> Co-authored-by: Jason <[email protected]> Signed-off-by: Micha Livne <[email protected]> * SDE updates (#2187) * Added updates to SDE: - support for external vocabulary (to detect OOV words) - support for offset field (for segmented long recordings) - UI improvements Signed-off-by: Vitaly Lavrukhin <[email protected]> * Refactored diff in SDE Signed-off-by: Vitaly Lavrukhin <[email protected]> Co-authored-by: Oleksii Kuchaiev <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Add TTS aligner and improved version of g2p for vocabs.Phonemes, small improvement in TalkNet (#2189) * add first version of aligner Signed-off-by: Oktai Tatanov <[email protected]> * aligner docs, new g2p version, fix bugs in talknet Signed-off-by: Oktai Tatanov <[email protected]> * update docs and remove lj related code Signed-off-by: Oktai Tatanov <[email protected]> * fix style Signed-off-by: Oktai Tatanov <[email protected]> * fix import Signed-off-by: Oktai Tatanov <[email protected]> Co-authored-by: Oleksii Kuchaiev <[email protected]> Signed-off-by: Micha Livne <[email protected]> * set the default of nodessplitter to None. 
(#2190) Signed-off-by: Vahid <[email protected]> Signed-off-by: Micha Livne <[email protected]> * NMT fixes (#2194) * minor fixes Signed-off-by: Oleksii Kuchaiev <[email protected]> * minor bugfixes Signed-off-by: Oleksii Kuchaiev <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Store mappings file in .nemo for FS2 model (#2196) * Store mappings file in .nemo for FS2 model Signed-off-by: Jocelyn Huang <[email protected]> * Add error enforcing mappings file during training (FS2) Signed-off-by: Jocelyn Huang <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Add support to change the SE context window of ConvASREncoder (#2193) * Add support for changing context window on the fly Signed-off-by: smajumdar <[email protected]> * Add support to change the SE context window of ConvASREncoder Signed-off-by: smajumdar <[email protected]> * Add ability to skip config updating Signed-off-by: smajumdar <[email protected]> * Switch to mixin based API Signed-off-by: smajumdar <[email protected]> * Update docs and api for ASRModuleMixin Signed-off-by: smajumdar <[email protected]> * Change print to logging.info Signed-off-by: smajumdar <[email protected]> * Correct stride level when computing context window Signed-off-by: smajumdar <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Add a CI test for doing inference with an NMT model trained with Pre-LN (#2198) * Change label smoothing prob to reduce chance of test failure Signed-off-by: MaximumEntropy <[email protected]> * Add Pre-LN inference test to Jenkinsfile Signed-off-by: MaximumEntropy <[email protected]> * Separate tests for training and NMT inference Signed-off-by: MaximumEntropy <[email protected]> Co-authored-by: Oleksii Kuchaiev <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Fix ipywidgets error in asr notebook (#2199) Added `ipywidgets` to avoid `ImportError: IProgress not found. Please update jupyter and ipywidgets. 
See https://ipywidgets.readthedocs.io/en/stable/user_install.html` error. Signed-off-by: Derek Chia <[email protected]> Signed-off-by: Micha Livne <[email protected]> * metrics fix (#2202) * metrics fix Signed-off-by: ekmb <[email protected]> * metrics reset for punct model Signed-off-by: ekmb <[email protected]> Signed-off-by: Micha Livne <[email protected]> * readme and minor improvements (#2203) * readme and minor improvements Signed-off-by: nithinraok <[email protected]> * vad threshold update Signed-off-by: nithinraok <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Fix text processing docs (#2195) * fix text processing docs Signed-off-by: Yang Zhang <[email protected]> * fix name Signed-off-by: Yang Zhang <[email protected]> * add guard to pynini import Signed-off-by: Yang Zhang <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Fix bug in SpecCutout (#2201) Signed-off-by: Robert Bracco <[email protected]> Co-authored-by: Oleksii Kuchaiev <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Fix bug in SpecCutout (#2201) (#2205) Signed-off-by: Robert Bracco <[email protected]> Co-authored-by: Oleksii Kuchaiev <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Signed-off-by: smajumdar <[email protected]> Co-authored-by: Robert Bracco <[email protected]> Co-authored-by: Oleksii Kuchaiev <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Set seed before generating random tensors in NMT test (#2206) * Change label smoothing prob to reduce chance of test failure Signed-off-by: MaximumEntropy <[email protected]> * Set seed before generating tensors Signed-off-by: MaximumEntropy <[email protected]> Signed-off-by: Micha Livne <[email protected]> * ASR patches for v1.0.0 (#2207) * Multiple updates to RNNT add initialization Signed-off-by: smajumdar <[email protected]> * Correct name of initilization Signed-off-by: 
smajumdar <[email protected]> * Update dockerignore Signed-off-by: smajumdar <[email protected]> * Fix RNNT WER calculation Signed-off-by: smajumdar <[email protected]> * Address comments Signed-off-by: smajumdar <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Multilingual training for NMT (#2160) * mnmt on fresh main Signed-off-by: Abhinav Khattar <[email protected]> * push for test Signed-off-by: Abhinav Khattar <[email protected]> * debug Signed-off-by: Abhinav Khattar <[email protected]> * check Signed-off-by: Abhinav Khattar <[email protected]> * cleanup Signed-off-by: Abhinav Khattar <[email protected]> * minor fix Signed-off-by: Abhinav Khattar <[email protected]> * more minor fixes Signed-off-by: Abhinav Khattar <[email protected]> * fix for test Signed-off-by: Abhinav Khattar <[email protected]> * fix list size error Signed-off-by: Abhinav Khattar <[email protected]> * multilingual in infer Signed-off-by: Abhinav Khattar <[email protected]> * changes Signed-off-by: Abhinav Khattar <[email protected]> * tar creation with multilingual Signed-off-by: Abhinav Khattar <[email protected]> * fix Signed-off-by: Abhinav Khattar <[email protected]> * changes + parallelism + bug fix Signed-off-by: Abhinav Khattar <[email protected]> * small fix Signed-off-by: Abhinav Khattar <[email protected]> * multilingual preprocessor fix Signed-off-by: Abhinav Khattar <[email protected]> * globally unique fragment names in tarred dataset Signed-off-by: Abhinav Khattar <[email protected]> * minor changes Signed-off-by: Abhinav Khattar <[email protected]> * rm load_from_cached_dataset Signed-off-by: Abhinav Khattar <[email protected]> * minor config change Signed-off-by: Abhinav Khattar <[email protected]> * rm unused import Signed-off-by: Abhinav Khattar <[email protected]> Co-authored-by: Oleksii Kuchaiev <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Remove memory leak from ASR notebook + update model notebook (#2213) * ASR patches for 
v1.0.0 (#2207) * Multiple updates to RNNT add initialization Signed-off-by: smajumdar <[email protected]> * Correct name of initialization Signed-off-by: smajumdar <[email protected]> * Update dockerignore Signed-off-by: smajumdar <[email protected]> * Fix RNNT WER calculation Signed-off-by: smajumdar <[email protected]> * Address comments Signed-off-by: smajumdar <[email protected]> * Correct model notebook to log the loss and correctly assign keys Signed-off-by: smajumdar <[email protected]> Signed-off-by: Micha Livne <[email protected]> * replace names in vad tutorials (#2220) Signed-off-by: fayejf <[email protected]> Signed-off-by: Micha Livne <[email protected]> * fix the versioning name. (#2209) * fix the versioning name. Signed-off-by: Vahid <[email protected]> * Made version None. Signed-off-by: Vahid <[email protected]> Co-authored-by: Oleksii Kuchaiev <[email protected]> Co-authored-by: Jason <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Enabled passing kwargs to export() (#2175) * Enabled passing kwargs to export() Signed-off-by: Boris Fomitchev <[email protected]> * Fixing style; changed Classifier input_example to new extended syntax Signed-off-by: Boris Fomitchev <[email protected]> * Fixed order of forward() call in export Signed-off-by: Boris Fomitchev <[email protected]> * Fixing style Signed-off-by: Boris Fomitchev <[email protected]> Co-authored-by: Oleksii Kuchaiev <[email protected]> Signed-off-by: Micha Livne <[email protected]> * update g2p: ambiguous ignore, flag for skipping seq2seq (#2223) Signed-off-by: Oktai Tatanov <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Update TTS notebook with TalkNet inference (#2133) * Update TTS notebook with TalkNet inference. Signed-off-by: Stanislav Beliaev <[email protected]> * Update TTS Notebook. Signed-off-by: Stanislav Beliaev <[email protected]> * Update TTS TN Training Notebook. Signed-off-by: Stanislav Beliaev <[email protected]> * Fix TN paper link. 
Signed-off-by: Stanislav Beliaev <[email protected]> * Remove branch updating TODOs. Signed-off-by: Stanislav Beliaev <[email protected]> Signed-off-by: Micha Livne <[email protected]> * update speaker notebooks (#2224) Signed-off-by: nithinraok <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Support symlinked files (#2216) Signed-off-by: Anas Abou Allaban <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Set strict=True everywhere by default. (#2225) Signed-off-by: Oleksii Kuchaiev <[email protected]> Signed-off-by: Micha Livne <[email protected]> * set strict=True in nlp_model (#2227) Signed-off-by: Oleksii Kuchaiev <[email protected]> Signed-off-by: Micha Livne <[email protected]> * set strict=False for model parallel examples Signed-off-by: Oleksii Kuchaiev <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Make Text processing installation optional via reinstall.sh (#2226) * Make Text processing installation optional via reinstall.sh Signed-off-by: smajumdar <[email protected]> * Support both success and failure states Signed-off-by: smajumdar <[email protected]> Co-authored-by: Oleksii Kuchaiev <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Transformer final norm preln (#2197) * fix pre_ln final norm Signed-off-by: fayejf <[email protected]> * style fix Signed-off-by: fayejf <[email protected]> * bug fixed Signed-off-by: fayejf <[email protected]> * bugfix post_ln Signed-off-by: fayejf <[email protected]> * update and add pre_ln_final_norm Signed-off-by: fayejf <[email protected]> * style fix Signed-off-by: fayejf <[email protected]> * fix for unit test Signed-off-by: fayejf <[email protected]> * rename final_norm to final_layer_norm Signed-off-by: fayejf <[email protected]> * bug fix Signed-off-by: fayejf <[email protected]> * tiny fix Signed-off-by: fayejf <[email protected]> * fix and improve Signed-off-by: fayejf <[email 
protected]> * tiny fix Signed-off-by: fayejf <[email protected]> * Patch for NMT to allow loading old models trained with pre-LN Signed-off-by: MaximumEntropy <[email protected]> Co-authored-by: Sandeep Subramanian <[email protected]> Co-authored-by: Oleksii Kuchaiev <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Update models and notebook for 1.0 (#2211) * update models Signed-off-by: Jason <[email protected]> * updates Signed-off-by: Jason <[email protected]> * fix Signed-off-by: Jason <[email protected]> * add links Signed-off-by: Jason <[email protected]> * fixes Signed-off-by: Jason <[email protected]> * style Signed-off-by: Jason <[email protected]> * update checkpoints Signed-off-by: Jason <[email protected]> * rename Signed-off-by: Jason <[email protected]> * fixes Signed-off-by: Jason <[email protected]> * lgtm Signed-off-by: Jason <[email protected]> * fix loading waveglow Signed-off-by: Jason <[email protected]> * typo Signed-off-by: Jason <[email protected]> Co-authored-by: Oleksii Kuchaiev <[email protected]> Signed-off-by: Micha Livne <[email protected]> * update_metrics_classification_models (#2228) Signed-off-by: nithinraok <[email protected]> Co-authored-by: fayejf <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Data loader for seq of label model (#2084) * feature to seq label data loader Signed-off-by: fayejf <[email protected]> * style fix Signed-off-by: fayejf <[email protected]> * small fix Signed-off-by: fayejf <[email protected]> * update tl to be length of seq label Signed-off-by: fayejf <[email protected]> * style fix Signed-off-by: fayejf <[email protected]> * tiny bug fix Signed-off-by: fayejf <[email protected]> * small updates Signed-off-by: fayejf <[email protected]> * updates for review feedback Signed-off-by: fayejf <[email protected]> * style fix Signed-off-by: fayejf <[email protected]> * explain seq_label Signed-off-by: fayejf <[email protected]> * fix lgtm Signed-off-by: fayejf <[email 
protected]> * small updates Signed-off-by: fayejf <[email protected]> * improve as discussed Signed-off-by: fayejf <[email protected]> * add docstring Signed-off-by: fayejf <[email protected]> Co-authored-by: Nithin Rao <[email protected]> Signed-off-by: Micha Livne <[email protected]> * fix comments (#2236) * fix comments Signed-off-by: Yang Zhang <[email protected]> * fix typo Signed-off-by: Yang Zhang <[email protected]> Signed-off-by: Micha Livne <[email protected]> * add paper ref to sgdqa model doc (#2233) * add paper ref to sgdqa model doc Signed-off-by: Yang Zhang <[email protected]> * fix comments Signed-off-by: Yang Zhang <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Move ConcatDataset to common (#2237) * move concatdataset to common Signed-off-by: Abhinav Khattar <[email protected]> * var name change Signed-off-by: Abhinav Khattar <[email protected]> Signed-off-by: Micha Livne <[email protected]> * audio based normalization (#2231) * squash norm_audio Signed-off-by: ekmb <[email protected]> * add missing files Signed-off-by: ekmb <[email protected]> * style Signed-off-by: ekmb <[email protected]> * unit tests added, docstrings fixed Signed-off-by: ekmb <[email protected]> * fix lgtm errors Signed-off-by: ekmb <[email protected]> * debug jenkins Signed-off-by: ekmb <[email protected]> * debug jenkins Signed-off-by: ekmb <[email protected]> * signature update Signed-off-by: ekmb <[email protected]> * set deterministic default Signed-off-by: ekmb <[email protected]> * add more test cases Signed-off-by: ekmb <[email protected]> Signed-off-by: Micha Livne <[email protected]> * bug fix config (#2232) Signed-off-by: fayejf <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Alias Swish to SiLU (#2239) * Alias Swish to SiLU and move activations to inplace execution if possible Signed-off-by: smajumdar <[email protected]> * Remove unused import Signed-off-by: smajumdar 
<[email protected]> Signed-off-by: Micha Livne <[email protected]> * Update README.rst Signed-off-by: Micha Livne <[email protected]> * Offline asr notebook bug fix (#2242) * fix Signed-off-by: fayejf <[email protected]> * install Signed-off-by: fayejf <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Fix docstring (#2244) * fix comments Signed-off-by: Yang Zhang <[email protected]> * fix doc string Signed-off-by: Yang Zhang <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Update "last" Checkpoint (#2241) * fix Signed-off-by: Jason <[email protected]> * change Signed-off-by: Jason <[email protected]> * fix Signed-off-by: Jason <[email protected]> Co-authored-by: Sandeep Subramanian <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Add pretrained model stt_es_citrinet_512 (#2247) Signed-off-by: Elena Rastorgueva <[email protected]> Signed-off-by: Micha Livne <[email protected]> * [BUGFIX] Only process tarfile artifacts when model was restored from tarfile (#2250) * process tarfile artifacts only if model is being restored Signed-off-by: ericharper <[email protected]> * process tarfile artifacts only if model was restored from a tarfile Signed-off-by: ericharper <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Log average metrics for Multi-validation in NMT (#2251) * add avg metrics NMT Signed-off-by: Abhinav Khattar <[email protected]> * name change Signed-off-by: Abhinav Khattar <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Update Primer notebook (#2258) Signed-off-by: smajumdar <[email protected]> Signed-off-by: Micha Livne <[email protected]> * fixed Bug 3310780 and 3310799 (#2264) Signed-off-by: Virginia Adams <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Support multiple models being instantiated in same execution scope (#2245) * Support multiple models being instantiated in same execution scope Signed-off-by: smajumdar <[email protected]> 
* Fix tests Signed-off-by: smajumdar <[email protected]> * Add locks to methods in appstate Signed-off-by: smajumdar <[email protected]> * Perform locks only on write operations Signed-off-by: smajumdar <[email protected]> * Correct deadlock issue Signed-off-by: smajumdar <[email protected]> * Add more tests Signed-off-by: smajumdar <[email protected]> * Add test for multi save and remove patch to change save type Signed-off-by: smajumdar <[email protected]> * Update app state to preserve gidx of previous token Signed-off-by: smajumdar <[email protected]> * Correct restoration logic for tarfiles Signed-off-by: smajumdar <[email protected]> Signed-off-by: Micha Livne <[email protected]> * ASR Refactoring (#2240) * Refactor out the preprocessing from ASR into common Signed-off-by: smajumdar <[email protected]> * Correct nltk issue with vocabs.py for clusters Signed-off-by: smajumdar <[email protected]> * Add typing information to SpecAugment and SpecCutout Signed-off-by: smajumdar <[email protected]> * Reorganize parts directory Signed-off-by: smajumdar <[email protected]> * Refactor parts submodules, add __init__ to few important parts Signed-off-by: smajumdar <[email protected]> * Update docs for new path to parts Signed-off-by: smajumdar <[email protected]> * Cherry pick PR https://github.com/NVIDIA/NeMo/pull/2219 Signed-off-by: smajumdar <[email protected]> * Add header for preprocessing commons Signed-off-by: smajumdar <[email protected]> * Fix style of tests Signed-off-by: smajumdar <[email protected]> * Add forced update of configs for train-val-test ds to new labels tests Signed-off-by: smajumdar <[email protected]> * Update path to FilterbankFeatures for TTS Signed-off-by: smajumdar <[email protected]> * Add an alias file for backward compatibility Signed-off-by: smajumdar <[email protected]> * Add an alias file for backward compatibility Signed-off-by: smajumdar <[email protected]> * Update training scripts of ASR to support finetuning Signed-off-by: 
smajumdar <[email protected]> * Update Finetuning step to be ModelPT level Signed-off-by: smajumdar <[email protected]> * Update docs for finetuning for ASR Signed-off-by: smajumdar <[email protected]> * Fix style Signed-off-by: smajumdar <[email protected]> * Update docs and scripts with fine-tuning info Signed-off-by: smajumdar <[email protected]> * Update docs and scripts with fine-tuning info Signed-off-by: smajumdar <[email protected]> * Fix style Signed-off-by: smajumdar <[email protected]> * Update scripts Signed-off-by: smajumdar <[email protected]> * Add comment for weight initialization Signed-off-by: smajumdar <[email protected]> Signed-off-by: Micha Livne <[email protected]> * TTS Doc Fix and Remove TTS Test (#2272) * bug fix and remove test Signed-off-by: Jason <[email protected]> * syntax Signed-off-by: Jason <[email protected]> * syntax Signed-off-by: Jason <[email protected]> * syntax Signed-off-by: Jason <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Talknet training Fix (#2273) * TalkNet Training notebook fix. Signed-off-by: Stanislav Beliaev <[email protected]> * Remove debug stuff. 
Signed-off-by: Stanislav Beliaev <[email protected]> Signed-off-by: Micha Livne <[email protected]> * update (#2274) Signed-off-by: Jason <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Add links (#2275) * update Signed-off-by: Jason <[email protected]> * link Signed-off-by: Jason <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Delete 3_TTS_TalkNet_Training.ipynb (#2276) Signed-off-by: Jason <[email protected]> Signed-off-by: Micha Livne <[email protected]> * tune down logging (#2277) * tune down logging Signed-off-by: Oleksii Kuchaiev <[email protected]> * debug message instead of removing it completely Signed-off-by: Oleksii Kuchaiev <[email protected]> Signed-off-by: Micha Livne <[email protected]> * minor bugfix Signed-off-by: Oleksii Kuchaiev <[email protected]> Signed-off-by: Micha Livne <[email protected]> * remove confusing message Signed-off-by: Oleksii Kuchaiev <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Restore TalkNet training notebook (#2281) * Restore TalkNet training notebook. Signed-off-by: Stanislav Beliaev <[email protected]> * Remove torchaudio dep. 
Signed-off-by: Stanislav Beliaev <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Fix ExpManager Issues and FastPitch (#2283) * backport exp_manager fixes to v1 Signed-off-by: Jason <[email protected]> * fix fastpitch Signed-off-by: Jason <[email protected]> * fix tests Signed-off-by: Jason <[email protected]> * update prefix Signed-off-by: Jason <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Organize asr config folders (#2284) Signed-off-by: Micha Livne <[email protected]> * Fix and enable DALI tests (#2077) * Fix and enable DALI tests Signed-off-by: Joaquin Anton <[email protected]> * remove unused import Signed-off-by: Joaquin Anton <[email protected]> * Move DALI tests to a separate Jenkins stage Signed-off-by: Joaquin Anton <[email protected]> * Remove DALI tests from the main jenkins ASR stage Signed-off-by: Joaquin Anton <[email protected]> * Comment out MFCC test Signed-off-by: Joaquin Anton <[email protected]> * Working version Signed-off-by: Joaquin Anton <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Added unit test for hifigan export, fixed hifigan export (#2279) * Added unit test for hifigan export, Removed runtime test from waveglow test (now in export) Signed-off-by: Boris Fomitchev <[email protected]> * Fixed style Signed-off-by: Boris Fomitchev <[email protected]> * Fixed style Signed-off-by: Boris Fomitchev <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Update conformer recipes (#2265) * updated readme asr. Signed-off-by: Vahid <[email protected]> * added models. Signed-off-by: Vahid <[email protected]> * fixed docs. Signed-off-by: Vahid <[email protected]> * fixed docs. Signed-off-by: Vahid <[email protected]> * fixed docs. Signed-off-by: Vahid <[email protected]> * fixed docs. Signed-off-by: Vahid <[email protected]> * fixed the docs. Signed-off-by: Vahid <[email protected]> * fixed the docs. 
Signed-off-by: Vahid <[email protected]> * fixed the docs. Signed-off-by: Vahid <[email protected]> * disabled test. Signed-off-by: Vahid <[email protected]> * Updated the config files. Signed-off-by: Vahid <[email protected]> * Updated the config files. Signed-off-by: Vahid <[email protected]> * Updated the config files. Signed-off-by: Vahid <[email protected]> * dropped the wers. Signed-off-by: Vahid <[email protected]> * dropped the wers. Signed-off-by: Vahid <[email protected]> * dropped new models and reverted to old versions. Signed-off-by: Vahid <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Co-authored-by: Oleksii Kuchaiev <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Adding neural rescorer and its documentations (#2287) * Added initial neural rescorer scripts. Signed-off-by: Vahid <[email protected]> * Added initial neural rescorer scripts. Signed-off-by: Vahid <[email protected]> * Added initial neural rescorer scripts. Signed-off-by: Vahid <[email protected]> * Added initial neural rescorer scripts. Signed-off-by: Vahid <[email protected]> * Added initial neural rescorer scripts. Signed-off-by: Vahid <[email protected]> * Added initial neural rescorer scripts. Signed-off-by: Vahid <[email protected]> * Added initial neural rescorer scripts. Signed-off-by: Vahid <[email protected]> * Added initial neural rescorer scripts. Signed-off-by: Vahid <[email protected]> * Added initial neural rescorer scripts. Signed-off-by: Vahid <[email protected]> * added more docs, figures, and output file. Signed-off-by: Vahid <[email protected]> * added more docs, figures, and output file. Signed-off-by: Vahid <[email protected]> * added more docs, figures, and output file. Signed-off-by: Vahid <[email protected]> * added more docs, figures, and output file. Signed-off-by: Vahid <[email protected]> * added more docs, figures, and output file. 
Signed-off-by: Vahid <[email protected]> * fixed style Signed-off-by: Vahid <[email protected]> * add a note to asr notebook. Signed-off-by: Vahid <[email protected]> * add a note to asr notebook. Signed-off-by: Vahid <[email protected]> * add a note to asr notebook. Signed-off-by: Vahid <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Adjust warning messages Signed-off-by: Oleksii Kuchaiev <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Revert "Adjust warning messages" This reverts commit df046ec55754d0136a2a28451435068f32409f30. Signed-off-by: Micha Livne <[email protected]> * Adjust warning messages (#2294) Signed-off-by: Oleksii Kuchaiev <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Adding new Models releases on NGC. (#2295) * added new models. Signed-off-by: Vahid <[email protected]> * added tests for asr lm. Signed-off-by: Vahid <[email protected]> * added tests for asr lm. Signed-off-by: Vahid <[email protected]> * dropped the test. 
Signed-off-by: Vahid <[email protected]> Signed-off-by: Micha Livne <[email protected]> * update quantization (#2298) Signed-off-by: slyned <[email protected]> Co-authored-by: slyned <[email protected]> Signed-off-by: Micha Livne <[email protected]> * ASR improvements (#2293) * Update numba messages and citrinet configs Signed-off-by: smajumdar <[email protected]> * Remove support for weight init scale and hidden hidden bias scale for layer normalized lstm Signed-off-by: smajumdar <[email protected]> * Add support for multiple filetypes in tarred datasets, correct rnn LN-lstm inputs, fix OmegaConf compat issue Signed-off-by: smajumdar <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Time quarter to (#2292) * fix comments Signed-off-by: Yang Zhang <[email protected]> * fix doc string Signed-off-by: Yang Zhang <[email protected]> * adding quarter to to time class Signed-off-by: Yang Zhang <[email protected]> * fix typo Signed-off-by: Yang Zhang <[email protected]> Signed-off-by: Micha Livne <[email protected]> * fixed paths. 
(#2301) Signed-off-by: Vahid <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Added onnxruntime check of exported ONNX, bumped up default ONNX opset (#2278) * Added onnxruntime check of exported ONNX, bumped up default ONNX opset Signed-off-by: Boris Fomitchev <[email protected]> * Made TS export to accept ONNX-style input example, removed unused param to export Signed-off-by: Boris Fomitchev <[email protected]> * check_trace default made False Signed-off-by: Boris Fomitchev <[email protected]> * Fixed for updated export signature Signed-off-by: Boris Fomitchev <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Update readmes Signed-off-by: Oleksii Kuchaiev <[email protected]> Signed-off-by: Micha Livne <[email protected]> * update readme Signed-off-by: Oleksii Kuchaiev <[email protected]> Signed-off-by: Micha Livne <[email protected]> * update readme Signed-off-by: Oleksii Kuchaiev <[email protected]> Signed-off-by: Micha Livne <[email protected]> * fix docs table Signed-off-by: Oleksii Kuchaiev <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Add support for Numba CUDA optimized SpecAugment (#2269) * Initial implementation Signed-off-by: smajumdar <[email protected]> * Initial implementation Signed-off-by: smajumdar <[email protected]> * Finish initial implementation of numba spec augment Signed-off-by: smajumdar <[email protected]> * Correct mask propagataion Signed-off-by: smajumdar <[email protected]> * Parallelize kernel over batch instead of over masks Signed-off-by: smajumdar <[email protected]> * Finish tests and update to signature of spectrogramaugmentation calls Signed-off-by: smajumdar <[email protected]> * Finish tests and update to signature of spectrogramaugmentation calls Signed-off-by: smajumdar <[email protected]> * Add header Signed-off-by: smajumdar <[email protected]> * Fix style Signed-off-by: smajumdar <[email protected]> * Add heuristics Signed-off-by: smajumdar <[email protected]> * 
Correct inclusive range of padding Signed-off-by: smajumdar <[email protected]> * Correct typing for spec aug numba Signed-off-by: smajumdar <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Added JSON manifest's support to transcribe_speech.py (#2304) * Added JSON manifest's support to transcribe_speech.py Signed-off-by: Vitaly Lavrukhin <[email protected]> * Dropped unused import Signed-off-by: Vitaly Lavrukhin <[email protected]> Co-authored-by: Oleksii Kuchaiev <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Signed-off-by: Micha Livne <[email protected]> * get embedding for a single file (#2310) * get embedding for a single file Signed-off-by: nithinraok <[email protected]> * fixes Signed-off-by: nithinraok <[email protected]> * sr update Signed-off-by: nithinraok <[email protected]> * regain train mode Signed-off-by: nithinraok <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Update FastPitch (#2249) * wip Signed-off-by: Jason <[email protected]> * c1 Signed-off-by: Jason <[email protected]> * bug fixes Signed-off-by: Jason <[email protected]> * bug fixes Signed-off-by: Jason <[email protected]> * v2 Signed-off-by: Jason <[email protected]> * changes Signed-off-by: Jason <[email protected]> * add types, old model working Signed-off-by: Jason <[email protected]> * pitch Signed-off-by: Jason <[email protected]> * update Signed-off-by: Jason <[email protected]> * update Signed-off-by: Jason <[email protected]> * let it work Signed-off-by: Jason <[email protected]> * fixes Signed-off-by: Jason <[email protected]> * add oktai comments Signed-off-by: Jason <[email protected]> * debug Signed-off-by: Jason <[email protected]> * scale Signed-off-by: Jason <[email protected]> * wip Signed-off-by: Jason <[email protected]> * fix test for v1 Signed-off-by: Jason <[email protected]> * merge train and val Signed-off-by: Jason <[email protected]> * back to par bin att, add correct encoder settings 
Signed-off-by: Jason <[email protected]> * try Signed-off-by: Jason <[email protected]> * undo Signed-off-by: Jason <[email protected]> * lgtm: Signed-off-by: Jason <[email protected]> * style Signed-off-by: Jason <[email protected]> * default to ljs Signed-off-by: Jason <[email protected]> Signed-off-by: Micha Livne <[email protected]> * patch quantization (#2314) * update quantization Signed-off-by: slyned <[email protected]> * update quant infer trt Signed-off-by: slyned <[email protected]> * fix style Signed-off-by: slyned <[email protected]> Co-authored-by: slyned <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Pin OmegaConf version for 1.0.0 (#2316) * Update OmegaConf compatibility Signed-off-by: smajumdar <[email protected]> * Correct OmegaConf.pretty() Signed-off-by: smajumdar <[email protected]> * Upper bound omegaconf Signed-off-by: smajumdar <[email protected]> * Revert "Correct OmegaConf.pretty()" This reverts commit 6ebae2ef Signed-off-by: smajumdar <[email protected]> * Revert "Update OmegaConf compatibility" This reverts commit 83b2cf35a07a742552082e80e6ca34c9b8203cbc. 
Signed-off-by: smajumdar <[email protected]> Signed-off-by: Micha Livne <[email protected]> * [BUGFIX] OmegaConf forward compatibility (#2319) * Update OmegaConf compatibility Signed-off-by: smajumdar <[email protected]> Signed-off-by: ericharper <[email protected]> * Correct OmegaConf.pretty() Signed-off-by: smajumdar <[email protected]> Signed-off-by: ericharper <[email protected]> * upper bound omegaconf Signed-off-by: ericharper <[email protected]> * add if,else back Signed-off-by: ericharper <[email protected]> * typo Signed-off-by: ericharper <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Signed-off-by: Micha Livne <[email protected]> * bumping version to 1.0.1 Signed-off-by: Oleksii Kuchaiev <[email protected]> Signed-off-by: Micha Livne <[email protected]> * fix_cluster_small_sample (#2303) * fix_cluster_small_sample Signed-off-by: nithinraok <[email protected]> * for smaller samples Signed-off-by: nithinraok <[email protected]> * remove type Signed-off-by: nithinraok <[email protected]> * similarity matrix Signed-off-by: nithinraok <[email protected]> * est num of speakers add Signed-off-by: nithinraok <[email protected]> * comment update Signed-off-by: nithinraok <[email protected]> * style fix Signed-off-by: nithinraok <[email protected]> * MIN_SAMPLES passed through func arg Signed-off-by: nithinraok <[email protected]> * doc string update Signed-off-by: nithinraok <[email protected]> * spell mistake Signed-off-by: nithinraok <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Fastpitch export (#2300) * wip Signed-off-by: Jason <[email protected]> * c1 Signed-off-by: Jason <[email protected]> * bug fixes Signed-off-by: Jason <[email protected]> * bug fixes Signed-off-by: Jason <[email protected]> * v2 Signed-off-by: Jason <[email protected]> * changes Signed-off-by: Jason <[email protected]> * add types, old model working Signed-off-by: Jason <[email protected]> * pitch Signed-off-by: Jason <[email 
protected]> * update Signed-off-by: Jason <[email protected]> * update Signed-off-by: Jason <[email protected]> * let it work Signed-off-by: Jason <[email protected]> * fixes Signed-off-by: Jason <[email protected]> * add oktai comments Signed-off-by: Jason <[email protected]> * debug Signed-off-by: Jason <[email protected]> * scale Signed-off-by: Jason <[email protected]> * wip Signed-off-by: Jason <[email protected]> * fix test for v1 Signed-off-by: Jason <[email protected]> …
Move ConcatDataset (earlier ConcatTranslationDataset) to common. The dataset was introduced in #2160.