I am trying to train the FastConformer 120M model from scratch, but it is not converging? #12167

PhamDangNguyen · 2025-02-13T03:39:33Z

I am trying to train the FastConformer 120M model from scratch, but it is not converging. This is my config base:

GPU: 40G
Batch_size = 32
amount of data: 5000 hours (4.000.000 data points).
'''
name: "FastConformer-CTC-BPE"

model:
sample_rate: 16000
log_prediction: true # enables logging sample predictions in the output during training
ctc_reduction: 'mean_volume'
skip_nan_grad: false

train_ds:
manifest_filepath: /home/team_voice/STT_pdnguyen/finetune-fast-conformer/metadata_train/train_remove_func_18.json
sample_rate: ${model.sample_rate}
batch_size: 40 # you may increase batch_size if your memory allows
shuffle: true
num_workers: 32
pin_memory: true
max_duration: 21 # it is set for LibriSpeech, you may need to update it for your dataset
min_duration: 0.3
# tarred datasets
is_tarred: false
tarred_audio_filepaths: null
shuffle_n: 2048
# bucketing params
bucketing_strategy: "fully_randomized"
bucketing_batch_size: null

validation_ds:
manifest_filepath: /home/team_voice/STT_pdnguyen/finetune-fast-conformer/metadata_train/test_remove_func.json
sample_rate: ${model.sample_rate}
batch_size: 16 # you may increase batch_size if your memory allows
shuffle: false
use_start_end_token: false
num_workers: 16
pin_memory: true

test_ds:
manifest_filepath: /home/team_voice/STT_pdnguyen/finetune-fast-conformer/metadata_train/test_remove_func.json
sample_rate: ${model.sample_rate}
batch_size: 16 # you may increase batch_size if your memory allows
shuffle: false
use_start_end_token: false
num_workers: 16
pin_memory: true

tokenizer:
dir: /home/team_voice/STT_pdnguyen/finetune-fast-conformer/dict_N/tokenizer_spe_bpe_v3072 # path to directory which contains either tokenizer.model (bpe) or vocab.txt (wpe)
type: bpe # Can be either bpe (SentencePiece tokenizer) or wpe (WordPiece tokenizer)

preprocessor:
target: nemo.collections.asr.modules.AudioToMelSpectrogramPreprocessor
sample_rate: 16000
normalize: "per_feature"
window_size: 0.025
window_stride: 0.01
window: "hann"
features: 80
n_fft: 512
log: true
frame_splicing: 1
dither: 0.00001
pad_to: 0
pad_value: 0.0

spec_augment:
target: nemo.collections.asr.modules.SpectrogramAugmentation
freq_masks: 2 # set to zero to disable it
time_masks: 0
freq_width: 27
time_width: 0.05

encoder:
target: nemo.collections.asr.modules.ConformerEncoder
feat_in: ${model.preprocessor.features}
feat_out: -1 # you may set it if you need different output size other than the default d_model
n_layers: 19
d_model: 512

subsampling: dw_striding # vggnet, striding, stacking or stacking_norm, dw_striding
subsampling_factor: 8 # must be power of 2 for striding and vggnet
subsampling_conv_channels: 256 # -1 sets it to d_model
causal_downsampling: false


ff_expansion_factor: 4


self_attention_model: rel_pos # rel_pos or abs_pos
n_heads: 8 # may need to be lower for smaller d_models
att_context_size: [-1, -1] # -1 means unlimited context
att_context_style: regular # regular or chunked_limited
xscaling: true # scales up the input embeddings by sqrt(d_model)
untie_biases: true # unties the biases of the TransformerXL layers
pos_emb_max_len: 5000

conv_kernel_size: 9
conv_norm_type: 'batch_norm' # batch_norm or layer_norm or groupnormN (N specifies the number of groups)

conv_context_size[0]+conv_context_size[1]+1==conv_kernel_size
conv_context_size: null

dropout: 0.1 # The dropout used in most of the Conformer Modules
dropout_pre_encoder: 0.1 # The dropout used before the encoder
dropout_emb: 0.0 # The dropout used for embeddings
dropout_att: 0.1 # The dropout for multi-headed attention modules

stochastic_depth_drop_prob: 0.0
stochastic_depth_mode: linear  # linear or uniform
stochastic_depth_start_layer: 1

decoder:
target: nemo.collections.asr.modules.ConvASRDecoder
feat_in: null
num_classes: -1
vocabulary: []

interctc:
loss_weights: []
apply_at_layers: []

optim:
name: adamw
lr: 1e-3
betas: [0.9, 0.98]
weight_decay: 1e-3

sched:
  name: CosineAnnealing
  warmup_ratio: 0.1
  min_lr: 5e-4

trainer:
devices: 3 # number of GPUs, -1 would use all available GPUs
num_nodes: 1
max_epochs: 1000
max_steps: -1 # computed at runtime if not set
val_check_interval: 1.0 # Set to 0.25 to check 4 times per epoch, or an int for number of iterations
accelerator: auto
strategy: ddp
accumulate_grad_batches: 1
gradient_clip_val: 0.0
precision: 32 # 16, 32, or bf16
log_every_n_steps: 10 # Interval of logging.
enable_progress_bar: True
num_sanity_val_steps: 0 # number of steps to perform validation steps for sanity check the validation process before starting the training, setting to 0 disables it
check_val_every_n_epoch: 1 # number of evaluations on validation every n epochs
sync_batchnorm: true
enable_checkpointing: False # Provided by exp_manager
logger: false # Provided by exp_manager
benchmark: false # needs to be false for models with variable-length speech input as it slows down training

exp_manager:
exp_dir: null
name: ${name}
create_tensorboard_logger: true
create_checkpoint_callback: true
checkpoint_callback_params:
monitor: "val_wer"
mode: "min"
save_top_k: 40
always_save_nemo: True # saves the checkpoints as nemo files instead of PTL checkpoints

resume_from_checkpoint: /home/team_voice/STT_pdnguyen/Backup_version_train_L40S/fast_conformer_ja_larg/25_12_2/nemo_experiments/FastConformer-CTC-BPE/2025-02-11_08-17-59/checkpoints/FastConformer-CTC-BPE--val_wer=1.0000-epoch=3.ckpt
resume_if_exists: true
resume_ignore_no_checkpoint: true

create_wandb_logger: false
wandb_logger_kwargs:
name: null
project: null
'''

How can I change my config to make it converge? Thanks for any help!

The text was updated successfully, but these errors were encountered:

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

I am trying to train the FastConformer 120M model from scratch, but it is not converging? #12167

I am trying to train the FastConformer 120M model from scratch, but it is not converging? #12167

PhamDangNguyen commented Feb 13, 2025 •

edited

Loading

I am trying to train the FastConformer 120M model from scratch, but it is not converging? #12167

I am trying to train the FastConformer 120M model from scratch, but it is not converging? #12167

Comments

PhamDangNguyen commented Feb 13, 2025 • edited Loading

PhamDangNguyen commented Feb 13, 2025 •

edited

Loading