Extracting with Kaldi
Andy T. Liu edited this page Jun 17, 2020 · 3 revisions
- Install Kaldi.
- As suggested during the installation, do not forget to add the path of the Kaldi binaries to `$HOME/.bashrc`. For instance, make sure that `.bashrc` contains the following paths:

```
export KALDI_ROOT=/home/mirco/kaldi
PATH=$PATH:$KALDI_ROOT/tools/openfst
PATH=$PATH:$KALDI_ROOT/src/featbin
PATH=$PATH:$KALDI_ROOT/src/gmmbin
PATH=$PATH:$KALDI_ROOT/src/bin
PATH=$PATH:$KALDI_ROOT/src/nnetbin
export PATH
```

- Then reload `.bashrc` with the command: `source ~/.bashrc`
- Remember to change the `KALDI_ROOT` variable to your own path. To get your path, `cd` to the Kaldi directory and use the command `pwd`.
- As a first test to check the installation, open a bash shell, type `copy-feats` or `hmm-info`, and make sure no errors appear.
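As a quick sanity check that the binaries actually ended up on `$PATH`, a minimal Python sketch could look like this (the helper name is ours, not part of Kaldi):

```python
import shutil

def missing_kaldi_tools(tools=("copy-feats", "hmm-info")):
    """Return the subset of `tools` that cannot be found on $PATH."""
    return [t for t in tools if shutil.which(t) is None]

# An empty list means the Kaldi binaries are reachable from your shell.
print(missing_kaldi_tools())
```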
- Copy the scripts in `src/kaldi_egs_librispeech_s5/` to your `$KALDI_ROOT/egs/librispeech/s5/`.
- If running on a single machine, change the following lines in `$KALDI_ROOT/egs/librispeech/s5/cmd.sh`, replacing `queue.pl` with `run.pl`:

```
export train_cmd="run.pl --mem 2G"
export decode_cmd="run.pl --mem 4G"
export mkgraph_cmd="run.pl --mem 8G"
```
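The single-machine edit above can also be scripted if you switch between setups often. A minimal Python sketch (the helper name is illustrative, not part of the recipe):

```python
from pathlib import Path

def use_run_pl(text: str) -> str:
    """Replace every queue.pl invocation with run.pl (single-machine setup)."""
    return text.replace("queue.pl", "run.pl")

# Rewriting cmd.sh in place (path is illustrative):
# cmd_sh = Path("cmd.sh")
# cmd_sh.write_text(use_run_pl(cmd_sh.read_text()))
print(use_run_pl('export train_cmd="queue.pl --mem 2G"'))
# → export train_cmd="run.pl --mem 2G"
```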
- Change the `data` path in `run.sh` to your LibriSpeech data path; the directory `LibriSpeech/` should be under that path. For example:

```
data=/media/andi611/1TBSSD
```
- Make sure that `flac` is installed if you are using a Linux machine:

```
sudo apt-get install flac
```
- Run the Kaldi recipe `run.sh` for LibriSpeech at least through Stage 13 (inclusive):

```
./run.sh
```
- Copy the `exp/tri4b/trans.*` files into `exp/tri4b/decode_tgsmall_train_clean_*/`:

```
mkdir exp/tri4b/decode_tgsmall_train_clean_100 && cp exp/tri4b/trans.* exp/tri4b/decode_tgsmall_train_clean_100/
```
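If you need to repeat the `mkdir && cp` step above for several decode directories, it can be sketched in Python (the helper name is ours, not part of the recipe):

```python
import glob
import os
import shutil

def copy_trans(src_dir, dst_dir):
    """mkdir -p dst_dir && cp src_dir/trans.* dst_dir/ ; returns copied names."""
    os.makedirs(dst_dir, exist_ok=True)
    copied = []
    for path in sorted(glob.glob(os.path.join(src_dir, "trans.*"))):
        shutil.copy(path, dst_dir)
        copied.append(os.path.basename(path))
    return copied

# Example (paths are illustrative):
# copy_trans("exp/tri4b", "exp/tri4b/decode_tgsmall_train_clean_100")
```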
- Compute the fMLLR features by running the following script:

```
./compute_fmllr.sh
```
- Compute alignments using:

```
# alignments on dev_clean and test_clean
steps/align_fmllr.sh --nj 10 data/dev_clean data/lang exp/tri4b exp/tri4b_ali_dev_clean
steps/align_fmllr.sh --nj 10 data/test_clean data/lang exp/tri4b exp/tri4b_ali_test_clean
# alignments on the training subsets
steps/align_fmllr.sh --nj 30 data/train_clean_100 data/lang exp/tri4b exp/tri4b_ali_clean_100
steps/align_fmllr.sh --nj 30 data/train_clean_360 data/lang exp/tri4b exp/tri4b_ali_clean_360
steps/align_fmllr.sh --nj 30 data/train_other_500 data/lang exp/tri4b exp/tri4b_ali_other_500
```
To pre-train models with the Kaldi fMLLR features generated above, follow the steps below:
- Apply CMVN and dump the fMLLR (or MFCC / fbank) features to new .ark files:

```
./dump_fmllr_cmvn.sh
./dump_mfcc_cmvn.sh
./dump_fbank_cmvn.sh # this requires a second run of stage 6 in `/run.sh`, see the comments in stage 6
```
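The scripts above handle CMVN for you (and the Kaldi tools may normalize per speaker rather than per utterance). For reference only, per-utterance CMVN amounts to mean/variance-normalizing each feature dimension, which can be sketched in NumPy as:

```python
import numpy as np

def apply_cmvn(feats: np.ndarray) -> np.ndarray:
    """Per-utterance cepstral mean and variance normalization.

    feats: (frames, feature_dim) matrix; each feature dimension is
    shifted to zero mean and scaled to unit variance.
    """
    mean = feats.mean(axis=0)
    std = feats.std(axis=0)
    return (feats - mean) / np.maximum(std, 1e-8)  # guard against zero variance
```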
- Use the Python script to convert the Kaldi-generated .ark features to .npy for our S3PRL dataloader; modify the path and settings in the script, then run:

```
cd Self-Supervised-Speech-Pretraining-and-Representation-Learning/preprocess/
python3 ark2libri.py # DATA_TYPE = 'fmllr' by default; this can be either 'mfcc', 'fbank', or 'fmllr'
```
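`ark2libri.py` does this conversion for you. Purely for illustration, here is a self-contained sketch of how a Kaldi *text-format* archive (e.g. one produced with `copy-feats ark:feats.ark ark,t:-`) could be parsed into NumPy arrays; the parser is ours, not the repo's code:

```python
import numpy as np

def parse_text_ark(text: str):
    """Parse a Kaldi text-format archive into {utt_id: (frames, dim) array}."""
    feats, utt, rows = {}, None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.endswith("["):            # "utt_id  [" opens a matrix
            utt, rows = line[:-1].strip(), []
        else:
            closing = line.endswith("]")  # last row ends with "]"
            if closing:
                line = line[:-1].strip()
            if line:
                rows.append([float(v) for v in line.split()])
            if closing:
                feats[utt] = np.array(rows, dtype=np.float32)
                # np.save(f"{utt}.npy", feats[utt])  # dump for the dataloader
    return feats
```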
- In order to pre-train on fMLLR features, change the `data_path` argument in the config files `config/*.yaml` to the following:

```
data_path: 'data/libri_fmllr_cmvn' # or 'data/libri_mfcc_cmvn', 'data/libri_fbank_cmvn'
```
- Modify the `train_set` argument in the config files `config/*.yaml` to train on different subsets:

```
train_set: ['train-other-500', 'train-clean-360', 'train-clean-100']
```
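To make the expected on-disk layout concrete, here is a minimal, hypothetical dataset sketch that serves converted `.npy` features from such a `data_path`; the class name and directory layout are illustrative, not the actual S3PRL dataloader:

```python
import os
import numpy as np

class NpyFeatureDataset:
    """Serve .npy feature matrices from root/<subset>/<utt>.npy (layout assumed)."""

    def __init__(self, root, subsets):
        self.paths = []
        for subset in subsets:          # mirrors the train_set config entries
            subset_dir = os.path.join(root, subset)
            if os.path.isdir(subset_dir):
                self.paths += sorted(
                    os.path.join(subset_dir, f)
                    for f in os.listdir(subset_dir) if f.endswith(".npy")
                )

    def __len__(self):
        return len(self.paths)

    def __getitem__(self, i):
        return np.load(self.paths[i])   # (frames, feature_dim)
```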
- Download the TIMIT dataset from the LDC website.
- Copy the scripts in `src/kaldi_egs_timit_s5/` to your `$KALDI_ROOT/egs/timit/s5/`.
- Run the Kaldi s5 baseline of TIMIT (remember to modify the paths in `run.sh` to your own):

```
cd kaldi/egs/timit/s5
./run.sh
./local/nnet/run_dnn.sh
```
- Compute the alignments (i.e., the phone-state labels) with the following commands:

```
steps/nnet/align.sh --nj 4 data-fmllr-tri3/train data/lang exp/dnn4_pretrain-dbn_dnn exp/dnn4_pretrain-dbn_dnn_ali
steps/nnet/align.sh --nj 4 data-fmllr-tri3/dev data/lang exp/dnn4_pretrain-dbn_dnn exp/dnn4_pretrain-dbn_dnn_ali_dev
steps/nnet/align.sh --nj 4 data-fmllr-tri3/test data/lang exp/dnn4_pretrain-dbn_dnn exp/dnn4_pretrain-dbn_dnn_ali_test
```
To pre-train models with the Kaldi fMLLR features generated above, follow the steps below:
- Compute CMVN on the fMLLR features by running the following script:

```
./dump_fmllr_cmvn.sh
```

- Use the provided Python script to convert the Kaldi-generated .ark features to .npy for our S3PRL dataloader; change the path settings in the script and run:

```
cd Self-Supervised-Speech-Pretraining-and-Representation-Learning/preprocess/
python ark2timit.py
```
- In order to pre-train on fMLLR features, change the `data_path` argument in the config files `config/*.yaml` to the following:

```
data_path: 'data/timit_fmllr_cmvn'
```

- Modify the `train_set` argument in the config files `config/*.yaml` to train with TIMIT:

```
train_set: ['train']
```