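Environment: Ubuntu 12.04, Kaldi
While training on the TIMIT corpus I had gotten as far as "MMI + SGMM2 Training & Decoding". Because this is Ubuntu running in a virtual machine on modest hardware, going on to train the DNN models turned out to take a very long time, so I stopped there. I wanted to use the trained models for online decoding (http://blog.itpub.net/16582684/viewspace-1270816/), but found they could not be used (the wav files in the TIMIT training data are in SPHERE format, whereas the VoxForge wav files can be played directly), so I switched to training on the VoxForge corpus. VoxForge is open source, without TIMIT's copyright restrictions, and models trained on it also support online decoding, which is why I chose this corpus.
Steps:
1. Install the libraries that mitlm and g2p depend on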
sudo apt-get install flac
sudo apt-get install swig
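2. Change to /u01/kaldi/egs/voxforge/s5. The script local/voxforge_prepare_lm.sh includes a step that installs mitlm, but the source could not be fetched with svn checkout from http://mitlm.googlecode.com/svn/trunk/. Instead, download the source from https://mitlm.googlecode.com/files/mitlm-0.4.1.tar.gz, put it under tools, unpack it and rename the directory to mitlm-svn. Then comment out the line "svn checkout -r103 http://mitlm.googlecode.com/svn/trunk/ tools/mitlm-svn" in local/voxforge_prepare_lm.sh.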
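The manual download in step 2 can be sketched as a small shell function. This is only a sketch under assumptions: fetch_mitlm is a hypothetical helper name, the tools directory is passed in as an argument (defaulting to tools under the current directory), and the archive is assumed to unpack into a directory called mitlm-0.4.1. It covers only the download, unpack and rename; building mitlm is left to the recipe script.

```shell
#!/usr/bin/env bash
# Sketch: unpack the mitlm 0.4.1 tarball as <tools_dir>/mitlm-svn.
# ASSUMPTIONS: fetch_mitlm is a hypothetical helper, and the archive is
# assumed to unpack into a directory named mitlm-0.4.1.
fetch_mitlm() {
  local tools_dir="${1:-tools}"
  local url="https://mitlm.googlecode.com/files/mitlm-0.4.1.tar.gz"
  local tarball="mitlm-0.4.1.tar.gz"

  mkdir -p "$tools_dir" || return 1
  (
    cd "$tools_dir" || exit 1
    # Skip the download if the tarball is already present.
    [ -f "$tarball" ] || wget "$url" -O "$tarball" || exit 1
    tar -xzf "$tarball" || exit 1
    # The recipe expects the sources under the name mitlm-svn.
    [ -d mitlm-svn ] || mv mitlm-0.4.1 mitlm-svn
  )
}
```

From /u01/kaldi/egs/voxforge/s5 this would be called as fetch_mitlm tools. If the tarball has already been copied into the tools directory by hand, the download is skipped.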
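3. Edit getdata.sh and add the setting DATA_ROOT=/u01/kaldi/egs/voxforge/s5/data, then run ./getdata.sh to download and unpack the data. Because the download was slow and the machine is modest, I downloaded and unpacked only about 100 MB of data.
4. Edit run.sh and add the setting DATA_ROOT=/u01/kaldi/egs/voxforge/s5/data. Because the data set is small, a few more settings were changed as follows: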
nspk_test=7
utils/subset_data_dir.sh data/train 15 data/train.1k || exit 1;
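5. Run the script ./run.sh. The fan roared and CPU usage shot straight to 100%; the run took about five hours. It ran as far as "# Do MMI on top of LDA+MLLT.", with the following output: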
=== Starting VoxForge subset selection(accent: ((American)|(British)|(Australia)|(Zealand))) ...
*** VoxForge subset selection finished!
=== Starting to map anonymous users to unique IDs ...
--- Mapping the "anonymous" speakers to unique IDs ...
ls: cannot access /u01/kaldi/egs/voxforge/s5/data/selected/anonymous-*-*: No such file or directory
*** Finished mapping anonymous users!
=== Starting initial VoxForge data preparation ...
--- Making test/train data split ...
17 data/local/tmp/speakers_all.txt
10 data/local/tmp/speakers_train.txt
7 data/local/tmp/speakers_test.txt
17 total
--- Preparing test_wav.scp, test_trans.txt and test.utt2spk ...
--- Preparing test.spk2utt ...
--- Preparing train_wav.scp, train_trans.txt and train.utt2spk ...
steps/decode.sh --config conf/decode.config --iter 3 --nj 2 --cmd run.pl exp/tri2b/graph data/test exp/tri2b_mmi/decode_it3
decode.sh: feature type is lda
exp/tri2b_mmi/decode_it3/wer_10
%WER 97.59 [ 1657 / 1698, 29 ins, 649 del, 979 sub ]
%SER 100.00 [ 180 / 180 ]
exp/tri2b_mmi/decode_it3/wer_11
%WER 97.17 [ 1650 / 1698, 22 ins, 713 del, 915 sub ]
%SER 100.00 [ 180 / 180 ]
exp/tri2b_mmi/decode_it3/wer_12
%WER 96.76 [ 1643 / 1698, 15 ins, 787 del, 841 sub ]
%SER 100.00 [ 180 / 180 ]
exp/tri2b_mmi/decode_it3/wer_13
%WER 96.41 [ 1637 / 1698, 15 ins, 837 del, 785 sub ]
%SER 100.00 [ 180 / 180 ]
exp/tri2b_mmi/decode_it3/wer_14
%WER 96.64 [ 1641 / 1698, 11 ins, 888 del, 742 sub ]
%SER 100.00 [ 180 / 180 ]
exp/tri2b_mmi/decode_it3/wer_15
%WER 96.82 [ 1644 / 1698, 7 ins, 930 del, 707 sub ]
%SER 100.00 [ 180 / 180 ]
exp/tri2b_mmi/decode_it3/wer_16
%WER 97.06 [ 1648 / 1698, 7 ins, 967 del, 674 sub ]
%SER 100.00 [ 180 / 180 ]
exp/tri2b_mmi/decode_it3/wer_17
%WER 97.17 [ 1650 / 1698, 9 ins, 997 del, 644 sub ]
%SER 100.00 [ 180 / 180 ]
exp/tri2b_mmi/decode_it3/wer_18
%WER 97.17 [ 1650 / 1698, 9 ins, 1013 del, 628 sub ]
%SER 100.00 [ 180 / 180 ]
exp/tri2b_mmi/decode_it3/wer_19
%WER 97.41 [ 1654 / 1698, 9 ins, 1027 del, 618 sub ]
%SER 100.00 [ 180 / 180 ]
exp/tri2b_mmi/decode_it3/wer_20
%WER 97.17 [ 1650 / 1698, 9 ins, 1037 del, 604 sub ]
%SER 100.00 [ 180 / 180 ]
exp/tri2b_mmi/decode_it3/wer_9
%WER 98.00 [ 1664 / 1698, 35 ins, 582 del, 1047 sub ]
%SER 100.00 [ 180 / 180 ]
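The decode directory holds one wer_N file per language-model weight N, so the listing above has to be scanned by eye for the best score. A minimal sketch for picking the file with the lowest WER, assuming each file contains a %WER line in the format shown above; best_wer is a hypothetical helper name used here for illustration (Kaldi recipes ship utils/best_wer.sh for a similar purpose).

```shell
#!/usr/bin/env bash
# Sketch: print the "file:%WER ..." line with the lowest WER in a decode dir.
# ASSUMPTION: best_wer is a hypothetical helper, not a Kaldi script.
# Usage: best_wer <decode_dir>
best_wer() {
  local dir="$1"
  # grep -H prefixes each match with "path:", so after splitting on blanks
  # field 1 is "path:%WER" and field 2 is the numeric WER value.
  grep -H '^%WER' "$dir"/wer_* \
    | LC_ALL=C sort -k2,2 -n \
    | head -n 1
}
```

Called as best_wer exp/tri2b_mmi/decode_it3 on the results above, this selects wer_13 (96.41). Paths containing spaces would break the field split, which is acceptable for the standard recipe layout.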
6. Copy /u01/kaldi/egs/voxforge/s5/exp/tri2b/graph into /u01/kaldi/egs/voxforge/s5/exp/tri2b_mmi, change to /u01/kaldi/egs/voxforge/s5/exp/tri2b_mmi, and run online decoding as follows:
/u01/kaldi/src/onlinebin/online-wav-gmm-decode-faster --rt-min=0.3 --rt-max=0.5 --max-active=4000 --beam=12.0 --acoustic-scale=0.0769 scp:../../data/test/wav_test.scp final.mdl graph/HCLG.fst graph/words.txt '1:2:3:4:5' ark,t:trans.txt ark,t:ali.txt final.mat
/u01/kaldi/src/onlinebin/online-wav-gmm-decode-faster --rt-min=0.3 --rt-max=0.5 --max-active=4000 --beam=12.0 --acoustic-scale=0.0769 scp:../../data/test/wav_test.scp final.mdl graph/HCLG.fst graph/words.txt 1:2:3:4:5 ark,t:trans.txt ark,t:ali.txt final.mat
File: AT-20130718-lws-a0011
FROM EXPLAINED INCIDENTAL ACCIDENTAL AND FROM SHE
File: Aaron-20080318-pwn-a0265
DISGUSTED THE MANIFESTED THERE
File: Aaron-20080318-pwn-a0266
THERE WAS PASSIONATELY IT WAS THERE
File: AdrianMcNear-20091016-psv-a0573
IT IS GOING TO YOU MY WEEKS TO SUGGESTED PC THAT FOR SHUDDERED
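At this point the whole pipeline runs end to end.
Conclusion: training took this long with only about 100 MB of audio in total. Hardware certainly plays a part, but the full VoxForge corpus is around 20 GB, and I do not know how long training on all of it would take. If anyone has run it to completion, please let me know the running time.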