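When I did speech training before, I could only use English data, because I had never found any Chinese training data. I recently found a good Chinese speech training dataset.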
http://blog.topspeedsnail.com/archives/10696/comment-page-1#comment-1088
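Environment: Ubuntu 12.04
FFmpeg is free software that can record, convert, and stream audio and video in many formats. It includes libavcodec, an audio/video codec library used by many projects, and libavformat, a library for reading and writing audio/video container formats.
Install: sudo apt-get install ffmpeg, sudo apt-get install libav-tools
avconv: a fast audio and video converter that can also grab from a live audio/video source. It can convert between arbitrary sample rates and resize video with a high-quality polyphase filter.
Convert mp3 to wav (the ffmpeg command resamples to 16 kHz with -ar 16000; the avconv command keeps the source sample rate):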
ffmpeg -i Charlottes.Web-001.mp3 -acodec pcm_s16le -ar 16000 out.wav
avconv -i Charlottes.Web-001.mp3 -acodec pcm_s16le out.wav
List the available codecs:
ffmpeg -codecs
avconv -codecs
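The single-file conversion above extends naturally to a whole directory. The following is a minimal sketch, not part of the original post: the function names `wav_path` and `convert_dir`, the `DRY_RUN` switch, and the `-ac 1` (mono) option are my own assumptions, added because speech toolkits commonly expect mono input. By default it only prints the ffmpeg commands it would run.

```shell
#!/bin/sh
# Sketch: convert every .mp3 in a directory to 16 kHz, mono, 16-bit PCM wav.
# DRY_RUN defaults to 1, so the commands are printed rather than executed.
DRY_RUN=${DRY_RUN:-1}

# Map an input path to its output path: in/a.mp3 -> out/a.wav
wav_path() {
  name=$(basename "$1")
  printf '%s/%s.wav\n' "$2" "${name%.*}"
}

# convert_dir IN_DIR OUT_DIR (hypothetical helper, not an ffmpeg feature)
convert_dir() {
  in_dir=$1
  out_dir=$2
  for src in "$in_dir"/*.mp3; do
    [ -e "$src" ] || continue   # skip the literal pattern when nothing matches
    dst=$(wav_path "$src" "$out_dir")
    if [ "$DRY_RUN" = 1 ]; then
      echo "ffmpeg -i $src -acodec pcm_s16le -ac 1 -ar 16000 $dst"
    else
      mkdir -p "$out_dir"
      ffmpeg -i "$src" -acodec pcm_s16le -ac 1 -ar 16000 "$dst"
    fi
  done
}

# Example call with hypothetical directories:
# convert_dir ./mp3 ./wav
```

Running with `DRY_RUN=0` performs the conversion; drop `-ac 1` if the channel layout should be preserved.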
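Environment: Ubuntu 12.04, Kaldi
Training on the TIMIT corpus had already reached the "MMI + SGMM2 Training & Decoding" stage. Because it was running on Ubuntu in a virtual machine with modest hardware, going on to train the DNN models turned out to take a very long time, so I stopped there. I wanted to use the trained model for online decoding (http://blog.itpub.net/16582684/viewspace-1270816/), but found that it could not be used: the wav files in the TIMIT training data are in SPHERE format, whereas the VoxForge wav files can be played directly. I therefore switched to training on the VoxForge corpus. VoxForge is open source, with none of TIMIT's licensing restrictions, and models trained on it also support online decoding, so that is the corpus used here.
Steps:
1. Install the libraries that mitlm and g2p depend on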
sudo apt-get install flac
sudo apt-get install swig
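2. Switch to /u01/kaldi/egs/voxforge/s5. The script local/voxforge_prepare_lm.sh includes a step that installs mitlm, but the source could not be checked out with svn from http://mitlm.googlecode.com/svn/trunk/. Instead, download the source from https://mitlm.googlecode.com/files/mitlm-0.4.1.tar.gz, put it under tools, rename the extracted directory to mitlm-svn, and comment out the line "svn checkout -r103 http://mitlm.googlecode.com/svn/trunk/ tools/mitlm-svn" in local/voxforge_prepare_lm.sh.
3. Edit getdata.sh and add DATA_ROOT=/u01/kaldi/egs/voxforge/s5/data, then run ./getdata.sh to download and extract the data. Because the download was slow and the machine is modest, I downloaded and extracted only about 100 MB of data.
4. Edit run.sh and add DATA_ROOT=/u01/kaldi/egs/voxforge/s5/data. Because the dataset is small, a few more changes are needed: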
nspk_test=7
utils/subset_data_dir.sh data/train 15 data/train.1k || exit 1;
5. Run ./run.sh. The fan roared and CPU usage shot straight to 100% for about five hours. The run got as far as "# Do MMI on top of LDA+MLLT."; the output was as follows:
=== Starting VoxForge subset selection(accent: ((American)|(British)|(Australia)|(Zealand))) ...
*** VoxForge subset selection finished!
=== Starting to map anonymous users to unique IDs ...
--- Mapping the "anonymous" speakers to unique IDs ...
ls: cannot access /u01/kaldi/egs/voxforge/s5/data/selected/anonymous-*-*: No such file or directory
*** Finished mapping anonymous users!
=== Starting initial VoxForge data preparation ...
--- Making test/train data split ...
17 data/local/tmp/speakers_all.txt
10 data/local/tmp/speakers_train.txt
7 data/local/tmp/speakers_test.txt
17 total
--- Preparing test_wav.scp, test_trans.txt and test.utt2spk ...
--- Preparing test.spk2utt ...
--- Preparing train_wav.scp, train_trans.txt and train.utt2spk ...
steps/decode.sh --config conf/decode.config --iter 3 --nj 2 --cmd run.pl exp/tri2b/graph data/test exp/tri2b_mmi/decode_it3
decode.sh: feature type is lda
exp/tri2b_mmi/decode_it3/wer_10
%WER 97.59 [ 1657 / 1698, 29 ins, 649 del, 979 sub ]
%SER 100.00 [ 180 / 180 ]
exp/tri2b_mmi/decode_it3/wer_11
%WER 97.17 [ 1650 / 1698, 22 ins, 713 del, 915 sub ]
%SER 100.00 [ 180 / 180 ]
exp/tri2b_mmi/decode_it3/wer_12
%WER 96.76 [ 1643 / 1698, 15 ins, 787 del, 841 sub ]
%SER 100.00 [ 180 / 180 ]
exp/tri2b_mmi/decode_it3/wer_13
%WER 96.41 [ 1637 / 1698, 15 ins, 837 del, 785 sub ]
%SER 100.00 [ 180 / 180 ]
exp/tri2b_mmi/decode_it3/wer_14
%WER 96.64 [ 1641 / 1698, 11 ins, 888 del, 742 sub ]
%SER 100.00 [ 180 / 180 ]
exp/tri2b_mmi/decode_it3/wer_15
%WER 96.82 [ 1644 / 1698, 7 ins, 930 del, 707 sub ]
%SER 100.00 [ 180 / 180 ]
exp/tri2b_mmi/decode_it3/wer_16
%WER 97.06 [ 1648 / 1698, 7 ins, 967 del, 674 sub ]
%SER 100.00 [ 180 / 180 ]
exp/tri2b_mmi/decode_it3/wer_17
%WER 97.17 [ 1650 / 1698, 9 ins, 997 del, 644 sub ]
%SER 100.00 [ 180 / 180 ]
exp/tri2b_mmi/decode_it3/wer_18
%WER 97.17 [ 1650 / 1698, 9 ins, 1013 del, 628 sub ]
%SER 100.00 [ 180 / 180 ]
exp/tri2b_mmi/decode_it3/wer_19
%WER 97.41 [ 1654 / 1698, 9 ins, 1027 del, 618 sub ]
%SER 100.00 [ 180 / 180 ]
exp/tri2b_mmi/decode_it3/wer_20
%WER 97.17 [ 1650 / 1698, 9 ins, 1037 del, 604 sub ]
%SER 100.00 [ 180 / 180 ]
exp/tri2b_mmi/decode_it3/wer_9
%WER 98.00 [ 1664 / 1698, 35 ins, 582 del, 1047 sub ]
%SER 100.00 [ 180 / 180 ]
6. Copy /u01/kaldi/egs/voxforge/s5/exp/tri2b/graph into /u01/kaldi/egs/voxforge/s5/exp/tri2b_mmi, switch to /u01/kaldi/egs/voxforge/s5/exp/tri2b_mmi,
and run the online decoder on the test wav files as follows:
/u01/kaldi/src/onlinebin/online-wav-gmm-decode-faster --rt-min=0.3 --rt-max=0.5 --max-active=4000 --beam=12.0 --acoustic-scale=0.0769 scp:../../data/test/wav_test.scp final.mdl graph/HCLG.fst graph/words.txt '1:2:3:4:5' ark,t:trans.txt ark,t:ali.txt final.mat
File: AT-20130718-lws-a0011
FROM EXPLAINED INCIDENTAL ACCIDENTAL AND FROM SHE
File: Aaron-20080318-pwn-a0265
DISGUSTED THE MANIFESTED THERE
File: Aaron-20080318-pwn-a0266
THERE WAS PASSIONATELY IT WAS THERE
File: AdrianMcNear-20091016-psv-a0573
IT IS GOING TO YOU MY WEEKS TO SUGGESTED PC THAT FOR SHUDDERED
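At this point the whole pipeline runs end to end.
Conclusion: training took this long with only about 100 MB of audio. The hardware is certainly a factor, but the full VoxForge corpus is around 20 GB, and I do not know how long training on all of it would take. If anyone has run the full set, please let me know the running time.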
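The wer_9 through wer_20 results in step 5 are easier to compare when the best line is picked automatically. The sketch below is an illustration under my own assumptions: `best_wer` and `recompute_wer` are hypothetical helper names, not Kaldi scripts, and they rely only on the line format shown in the output above. The second helper checks that the reported percentage equals (insertions + deletions + substitutions) divided by the number of reference words.

```shell
#!/bin/sh
# Pick the line with the lowest %WER from input lines such as
#   %WER 96.41 [ 1637 / 1698, 15 ins, 837 del, 785 sub ]
best_wer() {
  awk '$1 == "%WER" && (!seen || $2 + 0 < best) { best = $2 + 0; line = $0; seen = 1 }
       END { if (seen) print line }'
}

# Recompute WER from the counts on each %WER line:
# 100 * (ins + del + sub) / reference words
recompute_wer() {
  awk '$1 == "%WER" { printf "%.2f\n", 100 * ($7 + $9 + $11) / ($6 + 0) }'
}

# Typical use inside the recipe directory (path taken from the output above):
# cat exp/tri2b_mmi/decode_it3/wer_* | best_wer
```

On the numbers in step 5 this selects the wer_13 line (96.41%). With a word error rate that high and a sentence error rate of 100%, the model trained on this small subset is barely usable, which matches the decoded text in step 6.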
Environment: Ubuntu 12.04, Kaldi
Once the TIMIT acoustic models have been trained, you can move on to decoding.
1. First install PortAudio
cd /u01/kaldi/tools/portaudio
./configure
make
sudo make install
2. Build onlinebin
cd /u01/kaldi/src/onlinebin
make
Offline decoding:
3. Switch to the trained model directory /u01/kaldi/egs/timit/s5/exp/tri1 and run the following command:
/u01/kaldi/src/onlinebin/online-wav-gmm-decode-faster --rt-min=0.3 --rt-max=0.5 --max-active=4000 --beam=12.0 --acoustic-scale=0.0769 scp:../../data/train/split10/1/wav.scp final.mdl graph/HCLG.fst graph/words.txt '1:2:3:4:5' ark,t:trans.txt ark,t:ali.txt
The output is as follows:
File: faem0_si1392
sil ax s uw m f ao r ix vcl z ae m cl p el ax s ix cl ch uw ey sh en w er f aa r m hh eh z ax cl p ae cl k ix ng sh eh vcl d ae n vcl d f iy l vcl s sil
File: faem0_si2022
sil
sil
sil w ah dx ow cl t ih cl t ih sh iy vcl d r ay f ao r sil
File: faem0_si762
sil f ih l s epi m ao l hh ow l ix n vcl b ow l w ix cl k l ey sil
sil m ey ay vcl d ow ix n vcl g eh cl k ix s ae n vcl jh ix m aa m ah sil
File: fhxs0_sx175
sil s ix v iy ah m ay eh l cl p iy ah cl k ix n cl t ey vcl b iy dx ih cl t uw r aa n z epi f iy r iy aa r dx iy cl k aa m c
File: fhxs0_sx265
sil dh ix s ao r ih z vcl b r ow cl k ix n s ah cl ch aa cl p dh ax w uh vcl en s cl t eh vcl sil
File: fhxs0_sx355
sil
sil aa l f ih n z aa r ix n cl t eh l ix vcl jh ix n er r iy n m ae m ax l s sil
File: fhxs0_sx445
sil w ah dx ih z ih z l ao vcl jh ix ng vcl b ay dx iy ay n iy ng vcl b el ix cl sil
File: fhxs0_sx85
sil s ix m eh n cl t ix z epi m eh zh uw dx ix n cl k y uw vcl b ih cl k y aa r vcl d z sil
4. Online decoding (requires a microphone)
jerry@hq:/u01/kaldi/egs/timit/s5/exp/tri1$ /u01/kaldi/src/onlinebin/online-gmm-decode-faster --rt-min=0.3 --rt-max=0.5 --max-active=4000 --beam=12.0 --acoustic-scale=0.0769 final.mdl graph/HCLG.fst graph/words.txt '1:2:3:4:5'
Another online decoding demo
cd /u01/kaldi/egs/voxforge/online_demo
./run.sh --test-mode live
Environment: Ubuntu 12.04, Kaldi
View the scoring details for the decoded test data
jerry@hq:/u01/kaldi/egs/timit/s5/exp/tri1/decode_test/score_5$ more ctm_39phn.filt.dtl
DETAILED OVERALL REPORT FOR THE SYSTEM: exp/tri1/decode_test/score_5/ctm_39phn
SENTENCE RECOGNITION PERFORMANCE
sentences 192
with errors 100.0% ( 192)
with substitions 100.0% ( 192)
with deletions 66.7% ( 128)
with insertions 89.1% ( 171)
WORD RECOGNITION PERFORMANCE
Percent Total Error = 27.1% (1957)
Percent Correct = 78.5% (5663)
Percent Substitution = 17.7% (1277)
Percent Deletions = 3.8% ( 275)
Percent Insertions = 5.6% ( 405)
Percent Word Accuracy = 72.9%
Ref. words = (7215)
Hyp. words = (7345)
Aligned words = (7620)
CONFUSION PAIRS Total (393)
With >= 1 occurances (393)
1: 48 -> z ==> s
2: 42 -> ih ==> ah
3: 37 -> ah ==> ih
4: 36 -> ih ==> iy
5: 26 -> eh ==> ih
6: 25 -> er ==> r
7: 20 -> ae ==> eh
8: 20 -> eh ==> ah
9: 20 -> m ==> n
10: 20 -> s ==> z
11: 19 -> ih ==> eh
12: 17 -> ah ==> aa
13: 17 -> r ==> er
14: 16 -> eh ==> ae
15: 16 -> iy ==> ih
16: 15 -> d ==> t
17: 14 -> er ==> ih
18: 13 -> ah ==> eh
19: 12 -> b ==> p
20: 12 -> ih ==> er
21: 12 -> ow ==> ah
22: 11 -> ay ==> aa
23: 11 -> ey ==> ih
24: 11 -> w ==> l
25: 10 -> aa ==> ah
26: 10 -> ah ==> er
27: 10 -> n ==> dx
28: 10 -> p ==> t
29: 10 -> uw ==> ih
30: 9 -> aa ==> ay
31: 9 -> g ==> k
32: 9 -> iy ==> ey
33: 9 -> n ==> m
34: 8 -> dh ==> d
35: 8 -> ey ==> iy
36: 8 -> g ==> d
37: 8 -> ow ==> l
38: 7 -> aa ==> ae
39: 7 -> ah ==> ow
40: 7 -> b ==> dh
41: 7 -> d ==> dh
42: 7 -> l ==> ow
43: 7 -> n ==> ng
44: 7 -> ng ==> n
45: 7 -> p ==> b
46: 7 -> uh ==> ah
47: 7 -> uh ==> ih
48: 7 -> uw ==> iy
49: 6 -> ae ==> ih
50: 6 -> ih ==> ey
51: 6 -> th ==> t
52: 5 -> aw ==> l
53: 5 -> dx ==> d
54: 5 -> dx ==> n
55: 5 -> ih ==> uw
56: 5 -> ow ==> aa
57: 5 -> sh ==> ch
58: 5 -> t ==> d
59: 5 -> y ==> iy
60: 4 -> (sil) ==> ih
61: 4 -> (sil) ==> n
62: 4 -> aa ==> l
63: 4 -> ae ==> ah
64: 4 -> aw ==> ae
65: 4 -> ay ==> ey
66: 4 -> ch ==> jh
67: 4 -> d ==> dx
68: 4 -> dh ==> dx
69: 4 -> dh ==> z
70: 4 -> eh ==> ay
71: 4 -> ey ==> eh
72: 4 -> ih ==> n
73: 4 -> k ==> g
74: 4 -> l ==> aa
75: 4 -> n ==> ah
76: 4 -> n ==> ih
77: 4 -> oy ==> ih
78: 4 -> t ==> dh
79: 4 -> v ==> b
80: 3 -> (sil) ==> dx
81: 3 -> (sil) ==> f
82: 3 -> (sil) ==> iy
83: 3 -> (sil) ==> r
84: 3 -> ae ==> ay
85: 3 -> ae ==> ey
86: 3 -> ah ==> sil
87: 3 -> ah ==> uh
88: 3 -> aw ==> aa
89: 3 -> ay ==> ah
90: 3 -> d ==> g
91: 3 -> dh ==> b
92: 3 -> dh ==> n
93: 3 -> dh ==> t
94: 3 -> f ==> p
95: 3 -> f ==> s
96: 3 -> f ==> sil
97: 3 -> g ==> ah
98: 3 -> ih ==> ow
99: 3 -> iy ==> y
100: 3 -> jh ==> t
101: 3 -> k ==> d
102: 3 -> l ==> ah
103: 3 -> l ==> dh
104: 3 -> l ==> v
105: 3 -> l ==> w
106: 3 -> n ==> ae
107: 3 -> ng ==> m
108: 3 -> ng ==> sil
109: 3 -> s ==> f
110: 3 -> sh ==> t
111: 3 -> t ==> s
112: 3 -> th ==> dh
113: 3 -> th ==> s
114: 3 -> v ==> dh
115: 3 -> v ==> dx
116: 3 -> v ==> f
117: 3 -> w ==> uw
118: 2 -> (sil) ==> dh
119: 2 -> (sil) ==> s
120: 2 -> aa ==> n
121: 2 -> aa ==> sil
122: 2 -> ah ==> ay
123: 2 -> ah ==> l
124: 2 -> ah ==> n
125: 2 -> ay ==> ih
126: 2 -> ay ==> n
127: 2 -> b ==> d
128: 2 -> ch ==> sh
129: 2 -> ch ==> t
130: 2 -> d ==> uw
131: 2 -> dh ==> l
132: 2 -> dh ==> s
133: 2 -> dx ==> dh
134: 2 -> eh ==> ey
135: 2 -> er ==> ah
136: 2 -> er ==> eh
137: 2 -> f ==> dh
138: 2 -> f ==> v
139: 2 -> hh ==> ah
140: 2 -> hh ==> n
141: 2 -> ih ==> ae
142: 2 -> ih ==> dh
143: 2 -> ih ==> l
144: 2 -> ih ==> r
145: 2 -> jh ==> ch
146: 2 -> jh ==> sh
147: 2 -> jh ==> sil
148: 2 -> k ==> t
149: 2 -> l ==> dx
150: 2 -> l ==> oy
151: 2 -> l ==> p
152: 2 -> l ==> t
153: 2 -> m ==> ae
154: 2 -> m ==> ih
155: 2 -> m ==> sil
156: 2 -> m ==> v
157: 2 -> n ==> l
158: 2 -> n ==> r
159: 2 -> n ==> sil
160: 2 -> n ==> v
161: 2 -> n ==> y
162: 2 -> ng ==> ih
163: 2 -> ow ==> ae
164: 2 -> oy ==> eh
165: 2 -> oy ==> ey
166: 2 -> p ==> dh
167: 2 -> r ==> ih
168: 2 -> s ==> sh
169: 2 -> s ==> th
170: 2 -> t ==> f
171: 2 -> t ==> ih
172: 2 -> t ==> p
173: 2 -> t ==> sil
174: 2 -> th ==> eh
175: 2 -> uh ==> aa
176: 2 -> uh ==> eh
177: 2 -> uw ==> ah
178: 2 -> uw ==> sil
179: 2 -> v ==> ow
180: 2 -> z ==> dh
181: 2 -> z ==> sil
182: 1 -> (sil) ==> aa
183: 1 -> (sil) ==> ah
184: 1 -> (sil) ==> ay
185: 1 -> (sil) ==> b
186: 1 -> (sil) ==> d
187: 1 -> (sil) ==> k
188: 1 -> (sil) ==> l
189: 1 -> (sil) ==> p
190: 1 -> (sil) ==> t
191: 1 -> (sil) ==> v
192: 1 -> (sil) ==> w
193: 1 -> aa ==> aw
194: 1 -> aa ==> er
195: 1 -> aa ==> iy
196: 1 -> aa ==> m
197: 1 -> aa ==> ow
198: 1 -> aa ==> oy
199: 1 -> aa ==> t
200: 1 -> aa ==> w
201: 1 -> ae ==> aw
202: 1 -> ae ==> er
203: 1 -> ae ==> n
204: 1 -> ah ==> ae
205: 1 -> ah ==> ch
206: 1 -> ah ==> f
207: 1 -> ah ==> hh
208: 1 -> ah ==> iy
209: 1 -> ah ==> r
210: 1 -> ah ==> t
211: 1 -> ah ==> uw
212: 1 -> aw ==> ah
213: 1 -> aw ==> eh
214: 1 -> aw ==> ow
215: 1 -> aw ==> w
216: 1 -> ay ==> ae
217: 1 -> ay ==> eh
218: 1 -> ay ==> er
219: 1 -> ay ==> r
220: 1 -> ay ==> s
221: 1 -> ay ==> sil
222: 1 -> ay ==> th
223: 1 -> b ==> g
224: 1 -> b ==> l
225: 1 -> b ==> w
226: 1 -> ch ==> s
227: 1 -> d ==> b
228: 1 -> d ==> eh
229: 1 -> d ==> f
230: 1 -> d ==> k
231: 1 -> d ==> n
232: 1 -> d ==> sil
233: 1 -> dh ==> f
234: 1 -> dh ==> g
235: 1 -> dh ==> ih
236: 1 -> dh ==> k
237: 1 -> dh ==> m
238: 1 -> dh ==> sil
239: 1 -> dh ==> v
240: 1 -> dx ==> eh
241: 1 -> dx ==> iy
242: 1 -> dx ==> l
243: 1 -> dx ==> sh
244: 1 -> eh ==> dh
245: 1 -> eh ==> k
246: 1 -> eh ==> p
247: 1 -> eh ==> s
248: 1 -> eh ==> sil
249: 1 -> er ==> aa
250: 1 -> er ==> dx
251: 1 -> er ==> g
252: 1 -> er ==> k
253: 1 -> er ==> m
254: 1 -> er ==> n
255: 1 -> er ==> sil
256: 1 -> er ==> uw
257: 1 -> er ==> v
258: 1 -> ey ==> ae
259: 1 -> ey ==> ay
260: 1 -> ey ==> r
261: 1 -> f ==> aa
262: 1 -> f ==> b
263: 1 -> f ==> eh
264: 1 -> f ==> t
265: 1 -> f ==> th
266: 1 -> f ==> y
267: 1 -> g ==> b
268: 1 -> hh ==> dx
269: 1 -> hh ==> ey
270: 1 -> hh ==> k
271: 1 -> hh ==> l
272: 1 -> hh ==> p
273: 1 -> ih ==> d
274: 1 -> ih ==> ng
275: 1 -> ih ==> oy
276: 1 -> ih ==> s
277: 1 -> ih ==> sil
278: 1 -> ih ==> uh
279: 1 -> ih ==> v
280: 1 -> ih ==> y
281: 1 -> iy ==> d
282: 1 -> iy ==> k
283: 1 -> iy ==> oy
284: 1 -> jh ==> g
285: 1 -> jh ==> z
286: 1 -> k ==> aa
287: 1 -> k ==> eh
288: 1 -> l ==> d
289: 1 -> l ==> eh
290: 1 -> l ==> hh
291: 1 -> l ==> m
292: 1 -> l ==> r
293: 1 -> l ==> sil
294: 1 -> l ==> th
295: 1 -> l ==> uh
296: 1 -> l ==> uw
297: 1 -> l ==> y
298: 1 -> m ==> aa
299: 1 -> m ==> ah
300: 1 -> m ==> b
301: 1 -> m ==> dh
302: 1 -> m ==> eh
303: 1 -> m ==> l
304: 1 -> m ==> ng
305: 1 -> m ==> ow
306: 1 -> m ==> t
307: 1 -> m ==> w
308: 1 -> n ==> b
309: 1 -> n ==> d
310: 1 -> n ==> dh
311: 1 -> n ==> eh
312: 1 -> n ==> ey
313: 1 -> n ==> iy
314: 1 -> n ==> p
315: 1 -> n ==> t
316: 1 -> ow ==> ay
317: 1 -> ow ==> dx
318: 1 -> ow ==> eh
319: 1 -> ow ==> ih
320: 1 -> ow ==> m
321: 1 -> ow ==> p
322: 1 -> ow ==> r
323: 1 -> ow ==> sil
324: 1 -> ow ==> uw
325: 1 -> ow ==> v
326: 1 -> oy ==> iy
327: 1 -> oy ==> ow
328: 1 -> oy ==> r
329: 1 -> oy ==> w
330: 1 -> p ==> ah
331: 1 -> p ==> aw
332: 1 -> p ==> d
333: 1 -> p ==> dx
334: 1 -> p ==> hh
335: 1 -> p ==> ih
336: 1 -> p ==> k
337: 1 -> p ==> l
338: 1 -> p ==> m
339: 1 -> r ==> ah
340: 1 -> r ==> aw
341: 1 -> r ==> ay
342: 1 -> r ==> b
343: 1 -> r ==> dx
344: 1 -> r ==> l
345: 1 -> r ==> p
346: 1 -> r ==> sh
347: 1 -> r ==> sil
348: 1 -> r ==> v
349: 1 -> s ==> ch
350: 1 -> s ==> ey
351: 1 -> s ==> ih
352: 1 -> s ==> sil
353: 1 -> s ==> t
354: 1 -> sh ==> f
355: 1 -> sh ==> jh
356: 1 -> sh ==> n
357: 1 -> sh ==> r
358: 1 -> sh ==> s
359: 1 -> t ==> dx
360: 1 -> t ==> k
361: 1 -> th ==> d
362: 1 -> th ==> ey
363: 1 -> th ==> f
364: 1 -> th ==> l
365: 1 -> th ==> sil
366: 1 -> uh ==> d
367: 1 -> uh ==> dx
368: 1 -> uh ==> er
369: 1 -> uh ==> ow
370: 1 -> uh ==> uw
371: 1 -> uw ==> er
372: 1 -> uw ==> ey
373: 1 -> uw ==> l
374: 1 -> uw ==> ow
375: 1 -> uw ==> t
376: 1 -> v ==> d
377: 1 -> v ==> ih
378: 1 -> v ==> k
379: 1 -> v ==> m
380: 1 -> v ==> n
381: 1 -> v ==> ng
382: 1 -> v ==> s
383: 1 -> v ==> sil
384: 1 -> v ==> z
385: 1 -> w ==> aa
386: 1 -> w ==> m
387: 1 -> w ==> ow
388: 1 -> w ==> sil
389: 1 -> w ==> y
390: 1 -> y ==> sh
391: 1 -> y ==> w
392: 1 -> z ==> ih
393: 1 -> z ==> sh
-------
1277
INSERTIONS Total (36)
With >= 1 occurances (36)
1: 74 -> sil
2: 39 -> ih
3: 26 -> aa
4: 26 -> l
5: 25 -> ah
6: 25 -> r
7: 17 -> n
8: 15 -> t
9: 13 -> dh
10: 12 -> iy
11: 11 -> d
12: 11 -> eh
13: 10 -> m
14: 10 -> ow
15: 8 -> ay
16: 8 -> dx
17: 8 -> hh
18: 7 -> ey
19: 7 -> k
20: 7 -> w
21: 6 -> ae
22: 6 -> s
23: 5 -> y
24: 4 -> b
25: 4 -> p
26: 3 -> f
27: 3 -> jh
28: 3 -> ng
29: 2 -> aw
30: 2 -> er
31: 2 -> v
32: 2 -> z
33: 1 -> ch
34: 1 -> sh
35: 1 -> th
36: 1 -> uw
-------
405
DELETIONS Total (33)
With >= 1 occurances (33)
1: 31 -> ih
2: 23 -> ah
3: 19 -> n
4: 17 -> r
5: 13 -> k
6: 12 -> hh
7: 11 -> eh
8: 11 -> t
9: 11 -> y
10: 10 -> b
11: 10 -> d
12: 10 -> m
13: 9 -> l
14: 9 -> v
15: 9 -> w
16: 8 -> dh
17: 7 -> ow
18: 6 -> aa
19: 6 -> er
20: 5 -> dx
21: 5 -> g
22: 5 -> th
23: 4 -> iy
24: 4 -> p
25: 4 -> uw
26: 3 -> ng
27: 3 -> s
28: 2 -> ae
29: 2 -> aw
30: 2 -> uh
31: 2 -> z
32: 1 -> ch
33: 1 -> oy
-------
275
SUBSTITUTIONS Total (39)
With >= 1 occurances (39)
1: 143 -> ih
2: 104 -> ah
3: 73 -> eh
4: 55 -> n
5: 54 -> z
6: 52 -> er
7: 42 -> aa
8: 41 -> l
9: 39 -> ae
10: 38 -> m
11: 37 -> d
12: 37 -> ow
13: 36 -> dh
14: 35 -> (sil)
15: 32 -> s
16: 31 -> iy
17: 29 -> ay
18: 29 -> r
19: 28 -> p
20: 26 -> ey
21: 26 -> uw
22: 24 -> b
23: 24 -> v
24: 23 -> uh
25: 22 -> t
26: 21 -> g
27: 19 -> f
28: 19 -> th
29: 19 -> w
30: 16 -> aw
31: 16 -> dx
32: 15 -> ng
33: 13 -> sh
34: 12 -> oy
35: 11 -> jh
36: 11 -> k
37: 9 -> ch
38: 9 -> hh
39: 7 -> y
-------
1277
* NOTE: The 'Substitution' words are those reference words
for which the recognizer supplied an incorrect word.
FALSELY RECOGNIZED Total (39)
With >= 1 occurances (39)
1: 155 -> ih
2: 119 -> ah
3: 74 -> eh
4: 67 -> s
5: 64 -> iy
6: 56 -> n
7: 54 -> t
8: 48 -> aa
9: 45 -> l
10: 44 -> er
11: 42 -> dh
12: 40 -> d
13: 39 -> ae
14: 39 -> r
15: 36 -> sil
16: 33 -> dx
17: 31 -> ey
18: 26 -> ow
19: 26 -> z
20: 25 -> p
21: 22 -> ay
22: 21 -> b
23: 20 -> m
24: 19 -> k
25: 16 -> f
26: 15 -> uw
27: 15 -> v
28: 11 -> g
29: 10 -> ng
30: 10 -> sh
31: 10 -> w
32: 9 -> ch
33: 9 -> y
34: 5 -> jh
35: 5 -> oy
36: 5 -> th
37: 5 -> uh
38: 4 -> aw
39: 3 -> hh
-------
1277
* NOTE: The 'Falsely Recognized' words are those hypothesis words
which the recognizer incorrectly substituted for a reference word.
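The percentages under WORD RECOGNITION PERFORMANCE follow directly from the four raw counts in the report. As a small check, the sketch below recomputes them; `score_summary` is a hypothetical helper of mine, not part of sclite or Kaldi, and the formulas are the standard ones: errors = substitutions + deletions + insertions, correct = reference words minus substitutions and deletions, hypothesis words = reference words minus deletions plus insertions, aligned words = reference words plus insertions.

```shell
#!/bin/sh
# Recompute the summary figures of the report from its raw counts.
# Arguments: reference words, substitutions, deletions, insertions.
score_summary() {
  awk -v r="$1" -v s="$2" -v d="$3" -v i="$4" 'BEGIN {
    e = s + d + i   # total errors
    printf "errors=%d total_error=%.1f%% correct=%.1f%% accuracy=%.1f%% hyp=%d aligned=%d\n",
           e, 100 * e / r, 100 * (r - s - d) / r, 100 * (r - e) / r, r - d + i, r + i
  }'
}

# Counts copied from the report above.
score_summary 7215 1277 275 405
# prints: errors=1957 total_error=27.1% correct=78.5% accuracy=72.9% hyp=7345 aligned=7620
```

Note that "Percent Correct" ignores insertions, which is why it is higher than the accuracy: 78.5% correct minus 5.6% insertions gives the 72.9% accuracy shown in the report.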
Environment: Ubuntu 12.04, Kaldi
1. Download the TIMIT corpus and extract it into /u01/kaldi/egs/timit/s5/data
jerry@hq:/u01/kaldi/egs/timit/s5/data/timit$ ls
doc readme.doc test TIMIT_phonemes.Table train
2. Edit run.sh so that the timit variable points to that directory:
#timit=/export/corpora5/LDC/LDC93S1/timit/TIMIT # @JHU
timit=/u01/kaldi/egs/timit/s5/data/timit # @BUT
local/timit_data_prep.sh $timit || exit 1
3. Then simply run ./run.sh
***********************************************************************************************************************************************************
The following explains some of the processing stages in run.sh:
local/timit_data_prep.sh ----- extracts the locations of the training data from the corpus in /u01/kaldi/egs/timit/s5/data/timit and writes them to /u01/kaldi/egs/timit/s5/data/local/data; it uses the command /u01/kaldi/src/featbin/wav-to-duration
local/timit_prepare_dict.sh ----- generates the dictionary data and places it in /u01/kaldi/egs/timit/s5/data/local/dict; it uses the commands /u01/kaldi/tools/irstlm/bin/compile-lm and /u01/kaldi/tools/irstlm/bin/build-lm.sh
utils/prepare_lang.sh ----- uses the dictionary data to generate the language data (lexicon FST and symbol tables) and places it in /u01/kaldi/egs/timit/s5/data/lang; it uses the commands utils/make_lexicon_fst.pl, utils/sym2int.pl, fstcompile, fstaddselfloops, fstarcsort
steps/make_mfcc.sh, steps/compute_cmvn_stats.sh ----- extract MFCC features using the data locations produced by local/timit_data_prep.sh and place the results in /u01/kaldi/egs/timit/s5/data/train; they use the commands compute-mfcc-feats, compute-cmvn-stats, copy-feats, copy-matrix
Monophone training and decoding
steps/train_mono.sh ----- trains the monophone model from the MFCC features and language data produced in the previous two steps; it uses the commands gmm-init-mono, compile-train-graphs, align-equal-compiled, gmm-acc-stats-ali, gmm-est, gmm-align-compiled
utils/mkgraph.sh ----- builds the decoding graph; it uses the commands fsttablecompose, fstminimizeencoded, fstisstochastic, fstcomposecontext, make-h-transducer, fstdeterminizestar, fstrmsymbols, fstrmepslocal, add-self-loops
steps/decode.sh ----- decodes the data; it uses the commands gmm-latgen-faster, gmm-decode-faster, compute-wer
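To show how the three monophone scripts described above chain together, here is a dry-run sketch. It is an illustration under assumptions rather than a copy of run.sh: the job count of 4, the use of run.pl, the directory names data/lang_test_bg, exp/mono, and exp/mono/decode_test, and the --mono flag are what I would expect from a TIMIT recipe of that era, so check them against your own run.sh and utils/mkgraph.sh. The `run` and `mono_stage` helpers are hypothetical names.

```shell
#!/bin/sh
# Dry-run sketch of the monophone stage: train, build graph, decode.
# DRY_RUN defaults to 1, so the commands are only printed.
DRY_RUN=${DRY_RUN:-1}

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = 1 ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

mono_stage() {
  # Train the monophone model from features (data/train) and lang data.
  run steps/train_mono.sh --nj 4 --cmd run.pl data/train data/lang exp/mono || return 1
  # Compose the decoding graph HCLG for the monophone model.
  run utils/mkgraph.sh --mono data/lang_test_bg exp/mono exp/mono/graph || return 1
  # Decode the test set and score it.
  run steps/decode.sh --nj 4 --cmd run.pl exp/mono/graph data/test exp/mono/decode_test
}

mono_stage
```

Setting `DRY_RUN=0` from inside /u01/kaldi/egs/timit/s5 would execute the three steps for real, stopping at the first failure.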
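Environment: Ubuntu 12.04
Kaldi + PDNN is a speech recognition system that can make use of deep learning.
Kaldi is installed as follows:
1. Check out the source into a kaldi directory under the current directory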
svn co https://svn.code.sf.net/p/kaldi/code/trunk kaldi
2. Build the required tools
cd kaldi/tools
make
(This downloads several tool packages, so it takes a fairly long time.)
3. Configure and build Kaldi
cd ../src
./configure
make all
4. Run a test example
cd ../egs/yesno/s5
./run.sh
Notes:
The following problems came up during compilation:
1. fatal error: clapack.h: No such file or directory
Install libfreefem++-dev:
apt-get install libfreefem++-dev
Then run ./configure and make all again.
2. The file libfstscript.so.1 cannot be found
cd /u01/kaldi/tools/openfst
./configure
make
sudo make install
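Running these steps fixes it.
Environment: Ubuntu 12.04
Downloading the CMUSphinx speech recognition toolkit
Pocketsphinx: a lightweight recognition library written in C, used mainly for recognition.
Sphinxbase: the support library required by Pocketsphinx; it mainly handles feature extraction from the speech signal.
Sphinx3: a decoder for speech recognition research, written in C
Sphinx4: a decoder for speech recognition research, written in Java
CMUclmtk: language model training tools
Sphinxtrain: acoustic model training tools
Official site: http://cmusphinx.sourceforge.net/
Downloads: http://sourceforge.net/projects/cmusphinx/files/
Since this is only a test, only the Pocketsphinx and Sphinxbase packages are needed. The files to download are: pocketsphinx-0.8.tar.gz sphinxbase-0.8.tar.gz
1. Install Sphinxbase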
tar xvf sphinxbase-0.8.tar.gz
cd sphinxbase-0.8
./configure
sudo make
sudo make install
2. Install pocketsphinx
Set the environment variable that lets the build find Sphinxbase
vi ~/.bashrc
export PKG_CONFIG_PATH=/usr/local/lib/pkgconfig
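(PKG_CONFIG_PATH tells pkg-config where the Sphinxbase .pc file is located, so that pkg-config can generate compile and link options from the contents of the .pc file, such as cflags (header paths for compilation) and libs (libraries for linking).)
The export above only helps pkg-config at build time. So that the installed libraries are also found at run time, and permanently, edit the system linker configuration file /etc/ld.so.conf as follows:
sudo vi /etc/ld.so.conf
Once it is open, add the following on new lines (one path per line). The linker only needs /usr/local/lib; /usr/local/lib/pkgconfig holds .pc files rather than libraries, so that entry is optional: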
/usr/local/lib
/usr/local/lib/pkgconfig
Then run:
sudo ldconfig
Build and install pocketsphinx:
tar xvf pocketsphinx-0.8.tar.gz
cd pocketsphinx-0.8
./configure
sudo make
sudo make install
3. Test the installation
pocketsphinx_continuous -infile pocketsphinx-0.8/test/data/cards/005.wav > audio.result
View the recognition result
more audio.result
000000000: eight of states for a close seven of hearts
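Two small checks can save time with the installation above: confirming that pkg-config can see Sphinxbase before building pocketsphinx in step 2, and pulling just the recognized text out of the result file from step 3. The sketch below is a hedged illustration: `check_pc` and `hypotheses` are hypothetical helper names, the module name `sphinxbase` is my assumption about what the installed .pc file is called, and the output format is taken from the single result line shown above.

```shell
#!/bin/sh
# Report whether pkg-config can find a module; non-zero status if it cannot.
check_pc() {
  if ! command -v pkg-config >/dev/null 2>&1; then
    echo "pkg-config is not installed"
    return 2
  fi
  if pkg-config --exists "$1"; then
    echo "$1: found, cflags: $(pkg-config --cflags "$1")"
  else
    echo "$1: not found (PKG_CONFIG_PATH=${PKG_CONFIG_PATH:-unset})"
    return 1
  fi
}

# Print only the recognized text from lines such as
#   000000000: eight of states for a close seven of hearts
# Reads the named files, or standard input when none are given.
hypotheses() {
  sed -n 's/^[0-9][0-9]*: //p' "$@"
}

# Example calls (module name assumed, file name from step 3):
# check_pc sphinxbase
# hypotheses audio.result
```

If `check_pc sphinxbase` reports that the module is not found, re-check the PKG_CONFIG_PATH export in ~/.bashrc and open a new shell before running ./configure for pocketsphinx.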