Safely managing child processes in Python with subprocess
http://blog.csdn.net/zbyufei/article/details/6412043
Limiting how long a shell command may run from Python, and force-killing it on timeout
http://outofmemory.cn/code-snippet/1127/python-control-shell-zhixingshijian-ruo-chaoshi-ze-qiangxing-tuichu
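As a minimal sketch of the timeout idea named in the second link (not the code from either article), the standard library's subprocess.run accepts a timeout and kills the child process when it expires. The helper name run_with_timeout is made up for illustration.

```python
import subprocess
import sys


def run_with_timeout(cmd, timeout_s):
    """Run cmd (a list of arguments) with a time limit.

    Returns (returncode, stdout), or (None, None) if the limit was hit.
    subprocess.run kills the child and waits for it when the timeout expires.
    """
    try:
        done = subprocess.run(cmd, capture_output=True, text=True,
                              timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return None, None
    return done.returncode, done.stdout


if __name__ == "__main__":
    # A child that sleeps for 5 seconds is cut off after half a second.
    sleeper = [sys.executable, "-c", "import time; time.sleep(5)"]
    print(run_with_timeout(sleeper, 0.5))
```

Passing the command as a list rather than a shell string avoids shell injection, which is one of the usual points of "safe" subprocess handling.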
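APScheduler is a Python task-scheduling framework modeled on Java's Quartz and offering much of the same functionality. It supports date-based, fixed-interval, and crontab-style jobs, and it can persist them. Job state can be stored in a relational database through the sqlalchemy package, for example: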
# APScheduler 2.x API
from apscheduler.jobstores.sqlalchemy_store import SQLAlchemyJobStore
from apscheduler.scheduler import Scheduler
# Connection URL of the database that holds the job state
__oracle_url = 'oracle://test1:test1@10.44.74.13:8521/biprod'
# The same settings in configuration-dict form (not passed to the scheduler below)
__configure = {'apscheduler.standalone': True,
               'apscheduler.jobstores.sqlalchemy_store.class': 'apscheduler.jobstores.sqlalchemy_store:SQLAlchemyJobStore',
               'apscheduler.jobstores.sqlalchemy_store.url': __oracle_url}
# Create the scheduler and register the SQLAlchemy job store as the default store
scheduler = Scheduler(standalone=False)
scheduler.add_jobstore(SQLAlchemyJobStore(url=__oracle_url), 'default')
For usage details, see: http://pythonhosted.org/APScheduler/index.html
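APScheduler's own API is documented at the link above. Purely to illustrate what a fixed-interval job means, here is a stdlib-only sketch built on the sched module. It is not APScheduler code, and run_every is a hypothetical helper.

```python
import sched
import time


def run_every(interval_s, times, job):
    """Call job() `times` times, spaced interval_s seconds apart."""
    scheduler = sched.scheduler(time.monotonic, time.sleep)
    for i in range(times):
        # Each entry is scheduled relative to now: 1x, 2x, 3x the interval.
        scheduler.enter(interval_s * (i + 1), 1, job)
    scheduler.run()  # blocks until every queued job has run


if __name__ == "__main__":
    run_every(0.1, 3, lambda: print("tick"))
```

Unlike APScheduler, this sketch keeps nothing on disk, so the jobs are lost when the process exits. That is the gap the job store in the example above is meant to fill.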
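Gearman is an open-source, general-purpose framework for distributing tasks; it does no actual work itself. It hands individual tasks to other physical machines or processes so that work runs in parallel and the load is balanced. Calling Gearman a distributed computing framework is not quite accurate: compared with Hadoop, Gearman is concerned with dispatching tasks rather than executing them. Its role is more like the nervous system of a set of distributed processes.
The Gearman framework has three roles: the client, which submits jobs; the job server (gearmand), which dispatches them; and the worker, which executes them.
The versions installed by yum or apt-get are too old, so you generally have to download the latest release and build it by hand. The steps are: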
1. Install the dependencies: sudo apt-get install gcc autoconf bison flex libtool make libboost-all-dev libcurl4-openssl-dev curl libevent-dev memcached uuid-dev libpq-dev
2. Download the source: wget https://launchpad.net/gearmand/1.2/1.1.5/+download/gearmand-1.1.5.tar.g
3. Unpack, build, and install:
tar xvzf gearmand-1.1.5.tar.gz
cd gearmand-1.1.5
./configure
make
make install
4. If running /usr/local/sbin/gearmand -d fails with 'error while loading shared libraries: libgearman.so.1', run: sudo ldconfig
Starting Gearman:
1. gearmand --pid-file=/var/run/gearman/gearmand.pid --daemon --log-file=/var/log/gearman-job-server/gearman.log --listen=192.168.56.101
gearmand --verbose DEBUG -d
2. Try out Gearman with the command-line tool:
Start a worker: gearman -w -f wc -- wc -l &
Run a client: gearman -f wc < /etc/passwd
gearman -w -f testgm -- python &
gearman -f testgm < test_gearman.py
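To make the division of labor in the wc demo concrete, here is a toy in-process sketch: a queue stands in for the job server, a thread for the worker, and a function call for the client. It only mirrors the shape of the interaction. It does not speak the Gearman protocol, and every name in it is hypothetical.

```python
import queue
import threading


def worker(jobs, handlers):
    """Worker role: take jobs from the queue and run the named function."""
    while True:
        name, payload, reply = jobs.get()
        reply.put(handlers[name](payload))


def submit(jobs, name, payload):
    """Client role: hand a job to the 'server' and wait for the result."""
    reply = queue.Queue(maxsize=1)
    jobs.put((name, payload, reply))
    return reply.get()


def count_lines(text):
    """Stand-in for `wc -l` on newline-terminated input."""
    return len(text.splitlines())


# The queue plays the job server: it only passes jobs along.
jobs = queue.Queue()
threading.Thread(target=worker, args=(jobs, {"wc": count_lines}),
                 daemon=True).start()

if __name__ == "__main__":
    print(submit(jobs, "wc", "root:x:0:0\ndaemon:x:1:1\n"))
```

As with gearmand, the component in the middle never runs the function itself. More capacity comes from starting more worker threads on the same queue.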
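Environment: Ubuntu 12.04
FFmpeg is free software for recording, converting, and streaming audio and video in many formats. It includes libavcodec, an audio/video codec library used by many projects, and libavformat, a library for reading and writing audio/video container formats.
Install: sudo apt-get install ffmpeg, sudo apt-get install libav-tools
avconv: a fast audio and video converter that can also capture from audio/video streams. It can convert between arbitrary sample rates and resize video with a high-quality polyphase filter.
Convert MP3 to WAV: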
ffmpeg -i Charlottes.Web-001.mp3 -acodec pcm_s16le -ar 16000 out.wav
avconv -i Charlottes.Web-001.mp3 -acodec pcm_s16le out.wav
List the available codecs:
ffmpeg -codecs
avconv -codecs
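When the conversion is part of a larger script, the ffmpeg command above can be wrapped as in the sketch below. It reuses exactly the flags shown in the command, and the function names are made up for illustration. The wav_params helper uses the stdlib wave module to confirm that the result is 16-bit PCM at the requested rate.

```python
import subprocess
import wave


def mp3_to_wav_cmd(src, dst, rate=16000, tool="ffmpeg"):
    """Build the argument list for the MP3-to-WAV conversion shown above."""
    return [tool, "-i", src, "-acodec", "pcm_s16le", "-ar", str(rate), dst]


def convert(src, dst, rate=16000, tool="ffmpeg"):
    """Run the conversion; raises CalledProcessError if the tool fails."""
    subprocess.run(mp3_to_wav_cmd(src, dst, rate, tool), check=True)


def wav_params(path):
    """Return (channels, bytes per sample, sample rate) of a PCM WAV file."""
    with wave.open(path, "rb") as w:
        return w.getnchannels(), w.getsampwidth(), w.getframerate()
```

For a file produced with pcm_s16le and -ar 16000, wav_params should report 2 bytes per sample and a rate of 16000. The channel count follows the source file.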
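Environment: Ubuntu 12.04, Kaldi
Training on the TIMIT corpus had already reached "MMI + SGMM2 Training & Decoding". Because this was Ubuntu in a virtual machine on modest hardware, going on to train the DNN models turned out to take far too long, so I stopped there. I then wanted to use the trained models for online decoding (http://blog.itpub.net/16582684/viewspace-1270816/), but found they could not be used: the wav files in the TIMIT training data are in SPHERE format, whereas the VoxForge wav files are directly playable. So I turned to training on the VoxForge corpus. VoxForge is open source, without TIMIT's licensing restrictions, and models trained on it also support online decoding, so that is the corpus used for training here.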
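The SPHERE versus playable-wav difference can be checked from the first bytes of a file. This sketch assumes the usual signatures: NIST SPHERE headers begin with "NIST_1A", and RIFF WAV files begin with "RIFF" and carry "WAVE" at byte offset 8. The function name is hypothetical.

```python
def audio_container(path):
    """Classify a .wav file by its header: 'sphere', 'riff-wav' or 'unknown'."""
    with open(path, "rb") as f:
        head = f.read(12)
    if head.startswith(b"NIST_1A"):
        return "sphere"
    if head[:4] == b"RIFF" and head[8:12] == b"WAVE":
        return "riff-wav"
    return "unknown"
```

Under this check, files from the TIMIT distribution would be expected to come back as 'sphere' and VoxForge recordings as 'riff-wav'.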
Steps:
1. Install the libraries that mitlm and g2p depend on
sudo apt-get install flac
sudo apt-get install swig
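2. Change to /u01/kaldi/egs/voxforge/s5. The script local/voxforge_prepare_lm.sh includes a step that installs mitlm, but the source could not be checked out with svn from http://mitlm.googlecode.com/svn/trunk/, so I downloaded it from https://mitlm.googlecode.com/files/mitlm-0.4.1.tar.gz instead, put it under tools, unpacked it, renamed the directory to mitlm-svn, and commented out the line "svn checkout -r103 http://mitlm.googlecode.com/svn/trunk/ tools/mitlm-svn" in local/voxforge_prepare_lm.sh.
3. Edit getdata.sh to add DATA_ROOT=/u01/kaldi/egs/voxforge/s5/data, then run ./getdata.sh to download and unpack the data. Because the download was slow and the machine is modest, only about 100 MB of data was downloaded and unpacked.
4. Edit run.sh to add DATA_ROOT=/u01/kaldi/egs/voxforge/s5/data. Because the data set is small, a few other settings were changed as follows: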
nspk_test=7
utils/subset_data_dir.sh data/train 15 data/train.1k || exit 1;
5. Run ./run.sh. The fan roars and CPU usage goes straight to 100% for about five hours. The run got as far as "# Do MMI on top of LDA+MLLT."; the output is as follows:
=== Starting VoxForge subset selection(accent: ((American)|(British)|(Australia)|(Zealand))) ...
*** VoxForge subset selection finished!
=== Starting to map anonymous users to unique IDs ...
--- Mapping the "anonymous" speakers to unique IDs ...
ls: cannot access /u01/kaldi/egs/voxforge/s5/data/selected/anonymous-*-*: No such file or directory
*** Finished mapping anonymous users!
=== Starting initial VoxForge data preparation ...
--- Making test/train data split ...
17 data/local/tmp/speakers_all.txt
10 data/local/tmp/speakers_train.txt
7 data/local/tmp/speakers_test.txt
17 total
--- Preparing test_wav.scp, test_trans.txt and test.utt2spk ...
--- Preparing test.spk2utt ...
--- Preparing train_wav.scp, train_trans.txt and train.utt2spk ...
steps/decode.sh --config conf/decode.config --iter 3 --nj 2 --cmd run.pl exp/tri2b/graph data/test exp/tri2b_mmi/decode_it3
decode.sh: feature type is lda
exp/tri2b_mmi/decode_it3/wer_10
%WER 97.59 [ 1657 / 1698, 29 ins, 649 del, 979 sub ]
%SER 100.00 [ 180 / 180 ]
exp/tri2b_mmi/decode_it3/wer_11
%WER 97.17 [ 1650 / 1698, 22 ins, 713 del, 915 sub ]
%SER 100.00 [ 180 / 180 ]
exp/tri2b_mmi/decode_it3/wer_12
%WER 96.76 [ 1643 / 1698, 15 ins, 787 del, 841 sub ]
%SER 100.00 [ 180 / 180 ]
exp/tri2b_mmi/decode_it3/wer_13
%WER 96.41 [ 1637 / 1698, 15 ins, 837 del, 785 sub ]
%SER 100.00 [ 180 / 180 ]
exp/tri2b_mmi/decode_it3/wer_14
%WER 96.64 [ 1641 / 1698, 11 ins, 888 del, 742 sub ]
%SER 100.00 [ 180 / 180 ]
exp/tri2b_mmi/decode_it3/wer_15
%WER 96.82 [ 1644 / 1698, 7 ins, 930 del, 707 sub ]
%SER 100.00 [ 180 / 180 ]
exp/tri2b_mmi/decode_it3/wer_16
%WER 97.06 [ 1648 / 1698, 7 ins, 967 del, 674 sub ]
%SER 100.00 [ 180 / 180 ]
exp/tri2b_mmi/decode_it3/wer_17
%WER 97.17 [ 1650 / 1698, 9 ins, 997 del, 644 sub ]
%SER 100.00 [ 180 / 180 ]
exp/tri2b_mmi/decode_it3/wer_18
%WER 97.17 [ 1650 / 1698, 9 ins, 1013 del, 628 sub ]
%SER 100.00 [ 180 / 180 ]
exp/tri2b_mmi/decode_it3/wer_19
%WER 97.41 [ 1654 / 1698, 9 ins, 1027 del, 618 sub ]
%SER 100.00 [ 180 / 180 ]
exp/tri2b_mmi/decode_it3/wer_20
%WER 97.17 [ 1650 / 1698, 9 ins, 1037 del, 604 sub ]
%SER 100.00 [ 180 / 180 ]
exp/tri2b_mmi/decode_it3/wer_9
%WER 98.00 [ 1664 / 1698, 35 ins, 582 del, 1047 sub ]
%SER 100.00 [ 180 / 180 ]
6. Copy /u01/kaldi/egs/voxforge/s5/exp/tri2b/graph into /u01/kaldi/egs/voxforge/s5/exp/tri2b_mmi, then change to /u01/kaldi/egs/voxforge/s5/exp/tri2b_mmi.
For online decoding, run:
/u01/kaldi/src/onlinebin/online-wav-gmm-decode-faster --rt-min=0.3 --rt-max=0.5 --max-active=4000 --beam=12.0 --acoustic-scale=0.0769 scp:../../data/test/wav_test.scp final.mdl graph/HCLG.fst graph/words.txt '1:2:3:4:5' ark,t:trans.txt ark,t:ali.txt final.mat
/u01/kaldi/src/onlinebin/online-wav-gmm-decode-faster --rt-min=0.3 --rt-max=0.5 --max-active=4000 --beam=12.0 --acoustic-scale=0.0769 scp:../../data/test/wav_test.scp final.mdl graph/HCLG.fst graph/words.txt 1:2:3:4:5 ark,t:trans.txt ark,t:ali.txt final.mat
File: AT-20130718-lws-a0011
FROM EXPLAINED INCIDENTAL ACCIDENTAL AND FROM SHE
File: Aaron-20080318-pwn-a0265
DISGUSTED THE MANIFESTED THERE
File: Aaron-20080318-pwn-a0266
THERE WAS PASSIONATELY IT WAS THERE
File: AdrianMcNear-20091016-psv-a0573
IT IS GOING TO YOU MY WEEKS TO SUGGESTED PC THAT FOR SHUDDERED
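At this point the whole pipeline works end to end.
Conclusion: training took this long on only about 100 MB of audio. The hardware is certainly a factor, but the full VoxForge corpus is around 20 GB, and I have no idea how long training on all of it would take. If anyone has run it to completion, please share the running time.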
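The %WER lines in the step 5 output follow a fixed pattern: a percentage, then errors / reference words, then the insertion, deletion, and substitution counts, with errors = ins + del + sub. A small sketch for parsing them and picking the best-scoring entry follows. The regular expression is written against the lines shown above only, and the function names are made up.

```python
import re

WER_RE = re.compile(
    r"%WER\s+([\d.]+)\s+\[\s*(\d+)\s*/\s*(\d+),\s*(\d+) ins,"
    r"\s*(\d+) del,\s*(\d+) sub\s*\]"
)


def parse_wer(line):
    """Parse one %WER line into a dict, or return None if it does not match."""
    m = WER_RE.search(line)
    if m is None:
        return None
    errors, words, ins, dele, sub = (int(g) for g in m.groups()[1:])
    return {"wer": float(m.group(1)), "errors": errors, "words": words,
            "ins": ins, "del": dele, "sub": sub}


def best_wer(lines):
    """Return the parsed entry with the lowest WER among the given lines."""
    parsed = [p for p in map(parse_wer, lines) if p is not None]
    return min(parsed, key=lambda p: p["wer"])


if __name__ == "__main__":
    sample = [
        "%WER 96.76 [ 1643 / 1698, 15 ins, 787 del, 841 sub ]",
        "%WER 96.41 [ 1637 / 1698, 15 ins, 837 del, 785 sub ]",
        "%WER 96.64 [ 1641 / 1698, 11 ins, 888 del, 742 sub ]",
    ]
    print(best_wer(sample)["wer"])
```

On these three lines the lowest entry is the 96.41% one (wer_13 in the log), where 15 + 837 + 785 = 1637 errors out of 1698 reference words.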
Environment: Ubuntu 12.04, Kaldi
Once the TIMIT acoustic models have been trained, you can move on to decoding.
1. First, install PortAudio
cd /u01/kaldi/tools/portaudio
./configure
make
sudo make install
2. Build onlinebin
cd /u01/kaldi/src/onlinebin
make
Offline decoding:
3. Change to the trained model directory /u01/kaldi/egs/timit/s5/exp/tri1 and run:
/u01/kaldi/src/onlinebin/online-wav-gmm-decode-faster --rt-min=0.3 --rt-max=0.5 --max-active=4000 --beam=12.0 --acoustic-scale=0.0769 scp:../../data/train/split10/1/wav.scp final.mdl graph/HCLG.fst graph/words.txt '1:2:3:4:5' ark,t:trans.txt ark,t:ali.txt
The output is as follows:
File: faem0_si1392
sil ax s uw m f ao r ix vcl z ae m cl p el ax s ix cl ch uw ey sh en w er f aa r m hh eh z ax cl p ae cl k ix ng sh eh vcl d ae n vcl d f iy l vcl s sil
File: faem0_si2022
sil
sil
sil w ah dx ow cl t ih cl t ih sh iy vcl d r ay f ao r sil
File: faem0_si762
sil f ih l s epi m ao l hh ow l ix n vcl b ow l w ix cl k l ey sil
sil m ey ay vcl d ow ix n vcl g eh cl k ix s ae n vcl jh ix m aa m ah sil
File: fhxs0_sx175
sil s ix v iy ah m ay eh l cl p iy ah cl k ix n cl t ey vcl b iy dx ih cl t uw r aa n z epi f iy r iy aa r dx iy cl k aa m c
File: fhxs0_sx265
sil dh ix s ao r ih z vcl b r ow cl k ix n s ah cl ch aa cl p dh ax w uh vcl en s cl t eh vcl sil
File: fhxs0_sx355
sil
sil aa l f ih n z aa r ix n cl t eh l ix vcl jh ix n er r iy n m ae m ax l s sil
File: fhxs0_sx445
sil w ah dx ih z ih z l ao vcl jh ix ng vcl b ay dx iy ay n iy ng vcl b el ix cl sil
File: fhxs0_sx85
sil s ix m eh n cl t ix z epi m eh zh uw dx ix n cl k y uw vcl b ih cl k y aa r vcl d z sil
4. Online decoding (requires a microphone)
jerry@hq:/u01/kaldi/egs/timit/s5/exp/tri1$ /u01/kaldi/src/onlinebin/online-gmm-decode-faster --rt-min=0.3 --rt-max=0.5 --max-active=4000 --beam=12.0 --acoustic-scale=0.0769 final.mdl graph/HCLG.fst graph/words.txt '1:2:3:4:5'
Another online decoding demo:
cd /u01/kaldi/egs/voxforge/online_demo
./run.sh --test-mode live
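The wav decoders above print each utterance as a "File:" header followed by one or more hypothesis lines. If that console output is captured to a file, a sketch like the following groups it per utterance. It assumes only the layout visible in the sample output above, and parse_transcripts is a hypothetical name.

```python
def parse_transcripts(lines):
    """Group decoder output into {utterance_id: [hypothesis lines]}."""
    result = {}
    current = None
    for line in lines:
        line = line.strip()
        if line.startswith("File:"):
            # A new utterance starts; later lines belong to it.
            current = line[len("File:"):].strip()
            result[current] = []
        elif line and current is not None:
            result[current].append(line)
    return result


if __name__ == "__main__":
    sample = [
        "File: Aaron-20080318-pwn-a0265",
        "DISGUSTED THE MANIFESTED THERE",
        "File: faem0_si2022",
        "sil",
        "sil",
    ]
    for utt, hyps in parse_transcripts(sample).items():
        print(utt, hyps)
```

Keeping the hypotheses as a list preserves cases like faem0_si2022, where the decoder emitted several segments for one file.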