SPRAAK
begin/end point detection of speech
Functions | |
void | spr_bed_free (SprSspInfo *Info) |
int | spr_bed_setup (SprSspInfo *Info, const char **descript, void *aux_info) |
int | spr_bed_process (SprSspInfo *Info, const void *frame_in, void *frame_out) |
void | spr_bed_reset (SprSspInfo *Info, SprSspStatus *action) |
This program removes the leading and trailing noise frames from a data stream.
[online_bed] | |
---|---|
nfr_lead <number>(20) | |
Number of extra leading noise frames. | |
nfr_tail <number>(20) | |
Number of extra trailing noise frames. | |
spch_win <number>(60) | |
Window used to detect a speech fragment. | |
min_spch <number>(30) | |
Min. number of speech frames in the speech window. | |
sil_win <number>(60) | |
Window used to detect silence fragments. | |
max_spch <number>(10) | |
Max. number of speech frames in the silence window. | |
clean_sil_spch | |
Return clean silence speech codes instead of removing frames. | |
trigger_pos | |
Return the trigger positions only (nfr with respect to the previous state). | |
method <sent/trunc/glob>(glob) | |
Determines how bed works. The <sent> mode works continuously, e.g. marking sentences in a dialog. The <trunc> mode detects the first speech segment only (the rest is classified as noise). The <glob> mode removes leading and trailing noise in a sentence; intermediate noise is marked as speech too. | |
sil_range <min_range>(0.0) <max_range>(0.0) | |
Built-in SpchDet: The <min_range> parameter is needed for signals that are almost noise free; it gives a lower bound on the amplitude of the noise. The <max_range> parameter gives an upper bound on the amplitude of the noise, which adds robustness if the speech detector is started in a speech segment. | |
spch_raise <raise>(0.25) | |
Built-in SpchDet: minimum rise in the speech level (averaged over 4 consecutive frames) needed to detect the very first word. | |
spch_range <range>(20.0) | |
Built-in SpchDet: signals whose amplitude falls below the average speech level divided by <range> are also treated as noise. |
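
The interplay of spch_win/min_spch and sil_win/max_spch can be pictured as a two-window hysteresis over per-frame speech/noise decisions. The following is a minimal sketch of that idea only, not the SPRAAK implementation: the `BedParams` struct, `count_speech()` and `bed_sketch()` are hypothetical names, the per-frame flags are assumed to come from some speech detector, and the handling of nfr_lead, nfr_tail and the method modes is left out.

```c
#include <assert.h>

/* Hypothetical parameter bundle; the field names mirror the options above,
 * but this struct is not part of SPRAAK. */
typedef struct {
    int spch_win, min_spch; /* speech window and minimum speech count */
    int sil_win, max_spch;  /* silence window and maximum speech count */
} BedParams;

/* Count the speech flags in the window of at most `win` frames ending at t. */
static int count_speech(const unsigned char *flags, int t, int win)
{
    int start = t - win + 1;
    int n = 0;
    if (start < 0)
        start = 0; /* window is shorter at the start of the stream */
    for (int i = start; i <= t; i++)
        n += (flags[i] != 0);
    return n;
}

/* Assumed logic: switch to "speech" once the speech window holds at least
 * min_spch speech frames; switch back to "silence" once a full silence
 * window holds at most max_spch speech frames.
 * flags[t] is the per-frame decision (nonzero = speech); state[t] receives
 * the smoothed decision. Returns the number of state changes. */
static int bed_sketch(const unsigned char *flags, int nfr,
                      const BedParams *p, unsigned char *state)
{
    int in_speech = 0;
    int ntrig = 0;

    assert(p->spch_win > 0 && p->sil_win > 0);

    for (int t = 0; t < nfr; t++) {
        if (!in_speech) {
            if (count_speech(flags, t, p->spch_win) >= p->min_spch) {
                in_speech = 1;
                ntrig++;
            }
        } else if (t + 1 >= p->sil_win &&
                   count_speech(flags, t, p->sil_win) <= p->max_spch) {
            in_speech = 0;
            ntrig++;
        }
        state[t] = (unsigned char)in_speech;
    }
    return ntrig;
}
```

With the default values (60, 30, 60, 10), a short burst of fewer than 30 speech-like frames never triggers a speech segment, and a pause only ends a segment once at most 10 of the last 60 frames still look like speech. Under this reading, the gap between the two thresholds is what keeps the decision from toggling on isolated frames.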