Frames in Speech Processing

SPRAAK uses frame synchronous signal processing when converting sampled data to a sequence of feature vectors.

The following naming conventions are used:

frameshift (FSHIFT): the shift between two consecutive frames (default = 0.01 sec)
framelength (FLENGTH): the length of an individual frame (default = 0.03 sec)
timebase (TIMEBASE): the unit in which frameshift and framelength are expressed, i.e. TIMEBASE=CONTINUOUS expresses time in seconds and TIMEBASE=DISCRETE expresses time in number of frames.

FSHIFT, FLENGTH, TIMEBASE are the relevant keys in the SPRAAK file headers.

FRAME Convention in SPRAAK

In SPRAAK the central part of a frame (and hence its position) is uniquely defined by the frame number (IFR) and the frameshift parameter (here expressed in number of samples): [IFR*FSHIFT:(IFR+1)*FSHIFT-1]. If the framelength is larger than the frameshift (which is normally the case), the frame is extended symmetrically around the central part, so the full frame ranges over [IFR*FSHIFT-(FLENGTH-FSHIFT)/2:(IFR+1)*FSHIFT-1+(FLENGTH-FSHIFT)/2]. This is illustrated in the diagram below:

      0123456789012345678901234567890123..      sample index (last digit shown only)
      ||||||||||||||||||||||||||||||||||||      sampled data

                                                FSHIFT=10, FLENGTH=20
 -----012345678901234                           frame 0 [-5:14]
           56789012345678901234                 frame 1 [5:24]
                     56789012345678901234       frame 2 [15:34]
 
                                                FSHIFT=10, FLENGTH=14

    --012345678901                              frame 0 [-2:11]
              89012345678901                    frame 1 [8:21]
                        89012345678901          frame 2 [18:31]
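
The ranges shown in the diagram follow directly from the two formulas above. The following is a minimal Python sketch of that arithmetic, not SPRAAK code: the function name frame_range is hypothetical, all quantities are in samples, and (FLENGTH-FSHIFT) is assumed to be even and non-negative, since the text does not say how an odd difference is handled.

```python
def frame_range(ifr, fshift, flength):
    """Return (first, last) sample index of frame `ifr`, both inclusive.

    fshift and flength are given in samples. Assumption of this sketch:
    (flength - fshift) is even and non-negative.
    """
    half_ext = (flength - fshift) // 2   # extension on each side of the central part
    first = ifr * fshift - half_ext
    last = (ifr + 1) * fshift - 1 + half_ext
    return first, last


# Print the ranges for the two settings used in the diagram.
for ifr in range(3):
    print(ifr, frame_range(ifr, 10, 20), frame_range(ifr, 10, 14))
```

For FSHIFT=10 this yields [-5:14], [5:24], [15:34] with FLENGTH=20 and [-2:11], [8:21], [18:31] with FLENGTH=14, matching the diagram. A negative first index means the frame starts before the beginning of the file.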

Motivation

This approach has the following significant advantages:

a 1-to-1 synchronization of frames that are the outcome of signal processing algorithms using the same frameshift, irrespective of the framelength
a straightforward synchronization of sample files and frames
a straightforward conversion between frames and times and vice versa
a straightforward synchronization of frames that are the outcome of signal processing algorithms using frameshifts that are easily related (e.g. 2:1)
a natural way to maintain the synchronization throughout all further processing (including the computation of feature_vs_time derivatives)
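
Because the central part of frame IFR always starts at sample IFR*FSHIFT, whatever the framelength, these conversions reduce to integer arithmetic. The sketch below illustrates this under stated assumptions: the function names are hypothetical and not part of SPRAAK, the frameshift is given in samples, and a sample rate of 16000 Hz is chosen only for the example (so the default shift of 0.01 sec corresponds to 160 samples).

```python
def frame_to_time(ifr, fshift, sample_rate):
    """Start time (in seconds) of the central part of frame `ifr`."""
    return ifr * fshift / sample_rate


def sample_to_frame(sample, fshift):
    """Number of the frame whose central part contains `sample`."""
    return sample // fshift


def fine_frames(coarse_ifr, ratio=2):
    """Frames of a stream with frameshift FSHIFT that fall inside the central
    part of frame `coarse_ifr` of a stream with frameshift ratio*FSHIFT."""
    return range(coarse_ifr * ratio, (coarse_ifr + 1) * ratio)


FSHIFT = 160          # 0.01 sec at the assumed sample rate
SAMPLE_RATE = 16000   # assumption for this example only

print(frame_to_time(100, FSHIFT, SAMPLE_RATE))
print(sample_to_frame(165, FSHIFT))
print(list(fine_frames(3)))
```

None of these functions takes the framelength as an argument, which is why streams computed with different framelengths but the same frameshift stay aligned frame by frame.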

This approach also has one minor drawback. Computations involving initial and final frames will normally require data that extends beyond the file boundaries. There is, however, an easy solution to this missing-data problem: a sufficiently large segment of initial/final data is duplicated beyond the file boundary to account for the missing data. This rarely leads to problems, because the initial and final data are usually (stationary) noise.
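
The duplication described above can be sketched as follows. This is one possible reading, not SPRAAK's actual implementation: the text does not specify whether the duplicated segment is copied in its original order or mirrored, so the hypothetical function extract_frame below simply copies the first (or last) missing samples in their original order. It also assumes the file is longer than the missing segment.

```python
def extract_frame(samples, ifr, fshift, flength):
    """Return the flength samples of frame `ifr` from the list `samples`.

    Parts of the frame that fall outside the file are filled with a copy of
    the first/last samples of the file (an assumption of this sketch).
    """
    half_ext = (flength - fshift) // 2
    start = ifr * fshift - half_ext          # inclusive, may be negative
    stop = (ifr + 1) * fshift + half_ext     # exclusive, may exceed len(samples)
    n = len(samples)

    n_left = max(0, -start)                  # samples missing before the file
    n_right = max(0, stop - n)               # samples missing after the file

    head = samples[:n_left]                  # duplicate of the initial data
    tail = samples[n - n_right:]             # duplicate of the final data
    body = samples[max(0, start):min(n, stop)]
    return head + body + tail


data = list(range(30))                       # toy "file" of 30 samples
first_frame = extract_frame(data, 0, 10, 20)
last_frame = extract_frame(data, 2, 10, 20)
print(len(first_frame), len(last_frame))
```

With FSHIFT=10 and FLENGTH=20, frame 0 is missing 5 samples at the start and frame 2 is missing 5 samples at the end of the 30-sample toy file; both are returned with the full length of 20 samples.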