SPRAAK
Frames and Times
Frames in Speech Processing

SPRAAK uses frame synchronous signal processing when converting sampled data to a sequence of feature vectors.

The following naming conventions are used:

FSHIFT, FLENGTH and TIMEBASE are the relevant keys in the SPRAAK file headers.

FRAME Convention in SPRAAK

In SPRAAK the central part of a frame (i.e. its position) is uniquely defined by the frame number (IFR) and the frame shift (here expressed as a number of samples): [IFR*FSHIFT:(IFR+1)*FSHIFT-1]. If the frame length is larger than the frame shift (which is normally the case), the frame is extended symmetrically around its central part by (FLENGTH-FSHIFT)/2 samples on each side, so the full frame covers [IFR*FSHIFT-(FLENGTH-FSHIFT)/2:(IFR+1)*FSHIFT-1+(FLENGTH-FSHIFT)/2]. Both ranges are inclusive, and consecutive frames overlap by FLENGTH-FSHIFT samples. This is illustrated in the diagram below:

      0123456789012345678901234567890123..      sample index (last digit shown only)
      ||||||||||||||||||||||||||||||||||||      sampled data

                                                FSHIFT=10, FLENGTH=20
 -----012345678901234                           frame 0 [-5:14]
           56789012345678901234                 frame 1 [5:24]
                     56789012345678901234       frame 2 [15:34]
 
                                                FSHIFT=10, FLENGTH=14

    --012345678901                              frame 0 [-2:11]
              89012345678901                    frame 1 [8:21]
                        89012345678901          frame 2 [18:31]
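The frame convention above can be expressed as a small Python sketch. The helper names `frame_bounds` and `central_frame` are hypothetical (they are not part of SPRAAK), and the sketch assumes FLENGTH >= FSHIFT with an even difference FLENGTH-FSHIFT, as in both diagram examples. It returns inclusive sample ranges and reproduces the frames drawn above.

```python
def frame_bounds(ifr: int, fshift: int, flength: int) -> tuple[int, int]:
    """Hypothetical helper: inclusive sample range [first, last] of frame `ifr`.

    Central part: [ifr*fshift, (ifr+1)*fshift - 1].
    Full frame: central part extended by (flength - fshift) / 2 on each side.
    Assumption: flength >= fshift and flength - fshift is even.
    """
    if flength < fshift or (flength - fshift) % 2:
        raise ValueError("sketch assumes flength >= fshift with an even difference")
    ext = (flength - fshift) // 2          # symmetric extension per side
    first = ifr * fshift - ext
    last = (ifr + 1) * fshift - 1 + ext
    return first, last


def central_frame(sample: int, fshift: int) -> int:
    """Hypothetical helper: index of the frame whose central part contains `sample`."""
    return sample // fshift


if __name__ == "__main__":
    for flength in (20, 14):
        for ifr in range(3):
            print(f"FSHIFT=10 FLENGTH={flength} frame {ifr}:", frame_bounds(ifr, 10, flength))
```

Because the central parts tile the sample axis without gaps or overlap, each sample belongs to exactly one central part, which is why `central_frame` reduces to an integer division.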
Motivation

This approach has the following significant advantages:

This approach also has one minor drawback: computations involving the initial and final frames normally require data that extends beyond the file boundaries. There is, however, an easy solution to this missing-data problem: a sufficiently large segment of initial/final data is duplicated beyond the file boundary to supply the missing samples. This rarely causes problems, since the initial and final data are usually (stationary) noise.
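A minimal sketch of the boundary padding, under several assumptions not stated in the source: the number of frames is taken to be ceil(N/FSHIFT) for N samples, the boundary segment is copied as-is (a mirrored copy would be an equally plausible reading of "duplicated"), and FLENGTH-FSHIFT is even. The helpers `pad_by_duplication` and `frame_samples` are hypothetical and not SPRAAK functions.

```python
def pad_by_duplication(samples: list[float], fshift: int, flength: int) -> tuple[list[float], int]:
    """Hypothetical helper: extend `samples` so every frame has complete data.

    Returns (padded, offset) with padded[offset + i] == samples[i].
    Assumptions: frame count = ceil(n / fshift); boundary segments are copied
    in their original order (no mirroring).
    """
    n = len(samples)
    ext = (flength - fshift) // 2          # extension on each side of a central part
    nfr = -(-n // fshift)                  # assumed frame count: ceil(n / fshift)
    head = ext                             # samples needed before index 0
    tail = nfr * fshift + ext - n          # samples needed after index n - 1
    if head > n or tail > n:
        raise ValueError("signal too short to duplicate the required segment")
    padded = samples[:head] + samples + samples[n - tail:]
    return padded, head


def frame_samples(padded: list[float], offset: int, ifr: int, fshift: int, flength: int) -> list[float]:
    """Hypothetical helper: the FLENGTH samples of frame `ifr` from padded data."""
    ext = (flength - fshift) // 2
    start = offset + ifr * fshift - ext    # first sample of the full frame
    return padded[start:start + flength]


if __name__ == "__main__":
    padded, offset = pad_by_duplication(list(range(25)), fshift=10, flength=20)
    for ifr in range(3):
        print(ifr, frame_samples(padded, offset, ifr, 10, 20))
```

With 25 samples, FSHIFT=10 and FLENGTH=20, frame 0 borrows 5 duplicated samples before the start and frame 2 borrows 10 duplicated samples after the end, so every frame still contains exactly FLENGTH samples.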