SPRAAK
File formats and headers

File formats

SPRAAK supports a variety of file formats. All formats have the same basic layout:

Header-less files are also supported. In that case any necessary header information can be provided in an external file or as extra information with the file name. See Specifying a stream of data (file) and Some practical examples for more details.

For a list of supported file formats, encryption and compression schemes and other features, see Data IO in SPRAAK.

The SPRAAK header format

A SPRAAK header consists of:

A key-value pair typically consists of a single line containing the following elements:

If desired, a key-value pair can span multiple lines: end each line that is to be continued with a backslash immediately followed by the newline.

The header may contain empty lines. The keys used in a SPRAAK header are identical to the keys exposed to the SPRAAK environment (see The standard keys).
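The continuation rule and the tolerance for empty lines can be illustrated with a small reader. This is only a sketch, not SPRAAK's own parser: it assumes that key and value are separated by whitespace (the element list is not reproduced here), it ignores any markers that open or close the header, and the COMMENT key and both function names are hypothetical.

```python
def join_continued_lines(text):
    """Merge every physical line that ends in a backslash with the next line."""
    logical, pending = [], ""
    for line in text.split("\n"):
        if line.endswith("\\"):
            pending += line[:-1]      # drop the backslash and keep accumulating
            continue
        logical.append(pending + line)
        pending = ""
    if pending:                       # text ended in the middle of a continuation
        logical.append(pending)
    return logical


def parse_pairs(text):
    """Return a dict of key -> value for all non-empty logical lines.

    ASSUMPTION: key and value are separated by whitespace.
    """
    pairs = {}
    for line in join_continued_lines(text):
        if not line.strip():
            continue                  # the header may contain empty lines
        parts = line.split(None, 1)
        pairs[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return pairs


# COMMENT is a made-up key used only to show a continued line.
example = "FORMAT BIN01\n\nDIM1 -1\nCOMMENT first part, \\\nsecond part\n"
print(parse_pairs(example))
```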

The standard keys

The key set is open by definition, i.e. each object may define its own keys and corresponding values. However, to fit optimally in the SPRAAK framework, the following keys should be used where applicable:

FORMAT
Specifies how the data elements are stored:
BIN01
binary, little endian
BIN10
binary, big endian
ASCII
ascii
Other values are not allowed.
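As an illustration of what the three FORMAT values imply for a reader, the sketch below decodes a buffer of 32-bit floats with Python's struct module. The element type (32-bit float), the whitespace-separated ASCII layout, and the function name are assumptions made for the example; they are not taken from the SPRAAK sources.

```python
import struct

# Byte-order prefix understood by the struct module, keyed by the FORMAT value.
BYTE_ORDER = {"BIN01": "<",   # binary, little endian
              "BIN10": ">"}   # binary, big endian


def unpack_floats(raw, fmt_key):
    """Decode raw bytes as 32-bit floats according to the FORMAT value."""
    if fmt_key == "ASCII":
        # ASSUMPTION: ASCII data are whitespace-separated numbers.
        return [float(tok) for tok in raw.decode("ascii").split()]
    if fmt_key not in BYTE_ORDER:
        raise ValueError("unsupported FORMAT value: %r" % (fmt_key,))
    count = len(raw) // 4             # 4 bytes per 32-bit float
    layout = BYTE_ORDER[fmt_key] + "%df" % count
    return list(struct.unpack(layout, raw[:count * 4]))


little = struct.pack("<2f", 1.0, 2.5)
big = struct.pack(">2f", 1.0, 2.5)
print(unpack_floats(little, "BIN01"), unpack_floats(big, "BIN10"))
```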
LAYOUT
Specifies the layout of the data elements:
LIST
a list (sequence) of entries, e.g. a corpus or a lexicon
MATRIX
a matrix or vector, e.g. feature vectors, samples, matrices
CUSTOM
some custom (usually complex) layout; in practice this will almost always result in a non-base-type value for the TYPE key as well
DATA
Specifies what kind of data the file contains. Typical values are:
SAMPLE
the file contains audio data (samples)
TRACK
the file contains feature vectors (frames)
CORPUS
the file is a corpus file
DICTIONARY
the file is a dictionary (lexicon) file
SEG
the file is a segmentation file
PARAM
some set of parameters
DES
some text file listing items
This is an open set, i.e. other values are allowed.
TYPE
Specifies the type of the data elements. Typically this is one of the base types (see datatype.c). If there is no fixed data type (e.g. a mixture of floats and integers), the value 'CUSTOM' should be used.
OBJECT
If set, this indicates that the complete file corresponds to a single object. Examples are:
NIY
future work
DIM1
The primary dimension of the stream of data. The primary dimension corresponds to:
  • the number of entries (lines) in a lexicon or corpus file
  • the number of segments (lines) in a segmentation file
  • the number of frames in a track or sample file
  • the number of vectors in a matrix; a vector consists of a consecutive list of data elements as stored in the file; whether the vectors are interpreted as row or column vectors is up to the routine that reads the data; the standard spr_print and spr_copy commands will interpret them as row vectors.
A value of -1 means "as many as there are data available", i.e. the application is supposed to process as many frames, entries, ... as it can read from the file.
DIM2
The secondary dimension of the stream of data. The secondary dimension corresponds to:
  • the number of entries in a segmentation file
  • the number of parameters per frame in a track or sample file
  • the vector length in a matrix
The secondary dimension cannot be (-1), except for segmentation files.
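How a reader might resolve a DIM1 of -1 for a binary matrix or track file can be sketched as follows. The function and parameter names are hypothetical, and the sketch assumes a fixed element size and that the size of the data part (without header) is known.

```python
def resolve_dim1(dim1, dim2, data_bytes, element_size):
    """Return the number of vectors (frames) to read.

    dim1 == -1 means "as many as there are data available", so the count is
    derived from the amount of data. dim2 must be known for this to work.
    """
    if dim2 <= 0:
        raise ValueError("DIM2 must be a positive number here")
    if dim1 >= 0:
        return dim1
    bytes_per_vector = dim2 * element_size
    return data_bytes // bytes_per_vector   # a trailing partial vector is ignored


# 100 frames of 13 four-byte values, with DIM1 left open in the header.
print(resolve_dim1(-1, 13, 13 * 4 * 100, 4))
```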
FSHIFT
The frame shift, i.e. each frame corresponds to frame_shift seconds of data. The default value is 0.01 (10 ms).
FOFFSET
The frame offset, i.e. the first frame starts at frame_offset seconds. Typical values are 0.0 (ESAT convention) and (frame_length-frame_shift)/2 (HTK convention). The ESAT convention requires that the first and last frames invent some left/right data in order to have frame_length seconds of data for these frames as well. The advantages are that (1) a file containing X seconds of data will contain floor(X/frame_shift) frames, and (2) the data for frame i (base 0) is located from i*frame_shift to (i+1)*frame_shift, independent of the frame_length used.
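The arithmetic of the two conventions can be written out as a few helper functions. This is a minimal sketch with hypothetical function names; the 25 ms frame length in the usage line is only an example value.

```python
import math


def esat_frame_count(duration, frame_shift=0.01):
    """Number of frames in `duration` seconds under the ESAT convention."""
    # Floating-point rounding can matter when duration / frame_shift is not
    # exactly representable; a real implementation may prefer sample counts.
    return math.floor(duration / frame_shift)


def esat_frame_span(i, frame_shift=0.01):
    """Start and end time (seconds) of frame i (base 0), ESAT convention."""
    return (i * frame_shift, (i + 1) * frame_shift)


def htk_frame_offset(frame_length, frame_shift=0.01):
    """FOFFSET value under the HTK convention."""
    return (frame_length - frame_shift) / 2


print(esat_frame_count(2.0), esat_frame_span(3), htk_frame_offset(0.025))
```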
NCHAN
The number of audio channels in the recording. The data for the different channels is stored in an interleaved fashion. For track files, this means that each vector of size DIM2 contains NCHAN feature vectors stored in an interleaved fashion.
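For sample data, interleaved storage can be undone with a strided slice. The sketch below assumes the usual meaning of interleaving, namely that the channel index varies fastest (sample 0 of every channel, then sample 1 of every channel, and so on); the function name is hypothetical.

```python
def deinterleave(samples, nchan):
    """Split an interleaved sample sequence into one list per channel."""
    if nchan < 1:
        raise ValueError("NCHAN must be at least 1")
    if len(samples) % nchan:
        raise ValueError("sample count is not a multiple of NCHAN")
    # Channel c owns elements c, c + nchan, c + 2 * nchan, ...
    return [list(samples[c::nchan]) for c in range(nchan)]


stereo = [1, 10, 2, 20, 3, 30]        # left/right pairs
print(deinterleave(stereo, 2))
```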
CHANLEN
The number of samples in a sample file (equals sample_frequency*duration). Note that the primary dimension (DIM1) is the number of frames obtained when the file is read with a frame shift of FSHIFT.
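The relation between CHANLEN, SAMPLEFREQ, FSHIFT and DIM1 can be made concrete with a short calculation. The sketch assumes that one frame shift spans a whole number of samples and that frames are counted as in the ESAT convention described under FOFFSET; the function names are hypothetical.

```python
def chanlen(sample_freq, duration):
    """Number of samples: sample_frequency * duration."""
    return round(sample_freq * duration)


def frames_from_chanlen(n_samples, sample_freq=8000, fshift=0.01):
    """Number of frames (DIM1) when reading n_samples with frame shift fshift."""
    samples_per_shift = round(sample_freq * fshift)   # assumed to be integral
    return n_samples // samples_per_shift


n = chanlen(8000, 2.0)                # two seconds at the default 8 kHz
print(n, frames_from_chanlen(n))
```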
SAMPLEFREQ
The sample frequency of an audio file. The default value is 8000 (8 kHz).
COMPRESS
When set, the value of this key specifies the compression algorithm (e.g. gzip) and corresponding parameters that were used (reading data) or will be used (writing data) for the data (not the header). See From physical storage to data content for more details on compression and other coders for data (and headers).

The key header format

A key header consists of:

A key-value pair typically consists of a single line containing the following elements:

The header may contain empty lines. The typical keys used in a key header are:

DATATYPE
Specifies the type of data stored. Typical values are:
SAMPLE
the file contains audio data (samples)
TRACK
the file contains feature vectors (frames)
SEG
the file is a segmentation file
PARAM
the file contains filter parameters, coefficients ...
DES
database description files
This is an open set, i.e. other values are allowed.
CORPUS
Replaces DATATYPE, indicating a corpus file.
DICTIONARY
May replace DATATYPE, or may be specified in conjunction with a "DATATYPE DES" key-value pair. Indicates a dictionary (lexicon) file.
DATAFORMAT
Specifies the type of the data elements. Must be one of the following base types:
FLOAT
a single-precision (32-bit) float
DOUBLE
a double-precision (64-bit) float
INT
a 32-bit integer, signed or unsigned
SHORT
a 16-bit integer, signed or unsigned
BYTE
an 8-bit integer, signed or unsigned
ALAW
an A-law value (8-bit unsigned integer)
MULAW
a µ-law value (8-bit unsigned integer)
STRING
a string (sequence of characters)
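The element widths listed above are enough to compute how many bytes one vector occupies in a binary file. The table and function below are an illustrative sketch (the names are hypothetical); STRING is left out because its length is not fixed.

```python
# Width in bits of each fixed-size DATAFORMAT value, as listed above.
ELEMENT_BITS = {
    "FLOAT": 32,
    "DOUBLE": 64,
    "INT": 32,
    "SHORT": 16,
    "BYTE": 8,
    "ALAW": 8,
    "MULAW": 8,
}


def vector_bytes(dataformat, nparam):
    """Bytes needed to store one vector of nparam elements in binary form."""
    if dataformat not in ELEMENT_BITS:
        raise ValueError("no fixed element size for %r" % (dataformat,))
    return nparam * ELEMENT_BITS[dataformat] // 8


print(vector_bytes("FLOAT", 13), vector_bytes("SHORT", 80))
```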
NPARAM
The length of the feature vector in a track file (or sample file, assuming FSHIFT is known), or the length of the vectors (typically rows) in a parameter file.
NFR
The number of frames in a track file (or a sample file, assuming FSHIFT is known), or the number of vectors (typically rows) in a parameter file. A value of -1 means "as many as there are data available", i.e. the application is supposed to process as many frames, entries, ... as it can read from the file.
NCHAN
The number of audio channels in the recording. The data for the different channels is stored in an interleaved fashion. For track files, this means that each vector of size DIM2 contains NCHAN feature vectors stored in an interleaved fashion.
CHANLEN
The number of samples in a sample file.
NENTRY
The number of entries in a segmentation, corpus or dictionary file.
NSEG
The number of segments (lines) in a segmentation file.
FSHIFT
The frame shift, i.e. each frame corresponds to frame_shift seconds of data. The default value is 0.01 (10 ms).
SAMPLEFREQ
The sample frequency of an audio file. The default value is 8000 (8 kHz).
Note
When reading or writing a key header, the keys are automatically converted to/from the new SPRAAK format (see The standard keys).
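The kind of renaming this conversion involves might look as follows. The mapping is an inference from the matching key descriptions in the two sections (DATATYPE and DATA, DATAFORMAT and TYPE, NFR and DIM1, NPARAM and DIM2); it is not taken from the SPRAAK sources, it is incomplete, and it leaves the values untouched, which the real conversion may not do.

```python
# ASSUMPTION: correspondence inferred from the key descriptions, not confirmed.
KEY_TO_STANDARD = {
    "DATATYPE": "DATA",
    "DATAFORMAT": "TYPE",
    "NFR": "DIM1",
    "NPARAM": "DIM2",
}


def to_standard_keys(key_header):
    """Rename key-header keys to standard keys; unknown keys pass through."""
    return {KEY_TO_STANDARD.get(k, k): v for k, v in key_header.items()}


old = {"DATATYPE": "TRACK", "NFR": "-1", "NPARAM": "13", "FSHIFT": "0.01"}
print(to_standard_keys(old))
```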

Some examples of headers

TODO