SPRAAK
mlp_main.c File Reference

main routines and types for handling MLPs

Data Structures

struct  SprMlpLayer
 a (named) layer in the MLP
 
struct  SprMlpArg
 extra arguments for the (non-linear) functions
 
struct  SprMlpFun
 the (non-linear) function on an arc
 
struct  SprMlpParam
 pointers to all params for the current arc
 
union  _Union2_MLP_MAIN_
 optional extra parameter(s) (constants)
 
union  _Union3_MLP_MAIN_
 
struct  SprMlpXFun
 
struct  SprMlpProc
 
union  _Union4_MLP_MAIN_
 forward evaluation of all relevant arcs
 
union  _Union5_MLP_MAIN_
 backward training of all relevant arcs
 
struct  SprMlpConnect
 a connection between two layers
 
struct  SprMlpNormList
 
struct  SprWgdParam
 weights | partial_derivs | time_av_moment2 | time_av_moment1
 
struct  SprCgParam
 
struct  SprMLP
 the main MLP structure
 

Macros

#define spr_dt_nn_float_data
 
#define SPR_MLP_MODIF_EVAL
 
#define SPR_MLP_MODIF_TRAIN_X
 
#define SPR_MLP_MODIF_TRAIN_GD
 
#define SPR_MLP_MODIF_TRAIN_WGD
 
#define SPR_MLP_MODIF_TRAIN_WGD2
 
#define SPR_MLP_MODIF_TRAIN_SCG
 
#define SPR_MLP_MODIF_TRAIN_WSCG
 
#define SPR_MLP_MODIF_TRAIN_CG
 
#define SPR_MLP_MODIF_TRAIN_WCG
 
#define spr_mlp_check_training(mlp, flags)
 

Typedefs

typedef double SprNNFloatCalc
 
typedef float SprNNFloatData
 
typedef SprNNFloatCalc(* _FuncPtr1_MLP_MAIN_ )(SprNNFloatCalc x, SprMlpParam *p, const struct t_mlp_xfun *op)
 
typedef SprNNFloatCalc(* SprMlpNLF )(SprNNFloatCalc x, SprMlpParam *p, const SprMlpXFun *op)
 
typedef SprNNFloatCalc(* SprMlpNorm )(SprNNFloatCalc dd, SprMLP *mlp)
 

Enumerations

enum  {
  SPR_MLP_OPT_TRAIN, SPR_MLP_OPT_PROP, SPR_MLP_TREE_TRAIN, SPR_MLP_LAYER_TREE,
  SPR_MLP_CONNECT_SUM, SPR_MLP_CONNECT_PROD, SPR_MLP_LAYER_INIT0, SPR_MLP_LAYER_INIT1,
  SPR_MLP_LAYER_VSET, SPR_MLP_LAYER_USED, SPR_MLP_LAYER_MUSED, SPR_MLP_LAYER_PROP,
  SPR_MLP_OPT_CONST, SPR_MLP_OPT_BLOCK, SPR_MLP_OPT_TC_SET, SPR_MLP_OPT_PB_SET,
  SPR_MLP_CONNECT_BIAS
}
 
enum  { SPR_MLP_LEFT_CHILD, SPR_MLP_RIGHT_CHILD, SPR_MLP_SHIFT_CHILD }
 tree-structured evaluation
 
enum  { SPR_MLP_TM_ANY, SPR_MLP_TM_WGT, SPR_MLP_TM_WGTP, SPR_MLP_TM_CG }
 

Functions

SprMLP * spr_mlp_free (SprMLP *mlp)
 
int spr_mlp_write (const char *fname, SprMLP *mlp, int write_dsc)
 
SprMLP * spr_mlp_read (const char *fname)
 
void spr_set_norm_const (SprNNFloatData *norm_const, SprMlpConnect *src)
 
int spr_mlp_modif (SprMLP *mlp, int action,...)
 
SprNNFloatData * spr_mlp_norm_output (SprMLP *mlp)
 Normalize the output (so that the sum equals 1.0).
 

Detailed Description

The main routines and types for handling MLPs.

An MLP consists of different layers with interconnections between them. Every connection (arc) between a node in the source layer and a node in the destination layer consists of a sequence of linear and non-linear functions. These sequences of functions are common to all arcs between the nodes of the two layers involved. The free parameters of the functions are, however, specific to every connection (with the exception of the simple 1-to-1 connection scheme between input and output, which may also use shared parameters). Functions may have extra arguments, which are specified between braces '()' and separated by commas. The allowed number of arguments, their type (float, integer or string) and the effect they have depend on the function. In some cases, some of the immediate function arguments replace one or more of the free (arc-specific) parameters.

The available linear and non-linear functions are:

scale
Multiply the input with a factor.
scale(a)
Multiply the input with a constant a.
bias
Add an offset to the input.
bias(c)
Add a constant c to the input.
poly1
Evaluate a*x+b
poly1(a,b)
Evaluate a*x+b, a and b being constants
poly
Evaluate a polynomial in x; the first parameter is the order of the polynomial, followed by the polynomial coefficients ordered from high to low order (x^N ... x^0).
poly(p)
Evaluate a polynomial of order p in x; the polynomial coefficients must be stored in high to low order (x^p ... x^0).
sigm(a=1)
Evaluate a sigmoid (1/(1+exp(-a*x))), with an optional scaling of the input with a.
tanh(a=1)
Evaluate a hyperbolic tangent (tanh(a*x)), with an optional scaling of the input with a.
rop1(a=1,b=1,c=0)
Evaluate f(x*a)*b+c with f() a ratio of first order polynomials designed to mimic the behaviour of tanh() – f(x)=x/(|x|+1).
rop2(a=1,b=1,c=0)
Evaluate f(x*a)*b+c with f() a ratio of second order polynomials designed to mimic the behaviour of tanh() – f(x)=(x*|x|+x)/(|x|^2+|x|+1).
rop3(a=1,b=1,c=0)
Evaluate f(x*a)*b+c with f() a ratio of third order polynomials designed to mimic the behaviour of tanh() – f(x)=(x^3+x*|x|+x)/(|x|^3+|x|^2+|x|+1).
sign(a=0,b=1,c=-1)
Function that outputs 1.0 if b*x >= a and outputs c otherwise; this function is not differentiable and thus cannot be trained!
lsigm(c=0)
Evaluate -log(1+exp(x/(1+|x|*c))).
dist2
Evaluate (a*x+b)^2, (a,b) being the trainable parameters
dist2(a)
Evaluate (a*x+b)^2, b being the trainable parameter
ndist2
Evaluate (a*x+b)^2-log(|a|+eps), (a,b) being the trainable parameters
pow(a=1)
Evaluate a*sign(x)*|x|^|p| with a a constant and p the trainable parameter, i.e. raise x to a certain power with the sign of x being preserved.
pow(a,p,c=0)
Evaluate a*sign(x)*((|x|+|c|)^p-|c|^p), with a, p and c constants
pow2(c=0,a=1)
Evaluate a*x^2+c, a and c being constants.
exp(c=0,a=1,b=1)
Evaluate exp(x*a)*b+c.
pae1(c=0,a=1,b=1)
Evaluate f(x*a)*b+c with f() an approximation of exp() using first order polynomials – xp=max(x,0), xn=max(-x,0), f(x)=xp+1/(xn+1).
pae2(c=0,a=1,b=1)
Evaluate f(x*a)*b+c with f() an approximation of exp() using second order polynomials – xp=max(x,0), xn=max(-x,0), f(x)=xp^2/2+xp+1/(xn^2/2+xn+1).
pae3(c=0,a=1,b=1)
Evaluate f(x*a)*b+c with f() an approximation of exp() using third order polynomials – xp=max(x,0), xn=max(-x,0), f(x)=xp^3/6+xp^2/2+xp+1/(xn^3/6+xn^2/2+xn+1).
abs(c=0,a=1)
Evaluate a*|x|+c, a and c being constants.
clip(a=-1,b=1)
Clip the input to the interval [a,b].
norm
Normalize a previous scaling operation so that one obtains an inner product of the input vector (layer) with a unit vector (the parameters).
merge(layer)
Combine the values x and y read from the input layer and the layer called layer respectively into one output value as either a weighted sum (x+y*a) or as a product of both inputs after raising them to a certain power (x*sign(y)*|y|^|a|), with a a trainable parameter.
merge(layer,a,c=0)
Combine the values x and y read from the input layer and the layer called layer respectively into one output value as either a weighted sum (x+y*a) or as a product of both inputs after raising them to a certain power (x*sign(y)*((|y|+|c|)^a-|c|^a)), a and c being constants.
weight(layer)
Weight (factor or power) the input (x*w or sign(x)*|x|^|w|) with the weight w read from the layer called layer.
set_weight(c=0,a=1,b=1)
Set the weights for the two children in a tree evaluation. Input values smaller than or equal to -a are mapped to a weight of 1.0 and 0.0 for the left and right sub-tree respectively. Input values larger than or equal to +b are mapped to a weight of 0.0 (left sub-tree) and 1.0 (right sub-tree). An input value of 0.0 is mapped to an equal weight of 0.5 for both the left and right sub-tree. All other values in the range [-a,b] are mapped to intermediate values using a smooth and continuous curve. The parameter c must be set to a value in the range ]-1.0,1.0] and controls the smoothness of the curve around the corner points -a and b. A value of 1.0 assures a smooth (sigmoid-like) transition. A value close to -1.0 gives rise to a very fast step-like transition from 0.5 to 1.0 around the two corner points. The left/right sub-tree is only evaluated if the corresponding weight is non-zero.
scaleR(r=1)
Scaling, training with a regularisation cost of (0.5*w^2)*r.
biasR(r=1)
Offset, training with a regularisation cost of (0.5*w^2)*r.
poly1R(r=1)
Scaling+offset, training with a regularisation cost of (0.5*w^2)*r
expR1(r=1,c=0,a=1,b=1)
Exponent with a regularisation cost on the output values y of (0.5*(y-c)^2)*r.
expR2(r=1,c=0,a=1,b=1)
Exponent with a regularisation cost on the input values x of (0.5*x^2)*r.
sigmR1(r=1,a=1)
Sigmoid with an optional scaling of the input with a, training with a regularisation cost of (0.25-dsigm(x)/dx)*r.
tanhR1(r=1,a=1)
Tanh with an optional scaling of the input with a, training with a regularisation cost of (1.0-dtanh(x)/dx)*r.
sigmR2(r=1,a=1)
Sigmoid with an optional scaling of the input with a, training with a regularisation cost of r/(1+exp(256/x^2)).
tanhR2(r=1,a=1)
Tanh with an optional scaling of the input with a, training with a regularisation cost of r/(1+exp(64/x^2)).
sigmR3(r=1,a=1)
Sigmoid with an optional scaling of the input with a, training with a regularisation cost of (0.5*x^2)*r.
tanhR3(r=1,a=1)
Tanh with an optional scaling of the input with a, training with a regularisation cost of (0.5*x^2)*r.

The available connection types are:

direct
A 1-to-1 connection. This connection type may also use shared parameters for all arcs.
norm
Divide all inputs by the 1, 2, ... inf norm of the inputs.
full
A full connection: every output node is connected with all input nodes. The results of all incoming connections are either added or multiplied.
reduced
A sparse connection between input and output layer. The individual connections are enumerated. See below for a description of the format.
tree
Identical to a full connection, only the evaluation order differs. A tree connection has a hierarchical order (binary tree) in which only one of the two descendants is evaluated, except when a point falls in a transition region, in which case both descendants are evaluated. See below for a description of the format.

The MLP description file has the following structure:

[layers]
  Input         <nr_input_nodes>
  <layer_name>  <nr_nodes>
  Output        <nr_output_nodes>
[connections]
  <from>[+] <to> <sum/prod> <type> [ndx_file] <param_file> <functions>
  ...
[options]
  <options>

A layer thus has a unique name and a size (number of nodes). A connection is described with a source and destination layer, the combination operator (sum or prod), a connection type and a sequence of functions.
The optional '+' that may follow the name of the input layer indicates that one extra bias node must be added. For the 'sum' combination, the bias is the first node and has a value of 1. For the 'prod' combination, the bias is the last node with a value of 2.
The connection type has the following format:

<type>(<alt_opt>)

The <alt_opt> is optional and modifies the default behaviour of the connection type. The following connection types are available:

direct(shared)
one-to-one connection, optionally the parameters are shared.
full(trans)
full connection (each node to each node), optionally the parameters are stored in a transposed order (faster evaluation, conformant to the parameter layout for tree evaluation).
reduced(excl)
reduced connections (each input node connects to a selected set of output nodes); the 'excl' flag should be set when each output node has only one incoming arc.
tree(<nsd>,<buf>)
tree structured layer; optionally parts of the non-selected sub-tree are also evaluated: the <nsd> option indicates that the non-selected sub-tree should be evaluated to a depth <nsd>; the <buf> option specifies a layer with weights for which any sub-tree with a non-zero weight will be evaluated.

The functions are described as follows:

<name>[<train_arg>](<extra_args>)

The train arguments <train_arg> are optional. They are specified between square brackets '[]' and consist of the letters:

T or C
Parameters that need to be (T)rained or parameters that are fixed (C)onstants.
P or B
To either (P)ropagate the error to the previous layer, or to (B)lock the error back propagation.

The extra arguments are specified between braces '()' and separated by commas. The allowed arguments per function, their type (float, integer or string) and their effect depend on the function; see the list above.
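Putting the pieces together, a hypothetical description file for a small two-layer network could look as follows. The layer names, sizes, parameter file names and function sequences are illustrative assumptions, not taken from an actual SPRAAK setup:

```
[layers]
  Input    24
  Hidden   64
  Output   10
[connections]
  Input+  Hidden  sum  full         hid.prm  scale sigm[TP]
  Hidden  Output  sum  full(trans)  out.prm  scale bias tanh[TB](1.5)
```

Here the '+' after Input adds the bias node for the 'sum' combination, 'scale sigm[TP]' trains the weights and propagates the error, and 'tanh[TB](1.5)' trains the output non-linearity with a fixed input scaling of 1.5 while blocking back-propagation.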

The index file that specifies the reduced connectivity consists of the concatenation of (-1)-terminated arrays (of type I32) listing the set of outputs for each input. For example, the following indices

[0 2 -1
1 2 -1
3 -1]

describe the connections of a layer that transforms 3 inputs into 4 outputs with the following connectivity (rows = inputs, columns = outputs):

[1 0 1 0
0 1 1 0
0 0 0 1]

The binary tree (tree connection type) has the following properties:

The tree structure is stored as a two-valued tuple (of type I32) per node. The first value contains the (right) child information:

<has_left_child>*1 + <has_right_child>*2 + <offset_to_right_child>*4

with offset_to_right_child equal to 0 if the node does not have a right child. The second value contains the parent information:

<is_left_child>*1 + <is_right_child>*2 + <ndx_of_parent_node_base0>*4

with ndx_of_parent_node_base0, is_left_child and is_right_child equal to -1, 0 and 0 respectively for the root node.

Date
Jan 1999
Author
Kris Demuynck
Revision History:
XX/01/1999 - KD
Creation
13/04/2010 - KD
added to SPRAAK
01/10/2012 - KD
clean-up, documentation, added new functions
See Also
mlp_eval.c and mlp_train.c
