Plugins

There are two families of plugins: pre-processing plugins for the input data, and monitor plugins. A connection to a plugin can be added by dragging an arrow from the magenta square handle on the bottom side of an input layer, as depicted in the following figure:

Pre-Processing Plugins

There are seven pre-processing plugins implemented, and others can be created by extending the org.joone.util.ConverterPlugIn class:


Centre on Zero

This plugin centres the entire data set around the zero axis by subtracting the average value.
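As an illustration of the underlying operation (a plain-Java sketch, not Joone's actual implementation):

static double[] centreOnZero(double[] serie) {
    double sum = 0;
    for (double v : serie) sum += v;          // compute the average value
    double average = sum / serie.length;
    double[] out = new double[serie.length];
    for (int i = 0; i < serie.length; i++)
        out[i] = serie[i] - average;          // shift the series around zero
    return out;
}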

Normalizer

This plugin can normalize an input data stream within a range determined by its min and max parameters.
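The operation amounts to a linear rescaling; a minimal sketch, assuming a simple min-max mapping:

static double[] normalize(double[] serie, double min, double max) {
    double lo = serie[0], hi = serie[0];
    for (double v : serie) {                  // find the input range
        lo = Math.min(lo, v);
        hi = Math.max(hi, v);
    }
    double[] out = new double[serie.length];
    for (int i = 0; i < serie.length; i++)    // map [lo, hi] onto [min, max]
        out[i] = (hi == lo) ? min
               : min + (serie[i] - lo) * (max - min) / (hi - lo);
    return out;
}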

Turning Points Extractor

This plugin extracts the turning points of a time series, generating an input signal that emphasises the relative maxima and minima of the series. This is very useful, for example, to extract buy and sell points for stock forecasting. Its minChangePercentage parameter indicates the minimum change around a turning point required to count as a real change of direction of the time series; setting this parameter to a relatively high value helps to reject noise in the input data.
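The exact output convention is not documented here; the sketch below is one plausible reading, marking +1 at a relative maximum, -1 at a relative minimum and 0 elsewhere, and ignoring reversals smaller than minChangePercentage:

static double[] turningPoints(double[] serie, double minChangePercentage) {
    double[] out = new double[serie.length];
    int ext = 0;      // index of the current candidate extreme
    int trend = 0;    // +1 rising, -1 falling, 0 not yet established
    for (int i = 1; i < serie.length; i++) {
        // while the series keeps moving with the trend, track the new extreme
        if (trend >= 0 && serie[i] > serie[ext]) { if (trend == 0) trend = 1; ext = i; continue; }
        if (trend <= 0 && serie[i] < serie[ext]) { if (trend == 0) trend = -1; ext = i; continue; }
        // otherwise see whether the reversal is large enough to be a real change
        double base = Math.max(Math.abs(serie[ext]), 1e-12);
        double pct = Math.abs(serie[i] - serie[ext]) / base * 100.0;
        if (trend != 0 && pct >= minChangePercentage) {
            out[ext] = trend;   // the extreme just passed was a real turning point
            trend = -trend;
            ext = i;
        }
    }
    return out;
}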

Moving Average

This plugin calculates the moving average of a time series over a predefined interval of samples. It is very useful for feeding a neural network that must be trained to forecast a time series.
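A minimal sketch of the computation, assuming a trailing window of `interval` samples (positions with an incomplete window are averaged over the samples seen so far):

static double[] movingAverage(double[] serie, int interval) {
    double[] out = new double[serie.length];
    double sum = 0;
    for (int i = 0; i < serie.length; i++) {
        sum += serie[i];                        // add the newest sample
        if (i >= interval) sum -= serie[i - interval];   // drop the oldest
        out[i] = sum / Math.min(i + 1, interval);
    }
    return out;
}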

DeltaNormPlugin

This plugin feeds a network with the normalized 'delta' values (the differences between consecutive values) of a time series. Used along with the Turning Points Extractor plugin, it is very useful for making time series predictions.
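One plausible composition of the two steps named in the description, reusing the normalize sketch above (note that the output is one sample shorter than the input):

static double[] deltaNorm(double[] serie, double min, double max) {
    double[] delta = new double[serie.length - 1];
    for (int i = 1; i < serie.length; i++)
        delta[i - 1] = serie[i] - serie[i - 1];   // 'delta' between consecutive samples
    return normalize(delta, min, max);            // then rescale into [min, max]
}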

ShufflePlugin

This plugin 'shuffles' the order of the input patterns at each epoch.
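Shuffling the pattern order is typically done with a Fisher-Yates shuffle; a sketch over an array of input patterns:

import java.util.Random;

static void shufflePatterns(double[][] patterns, Random rnd) {
    for (int i = patterns.length - 1; i > 0; i--) {
        int j = rnd.nextInt(i + 1);     // pick a random element not yet placed
        double[] tmp = patterns[i];     // swap it into position i
        patterns[i] = patterns[j];
        patterns[j] = tmp;
    }
}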

BinaryPlugin

This plugin converts the input values to binary format.
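The exact encoding Joone applies is not described here; purely as a hypothetical illustration, the sketch below expands an integer value into a fixed-width vector of 0/1 inputs:

static double[] toBinary(int value, int bits) {
    double[] out = new double[bits];
    for (int b = 0; b < bits; b++)                // most significant bit first
        out[b] = (value >> (bits - 1 - b)) & 1;
    return out;
}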


Every plugin has a common parameter named serie. This indicates which series (column) of a multi-column input stream is affected (0 = all series).
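For example, a transform restricted to one column could be applied like this (a sketch assuming columns are numbered from 1, so that 0 can mean 'all'):

static void applyToSerie(double[][] rows, int serie,
                         java.util.function.DoubleUnaryOperator f) {
    for (double[] row : rows)
        for (int c = 0; c < row.length; c++)
            if (serie == 0 || serie == c + 1)     // 0 = every column
                row[c] = f.applyAsDouble(row[c]);
}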

A plugin can be attached to an input layer, or to another plugin so that pre-processing modules can be cascaded.
If an input stream requires both centring on zero and normalization, the Centre on Zero plugin can be connected to a file input layer and a Normalizer plugin then attached to it, as shown in the following figure:
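In terms of the sketches above, the cascade simply composes the two transformations (rawSerie being an assumed input column):

double[] preprocessed = normalize(centreOnZero(rawSerie), 0.0, 1.0);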

Monitor Plugins

There are also two monitor plugins. These are useful for dynamically controlling the behaviour of the Control Panel parameters (the parameters contained in the org.joone.engine.Monitor object).

The Linear Annealing plugin changes the values of the learning rate (LR) and the momentum parameters linearly during training, from an initial value to a final value. The change applied at each epoch is determined by the following formulas:

step = (FinalValue - InitValue) / numberOfEpochs
LR = LR + step
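A sketch of the resulting schedule (the initial and final values are illustrative):

double initValue = 0.8, finalValue = 0.1;   // illustrative values
int numberOfEpochs = 1000;
double step = (finalValue - initValue) / numberOfEpochs;
double lr = initValue;
for (int epoch = 0; epoch < numberOfEpochs; epoch++) {
    // ... train one epoch using lr ...
    lr += step;   // step is negative here, so lr falls linearly toward finalValue
}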

The Dynamic Annealing plugin controls the change of the learning rate based on the difference between the last two global error (E) values, as follows (step is expressed as a percentage):

If E(t) > E(t-1) then LR = LR * (1 - step/100)
If E(t) <= E(t-1) then LR remains unchanged

The ‘rate’ parameter indicates how many epochs occur between one annealing change and the next. These plugins are useful for implementing the annealing (gradual cooling) of a neural network, changing the learning rate during the training process.
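Putting the rule and the ‘rate’ parameter together, a sketch of the update loop (trainOneEpoch is a hypothetical stand-in for one training pass returning the global error):

double lr = 0.7, step = 25.0;            // step expressed as a percentage
int rate = 10, numberOfEpochs = 1000;    // illustrative values
double lastError = Double.MAX_VALUE;
for (int epoch = 0; epoch < numberOfEpochs; epoch++) {
    double error = trainOneEpoch(lr);    // hypothetical training routine
    if ((epoch + 1) % rate == 0) {       // check every 'rate' epochs
        if (error > lastError)
            lr *= 1.0 - step / 100.0;    // E(t) > E(t-1): reduce LR
        lastError = error;               // E(t) <= E(t-1): LR unchanged
    }
}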

With the Linear Annealing plugin, the LR starts with a large value, allowing the network to quickly approach a good minimum; the LR then decreases, permitting the minimum found to be fine-tuned toward the best value, with little risk that a large LR will escape from a good minimum.

The Dynamic Annealing plugin is an enhancement of the linear concept, reducing the LR only when required, that is, when the global error of the neural net becomes larger (worse) than the previous step’s error. This may at first appear counter-intuitive, but it allows a good minimum to be found quickly and then helps to prevent its loss.

The Annealing Concept


[Figure: annealing]



To explain why the learning rate has to diminish as the error increases, look at the above figure:

The weights of a network define an error surface of n dimensions (for simplicity, the figure shows only two). Training a network means modifying the connection weights so as to find the group of values that gives the minimum error for the given input patterns.

In the above figure, the red ball represents the current error. It ‘runs’ on the error surface during the training process, approaching the minimum error. Its velocity is proportional to the value of the learning rate, so if this velocity is too high, the ball can overstep the absolute minimum and become trapped in a relative minimum.

To avoid this side effect, the velocity (learning rate) of the ball needs to be reduced as the error becomes worse (the grey ball).