|
Architecture and Training
A neural network is composed of individual, locally connected
units termed neurons. Typically each neuron weights the signals
arriving on its input connections, sums them, and transforms this
weighted sum with a non-linear function. The latter is often termed
the activation function, in analogy with the biological neuron.
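
As a minimal sketch of the single unit described above, assuming NumPy, a tanh activation and an added bias term (none of which the text prescribes; the input values and weights are made up for illustration):

```python
import numpy as np

def neuron(inputs, weights, bias=0.0):
    """Weight the inputs, sum them, and apply a non-linear activation."""
    weighted_sum = np.dot(weights, inputs) + bias
    # tanh is an assumed choice; any non-linear function would serve here.
    return np.tanh(weighted_sum)

x = np.array([0.5, -1.0, 2.0])   # hypothetical input signals
w = np.array([0.1, 0.4, -0.3])   # hypothetical connection weights
output = neuron(x, w)
```

With tanh as the activation, the output always lies strictly between -1 and 1, however large the weighted sum becomes.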
Connecting several layers in succession increases what the network
can represent. One can show that a network with only two layers of
adaptive weights suffices to approximate any continuous function to
arbitrary accuracy, given 'enough' neurons. This is just theory,
however, and one should note that it only guarantees the existence
of such a network solution, not the ability to actually find
it! That requires some kind of learning procedure.
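
To make "two layers of adaptive weights" concrete, here is a rough sketch of such a network's forward pass. The layer sizes, the tanh hidden units, the linear output and the random initial weights are all illustrative assumptions, not something the text specifies:

```python
import numpy as np

def two_layer_forward(x, W1, b1, W2, b2):
    """Forward pass through two layers of adaptive weights."""
    hidden = np.tanh(W1 @ x + b1)   # first weight layer plus non-linearity
    return W2 @ hidden + b2         # second weight layer, linear output

# Hypothetical sizes: 2 inputs, 5 hidden neurons, 1 output.
rng = np.random.default_rng(0)
n_inputs, n_hidden, n_outputs = 2, 5, 1
W1 = rng.normal(size=(n_hidden, n_inputs))
b1 = np.zeros(n_hidden)
W2 = rng.normal(size=(n_outputs, n_hidden))
b2 = np.zeros(n_outputs)

y = two_layer_forward(np.array([0.3, -0.7]), W1, b1, W2, b2)
```

Raising `n_hidden` is what 'enough' neurons refers to. The weights here are random, so the output is meaningless until some learning procedure has adjusted them.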
Changing the network parameters is termed network training. This
procedure is also referred to as weight adaptation, or in short
'learning', since that is really what's going on. One can think of
learning as attempting to store data in a way that allows
generalisation.
Training of a network can be done with most standard non-linear
optimisation algorithms, such as gradient descent or BFGS [2]. To
understand this, picture the network parameters as latitude and
longitude in a large, and often insanely multi-dimensional, rolling
landscape, where the altitude represents how far the output is from
the desired answer. The optimisation algorithm then strolls
along the surface trying to find as low a valley as possible.
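
The landscape picture can be sketched with plain gradient descent on a toy two-parameter surface. The bowl-shaped `altitude` function, the starting point and the step size below are invented for illustration; a real network's error surface has far more dimensions and usually many valleys:

```python
import numpy as np

def altitude(p):
    """Hypothetical error surface: a bowl with its lowest point at (1, -2)."""
    return (p[0] - 1.0) ** 2 + (p[1] + 2.0) ** 2

def slope(p):
    """Gradient of the altitude: the direction of steepest ascent."""
    return np.array([2.0 * (p[0] - 1.0), 2.0 * (p[1] + 2.0)])

position = np.array([4.0, 3.0])   # arbitrary starting "latitude, longitude"
step_size = 0.1                   # assumed learning rate

for _ in range(200):
    # Take a small step downhill, against the gradient.
    position = position - step_size * slope(position)
```

On this simple bowl the walk ends up at the single lowest point. On a rolling landscape with several valleys the same procedure only finds a nearby one, which is why the text speaks of "as low a valley as possible".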
A very neat feature developed in this context is 'error
backpropagation'. This solves the problem of assigning the blame for
a bad prediction to individual neurons (also known as the credit
assignment problem). Neurons are very local creatures, remember, but
using differentiable non-linearities means that we can use the
chain rule [3] to determine who did what to the final result. In the
landscape analogy this corresponds to computing how steep the
terrain is around the current location.
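
A small sketch of how the chain rule assigns blame in a two-layer network follows. It assumes tanh hidden units, a linear output, no biases and a squared-error measure, all chosen for illustration; the sizes and values are hypothetical:

```python
import numpy as np

def backprop(x, target, W1, W2):
    """Return the squared error and its gradients w.r.t. both weight layers."""
    # Forward pass, keeping intermediate values for the backward pass.
    h = np.tanh(W1 @ x)
    y = W2 @ h
    error = 0.5 * np.sum((y - target) ** 2)

    # Backward pass: apply the chain rule layer by layer.
    dy = y - target               # dE/dy
    dW2 = np.outer(dy, h)         # dE/dW2
    dh = W2.T @ dy                # blame passed back to the hidden neurons
    da = dh * (1.0 - h ** 2)      # through tanh, whose derivative is 1 - tanh^2
    dW1 = np.outer(da, x)         # dE/dW1
    return error, dW1, dW2

rng = np.random.default_rng(1)
x = np.array([0.2, -0.4, 0.9])    # hypothetical input
target = np.array([0.5])          # hypothetical desired output
W1 = rng.normal(size=(4, 3))
W2 = rng.normal(size=(1, 4))

error, dW1, dW2 = backprop(x, target, W1, W2)
```

The arrays `dW1` and `dW2` are the steepness of the terrain in each parameter direction, which is what an optimiser such as gradient descent needs in order to take its next step.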
|