Training dynamics and neural network performance
Published in: | Neural networks 1997-07, Vol.10 (5), p.907-923 |
---|---|
Format: | Article |
Language: | English |
Summary: | We use an analysis of a simple model of recurrent network dynamics to gain qualitative insights into the training dynamics of feedforward multilayer perceptrons (MLPs) used for classification. These insights suggest changes to the training methods used for MLPs that improve network performance significantly. In previous work, the probabilistic neural network (PNN) was shown to provide better zero-reject error performance on character and fingerprint classification problems than radial basis function and MLP-based neural network methods. We show that performance equal to or better than that of the PNN can be achieved with a single three-layer MLP by making fundamental changes in the network optimization strategy. These changes are: 1) use of neuron activation functions that reduce the probability of singular Jacobians; 2) use of successive regularization to constrain the volume of the minimized weight space; 3) use of Boltzmann pruning to constrain the dimension of the weight space; and 4) use of prior class probabilities to normalize all error calculations, so that statistically significant samples of rare but important classes can be included without distorting the error surface. All four changes are made in the inner loop of a conjugate gradient optimization iteration and are intended to simplify the training dynamics of the optimization. On handprinted digit and fingerprint classification problems, these modifications improve error-reject performance by factors between 2 and 4 and reduce network size by 40-60%. Copyright 1997 Elsevier Science Ltd. |
---|---|
ISSN: | 0893-6080 1879-2782 |
DOI: | 10.1016/S0893-6080(96)00119-0 |
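
Two of the changes listed in the summary, prior-probability normalization of the error and Boltzmann pruning, can be illustrated with a small sketch. The record gives no formulas, so everything below is an assumption made for illustration rather than the authors' method: the function names `prior_weighted_error` and `boltzmann_prune`, the weighting rule (class prior divided by the class's frequency in the training set), and the pruning rule (remove a weight with probability exp(-|w|/T)) are all hypothetical.

```python
import math
import random
from collections import Counter


def prior_weighted_error(targets, outputs, labels, priors):
    """Squared error averaged with per-sample weights prior / training frequency.

    Assumed weighting rule (not taken from the article): a class that is
    over-represented in the training set relative to its prior is down-weighted.
    targets, outputs: one list of floats per sample
    labels: class label of each sample
    priors: dict mapping class label -> assumed prior probability
    """
    counts = Counter(labels)
    n = len(labels)
    total = 0.0
    weight_sum = 0.0
    for t, o, c in zip(targets, outputs, labels):
        w = priors[c] / (counts[c] / n)
        total += w * sum((ti - oi) ** 2 for ti, oi in zip(t, o))
        weight_sum += w
    return total / weight_sum


def boltzmann_prune(weights, temperature, rng=random):
    """Zero each weight with probability exp(-|w| / temperature).

    Assumed pruning rule (not taken from the article): small weights are
    almost always removed, large weights almost never.
    """
    return [
        0.0 if rng.random() < math.exp(-abs(w) / temperature) else w
        for w in weights
    ]
```

Under these assumptions, a training set that oversamples a rare class still yields an error value that reflects the stated priors, and repeated pruning passes at a fixed temperature tend to remove the smallest weights first.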