Training dynamics and neural network performance

Bibliographic Details
Published in: Neural Networks, 1997-07, Vol. 10 (5), p. 907-923
Main Authors: Wilson, C. L., Blue, J. L., Omidvar, O. M.
Format: Article
Language: English
Description
Summary: We use an analysis of a simple model of recurrent network dynamics to gain qualitative insights into the training dynamics of feedforward multilayer perceptrons (MLPs) used for classification. These insights suggest changes to the training methods used for MLPs that improve network performance significantly. In previous work, the probabilistic neural network (PNN) was shown to provide better zero-reject error performance on character and fingerprint classification problems than radial basis function and MLP-based neural network methods. We show that performance equal to or better than PNN can be achieved with a single three-layer MLP by making fundamental changes in the network optimization strategy. These changes are: 1) use of neuron activation functions that reduce the probability of singular Jacobians; 2) use of successive regularization to constrain the volume of the minimized weight space; 3) use of Boltzmann pruning to constrain the dimension of the weight space; 4) use of prior class probabilities to normalize all error calculations, so that statistically significant samples of rare but important classes can be included without distorting the error surface. All four of these changes are made in the inner loop of a conjugate gradient optimization iteration and are intended to simplify the training dynamics of the optimization. On handprinted digit and fingerprint classification problems, these modifications improve error-reject performance by factors between 2 and 4 and reduce network size by 40-60%. Copyright 1997 Elsevier Science Ltd.
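The abstract is the only technical detail in this record, so the NumPy sketch below is purely illustrative of two of the four changes: prior-class-probability error normalization (change 4) and Boltzmann pruning (change 3). The function names, the squared-error loss, and the exp(-w^2/T) acceptance rule are assumptions made for illustration, not the authors' implementation, which ran inside the inner loop of a conjugate gradient iteration.

```python
import numpy as np

def prior_normalized_error(outputs, targets, labels, priors):
    """Sketch of change 4: scale each sample's error so that every class
    contributes in proportion to its prior probability rather than its
    sample count.  Rare but important classes can then be oversampled
    without distorting the error surface.
    (Hypothetical helper; not the paper's actual code.)"""
    counts = np.bincount(labels, minlength=priors.size).astype(float)
    weights = priors[labels] / counts[labels]          # per-sample weight
    per_sample = np.sum((outputs - targets) ** 2, axis=1)
    return np.sum(weights * per_sample)

def boltzmann_prune(weights, temperature, rng):
    """Loose sketch of change 3: stochastically zero out weights, keeping
    a weight with probability 1 - exp(-w^2 / T), so small weights are
    pruned more often.  The paper's exact acceptance rule may differ;
    this is an assumed Boltzmann-style test."""
    keep_prob = 1.0 - np.exp(-weights ** 2 / temperature)
    mask = rng.random(weights.shape) < keep_prob
    return weights * mask

if __name__ == "__main__":
    # Tiny demo on made-up data: 6 samples, 3 classes.
    rng = np.random.default_rng(0)
    labels = np.array([0, 1, 2, 0, 1, 2])
    targets = np.eye(3)[labels]
    outputs = rng.random((6, 3))
    priors = np.array([0.5, 0.3, 0.2])   # assumed class priors
    print(prior_normalized_error(outputs, targets, labels, priors))
    print(boltzmann_prune(rng.standard_normal((3, 3)), 0.1, rng))
```

In the paper both operations are applied during optimization rather than standalone as shown here; the sketch only makes the reweighting and stochastic-pruning ideas concrete.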
ISSN: 0893-6080, 1879-2782
DOI: 10.1016/S0893-6080(96)00119-0