\documentclass[12pt]{report}
\usepackage[utf8]{inputenc}
\usepackage[T1]{fontenc}
\usepackage{geometry}
\geometry{margin=1in}
\title{Integration and Hybridization in Neural Network Modelling}
\author{Wesley Royce Elsberry}
\date{August 1989}
\begin{document}
\begin{titlepage}
\centering
{\Large Integration and Hybridization in Neural Network Modelling\par}
\vspace{1.5cm}
{\large Wesley Royce Elsberry\par}
\vspace{1cm}
Presented to the Faculty of the Graduate School of\par
The University of Texas at Arlington in Partial Fulfillment\par
of the Requirements for the Degree of\par
\vspace{0.5cm}
{\large Master of Science in Computer Science\par}
\vfill
The University of Texas at Arlington\par
August 1989\par
\end{titlepage}
\chapter*{Acknowledgements}
I wish to thank the many people who have made my graduate program a
rewarding, enlightening, and interesting experience. My especial thanks
to Bob Weems, Ken Youngers, Vijay Raj, Steve Hufnagel, and Farhad Kamangar
for their exemplary instruction. The advice and encouragement of Bill
Buckles was critically important to this refugee from the life sciences.
The members of my graduate committee, Karan Briggs, Lynn Peterson, and
Daniel Levine, have provided me with technical resources, instruction,
referrals, and general good advice in plenty. Sam Leven provided the
example problem description, the classical sequence data, and much of his
own time and expertise to aid me in developing both the simulation program
and my understanding of the field. I am indebted to Dr. Levine for his
great interest in teaching the principles of cognitive modelling to as
wide an audience as possible, for without his express encouragement I
would not have become acquainted with the field, and also for his
considerable personal assistance in developing this thesis. The
enthusiasm of Harold Szu helped to motivate me to undertake a deeper
inquiry into neural network modelling. Finally, without the continual
support of my spouse, Diane Blackwood, this thesis and the classroom work
which formed the basis for it would not have been possible.

\bigskip
\noindent July 28, 1989
\chapter*{Abstract}
Artificial neural network models derived from different biological
behaviors or functions can be used in an integrative fashion to create an
extensive problem-solving environment. An example problem of limited
melodic composition is approached by the use of Hopfield-Tank,
back-propagation, and Adaptive Resonance Theory networks serving as plausible
next note generator, musical sequence critic, and novelty detector,
respectively. Biological bases for integrative function are discussed,
and the experimental role of synthetic systems such as the example
integrated network is explored.
\tableofcontents
\chapter{Defining The Role And Nature Of Artificial Neural Network Modelling}
\section{Cultural Bias and Its Application to Cognitive Inquiry}
Plato believed that there existed a pure world of ideas, whose
perfect forms were poorly mimicked by copies in the reality apparent to
our senses. This concept of the dominant, lofty nature of ideas and the
thoughts that manipulated those ideas would do more to inhibit inquiry
into material processes than virtually any other single cause in the
succeeding two millennia. The legacy of this philosophical outlook still
permeates our culture, coloring the basic assumptions and viewpoints of
researchers even after extended scientific training.
The firmness of general belief in the separate existence of ideas
contributed to the Lamarckian hypothesis of the inheritance of acquired
characters. While there were critics of Jean Baptiste Lamarck's work from
its introduction, these were by no means an overwhelming group by numbers.
The rediscovery of Gregor Mendel's work in 1900 provided clear and
convincing evidence against Lamarck's hypothesis, yet even Watson and
Crick's elucidation and characterization of DNA failed to dispel the last
vestiges of belief in the inheritance of acquired characters. The ready
and continued acceptance of the Lamarckian hypothesis into the
mid-twentieth century gives an indication of the continued influence of
Plato's world of ideas, even when confronted with contradicting evidence
and convincing counterarguments.
Biology, however, was by no means the only scientific discipline to
be touched by the cultural bias of Platonic ideals. Physics experienced
great upheaval and internal dissent as the deterministic view of Newton
and Laplace gave way to the quirkiness and Heisenbergian uncertainty of
quantum mechanics (Hawking 1988).
In thinking about thinking, researchers have tended to denigrate
approaches dependent upon investigation of physical processes. The high
regard given ideas and thought processes has nearly always provided
Western observers with a sense of revulsion at even considering that such
sacred items could be solely the product of soft, squishy brain tissues
and their component parts, neurons. This could be considered somewhat
akin to Roquentin's nausea upon consideration of radical contingency
(Ferguson 1987).
Our culture does not encourage modes of inquiry that tend to displace
the traditional view of the mind as the only known organ with which the
perfect world of ideas can be perceived. It especially does not encourage
the denial of the Platonic ideal world, yet it has been quite some time
since any reputable scientist has explicitly advanced support for that
concept.
Cognitive science, however, must deal explicitly and forthrightly
with the subject of ideas and thoughts. In a field where cultural bias
has, can, and will directly affect the discussion of the subject at hand,
it pays to recognize the existence and probable magnitude of that bias. I
have briefly delineated the persistence and magnitude of our Platonic bias
in biology and physics; I will assert that the bias is greater in
cognitive science, where it may be more directly challenged.
\section{Artificial Intelligence: Definitions}
Artificial intelligence describes at once a sub-discipline of
cognitive science and the goal of that sub-discipline. The goal is the
description and production of an artificial system or systems that can be
described as intelligent in operation, function, or effect, in both global
and local contexts. As with most complex fields, there are several ways
in which to approach direct inquiry into the topic. In what has been
termed the "top-down modelling school" of artificial intelligence, the
emphasis has been upon systems of formal logic and other explicit symbol
manipulation techniques. As may have been apparent from the previous
description, there is a "bottom-up modelling school" of artificial
intelligence. This school of inquiry seeks to examine the basic processes
that are known to produce intelligent action in humans, those basic
processes of neural function and coordination.
By extension, the bottom-up modelling school is concerned with neural
function in general. This is considered necessary because of the great
complexity of biological neural systems. While much research has been
done and continues to be done, there exists no basic understanding of the
detailed structure and operation of biological neural systems. There is a
wealth of data, but a paucity of organizing principles. The bottom-up
modelling school attempts to provide possible organizing principles,
testing these through a process of modelling and incremental design
improvements. This area of research has become known as artificial neural
network modelling.
The definitions given above differ somewhat from what may be
considered standard in the artificial intelligence community. In their
Turing Award speech, Newell and Simon state:
\begin{quote}
The notion of physical symbol system had taken essentially its present form by the middle of the 1950's, and one can date from that time the growth of artificial intelligence as a coherent subfield of computer science. The twenty years of work since then has seen a continuous accumulation of empirical evidence of two main varieties. The first addresses itself to the sufficiency of physical symbol systems for producing intelligence, attempting to construct and test specific systems that have such a capability. The second kind of evidence addresses itself to the necessity of having a physical symbol system wherever intelligence is exhibited. It starts with Man, the intelligent system best known to us, and attempts to discover whether his cognitive activity can be explained as the working of a physical symbol system. . . . The first is generally called artificial intelligence, the second, research in cognitive psychology. (Newell and Simon 1976)
\end{quote}
This formulation of definitions seems at once too narrow and not properly
descriptive. It is too narrow in that it limits severely the range of
activities which may be considered artificial intelligence research.
While Newell and Simon talk of physical symbol systems, there is the
strong implication that they refer only to computational methods of
explicit symbol manipulation, as in the use of languages such as LISP and
Prolog. The definition of the second part is too narrow, in that it
postulates no overlap between the fields of artificial intelligence and
cognitive psychology, and also provides no linkage for concepts of
operation derived from empirical studies of biological intelligent systems
to be incorporated into an artificial intelligence framework. As stated
later by Newell and Simon,
\begin{quote}
The symbol system hypothesis implies that the symbolic behavior of man arises because he has the characteristics of a physical symbol system. Hence, the results of efforts to model human behavior with symbol systems become an important part of the evidence for the hypothesis, and research in artificial intelligence goes on in close collaboration with research in information processing psychology, as it is usually called.
\end{quote}
This seems to indicate that the previously given definitions were not
truly descriptive of the relationship between artificial intelligence and
other components of cognitive science. By including artificial
intelligence as a sub-discipline of cognitive science, it becomes clear
that artificial intelligence is not disjoint from research into
considerations of what constitutes intelligence.
An alternative formulation for defining artificial intelligence in
more general terms is given by Charniak and McDermott (1985), where they
state that artificial intelligence "is the study of mental faculties
through the use of computational models." This form of definition allows
for the top-down and bottom-up approaches to artificial intelligence to be
given co-equal rank as complementary research disciplines.
\section{Artificial Neural Network Modelling}
Artificial neural network modelling, the bottom-up school of
artificial intelligence, derives from the use of biological nervous
systems as suitable exemplars for an approach to coordination, cognition,
and control in artificial systems. This approach, in one form or another,
has been with us for a long time.
In the 1940's, McCulloch and Pitts demonstrated the possibility of
casting boolean logic systems into networks of thresholded logic units.
This certainly fits the Newell-Simon definition of a physical symbol
system. McCulloch and Pitts' seminal paper and subsequent work introduced
a new concept for consideration: rather than driving the understanding of
biological cognitive processes through increasing sophistication of
technology, perhaps technology can be made more capable and sophisticated
by elucidating mechanisms of function from biological processes involved
in cognition (Levine 1983, 1990).
This differs qualitatively from the viewpoint common to many
observers of human intelligence. There has been a tendency for people to
compare the operation of thought processes with the current leading edge
of technology. Freud compared the mind to a steam engine, Twain would
speak of "mill-works" in relation to speech production, and various
persons in this century have blithely and confidently settled upon the
computer as a fully analogous system. A circular system can be noted
here, though, as the image of the mind as being like some mechanism helped
contribute to such endeavors as Pascal's calculator, Babbage's Analytical
Engine, and the Hollerith punched card controlled census tabulator. These
efforts, in turn, inspired the modern computer. (The influence of the two
World Wars upon the motivation for the development of the computer must
also be acknowledged.) However, as the Metroplex Study Group on
Computational Neuroscience (1988) points out,
\begin{quote}
Computational study of brain structures began in the 1940's when digital computers were emerging. The first computer architectures were developed in a collaboration between the mathematicians John von Neumann and Norbert Wiener working closely with physiologists like Warren McCulloch. Early computer scientists were captivated by the analogy between the all-or-none action potentials of the neuron and binary switches representing bits in computers. Models of biological computation thus had a seminal role in the design of modern computer architecture.
\end{quote}
Thus, while comparisons of the workings of the mind to technology helped
to motivate research into further technological advances, the insight into
the successful design for a significant new technology, computers, came
instead from paying attention to the actual mechanisms of brain function.
The computer has come to be pressed into service as a "new" metaphor
for thought processes in biological systems. While the use of
technological metaphors for cognition is persistent, having produced
quaint turns of speech and aphorisms, it has not been significantly
conducive to a direct understanding of the actual cognitive processes
being described. The turnabout analogy produced by McCulloch and Pitts
has provided far more benefits to cognitive science and computing through
its establishment of neural modelling. Binary switching technology failed
to provide significant insight into biological neural activity and
function. Leon Harmon (1970) noted that "[a]nother kind of difficulty in
coming to an understanding of nervous systems is that we may be
conditioned into thinking about them in ways that are more constrained
than we like to admit or are sufficiently aware of." The cultural
identity which we share through environment can be a handicap in research
into the nature and mechanisms of our intelligence.
The model networks of McCulloch and Pitts, an example of which is
shown in Figure 1, were seen to provide useful mechanisms and insights for
computing machinery. These networks consist of threshold logic units
receiving excitatory and inhibitory inputs of integer value. A simple sum
of these values is compared with a threshold level for the unit, and the
unit is considered to "fire" if its inputs equal or exceed the threshold
value. A unit's activity is then carried forward to further connected
units in the system. McCulloch and Pitts (1943) proved that any boolean
function could be created by the appropriate combination of such units.

[Figure 1: A McCulloch-Pitts network]
The McCulloch-Pitts formalism is quite similar to principles of
design using TTL logic, as diagrams can attest. It should be noted that
McCulloch and Pitts described their formalism in 1943, some sixteen years
before the invention of the integrated circuit. While significant
problems reduce the overall utility of McCulloch-Pitts networks, modified
forms continue to be advanced (Szu 1989).
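The behavior of the threshold logic unit just described can be sketched in a few lines of Python. This is an illustrative reconstruction, not code from the thesis; the particular weights and thresholds chosen for the gates are one conventional assignment among several that work.

```python
def mp_unit(inputs, weights, threshold):
    """A McCulloch-Pitts threshold logic unit: it 'fires' (returns 1)
    when the sum of its weighted inputs equals or exceeds the threshold."""
    total = sum(i * w for i, w in zip(inputs, weights))
    return 1 if total >= threshold else 0

# Boolean AND: two excitatory inputs (weight +1), both required to reach threshold.
def AND(a, b):
    return mp_unit([a, b], [1, 1], threshold=2)

# Boolean OR: either excitatory input suffices.
def OR(a, b):
    return mp_unit([a, b], [1, 1], threshold=1)

# Boolean NOT: a single inhibitory input (weight -1) against a resting threshold of 0.
def NOT(a):
    return mp_unit([a], [-1], threshold=0)
```

Composing such units yields any boolean function, which is the McCulloch-Pitts result cited above.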
Given the basis of a neural computational framework, additional
features extracted from biological research were gradually added to
proposed models. The idea that inputs to neurons had graded values
modified by the efficiency of synaptic connections led to several
important advances. Hebb proposed a synaptic modification rule to allow a
form of learning in randomly connected networks (Hebb 1949, Levine 1983).
His formulation was that the efficiency of a synapse which connects two
neurons increases if both of the neurons have high activities at the same
time. This relatively simple rule has a number of drawbacks, which have
since been extensively cataloged and discussed. However, it cannot be
denied that Hebb's Law, as it has been called, was an important advance
for modelling.
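Hebb's formulation can be rendered as a minimal sketch, again in illustrative Python rather than anything from the thesis; the learning rate \texttt{eta} is an assumed parameter.

```python
import numpy as np

def hebb_update(W, x, eta=0.1):
    """One application of Hebb's Law: the efficiency W[i, j] of the synapse
    between units i and j increases when both activities x[i] and x[j]
    are high at the same time."""
    return W + eta * np.outer(x, x)

W = np.zeros((3, 3))
x = np.array([1.0, 1.0, 0.0])   # units 0 and 1 co-active, unit 2 silent
W = hebb_update(W, x)           # only connections among units 0 and 1 grow
```

One of the cataloged drawbacks is visible immediately: repeated application grows the weights without bound, since the rule as stated includes no decay or normalization term.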
The idea of stating an organizational principle for network design
also represented an important advance for artificial neural network
modelling. Rosenblatt's Perceptron architectures demonstrate this well.
Rosenblatt postulated several network models; among these was a
three-layer network. The layer which received external inputs was the sensory
layer, and the layer which gave an output (either as a raw signal or
interpreted as a motor output) was termed the response layer. An
associative layer of neurons provided the articulation between sensory and
response layers (Levine 1983). This general method of organization helped
move artificial neural network modelling away from specifically designed
networks and randomly organized network models.
The drawbacks associated with the special design of network models
remain a concern for ANN modellers today. Such systems are fragile or
brittle, meaning that a fault in any part of the network could cause a
general failure of function. Also, if an error in design occurred, the
resulting network would have no method of overcoming the design fault.
This particular pitfall is known by the assumption necessary for correct
function to be achieved: "programmer omniscience." Since programmer
omniscience cannot be guaranteed, systems predicated on this principle are
inherently unreliable (Hecht-Nielsen 1986).
Designing with generality as an important principle of function
results in artificial neural networks that are described as
self-organizing. A self-organizing network will conform its function to the
particular problem or context through a process of adaptation or learning.
\section{Current Models Narrowly Focussed}
The tendency in developing an artificial neural network model is to
constrain its function to a well-defined domain. This design principle
enables better control over evaluation of the functionality of the design.
Data from neurological or behavioral studies dealing with the problem
domain can then often be directly applied to either training or evaluating
the model.
There exist many general functions that are commonly encountered in
biological neural networks which can be considered to have important
implications for computational study. Among these are included
associative memory, classification, pattern recognition, and function
mapping. ANN models proposed to implement these functions include
Bi-directional Associative Memory, Adaptive Resonance Theory
(classification), Brain-State-in-a-Box (classification), Neocognitron
(invariant pattern recognition), and Back-Propagation (function mapping)
(Simpson 1988).
Unfortunately, the underlying reason for existence of each model is
overlooked in comparing different models. The human tendency to wish to
attribute a single scalar quantity denoting how "good" an ANN model is
will often cause people to overlook the fact that certain networks should
not be compared as being equivalent. Lippmann provides a good basic
overview of six different ANN models, yet falls prey to this pitfall.
Each of the models is interpreted as a classification network, yet only
the Hamming and ART networks are designed to function as classifiers
(Lippmann 1987).
\section{Multi-architecture Integrative Systems}
The use of multiple complementary architectures in designing
application systems has not often been explored. While this approach to
system design is commonly touted in ANN simulation aid advertisements, it
is less frequently featured in the literature. In an advertisement for
the ANZA Neurocomputer, HNC Corporation states, "In these networks the
interconnect geometry is already determined and the form of the transfer
equations is fixed. However, the number of processing elements (neurons),
their initial state and weight values, learning rates, and time constants
are all user selectable, thus allowing one to customize a particular
network paradigm (or combination of paradigms) to suit a particular
application." There are some reasons for the relative lack of multi-
paradigm (multi-architecture) research results in publication.
The design principles stated earlier still hold, that it is easiest
to design an architecture with a narrow definition of function. There is
the tendency for such architectures to incorporate simplifying assumptions
which aid in simulation or real-world implementation ("casting in
silicon," as the phrase goes). The ease of verifying correct operation is
increased for architectures which have a narrowly defined application, as
relatively clean data or simple theoretical findings are likely to be
available against which the architecture may be evaluated.
The relative complexity of modelling for even constrained contexts
provides another reason for concentration upon single paradigm systems.
The number of parameters which can affect a model's performance ranges
from zero for linear systems, such as Widrow's ADALINE (Widrow 1987), to
tens of scalar parameters, as may be encountered in Carpenter and
Grossberg's Adaptive Resonance Theory architecture (Carpenter and
Grossberg 1987a, 1987b; Simpson 1988). When dealing with nonlinear
dynamical systems for which no closed-form solution exists, slight changes
of system parameters can result in large scale changes in behavior.
Carpenter and Grossberg point out relative constraints upon the parameters
used in the ART 1 architecture, giving guidelines for values which should
yield stable operation of the network. Finding suitable parameters for
operation of a particular architecture can be a frustrating and time-
intensive experience.
As an example, an architecture called the "on-center, off-surround"
network (OCOS) is based upon a relatively simple equation that appears in
slightly modified form in many competitive networks. One form of this
equation (Grossberg 1973) is:
\[
\frac{dx_i}{dt} = -A x_i
+ (B - x_i)\left(I_i + f(x_i)\right)
- (x_i + C)\sum_{k=1,\, k \neq i}^{n} \left(J_k + f(x_k)\right)
\qquad \mbox{(Eq.\ 1)}
\]
The first term represents decay of activity over time; the second
term represents increase in activity due to excitatory input values, $I_i$,
and recurrent self-excitation; the third term represents decrease in
activity due to inhibitory input values, $J_k$, and competition from other
nodes in the network. For each term, there is an associated parameter. A
hidden parameter of the architecture is a factor by which to multiply the
result of the difference equation. This is usually set to be much less
than one, especially when using the simple forward Euler method of system
updating. However, there is a performance trade-off involved with setting
this factor too small: the system will converge to a stable result, but
will take a long time to converge. If the factor is set to a large value
the network will "blow-up," which is a state of wild fluctuation in
activation values for the nodes in the network. Such a system cannot
converge. The solution is to find a value for this factor which will
ensure convergence without an inordinate amount of time taken in reaching
convergence. The explicit parameters given in the equation must be
matched to the size of the other parameters to maintain set relationships.
In order to find parameter values, I have used a network simulation which
runs through permutations of sets of discrete parameter values. By
examining the resulting output which indicated whether the network run
achieved convergence and the number of time steps necessary to achieve
convergence, I was able to settle upon a suitable choice of parameters for
regular use and was able to decipher a coarse set of relationships between
the parameters.
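The forward Euler updating of Eq.\ 1, and the step-size trade-off just described, can be sketched as follows. This is illustrative Python; the network size, parameter values, and the faster-than-linear signal function $f(x) = x^2$ are assumptions chosen for the example, not the values used in the thesis simulations.

```python
import numpy as np

def f(x):
    return x ** 2          # assumed faster-than-linear signal function

def ocos_step(x, I, J, A, B, C, dt):
    """One forward Euler step of the on-center off-surround equation.
    dt is the hidden multiplying factor discussed in the text: too large
    and the activities 'blow up'; too small and convergence is slow."""
    s = f(x)
    inhib = (J + s).sum() - (J + s)          # sum over k != i for each node
    dx = -A * x + (B - x) * (I + s) - (x + C) * inhib
    return x + dt * dx

rng = np.random.default_rng(0)
x = 0.1 * rng.random(5)                      # small random initial activities
I = np.array([0.5, 1.0, 0.3, 0.2, 0.4])     # node 1 receives the largest input
J = np.zeros(5)                              # no separate inhibitory inputs
for _ in range(2000):
    x = ocos_step(x, I, J, A=1.0, B=1.0, C=0.0, dt=0.01)
```

With these settings the activities stay bounded between $0$ and $B$ and settle with node 1 most active; a sufficiently large value of \texttt{dt} destabilizes the iteration, producing the wild fluctuations described above.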
In consideration of these complexities in modelling and simulation,
synthesis of complex systems from simpler subunits has some significant
obstacles to overcome. There are more degrees of freedom for system
operation, leading to complexities in articulation and coordination of
subunits. This may increase the necessary number of system parameters,
increasing complexity in a combinatorial fashion. Usually, there will
either be less data available for evaluation of the complete system
behavior, or the data that is available will be uncertain. The general
principles to apply to a synthetic endeavor in artificial neural network
modelling remain to be elucidated.
In addition, synthesis has traditionally been given short shrift in
Western culture (Paris 1989). The application of synthesis in philosophy
has largely been confined to the works of Wittgenstein and Marx. Marx, of
course, developed a philosophy whose application was inimical to most
Western economic structures. This has caused both his work and his
approach to be considered with contempt.
On the positive side, however, synthesis can lead to significant
insights as different assumptions and features can be applied to
functional design. The process of integration and hybridization in
artificial neural network modelling may be expected to lead to new
insights into structure and function of benefit to artificially
intelligent systems, given the past history of benefits derived from
consideration of concepts borrowed from disparate disciplines. In this
case, the complexity of knowledge or topic of a contributing discipline
can provide a measure of the possible beneficial overlap: as complexity
increases, the set of concepts which may profitably be applied to the new
context also increases. As this rule would predict, biological nervous
systems provide a very large pool of concepts ready for reconstruction in
the artificial intelligence framework.
\section{Biological Neural System Complexity}
The complexity which drives that expectation can be readily
appreciated, as in the human nervous system the number of neurons is in
the billions, and each neuron will have connection to between tens and
tens of thousands of other neurons. This physio-spatial complexity pales
when compared to the complexity that can result when one attempts to
define a state for the neuron. The state can be considered to be a
combination of electrical activity, ionic balance, hormonal levels, input
activities, and many other factors. Since for several of these attributes
there exist separate contributing factors, the number of components
contributing to neuron state can easily exceed twenty, and there may be
hundreds of such components. Add to this that these components often have
analog values, separate time scales of action, and capacity for permanent
change in the neuron's response, and the possible state function for a
single neuron displays a suitably bewildering complexity.
Since this level of complexity is not conducive to direct
examination, the process of problem decomposition and solution is applied.
The implication for artificial neural network modelling is that certain
gross features are modelled. Typically, a model assumes that the major
interesting component of neural function is the electrical activity of the
contributing neurons or nodes. A further assumption in general currency
is that the electrical activity of an ANN may be modeled with the elements
or nodes responding in the manner expected of neural populations rather
than individually thresholded neurons (Levine 1983). Some models take
into account generalized neurotransmitter effects, with Grossberg's gated
dipole model providing a good example (Grossberg 1972). These simplifying
assumptions still leave a high level of complexity to be dealt with in
developing ANN designs and applications.
\section{Relevance of Biological Models}
Biology provides the exemplars for self-organizing adaptive systems
which have proven useful in artificial neural network modelling thus far.
While strict adherence to biological accuracy in modelling may well be
counter-productive to advances in modelling techniques, rejecting
biological principles may then lead to difficulties in later integrative
work.
The biological framework of neurophysiological function provides a
basic structure which makes for a common ground of interaction in models.
For example, the range of activation and output of neurons is rather
strictly circumscribed, while the range of activation of an analogous unit in
neural network models is limited only by the capability of the processor
in specifying a floating point exponent. This would make interfacing two
models using different ranges of activation less straightforward.
The inherent complexity of biological systems can support two
different arguments concerning the future of modelling. As Hawking (1988)
says, describing a complex system all at once is terrifically difficult.
It is much simpler to model several different components of the overall
system. Bringing these partial models together to form a complete model
can become problematic, as in Hawking's example of the search for the
Grand Unified Theory in physics. In one sense, then, the effort to
integrate models of limited function may be premature. However, it may be
that along with the incremental advances made in modelling small subunits
of biological neural function, we should also attempt incremental
integration of well-understood low level models. This would help prevent
the kind of situation which exists in physics, with two or more highly
disjoint models prevailing and no unification or integration yet in sight.
Since biological systems are inherently more complex than the physical
systems upon which they are based, it becomes important to keep an eye on
the eventual need for integration and resolution of subsidiary models.
Models which utilize features from two or more extant ANN
architectures such as Hecht-Nielsen's Counterpropagation Network
(Hecht-Nielsen 1987) demonstrate the useful qualities which synthesis of models
can bring to functionality. The counterpropagation network is derived
from Kohonen's self-organizing map model and Grossberg's competitive
learning networks. Hecht-Nielsen notes that this network is designed for
function mapping and analyzes its performance as compared to the
back-propagation architecture. While the general functionality of
counterpropagation networks (CPN) remains lower than that of back-propagation
networks, there exists a subclass of mapping functions for which the CPN
will train faster, and there also exists a closed-form solution for the
error of the CPN. As Hecht-Nielsen notes, "Finally, CPN illustrates a key
point about neurocomputing system design. Namely, that many of the
existing network paradigms can be viewed as building block components that
can be assembled into new configurations that offer different information
processing capabilities."
\section{Artificial Neural Network Models: Three Architectures}
For the purposes of exploring multi-architecture system design, I
selected three artificial neural network architectures for incorporation
into the overall system. These were the Hopfield-Tank network,
back-propagation, and Adaptive Resonance Theory 1 models. These networks
perform three different functions: Hopfield-Tank is used for optimization
or constraint satisfaction, back-propagation is a general-purpose mapping
network, and Adaptive Resonance Theory 1 is a classifier network.
\subsection{Hopfield-Tank Networks}
In 1982, John Hopfield wrote an article, "Neural networks and
physical systems with emergent collective computational abilities," which
was published in \textit{Proceedings of the National Academy of Sciences},
describing a model of neural computation which was readily implementable
in current solid-state technology. This article and the model which it
described have been widely credited with a resurgence of interest in ANN
modelling.
The network architecture presented by Hopfield and Tank (1985),
hereafter known as HTN for "Hopfield-Tank Network[s]", is a single-layer
fully interconnected network model (Figure 2). There is no learning rule
for this network, although various researchers have proposed modified HTN
architectures that do incorporate adaptive learning. Weights between
nodes in an HTN are fixed and symmetric, and connections between a node
and itself are zero. The advantages of these criteria for network design
are that the system dynamics can be shown to perform an "energy
minimization" in reaching a stable state. When the weights are determined
according to system constraints, the system can be characterized by a
Liapunov function, providing a measure for system energy.
HTN's have been applied to various constraint satisfaction and
optimization problems. Hopfield and Tank attracted much attention by
demonstrating the utility of an HTN in generating good solutions to the
"Traveling Salesman Problem" (or TSP), a non-polynomial time complete
problem. The TSP can be described as choosing a minimum length path among
Figure 2
a set of cities such that each city is visited once, and the salesman
returns to the city of origin. The closed path length constitutes the
measure for any particular solution. As the number of cities increases,
the number of possible valid tours increases combinatorially. However, it
can be shown that an HTN will produce good tours in constant time. An HTN
used for TSP computation does not necessarily converge upon the optimum
solution, but will reject non-optimum solutions and give relatively "good"
solutions. Unfortunately, it can also be shown that the HTN trades off
constant-time operation for $O(n^2)$ space requirements.
Figure 2 shows the HTN architecture for solution of the TSP.
Although the nodes in the network are treated as elements in a vector, the
visual representation which makes the most sense to the designer and other
interested humans is a matrix of nodes. By imposing this structure upon
our view of the network, we can associate each row with a city in the
tour, and each column with the position of a city in the final tour (City
n is in Position m in the tour). The constraints of importance are that
for a valid tour, each city appears once, and each position in the tour
has only one city. Thus, for the state of the network at equilibrium, we
should find high activities in n nodes, where n is equal to the number of
cities or positions. The interconnections between nodes in a row or
column are inhibitory, causing highly active nodes to reduce the activity
of other nodes in the same row or column. The interconnections from a
node to its neighbors in adjacent columns are proportionally more
inhibitory as distance between cities represented by those nodes
increases.
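The weight scheme just described (inhibition within each row and column, plus distance-proportional inhibition between adjacent columns) can be sketched as a small construction routine. This is a sketch only: the penalty constants `A`, `B`, and `D`, and the function and helper names, are illustrative assumptions rather than values from the text.

```python
import itertools

def tsp_weights(distances, A=500.0, B=500.0, D=200.0):
    """Build the (n*n) x (n*n) symmetric weight matrix for an n-city
    Hopfield-Tank TSP network.  Node (x, i) means "city x in tour
    position i".  A, B, D are illustrative penalty constants."""
    n = len(distances)
    W = [[0.0] * (n * n) for _ in range(n * n)]
    idx = lambda city, pos: city * n + pos
    for x, i, y, j in itertools.product(range(n), repeat=4):
        if (x, i) == (y, j):
            continue                      # no self-connections
        w = 0.0
        if x == y:                        # same row: one city, two positions
            w -= A
        if i == j:                        # same column: two cities, one position
            w -= B
        if j == (i + 1) % n or j == (i - 1) % n:
            w -= D * distances[x][y]      # adjacent columns: tour-length term
        W[idx(x, i)][idx(y, j)] = w
    return W
```

Because the distance matrix is symmetric and the same conditions are tested in both directions, the resulting weight matrix satisfies the HTN requirements of symmetry and zero self-connection.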
\subsubsection{Hopfield-Tank Equation}
The equation defining the network's activity over time (Hopfield and
Tank 1985) is
\[
C_i \frac{du_i}{dt} = \sum_{j=1}^{N} T_{ij} V_j - \frac{u_i}{R_i} + I_i
\qquad \mbox{(Eq. 2)}
\]
where $i$ and $j$ designate neurons in the network, $N$ is the total
number of neurons in the network, $T_{ij}$ is a connection weight
between neurons, $V_j$ is the output value function for a neuron,
$u_i$ is the activity of a neuron, $I_i$ is the external input for a
neuron, $C_i$ is the capacitance of a neuron, and $R_i$ is the
resistance of a neuron.
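Eq. 2 can be integrated numerically with a simple forward-Euler step. The sigmoid output function, step size, and unit parameter values below are illustrative assumptions for a sketch, not values from the text.

```python
import math

def htn_step(u, W, I, dt=0.01, C=1.0, R=1.0, gain=1.0):
    """One forward-Euler step of Eq. 2:
       C du_i/dt = sum_j T_ij V_j - u_i/R + I_i,
    with a sigmoid output function V_j = 1/(1 + exp(-gain * u_j))."""
    V = [1.0 / (1.0 + math.exp(-gain * uj)) for uj in u]
    return [ui + (dt / C) * (sum(W[i][j] * V[j] for j in range(len(u)))
                             - ui / R + I[i])
            for i, ui in enumerate(u)]
```

With zero weights, each unit's activity simply relaxes toward $R \cdot I_i$, which gives a quick sanity check that the update implements the stated dynamics.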
\subsection{Back-propagation Networks}
Back-propagation is short for "correction by the backward
propagation of errors." The learning rule used in back-propagation
networks (BPNs) is termed the generalized delta rule (Simpson 1988;
Rumelhart, Hinton, and Williams 1986). Basically, a BPN is a multi-layer
(with at least three layers) network whose nodes use a sigmoid output
function. The BPN will map a set of input activities to another set of
output activities, given training upon a set of example input/expected
output vector pairs. The BPN will generally discriminate and adapt to
non-linear relationships in the training data. For example, a BPN can
learn the exclusive-or relation, which a single-layer perceptron cannot.
Nodes in the BPN are often called "units."
The basic premise of the back-propagation algorithm has been
independently derived several times. Werbos (1974) is generally credited
with the first publication of the learning rule, which he called dynamic
feedback; Parker (1985) gave the rule the name "learning logic"; and the
popularity of the BPN architecture is primarily attributable to Rumelhart,
Hinton, and Williams (1986), as related by Simpson (1988).
All units in the BPN operate in basically the same manner. There are
some slight differences dependent upon whether a unit resides in the
input, hidden, or output layer. Generally, however, a unit generates an
output signal as follows:
\[
o_j = f(\mathrm{net}_j) \qquad \mbox{(Eq. 3)}
\]
where $j$ is a unit in the BPN, $\mathrm{net}_j$ represents the net
input to the unit, $f$ is a sigmoid function, and $o_j$ is the output
of the unit.
\[
\mathrm{net}_j = \sum_i o_i w_{ij} \qquad \mbox{(Eq. 4)}
\]
where $i$ is a unit in a preceding layer of the BPN, and $w_{ij}$ is
the connection weight linking the two units.
\[
f(\mathrm{net}_j) = \frac{1}{1 + e^{-(\mathrm{net}_j + \theta_j)}} \qquad \mbox{(Eq. 5)}
\]
where $\theta_j$ is the bias weight for the unit.
An input unit has a net input which is simply the provided external input,
as there are no preceding layers.
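Equations 3 through 5 can be collected into a single feed-forward step for one layer. This is a sketch with function names of my own choosing:

```python
import math

def sigmoid(net, theta):
    # Eq. 5: f(net_j) = 1 / (1 + e^-(net_j + theta_j))
    return 1.0 / (1.0 + math.exp(-(net + theta)))

def forward(layer_out, weights, thetas):
    """Propagate one layer's outputs through a weight matrix.
    weights[i][j] links unit i of the previous layer to unit j."""
    nets = [sum(o_i * weights[i][j] for i, o_i in enumerate(layer_out))
            for j in range(len(thetas))]                 # Eq. 4
    return [sigmoid(n, t) for n, t in zip(nets, thetas)] # Eq. 3
```

A unit with zero net input and zero bias outputs exactly 0.5, the midpoint of the sigmoid.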
Figure 3 shows a three-layer BP network. At the bottom are
five input units, which receive their activation from an external source.
In the middle are sixteen hidden units, which receive their input
according to Equation 4. The hidden units' output is sent on over another
set of weights to the single output node.
The error between the external training signal and the output node's
output activity provides the basis for correcting the behavior of the
network. This raw measure of error is used to find the delta for the
output node.
\[
\delta_k = (t_k - o_k)\,f'(\mathrm{net}_k) \qquad \mbox{(Eq. 6)}
\]
where $k$ is an output unit, $t_k$ is the expected output, and $f'$ is
the derivative of the output function.
% [Figure 3: three-layer back-propagation network appears near here]
Fortunately, the derivative of the sigmoid function is symbolically easy
to specify:
\[
f' = f\,(1 - f) \qquad \mbox{(Eq. 7)}
\]
The network then must distribute this error measure backward through the
network. For each of the hidden units, then, deltas are found as follows:
\[
\delta_j = \Bigl(\sum_k \delta_k w_{jk}\Bigr)\,f'(\mathrm{net}_j) \qquad \mbox{(Eq. 8)}
\]
where $k$ is a unit in the succeeding layer of the BPN.
Each of the weights in the network is changed according to:
\[
\Delta w_{ij} = L\,o_i\,\delta_j \qquad \mbox{(Eq. 9)}
\]
where $L$ is a constant representing the learning rate for the BPN.
Theta values for nodes are treated in the same manner, thus
\[
\Delta\theta_j = L\,f(\theta_j)\,\delta_j \qquad \mbox{(Eq. 10)}
\]
So, for the example BPN in Figure 3, an input vector causes output of
the input units to be distributed to the hidden units, modified by the
intervening weights. Similarly, the hidden units send on their output
through weights to the output unit, which provides the response of the
network to the particular input vector. This is called "feed-forward"
processing.
Once the network output is known, it can be compared to the expected
output, and the delta value for the output unit is determined. This
begins the "back-propagation" process. The delta values for the hidden
units can now be derived as in Equation 8. The amount of change for each
of the weights between the hidden and output layers can now be found
according to Equation 9. Weight changes between input and hidden layers
proceed similarly. For each node in the hidden and output layers, the
value of $\theta$ is also changed. This completes the normal
back-propagation phase.
In the example problem, we have made a slight change to the normal
back-propagation process, and have allowed theta for the input units to
also be adaptively changed.
In practice, one would normally construct a network with the correct
numbers of input and output units, make some guess as to the number of
hidden units needed, and assign random values to the weights. The network
would then be trained upon the available data vector pairs until the error
becomes suitably low, or the implementor decides to make a change in
network design.
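The full procedure just outlined (random initialization, feed-forward by Eqs. 3--5, error and weight updates by Eqs. 6--9) can be sketched as a training loop on the exclusive-or problem mentioned earlier. The layer sizes, learning rate, and epoch count are illustrative choices, and the bias update here treats each $\theta$ as a weight from a constant-one unit, a common simplification rather than the text's Eq. 10.

```python
import math, random

def train_xor(epochs=2000, L=0.5, hidden=4, seed=1):
    """Minimal back-propagation network trained on XOR.
    Returns (error after first epoch, error after last epoch)."""
    rng = random.Random(seed)
    w1 = [[rng.uniform(-1, 1) for _ in range(hidden)] for _ in range(2)]
    w2 = [[rng.uniform(-1, 1)] for _ in range(hidden)]
    th1 = [rng.uniform(-1, 1) for _ in range(hidden)]
    th2 = [rng.uniform(-1, 1)]
    data = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 0)]

    def f(net, th):
        return 1.0 / (1.0 + math.exp(-(net + th)))     # Eq. 5

    def run(x):
        h = [f(sum(x[i] * w1[i][j] for i in range(2)), th1[j])  # Eqs. 3-4
             for j in range(hidden)]
        o = f(sum(h[j] * w2[j][0] for j in range(hidden)), th2[0])
        return h, o

    first = last = None
    for _ in range(epochs):
        total = 0.0
        for x, t in data:
            h, o = run(x)
            total += (t - o) ** 2
            dk = (t - o) * o * (1 - o)                 # Eq. 6 using Eq. 7
            dj = [dk * w2[j][0] * h[j] * (1 - h[j])    # Eq. 8
                  for j in range(hidden)]
            for j in range(hidden):
                w2[j][0] += L * h[j] * dk              # Eq. 9
                th1[j] += L * dj[j]                    # bias as constant-1 weight
                for i in range(2):
                    w1[i][j] += L * x[i] * dj[j]
            th2[0] += L * dk
        if first is None:
            first = total
        last = total
    return first, last
```

The usual stopping rule is exactly the one described above: train until the summed squared error becomes suitably low, or revise the network design.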
Possible applications for BPNs include encrypting, data compression,
non-linear pattern matching and feature detection. Existing BP
applications include translation of text inputs into phoneme outputs,
acoustic signal classification, character recognition, speech analysis,
motor learning, image processing, knowledge representation, combinatorial
optimization, natural language, forecasting and prediction, and
multi-target tracking. BP has been implemented or theorized in electronic,
VLSI, and optical formats (Simpson 1988).
\subsection{Adaptive Resonance Theory 1}
Adaptive Resonance Theory 1 (ART 1) is a model introduced by Gail
Carpenter and Stephen Grossberg (Carpenter and Grossberg 1987a). There
are two ART models of note, ART 1 and ART 2, and many modified
architectures which are premised upon one or the other of the ART models.
Basically, an ART architecture is a two-layer network which provides
unsupervised learning of categories of inputs (Figure 4). The F1 layer is
composed of "feature nodes," which accept external inputs. In ART 1,
these inputs are binary patterns, while ART 2 incorporates preprocessing
to accept analog inputs to the F1 layer. The F2 layer is composed of
"category nodes," which compete to respond to valid F1 activations. There
are control structures built into the architecture to prevent F2
activation without input being received at the F1 layer. There are other
control structures to prevent "resonance" from occurring when the
prototype pattern determined by the most active F2 node does not
correspond to the pattern of F1 activation.
Short term and long term memory are represented in ART architectures
by node activations and inter-node weights, respectively. A "bottom-up
activation" refers to the pattern of activation received by F2 nodes
through the weighted links from F1 nodes. Similarly, a "top-down
activation" is the pattern of activation received by F1 nodes through
weighted links from F2 category nodes. Long term memory is changed only
through resonance between the F1 nodes and a selected F2 node. A more
% [Figure 4: ART network architecture appears near here]
detailed diagram of the ART 1 interconnections is shown in Figure 5.
Resonance is a state in which long term memory traces between the F1
and F2 layers are modified to more closely represent the input activation
in the category node's top down weights. An F2 node which wins the
competition among F2 nodes has its top-down activation tested against the
bottom-up activation of the F1 feature nodes. If these match within a
level of tolerance, called the vigilance level, a resonant state is
entered and long term memory is changed. If not, the current winning F2
category node is made ineligible for further consideration against the
input, and F2 competition is restarted among remaining eligible F2 nodes.
For an example pattern presented to the ART network given in Figure
4, the presence of input turns on the gain control nodes and activates the
F1 layer. The activation of the F1 layer, or bottom-up activation, is fed
across a set of long term memory weights to generate a new pattern of
activity at the F2 layer. The F2 layer responds with a top down
activation which is filtered by the top down weights linking each of the
F2 nodes to the F1 layer. Competition among the F2 nodes results in a
single winner for application to the current feature input. A comparison
of the bottom up activation with the top down activation yields a set
theoretic measure of the match between the presented pattern and the
category represented by the F2 node. If $I$ represents the input pattern,
$V(J)$ represents F2 category node $J$'s archetypal pattern, and $p$
represents the vigilance parameter, then the match criterion is
\[
|I \cap V(J)| \geq p\,|I|,
\]
and reset occurs whenever this criterion fails.
If reset occurs, the F2 node J becomes ineligible for matching to the
current input pattern, and the process of bottom up activation, top down
activation, competition, and match testing continues until some category
node is found to match the input sufficiently well, or until all category
nodes have been matched against the input and found to be too different.
Figure 5 displays a more detailed ART 1 network, with the various weights,
nodes, and connections visible.
Obviously, it is possible that none of the eligible F2 nodes will
match within the vigilance level's acceptable tolerance. When an ART
network is first trained, there are no eligible F2 nodes, rather there are
a number of uncommitted F2 nodes. An F2 node will be selected and enter a
resonant state, providing the first category. If a subsequent input does
not match this category within the vigilance level, the single F2 category
is rendered ineligible and the F2 layer is reset. This brings the network
to a state analogous to that of having no eligible category nodes
available, and a second category node is selected and resonated with the
F1 layer. At any point where no further eligible F2 nodes exist, but an
uncommitted F2 node remains, then a new category node is formed from the
formerly uncommitted node. If no uncommitted category nodes remain, then
the input has been found not to match the available categories.
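The search cycle just described can be sketched in a greatly simplified form: binary patterns, fast learning (the prototype becomes its intersection with the input), and a choice rule that simply takes the eligible category with the largest overlap. The function name, the choice rule, and the omission of the gain-control and differential-equation dynamics are all simplifying assumptions of this sketch, not features of ART 1 itself.

```python
def art1_classify(patterns, vigilance=0.7):
    """Simplified ART 1 search over binary (0/1) patterns.
    Returns the category label chosen for each input, and the
    learned category prototypes."""
    prototypes = []          # top-down templates V(J)
    labels = []
    for I in patterns:
        ineligible = set()
        while True:
            # choose the eligible category with the largest overlap
            best, best_j = -1, None
            for j, V in enumerate(prototypes):
                if j in ineligible:
                    continue
                overlap = sum(a & b for a, b in zip(I, V))
                if overlap > best:
                    best, best_j = overlap, j
            if best_j is None:           # no eligible node: commit a new one
                prototypes.append(list(I))
                labels.append(len(prototypes) - 1)
                break
            V = prototypes[best_j]
            match = sum(a & b for a, b in zip(I, V))
            if match >= vigilance * sum(I):      # vigilance test
                prototypes[best_j] = [a & b for a, b in zip(I, V)]  # resonance
                labels.append(best_j)
                break
            ineligible.add(best_j)               # reset: try another node
    return labels, prototypes
```

Repeated presentations of a pattern reuse its category, while a sufficiently different pattern triggers reset and the commitment of a fresh category node, as in the description above.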
% [Figure 5: detailed ART 1 network appears near here]
\chapter{Integration In Neural Network Modelling}
Integration in neural network modelling is taken here to mean the
combination of different neural network architectures in a coordinated
system. This differs from the casual usage sometimes found in the
literature where the term has been applied either to systems composed of
multiple units of the same base architecture, or else to trivial
modifications of a known architecture. Integration applies properly to
cases of multiple-architecture systems and there are some instances of
systems for which the term genuinely applies but has not been used.
As an example, Matsuoka, Hamada, and Nakatsu (1989) have proposed an
architecture for phoneme recognition that subdivides the hidden and output
layers of a back-propagation network in order to enhance the network's
ability to recognize phonemes and also to substantially reduce the
training time necessary for the network. However, Matsuoka terms this
architecture the Integrated Neural Network (INN). While there is a
substantial improvement in training time, there is no fundamental
difference between an INN and a back-propagation network: they differ only
in the connectivity of the weights between the hidden and output layers.
The reduction in internal complexity of the INN can explain the decreased
training time discovered.
A slightly different form of integration is pursued by Foo and Szu
(1989). A "divide and conquer" approach to problem solving employs the
same architecture, a modified Hopfield-Tank network, to handle smaller
subproblems and then brings together the resulting subproblem solutions
into an overall solution. This requires some coordination to effect the
overall solution, bringing into play elements of the broader integrative
issues I have noted, but is not properly an integrated system as I have
defined it.
A better example of an integrated artificial neural network system
appears in Cruz et al. (1989). Cruz uses a MADALINE architecture for
image preprocessing and a back-propagation network for removing image
distortion. The MADALINE architecture is a Multiple ADALINE system,
developed by Widrow (Widrow, Pierce, and Angell 1961). The ADALINE, short
for Adaptive Linear Neuron, is a neuronal model using the Least Mean
Square learning rule developed by Widrow and Hoff. Widrow and Winter
(1988) have updated the learning rule used for multiple layer, multiple
ADALINE networks. The specific example given by Widrow and Winter
presents a MADALINE network for invariant pattern recognition. The
MADALINE architecture most closely resembles the Perceptron architecture
given by Rosenblatt. In creating his integrated system, Cruz applied the
"divide and conquer" concept not for the purpose of reducing simulation
time, but in consideration of space requirements needed for a system which
could handle the 256x256 pixel images used.
\section{Integration and Convergence}
A perennial problem for the artificial neural network modeller is the
issue of convergence in finite time. It is nice to know that the
architecture selected for a function will converge to a solution before
the heat death of the universe. It is similarly a concern that a system
composed of subsidiary architectures will converge. This can be
problematic, since general convergence theorems have not been found for
several of the most popular architectures (Widrow 1987).
Back-propagation provides a particularly pointed example, since it is
by far the most widely used network, if the number of papers concerning
applications is taken as the criterion. The generalized delta learning
rule of back-propagation has long been appreciated to be generally useful,
yet significant progress in firmly establishing this usefulness in the
form of theorems concerning convergence has been lacking. Sontag and
Sussman (1989) provide a theorem demonstrating for back-propagation a
result analogous to that for the perceptron learning rule: if a separation solution
exists, the generalized delta rule's gradient descent will find the
solution in finite time. While much has been made of a theorem by
Kolmogorov (Farhat 1986), it must be conceded that Kolmogorov's theorem is
an existence theorem: there is some network based upon the
back-propagation architecture which will perform a given mapping from
$R^n$ to $R^m$, but the theorem gives no clues as to what that specific
network looks like.
There is some promising news concerning convergence which is of
interest for building integrative ANN systems. Hirsch (1989) gives
several theorems which hold that if component subnets of a neural network
converge, then the network will converge. While Hirsch's theorems assume
that the convergence properties of the subnetworks can be described by
Liapunov energy functions, he notes, "It is more difficult to obtain
convergence for cascades of systems that are merely assumed to be
convergent, but without benefit of Liapunov functions or global asymptotic
stability. One way of doing this is to place strong restrictions on the
rates of convergence." Hirsch defines a cascade as a layered network
where the output of one layer serves as the input of the next. Many
integrative designs can be cast into this framework.
\section{Incremental Synthesis}
The process of synthesis leading to integrative artificial neural
network modelling is important to the development of insights into topics
of critical application, such as sensor fusion. By confronting directly
the need for coherent internal use of available resources and
capabilities, we are more likely to generate an understanding of fusion
principles.
The synthetic approach to modelling provides a supportive environment
for creating extensive systems. Just as the topic of artificial neural
network modelling benefits from the interdisciplinary nature of its
supporting sciences, so a synthetically derived artificial neural network
system benefits from the range of problem solution approaches and features
inherent in the underlying network architectures. By creating and
maintaining a system of network architectures applicable to subfunctions
in the problem solution, subfunction solution by a particular system
component can benefit from the co-option of features normally found in
different components of the artificial neural network system.
\section{Networks Under Consideration: System Properties}
Each of the three network architectures used in the example problem
of melodic composition has its own set of features and drawbacks which
play an important role in system design.
\subsection{Hopfield-Tank Networks}
As mentioned earlier, the Hopfield-Tank architecture is generally
used in achieving a specific function. By this I mean that each
Hopfield-Tank network is designed for a particular purpose, and can provide no
functionality for other unrelated purposes. So the use of a Hopfield-Tank
network should generally be reserved for functions which do not change
over time. However, I will immediately cite a counter-example, due to its
elegance of integrative design.
An ingenious mechanism for extending the utility of the Hopfield-Tank
architecture is pursued by Tsutsumi (1989), where one back-propagation net
remaps Hopfield-Tank network inputs and another back-propagation network
remaps Hopfield-Tank network outputs. The problem given is one of
avoiding robotic arm deadlocking. The movement space is constrained, and
therefore applicable to Hopfield-Tank network solution. However, the
adaptation of the internal space representation to the real world arm
movement must be adaptive. The Hopfield-Tank network does not provide
learning rules, so the back-propagation networks provide the adaptation to
real world feedback. In this manner, some significant benefits accrue to
the use of the integrated system: since the Hopfield-Tank network is
static with respect to the encoded weights, it provides a good repository
for the robot's joint-arm space; since the back-propagation networks are
adaptive, the system can configure itself to respond to a changing
environment.
Since the Hopfield-Tank network was conceived of in the context of
implementation in silicon, the possibility for reducing a Hopfield-Tank
network instance to a hardware component can be important for real-world
applications. This step would bring the benefit often touted for
Hopfield-Tank networks: speed. That speed is rarely realized in practice, since
most work with the architecture involves software simulation. The simplicity of the Hopfield-Tank
architecture can be a strong point for system design and integration,
however, even in simulated systems.
A drawback which may eliminate the Hopfield-Tank network from
consideration for a particular function is that the weights for the
network must be derived from the constraints to be implemented and from
any data functions necessary for solution but not available in the input
to the network. This requires an understanding of the system to be
solved, which may not be available. Some architecture with learning rule
would then be more suitable for application.
The output of a Hopfield-Tank network must be deciphered from the
final pattern of activation of the net at equilibrium (cf. discussion in
Chapter 1). In the case of the Travelling Salesman Problem, position of
active nodes provides an encoding of placement of cities in the tour.
This information might be rendered more compact in another format,
depending upon the input type expected for further networks in the system.
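Deciphering a Hopfield-Tank equilibrium pattern, in the TSP case, amounts to reading the city/position activity matrix column by column. A sketch of such a decoder (the function name and the validity check are my own additions) follows:

```python
def decode_tour(activity):
    """Read a tour out of an equilibrium n x n activity matrix:
    row = city, column = tour position.  Take the most active city
    in each position and check the one-city-per-position constraint."""
    n = len(activity)
    tour = [max(range(n), key=lambda city: activity[city][pos])
            for pos in range(n)]
    valid = sorted(tour) == list(range(n))   # each city exactly once?
    return tour, valid
```

The resulting list of city indices is the kind of compact format that could be handed to further networks in the system, as suggested above.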
\subsection{Back-propagation Network System Considerations}
The back-propagation architecture provides a mapping from the input
vector to the output vector, and can be trained by example. The system
properties of back-propagation networks include stability of learning
given a fixed universe. By implication, the learning is not stable if
some perturbation in the problem set changes the mapping function. The
back-propagation network would then learn the new mapping over time, and
the old mapping would be lost.
Back-propagation networks have a moderately complex structure. The
properties gained from this increase in complexity over the Hopfield-Tank
architecture include the capacity for learning, the ability to extract
features from input data and generate internal representations for those
features, and the possibility of complex input to output transforms in
accordance with learned associations.
Back-propagation networks can accept binary or analog inputs, so the
inputs can represent conditional probabilities as well as more strictly
constrained values. The outputs are analog values which can be
interpreted as binary through the use of thresholding functions. This
allows a wide variety of input and output possibilities to achieve overall
system function. However, the choice of representation of inputs can be
critical for speedy and reliable training to occur. For example, use of
analog values should be avoided when there exists a natural partition of
the range of the input into distinct states. Our example of the melodic
note generator will illustrate this concern.
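When the input range does partition naturally into distinct states, a one-of-$n$ binary code is the usual alternative to a single analog value, since it keeps distinct states equally far apart. A sketch for the one-octave scale of the example problem (the particular note names are an illustrative assumption):

```python
def one_hot(note, scale=('C', 'D', 'E', 'F', 'G', 'A', 'B', 'c')):
    """Encode a note from a one-octave C scale as a one-of-n binary
    vector rather than a single analog value, so adjacent codes are
    not artificially 'closer' than distant ones."""
    return [1 if s == note else 0 for s in scale]
```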
\subsection{ART 1 System Properties}
Adaptive Resonance Theory 1 architectures provide unsupervised
learning of "clusters" or classifications of input vectors. An internal
representation of classification archetypes is generated. This
architecture ensures that new inputs will tend not to perturb
classifications of previous inputs. This compromise in the
stability-plasticity tradeoff (Carpenter and Grossberg 1987a, b) can be modified for
special purposes, as the melodic note generator program will demonstrate.
The internal operation of the ART 1 network can provide certain features
of especial interest to system design. Specifically, one by-product of
the classification algorithm is the detection of novelty, which will be
shown to have functional significance beyond that of the original design
of the ART 1 network.
Some properties of ART 1 require particular attention from the
designer. For example, there is a matching parameter (vigilance) which
controls how much deviation from a category prototype is acceptable.
There are no guidelines for the selection of the vigilance parameter, and
it is left to the designer to select and assign a "proper" value. Some
guidelines do exist for certain of the other learning parameters in the
ART 1 architecture, such as the learning coefficients for each of the
top-down and bottom-up memory equations (Carpenter and Grossberg 1987a).
The ART 1 architecture provides no standard output. The designer
must access internal values of the ART 1 network to provide useful
information to the remainder of the system which includes that network.
ART 1 is a highly complex architecture with many parameters to be
selected and set by the designer. Useful modifications to be made to the
architecture would include creating adaptive functions to replace some of
the static and arbitrary parameters of the network, such as the gain and
reset parameters (cf. Figure 5).
Table 1. System Properties Summary

Hopfield-Tank network properties:
\begin{itemize}
\item Data initialization process the only place for adaptive change (no ``learning'' at all)
\item Fast convergence (of the system, not necessarily of a simulation)
\item Inflexible structure (individual design necessary)
\item Simple structure
\end{itemize}

Back-propagation network properties:
\begin{itemize}
\item Stable learning given a fixed universe
\item Change to design implies relearning is necessary (costs of self-adaptation include forgetting what has been learned)
\item Adaptive weights changed as a consequence of training (supervised learning)
\item Medium complexity of structure
\item Input-to-output transform can be computationally complex
\item Output nodes may deliver digital, bipolar, or analog values
\end{itemize}

ART network properties:
\begin{itemize}
\item Stable self-organizing learning
\item Change in design produces unknown effects on existing ``knowledge''
\item Non-adaptive parameters for gain and reset are possible drawbacks; they should be replaced by adaptive functions
\end{itemize}
\section{System Considerations}
Considerations of interfacing outputs of one model to inputs of another:

Inputs:
\begin{itemize}
\item HTN: analog or discrete value, one per node
\item BPN: analog or discrete value, one per input node
\item ART 1: discrete value, one per input node
\item ART 2: analog value, one per input node
\end{itemize}

Outputs:
\begin{itemize}
\item HTN: discrete value, one per node
\item BPN: analog or discrete value, one per output node
\item ART 1: none specified
\item ART 2: none specified
\end{itemize}
\chapter{Example Problem}
In developing a simulation to test out processes of integration,
several factors have to be considered. The simulation should have enough
scope to provide good subsidiary roles for the component functions, it
should be small enough to be implementable in a reasonable period of time,
and it should produce output of a form which is readily apprehensible and
analyzable. The first criterion is easily fulfilled; the second and third
are rather more difficult to fulfill.
As a starting point for exploring the possibilities of an integrated
approach to artificial neural network modelling, the problem of producing
a melodic line in music composition was selected. The complexity of music
composition in general provides ample considerations for the application
of component networks. Unfortunately, the complexity of musical
composition offers no hope for the relatively simple design and
implementation of an artificial neural network system which addresses all
the salient points. Therefore, simplifying assumptions are made to ease
the requirements for the ANN system. The output, to be interpreted as a
sequence of musical notes, can present some problems with evaluation, due
to the qualitative context of musical evaluation in general. However, it
is possible to treat the note sequence as a set of concatenated symbols,
and apply some of the information theory concepts of Shannon (1948) to
conduct an analysis.
\section{Simplifying Assumptions}
The great complexity of musical composition in general is constrained
to yield a problem of suitable scope for the example integrated ANN
system. A limited scale covering one octave is assumed, in the key of C.
There are no accidentals, and there is no explicit timing of notes. A
single voice is assumed, and there are no harmonics generated. A limited
and fixed set of classical composition rules forms the basis for the
constraint and comparison of the system.
\section{Problem Approach}
The use of several ANN architectures in creating an integrated system
whose function fulfills the requirements of the musical composition
problem is assumed. A preliminary set of hypotheses as to how a composer
develops a line of melody was advanced for defining the subfunctions of
the composition system, or note generator. As Teuvo Kohonen (1989) notes
in discussing his own ANN system for musical composition,
\begin{quote}
It is not possible to survey here the development of ideas in computer
music. One of the traditional approaches, however, may be mentioned. It
is based on Markov processes. Each note (pitch, duration) is thereby
regarded as a stochastic state in a succession of states. The
probability $\Pr = \Pr(S_i \mid S_{i-1}, S_{i-2}, \ldots)$ for state
$S_i$, on the condition that the previous states are $S_{i-1}, S_{i-2},
\ldots$, is recorded from given examples. Usually three predecessor
states are enough. New music is generated by starting with a key
sequence to which, on the basis of $\Pr$ and, say, the three last
notes, the most likely successor state is appended. The augmented
sequence is used as a new key sequence, and so the process generates
melodic successions ad infinitum. Auxiliary operations or rules are
necessary to make typical forms (structures) of music out of pieces of
melodic passages.
\end{quote}
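The Markov scheme Kohonen describes can be sketched as follows. This is only a sketch: the function name is mine, and where Kohonen's description appends the single most likely successor, this version samples successors in proportion to their recorded counts, a common variant.

```python
import random

def markov_melody(examples, key, length=16, order=3, seed=0):
    """Record successor counts for each `order`-note context in the
    example melodies, then extend a key sequence by repeatedly
    sampling a successor for the last `order` notes."""
    counts = {}
    for seq in examples:
        for i in range(len(seq) - order):
            ctx = tuple(seq[i:i + order])
            nxt = seq[i + order]
            counts.setdefault(ctx, {})
            counts[ctx][nxt] = counts[ctx].get(nxt, 0) + 1
    rng = random.Random(seed)
    melody = list(key)
    while len(melody) < length:
        ctx = tuple(melody[-order:])
        succ = counts.get(ctx)
        if not succ:                 # unseen context: stop early
            break
        notes = list(succ)
        weights = [succ[n] for n in notes]
        melody.append(rng.choices(notes, weights=weights)[0])
    return melody
```

With a single example melody every context has a unique successor, so the key sequence is simply continued along the example until an unseen context is reached.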
Leaving the problem of constructing musical forms for possible later
consideration, the approach given by Kohonen matches well the procedure of
composition undertaken here. A candidate note and note sequence are
proposed, a critique according to classical rules is made concerning the
last note in the candidate sequence. The approval of the critic should
mostly cause the acceptance of the candidate note, but the rules should be
broken often enough that there is not an absolute conformity to the rules
postulated.
Now we must confront the problem of how to best combine models in a
unified structure that accomplishes the example function of melodic
composition. The solution involves identifying the strengths and
weaknesses of the component models, and which functions each may
accomplish. Then when the subproblem mapping has been accomplished, it
must be determined how to integrate the subfunction module outputs to
other modules to accomplish the overall task.
\section{Example Problem Network System}
The identifiable problem subfunctions are candidate note generation,
sequence critique, and novelty detection. The candidate note generator
subfunction should propose notes in general, but not complete, conformance
to probabilities of the next note filling the requirement of being part of
a classical sequence. The sequence critique subfunction should evaluate
the proposed next note in strict conformance to the set of classical
sequences provided. Since the evaluation given by the sequence critique
subfunction is an incomplete criterion for composition given expectations
of novelty, some means of detecting novel sequences must exist for the use
of the coordinating system. The coordinating system then has the
information necessary to "break the rules" when needed to avoid long,
boring sequences of strictly mechanical melody. This can also be seen as
a requirement for stable and continued operation, as there exist sequences
which have no classical next note possibilities.
\section{Candidate Note Generation}
The candidate note generator should produce plausible next notes
given a historical partial sequence. Since the rules for identification
of plausible next notes are fixed and known, there is no need for learning
in this stage of the network. The candidate note generator should also
occasionally provide notes which do not necessarily conform to the
expectation of a classical next note.
The Hopfield-Tank network (HTN) was selected as the candidate note
generator since it provides the above features. HTNs are noted for their
utility in constraint satisfaction and other optimization problems, which
fits the requirement that only a single next note is to be proposed at a
given time and that it should basically follow the probabilities of a
classical next note. A well-known attribute of the HTN architecture is
the presence of spurious local minima which do not represent "valid"
solution states. By purposefully exploiting this feature, we can convert
what normally constitutes a drawback into an asset for the subfunction.
The spurious local minima give rise to the occasional proposal of non-
classical next notes. The constant and known nature of expected sequences
determines the formation of the HTN weights, in conjunction with the known
constraints for proper HTN function. The use of "noisy" input values can
produce the semi-random distribution of possible notes that is needed for
variability.
Figure 6 shows the modified HTN architecture, called Bach, used in
our simulation. The rows represent note values and the columns represent
sequence placement. The constraints imposed on this network include the
need to present a single winning note in each place in the sequence
pattern, and the need to prevent endless repetition of the same candidate
note. By introducing relatively strong inhibitory links within rows and
columns, we can satisfy the constraint requirements. We achieve
preferential selection of classical next notes by reducing those
inhibitory links somewhat for connections which follow classical sequence
patterns.
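One plausible reading of this weight scheme can be sketched as follows; the grid size, weight magnitudes, and the set of classical note pairs are illustrative assumptions, and every pair of distinct units is taken to be mutually inhibitory by default:

```python
# Sketch of the Bach weight matrix: strong mutual inhibition enforces one
# winner per column and discourages repetition within a row, while pairs
# joined by a classical transition inhibit each other less.
import numpy as np

N_NOTES, N_POS = 8, 4           # rows = note values, columns = positions
INHIBIT, RELAXED = -2.0, -0.5   # strong vs. reduced inhibitory weights
classical = {(0, 2), (2, 4), (4, 5)}   # hypothetical classical note pairs

def idx(note, pos):
    """Flatten a (row, column) grid location to a single unit index."""
    return note * N_POS + pos

n = N_NOTES * N_POS
W = np.zeros((n, n))
for a in range(N_NOTES):
    for p in range(N_POS):
        for b in range(N_NOTES):
            for q in range(N_POS):
                if (a, p) == (b, q):
                    continue            # no self-connection
                w = INHIBIT             # mutual inhibition by default
                if (q == p + 1 and (a, b) in classical) or \
                   (p == q + 1 and (b, a) in classical):
                    w = RELAXED         # classical transitions inhibit less
                W[idx(a, p), idx(b, q)] = w
```

Settling from noisy initial activations under such a symmetric, mostly inhibitory weight matrix tends to leave one active note per column, preferentially along the weakly inhibited classical transitions.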
\section{Sequence Critique}
The sequence critic should provide an evaluation of the conformance
of the proposed next note to classical melodic sequence rules. If the
sequence rules are assumed to be provided by example, then some learning
function is required to allow the critic to become adept and reliable.
The subfunction receives a note sequence as input, and produces an output
which may be interpreted as a boolean statement of the classical nature of
the candidate note.
\begin{center}
[Figure 6]
\end{center}
The back-propagation network (BPN) was selected as the sequence
critic. The BPN, as discussed in Chapter 2, can learn any input/output
transformation given to it (within some constraints upon the availability
of sufficient hidden nodes to form a stable internal representation).
The form of the BPN used, called Salieri, was in the end a network
accepting a binary representation of the input sequence, using twenty
hidden units for internal representation, and producing an output value
which was interpreted as a yes-or-no output. So the net structure used
forty input units, twenty hidden units, and one output unit. This is a
medium-sized BPN.
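The 40-20-1 topology can be sketched as a forward pass; the random weights below are placeholders for the trained values, and the 0.5 threshold reading of the output is an assumption:

```python
# Shape-level sketch of Salieri: 40 binary inputs (the encoded sequence),
# 20 sigmoid hidden units, one yes-or-no output unit.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(42)
W_hid = rng.normal(0.0, 0.1, (20, 40))  # 40 inputs -> 20 hidden units
b_hid = np.zeros(20)
W_out = rng.normal(0.0, 0.1, (1, 20))   # 20 hidden units -> 1 output unit
b_out = np.zeros(1)

def critique(bits):
    """Forward pass; an output above 0.5 is read as 'classical'."""
    h = sigmoid(W_hid @ bits + b_hid)
    y = sigmoid(W_out @ h + b_out)
    return y[0] > 0.5

sequence = rng.integers(0, 2, 40).astype(float)  # a binary-coded sequence
verdict = critique(sequence)
```

Back-propagation training would adjust the two weight matrices against the classical example set until this yes-or-no judgment is reliable.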
\section{Novelty Detection}
The novelty detection function must have components to enable the
recollection of past sequences for discrimination of novel sequences.
However, the space requirements should not be overwhelming. A
classification system would provide good data compression while retaining
the nearly complete context information needed for novelty detection.
The ART 1 architecture was selected for the role of the novelty
detector, for it provides novelty detection as part of a classification
framework. The ART 1 architecture also has other features of interest to
further research on integrated and extensive systems.
The ART 1 network used here, called Beethoven, is modified from the
Carpenter-Grossberg architecture. Some of the explicit features of the
Carpenter-Grossberg architecture are handled as implicit assumptions in
the procedural simulation. The "2/3rds Rule," for example, is not
invoked, since whenever Beethoven is active the network already meets
the constraints of the rule. The separate rules for top-down and bottom-
up weight modifications are replaced by a single rule for both, as is done
in the ART 2 architecture (Carpenter and Grossberg 1987b).
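Beethoven's role can be sketched as an ART-1-style classifier whose failure to match any stored prototype at the vigilance level signals novelty; the vigilance value, the in-order category search standing in for winner-take-all choice, and the single intersection-based learning rule shown are illustrative:

```python
# Sketch of ART-1-style novelty detection: a binary input either matches an
# existing category prototype (and sharpens it) or founds a new category.
import numpy as np

class NoveltyDetector:
    def __init__(self, max_categories, vigilance=0.8):
        self.prototypes = []                 # learned category prototypes
        self.max_categories = max_categories
        self.vigilance = vigilance

    def present(self, pattern):
        """Return True if the pattern is novel (founds a new category)."""
        pattern = np.asarray(pattern, dtype=bool)
        for i, proto in enumerate(self.prototypes):
            overlap = np.logical_and(pattern, proto)
            # vigilance test: does the prototype cover enough of the input?
            if overlap.sum() >= self.vigilance * pattern.sum():
                self.prototypes[i] = overlap   # single learning rule:
                return False                   # prototype <- intersection
        if len(self.prototypes) < self.max_categories:
            self.prototypes.append(pattern)    # encode a new category
        return True                            # no match: novel

det = NoveltyDetector(max_categories=4)
print(det.present([1, 1, 0, 0, 1, 0, 0, 0]))  # first pattern: novel
print(det.present([1, 1, 0, 0, 1, 0, 0, 0]))  # repeat: matches its category
```

As category nodes fill, fewer inputs found new categories, matching the behavior described for Beethoven below.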
\section{Coordination}
In any integrative system, some means of coordinating subfunctions
becomes necessary. Some interpretation and processing of input or output
terms may be accomplished by the coordinating system. The ultimate
decision for whether or not a candidate note is accepted falls to the
coordinating system.
The coordinating system is called Lobes, since the features and
activities of the frontal lobes (Levine 1986, Levine and Prueitt 1989)
provided the inspiration for its operation. Lobes generates the context
management and state dependent actions which drive the integrated system
to completion of the intended function, melodic composition.
Lobes also contains an internal boredom function, which tends to
increase over time. There is a boredom threshold, which causes a change
in behavior in Lobes if it is exceeded.
\section{Operation}
In operation, the integrated ANN note generator uses its components
in a sequential manner. Lobes generates a call to Bach for a candidate
note, which when returned is sent to Salieri for critique. As a mechanism
for preventing wastage of Beethoven category nodes, the entire sequence
which Bach settles upon is used for further processing by Salieri and
Beethoven. Since at the beginning of composition there is no sequence
history, Bach's inputs are determined randomly over the entire sequence.
As notes are added to the sequence history, Bach inputs are determined
with the addition of some noise, except for the candidate note column
which receives only random noise as input.
Salieri receives the Bach sequence values and makes a determination
of classical conformance, which it passes back to Lobes. Lobes then sends
on the entire context, sequence plus critique, to Beethoven. At first,
Beethoven is virtually certain to encode new categories for each input it
receives. As time goes by, the likelihood that Beethoven will encode a
new category decreases, until all category nodes are utilized. So it is
more likely that the first few notes generated will conform to classical
sequence examples. Sometimes, however, a note which has no possible
classical successors will be generated early in the simulation. In this
case, it is likely that Beethoven's category encoding will proceed at a
much faster than average pace.
The indication of novelty generated by Beethoven is used by Lobes to
modify the system response to the internal state, which is determined by
Salieri's critique of the next note, Beethoven's detection of novelty, and
the boredom threshold in Lobes. If Lobes is not bored and Salieri
approves of the candidate note, the note is accepted. If, however, Lobes
has reached its boredom threshold and Salieri approves of a note, it will
reject the candidate note and request another one from Bach. Likewise, if
Salieri disapproves of the note but Lobes is bored, Salieri may be
overridden and Lobes may accept the note. Indications of novelty from
Beethoven can satisfy Lobes' drive to be "excited," or not-bored. This
will tend to make Lobes more conservatively classical as Beethoven
continues to detect novelty. Figure 7 shows the operation of the
coordinated system.
Figure 7
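The accept/reject behavior just described can be collected into a small decision function; this deterministic sketch simplifies the text's "may accept" and "may be overridden" into firm choices:

```python
def decide(bored, approved, novel):
    """Lobes' accept/reject choice, as a deterministic simplification.

    approved: Salieri's critique of the candidate note.
    novel:    Beethoven's novelty indication, which relieves boredom.
    """
    if novel:
        bored = False          # novelty satisfies the drive to be excited
    # A contented Lobes follows Salieri; a bored Lobes overrides it.
    return approved if not bored else not approved

# The four cases described in the text:
assert decide(bored=False, approved=True, novel=False) is True
assert decide(bored=True, approved=True, novel=False) is False
assert decide(bored=True, approved=False, novel=False) is True
assert decide(bored=True, approved=True, novel=True) is True
```

A rejected candidate triggers a fresh request to Bach, so the loop continues until some note is accepted.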
\chapter{Results}
\section{Performance}
The integrated ANN note generator system produces 152 notes in about
three hours when run on an 80386 PC compatible at a 16 MHz clock speed. It
takes approximately fifteen hours to produce the same number of notes on
an 8088 based machine with an 8087 numeric coprocessor.
\section{Example and Analysis of Output}
Appendix A contains sample output from a run of the note generator.
With a problem such as musical composition, assigning an objective measure
denoting the "worth" of the output is not possible. However, it is
possible to compare the output of our note generator system with random
sequences of note values. By use of a binomial performance measure, it is
possible to define how much the sample output differs from a random
sequence.
Random sequences have their own mystique and interest, but subjective
evaluations of random melodic forms by the untrained ear tend toward the
negative. The output of our note generator network was intended to
basically follow the guidelines of an example set of classical sequences.
The system included mechanisms for breaking out of a strict adherence to
the guide set of sequences. Thus, it would be expected that the output
have somewhat more resemblance to random sequences than the "rules" would
state. Since the classical rules of composition, while not our sole
criterion of fitness, are the only quantifiable part of our criteria, we
compare our output with random note sequences on the basis of these rules.
Table 2 summarizes the characteristics of output sequences generated
under several different conditions. A random note generator, a mostly
classical note generator, and our integrated ANN note generator each
produced outputs and were evaluated using the critic developed for
training Salieri. The "Successes" column indicates the number of times
the next note in the sequence could be considered to be classical.
The random note generator simply output a sequence of random numbers
for a set sequence length. The Classical Instructor sequence generator
operated by determining the available pool of possible classical next
notes, then randomly selecting one of those notes. In some cases, no next
note fit the criteria of being "classical," and a random note was
generated. The rest of the sequence generators were variations upon our
ANN note generator. The use of different back-propagation nets in the
critic role gave different results in the output. Trained and untrained
back-propagation nets with analog inputs were used. No significant
difference in output could be distinguished between the trained and
untrained versions, but the result was far closer to random performance
than classical. The inability of the Salieri net with analog inputs to
converge to a reliable and accurate performance measure explains the
similarity of result with the untrained version of the same type. On the
other hand, a Salieri composed of a trained back-propagation network using
a binary representation for inputs was able to converge to a stable and
fairly accurate state. Hence the performance of the trained Salieri with
binary inputs was considerably closer to classical than was the
performance of the random sequence generator. In two cases, a rule-based
critique system was substituted for the back-propagation network.
\begin{table}[ht]
\centering
\caption{Classical components of output sequences.}
\small
\begin{tabular}{lrrrrr}
\hline
Sequence generator & Sequence & Successes & \% & $Z$ versus & $Z$ versus \\
 & length & & & random & classical \\
\hline
Random & 10,000 & 890 & 0.089 & 0.00 & $-102.54$ \\
System w/ untrained Salieri (analog inputs) & 152 & 18 & 0.118 & 1.26 & $-23.84$ \\
System w/ trained Salieri (analog inputs) & 152 & 19 & 0.125 & 1.54 & $-23.64$ \\
System w/ trained Salieri (binary inputs) & 152 & 36 & 0.237 & 6.28 & $-20.07$ \\
System w/ rule-based critique (Salieri supervisor), Run 1 & 150 & 35 & 0.233 & 6.10 & $-20.06$ \\
System w/ rule-based critique (Salieri supervisor), Run 2 & 150 & 47 & 0.313 & 9.42 & $-17.50$ \\
Classical Instructor (rule-based critique w/out ANN system) & 8,150 & 6898 & 0.846 & 102.54 & 0.00 \\
\hline
\end{tabular}
\end{table}
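The $Z$ values in Table 2 are consistent with a two-proportion pooled $z$-test between each generator's success rate and that of the reference generator; the formula below is an inference from the tabulated values, not stated in the text:

```python
# Two-proportion pooled z-test, reproducing the Z columns of Table 2.
from math import sqrt

def z_two_proportions(k1, n1, k2, n2):
    """Pooled z-statistic for the difference between two success rates."""
    p1, p2 = k1 / n1, k2 / n2
    pooled = (k1 + k2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

# Trained Salieri, binary inputs (36/152) versus random (890/10,000):
print(round(z_two_proportions(36, 152, 890, 10000), 2))    # 6.28
# ...and versus the Classical Instructor (6898/8150):
print(round(z_two_proportions(36, 152, 6898, 8150), 2))    # -20.07
```

Both values agree with the corresponding Table 2 entries to two decimal places.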
\chapter{Discussion}
The integrated note generation system's performance suggests that it
met operational expectations. It produced notes according to a mixture of
somewhat conflicting criteria. This kind of operation in the midst of
uncertainty characterizes many human decision-making processes, and may be
assumed to play a role in human music composition as well. Our framework
allows for further experimentation with hypotheses concerning the
fundamental processes involved in higher-order constraint satisfaction
systems in an extensive environment (cf. Pao 1989).
The integrated approach has demonstrated several advantages and
disadvantages in development and operation. The disadvantages include the
complexity of handling several different network architectures at once,
which can contribute to programmer confusion (the downfall of programmer
omniscience). The necessity for dealing directly with "model mismatches,"
where one subnetwork may produce a different representation as output than
the next subnetwork requires as input, can cause system design time to be
protracted. Failure to recognize that a problem exists in the articulation
of networks can result in behavior that diverges wildly from expected norms.
On the other hand, any complex problem may present similar
difficulties regardless of the choice of solution approach. By using
subnetworks of known characteristics, one may be able to achieve a
solution with fewer uncertainties than a totally top-down approach would
yield. With a range of different capabilities available from subsidiary
networks, the likelihood of encountering an insoluble subproblem is
reduced. Synthetic integrated systems also lead to a combinatorial
explosion in the richness of possible system behaviors, which again is
reminiscent of the increasingly interesting behaviors noted as more
complex biological organisms are considered.
\section{Simulation Concerns}
The integrated ANN system from the example problem relied on several
procedural programming shortcuts. For example, the implementation of
"boredom" in Lobes was simply a counter which would be compared with a
threshold value. This does not have any basis in biological neural
systems, yet rather neatly simulates the behavior of a simple neuronal
model for the same task. As another example, the decision in the ART 1
network as to which category an input belongs to is currently made on the
basis of an arbitrary, winner-take-all rule. The same effect could
probably be achieved by means of on-center-off-surround neural
interactions that are more biologically realistic. Casting the system
functions into an entirely biological framework would yield a better, more
capable system for future work. However, the system stands as a first
effort toward this goal.
\section{Integration and Artificial Intelligence}
There have been several efforts toward integrative system design in
the top-down modelling school. In the HEARSAY system (Reddy et al.
1973), the blackboard model was developed. The blackboard model
presupposes the combination of a set of possible problem-solving systems
which have available the current system state. The system state is said
to reside upon "the blackboard" as a visualization of the process of
problem-solving. Each of the several applicable subsidiary problem-
solving systems attempts to derive an incremental step toward the global
solution, and competes with other such problem-solvers to control the
blackboard, and thus be able to change the system state. This is a
significant development, and one which is paralleled by concepts in
several artificial neural network models. In ART models, for example, the
concept of competition among various classification prototypes bears a
strong resemblance to the blackboard model. If the F1 layer's activation
is considered analogous to the system state, then each of the F2 nodes is
analogous to a problem-solver in the blackboard model.
There have been some efforts toward explicitly bringing together top-
down and bottom-up models in hybrid systems. Amano et al. (1989) present
an example of a phoneme recognition system combining an expert system with
a perceptron network. The expert system provides feature extraction from
speech data, which is input to the perceptron. The perceptron allows
decision-making under uncertainty, and its output is interpreted using fuzzy
logic rules. This avoids drawbacks associated with "template matching"
phoneme recognition schemes. Rabelo and Alptekin (1989) have integrated a
neural network with an expert system into an intelligent scheduling system
for manufacturing applications. Their system has the ability to learn
from experience and generate schedules within real-time constraints.
\section{Neurobiological Evidence of Multifunctionality}
Multi-state functionality of memory is supported by work of Nottebohm
(1989). Nottebohm's work involves the development and remembrance of
complex songs in songbirds. Through a series of studies, Nottebohm
demonstrates that hormonal changes can cause the forgetting of songs in a
bird's repertoire, or allow the formation and remembrance of new songs.
The hormonal changes in question center around testosterone, and typically
the mating season is the time when levels of testosterone allow the
formation of songs, presumably conferring a reproductive advantage for
male songbirds. Nottebohm demonstrates that the changes are solely
hormonally dependent by the simple strategy of artificially inducing song
creation by the application of testosterone to songbirds of both sexes at
various times of the year. The withholding of regular doses of
testosterone was also shown to cause the forgetting of known songs in the
same birds.
The implications of Nottebohm's work include support for state-
dependent memory. Since a specific memory function can be modulated by a
specific hormone, this implies that other memory systems may also have
recall dependent upon some hormone or other chemically mediated process.
Given that recall and learning can be so modulated, the necessity for
taking state dependencies (levels of hormones and other neuroactive
substances) into account for system function can be appreciated.
State-dependent memory is also supported by the work of many other
investigators, such as Bower (1981). In Bower's experiments, happy or sad
moods were induced in subjects by hypnotic suggestion, in order to
investigate the influence of emotions on memory and thinking. This
influence was profound; for example, people recalled a greater percentage
of those experiences which had taken place when they were in the same mood
as they were in during recall. Also, when the feeling tone of a story
agreed with a reader's hypnotically generated emotion, the reader found
the events and characters in the story more memorable and easier to
identify with.
State dependence of memory or other neural function can give rise to
quite useful modelling constructs. The ability to recast problem
solutions given functional states becomes biologically justifiable. The
logical power of conditional activation of entire subnetworks becomes
available through the modelling, however coarse, of these state
dependencies. The almost direct implementation of expert system analogues
which can be analyzed in a completely biological and ANN context is made
possible.
Implications include the higher order integration of functionally
changed sub-units over time. In humans, well known state dependencies
include the fight or flight response to norepinephrine production and the
diving response typical of mammals entering cold water. Without the
appropriate integrative control, neither the fight-or-flight response nor
the diving response would produce the desired, or selected, effect. The coordination
of separate functional neural "circuits" is clearly present; the exact
mechanism remains to be elucidated but there have been some promising
beginnings in neural network models. For example, in the neural model of
attention described by Grossberg (1975), there is competition between
nodes representing activations of different drives (hunger, thirst, sex,
etc.). The winner of this competition is not determined solely by which
drive is highest, but also by the availability of compatible cues in the
environment.
State-dependent memory implies the existence of functional changes
over time in cortical structures. Since we now have evidence of multi-
modal neural circuitry, at least some consideration should be given to the
implied necessary integration. The problem of understanding a system
which is dynamic not only in processing of input but also in functional
neural subsystems is both daunting and exhilarating. It is daunting,
because the complexities of modelling such systems exceed our current
capacity for ready assimilation and understanding of the underlying
concepts and mechanisms (which have not yet been elucidated), and
exhilarating because there appears to be no end to the variety of
expression of these systems in the natural world, and thus no apparent end
to the problem-solving challenge awaiting the researcher.
The function of speech processing in humans, for example, requires
the acquisition of external signals, the separation of those signals into
semantic and affective content, the recognition of mode in affective
content, the parsing of semantic content, and the integration of semantic
and affective content to determine meaning. This list of subfunctions is
not complete, which gives an indication of the extent to which integration
remains a regular and important activity in biological systems.
\section{The Triune Brain Theory}
Integrative theories of neural/cognitive function have a long
history. One of the best known is the triune brain theory of Paul MacLean
(1970). MacLean's research into behavioral studies of different brain
areas led him to propose that the human brain is divided into three
developmentally derived regions of separate function (see Levine 1990 for
discussion). The earliest, and presumably the most primitive region, is
termed the reptilian brain, and is composed of the brainstem and basal
ganglia. The reptilian brain is responsible, in this theory, for the
preprogrammed, innate behaviors. The paleomammalian brain, composed of
the limbic system, modifies the expressed pattern of reptilian brain
responses and is the source, in this theory, of the basic emotions (love,
hate, fear, arousal, etc.). The neomammalian brain, composed of the newer
parts of the cerebral cortex, provides further modifications of the
expression of the two older brain areas, and gives us our rational
capacity, seen in the ability for planning and verbal expression.
While MacLean's theory is oversimplified, it does provide a useful
set of distinctions between various cognitive subfunctions, all of which
are involved in complex behaviors. In fact, if one stretches the
imagination, one can draw analogies between the reptilian brain and our
Hopfield-Tank network, the paleomammalian brain and our ART 1 network, and
the neomammalian brain and our back-propagation network.
The integrated ANN note generator had its origins in a collaborative
effort to develop an extensive ANN system suitable for exploring multi-
modal cognitive hypotheses (Blackwood, Elsberry, and Leven 1988). That
project, in turn, was derived from insights provided by Leven (1987b).
Leven's SAM model was depicted in a manner which led to a discussion of
the possibility of replacing the components of MacLean's triune brain
model with current ANN architectures. The difficulty of describing a
suitably restricted problem for adequate application of the limited
current architectures was resolved with the simple melodic composition
problem outlined previously. Points of difference from the original
MacLean theory can be attributed to certain changes in model context (as
modified by Leven's separation of memory into three components:
motoric/instinctive [reptilian], sensory/affective [paleomammalian], and
associative/semantic [neomammalian] (Leven 1987a)) and to the mismatch
between architectures derived not for their similarity to these basic
cognitive forms, but to satisfy more immediate criteria such as being
implementable in current electronic devices. The desired system based on
our loose analogy to MacLean's theory has been demonstrated to be
operational and ready for incremental refinement.
We hope that in future work such analogies can be made more precise.
The development of both neural network theory and neuroscientific data
should allow the critical research to continue into these theories of
integrative cognitive function.
\appendix
\chapter{Sample Melody Output Of The Various Note Generator Programs}
\section{Integrated ANN Note Generator Sample Output}
[\texttt{b61t} output, pages 1--3]
\section{Random Note Generator Sample Output}
[random output, pages 1--3]
\section{Classical Note Generator Sample Output}
[classical output, pages 1--3]
\chapter{Program Source Listing: Integrated ANN Note Generator}
This appendix is represented in the repository by the legacy source and data files in \texttt{THES/}. The automated thesis conversion suppresses the full listing here to keep the document manageable.
\chapter{Program Source Listing: Back-Propagation Unit}
This appendix is represented in the repository by the legacy source and data files in \texttt{THES/}. The automated thesis conversion suppresses the full listing here to keep the document manageable.
\chapter{Program Source Listing: List Structures Unit}
This appendix is represented in the repository by the legacy source and data files in \texttt{THES/}. The automated thesis conversion suppresses the full listing here to keep the document manageable.
\chapter{Program Source Listing: Miscellaneous Procedures Unit}
This appendix is represented in the repository by the legacy source and data files in \texttt{THES/}. The automated thesis conversion suppresses the full listing here to keep the document manageable.
\chapter{Program Source Listing: Global Type And Variables Unit}
This appendix is represented in the repository by the legacy source and data files in \texttt{THES/}. The automated thesis conversion suppresses the full listing here to keep the document manageable.
\chapter{Program Source Listing: Classical Instructor Unit}
This appendix is represented in the repository by the legacy source and data files in \texttt{THES/}. The automated thesis conversion suppresses the full listing here to keep the document manageable.
\chapter{Program Source Listing: ANSI Screen Control Unit (\texttt{Ansi\_Z})}
This appendix is represented in the repository by the legacy source and data files in \texttt{THES/}. The automated thesis conversion suppresses the full listing here to keep the document manageable.
\chapter{Program Source Listing: Musical Sequence Evaluator Program}
This appendix is represented in the repository by the legacy source and data files in \texttt{THES/}. The automated thesis conversion suppresses the full listing here to keep the document manageable.
\chapter{Program Source Listing: Random Note Generation Program}
This appendix is represented in the repository by the legacy source and data files in \texttt{THES/}. The automated thesis conversion suppresses the full listing here to keep the document manageable.
\chapter{Program Source Listing: Note Sequence Playing Program}
This appendix is represented in the repository by the legacy source and data files in \texttt{THES/}. The automated thesis conversion suppresses the full listing here to keep the document manageable.
\chapter{Program Source Listing: Rule-Based Note Sequence Generation Program}
This appendix is represented in the repository by the legacy source and data files in \texttt{THES/}. The automated thesis conversion suppresses the full listing here to keep the document manageable.
\chapter{Program Source Listing: Offline Back-Propagation Network Training Program}
This appendix is represented in the repository by the legacy source and data files in \texttt{THES/}. The automated thesis conversion suppresses the full listing here to keep the document manageable.
\chapter{Data File Listing: Hopfield-Tank Network Weight Data File}
This appendix is represented in the repository by the legacy source and data files in \texttt{THES/}. The automated thesis conversion suppresses the full listing here to keep the document manageable.
\chapter{Data File Listing: Classical Sequences Data File}
This appendix is represented in the repository by the legacy source and data files in \texttt{THES/}. The automated thesis conversion suppresses the full listing here to keep the document manageable.
\chapter{Data File Listing: Back-Propagation Network Data File}
This appendix is represented in the repository by the legacy source and data files in \texttt{THES/}. The automated thesis conversion suppresses the full listing here to keep the document manageable.
\chapter{Program Source Listing: Translator From Program Note Files To Music Transcription System Song Format}
This appendix is represented in the repository by the legacy source and data files in \texttt{THES/}. The automated thesis conversion suppresses the full listing here to keep the document manageable.
\nocite{*}
\bibliographystyle{plain}
\bibliography{integration_and_hybridization_in_neural_network_modelling}
\end{document}