INTEGRATION AND HYBRIDIZATION IN NEURAL NETWORK MODELLING

The members of the Committee approve the master's thesis of Wesley Royce Elsberry.

Karan Briggs _____________________________________
Supervising Professor

Lynn Peterson _____________________________________

Daniel Levine _____________________________________

Copyright by Wesley Royce Elsberry 1989
All rights reserved

INTEGRATION AND HYBRIDIZATION IN NEURAL NETWORK MODELLING

by

WESLEY ROYCE ELSBERRY

Presented to the Faculty of the Graduate School of The University of Texas at Arlington in Partial Fulfillment of the Requirements for the Degree of

MASTER OF SCIENCE IN COMPUTER SCIENCE

THE UNIVERSITY OF TEXAS AT ARLINGTON

August 1989

ACKNOWLEDGEMENTS

I wish to thank the many people who have made my graduate program a rewarding, enlightening, and interesting experience. My especial thanks to Bob Weems, Ken Youngers, Vijay Raj, Steve Hufnagel, and Farhad Kamangar for their exemplary instruction. The advice and encouragement of Bill Buckles was critically important to this refugee from the life sciences. The members of my graduate committee, Karan Briggs, Lynn Peterson, and Daniel Levine, have provided me with technical resources, instruction, referrals, and general good advice in plenty. Sam Leven provided the example problem description, the classical sequence data, and much of his own time and expertise to aid me in developing both the simulation program and my understanding of the field. I am indebted to Dr. Levine for his great interest in teaching the principles of cognitive modelling to as wide an audience as possible, for without his express encouragement I would not have become acquainted with the field, and also for his considerable personal assistance in developing this thesis. The enthusiasm of Harold Szu helped to motivate me to undertake a deeper inquiry into neural network modelling.
Finally, without the continual support of my spouse, Diane Blackwood, this thesis and the classroom work which formed the basis for it would not have been possible.

July 28, 1989

ABSTRACT

INTEGRATION AND HYBRIDIZATION IN NEURAL NETWORK MODELLING

Publication No. _______

Wesley Royce Elsberry, M.S.
The University of Texas at Arlington, 1989

Supervising Professor: Karan Briggs

Artificial neural network models derived from different biological behaviors or functions can be used in an integrative fashion to create an extensive problem-solving environment. An example problem of limited melodic composition is approached by the use of Hopfield-Tank, back-propagation, and Adaptive Resonance Theory networks serving as plausible next-note generator, musical sequence critic, and novelty detector, respectively. Biological bases for integrative function are discussed, and the experimental role of synthetic systems such as the example integrated network is explored.

TABLE OF CONTENTS

ACKNOWLEDGEMENTS . . . iv
ABSTRACT . . . v
LIST OF ILLUSTRATIONS . . . viii
LIST OF TABLES . . . ix

Chapter
1. DEFINING THE ROLE AND NATURE OF ARTIFICIAL NEURAL NETWORK MODELLING . . . 1
2. INTEGRATION IN NEURAL NETWORK MODELLING . . . 33
3. EXAMPLE PROBLEM . . . 43
4. RESULTS . . . 54
5. DISCUSSION . . . 58

Appendix
1. SAMPLE MELODY OUTPUT OF THE VARIOUS NOTE GENERATOR PROGRAMS . . . 66
2. PROGRAM SOURCE LISTING: INTEGRATED ANN NOTE GENERATOR . . . 79
3. PROGRAM SOURCE LISTING: BACK-PROPAGATION UNIT . . . 115
4. PROGRAM SOURCE LISTING: LIST STRUCTURES UNIT . . . 153
5. PROGRAM SOURCE LISTING: MISCELLANEOUS PROCEDURES UNIT . . . 160
6. PROGRAM SOURCE LISTING: GLOBAL TYPE AND VARIABLES UNIT . . . 172
7. PROGRAM SOURCE LISTING: CLASSICAL INSTRUCTOR UNIT . . . 177
8. PROGRAM SOURCE LISTING: ANSI SCREEN CONTROL UNIT . . . 181
9. PROGRAM SOURCE LISTING: MUSICAL SEQUENCE EVALUATOR PROGRAM . . . 190
10. PROGRAM SOURCE LISTING: RANDOM NOTE GENERATION PROGRAM . . . 193
11. PROGRAM SOURCE LISTING: NOTE SEQUENCE PLAYING PROGRAM . . . 195
12. PROGRAM SOURCE LISTING: RULE-BASED NOTE SEQUENCE GENERATION PROGRAM . . . 199
13. PROGRAM SOURCE LISTING: OFFLINE BACK-PROPAGATION NETWORK TRAINING PROGRAM . . . 202
14. DATA FILE LISTING: HOPFIELD-TANK NETWORK WEIGHT DATA FILE . . . 208
15. DATA FILE LISTING: CLASSICAL SEQUENCES DATA FILE . . . 214
16. DATA FILE LISTING: BACK-PROPAGATION NETWORK DATA FILE . . . 216
17. PROGRAM SOURCE LISTING: TRANSLATOR FROM PROGRAM NOTE FILES TO MUSIC TRANSCRIPTION SYSTEM SONG FORMAT . . . 218

BIBLIOGRAPHY . . . 222

LIST OF ILLUSTRATIONS

Figure
1. McCulloch-Pitts Network . . . 8
2. Standard Hopfield-Tank Network . . . 20
3. Back-propagation Network . . . 25
4. ART 1 Block Diagram . . . 29
5. ART 1 Detailed Diagram . . . 32
6. Modified Hopfield-Tank Network . . . 48
7. Note Generator Block Diagram . . . 53

LIST OF TABLES

Table
1. System Properties Summary . . . 42
2. Classical Components of Output Sequences . . . 57
CHAPTER 1

DEFINING THE ROLE AND NATURE OF ARTIFICIAL NEURAL NETWORK MODELLING

Cultural Bias and Its Application to Cognitive Inquiry

Plato believed that there existed a pure world of ideas, whose perfect forms were poorly mimicked by copies in the reality apparent to our senses. This concept of the dominant, lofty nature of ideas and the thoughts that manipulated those ideas would do more to inhibit inquiry into material processes than virtually any other single cause in the succeeding two millennia. The legacy of this philosophical outlook still permeates our culture, coloring the basic assumptions and viewpoints of researchers even after extended scientific training. The firmness of general belief in the separate existence of ideas contributed to the Lamarckian hypothesis of the inheritance of acquired characters. While there were critics of Jean Baptiste Lamarck's work from its introduction, these were by no means an overwhelming group by numbers. The rediscovery of Gregor Mendel's work in 1900 provided clear and convincing evidence against Lamarck's hypothesis, yet even Watson and Crick's elucidation and characterization of DNA failed to dispel the last vestiges of belief in the inheritance of acquired characters. The ready and continued acceptance of the Lamarckian hypothesis into the mid-twentieth century gives an indication of the continued influence of Plato's world of ideas, even when confronted with contradicting evidence and convincing counterarguments. Biology, however, was by no means the only scientific discipline to be touched by the cultural bias of Platonic ideals. Physics experienced great upheaval and internal dissent as the deterministic view of Newton and Laplace gave way to the quirkiness and Heisenbergian uncertainty of quantum mechanics (Hawking 1988). In thinking about thinking, researchers have tended to denigrate approaches dependent upon investigation of physical processes.
The high regard given ideas and thought processes has nearly always instilled in Western observers a sense of revulsion at even considering that such sacred items could be solely the product of soft, squishy brain tissues and their component parts, neurons. This could be considered somewhat akin to Roquentin's nausea upon consideration of radical contingency (Ferguson 1987). Our culture does not encourage modes of inquiry that tend to displace the traditional view of the mind as the only known organ with which the perfect world of ideas can be perceived. It especially does not encourage the denial of the Platonic ideal world, yet it has been quite some time since any reputable scientist has explicitly advanced support for that concept. Cognitive science, however, must deal explicitly and forthrightly with the subject of ideas and thoughts. In a field where cultural bias has, can, and will directly affect the discussion of the subject at hand, it pays to recognize the existence and probable magnitude of that bias. I have briefly delineated the persistence and magnitude of our Platonic bias in biology and physics; I will assert that the bias is greater in cognitive science, where it may be more directly challenged.

Artificial Intelligence: Definitions

Artificial intelligence describes at once a sub-discipline of cognitive science and the goal of that sub-discipline. The goal is the description and production of an artificial system or systems that can be described as intelligent in operation, function, or effect, in both global and local contexts. As with most complex fields, there are several ways in which to approach direct inquiry into the topic. In what has been termed the "top-down modelling school" of artificial intelligence, the emphasis has been upon systems of formal logic and other explicit symbol manipulation techniques. As may have been apparent from the previous description, there is a "bottom-up modelling school" of artificial intelligence.
This school of inquiry seeks to examine the basic processes that are known to produce intelligent action in humans, those basic processes of neural function and coordination. By extension, the bottom-up modelling school is concerned with neural function in general. This is considered necessary because of the great complexity of biological neural systems. While much research has been done and continues to be done, there exists no basic understanding of the detailed structure and operation of biological neural systems. There is a wealth of data, but a paucity of organizing principles. The bottom-up modelling school attempts to provide possible organizing principles, testing these through a process of modelling and incremental design improvements. This area of research has become known as artificial neural network modelling. The definitions given above differ somewhat from what may be considered standard in the artificial intelligence community. In their Turing Award speech, Newell and Simon state:

The notion of physical symbol system had taken essentially its present form by the middle of the 1950's, and one can date from that time the growth of artificial intelligence as a coherent subfield of computer science. The twenty years of work since then has seen a continuous accumulation of empirical evidence of two main varieties. The first addresses itself to the sufficiency of physical symbol systems for producing intelligence, attempting to construct and test specific systems that have such a capability. The second kind of evidence addresses itself to the necessity of having a physical symbol system wherever intelligence is exhibited. It starts with Man, the intelligent system best known to us, and attempts to discover whether his cognitive activity can be explained as the working of a physical symbol system. . . . The first is generally called artificial intelligence, the second, research in cognitive psychology.
(Newell and Simon 1976)

This formulation of definitions seems at once too narrow and not properly descriptive. It is too narrow in that it limits severely the range of activities which may be considered artificial intelligence research. While Newell and Simon talk of physical symbol systems, there is the strong implication that they refer only to computational methods of explicit symbol manipulation, as in the use of languages such as LISP and Prolog. The definition of the second part is too narrow, in that it postulates no overlap between the fields of artificial intelligence and cognitive psychology, and also provides no linkage for concepts of operation derived from empirical studies of biological intelligent systems to be incorporated into an artificial intelligence framework. As stated later by Newell and Simon,

The symbol system hypothesis implies that the symbolic behavior of man arises because he has the characteristics of a physical symbol system. Hence, the results of efforts to model human behavior with symbol systems become an important part of the evidence for the hypothesis, and research in artificial intelligence goes on in close collaboration with research in information processing psychology, as it is usually called.

This seems to indicate that the previously given definitions were not truly descriptive of the relationship between artificial intelligence and other components of cognitive science. By including artificial intelligence as a sub-discipline of cognitive science, it becomes clear that artificial intelligence is not disjoint from research into considerations of what constitutes intelligence. An alternative formulation for defining artificial intelligence in more general terms is given by Charniak and McDermott (1985), where they state that artificial intelligence "is the study of mental faculties through the use of computational models."
This form of definition allows for the top-down and bottom-up approaches to artificial intelligence to be given co-equal rank as complementary research disciplines.

Artificial Neural Network Modelling

Artificial neural network modelling, the bottom-up school of artificial intelligence, derives from the use of biological nervous systems as suitable exemplars for an approach to coordination, cognition, and control in artificial systems. This approach, in one form or another, has been with us for a long time. In the 1940's, McCulloch and Pitts demonstrated the possibility of casting boolean logic systems into networks of thresholded logic units. This certainly fits the Newell-Simon definition of a physical symbol system. McCulloch and Pitts' seminal paper and subsequent work introduced a new concept for consideration: rather than driving the understanding of biological cognitive processes through increasing sophistication of technology, perhaps technology can be made more capable and sophisticated by elucidating mechanisms of function from biological processes involved in cognition (Levine 1983, 1990). This differs qualitatively from the viewpoint common to many observers of human intelligence. There has been a tendency for people to compare the operation of thought processes with the current leading edge of technology. Freud compared the mind to a steam engine, Twain would speak of "mill-works" in relation to speech production, and various persons in this century have blithely and confidently settled upon the computer as a fully analogous system. A circular system can be noted here, though, as the image of the mind as being like some mechanism helped contribute to such endeavors as Pascal's calculator, Babbage's Analytical Engine, and the Hollerith punched card controlled census tabulator. These efforts, in turn, inspired the modern computer. (The influence of the two World Wars upon the motivation for the development of the computer must also be acknowledged.)
However, as the Metroplex Study Group on Computational Neuroscience (1988) points out,

Computational study of brain structures began in the 1940's when digital computers were emerging. The first computer architectures were developed in a collaboration between the mathematicians John von Neumann and Norbert Wiener working closely with physiologists like Warren McCulloch. Early computer scientists were captivated by the analogy between the all-or-none action potentials of the neuron and binary switches representing bits in computers. Models of biological computation thus had a seminal role in the design of modern computer architecture.

Thus, while comparisons of the workings of the mind to technology helped to motivate research into further technological advances, the insight into the successful design for a significant new technology, computers, came instead from paying attention to the actual mechanisms of brain function. The computer has come to be pressed into service as a "new" metaphor for thought processes in biological systems. While the use of technological metaphors for cognition is persistent, having produced quaint turns of speech and aphorisms, it has not been significantly conducive to a direct understanding of the actual cognitive processes being described. The turnabout analogy produced by McCulloch and Pitts has provided far more benefits to cognitive science and computing through its establishment of neural modelling. Binary switching technology failed to provide significant insight into biological neural activity and function. Leon Harmon (1970) noted that "[a]nother kind of difficulty in coming to an understanding of nervous systems is that we may be conditioned into thinking about them in ways that are more constrained than we like to admit or are sufficiently aware of." The cultural identity which we share through environment can be a handicap in research into the nature and mechanisms of our intelligence.
The model networks of McCulloch and Pitts, an example of which is shown in Figure 1, were seen to provide useful mechanisms and insights for computing machinery. These networks consist of threshold logic units receiving excitatory and inhibitory inputs of integer value. A simple sum of these values is compared with a threshold level for the unit, and the unit is considered to "fire" if its inputs equal or exceed the threshold value. A unit's activity is then carried forward to further connected units in the system. McCulloch and Pitts (1943) proved that any boolean function could be created by the appropriate combination of such units.

[Figure 1. McCulloch-Pitts Network]

The McCulloch-Pitts formalism is quite similar to principles of design using TTL logic, as diagrams can attest. It should be noted that McCulloch and Pitts described their formalism in 1943, some sixteen years before the invention of the integrated circuit. While significant problems reduce the overall utility of McCulloch-Pitts networks, modified forms continue to be advanced (Szu 1989). Given the basis of a neural computational framework, additional features extracted from biological research were gradually added to proposed models. The idea that inputs to neurons had graded values modified by the efficiency of synaptic connections led to several important advances. Hebb proposed a synaptic modification rule to allow a form of learning in randomly connected networks (Hebb 1949, Levine 1983). His formulation was that the efficiency of a synapse which connects two neurons increases if both of the neurons have high activities at the same time. This relatively simple rule has a number of drawbacks, which have since been extensively cataloged and discussed. However, it cannot be denied that Hebb's Law, as it has been called, was an important advance for modelling. The idea of stating an organizational principle for network design also represented an important advance for artificial neural network modelling.
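The McCulloch-Pitts threshold unit and Hebb's rule described above are both compact enough to sketch directly. The following fragment is illustrative only; the thresholds, weights, and learning rate are arbitrary choices made for this sketch, not values from the original papers.

```python
# A McCulloch-Pitts threshold logic unit: the unit "fires" (outputs 1)
# when the sum of its weighted inputs equals or exceeds its threshold.
# Inhibition is modelled here as a negative weight.

def mp_unit(inputs, weights, threshold):
    total = sum(x * w for x, w in zip(inputs, weights))
    return 1 if total >= threshold else 0

def AND(x, y):
    return mp_unit([x, y], [1, 1], threshold=2)   # both inputs required

def OR(x, y):
    return mp_unit([x, y], [1, 1], threshold=1)   # either input suffices

def NOT(x):
    # Inhibitory input weighed against an always-on excitatory input.
    return mp_unit([1, x], [1, -1], threshold=1)

# Any boolean function can be built by composing such units; e.g. XOR:
def XOR(x, y):
    return AND(OR(x, y), NOT(AND(x, y)))

# Hebb's rule: the efficiency (weight) of the synapse between two units
# increases when both are highly active at the same time.
def hebb(w, pre, post, rate=0.1):
    return w + rate * pre * post

w = 0.0
for _ in range(5):
    w = hebb(w, pre=1.0, post=1.0)   # repeated co-activation strengthens w
```

The XOR composition echoes McCulloch and Pitts' result that arbitrary boolean functions follow from combinations of threshold units, while the last lines exhibit one of the extensively cataloged drawbacks of the bare Hebb rule: the weight can only grow, never decay.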
Rosenblatt's Perceptron architectures demonstrate this well. Rosenblatt postulated several network models, among which was a three-layer network. The layer which received external inputs was the sensory layer, and the layer which gave an output (either as a raw signal or interpreted as a motor output) was termed the response layer. An associative layer of neurons provided the articulation between sensory and response layers (Levine 1983). This general method of organization helped move artificial neural network modelling away from specifically designed networks and randomly organized network models. The drawbacks associated with the special design of network models remain a concern for ANN modellers today. Such systems are fragile or brittle, meaning that a fault in any part of the network could cause a general failure of function. Also, if an error in design occurred, the resulting network would have no method of overcoming the design fault. This pitfall is named for the assumption necessary for correct function to be achieved: "programmer omniscience." Since programmer omniscience cannot be guaranteed, systems predicated on this principle are inherently unreliable (Hecht-Nielsen 1986). Designing with generality as an important principle of function results in artificial neural networks that are described as self-organizing. A self-organizing network will conform its function to the particular problem or context through a process of adaptation or learning.

Current Models Narrowly Focussed

The tendency in developing an artificial neural network model is to constrain its function to a well-defined domain. This design principle enables better control over evaluation of the functionality of the design. Data from neurological or behavioral studies dealing with the problem domain can then often be directly applied to either training or evaluating the model.
There exist many general functions that are commonly encountered in biological neural networks which can be considered to have important implications for computational study. Among these are included associative memory, classification, pattern recognition, and function mapping. ANN models proposed to implement these functions include Bidirectional Associative Memory, Adaptive Resonance Theory (classification), Brain-State-in-a-Box (classification), Neocognitron (invariant pattern recognition), and Back-Propagation (function mapping) (Simpson 1988). Unfortunately, the underlying reason for existence of each model is overlooked in comparing different models. The human tendency to wish to attribute a single scalar quantity denoting how "good" an ANN model is will often cause people to overlook the fact that certain networks should not be compared as being equivalent. Lippmann provides a good basic overview of six different ANN models, yet falls prey to this pitfall. Each of the models is interpreted as a classification network, yet only the Hamming and ART networks are designed to function as classifiers (Lippmann 1987).

Multi-architecture Integrative Systems

The use of multiple complementary architectures in designing application systems has not often been explored. While this approach to system design is commonly touted in ANN simulation aid advertisements, it is less frequently featured in the literature. In an advertisement for the ANZA Neurocomputer, HNC Corporation states, "In these networks the interconnect geometry is already determined and the form of the transfer equations is fixed. However, the number of processing elements (neurons), their initial state and weight values, learning rates, and time constants are all user selectable, thus allowing one to customize a particular network paradigm (or combination of paradigms) to suit a particular application."
There are some reasons for the relative lack of multi-paradigm (multi-architecture) research results in publication. The design principles stated earlier still hold, that it is easiest to design an architecture with a narrow definition of function. There is the tendency for such architectures to incorporate simplifying assumptions which aid in simulation or real-world implementation ("casting in silicon," as the phrase goes). The ease of verifying correct operation is increased for architectures which have a narrowly defined application, as relatively clean data or simple theoretical findings are likely to be available against which the architecture may be evaluated. The relative complexity of modelling for even constrained contexts provides another reason for concentration upon single paradigm systems. The number of parameters which can affect a model's performance ranges from zero for linear systems, such as Widrow's ADALINE (Widrow 1987), to tens of scalar parameters, as may be encountered in Carpenter and Grossberg's Adaptive Resonance Theory architecture (Carpenter and Grossberg 1987a, 1987b; Simpson 1988). When dealing with nonlinear dynamical systems for which no closed-form solution exists, slight changes of system parameters can result in large scale changes in behavior. Carpenter and Grossberg point out relative constraints upon the parameters used in the ART 1 architecture, giving guidelines for values which should yield stable operation of the network. Finding suitable parameters for operation of a particular architecture can be a frustrating and time-intensive experience. As an example, an architecture called the "on-center, off-surround" network (OCOS) is based upon a relatively simple equation that appears in slightly modified form in many competitive networks. One form of this equation (Grossberg 1973) is:

dx_i/dt = -A x_i + (B - x_i)(I_i + f(x_i)) - (x_i + C) SUM[k=1..n, k<>i] (J_k + f(x_k))    (Eq. 1)

The first term represents decay of activity over time; the second term represents increase in activity due to excitatory input values, I_i, and recurrent self-excitation; the third term represents decrease in activity due to inhibitory input values, J_k, and competition from other nodes in the network. For each term, there is an associated parameter. A hidden parameter of the architecture is a factor by which to multiply the result of the difference equation. This is usually set to be much less than one, especially when using the simple forward Euler method of system updating. However, there is a performance trade-off involved with setting this factor too small: the system will converge to a stable result, but will take a long time to converge. If the factor is set to a large value the network will "blow up," entering a state of wild fluctuation in activation values for the nodes in the network. Such a system cannot converge. The solution is to find a value for this factor which will ensure convergence without an inordinate amount of time taken in reaching convergence. The explicit parameters given in the equation must be matched to the size of the other parameters to maintain set relationships. In order to find parameter values, I have used a network simulation which runs through permutations of sets of discrete parameter values. By examining the resulting output, which indicated whether the network run achieved convergence and the number of time steps necessary to achieve convergence, I was able to settle upon a suitable choice of parameters for regular use and was able to decipher a coarse set of relationships between the parameters. In consideration of these complexities in modelling and simulation, synthesis of complex systems from simpler subunits has some significant obstacles to overcome. There are more degrees of freedom for system operation, leading to complexities in articulation and coordination of subunits.
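The step-size trade-off described above for Eq. 1 can be made concrete with a short forward-Euler simulation. The parameter values and the signal function f below are illustrative choices for this sketch, not the values used in the thesis simulations.

```python
# Forward-Euler sketch of the on-center off-surround shunting equation
# (Eq. 1). A small step factor dt converges to a stable activity
# pattern; an overly large dt makes the activations fluctuate wildly
# and the network never settles ("blows up").

def f(x):
    return x * x   # an illustrative faster-than-linear signal function

def ocos_step(x, I, J, A, B, C, dt):
    sig = [f(v) for v in x]
    new = []
    for i, xi in enumerate(x):
        inhib = sum(J[k] + sig[k] for k in range(len(x)) if k != i)
        dx = -A * xi + (B - xi) * (I[i] + sig[i]) - (xi + C) * inhib
        new.append(xi + dt * dx)   # dt is the "hidden" step-size factor
    return new

def run(dt, steps=2000, tol=1e-6):
    x = [0.2, 0.1, 0.1]
    I = [1.0, 0.5, 0.5]          # node 0 receives the strongest input
    J = [0.0, 0.0, 0.0]
    A, B, C = 1.0, 1.0, 0.0      # illustrative parameter choices
    for n in range(steps):
        nxt = ocos_step(x, I, J, A, B, C, dt)
        if max(abs(a - b) for a, b in zip(nxt, x)) < tol:
            return nxt, n        # converged after n steps
        x = nxt
    return x, steps              # failed to converge within the limit

x_small, steps_small = run(dt=0.05)   # settles to a bounded pattern
x_large, steps_large = run(dt=2.0)    # never settles
```

With the small step the activities remain shunted into [0, B] and the most strongly driven node wins the competition; with the large step the updates overshoot and the run exhausts its step budget without converging.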
Such synthesis may increase the necessary number of system parameters, increasing complexity in a combinatorial fashion. Usually, there will either be less data available for evaluation of the complete system behavior, or the data that is available will be uncertain. The general principles to apply to a synthetic endeavor in artificial neural network modelling remain to be elucidated. In addition, synthesis has traditionally been given short shrift in Western culture (Paris 1989). The application of synthesis in philosophy has largely been confined to the works of Wittgenstein and Marx. Marx, of course, developed a philosophy whose application was inimical to most Western economic structures. This has caused both his work and his approach to be considered with contempt. On the positive side, however, synthesis can lead to significant insights as different assumptions and features can be applied to functional design. The process of integration and hybridization in artificial neural network modelling may be expected to lead to new insights into structure and function of benefit to artificially intelligent systems, given the past history of benefits derived from consideration of concepts borrowed from disparate disciplines. In this case, the complexity of knowledge or topic of a contributing discipline can provide a measure of the possible beneficial overlap: as complexity increases, the set of concepts which may profitably be applied to the new context also increases. As this rule would predict, biological nervous systems provide a very large pool of concepts ready for reconstruction in the artificial intelligence framework.

Biological Neural System Complexity

The complexity which drives that expectation can be readily appreciated, as in the human nervous system the number of neurons is in the billions, and each neuron will have connections to between tens and tens of thousands of other neurons.
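The scale just described can be turned into a rough connectivity estimate. The figures below are order-of-magnitude assumptions taken from the counts cited in the text, not measured values.

```python
# Rough connectivity arithmetic for the human nervous system, using the
# order-of-magnitude figures cited above: neurons numbering in the
# billions, each connecting to between tens and tens of thousands of
# other neurons. Illustrative only.

neurons = 10**10                       # "in the billions"
fan_out_low, fan_out_high = 10, 10**4  # "tens" to "tens of thousands"

connections_low = neurons * fan_out_low    # on the order of 1e11
connections_high = neurons * fan_out_high  # on the order of 1e14
```

Even the low estimate yields on the order of a hundred billion connections, before any account is taken of the state of each neuron.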
This physio-spatial complexity pales when compared to the complexity that can result when one attempts to define a state for the neuron. The state can be considered to be a combination of electrical activity, ionic balance, hormonal levels, input activities, and many other factors. Since for several of these attributes there exist separate contributing factors, the number of components contributing to neuron state can easily exceed twenty, and there may be hundreds of such components. Add to this that these components often have analog values, separate time scales of action, and capacity for permanent change in the neuron's response, and the possible state function for a single neuron displays a suitably bewildering complexity. Since this level of complexity is not conducive to direct examination, the process of problem decomposition and solution is applied. The implication for artificial neural network modelling is that certain gross features are modelled. Typically, a model assumes that the major interesting component of neural function is the electrical activity of the contributing neurons or nodes. A further assumption in general currency is that the electrical activity of an ANN may be modeled with the elements or nodes responding in the manner expected of neural populations rather than individually thresholded neurons (Levine 1983). Some models take into account generalized neurotransmitter effects, with Grossberg's gated dipole model providing a good example (Grossberg 1972). These simplifying assumptions still leave a high level of complexity to be dealt with in developing ANN designs and applications.

Relevance of Biological Models

Biology provides the exemplars for self-organizing adaptive systems which have proven useful in artificial neural network modelling thus far.
While strict adherence to biological accuracy in modelling may well be counter-productive to advances in modelling techniques, rejecting biological principles may then lead to difficulties in later integrative work. The biological framework of neurophysiological function provides a basic structure which makes for a common ground of interaction in models. For example, the range of activation and output of neurons is rather strictly circumscribed, while the range of activation of an analogous unit in neural network models is limited only by the capability of the processor in specifying a floating point exponent. This would make interfacing two models using different ranges of activation less straightforward. The inherent complexity of biological systems can support two different arguments concerning the future of modelling. As Hawking (1988) says, describing a complex system all at once is terrifically difficult. It is much simpler to model several different components of the overall system. Bringing these partial models together to form a complete model can become problematic, as in Hawking's example of the search for the Grand Unified Theory in physics. In one sense, then, the effort to integrate models of limited function may be premature. However, it may be that along with the incremental advances made in modelling small subunits of biological neural function, we should also attempt incremental integration of well-understood low level models. This would help prevent the kind of situation which exists in physics, with two or more highly disjoint models prevailing and no unification or integration yet in sight. Since biological systems are inherently more complex than the physical systems upon which they are based, it becomes important to keep an eye on the eventual need for integration and resolution of subsidiary models.
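One small, practical instance of the interfacing concern raised above: when two models bound their activations differently, an affine rescaling can mediate between them. The ranges used below are illustrative assumptions, not ranges prescribed by any particular architecture.

```python
# Affine mapping of activation values from one model's range to
# another's, a minimal adapter for interfacing two network modules
# that assume different activation bounds.

def rescale(values, src, dst):
    """Map each value from range src = (lo, hi) to range dst = (lo, hi)."""
    (a, b), (c, d) = src, dst
    return [c + (v - a) * (d - c) / (b - a) for v in values]

# Bipolar activations (here assumed in [-1, 1]) mapped into the unit
# interval for a module expecting [0, 1] inputs, and mapped back.
unipolar = rescale([-1.0, 0.0, 0.5], src=(-1.0, 1.0), dst=(0.0, 1.0))
bipolar = rescale(unipolar, src=(0.0, 1.0), dst=(-1.0, 1.0))
```

The round trip is exact for these values, which is the property an interfacing layer between two sub-networks needs.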
Models which utilize features from two or more extant ANN architectures, such as Hecht-Nielsen's Counterpropagation Network (Hecht-Nielsen 1987), demonstrate the useful qualities which synthesis of models can bring to functionality. The counterpropagation network is derived from Kohonen's self-organizing map model and Grossberg's competitive learning networks. Hecht-Nielsen notes that this network is designed for function mapping and analyzes its performance as compared to the back-propagation architecture. While the general functionality of counterpropagation networks (CPN) remains lower than that of back-propagation networks, there exists a subclass of mapping functions for which the CPN will train faster, and there also exists a closed-form solution for the error of the CPN. As Hecht-Nielsen notes, "Finally, CPN illustrates a key point about neurocomputing system design. Namely, that many of the existing network paradigms can be viewed as building block components that can be assembled into new configurations that offer different information processing capabilities."

Artificial Neural Network Models: Three Architectures

For the purposes of exploring multi-architecture system design, I selected three artificial neural network architectures for incorporation into the overall system. These were the Hopfield-Tank network, back-propagation, and Adaptive Resonance Theory 1 models. These networks perform three different functions: Hopfield-Tank is used for optimization or constraint satisfaction, back-propagation is a general-purpose mapping network, and Adaptive Resonance Theory 1 is a classifier network.

Hopfield-Tank Networks

In 1982, John Hopfield wrote an article, "Neural networks and physical systems with emergent collective computational abilities," which was published in Proceedings of the National Academy of Sciences, describing a model of neural computation which was readily implementable in current solid-state technology.
This article and the model which it described have been widely credited with a resurgence of interest in ANN modelling. The network architecture presented by Hopfield and Tank (1985), hereafter known as HTN for "Hopfield-Tank Network[s]", is a single-layer, fully interconnected network model (Figure 2). There is no learning rule for this network, although various researchers have proposed modified HTN architectures that do incorporate adaptive learning. Weights between nodes in an HTN are fixed and symmetric, and connections between a node and itself are zero. The advantage of these criteria for network design is that the system dynamics can be shown to perform an "energy minimization" in reaching a stable state. When the weights are determined according to system constraints, the system can be characterized by a Liapunov function, providing a measure for system energy.

HTNs have been applied to various constraint satisfaction and optimization problems. Hopfield and Tank attracted much attention by demonstrating the utility of an HTN in generating good solutions to the "Traveling Salesman Problem" (or TSP), an NP-complete problem. The TSP can be described as choosing a minimum length path among a set of cities such that each city is visited once, and the salesman returns to the city of origin. The closed path length constitutes the measure for any particular solution. As the number of cities increases, the number of possible valid tours increases combinatorially. However, it can be shown that an HTN will produce good tours in constant time. An HTN used for TSP computation does not necessarily converge upon the optimum solution, but will reject non-optimum solutions and give relatively "good" solutions. Unfortunately, it can also be shown that the HTN trades off constant time operation for O(n^2) space requirements. Figure 2 shows the HTN architecture for solution of the TSP.
Although the nodes in the network are treated as elements in a vector, the visual representation which makes the most sense to the designer and other interested humans is a matrix of nodes. By imposing this structure upon our view of the network, we can associate each row with a city in the tour, and each column with the position of a city in the final tour (City n is in Position m in the tour). The constraints of importance are that for a valid tour, each city appears once, and each position in the tour has only one city. Thus, for the state of the network at equilibrium, we should find high activities in n nodes, where n is equal to the number of cities or positions. The interconnections between nodes in a row or column are inhibitory, causing highly active nodes to reduce the activity of other nodes in the same row or column. The interconnections from a node to its neighbors in adjacent columns are proportionally more inhibitory as the distance between the cities represented by those nodes increases.

Hopfield-Tank Equation

The equation defining the network's activity over time (Hopfield and Tank 1985) is

Ci (dui/dt) = (Sum from j = 1 to N of TijVj) - ui/Ri + Ii (Eq. 2)

where i and j designate neurons in the network, N is the total number of neurons in the network, T is a connection weight between neurons, V is the output value function for a neuron, u is the activity of a neuron, I is the external input for a neuron, C is the capacitance of a neuron, and R is the resistance of a neuron.

Back-propagation Networks

Back-propagation is a shortened form of "correction by the backward propagation of errors." The learning rule used in back-propagation networks (BPNs) is termed the generalized delta rule (Simpson 1988; Rumelhart, Hinton, and Williams 1986). Basically, a BPN is a multi-layer (with at least three layers) network whose nodes use a sigmoid output function.
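The dynamics of Eq. 2 can be simulated by simple forward-Euler integration. The sketch below is illustrative only (the thesis implementation, per the appendices, is a Pascal program); the two-neuron network, weights, time step, and gain are arbitrary choices for demonstration, not values from the text.

```python
import math

def step(u, T, R, C, I, dt=0.01, gain=1.0):
    """One forward-Euler step of the Hopfield-Tank dynamics (Eq. 2).

    u: list of neuron activities
    T: weight matrix T[i][j], fixed, symmetric, zero on the diagonal
    R, C: per-neuron resistance and capacitance
    I: external inputs
    """
    # Output value function V = f(u): a sigmoid squashing activity into (0, 1)
    V = [1.0 / (1.0 + math.exp(-gain * ui)) for ui in u]
    new_u = []
    for i in range(len(u)):
        du = sum(T[i][j] * V[j] for j in range(len(u))) - u[i] / R[i] + I[i]
        new_u.append(u[i] + dt * du / C[i])
    return new_u

# A two-neuron mutual-inhibition example with a stronger drive to neuron 0:
u = [0.0, 0.0]
T = [[0.0, -2.0], [-2.0, 0.0]]   # symmetric weights, zero self-connections
R = [1.0, 1.0]
C = [1.0, 1.0]
I = [1.0, 0.5]
for _ in range(2000):
    u = step(u, T, R, C, I)
# At equilibrium the more strongly driven neuron dominates:
assert u[0] > u[1]
```

The inhibitory off-diagonal weights play the same role as the row and column inhibition described above for the TSP network: a highly active node suppresses its competitors as the system settles toward an energy minimum.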
The BPN will map a set of input activities to another set of output activities, given training upon a set of example input/expected output vector pairs. The BPN will generally discriminate and adapt to non-linear relationships in the training data. For example, a BPN can learn the exclusive-or relation, which a single-layer perceptron cannot. Nodes in the BPN are often called "units." The basic premise of the back-propagation algorithm has been independently derived several times. Werbos (1974) is generally credited with the first publication of the learning rule, which he called dynamic feedback; Parker (1985) gave the rule the name "learning logic"; and the popularity of the BPN architecture is primarily attributable to Rumelhart, Hinton, and Williams (1986), as related by Simpson (1988).

All units in the BPN operate in basically the same manner. There are some slight differences dependent upon whether a unit resides in the input, hidden, or output layer. Generally, however, a unit generates an output signal as follows:

oj = f(netj) (Eq. 3)

where j is a unit in the BPN, net represents the input to a unit, f is a sigmoid function, and o is the output of a unit.

netj = (Sum over i of (oiwij)) (Eq. 4)

where i is a unit in a preceding layer of the BPN, and w is a connection weight linking two units.

f(netj) = 1 / (1 + e^-(netj + thetaj)) (Eq. 5)

where theta is the bias weight for the unit. An input unit has a net input which is simply the provided external input, as there are no preceding layers.

Figure 3 shows a three-layer BP network. At the bottom are five input units, which receive their activation from an external source. In the middle are sixteen hidden units, which receive their input according to Equation 4. The hidden units' output is sent on over another set of weights to the single output node. The error between the external training signal and the output node's output activity provides the basis for correcting the behavior of the network.
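Equations 3 through 5 define the feed-forward pass. A minimal sketch follows, using the layer sizes of the example network in Figure 3; the random weights and the input vector are arbitrary illustrations, not values from the thesis program.

```python
import math
import random

def f(net, theta):
    # Sigmoid output function with bias weight theta (Eq. 5)
    return 1.0 / (1.0 + math.exp(-(net + theta)))

def layer_output(inputs, weights, thetas):
    """Outputs of one layer of units (Eqs. 3 and 4).

    weights[i][j] links unit i of the preceding layer to unit j of this one.
    """
    outs = []
    for j in range(len(thetas)):
        net = sum(inputs[i] * weights[i][j] for i in range(len(inputs)))  # Eq. 4
        outs.append(f(net, thetas[j]))                                    # Eq. 3
    return outs

random.seed(1)
n_in, n_hid, n_out = 5, 16, 1          # layer sizes as in Figure 3
w_ih = [[random.uniform(-1, 1) for _ in range(n_hid)] for _ in range(n_in)]
w_ho = [[random.uniform(-1, 1) for _ in range(n_out)] for _ in range(n_hid)]
th_h = [0.0] * n_hid
th_o = [0.0] * n_out

x = [1.0, 0.0, 1.0, 0.0, 1.0]          # external activation of the input units
hidden = layer_output(x, w_ih, th_h)
output = layer_output(hidden, w_ho, th_o)
assert all(0.0 < o < 1.0 for o in output)  # sigmoid outputs lie in (0, 1)
```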
This raw measure of error is used to find the delta for the output node.

deltak = (tk - ok) f'(netk) (Eq. 6)

where k is an output unit, t is the expected output, and f' is the derivative of the output function. Fortunately, the derivative of the sigmoid function is symbolically easy to specify:

f' = f (1 - f) (Eq. 7)

The network then must distribute this error measure backward through the network. For each of the hidden units, then, deltas are found as follows:

deltaj = (Sum over k of (deltakwjk)) f'(netj) (Eq. 8)

where k is a unit in the succeeding layer of the BPN. Each of the weights in the network is changed according to:

(Change in wij) = L * oideltaj (Eq. 9)

where L is a constant representing the learning rate for the BPN. Theta values for nodes are treated in the same manner, thus

(Change in thetaj) = L * f(thetaj)deltaj (Eq. 10)

So, for the example BPN in Figure 3, an input vector causes the output of the input units to be distributed to the hidden units, modified by the intervening weights. Similarly, the hidden units send on their output through weights to the output unit, which provides the response of the network to the particular input vector. This is called "feed-forward" processing. Once the network output is known, it can be compared to the expected output, and the delta value for the output unit is determined. This begins the "back-propagation" process. The delta values for the hidden units can now be derived as in Equation 8. The amount of change for each of the weights between the hidden and output layers can now be found according to Equation 9. Weight changes between input and hidden layers proceed similarly. For each node in the hidden and output layers, the value of theta is also changed. This completes the normal back-propagation phase. In the example problem, we have made a slight change to the normal back-propagation process, and have allowed theta for the input units to also be adaptively changed.
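The backward pass of Equations 6 through 9 can be sketched for a single training pattern as follows. This is my illustration, not code from the thesis; the variable names are mine, and the input-unit theta adaptation used in the example problem is omitted for brevity.

```python
def deltas_and_updates(x, hidden, output, target, w_ho, L=0.5):
    """Deltas (Eqs. 6-8) and weight changes (Eq. 9) for one pattern.

    x, hidden, output: unit outputs from the feed-forward pass
    target: expected output vector
    w_ho: hidden-to-output weights, w_ho[j][k]
    L: learning rate
    """
    # Output deltas (Eq. 6), using f' = f(1 - f) (Eq. 7)
    d_out = [(target[k] - output[k]) * output[k] * (1.0 - output[k])
             for k in range(len(output))]
    # Hidden deltas (Eq. 8): output error distributed backward through w_ho
    d_hid = [sum(d_out[k] * w_ho[j][k] for k in range(len(d_out)))
             * hidden[j] * (1.0 - hidden[j])
             for j in range(len(hidden))]
    # Weight changes (Eq. 9): change in w[i][j] = L * o_i * delta_j
    dw_ih = [[L * x[i] * d_hid[j] for j in range(len(d_hid))]
             for i in range(len(x))]
    dw_ho = [[L * hidden[j] * d_out[k] for k in range(len(d_out))]
             for j in range(len(hidden))]
    return d_out, d_hid, dw_ih, dw_ho

# A one-input, one-hidden, one-output check with hand-verifiable numbers:
d_out, d_hid, dw_ih, dw_ho = deltas_and_updates(
    x=[1.0], hidden=[0.5], output=[0.5], target=[1.0], w_ho=[[1.0]])
# Eq. 6: (1.0 - 0.5) * 0.5 * (1 - 0.5) = 0.125
assert abs(d_out[0] - 0.125) < 1e-12
```

Applying the returned weight changes and repeating the feed-forward and backward passes over the training pairs constitutes the training procedure described below.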
In practice, one would normally construct a network with the correct numbers of input and output units, make some guess as to the number of hidden units needed, and assign random values to the weights. The network would then be trained upon the available data vector pairs until the error becomes suitably low, or the implementor decides to make a change in network design. Possible applications for BPNs include encryption, data compression, non-linear pattern matching, and feature detection. Existing BP applications include translation of text inputs into phoneme outputs, acoustic signal classification, character recognition, speech analysis, motor learning, image processing, knowledge representation, combinatorial optimization, natural language, forecasting and prediction, and multi-target tracking. BP has been implemented or theorized in electronic, VLSI, and optical formats (Simpson 1988).

Adaptive Resonance Theory 1

Adaptive Resonance Theory 1 (ART 1) is a model introduced by Gail Carpenter and Stephen Grossberg (Carpenter and Grossberg 1987a). There are two ART models of note, ART 1 and ART 2, and many modified architectures which are premised upon one or the other of the ART models. Basically, an ART architecture is a two-layer network which provides unsupervised learning of categories of inputs (Figure 4). The F1 layer is composed of "feature nodes," which accept external inputs. In ART 1, these inputs are binary patterns, while ART 2 incorporates preprocessing to accept analog inputs to the F1 layer. The F2 layer is composed of "category nodes," which compete to respond to valid F1 activations. There are control structures built into the architecture to prevent F2 activation without input being received at the F1 layer. There are other control structures to prevent "resonance" from occurring when the prototype pattern determined by the most active F2 node does not correspond to the pattern of F1 activation.
Short term and long term memory are represented in ART architectures by node activations and inter-node weights, respectively. A "bottom-up activation" refers to the pattern of activation received by F2 nodes through the weighted links from F1 nodes. Similarly, a "top-down activation" is the pattern of activation received by F1 nodes through weighted links from F2 category nodes. Long term memory is changed only through resonance between the F1 nodes and a selected F2 node. A more detailed diagram of the ART 1 interconnections is shown in Figure 5.

Resonance is a state in which long term memory traces between the F1 and F2 layers are modified to more closely represent the input activation in the category node's top-down weights. An F2 node which wins the competition among F2 nodes has its top-down activation tested against the bottom-up activation of the F1 feature nodes. If these match within a level of tolerance, called the vigilance level, a resonant state is entered and long term memory is changed. If not, the current winning F2 category node is made ineligible for further consideration against the input, and F2 competition is restarted among the remaining eligible F2 nodes.

For an example pattern presented to the ART network given in Figure 4, the presence of input turns on the gain control nodes and activates the F1 layer. The activation of the F1 layer, or bottom-up activation, is fed across a set of long term memory weights to generate a new pattern of activity at the F2 layer. The F2 layer responds with a top-down activation which is filtered by the top-down weights linking each of the F2 nodes to the F1 layer. Competition among the F2 nodes results in a single winner for application to the current feature input. A comparison of the bottom-up activation with the top-down activation yields a set-theoretic measure of the match between the presented pattern and the category represented by the F2 node.
If I represents the input pattern, V(J) represents the F2 category node J's archetypal pattern, and p represents the vigilance parameter, then the cardinality of the set intersection between pattern I and the category pattern V(J) must be greater than or equal to p times the cardinality of I, or reset occurs. If reset occurs, the F2 node J becomes ineligible for matching to the current input pattern, and the process of bottom-up activation, top-down activation, competition, and match testing continues until some category node is found to match the input sufficiently well, or until all category nodes have been matched against the input and found to be too different. Figure 5 displays a more detailed ART 1 network, with the various weights, nodes, and connections visible.

Obviously, it is possible that none of the eligible F2 nodes will match within the vigilance level's acceptable tolerance. When an ART network is first trained, there are no eligible F2 nodes; rather, there are a number of uncommitted F2 nodes. An F2 node will be selected and enter a resonant state, providing the first category. If a subsequent input does not match this category within the vigilance level, the single F2 category is rendered ineligible and the F2 layer is reset. This brings the network to a state analogous to that of having no eligible category nodes available, and a second category node is selected and resonated with the F1 layer. At any point where no further eligible F2 nodes exist, but an uncommitted F2 node remains, a new category node is formed from the formerly uncommitted node. If no uncommitted category nodes remain, then the input has been found not to match the available categories.

CHAPTER 2

INTEGRATION IN NEURAL NETWORK MODELLING

Integration in neural network modelling is taken here to mean the combination of different neural network architectures in a coordinated system.
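The vigilance test and reset cycle just described can be sketched for binary patterns, represented here as sets of active feature indices. This is a deliberate caricature of my own devising: the full ART 1 dynamics of Figure 5, with its gain control and reset nodes and its bottom-up choice function, are collapsed into a largest-overlap selection, and resonance is reduced to fast learning by set intersection.

```python
def art1_classify(inp, categories, p=0.7):
    """Match a binary input against stored category prototypes V(J).

    inp: input pattern I as a set of active feature indices
    categories: list of prototype sets, mutated in place
    p: vigilance parameter
    Returns the index of the resonating category; an uncommitted node is
    committed if no eligible prototype passes the vigilance test.
    """
    ineligible = set()
    while True:
        eligible = [J for J in range(len(categories)) if J not in ineligible]
        if not eligible:
            categories.append(set(inp))      # commit an uncommitted F2 node
            return len(categories) - 1
        # Crude stand-in for the F2 competition: pick the largest overlap
        J = max(eligible, key=lambda j: len(inp & categories[j]))
        # Vigilance test: |I intersect V(J)| >= p * |I|, else reset
        if len(inp & categories[J]) >= p * len(inp):
            categories[J] &= inp             # resonance: fast learning
            return J
        ineligible.add(J)                    # reset: node becomes ineligible

cats = []
j1 = art1_classify({1, 2, 3, 4}, cats)   # first input founds category 0
j2 = art1_classify({1, 2, 3, 5}, cats)   # overlap 3 of 4 passes vigilance 0.7
j3 = art1_classify({7, 8, 9}, cats)      # too novel: a new category is formed
assert (j1, j2, j3) == (0, 0, 1)
```

The third call illustrates the novelty-detection by-product exploited later in the thesis: an input that fails the vigilance test against every committed category is, by that very failure, flagged as novel.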
This differs from the casual usage sometimes found in the literature, where the term has been applied either to systems composed of multiple units of the same base architecture, or else to trivial modifications of a known architecture. Integration applies properly to cases of multiple-architecture systems, and there are some instances of systems for which the term genuinely applies but has not been used.

As an example, Matsuoka, Hamada, and Nakatsu (1989) have proposed an architecture for phoneme recognition that subdivides the hidden and output layers of a back-propagation network in order to enhance the network's ability to recognize phonemes and also to substantially reduce the training time necessary for the network. However, Matsuoka terms this architecture the Integrated Neural Network (INN). While there is a substantial improvement in training time, there is no fundamental difference between an INN and a back-propagation network: they differ only in the connectivity of the weights between the hidden and output layers. The reduction in the internal complexity of the INN can explain the decreased training time.

A slightly different form of integration is pursued by Foo and Szu (1989). Their "divide and conquer" approach to problem solving employs the same architecture, a modified Hopfield-Tank network, to handle smaller subproblems, and then brings together the resulting subproblem solutions into an overall solution. This requires some coordination to effect the overall solution, bringing into play elements of the broader integrative issues I have noted, but is not properly an integrated system as I have defined it.

A better example of an integrated artificial neural network system appears in Cruz et al. (1989). Cruz uses a MADALINE architecture for image preprocessing and a back-propagation network for removing image distortion. The MADALINE architecture is a Multiple ADALINE system, developed by Widrow (Widrow, Pierce, and Angell 1961).
The ADALINE, short for Adaptive Linear Neuron, is a neuronal model using the Least Mean Square learning rule developed by Widrow and Hoff. Widrow and Winter (1988) have updated the learning rule used for multiple-layer, multiple-ADALINE networks. The specific example given by Widrow and Winter presents a MADALINE network for invariant pattern recognition. The MADALINE architecture most closely resembles the Perceptron architecture given by Rosenblatt. In creating his integrated system, Cruz applied the "divide and conquer" concept not for the purpose of reducing simulation time, but in consideration of the space requirements for a system which could handle the 256x256 pixel images used.

Integration and Convergence

A perennial problem for the artificial neural network modeller is the issue of convergence in finite time. It is nice to know that the architecture selected for a function will converge to a solution before the heat death of the universe. It is similarly a concern that a system composed of subsidiary architectures will converge. This can be problematic, since general convergence theorems have not been found for several of the most popular architectures (Widrow 1987). Back-propagation provides a particularly pointed example, since it is by far the most popular network, if the number of papers concerning applications is taken as the criterion. The generalized delta learning rule of back-propagation has long been appreciated to be generally useful, yet significant progress in firmly establishing this usefulness in the form of theorems concerning convergence has been lacking. Sontag and Sussman (1989) provide a theorem demonstrating for back-propagation a result analogous to that for the perceptron learning rule: if a separating solution exists, the generalized delta rule's gradient descent will find the solution in finite time.
While much has been made of a theorem by Kolmogorov (Farhat 1986), it must be conceded that Kolmogorov's theorem is an existence theorem: there is some network based upon the back-propagation architecture which will perform a given mapping from an n-dimensional input space to an m-dimensional output space, but the theorem gives no clues as to what that specific network looks like.

There is some promising news concerning convergence which is of interest for building integrative ANN systems. Hirsch (1989) gives several theorems which hold that if the component subnets of a neural network converge, then the network will converge. While Hirsch's theorems assume that the convergence properties of the subnetworks can be described by Liapunov energy functions, he notes, "It is more difficult to obtain convergence for cascades of systems that are merely assumed to be convergent, but without benefit of Liapunov functions or global asymptotic stability. One way of doing this is to place strong restrictions on the rates of convergence." Hirsch defines a cascade as a layered network where the output of one layer serves as the input of the next. Many integrative designs can be cast into this framework.

Incremental Synthesis

The process of synthesis leading to integrative artificial neural network modelling is important to the development of insights into topics of critical application, such as sensor fusion. By confronting directly the need for coherent internal use of available resources and capabilities, we are more likely to generate an understanding of fusion principles. The synthetic approach to modelling provides a supportive environment for creating extensive systems. Just as the topic of artificial neural network modelling benefits from the interdisciplinary nature of its supporting sciences, so a synthetically derived artificial neural network system benefits from the range of problem solution approaches and features inherent in the underlying network architectures.
By creating and maintaining a system of network architectures applicable to subfunctions in the problem solution, subfunction solution by a particular system component can benefit from the co-option of features normally found in different components of the artificial neural network system.

Networks Under Consideration, System Properties

Each of the three network architectures used in the example problem of melodic composition has its own set of features and drawbacks which play an important role in system design.

Hopfield-Tank Networks

As mentioned earlier, the Hopfield-Tank architecture is generally used in achieving a specific function. By this I mean that each Hopfield-Tank network is designed for a particular purpose, and can provide no functionality for other, unrelated purposes. So the use of a Hopfield-Tank network should generally be reserved for functions which do not change over time. However, I will immediately cite a counter-example, due to its elegance of integrative design. An ingenious mechanism for extending the utility of the Hopfield-Tank architecture is pursued by Tsutsumi (1989), where one back-propagation network remaps Hopfield-Tank network inputs and another back-propagation network remaps Hopfield-Tank network outputs. The problem given is one of avoiding robotic arm deadlocking. The movement space is constrained, and therefore amenable to Hopfield-Tank network solution. However, the mapping of the internal space representation to real-world arm movement must be adaptive. The Hopfield-Tank network does not provide learning rules, so the back-propagation networks provide the adaptation to real-world feedback.
In this manner, some significant benefits accrue to the use of the integrated system: since the Hopfield-Tank network is static with respect to the encoded weights, it provides a good repository for the robot's joint-arm space; since the back-propagation networks are adaptive, the system can configure itself to respond to a changing environment.

Since the Hopfield-Tank network was conceived of in the context of implementation in silicon, the possibility of reducing a Hopfield-Tank network instance to a hardware component can be important for real-world applications. This step would bring the benefit often touted for Hopfield-Tank networks: speed. Speed is rarely noted in practice, since most practice involves simulation. The simplicity of the Hopfield-Tank architecture can be a strong point for system design and integration, however, even in simulated systems.

A drawback which may eliminate the Hopfield-Tank network from consideration for a particular function is that the weights for the network must be derived from the constraints to be implemented and from any data functions necessary for solution but not available in the input to the network. This requires an understanding of the system to be solved, which may not be available. Some architecture with a learning rule would then be more suitable for the application.

The output of a Hopfield-Tank network must be deciphered from the final pattern of activation of the net at equilibrium (cf. discussion in Chapter 1). In the case of the Travelling Salesman Problem, the positions of active nodes provide an encoding of the placement of cities in the tour. This information might be rendered more compact in another format, depending upon the input type expected for further networks in the system.

Back-propagation Network System Considerations

The back-propagation architecture provides a mapping from the input vector to the output vector, and can be trained by example.
The system properties of back-propagation networks include stability of learning given a fixed universe. By implication, the learning is not stable if some perturbation in the problem set changes the mapping function. The back-propagation network would then learn the new mapping over time, and the old mapping would be lost. Back-propagation networks have a moderately complex structure. The properties gained from this increase in complexity over the Hopfield-Tank architecture include the capacity for learning, the ability to extract features from input data and generate internal representations for those features, and the possibility of complex input-to-output transforms in accordance with learned associations.

Back-propagation networks can accept binary or analog inputs, so the inputs can represent conditional probabilities as well as more strictly constrained values. The outputs are analog values which can be interpreted as binary through the use of thresholding functions. This allows a wide variety of input and output possibilities to achieve overall system function. However, the choice of representation of inputs can be critical for speedy and reliable training to occur. For example, the use of analog values should be avoided when there exists a natural partition of the range of the input into distinct states. Our example of the melodic note generator will illustrate this concern.

ART 1 System Properties

Adaptive Resonance Theory 1 architectures provide unsupervised learning of "clusters" or classifications of input vectors. An internal representation of classification archetypes is generated. This architecture ensures that new inputs will tend not to perturb the classifications of previous inputs. This compromise in the stability-plasticity tradeoff (Carpenter and Grossberg 1987a, b) can be modified for special purposes, as the melodic note generator program will demonstrate.
The internal operation of the ART 1 network can provide certain features of especial interest to system design. Specifically, one by-product of the classification algorithm is the detection of novelty, which will be shown to have functional significance beyond that of the original design of the ART 1 network.

Some properties of ART 1 require particular attention from the designer. For example, there is a matching parameter (vigilance) which controls how much deviation from a category prototype is acceptable. There are no guidelines for the selection of the vigilance parameter, and it is left to the designer to select and assign a "proper" value. Some guidelines do exist for certain of the other learning parameters in the ART 1 architecture, such as the learning coefficients for each of the top-down and bottom-up memory equations (Carpenter and Grossberg 1987a). The ART 1 architecture provides no standard output. The designer must access internal values of the ART 1 network to provide useful information to the remainder of the system which includes that network. ART 1 is a highly complex architecture with many parameters to be selected and set by the designer. Useful modifications to the architecture would include creating adaptive functions to replace some of the static and arbitrary parameters of the network, such as the gain and reset parameters (cf. Figure 5).

Table 1.
System Properties Summary

Hopfield-Tank network properties:
- Data initialization process only place for changing adaptively (not "learning" at all)
- Fast convergence (on system, not necessarily on simulation)
- Inflexible structure (individual design necessary)
- Simple structure

Back-propagation network properties:
- Stable learning given fixed universe
- Change to design implies relearning necessary (costs for self-adaptation include forgetting what has been learned)
- Adaptive weights changed as consequence of training (supervised learning)
- Medium complexity of structure
- Input to output transform can be computationally complex
- Output nodes may deliver digital, bipolar, or analog values

ART network properties:
- Stable self-organizing learning
- Change in design produces unknown effect on existing "knowledge"
- Non-adaptive parameters for gain and reset are possible drawbacks; these should be replaced by adaptive functions

System considerations

Considerations of interfacing the outputs of one model to the inputs of another:

Inputs:
HTN  : analog or discrete value, one per node
BPN  : analog or discrete value, one per input node
ART 1: discrete value, one per input node
ART 2: analog value, one per input node

Outputs:
HTN  : discrete value, one per node
BPN  : analog or discrete value, one per output node
ART 1: none specified
ART 2: none specified

CHAPTER 3

EXAMPLE PROBLEM

In developing a simulation to test out processes of integration, several factors have to be considered. The simulation should have enough scope to provide good subsidiary roles for the component functions, it should be small enough to be implementable in a reasonable period of time, and it should produce output of a form which is readily apprehensible and analyzable. The first criterion is easily fulfilled; the second and third are rather more difficult to fulfill.
As a starting point for exploring the possibilities of an integrated approach to artificial neural network modelling, the problem of producing a melodic line in music composition was selected. The complexity of music composition in general provides ample considerations for the application of component networks. Unfortunately, the complexity of musical composition offers no hope for the relatively simple design and implementation of an artificial neural network system which addresses all the salient points. Therefore, simplifying assumptions are made to ease the requirements for the ANN system. The output, to be interpreted as a sequence of musical notes, can present some problems with evaluation, due to the qualitative context of musical evaluation in general. However, it is possible to treat the note sequence as a set of concatenated symbols, and apply some of the information theory concepts of Shannon (1948) to conduct an analysis.

Simplifying Assumptions

The great complexity of musical composition in general is constrained to yield a problem of suitable scope for the example integrated ANN system. A limited scale covering one octave is assumed, in the key of C. There are no accidentals, and there is no explicit timing of notes. A single voice is assumed, and there are no harmonics generated. A limited and fixed set of classical composition rules forms the basis for the constraint and comparison of the system.

Problem Approach

The use of several ANN architectures in creating an integrated system whose function fulfills the requirements of the musical composition problem is assumed. A preliminary set of hypotheses as to how a composer develops a line of melody was advanced for defining the subfunctions of the composition system, or note generator. As Teuvo Kohonen (1989) notes in discussing his own ANN system for musical composition,

It is not possible to survey here the development of ideas in computer music.
One of the traditional approaches, however, may be mentioned. It is based on Markov processes. Each note (pitch, duration) is thereby regarded as a stochastic state in a succession of states. The probability Pr = Pr(Si | Si-1, Si-2, ...) for state Si, on the condition that the previous states are Si-1, Si-2, ..., is recorded from given examples. Usually three predecessor states are enough. New music is generated by starting with a key sequence to which, on the basis of Pr and, say, the three last notes, the most likely successor state is appended. The augmented sequence is used as a new key sequence, and so the process generates melodic successions ad infinitum. Auxiliary operations or rules are necessary to make typical forms (structures) of music out of pieces of melodic passages.

Leaving the problem of constructing musical forms for possible later consideration, the approach given by Kohonen matches well the procedure of composition undertaken here. A candidate note and note sequence are proposed, and a critique according to classical rules is made of the last note in the candidate sequence. The approval of the critic should usually lead to the acceptance of the candidate note, but the rules should be broken often enough that there is not an absolute conformity to the rules postulated. Now we must confront the problem of how best to combine models in a unified structure that accomplishes the example function of melodic composition. The solution involves identifying the strengths and weaknesses of the component models, and the functions each may accomplish. Once the subproblem mapping has been accomplished, it must be determined how to connect the outputs of the subfunction modules to other modules to accomplish the overall task.

Example Problem Network System

The identifiable problem subfunctions are candidate note generation, sequence critique, and novelty detection.
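For comparison with the ANN approach, the Markov procedure quoted from Kohonen above can be sketched in a few lines. This is an illustrative Python sketch, not part of the thesis software; a first-order transition table over the thesis's 1..8 note values is invented here for brevity (Kohonen suggests using three predecessor states).

```python
import random

# Toy first-order transition table over the eight note values used in the
# thesis; the entries are illustrative, not recorded from real examples.
transitions = {
    1: [2, 3, 5], 2: [1, 3], 3: [2, 4, 5], 4: [3, 5],
    5: [4, 6, 8], 6: [5, 7], 7: [6, 8],    8: [7, 5],
}

def markov_melody(key_note, length, rng=random.Random(42)):
    """Generate a melody by repeatedly appending a likely successor state,
    as in the Markov scheme Kohonen describes."""
    melody = [key_note]
    for _ in range(length - 1):
        melody.append(rng.choice(transitions[melody[-1]]))
    return melody

print(markov_melody(1, 8))
```

As Kohonen notes, such a generator runs ad infinitum and needs auxiliary rules to produce larger musical forms; the integrated ANN system described below addresses the same succession problem with network components instead of a recorded probability table.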
The candidate note generator subfunction should propose notes in general, but not complete, conformance to the probabilities of the next note being part of a classical sequence. The sequence critique subfunction should evaluate the proposed next note in strict conformance to the set of classical sequences provided. Since the evaluation given by the sequence critique subfunction is an incomplete criterion for composition given expectations of novelty, some means of detecting novel sequences must exist for the use of the coordinating system. The coordinating system then has the information necessary to "break the rules" when needed to avoid long, boring sequences of strictly mechanical melody. This can also be seen as a requirement for stable and continued operation, since there exist sequences which have no classical next-note possibilities.

Candidate Note Generation

The candidate note generator should produce plausible next notes given a historical partial sequence. Since the rules for identification of plausible next notes are fixed and known, there is no need for learning in this stage of the network. The candidate note generator should also occasionally provide notes which do not necessarily conform to the expectation of a classical next note. The Hopfield-Tank network (HTN) was selected as the candidate note generator since it provides the above features. HTNs are noted for their utility in constraint satisfaction and other optimization problems, which fits the requirement that only a single next note is to be proposed at a given time and that it should basically follow the probabilities of a classical next note. A well known attribute of the HTN architecture is the presence of spurious local minima which do not represent "valid" solution states. By purposefully utilizing this feature, we can convert what normally constitutes a drawback into an asset for the subfunction.
The spurious local minima will give the occasional proposal of non-classical next notes. The constant and known nature of the expected sequences determines the formation of the HTN weights, in conjunction with the known constraints for proper HTN function. The use of "noisy" input values can produce the semi-random distribution of possible notes that is needed for variability. Figure 6 shows the modified HTN architecture, called Bach, used in our simulation. The rows represent note values and the columns represent sequence placement. The constraints imposed on this network include the need to present a single winning note in each place in the sequence pattern, and the need to prevent endless repetition of the same candidate note. By introducing relatively strong inhibitory links within rows and columns, we can satisfy the constraint requirements. We achieve preferential selection of classical next notes by reducing those inhibitory links somewhat for connections which follow classical sequence patterns.

Sequence Critique

The sequence critic should provide an evaluation of the conformance of the proposed next note to classical melodic sequence rules. If the sequence rules are assumed to be provided by example, then some learning function is required to allow the critic to become adept and reliable. The subfunction receives a note sequence as input, and produces an output which may be interpreted as a boolean statement of the classical nature of the candidate note.

[Figure 6: the modified HTN architecture, Bach]

The back-propagation network (BPN) was selected as the sequence critic. The BPN, as discussed in Chapter 2, can learn any input/output transformation given to it (within some constraints upon the availability of sufficient hidden nodes to form a stable internal representation).
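The inhibitory weight scheme described above for Bach can be sketched as follows. This is a hedged Python illustration of the stated constraints (strong inhibition within rows and columns, reduced inhibition along classical successions); the inhibition strengths and the "classical" transition table are assumptions for the sketch, not values taken from the thesis listings.

```python
import numpy as np

NOTES, SLOTS = 8, 5                     # 8 note values (rows) x 5 positions (columns)
BASE, STRONG, WEAK = -1.0, -2.0, -0.2   # arbitrary inhibition strengths for this sketch

# Hypothetical "classical" successions: note -> allowed next notes.
classical_next = {1: {2, 3}, 2: {1, 3}, 3: {2, 4}, 4: {3, 5},
                  5: {4, 6}, 6: {5, 7}, 7: {6, 8}, 8: {7}}

def idx(note, slot):
    """Flatten (note, slot) into one weight-matrix coordinate;
    note in 1..8, slot in 0..4 (cf. Convert_to_weight_coord in Appendix 2)."""
    return NOTES * slot + (note - 1)

N = NOTES * SLOTS
W = np.full((N, N), BASE)               # mild inhibition everywhere by default
np.fill_diagonal(W, 0.0)                # no self-connections
for s in range(SLOTS):
    for n1 in range(1, NOTES + 1):
        for n2 in range(1, NOTES + 1):
            if n2 != n1:
                W[idx(n1, s), idx(n2, s)] = STRONG    # one winner per column
        for s2 in range(SLOTS):
            if s2 != s:
                W[idx(n1, s), idx(n1, s2)] = STRONG   # no endless repetition
        if s + 1 < SLOTS:
            for n2 in classical_next[n1]:             # reduced inhibition along
                W[idx(n1, s), idx(n2, s + 1)] = WEAK  # classical successions
                W[idx(n2, s + 1), idx(n1, s)] = WEAK
```

With weights of this shape, the network energy is lowest when each column has a single winner and adjacent winners follow classical patterns, while noisy inputs and spurious minima still permit the occasional non-classical proposal.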
The form of the BPN used, called Salieri, was in the end a network accepting a binary representation of the input sequence, using twenty hidden units for internal representation, and producing an output value which was interpreted as a yes-or-no judgement. The net structure thus used forty input units, twenty hidden units, and one output unit, a medium-sized BPN.

Novelty Detection

The novelty detection function must have components enabling the recollection of past sequences for the discrimination of novel sequences. However, the space requirements should not be overwhelming. A classification system would provide good data compression while retaining nearly all the context information needed for novelty detection. The ART 1 architecture was selected for the role of the novelty detector, for it provides novelty detection as part of a classification framework. The ART 1 architecture also has other features of interest to further research on integrated and extensive systems. The ART 1 network used here, called Beethoven, is modified from the Carpenter-Grossberg architecture. Some of the explicit features of the Carpenter-Grossberg architecture are handled as implicit assumptions in the procedural simulation. The "2/3rds Rule," for example, is not invoked, since whenever Beethoven is active, the network already meets the constraints of the rule. The separate rules for top-down and bottom-up weight modifications are replaced by a single rule for both, as is done in the ART 2 architecture (Carpenter and Grossberg 1987b).

Coordination

In any integrative system, some means of coordinating subfunctions becomes necessary. Some interpretation and processing of input or output terms may be accomplished by the coordinating system. The ultimate decision for whether or not a candidate note is accepted falls to the coordinating system.
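The forty-twenty-one structure described for Salieri can be sketched with a minimal forward pass. This is a Python sketch with randomly initialized weights and the usual logistic activation; the one-hot encoding of five notes into forty bits is an assumption about the binary representation, and the trained weights themselves live in the Pascal BP_unit listing, not here.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# 40 input units (5 notes x 8 bits, an assumed one-hot coding),
# 20 hidden units, 1 yes-or-no output unit.
W1 = rng.normal(0, 0.5, (20, 40)); b1 = np.zeros(20)
W2 = rng.normal(0, 0.5, (1, 20));  b2 = np.zeros(1)

def encode(notes):
    """One-hot encode a 5-note sequence (note values 1..8) into 40 bits."""
    x = np.zeros(40)
    for slot, n in enumerate(notes):
        x[slot * 8 + (n - 1)] = 1.0
    return x

def critic(notes):
    """Forward pass; the output is interpreted as a boolean verdict."""
    h = sigmoid(W1 @ encode(notes) + b1)
    y = sigmoid(W2 @ h + b2)[0]
    return y >= 0.5

print(critic([1, 2, 3, 4, 5]))
```

Untrained, this net answers arbitrarily; training by back-propagation against the classical-rule examples is what makes it a usable critic, which is exactly the difference seen between the trained and untrained Salieri rows of Table 2.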
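Beethoven's novelty judgement rests on the ART 1 vigilance test: an input that fails to match any stored category prototype closely enough forces a new category, and that event is the novelty signal. A minimal Python sketch of the test follows; the vigilance value is illustrative, and the sketch omits the choice-function search, gain control, and reset of the full Carpenter-Grossberg architecture.

```python
import numpy as np

VIGILANCE = 0.8          # illustrative value of the ART vigilance parameter

class NoveltyDetector:
    """Bare-bones ART-1-style classifier used only for novelty signalling."""
    def __init__(self):
        self.prototypes = []                   # stored binary category templates

    def present(self, pattern):
        """Return True if the pattern is novel (it forced a new category)."""
        x = np.asarray(pattern, dtype=bool)
        for i, w in enumerate(self.prototypes):
            match = np.sum(x & w) / max(np.sum(x), 1)   # |I AND w| / |I|
            if match >= VIGILANCE:
                self.prototypes[i] &= x        # fast learning: intersect template
                return False
        self.prototypes.append(x.copy())       # no match: encode new category
        return True

d = NoveltyDetector()
print(d.present([1, 1, 0, 0]))   # first pattern is always novel -> True
print(d.present([1, 1, 0, 0]))   # exact repeat matches its category -> False
```

The finite prototype list also mirrors the behavior noted later: early inputs almost always encode new categories, and novelty signals become rarer as category nodes fill up.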
The coordinating system is called Lobes, since the features and activities of the frontal lobes (Levine 1986, Levine and Prueitt 1989) provided the inspiration for its operation. Lobes generates the context management and state-dependent actions which drive the integrated system to completion of the intended function, melodic composition. Lobes also contains an internal boredom function, which tends to increase over time. There is a boredom threshold which, if exceeded, causes a change in Lobes' behavior.

Operation

In operation, the integrated ANN note generator uses its components in a sequential manner. Lobes generates a call to Bach for a candidate note, which when returned is sent to Salieri for critique. As a mechanism for preventing wastage of Beethoven category nodes, the entire sequence which Bach settles upon is used for further processing by Salieri and Beethoven. Since at the beginning of composition there is no sequence history, Bach's inputs are determined randomly over the entire sequence. As notes are added to the sequence history, Bach's inputs are determined with the addition of some noise, except for the candidate note column, which receives only random noise as input. Salieri receives the Bach sequence values and makes a determination of classical conformance, which it passes back to Lobes. Lobes then sends the entire context, sequence plus critique, on to Beethoven. At first, Beethoven is virtually certain to encode new categories for each input it receives. As time goes by, the likelihood that Beethoven will encode a new category decreases, until all category nodes are utilized. So it is more likely that the first few notes generated will conform to classical sequence examples. Sometimes, however, a note which has no possible classical successors will be generated early in the simulation. In this case, it is likely that Beethoven's category encoding will proceed at a much faster than average pace.
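The operation cycle just described, together with the accept/reject rules detailed below, can be condensed into a short coordination loop. This Python sketch uses random stubs in place of the three networks; the boredom threshold, the strength of novelty relief, and the deterministic form of the "may accept" override are all assumptions, not values from the Lobes listing.

```python
import random

rng = random.Random(1)

# Stand-ins for the component networks; the real versions are the
# Hopfield-Tank, back-propagation, and ART 1 nets in the appendices.
def bach(history):            # candidate note generator (stub)
    return rng.randint(1, 8)

def salieri(sequence):        # sequence critic (stub): approve or disapprove
    return rng.random() < 0.7

def beethoven(context):       # novelty detector (stub)
    return rng.random() < 0.3

def compose(length, boredom_threshold=10):
    """One run of the Lobes coordination cycle: Bach proposes, Salieri
    critiques, Beethoven classifies, and Lobes accepts or rejects."""
    melody, boredom = [], 0
    while len(melody) < length:
        note = bach(melody)
        approved = salieri(melody + [note])
        novel = beethoven((tuple(melody[-4:]), note, approved))
        if novel:
            boredom = max(0, boredom - 3)    # novelty "excites" Lobes
        bored = boredom > boredom_threshold
        if approved != bored:                # accept if approved and not bored,
            melody.append(note)              # or disapproved while bored
        boredom += 1                         # boredom grows over time
    return melody

print(len(compose(16)))   # -> 16
```

The single comparison `approved != bored` captures the four cases described in the next paragraph: a bored Lobes rejects approved notes and may accept disapproved ones, while an interested Lobes simply follows the critic.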
The indication of novelty generated by Beethoven is used by Lobes to modify the system response to the internal state, which is determined by Salieri's critique of the next note, Beethoven's detection of novelty, and the boredom threshold in Lobes. If Lobes is not bored and Salieri approves of the candidate note, the note is accepted. If, however, Lobes has reached its boredom threshold and Salieri approves of a note, Lobes will reject the candidate note and request another one from Bach. Likewise, if Salieri disapproves of the note but Lobes is bored, Salieri may be overridden and Lobes may accept the note. Indications of novelty from Beethoven can satisfy Lobes' drive to be "excited," or not-bored. This will tend to make Lobes more conservatively classical as Beethoven continues to detect novelty. Figure 7 shows the operation of the coordinated system.

[Figure 7: operation of the coordinated system]

CHAPTER 4

RESULTS

Performance

The integrated ANN note generator system produces 152 notes in about three hours when run on an 80386 PC compatible at a 16 MHz clock speed. It takes approximately fifteen hours to produce the same number of notes on an 8088-based machine with an 8087 numeric coprocessor.

Example and Analysis of Output

Appendix 1 contains sample output from a run of the note generator. With a problem such as musical composition, assigning an objective measure denoting the "worth" of the output is not possible. However, it is possible to compare the output of our note generator system with random sequences of note values. By use of a binomial performance measure, it is possible to define how much the sample output differs from a random sequence. Random sequences have their own mystique and interest, but subjective evaluations of random melodic forms by the untrained ear tend toward the negative. The output of our note generator network was intended to follow, in the main, the guidelines of an example set of classical sequences.
The system included mechanisms for breaking out of a strict adherence to the guide set of sequences. Thus, it would be expected that the output have somewhat more resemblance to random sequences than the "rules" alone would dictate. Since the classical rules of composition, while not our sole criterion of fitness, are the only quantifiable part of our criteria, we compare our output with random note sequences on the basis of these rules. Table 2 summarizes the characteristics of output sequences generated under several different conditions. A random note generator, a mostly classical note generator, and our integrated ANN note generator each produced outputs, which were evaluated using the critic developed for training Salieri. The "Successes" column indicates the number of times the next note in the sequence could be considered classical. The random note generator simply output a sequence of random numbers for a set sequence length. The Classical Instructor sequence generator operated by determining the available pool of possible classical next notes, then randomly selecting one of those notes. In some cases, no next note fit the criteria of being "classical," and a random note was generated. The rest of the sequence generators were variations upon our ANN note generator. The use of different back-propagation nets in the critic role gave different results in the output. Trained and untrained back-propagation nets with analog inputs were used. No significant difference in output could be distinguished between the trained and untrained versions, but the result was far closer to random performance than to classical. The inability of the Salieri net with analog inputs to converge to a reliable and accurate performance measure explains the similarity of its results to those of the untrained version of the same type.
On the other hand, a Salieri composed of a trained back-propagation network using a binary representation for inputs was able to converge to a stable and fairly accurate state. Hence the performance of the trained Salieri with binary inputs was considerably closer to classical than was the performance of the random sequence generator. In two cases, a rule-based critique system was substituted for the back-propagation network.

Table 2. Classical components of output sequences.

Sequence generator                                 Sequence   Successes       %   Z versus   Z versus
                                                     length                         random  classical

Random                                               10,000         890   0.089       0.00    -102.54
System w/ untrained Salieri (analog inputs)             152          18   0.118       1.26     -23.84
System w/ trained Salieri (analog inputs)               152          19   0.125       1.54     -23.64
System w/ trained Salieri (binary inputs)               152          36   0.237       6.28     -20.07
System w/ rule-based critique (Salieri supervisor)
    Run 1                                               150          35   0.233       6.10     -20.06
    Run 2                                               150          47   0.313       9.42     -17.50
Classical Instructor                                  8,150        6898   0.846     102.54       0.00
    (rule-based critique w/out ANN system)

CHAPTER 5

DISCUSSION

The integrated note generation system's performance suggests that it met operational expectations. It produced notes according to a mixture of somewhat conflicting criteria. This kind of operation in the midst of uncertainty characterizes many human decision-making processes, and may be assumed to play a role in human music composition as well. Our framework allows for further experimentation with hypotheses concerning the fundamental processes involved in higher-order constraint satisfaction systems in an extensive environment (cf. Pao 1989). The integrated approach has demonstrated several advantages and disadvantages in development and operation. The disadvantages include the complexity of handling several different network architectures at once, which can contribute to programmer confusion (the downfall of programmer omniscience).
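As an aside on the analysis above: the Z values in Table 2 are consistent with a pooled two-proportion z-test comparing each generator's success rate against the random and Classical Instructor baselines. A short Python sketch of that computation reproduces the tabulated entries:

```python
from math import sqrt

def two_proportion_z(x1, n1, x2, n2):
    """Pooled two-proportion z statistic comparing success rates x1/n1 and x2/n2."""
    p1, p2 = x1 / n1, x2 / n2
    p = (x1 + x2) / (n1 + n2)                       # pooled success rate
    se = sqrt(p * (1 - p) * (1 / n1 + 1 / n2))      # pooled standard error
    return (p1 - p2) / se

# System w/ trained Salieri (binary inputs), 36/152, versus Random, 890/10000:
print(round(two_proportion_z(36, 152, 890, 10000), 2))    # -> 6.28, as in Table 2
```

The same function recovers the other columns as well, e.g. 36/152 versus the Classical Instructor's 6898/8150 gives -20.07, matching the "Z versus classical" entry for the binary-input Salieri.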
The necessity for dealing directly with "model mismatches," where one subnetwork may produce a different representation as output than the next subnetwork requires as input, can protract system design time. Failure to recognize that a problem exists in the articulation of networks can result in behavior that diverges wildly from expected norms. On the other hand, any complex problem may present similar difficulties regardless of the choice of solution approach. By using subnetworks of known characteristics, one may be able to achieve a solution with fewer uncertainties than a totally top-down approach would yield. With a range of different capabilities available from subsidiary networks, the likelihood of encountering an insoluble subproblem is reduced. Synthetic integrated systems also lead to a combinatorial explosion in the richness of possible system behaviors, which again is reminiscent of the increasingly interesting behaviors noted as more complex biological organisms are considered.

Simulation Concerns

The integrated ANN system from the example problem relied on several procedural programming shortcuts. For example, the implementation of "boredom" in Lobes was simply a counter which would be compared with a threshold value. This has no basis in biological neural systems, yet rather neatly simulates the behavior of a simple neuronal model for the same task. As another example, the decision in the ART 1 network as to which category an input belongs to is currently made on the basis of an arbitrary, winner-take-all rule. The same effect could probably be achieved by means of on-center off-surround neural interactions that are more biologically realistic. Casting the system functions into an entirely biological framework would yield a better, more capable system for future work. However, the system stands as a first effort toward this goal.
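The "model mismatch" difficulty raised earlier in this chapter, recalling the input/output formats tabulated at the end of Chapter 2, is handled in practice by small conversion routines in the coordinating code. The following Python sketch shows two such adapters; the value conventions (activations in [0,1], a 0.5 threshold) are assumptions for illustration, not taken from the listings.

```python
import numpy as np

def analog_to_binary(outputs, threshold=0.5):
    """Adapt analog node outputs (e.g. HTN activations in [0,1]) to the
    discrete binary vector that an ART 1 input layer requires."""
    return (np.asarray(outputs) >= threshold).astype(int)

def winner_to_note(column_outputs):
    """Collapse one HTN column of analog activations to a single note
    value (1..8), the representation a sequence critic's encoder expects."""
    return int(np.argmax(column_outputs)) + 1

acts = [0.1, 0.9, 0.3, 0.7, 0.2, 0.1, 0.0, 0.4]
print(analog_to_binary(acts).tolist())   # -> [0, 1, 0, 1, 0, 0, 0, 0]
print(winner_to_note(acts))              # -> 2
```

Writing the adapters explicitly, rather than burying format assumptions inside each subnetwork, keeps the articulation points visible and makes mismatches easier to detect before they produce wildly divergent behavior.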
Integration and Artificial Intelligence

There have been several efforts toward integrative system design in the top-down modelling school. In the HEARSAY system (Reddy et al. 1973), the blackboard model was developed. The blackboard model presupposes the combination of a set of possible problem-solving systems which have available the current system state. The system state is said to reside upon "the blackboard" as a visualization of the process of problem-solving. Each of the several applicable subsidiary problem-solving systems attempts to derive an incremental step toward the global solution, and competes with other such problem-solvers to control the blackboard, and thus be able to change the system state. This is a significant development, and one which is paralleled by concepts in several artificial neural network models. In ART models, for example, the concept of competition among various classification prototypes bears a strong resemblance to the blackboard model. If the F1 layer's activation is considered analogous to the system state, the F2 nodes are each analogous to problem-solvers in the blackboard model. There have been some efforts toward explicitly bringing together top-down and bottom-up models in hybrid systems. Amano et al. (1989) present an example of a phoneme recognition system combining an expert system with a perceptron network. The expert system provides feature extraction from speech data, which is input to the perceptron. The perceptron allows decision-making under uncertainty, and its output is interpreted using fuzzy logic rules. This avoids drawbacks associated with "template matching" phoneme recognition schemes. Rabelo and Alptekin (1989) have integrated a neural network with an expert system into an intelligent scheduling system for manufacturing applications. Their system has the ability to learn from experience and generate schedules within real-time constraints.
Neurobiological Evidence of Multifunctionality

Multi-state functionality of memory is supported by the work of Nottebohm (1989). Nottebohm's work involves the development and remembrance of complex songs in songbirds. Through a series of studies, Nottebohm demonstrates that hormonal changes can cause the forgetting of songs in a bird's repertoire, or allow the formation and remembrance of new songs. The hormonal changes in question center around testosterone, and typically the mating season is the time when levels of testosterone allow the formation of songs, presumably conferring a reproductive advantage on male songbirds. Nottebohm demonstrates that the changes are dependent on hormones alone by the simple strategy of artificially inducing song creation through the application of testosterone to songbirds of both sexes at various times of the year. The withholding of regular doses of testosterone was also shown to cause the forgetting of known songs in the same birds. The implications of Nottebohm's work include support for state-dependent memory. Since a specific memory function can be modulated by a specific hormone, other memory systems may also have recall dependent upon some hormone or other chemically mediated process. Given that recall and learning can be so modulated, the necessity of taking state dependencies (levels of hormones and other neuroactive substances) into account for system function can be appreciated. State-dependent memory is also supported by the work of many other investigators, such as Bower (1981). In Bower's experiments, happy or sad moods were induced in subjects by hypnotic suggestion, in order to investigate the influence of emotions on memory and thinking. This influence was profound; for example, people recalled a greater percentage of those experiences which had taken place when they were in the same mood as they were in during recall.
Also, when the feeling tone of a story agreed with a reader's hypnotically generated emotion, the reader found the events and characters in the story more memorable and easier to identify with. State dependence of memory or other neural function can give rise to quite useful modelling constructs. The ability to recast problem solutions given functional states becomes biologically justifiable. The logical power of conditional activation of entire subnetworks becomes available through the modelling, however coarse, of these state dependencies. The almost direct implementation of expert system analogues which can be analyzed in a completely biological and ANN context is made possible. Implications include the higher-order integration of functionally changed sub-units over time. In humans, well known state dependencies include the fight-or-flight response to norepinephrine production and the diving response typical of mammals entering cold water. Without the appropriate integrative control, neither the fight-or-flight response nor the diving response would produce the desired, or selected, effect. The coordination of separate functional neural "circuits" is clearly present; the exact mechanism remains to be elucidated, but there have been some promising beginnings in neural network models. For example, in the neural model of attention described by Grossberg (1975), there is competition between nodes representing activations of different drives (hunger, thirst, sex, etc.). The winner of this competition is not determined solely by which drive is highest, but also by the availability of compatible cues in the environment. State-dependent memory implies the existence of functional changes over time in cortical structures. Since we now have evidence of multi-modal neural circuitry, at least some consideration should be given to the implied necessary integration.
The problem of understanding a system which is dynamic not only in its processing of input but also in its functional neural subsystems is both daunting and exhilarating. It is daunting because the complexities of modelling such systems exceed our current capacity for ready assimilation and understanding of the underlying concepts and mechanisms (which have not yet been elucidated), and exhilarating because there appears to be no end to the variety of expression of these systems in the natural world, and thus no apparent end to the problem-solving challenge awaiting the researcher. The function of speech processing in humans, for example, requires the acquisition of external signals, the separation of those signals into semantic and affective content, the recognition of mode in affective content, the parsing of semantic content, and the integration of semantic and affective content to determine meaning. This list of subfunctions is not complete, which gives an indication of the extent to which integration remains a regular and important activity in biological systems.

The Triune Brain Theory

Integrative theories of neural/cognitive function have a long history. One of the best known is the triune brain theory of Paul MacLean (1970). MacLean's research into behavioral studies of different brain areas led him to propose that the human brain is divided into three developmentally derived regions of separate function (see Levine 1990 for discussion). The earliest, and presumably the most primitive, region is termed the reptilian brain, and is composed of the brainstem and basal ganglia. The reptilian brain is responsible, in this theory, for preprogrammed, innate behaviors. The paleomammalian brain, composed of the limbic system, modifies the expressed pattern of reptilian brain responses and is the source, in this theory, of the basic emotions (love, hate, fear, arousal, etc.).
The neomammalian brain, composed of the newer parts of the cerebral cortex, provides further modifications of the expression of the two older brain areas, and gives us our rational capacity, seen in the ability for planning and verbal expression. While MacLean's theory is oversimplified, it does provide a useful set of distinctions between various cognitive subfunctions, all of which are involved in complex behaviors. In fact, if one stretches the imagination, one can draw analogies between the reptilian brain and our Hopfield-Tank network, the paleomammalian brain and our ART 1 network, and the neomammalian brain and our back-propagation network. The integrated ANN note generator had its origins in a collaborative effort to develop an extensive ANN system suitable for exploring multi-modal cognitive hypotheses (Blackwood, Elsberry, and Leven 1988). That project, in turn, was derived from insights provided by Leven (1987b). Leven's SAM model was depicted in a manner which led to a discussion of the possibility of replacing the components of MacLean's triune brain model with current ANN architectures. The difficulty of describing a suitably restricted problem for adequate application of the limited current architectures was resolved with the simple melodic composition problem outlined previously. Points of difference from the original MacLean theory can be attributed to certain changes in model context (as modified by Leven's separation of memory into three components: motoric/instinctive [reptilian], sensory/affective [paleomammalian], and associative/semantic [neomammalian] [Leven 1987a]) and to the mismatch between architectures derived not for their similarity to these basic cognitive forms, but to satisfy more immediate criteria such as implementability in current electronic devices. The desired system based on our loose analogy to MacLean's theory has been demonstrated to be operational and ready for incremental refinement.
We hope that in future work such analogies can be made more precise. The development of both neural network theory and neuroscientific data should allow the critical research into these theories of integrative cognitive function to continue.

APPENDIX 1

SAMPLE MELODY OUTPUT OF THE VARIOUS NOTE GENERATOR PROGRAMS

Integrated ANN Note Generator Sample Output

[b61t output, pages 1-3]

Random Note Generator Sample Output

[random output, pages 1-3]

Classical Note Generator Sample Output

[classical output, pages 1-3]

APPENDIX 2

PROGRAM SOURCE LISTING: INTEGRATED ANN NOTE GENERATOR

PROGRAM Note_Generator (INPUT,OUTPUT);

{ This program demonstrates a small application which uses three major
  neural network models to advantage in a cooperative, integrated
  manner. The problem is that of generating musical notes in a
  well-formed, but not repetitious manner. The scale of notes is
  limited, and only one note is generated per time interval. }

{ Copyright 1989 by Wesley R. Elsberry. All rights reserved.
  Commercial use of this software is prohibited without written
  consent of the author. For information, bug reports, and updates
  contact Wesley R.
Elsberry 528 Chambers Creek Drive South Everman, Texas 76140 Telephone: (817) 551-7018 } USES DOS, {Turbo Pascal MS-DOS functions} CRT, {Turbo Pascal IBM-PC screen and keyboard functions} ANN, {Artificial Neural Network functions} Struct, {Linked-list structure functions} Misc1, {Miscellaneous functions} ANSI_Z, {ANSI screen control functions} BP_unit, {Back-propagation network functions} Globals, {Global types and constants} ClasInst; {Classical Instructor function} VAR inf : TEXT; {Input file handle} outf : TEXT; {Output file handle} main_ii, {Loop variable} main_jj : INTEGER; {Loop variable} main_done : BOOLEAN; {Loop variable} main_time : Time_rec_; {Record for time info} Time1, Time2 : Time_rec_; Testr : REAL; Inchar : char; note_rec : note_record_; ii : INTEGER; snet : BP_net_; 81 {For Beethoven (ART 1)} F1 : F1_layer_ptr_; F2 : F2_layer_ptr_; {End VAR declarations} {---------------------------------------------------------} PROCEDURE Wait; {} BEGIN {} { Writeln('Press a key...');} { READ(inchar);} END; {} {----------------------------------------------------------} PROCEDURE report_notes (VAR cmn : Common_area_); TYPE outstr_ = STRING[10]; VAR ii : INTEGER; raoutf : FILE OF CHAR; raoutfname : STRING; och : CHAR; BEGIN {open note file} {skip to end} {write note} {close} raoutfname := FSEARCH('BEETHOVN.MUS',GETENV('PATH')); IF raoutfname <> '' THEN BEGIN Assign(raoutf,raoutfname); Reset(raoutf); END ELSE BEGIN Assign(raoutf,'BEETHOVN.MUS'); Rewrite(raoutf); END; Seek (raoutf, FileSize(raoutf)); {go to end of file} IF note_rec.c = 2 THEN BEGIN och := ascii_cr; Write(raoutf,och); och := ascii_lf; 82 Write(raoutf,och); och := ascii_asterisk; Write(raoutf,och); och := ascii_equal; Write(raoutf,och); och := ascii_asterisk; Write(raoutf,och); och := ascii_cr; Write(raoutf,och); och := ascii_lf; Write(raoutf,och); END; och := Chr(note_rec.n[note_rec.c-1]+48); Write(raoutf,och); och := ascii_cr; Write(raoutf,och); och := ascii_lf; Write(raoutf,och); IF note_rec.c = 
153 THEN BEGIN och := ascii_cr; Write(raoutf,och); och := ascii_lf; Write(raoutf,och); och := ascii_asterisk; Write(raoutf,och); och := ascii_equal; Write(raoutf,och); och := ascii_asterisk; Write(raoutf,och); och := ascii_cr; Write(raoutf,och); och := ascii_lf; Write(raoutf,och); END; Close(raoutf); ANSI_CUP(2,0); Write('Notes generated: '); ANSI_CUP(2,25); Write((note_rec.c-1):3); END; PROCEDURE record_a_note (VAR cmn : Common_area_); CONST Init : BOOLEAN = FALSE; c : INTEGER = 1; TYPE outstr_ = STRING[10]; 83 VAR ii : INTEGER; raoutf : FILE OF CHAR; raoutfname : STRING; och : CHAR; BEGIN IF NOT init THEN BEGIN FillChar(note_rec.n,SizeOf(note_rec.n),#0); note_rec.c := 1; Init := TRUE; END; FOR ii := 1 TO V_len_out-1 DO BEGIN {} cmn.notes[ii] := cmn.notes[ii+1]; END; {} cmn.notes[V_len_out] := 0; note_rec.n[note_rec.c] := cmn.notes[v_len_out-1]; INC(note_rec.c); report_notes(cmn); END; PROCEDURE play_a_note(VAR cn : BYTE); BEGIN CASE cn OF 1 : Sound(n_c_mid); 2 : Sound(n_d); 3 : Sound(n_e); 4 : Sound(n_f); 5 : Sound(n_g); 6 : Sound(n_a); 7 : Sound(n_b); 8 : Sound(n_c_hi); ELSE NoSound; END; Delay(180); NoSound; Delay(55); END; PROCEDURE play_notes (nr : note_record_); VAR ii : INTEGER; BEGIN ii := 1; 84 FOR ii := 1 TO nr.c DO play_a_note ( nr.n[ii]); END; PROCEDURE {Change_global_factors} user_keys; CONST Initialized : BOOLEAN = FALSE; VAR inch : CHAR; instr : STRING; tempr : REAL; err : INTEGER; PROCEDURE display_global_factors; BEGIN ANSI_CUP(13,12); Write('*res: ',HTN_co_res:5:4); ANSI_CUP(14,12); Write('*cap: ',HTN_co_cap:5:4); ANSI_CUP(15,12); Write(' *wt: ',HTN_co_wt:5:4); ANSI_CUP(16,12); Write('*inp: ',HTN_co_inp:5:4); ANSI_CUP(17,12); Write('epsi: ',epsilon:5:4); ANSI_CUP(18,12); Write('iter: ',HTN_co_iter:5:4); ANSI_CUP(9,55); Write('*Vigilance: ',ART_co_vigilance:5:4); ANSI_CUP(23,0); END; BEGIN IF NOT Initialized THEN BEGIN ANSI_CUP(23,0); Write ('Type "C" to change factors, "P" to play notes so far.'); display_global_factors; ANSI_CUP(23,0); 
Initialized := TRUE; END; {IF NOT Initialized} IF check_kbd_status THEN BEGIN IF dir_console_IO(inch) THEN BEGIN inch := UpCase(inch); 85 IF inch = 'C' THEN BEGIN ANSI_CUP(21,0); Write( 'Change: 1)epsi 2)*res 3)*cap 4)*wt 5)*inp 6)iter 7)*vigilance' ); REPEAT WHILE (NOT dir_console_IO(inch)) DO ; UNTIL (inch IN ['1','2','3','4','5','6','7']); REPEAT ANSI_CUP(21,0); ANSI_EEOL; ANSI_CUP(21,0); Write('Input value: '); Readln(instr); Val(instr,tempr,ii); UNTIL (ii = 0); CASE inch OF '1' : epsilon := tempr; '2' : HTN_co_res := tempr; '3' : HTN_co_cap := tempr; '4' : HTN_co_wt := tempr; '5' : HTN_co_inp := tempr; '6' : HTN_co_iter := tempr; '7' : ART_co_vigilance := tempr; END; ANSI_CUP(22,0); ANSI_EEOL; display_global_factors; ANSI_CUP(23,0); END ELSE IF inch = 'P' THEN BEGIN play_notes(note_rec); END; {Else if inch} END; END; END; {----------------------------------------------------------} PROCEDURE Bach(VAR cmn : Common_Area_); {Generates a new note from past sequence and frequency information. Uses a Hopfield-Tank network to accomplish this task. } {INPUTS: Sequence of notes, 4 notes long OUTPUT: Single note, valued from 1 to 8 } {Hopfield-Tank network. 
Given data and input values, processes for output.} 86 PROCEDURE HTN(VAR cma : Common_Area_); CONST Initialized : BOOLEAN = FALSE; TYPE W_A_ptr_ = ^Weight_Array_; weight_array_ = ARRAY[1..64,1..64] OF REAL; file_string_ = STRING[127]; neuron_ = RECORD a : REAL; {activation value} r : REAL; {resistance} c : REAL; {capacitance} o : REAL; {output} i : REAL; {input} END; neuron_array_ = ARRAY[1..v_len_in,1..v_len_out] OF neuron_; note_array_ = ARRAY[1..5] OF INTEGER; CONST WA : W_A_ptr_ = NIL; VAR inf : FILE OF weight_array_; Time_step : INTEGER; ns : ARRAY[0..1] OF neuron_array_; ii, jj, kk : INTEGER; nbase, nindex : INTEGER; FUNCTION Neuron_Output(act,cap :REAL):REAL; BEGIN neuron_output := 0.5 *(1 + tanh(act/cap)); END; FUNCTION max_cell_in_column(col : INTEGER):INTEGER; VAR ii, jj : INTEGER; Hi : REAL; nsptr : INTEGER; BEGIN {max_cell_in_column} Hi := 0.0; nsptr := time_step MOD 2; FOR ii := 1 TO v_len_in DO BEGIN IF (ns[nsptr,ii,col].o > Hi) THEN BEGIN Hi := ns[nsptr,ii,col].o; jj := ii; END; END; 87 max_cell_in_column := jj; END; {max_cell_in_column} FUNCTION done(epsi : REAL):BOOLEAN; VAR finish : BOOLEAN; ii, jj : INTEGER; t : REAL; BEGIN finish := FALSE; ii := 1; WHILE (NOT finish) AND (ii <= V_len_in) DO BEGIN FOR jj := 1 TO 5 DO BEGIN IF ABS(ns[0,ii,jj].o - ns[1,ii,jj].o) > epsi THEN finish := TRUE; END; ii := ii + 1; END; done := NOT finish; END; FUNCTION Convert_to_weight_coord(note,posit : INTEGER):INTEGER; BEGIN {Convert_to_weight_coord} Convert_to_weight_coord := (v_len_in*(posit-1)+note); END; {Convert_to_weight_coord} FUNCTION delta_neuron_activation(a,r,i : REAL; note,posit : INTEGER):REAL; VAR ii, jj : INTEGER; tempr1, tempr2 : REAL; sum : REAL; di1,di2 : INTEGER; current : INTEGER; BEGIN {delta_neuron_activation} current := time_step MOD 2; sum := 0.0; di1 := Convert_to_weight_coord(note,posit); FOR ii := 1 TO v_len_in DO FOR jj := 1 TO v_len_out DO BEGIN di2 := Convert_to_weight_coord(ii,jj); sum := sum + (WA^[di1,di2] * HTN_co_wt) * 
ns[current,ii,jj].o; END; 88 delta_neuron_activation := (-(ns[current,note,posit].a / (ns[current,note,posit].r * HTN_co_res)) + (ns[current,note,posit].i * HTN_co_inp) + sum) / (ns[current,note,posit].c * HTN_co_cap); END; {delta_neuron_activation} FUNCTION iterate_htn(VAR nts : notes_):INTEGER; VAR II, JJ, next_time : INTEGER; PROCEDURE display_neuron_activation; CONST column = 0; row = 10; VAR ii, jj : INTEGER; ts : INTEGER; active : INTEGER; BEGIN {display_neuron_activation} ts := time_step; FOR ii := 1 TO v_len_in DO BEGIN {FOR ii} ANSI_CUP(row+ii,0); FOR jj := 1 TO v_len_out DO BEGIN {FOR jj} active := Round(ns[ts,ii,jj].o * 10); IF active > 10 THEN active := 10; IF active < 1 THEN active := 1; Write(Copy(graphic_string,active,1),' '); END; {FOR jj} END; {FOR ii} ANSI_CUP(23,0); END; {display_neuron_activation} PROCEDURE update_neuron_output; VAR ii, jj : INTEGER; BEGIN FOR ii := 1 TO v_len_in DO BEGIN {FOR ii} FOR jj := 1 TO v_len_out DO BEGIN {FOR jj} ns[time_step,ii,jj].o := 89 neuron_output(ns[time_step,ii,jj].a, ns[time_step,ii,jj].c); END; {FOR jj} END; {FOR ii} END; BEGIN {iterate_HTn} time_step := 0; {initialize the neuron arrays} FOR ii := 1 TO 8 DO FOR jj := 1 TO 5 DO BEGIN ns[0,ii,jj].a := 0.5; ns[0,ii,jj].o := 0.0; ns[1,ii,jj].o := 0.0; IF (nts[jj] = 0) THEN BEGIN {randomize input} ns[0,ii,jj].i := gaussian_noise(0.5,0.25); ns[1,ii,jj].i := ns[0,ii,jj].i; END ELSE BEGIN IF (nts[jj] = ii) THEN BEGIN ns[0,ii,jj].i := 0.67 + gaussian_noise(0.0,0.1); ns[1,ii,jj].i := ns[0,ii,jj].i; END ELSE BEGIN ns[0,ii,jj].i := 0.33 + gaussian_noise(0.0,0.1); ns[1,ii,jj].i := ns[0,ii,jj].i; END; END; END; {prevent premature end} ns[1,1,1].o := 20; update_neuron_output; WHILE (NOT done(epsilon)) DO BEGIN user_keys; time_step := time_step MOD 2; next_time := (time_step + 1) MOD 2; {determine output term for neurons} update_neuron_output; FOR ii := 1 TO v_len_in DO BEGIN {FOR ii} FOR jj := 1 TO v_len_out DO BEGIN {FOR jj} ns[next_time,ii,jj].a := ns[time_step,ii,jj].a 
+ HTN_co_iter * delta_neuron_activation(ns[time_step, 90 ii,jj].a, ns[time_step,ii,jj].r, ns[time_step,ii,jj].i,ii,jj); END; {FOR jj} END; {FOR ii} display_neuron_activation; time_step := time_step + 1; END; {WHILE} {**Change of note: Finding the highest activity in each column and setting it for further processing will change the behavior of the net and program. The change is being made to improve performance of the ART1 network.**} FOR ii := 1 TO v_len_out DO BEGIN nts[ii] := max_cell_in_column(ii); END; iterate_htn := nts[v_len_out]; END; {Iterate_HTn} BEGIN {HTn} { Initialized := False;} IF (NOT Initialized) THEN{do initialize} BEGIN {get weight matrix} New(WA); Assign(inf,'htn.dat'); Reset(inf); Read(inf,WA^); Close(inf); Initialized := TRUE; ANSI_CUP(8,0); Write('HTN:'); ANSI_CUP(9,0); Write('Node activity'); ANSI_CUP(11,12); Write('R: ',global_resistance:5:3); ANSI_CUP(12,12); Write('C: ',global_capacitance:5:3); ANSI_CUP(23,0); END; FOR ii := 1 TO 8 DO FOR jj := 1 TO 5 DO BEGIN ns[0,ii,jj].r := global_resistance; ns[0,ii,jj].c := global_capacitance; ns[1,ii,jj].r := global_resistance; ns[1,ii,jj].c := global_capacitance; END; FOR ii := 1 TO v_len_out DO {clear notes} cma.notes[ii] := 0; 91 nbase := note_rec.c - v_len_out; FOR nindex := 1 TO (v_len_out - 1) DO BEGIN IF ((nbase + nindex) > 0) THEN cma.notes[nindex] := note_rec.n[(nbase+nindex)]; END; {For} cma.Candidate_note := iterate_htn(cma.notes); cma.notes[v_len_out] := cma.candidate_note; END; {HTn} BEGIN {Bach} Dump_Common(cmn); HTn(cmn); END; {Bach} {----------------------------------------------------------} PROCEDURE Salieri(VAR cmn : Common_Area_); {Compares past information and proposed note generated by Bach with rules of classical composition. 
A PDP network is used to do this.} PROCEDURE Back_propagation(VAR cmn : Common_Area_); {A PDP style back propagation network.} CONST Initialized : BOOLEAN = FALSE; count : WORD = 1; TYPE bpnp_ = BP_node_ptr_; wnp_ = weight_node_ptr_; vnp_ = vector_node_ptr_; VAR ii, jj, kk : INTEGER; Done : BOOLEAN; tp1 : DVE_ptr_; error_m, tne, sum : ARRAY[1..3] OF REAL; ss : STRING; binsum : ARRAY[1..3] OF INTEGER; FUNCTION max (r1, r2 :REAL):REAL; BEGIN IF r1 >= r2 THEN max := r1 ELSE max := r2; END; PROCEDURE Set_input_vector_from_notes (vp : DVE_ptr_; 92 n : notes_); VAR ii : INTEGER; vpt : DVE_ptr_; vn : ARRAY[1..40] OF INTEGER; BEGIN FillChar (vn,SizeOf(vn),#0); {Blank the current vector} FOR ii := 1 TO 5 DO BEGIN {Notes subscript} IF n[ii] > 0 THEN vn [((ii-1)*8)+n[ii]] := 1; END; {For notes subscript} vpt := vp; FOR ii := 1 TO snet.n_input DO BEGIN vnp_(vpt^.dptr)^.v := vn[ii]; vpt := vpt^.right; END; {FOR ii} END; BEGIN IF NOT Initialized THEN BEGIN snet.data_fname := 's61.dat'; ANSI_CUP(15,40); Setup_BP_net (snet,snet.data_fname); Write(snet.data_fname); ANSI_CUP(16,40); Set_BP_net_weights_from_file(snet,snet.wt_fname); Write(snet.wt_fname); Initialized := TRUE; END; {now set up current input vector} Set_input_vector_from_notes(snet.vi,cmn.notes); {get the supervisor's critique for the current sequence} vnp_(snet.vts^.dptr)^.v := Classical_instructor (cmn.notes); {Feed-forward, back-propagate, and make changes} BP_train_and_change (snet); {determine response of the net to the current sequence} cmn.Is_classical := (vnp_(snet.vos^.dptr)^.v > 0.50); 93 {keep weight changes that have been made now and then} IF (count MOD 49) = 0 THEN Dump_BP_net_weights(snet,snet.wt_fname); INC(count); END; BEGIN {Salieri} Dump_common(cmn); Back_propagation(cmn); END; {Salieri} {----------------------------------------------------------} PROCEDURE Beethoven(VAR cmn : Common_Area_); {} PROCEDURE ART1(VAR cma : Common_Area_); {Binary Adaptive Resonance Theory model} { Carpenter and Grossberg's 
ART1 paradigm Copyright 1988, W.R. Elsberry & D.J. Blackwood Written in Turbo Pascal 5.5, September 1988 } { Terminology: STM = Short Term Memory LTM = Long Term Memory TD = Top-down BU = Bottom-up F1 layer = a vector of nodes which receive input from three sources: External input, a binary vector Gain control, an internal processing input F2 layer, activation of category nodes through LTM traces F2 layer = a set of nodes which have LTM memory traces associated with them that relate to the BU activations to particular nodes in the F2 level. Inputs: F1 activation through BU LTM traces, Gain Control, System Reset } { Parameter constraints from Carpenter & Grossberg, 1987 94 A >= 0 mu1, mu2 >= 0 (Simpson) C >= 0 epsilon1, epsilon2 >= 0 (Simpson) MAX(1,D) < B < 1 + D MAX(1,gamma1) < sigma1 < (1 + gamma1) (Simpson) 0 < e << 1 K = O(1) L > 1 0 < p <= 1 0 < Vigilance <= 1 (Simpson) 0 < Z_IJ(0) < (L / (L-1+M)) 0 < Wup(0) < (L / (L - 1 + Max_F1_nodes)) (Simpson) 1 >= Z_JI(0) > Z_BAR == ((B-1)/D) 1 >= Wdn(0) > ? 
== ((sigma2-1)/gamma2) (Simpson) 0 <= I,f,g,h <= 1 } CONST Initialized : BOOLEAN = FALSE; Initial_Wup = 0.1; Initial_Wdn = 0.9; Number_committed_F2 : INTEGER = 0; Vigilance : REAL = 0.9; {Determines what level of mismatch will cause reset} Time_slice : REAL = 0.1; {Factor to multiply deltas by} {The following are part of the F1 STM recall equation} mu1 : REAL = 1; {Positive constant controlling BU input & TD feedback} sigma1 : REAL = 1.4; {Positive constant regulating gain control} epsilon1 : REAL = 1; {Positive constant regulating gain control} gamma1 : REAL = 1; {Positive constant regulating TD and BU feedback} {The following are part of the F2 STM recall equation} mu2 : REAL = 1; {Positive constant controlling BU input & TD feedback} sigma2 : REAL = 1.4; {Positive constant regulating gain control} epsilon2 : REAL = 1; {Positive constant regulating gain control} 95 gamma2 : REAL = 1; {Positive constant regulating BU input} {The following are part of the Bottom-Up LTM equation} alpha1 : REAL = 1; {Positive constant for learning rate} beta1 : REAL = 1; {Positive constant for passive decay} {The following are part of the Top-Down LTM equation} alpha2 : REAL = 1; {Positive constant for learning rate} beta2 : REAL = 1; {Positive constant for passive decay} VAR i, j : INTEGER; F2_winner : INTEGER; {Index of winning F2 node} Input_on : BOOLEAN; {Is input currently being received?} Resonance : BOOLEAN; {The following are part of the match operation equation} Vector_I : Vector_; {Binary input vector} Vector_X : Vector_; {Binary expected vector} PROCEDURE Build_Expected_Vector; {} CONST low = 0.0; high = 1.0; {thresh = 0.5;} VAR ii, jj : INTEGER; thresh : REAL; BEGIN {Build_Expected_Vector} thresh := 0.0; WITH F2^[F2_winner] DO BEGIN {with} FOR ii := 1 TO Max_F1_nodes DO BEGIN {} thresh := thresh + Wdn[ii]; END; {} thresh := thresh / Vector_length; FOR ii := 1 TO Max_F1_nodes DO BEGIN Vector_X[ii] := BYTE(Round (Threshold(low, high, thresh, Wdn[ii]) )); END; END; {with} 
END; {Build_Expected_Vector} 96 PROCEDURE Build_Input_Vector; {} VAR ii, jj : INTEGER; FUNCTION One_if_NZ(This, That : INTEGER) :INTEGER; {} VAR ii : INTEGER; BEGIN {} ii := This AND That; IF (ii <> 0) THEN {} BEGIN ii := 1; END; One_if_NZ := ii; END; {} BEGIN {Build_Input_Vector} FillChar (vector_i,SizeOf(vector_i),#0); {Blank the current vector} FOR ii := 1 TO 5 DO BEGIN {Notes subscript} IF cma.notes[ii] > 0 THEN vector_i [((ii-1)*8)+cma.notes[ii]] := 1; END; {For notes subscript} IF (cma.Is_classical) THEN {} BEGIN Vector_I[Max_F1_nodes] := 1; END ELSE {} BEGIN Vector_I[Max_F1_nodes] := 0; END; END; {Build_Input_Vector} PROCEDURE Display_vectors; {} VAR vii : INTEGER; BEGIN {} 97 ANSI_CUP(8,28); Write('ART1: # Committed F2: ',Number_committed_F2, ' Vigilance: ',vigilance:5:4); ANSI_CUP(9,37); Write('F2 Winner : ',F2_winner:3); ANSI_CUP(10,28); Write('Expected Vector vs. Input Vector'); ANSI_CUP(12,28); Write('IV: '); FOR vii := 1 TO vector_length DO BEGIN {} IF ((vii MOD 8) = 1) AND (vii <> 1) THEN Write(' '); Write(Vector_I[vii]); END; {} ANSI_CUP(11,28); Build_Expected_Vector; Write('EV: '); FOR vii := 1 TO vector_length DO BEGIN {} IF ((vii MOD 8) = 1) AND (vii <> 1) THEN Write(' '); Write(Vector_X[vii]); END; {} END; {} PROCEDURE Clear_ART1; {} VAR ii, jj : INTEGER; BEGIN {Clear_ART1} FOR ii := 1 TO Max_F2_nodes DO BEGIN {Clear F2 node parameters} WITH F2^[ii] DO BEGIN Eligible := TRUE; Curr_B := 0; Last_B := 0; END; END; {Clear F2 node parameters} FOR ii := 1 TO Max_F1_nodes DO BEGIN {Clear F1 node parameters} WITH F1^[ii] DO BEGIN Curr_A := 0; Last_A := 0; END; END; {Clear F1 node parameters} F2_winner := 0; END; {Clear_ART1} PROCEDURE Initialize_ART1; 98 {} VAR ii, jj, kk : INTEGER; BEGIN {Initialize_ART1} Number_committed_F2 := 0; New(F2); New(F1); FOR kk := 1 TO Max_F2_nodes DO BEGIN {FOR} WITH F2^[kk] DO BEGIN {WITH} Committed := FALSE; FOR ii := 1 TO Vector_Length DO BEGIN {For} Wup[ii] := Initial_Wup; Wdn[ii] := Initial_Wdn; Last_B := 0.0; Curr_B := 
0.0; END; {For} END; {WITH} END; {FOR} FOR kk := 1 TO Max_F1_nodes DO BEGIN {} WITH F1^[kk] DO BEGIN {} Last_A := 0.0; Curr_A := 0.0; END; {} END; {} END; {Initialize_ART1} FUNCTION Delta_STM_F1_node(nde : INTEGER):REAL; { Simpson (1988) Eq. 19 a_dot[nde] = - a[nde] + (1 - mu1 * a[nde]) * (gamma1 * F2[f2_winner].wdn[nde] + Input[nde]) - (sigma1 + epsilon1 * a(nde) * (1 if there is a winner) (0 otherwise) t1 = - a[nde] t2 = + (1 - mu1 * a[nde]) t3 = (gamma1 * F2[f2_winner].wdn[nde] + Input[nde]) t4 = (sigma1 + epsilon1 * a(nde) t5 = (1 if there is a winner) (0 otherwise) 99 so, Delta_STM_F1_node := t1 + t2*t3 - t4*t5; } VAR ii, jj : INTEGER; t1, t2, t3, t4, t5 : REAL; BEGIN {Delta_STM_F1_node} t1 := 0; t2 := 0; t3 := 0; t4 := 0; t5 := 0; WITH F1^[nde] DO BEGIN t1 := - Last_A; t2 := (1 - mu1*Last_A); IF (F2_winner <> 0) THEN {} BEGIN {Make TD term} t3 := F2^[F2_winner].Wdn[nde]; END; t3 := (gamma1*t3 + Vector_I[nde]); t4 := (sigma1 + epsilon1*Last_A); IF (F2_winner > 0) THEN {} BEGIN t5 := 1; END ELSE {} BEGIN t5 := 0; END; END; Delta_STM_F1_node := t1 + t2*t3 - t4*t5; END; {Delta_STM_F1_node} FUNCTION Delta_STM_F2_node(nde : INTEGER):REAL; { Simpson (1988) Eq. 
20 b_dot[nde] = - b[nde] + (1 - mu2 * b[nde])b* (gamma2 * [sum over i of S(a[i] * F2[i].wup[nde] ] + f(b[nde]) - (sigma2 + epsilon2 * b(nde) * [sum over k<>j of S(b[k]) ] 100 where, t1 = - b[nde] t2 = (1 - mu2 * b[nde]) t3 = (gamma2 * [sum over i of S(a[i] * F2[i].wup[nde] ] + f(b[nde]) t4 = (sigma2 + epsilon2 * b(nde) t5 = [sum over k<>j of S(b[k]) ] Delta_STM_F2_node := t1 + t2*t3 - t4*t5; } CONST range = 1; slope_mod = 1; shift = 0; VAR ii, jj : INTEGER; t1, t2, t3, t4, t5 : REAL; BEGIN {Delta_STM_F2_node} t1 := 0; t2 := 0; t3 := 0; t4 := 0; t5 := 0; WITH F2^[nde] DO BEGIN t1 := - Last_B; t2 := (1 - mu2*Last_B); FOR ii := 1 TO Max_F1_nodes DO BEGIN {Make TD term} t3 := t3 + sigmoid(1,1,0,F1^[ii].Curr_A)*Wup[ii]; END; {Make TD term} IF (nde = F2_winner) THEN{} BEGIN jj := 1; END ELSE {} BEGIN jj := 0; END; t3 := (gamma2*t3 + jj); t4 := (sigma2 + epsilon2*Last_B); FOR ii := 1 TO number_Committed_F2 DO BEGIN {for} IF (Eligible) THEN BEGIN IF (ii <> nde) THEN t5 := t5 + Sigmoid(range, slope_mod, shift, 101 F2^[ii].Last_B); END; END; {for} END; Delta_STM_F2_node := t1 + t2*t3 - t4*t5; END; {Delta_STM_F2_node} FUNCTION Delta_LTM_Bottom_Up(F2_nde, F1_nde : INTEGER):REAL; { Simpson (1988) Eq. 16 wup_dot(ij) = alpha1 * f(b[j]) * (-beta1 * wup[ij] + S(a[i]) ) This corrects an error in the text! } CONST range = 1; slope_mod = 1; shift = 0; VAR ii, jj : INTEGER; t1, t2, t3,t4, t5 : REAL; BEGIN {Delta_LTM_Bottom_Up} IF (F2_winner = F2_nde) THEN{} BEGIN WITH F2^[F2_nde] DO BEGIN{} Delta_LTM_Bottom_Up := alpha1 * ( -beta1 * Wup[F1_nde] + Sigmoid (range, slope_mod, shift, F1^[F1_nde].curr_A) ); END; END ELSE BEGIN Delta_LTM_Bottom_Up := 0; END; END; {Delta_LTM_Bottom_Up} FUNCTION Delta_LTM_Top_Down(F2_nde, F1_nde : INTEGER):REAL; { Simpson (1988) Eq. 17 wdn_dot(ji) = alpha2 * f(b[j]) * (-beta2 * wdn[ji] + S(a[i]) ) This also corrects an error in the text! 
} CONST range = 1; slope_mod = 1; 102 shift = 0; VAR ii, jj : INTEGER; t1, t2, t3,t4, t5 : REAL; BEGIN {Delta_LTM_Top_Down} IF (F2_winner = F2_nde) THEN{} BEGIN WITH F2^[F2_nde] DO BEGIN{} Delta_LTM_Top_Down := alpha2 * ( -beta2 * Wdn[F1_nde] + Sigmoid( range, slope_mod, shift, F1^[F1_nde].curr_A) ); END; {} END ELSE BEGIN Delta_LTM_Top_Down := 0; END; END; {Delta_LTM_Top_Down} FUNCTION Raw_match:INTEGER; { Result of bitwise AND of Vector_I and Vector_X } VAR ii, jj : INTEGER; BEGIN {Raw_match} jj := 0; FOR ii := 1 TO Vector_length DO BEGIN {} IF (Vector_I[ii] = 1) AND (Vector_X[ii] = 1) THEN {} BEGIN jj := jj + 1; END ELSE {} BEGIN END; END; {} Raw_match := jj; END; {Raw_match} FUNCTION Ones_in_Vector_I:REAL; {} VAR ii, jj : INTEGER; BEGIN {Ones_in_Vector_I} 103 jj := 0; FOR ii := 1 TO Vector_Length DO BEGIN {} jj := jj + Vector_I[ii]; END; {} Ones_in_Vector_I := jj; END; {Ones_in_Vector_I} FUNCTION Match:BOOLEAN; { Return TRUE if Match between I and X exceeds vigilance } VAR RM, OVI : REAL; BEGIN {Match} RM := Raw_Match; OVI := Ones_in_Vector_I; IF (OVI = 0) THEN {} BEGIN IF (RM > 0) THEN {} BEGIN Match := TRUE; END ELSE {} BEGIN Match := FALSE; END; END ELSE {} BEGIN Match := ((RM/OVI) >= (Vigilance * ART_co_vigilance)); END; END; {Match} FUNCTION Saturated:BOOLEAN; {} BEGIN {Saturated} Saturated := (Number_committed_F2 >= Max_F2_nodes); END; {Saturated} FUNCTION Find_F2_winner : INTEGER; {} VAR ii, jj : INTEGER; max_value : REAL; Max_position : INTEGER; 104 Min_num : INTEGER; BEGIN {Find_F2_winner} {Find the maximum activation} max_value := -1.0E38; MAX_POSITION := 1; IF (Number_Committed_F2 < Max_F2_nodes) THEN {} BEGIN Min_num := Number_Committed_F2; END ELSE {} BEGIN Min_num := Max_F2_nodes; END; FOR jj := 1 TO Min_Num DO BEGIN {FOR} WITH F2^[jj] DO BEGIN {WITH} IF (Eligible) AND (Committed) THEN BEGIN {Eligible AND Committed} IF (Curr_B > max_value) THEN BEGIN {Current > Max} max_value := Curr_B; max_position := jj; END; {Current > Max} END; {Eligible AND 
Committed} END; {WITH} END; {FOR} IF (Number_committed_F2 > 0) THEN {} BEGIN Find_F2_winner := max_position; ANSI_CUP(8,28); Write ('ART1: # Committed F2: ',Number_committed_F2, ' Vigilance: ',vigilance:5:4); ANSI_CUP(9,37); Write('F2 Winner : ',max_position:3); ANSI_CUP(23,0); wait; END ELSE {} BEGIN Find_F2_winner := 0; ANSI_CUP(9,37); Write('F2 Winner : ',0:3); ANSI_CUP(23,0); END; END; {Find_F2_winner} 105 FUNCTION Changed_STM_F1(epsilon : REAL) : BOOLEAN; {} VAR ii, jj : INTEGER; Temp : BOOLEAN; rr : REAL; BEGIN {Changed_STM_F1} Temp := FALSE; FOR II := 1 TO Vector_Length DO BEGIN {For} WITH F1^[ii] DO BEGIN {With} rr := ABS(Curr_A - Last_A); IF (rr > epsilon) THEN{Changed} BEGIN Temp := TRUE; END; END; {With} END; {For} Changed_STM_F1 := Temp; END; {Changed_STM_F1} FUNCTION Changed_STM_F2 (epsilon : REAL): BOOLEAN; {} VAR ii, jj : INTEGER; Temp : BOOLEAN; rr : REAL; BEGIN {Changed_STM_F2} Temp := FALSE; IF (F2_winner > 0) AND (F2_winner <= Max_F2_nodes) THEN BEGIN WITH F2^[F2_winner] DO BEGIN {With} rr := ABS(Curr_B - Last_B); IF (rr > epsilon) THEN{Changed} BEGIN Temp := TRUE; END; END; {With} END ELSE BEGIN Temp := FALSE; END; Changed_STM_F2 := Temp; END; {Changed_STM_F2} PROCEDURE Do_resonate; 106 {} CONST Iter = 10; E = 0.05; VAR ii, jj, End_loop : INTEGER; BEGIN {Do_resonate} End_loop := 0; {While change in STM do alternate BU and TD STM and BU and TD LTM} REPEAT {} {BU STM} FOR ii := 1 TO Vector_Length DO BEGIN {For F1 STM} WITH F1^[ii] DO BEGIN {With F1} Last_A := Curr_A; Curr_A := Last_A + time_slice * Delta_STM_F1_node(ii); END; {With F1} END; {For F1 STM} {BU LTM} FOR ii := 1 TO Vector_Length DO BEGIN {For BU LTM} WITH F2^[F2_winner] DO BEGIN {With F2_winner} Wup[ii] := Wup[ii] + time_slice * Delta_LTM_Bottom_up(F2_winner,ii); END; {With F2_winner} END; {For BU LTM} { Display_vectors;} {TD STM} WITH F2^[F2_winner] DO BEGIN {} Last_B := Curr_B; Curr_B := Last_B + time_slice * Delta_STM_F2_node(F2_winner); END; {} {TD LTM} FOR ii := 1 TO Vector_Length 
DO BEGIN {For TD LTM} WITH F2^[F2_winner] DO BEGIN {With F2_winner} Wdn[ii] := Wdn[ii] + time_slice * Delta_LTM_Top_Down(F2_winner,ii); END; {With F2_winner} 107 END; {For TD LTM} { Display_vectors;} End_Loop := End_Loop + 1; UNTIL ((NOT Changed_STM_F1(e)) AND (NOT Changed_STM_F2(e))) OR (End_loop > Iter); {} Display_vectors; Resonance := TRUE; END; {Do_resonate} FUNCTION Exists_eligible : BOOLEAN; {} VAR ii, jj : INTEGER; Temp : BOOLEAN; BEGIN {Exists_eligible} Temp := FALSE; FOR jj := 1 TO Number_committed_F2 DO BEGIN {FOR} WITH F2^[jj] DO BEGIN {WITH} IF (Eligible) AND (Committed) THEN BEGIN {Eligible AND Committed} Temp := TRUE; END; {Eligible AND Committed} END; {WITH} END; {FOR} Exists_eligible := Temp; END; {Exists_eligible} PROCEDURE Encode_New_F2; {} VAR ii, jj : INTEGER; BEGIN {} IF number_committed_F2 < Max_F2_nodes THEN BEGIN {Find next uncommitted F2 node} F2_winner := Number_committed_F2 + 1; {Resonate uncommitted F2 node with F1} Do_resonate; F2^[F2_winner].Eligible := TRUE; F2^[F2_winner].Committed := TRUE; Number_committed_F2 := Number_committed_F2 + 1; END; {IF } END; {} 108 PROCEDURE Find_F1_activation; VAR i, j : INTEGER; BEGIN ANSI_CUP(14,37); Write('Find F1 Activation':30); ANSI_CUP(23,0); FOR j := 1 TO 3 DO FOR i := 1 TO Vector_Length DO BEGIN {FOR} WITH F1^[i] DO BEGIN {WITH} Last_A := Curr_A; Curr_A := Last_A + time_slice * Delta_STM_F1_Node(i); END; {WITH} END; {FOR} END; PROCEDURE Find_F2_activation; VAR i, J : INTEGER; BEGIN ANSI_CUP(14,37); Write('Find F2 Activation':30); ANSI_CUP(23,0); FOR j := 1 TO Number_committed_F2 DO BEGIN {FOR} WITH F2^[j] DO BEGIN {WITH} IF (Eligible) AND (Committed) THEN {} BEGIN Last_B := Curr_B; Curr_B := Last_B + time_slice * Delta_STM_F2_node(j); END; END; {WITH} END; {FOR} END; BEGIN {ART1} ANSI_CUP(8,28); Write('ART1:'); ANSI_CUP(14,28); Write('Process:'); ANSI_CUP(23,0); Build_Input_vector; IF (NOT Initialized) THEN BEGIN {do initialize} 109 ANSI_CUP(14,37); Write('Initialize':30); ANSI_CUP(23,0); 
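{ Editorial illustration, not part of the original program: a worked example of
  the Match test defined above. Match compares Raw_match (the count of positions
  where both Vector_I and Vector_X hold 1) against Ones_in_Vector_I. With six
  ones in the input vector (five note bits plus the Is_classical bit) and five
  of them reproduced in the expected vector, the ratio 5/6 = 0.83 falls short of
  Vigilance = 0.9 (assuming ART_co_vigilance = 1.0; its value is set elsewhere),
  so the winning F2 node is made ineligible and the search continues. }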
Initialize_ART1; Initialized := TRUE; END; {Clear_ART} ANSI_CUP(14,37); Write('Clear':30); ANSI_CUP(23,0); Clear_ART1; cma.Delta_Vigilance := FALSE; cma.New_Category := FALSE; REPEAT user_keys; Resonance := FALSE; {Find current F1 activation} {F1_i node activation * Wij} Find_F1_activation; {Send F1 activation to F2} Find_F2_activation; {If no committed F2 nodes, then proceed to new encoding} IF (NOT exists_eligible) OR (number_committed_F2 = 0) THEN {} BEGIN ANSI_CUP(14,37); Write('No comm. F2, encoding':30); ANSI_CUP(23,0); Encode_New_F2; Resonance := TRUE; cma.New_Category := TRUE; END ELSE BEGIN {F2 competition} {Determine maximum of eligible F2 nodes} ANSI_CUP(14,37); Write('F2 Competition':30); ANSI_CUP(23,0); F2_winner := Find_F2_winner; {F2 winner sends TD image back to F1} {Activation of F2 winner * Wji} Display_vectors; {Makes a call to Build_Expected_Vector} {Compare Input vector to F2 TD vector} IF (Match) THEN BEGIN {Resonate} ANSI_CUP(14,37); Write('Matched, now resonate':30); 110 ANSI_CUP(23,0); Resonance := TRUE; Do_resonate; END {Resonate} ELSE BEGIN {Mismatch} ANSI_CUP(14,37); Write('Mismatch':30); ANSI_CUP(23,0); {Make the F2 node ineligible} F2^[F2_winner].Eligible := FALSE; {Have we saturated?} {Yes, lower vigilance and restart} { * While this is not part of the Carpenter-Grossberg ART 1 architecture, this modification we felt necessary for the small number of category nodes which we are using in the model. 
* } IF (Saturated) AND (NOT exists_eligible) THEN {If saturation reached decrease vigilance and restart} BEGIN {Saturation} ANSI_CUP(14,37); Write('Saturated, vigilance--':30); ANSI_CUP(23,0); {vigilance is decreased} Vigilance := Vigilance * 0.99; ANSI_CUP(8,57); Write('Vigilance: ',vigilance:5:4); ANSI_CUP(23,0); cma.Delta_Vigilance := TRUE; {call clear_ART1} Clear_ART1; Find_F1_activation; Find_F2_activation;{now find closest match and resonate} F2_winner := Find_F2_winner; Resonance := TRUE; Do_resonate; END {Saturation} ELSE BEGIN {Not saturated or exists_eligible} {Are there eligible F2 nodes?} IF (Exists_eligible) THEN {Yes, go on with current process} BEGIN ANSI_CUP(14,37); Write('Search Eligible F2':30); ANSI_CUP(23,0); {Just continue} END ELSE BEGIN {No, form a new encoding if not saturated} 111 IF NOT saturated THEN BEGIN ANSI_CUP(14,37); Write('Encode new category':30); ANSI_CUP(23,0); Encode_New_F2; Resonance := TRUE; cma.New_Category := TRUE; END; {IF NOT saturated} END; END; {Not saturated} END; {Mismatch} END; {ELSE} UNTIL (Resonance); {Prep info to pass back} END; {ART1} {----------------------------------------------------} BEGIN {Beethoven} Dump_common(cmn); ART1(cmn); END; {Beethoven} {----------------------------------------------------------} PROCEDURE Lobes; { Keeps track of played notes, maintaining sequence information. Uses data from Beethoven to determine when to override Salieri. 
} CONST Max_notes_in_composition = 152; Object_threshold = 3; Frustration_threshold = 10; VAR Common : Common_Area_; Number_notes : INTEGER; {Note counter} Objects : BOOLEAN; Note_Played : BOOLEAN; Generate_Candidate : BOOLEAN; Need_Critique : BOOLEAN; Need_Compose : BOOLEAN; Since_Novelty, Frustration : INTEGER; ii : INTEGER; BEGIN {Lobes} 112 Randomize; note_rec.c := 0; Since_Novelty := 0; Frustration := 0; Common.notes[1] := 0; Common.notes[2] := 0; Common.notes[3] := 0; Common.notes[4] := 0; Common.notes[5] := 0; Common.delta_vigilance := FALSE; Common.new_category := FALSE; Common.candidate_note := 0; Common.is_classical := FALSE; ANSI_CUP(14,37); Write('Begin Simulation':30); ANSI_CUP(23,0); FOR Number_notes := 1 TO Max_notes_in_composition DO BEGIN {} user_keys; Note_played := FALSE; Generate_Candidate := TRUE; Need_Critique := TRUE; Need_Compose := TRUE; REPEAT IF Generate_candidate THEN {Generate a candidate note, HTn} BEGIN Bach(Common); ANSI_CUP(6,0); Write('Candidate Note:'); ANSI_CUP(6,35); Write(common.candidate_note); ANSI_CUP(23,0); END; wait; IF Need_Critique THEN BEGIN {Find if it is a candidate sequence, PDP} Salieri(Common); ANSI_CUP(5,0); Write('Candidate sequence classical?:'); ANSI_CUP(5,31); Write(common.Is_Classical); ANSI_CUP(23,0); END; wait; IF Need_Compose THEN {Pass through ART and } BEGIN 113 Beethoven(Common); END; wait; {IF Delta_vigilance or New_category, then zero the count} {Else increment the count} IF (Common.Delta_vigilance OR Common.New_Category) THEN {} BEGIN Since_Novelty := 0; END ELSE {} BEGIN Since_novelty := Since_Novelty + 1; END; IF (Common.Delta_vigilance) THEN BEGIN INC(Frustration); END; {If count >= Object_threshold), then Objects is true, reset count} {Else Objects is false} Objects := (Since_Novelty >= Object_threshold); {OR (Frustration > Frustration_Threshold);} IF Objects THEN since_novelty := 0; IF (objects AND common.is_classical) OR ((NOT objects) AND (NOT common.is_classical)) THEN BEGIN 
generate_candidate := TRUE; need_critique := TRUE; need_compose := TRUE; common.notes[v_len_out] := 0; INC(Frustration); END ELSE BEGIN record_a_note(Common); note_played := TRUE; Frustration := 0; END; UNTIL (Note_played); {A note has been played} END; {} END; {Lobes} {----------------------------------------------------------} BEGIN {Main} WRITELN('Copyright 1989 by Wesley R. Elsberry'); 114 DELAY(2000); ANSI_CLRSCR; Lobes; END. {Main} APPENDIX 3 PROGRAM SOURCE LISTING: BACK-PROPAGATION UNIT 115 116 UNIT BP_unit; { This unit implements the necessary functions for modelling back- propagation artificial neural network architectures. } { Copyright 1989 by Wesley R. Elsberry. All rights reserved. Commercial use of this software is prohibited without written consent of the author. For information, bug reports, and updates contact Wesley R. Elsberry 528 Chambers Creek Drive South Everman, Texas 76140 Telephone: (817) 551-7018 } INTERFACE USES DOS, Struct, ANN; CONST mach_inf = 1E37; exp_max = 80.0; TAB = ^I; Debug : BOOLEAN = FALSE; TYPE REAL = SINGLE; file_string_ = STRING; node_type_ = (Input,hidden,Output); weight_ptr_ = ^weight_; weight_ = RECORD w, dw : REAL; END; vector_node_ptr_ = ^vector_node_; 117 vector_node_ = RECORD V : REAL; END; sptr_ = ^STRING; data_rec_ptr_ = ^data_rec_; data_rec_ = RECORD s : sptr_; n : data_rec_ptr_; END; weight_node_ptr_ = ^weight_node_; weight_node_ = RECORD connect : BOOLEAN; w, dw, ldw : REAL; END; BP_net_ptr_ = ^BP_net_; BP_node_ptr_ = ^BP_node_; BP_net_ = RECORD vs, ve : DVE_ptr_; {node vector start and vector end} ws : DVE_ptr_; {weight array} learning_rate : REAL; alpha : REAL; {factor for momentum term} vi : DVE_ptr_; {input vector} vos, voe : DVE_ptr_; {output vector} vts, vte : DVE_ptr_; {training vector} n_input, n_hidden, n_output : WORD; maxerr : REAL; errtol : REAL; data_fname : file_string_; data_f : TEXT; training_iterations : INTEGER; out_fname : file_string_; out_f : TEXT; wt_fname : file_string_; wt_f : TEXT; END; 
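{ Editorial illustration, not part of the original unit: sizing of the BP_net_
  record above as used for the Salieri critic. The main program's
  Set_input_vector_from_notes encodes 5 note positions x 8 pitches as 40 binary
  input units (so n_input is assumed to be 40); n_hidden is read from the net's
  data file (s61.dat) rather than fixed in this listing; the single output value
  consulted is thresholded at 0.50 to set Is_classical. The weight store ws is
  addressed as an n x n matrix with n = n_input + n_hidden + n_output. }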
BP_node_ = RECORD nt : node_type_; {Input, hidden, or output} loc : WORD; ni : REAL; {net input value} delta : REAL; {delta value for node} base : REAL; range : REAL; 118 theta : REAL; dtheta, ldtheta : REAL; fw, bw : DVE_ptr_; {points to entries in weight_matrix} END; PROCEDURE Dump_BP_net_weights (VAR BPN : BP_net_; VAR Fname : STRING); {Save weights and node bias unit values to a file} PROCEDURE Set_BP_net_weights_from_file (VAR BPN : BP_net_; VAR Fname : STRING); {Restore weights and node bias unit values from a file} PROCEDURE BP_set_net_connects_from_file (VAR BPN : BP_net_; VAR Fname : STRING); {Sets network connectivity values from a file} PROCEDURE Setup_BP_net (VAR BPN : BP_net_;VAR Fname : STRING); { Get data values from a text file to set up basic BP constants, sizes, and other necessary information, or query user if filename is not valid. } PROCEDURE Set_Input_vector_from_file (VAR BPN : BP_net_); {Get data values from a text file to fill input vector.} PROCEDURE Set_Training_vector_from_file (VAR BPN : BP_net_); {Get data values from a text file to fill training vector.} PROCEDURE BP_Feed_forward (VAR BPN : BP_net_); {Present values to network and propagate values forward, set the output vector.} 119 PROCEDURE BP_train_presentation (VAR BPN : BP_net_); { Present values to network, propagate forward, set output, compare output to training, back-propagate, collect statistics but do not change weights. 
} PROCEDURE BP_train_and_change (VAR BPN : BP_net_); {Present values to network, propagate forward, set output, compare output to training, back-propagate, collect statistics, change weights, and reset statistic variables.} PROCEDURE BP_change (VAR BPN : BP_net_); {Change weights using current statistics and reset statistics.} PROCEDURE BP_dump_net (VAR BPN : BP_net_); {Dump net parameters, node activities, and weights for inspection.} FUNCTION BP_net_error (VAR BPN : BP_net_):REAL; {Returns the largest error from the output nodes} PROCEDURE Display_weights (BPN : BP_net_); {Display of the current weight values for the network} {---------------------------------------------------------------------} IMPLEMENTATION {---------------------------------------------------------------------} {Private, internal functions} 120 FUNCTION max (r1, r2 : REAL):REAL; BEGIN IF r1 >= r2 THEN max := r1 ELSE max := r2; END; {---------------------------------------------------------------------} PROCEDURE Dump_BP_net_weights (VAR BPN : BP_net_; VAR Fname : STRING); {Save weights and node bias unit values to a file} { Preface vector length with !V Preface weight vectors with !W Preface bias unit vector with !T } TYPE bpnp_ = BP_node_ptr_; wnp_ = weight_node_ptr_; vnp_ = vector_node_ptr_; VAR ii, jj, m, n : WORD; inch : CHAR; done : BOOLEAN; tp1 : DVE_ptr_; ss : STRING; BEGIN n := BPN.n_input + BPN.n_hidden + BPN.n_output; (* ss := FSEARCH(Fname,GETENV('PATH')); IF LENGTH(ss) = 0 THEN BEGIN WRITELN('**ERROR** File does not exist'); EXIT; END; *) BPN.wt_fname := Fname; Assign(BPN.wt_f,Fname); Rewrite(BPN.wt_f); done := FALSE; {Write vector length} Writeln(BPN.wt_f,'!V ',n:1); 121 FOR jj := 1 TO n DO BEGIN Write(BPN.wt_f,'!W '); FOR ii := 1 TO n DO BEGIN tp1 := Find_element_matrix(ii,jj,BPN.ws); IF wnp_(tp1^.dptr)^.connect THEN Write(BPN.wt_f,wnp_(tp1^.dptr)^.w:4:4,' ') ELSE Write(BPN.wt_f,0.0:4:4,' '); END; Writeln(BPN.wt_f); END; Write(BPN.wt_f,'!T '); FOR ii := 1 TO n DO BEGIN tp1 := 
Find_element_DVE(ii,BPN.vs);
      Write(BPN.wt_f,bpnp_(tp1^.dptr)^.theta:4:4,' ');
    END;
  Writeln(BPN.wt_f);
  Writeln(BPN.wt_f,'!Z ');
  Close(BPN.wt_f);
END; {Dump_BP_net_weights}

PROCEDURE Set_BP_net_weights_from_file (VAR BPN : BP_net_; VAR Fname : STRING);
{Restore weights and node bias unit values from a file}
TYPE
  bpnp_ = BP_node_ptr_;
  wnp_  = weight_node_ptr_;
  vnp_  = vector_node_ptr_;
VAR
  ii, jj, m, n  : WORD;
  rr            : REAL;
  tp1, wp1, wp2 : DVE_ptr_;
  inch          : CHAR;
  done          : BOOLEAN;
  ss            : STRING;
BEGIN {}
  n := BPN.n_input + BPN.n_hidden + BPN.n_output;
  ss := FSEARCH (Fname,GETENV('PATH'));
  IF (Length(ss) = 0) THEN
    BEGIN
      Writeln ('**ERROR** File does not exist');
      EXIT;
    END;
  Assign (BPN.wt_f,ss);
  Reset (BPN.wt_f);
  done := FALSE;
  {Find vector length, compare to net vector length}
  REPEAT
    REPEAT {find command}
      Read (BPN.wt_f,inch);
    UNTIL (inch = '!') OR Eof(BPN.wt_f);
    {}
    Read (BPN.wt_f,inch);
  UNTIL (UpCase (inch) = 'V') OR Eof (BPN.wt_f);
  IF Eof (BPN.wt_f) THEN
    BEGIN
      EXIT;
    END;
  Read (BPN.wt_f,inch);
  Read (BPN.wt_f,m);
  IF (m <> n) THEN
    BEGIN {Vector lengths don't match, quit}
      EXIT;
    END;
  wp1 := BPN.ws;
  REPEAT {get net params}
    REPEAT {find command}
      Read (BPN.wt_f,inch);
    UNTIL (inch = '!');
    {}
    Read (BPN.wt_f,inch);
    CASE UpCase (inch) OF
      'T' : BEGIN {get bias values}
              Read (BPN.wt_f,inch); {corrected from BPN.data_f: this
                                     procedure reads the weights file}
              FOR ii := 1 TO n DO
                BEGIN
                  tp1 := Find_element_DVE (ii,BPN.vs);
                  Read (BPN.wt_f,rr);
                  bpnp_(tp1^.dptr)^.theta := rr;
                END;
            END;
      {}
      'W' : BEGIN {get weights}
              IF wp1 <> NIL THEN
                BEGIN
                  Read (BPN.wt_f,inch); {corrected from BPN.data_f}
                  FOR ii := 1 TO n DO
                    BEGIN
                      wp2 := Find_element_DVE (ii,wp1);
                      Read (BPN.wt_f,rr);
                      wnp_(wp2^.dptr)^.w := rr;
                    END; {For ii}
                  wp1 := wp1^.down;
                END;
            END;
      {}
      'Z' : done := TRUE;
      ELSE
        BEGIN
          done := TRUE;
        END;
    END;
  UNTIL (done OR Eof (BPN.wt_f));
  {}
END; {Set_BP_net_weights_from_file}

PROCEDURE BP_set_net_connects_from_file (VAR BPN : BP_net_; VAR Fname : STRING);
{Sets network connectivity values from a file}
TYPE
  bpnp_ = BP_node_ptr_;
  wnp_  = weight_node_ptr_;
  vnp_  = vector_node_ptr_;
VAR
  ii, jj, kk, m,
  n             : WORD;
  tp1, wp1, wp2 : DVE_ptr_;
  inch          : CHAR;
  done          : BOOLEAN;
  ss            : STRING;
  cfile         : TEXT;
BEGIN {}
  n := BPN.n_input + BPN.n_hidden + BPN.n_output;
  ss := FSEARCH (Fname,GETENV('PATH'));
  IF (Length(ss) = 0) THEN
    BEGIN
      Writeln ('**ERROR** File does not exist');
      EXIT;
    END;
  Assign (cfile,ss);
  Reset (cfile);
  done := FALSE;
  {Find vector length, compare to net vector length}
  REPEAT
    REPEAT {find command}
      Read (cfile,inch);
    UNTIL (inch = '!') OR Eof(cfile);
    {}
    Read (cfile,inch);
  UNTIL (UpCase (inch) = 'V') OR Eof (cfile);
  IF Eof (cfile) THEN
    BEGIN
      EXIT;
    END;
  Read (cfile,inch);
  Read (cfile,m);
  IF (m <> n) THEN
    BEGIN {Vector lengths don't match, quit}
      EXIT;
    END;
  wp1 := BPN.ws;
  REPEAT {get net params}
    REPEAT {find command}
      Read (cfile,inch);
    UNTIL (inch = '!');
    {}
    Read (cfile,inch);
    CASE UpCase (inch) OF
      'C' : BEGIN {get connectivity flags}
              IF wp1 <> NIL THEN
                BEGIN
                  Read (cfile,inch);
                  FOR ii := 1 TO n DO
                    BEGIN
                      wp2 := Find_element_DVE (ii,wp1);
                      Read (cfile,kk);
                      wnp_(wp2^.dptr)^.connect := (kk = 1);
                    END; {For ii}
                  wp1 := wp1^.down;
                END;
            END;
      {}
      'Z' : done := TRUE;
      ELSE
        BEGIN
          done := TRUE;
        END;
    END;
  UNTIL (done OR Eof (cfile));
  {}
END; {BP_set_net_connects_from_file}

PROCEDURE Dump_node (net : BP_net_; node : BP_node_ptr_);
TYPE
  bpnp_ = BP_node_ptr_;
  wnp_  = weight_node_ptr_;
  vnp_  = vector_node_ptr_;
BEGIN
  IF (debug) THEN {}
    BEGIN
      Writeln ('DUMP_NODE');
      Writeln (net.out_f,'DUMP_NODE');
    END;
  IF (node <> NIL) THEN
    BEGIN
      CASE node^.nt OF
        Input  : BEGIN
                   Write (net.out_f,'INPUT');
                   Write ('INPUT');
                 END;
        hidden : BEGIN
                   Write (net.out_f,'HIDDEN');
                   Write ('HIDDEN');
                 END;
        Output : BEGIN
                   Write (net.out_f,'OUTPUT');
                   Write ('OUTPUT');
                 END;
      END;
      Write (net.out_f,tab,'LOC : ',node^.loc);
      Write ('LOC : ',node^.loc);
      Writeln (net.out_f,tab,'ADDR : ',Seg(node),':',Ofs(node));
      Writeln (tab,'ADDR : ',Seg(node),':',Ofs(node));
      Write (net.out_f,'NI : ',node^.ni:8);
      Write ('NI : ',node^.ni:8);
      Writeln (net.out_f,tab,'DELTA : ',node^.delta:8);
      Writeln (tab,'DELTA : ',node^.delta:8);
      Write (net.out_f,'BASE : 
',node^.base:8);
      Write ('BASE : ',node^.base:8);
      Writeln (net.out_f,tab,'RANGE : ',node^.range:8);
      Writeln (tab,'RANGE : ',node^.range:8);
      Write (net.out_f,'THETA : ',node^.theta:8);
      Write ('THETA : ',node^.theta:8);
      Writeln (net.out_f,tab,'DTHETA : ',node^.dtheta:8,tab,
               'LDTHETA : ',node^.ldtheta:8);
      Writeln (tab,'DTHETA : ',node^.dtheta:8,tab,
               'LDTHETA : ',node^.ldtheta:8);
      Writeln (net.out_f,'FW : ',Seg(node^.fw),':',Ofs(node^.fw),tab,
               'BW : ',Seg(node^.bw),':',Ofs(node^.bw));
      Writeln ('FW : ',Seg(node^.fw),':',Ofs(node^.fw),tab,
               'BW : ',Seg(node^.bw),':',Ofs(node^.bw));
      Writeln (net.out_f);
      Writeln;
    END;
  IF (debug) THEN
    BEGIN
      Writeln ('END DUMP_NODE');
      Writeln (net.out_f,'END DUMP_NODE');
    END;
END; {Dump_node}

PROCEDURE Display_weights (BPN : BP_net_);
{Display of the current weight values for the network}
TYPE
  bpnp_ = BP_node_ptr_;
  wnp_  = weight_node_ptr_;
  vnp_  = vector_node_ptr_;
VAR
  twpd, twpa : DVE_ptr_;
  wptr       : DVE_ptr_;
BEGIN {}
  IF debug THEN
    BEGIN
      Writeln ('DISPLAY_WEIGHTS');
      Writeln (BPN.out_f,'DISPLAY_WEIGHTS');
    END;
  twpd := BPN.ws;
  twpa := BPN.ws;
  WHILE (twpd <> NIL) DO
    BEGIN {}
      WHILE (twpa <> NIL) DO
        BEGIN {}
          wptr := twpa^.dptr;
          IF (wnp_(wptr)^.connect) THEN
            BEGIN
              Write (BPN.out_f,wnp_(wptr)^.w:5:1,' ');
              Write (wnp_(wptr)^.w:5:1,' ');
            END
          ELSE {}
            BEGIN
              Write (BPN.out_f,' --- ');
              Write (' --- ');
            END;
          twpa := twpa^.right;
        END; {}
      Writeln (BPN.out_f);
      Writeln;
      twpd := twpd^.down;
      twpa := twpd;
    END; {}
  Writeln (BPN.out_f,'End of weights');
  Writeln ('End of weights');
  Writeln;
  Writeln;
  IF (debug) THEN {}
    BEGIN
      Writeln ('END DISPLAY_WEIGHTS');
      Writeln (BPN.out_f,'END DISPLAY_WEIGHTS');
    END;
  Flush (BPN.out_f);
END; {Display_weights}

PROCEDURE Display_Vector (vp : DVE_ptr_; N : INTEGER; net : BP_net_); {}
TYPE
  bpnp_ = BP_node_ptr_;
  wnp_  = weight_node_ptr_;
  vnp_  = vector_node_ptr_;
VAR
  ii   : INTEGER;
  vptr : vector_node_ptr_;
BEGIN {}
  IF debug THEN
    BEGIN
      Writeln ('DISPLAY_VECTOR');
      Writeln (net.out_f,'DISPLAY_VECTOR');
    END;
  FOR ii := 1 TO N DO
    BEGIN {}
      vptr := vp^.dptr;
      Write
            (net.out_f,vptr^.V:8,' ');
      Write (vptr^.V:8,' ');
      vp := vp^.right;
    END;
  Writeln (net.out_f);
  Writeln;
  IF (debug) THEN {}
    BEGIN
      Writeln ('END DISPLAY_VECTOR');
      Writeln (net.out_f,'END DISPLAY_VECTOR');
    END;
END; {Display_Vector}

FUNCTION BP_net_error (VAR BPN : BP_net_) : REAL;
{Returns the largest error from the output nodes}
TYPE
  bpnp_ = BP_node_ptr_;
  wnp_  = weight_node_ptr_;
  vnp_  = vector_node_ptr_;
VAR
  rr           : REAL;
  tp, vp1, vp2 : DVE_ptr_;
  ii, n        : INTEGER;
BEGIN
  rr := 0;
  n := BPN.n_input + BPN.n_hidden + BPN.n_output;
  vp1 := BPN.vos;
  vp2 := BPN.vts;
  FOR ii := 1 TO BPN.n_output DO
    BEGIN
      IF (vp1^.dptr <> NIL) AND (vp2^.dptr <> NIL) THEN
        BEGIN
          rr := max (ABS(rr),
                     ABS(vnp_(vp2^.dptr)^.v - vnp_(vp1^.dptr)^.v));
          IF vp1^.right <> NIL THEN vp1 := vp1^.right;
          IF vp2^.right <> NIL THEN vp2 := vp2^.right;
        END;
    END;
  BP_net_error := ABS(rr);
END; {BP_net_error}

PROCEDURE Allocate_IO_vectors (VAR net : BP_net_);
TYPE
  bpnp_ = BP_node_ptr_;
  wnp_  = weight_node_ptr_;
  vnp_  = vector_node_ptr_;
VAR
  ii, N : WORD;
BEGIN
  IF debug THEN
    BEGIN
      Writeln ('ALLOCATE_IO_VECTORS');
      Writeln (net.out_f,'ALLOCATE_IO_VECTORS');
    END;
  N := net.n_input + net.n_hidden + net.n_output;
  net.vi  := Create_DVE_vector (net.n_input, SizeOf(vector_node_));
  net.vos := Create_DVE_vector (net.n_output, SizeOf(vector_node_));
  net.vts := Create_DVE_vector (net.n_output, SizeOf(vector_node_));
  net.voe := Find_element_DVE (net.n_output, net.vos);
  net.vte := Find_element_DVE (net.n_output, net.vts);
  IF (debug) THEN {}
    BEGIN
      Writeln ('END ALLOCATE_IO_VECTORS');
      Writeln (net.out_f,'END ALLOCATE_IO_VECTORS');
    END;
END; {Allocate_IO_vectors}

PROCEDURE Allocate_node_vector (VAR net : BP_net_);
TYPE
  bpnp_ = BP_node_ptr_;
  wnp_  = weight_node_ptr_;
  vnp_  = vector_node_ptr_;
VAR
  ii, cnt, N : INTEGER;
  Temp       : BP_node_ptr_;
  nptr       : DVE_ptr_;
  low, high  : REAL;
  ns         : BP_node_;
BEGIN
  cnt := 1;
  N := net.n_input + net.n_hidden + net.n_output;
  IF debug THEN
    BEGIN
      Writeln ('ALLOCATE_NODE_VECTOR');
      Writeln (net.out_f,'ALLOCATE_NODE_VECTOR');
    END;
  net.vs := Create_DVE_vector
            (net.n_input, SizeOf (BP_node_));
  net.ve := Find_element_DVE (net.n_input, net.vs);
  nptr := net.vs;
  FOR ii := 1 TO net.n_input DO
    BEGIN
      Temp := nptr^.dptr;
      Temp^.loc := cnt;
      cnt := cnt + 1;
      Temp^.nt := Input;     {set node type}
      Temp^.ni := 0;         {init net input}
      Temp^.fw := NIL;       {no weight yet}
      Temp^.bw := NIL;       {no weight yet}
      Temp^.delta := 0;
      Temp^.base := 0;
      Temp^.range := 1;
      Temp^.theta := 0;
      Temp^.dtheta := 0;
      Temp^.ldtheta := 0;
      nptr := nptr^.right;
    END; {FOR}
  net.ve^.right := Create_DVE_vector (net.n_hidden, SizeOf(BP_node_));
  net.ve^.right^.left := net.ve;
  nptr := net.ve^.right;
  net.ve := Find_element_DVE (net.n_input + net.n_hidden, net.vs);
  FOR ii := 1 TO net.n_hidden DO
    BEGIN
      Temp := nptr^.dptr;
      Temp^.loc := cnt;
      cnt := cnt + 1;
      Temp^.nt := hidden;    {set node type}
      Temp^.ni := 0;         {init net input}
      Temp^.fw := NIL;       {no weight yet}
      Temp^.bw := NIL;       {no weight yet}
      Temp^.delta := 0;
      Temp^.base := 0;
      Temp^.range := 1;
      Temp^.theta := gaussian_noise (0,0.25);
      Temp^.dtheta := 0;
      Temp^.ldtheta := 0;
      nptr := nptr^.right;
    END;
  net.ve^.right := Create_DVE_vector (net.n_output, SizeOf(BP_node_));
  net.ve^.right^.left := net.ve;
  nptr := net.ve^.right;
  net.ve := Find_element_DVE (net.n_input + net.n_hidden + net.n_output,
                              net.vs);
  FOR ii := 1 TO net.n_output DO
    BEGIN
      Temp := nptr^.dptr;
      Temp^.loc := cnt;
      cnt := cnt + 1;
      Temp^.nt := Output;    {set node type}
      Temp^.ni := 0;         {init net input}
      Temp^.fw := NIL;       {no weight yet}
      Temp^.bw := NIL;       {no weight yet}
      Temp^.delta := 0;
      Temp^.base := 0 {low};
      Temp^.range := 1 {high - low};
      Temp^.theta := gaussian_noise (0,0.25);
      Temp^.dtheta := 0;
      Temp^.ldtheta := 0;
      nptr := nptr^.right;
    END;
  IF (debug) THEN
    BEGIN
      Writeln ('END ALLOCATE_NODE_VECTOR');
      Writeln (net.out_f,'END ALLOCATE_NODE_VECTOR');
    END;
END; {Allocate_node_vector}

PROCEDURE Allocate_weight_matrix (VAR net : BP_net_);
TYPE
  bpnp_ = BP_node_ptr_;
  wnp_  = weight_node_ptr_;
  vnp_  = vector_node_ptr_;
VAR
  ii, jj, N        : WORD;
  temp             : DVE_ptr_;
  tl, tc, tls, tcs : weight_node_ptr_;
  Vt               : Vector_node_ptr_;
  cnt :
        INTEGER;
BEGIN
  IF debug THEN
    BEGIN
      Writeln ('ALLOCATE_WEIGHT_MATRIX');
      Writeln (net.out_f,'ALLOCATE_WEIGHT_MATRIX');
    END;
  cnt := 1;
  N := net.n_input + net.n_hidden + net.n_output;
  net.ws := create_matrix (n,n,SizeOf(weight_node_));
  FOR ii := 1 TO n DO
    BEGIN
      FOR jj := 1 TO n DO
        BEGIN
          temp := Find_element_matrix(ii, jj, net.ws);
          IF temp <> NIL THEN
            BEGIN
              wnp_(temp^.dptr)^.connect := FALSE;
              wnp_(temp^.dptr)^.w := 2 * Random - 1;
                                    {random weights, -1 < w < 1}
              wnp_(temp^.dptr)^.dw := 0;
              wnp_(temp^.dptr)^.ldw := 0;
            END; {IF temp <> NIL}
        END;
    END;
  IF (debug) THEN {}
    BEGIN
      Writeln ('END ALLOCATE_WEIGHT_MATRIX');
      Writeln (net.out_f,'END ALLOCATE_WEIGHT_MATRIX');
    END;
END; {Allocate_weight_matrix}

PROCEDURE Link_weights_to_nodes (VAR net : BP_net_);
TYPE
  bpnp_ = BP_node_ptr_;
  wnp_  = weight_node_ptr_;
  vnp_  = vector_node_ptr_;
VAR
  Start, TempD, TempA : DVE_ptr_;
  Vt                  : DVE_ptr_;
  ii, jj, N           : INTEGER;
BEGIN {Link to node vector}
  IF debug THEN
    BEGIN
      Writeln ('LINK_WEIGHTS_TO_NODES');
      Writeln (net.out_f,'LINK_WEIGHTS_TO_NODES');
    END;
  N := net.n_input + net.n_hidden + net.n_output;
  Start := net.ws;
  TempD := Start;
  TempA := Start;
  Vt := net.vs;
  FOR ii := 1 TO N DO
    BEGIN
      bpnp_(Vt^.dptr)^.fw := TempD;
      TempD := TempD^.down;
      bpnp_(Vt^.dptr)^.bw := TempA;
      TempA := TempA^.right;
      Vt := Vt^.right;
    END;
  IF (debug) THEN {}
    BEGIN
      Writeln ('END LINK_WEIGHTS_TO_NODES');
      Writeln (net.out_f,'END LINK_WEIGHTS_TO_NODES');
    END;
END; {Link_weights_to_nodes}

PROCEDURE Display_node_type (VAR net : BP_net_); {}
TYPE
  bpnp_ = BP_node_ptr_;
  wnp_  = weight_node_ptr_;
  vnp_  = vector_node_ptr_;
VAR
  tp    : DVE_ptr_;
  N, ii : INTEGER;
BEGIN {}
  IF debug THEN
    BEGIN
      Writeln ('DISPLAY_NODE_TYPE');
      Writeln (net.out_f,'DISPLAY_NODE_TYPE');
    END;
  tp := net.vs;
  N := net.n_input + net.n_hidden + net.n_output;
  FOR ii := 1 TO n DO
    BEGIN {}
      CASE bpnp_(tp^.dptr)^.nt OF
        Input  : BEGIN
                   Write (net.out_f,'I');
                   Write ('I');
                 END;
        hidden : BEGIN
                   Write (net.out_f,'H');
                   Write ('H');
                 END;
        Output : BEGIN
                   Write (net.out_f,'O');
                   Write ('O');
                 END;
      END;
      tp :=
           tp^.right;
    END; {}
  Writeln (net.out_f);
  Writeln;
  IF (debug) THEN {}
    BEGIN
      Writeln ('END DISPLAY_NODE_TYPE');
      Writeln (net.out_f,'END DISPLAY_NODE_TYPE');
    END;
END; {Display_node_type}

PROCEDURE Display_node_deltas (VAR net : BP_net_); {}
TYPE
  bpnp_ = BP_node_ptr_;
  wnp_  = weight_node_ptr_;
  vnp_  = vector_node_ptr_;
VAR
  tp    : DVE_ptr_;
  N, ii : INTEGER;
BEGIN {}
  IF debug THEN
    BEGIN
      Writeln ('DISPLAY_NODE_DELTAS');
      Writeln (net.out_f,'DISPLAY_NODE_DELTAS');
    END;
  tp := net.vs;
  N := net.n_input + net.n_hidden + net.n_output;
  FOR ii := 1 TO n DO
    BEGIN {} {display of delta values not implemented; only walks vector}
      tp := tp^.right;
    END; {}
  IF (debug) THEN {}
    BEGIN
      Writeln ('END DISPLAY_NODE_DELTAS');
      Writeln (net.out_f,'END DISPLAY_NODE_DELTAS');
    END;
END; {Display_node_deltas}

PROCEDURE set_connectivity (VAR net : BP_net_);
TYPE
  bpnp_ = BP_node_ptr_;
  wnp_  = weight_node_ptr_;
  vnp_  = vector_node_ptr_;
VAR
  wpt       : DVE_ptr_;
  tvd, tva  : DVE_ptr_;
  ii, jj, n : INTEGER;
BEGIN
  IF debug THEN
    BEGIN
      Writeln ('SET_CONNECTIVITY');
      Writeln (net.out_f,'SET_CONNECTIVITY');
    END;
  n := net.n_input + net.n_hidden + net.n_output;
  tvd := net.vs; {beginning of node vector}
  tva := net.vs;
  wpt := bpnp_(tvd^.dptr)^.fw;
  {display_node_type (net);}
  FOR ii := 1 TO n DO
    BEGIN
      FOR jj := 1 TO n DO
        BEGIN
          CASE bpnp_(tvd^.dptr)^.nt OF
            Input  : BEGIN
                       wnp_(wpt^.dptr)^.connect := FALSE;
                     END;
            hidden : BEGIN
                       IF (bpnp_(tva^.dptr)^.nt = Input) THEN
                         wnp_(wpt^.dptr)^.connect := TRUE
                       ELSE
                         wnp_(wpt^.dptr)^.connect := FALSE;
                     END;
            Output : BEGIN
                       IF (bpnp_(tva^.dptr)^.nt = hidden) THEN
                         wnp_(wpt^.dptr)^.connect := TRUE
                       ELSE
                         wnp_(wpt^.dptr)^.connect := FALSE;
                     END;
          END; {case}
          wpt := wpt^.right;
          tva := tva^.right;
        END;
      tvd := tvd^.right;
      tva := net.vs;
      IF tvd <> NIL THEN             {guard added: tvd is NIL after}
        wpt := bpnp_(tvd^.dptr)^.fw; {the last row has been processed}
    END;
  IF (debug) THEN {}
    BEGIN
      Writeln ('END SET_CONNECTIVITY');
      Writeln (net.out_f,'END SET_CONNECTIVITY');
    END;
END; {set_connectivity}

PROCEDURE Display_output (VAR net : BP_net_); {}
TYPE
  bpnp_ = BP_node_ptr_;
  wnp_  = weight_node_ptr_;
  vnp_  = vector_node_ptr_;
BEGIN {}
  IF debug THEN
    BEGIN
      Writeln
              ('DISPLAY_OUTPUT');
      Writeln (net.out_f,'DISPLAY_OUTPUT');
    END;
  display_vector (net.vos,net.n_output,net);
  IF (debug) THEN {}
    BEGIN
      Writeln ('END DISPLAY_OUTPUT');
      Writeln (net.out_f,'END DISPLAY_OUTPUT');
    END;
END; {Display_output}

FUNCTION BP_error_measure (Output_vector_ptr   : DVE_ptr_;
                           Training_vector_ptr : DVE_ptr_;
                           net                 : BP_net_) : REAL;
BEGIN
END; {BP_error_measure}

{---------------------------------------------------------------------}

PROCEDURE BP_set_net_defaults (VAR net : BP_net_); {}
TYPE
  bpnp_ = BP_node_ptr_;
  wnp_  = weight_node_ptr_;
  vnp_  = vector_node_ptr_;
BEGIN {}
  WITH net DO
    BEGIN {}
      vs  := NIL;
      ve  := NIL;
      ws  := NIL;
      vi  := NIL;
      vos := NIL;
      voe := NIL;
      vts := NIL;
      vte := NIL;
      maxerr := 0.2;
      errtol := 0.1;
      learning_rate := 0.5;
      alpha := 0.9;            {factor for momentum term}
      n_input := 1;
      n_hidden := 1;
      n_output := 1;
      { data_fname := 'BP.DAT';}
      training_iterations := 1;
      out_fname := 'BP.OUT';
      wt_fname := '';
    END; {}
END; {}

PROCEDURE BP_get_net_params_from_user (VAR net : BP_net_); {}
TYPE
  bpnp_ = BP_node_ptr_;
  wnp_  = weight_node_ptr_;
  vnp_  = vector_node_ptr_;
BEGIN {}
  Write ('File to output run data to : ');
  Readln (net.out_fname);
  Assign (net.out_f,net.out_fname);
  Rewrite (net.out_f);
  {learning rate}
  Write ('Learning rate? : ');
  Readln (net.learning_rate);
  Writeln (net.out_f,'Learning rate : ',net.learning_rate:5:3);
  Writeln ('Learning rate : ',net.learning_rate:5:3);
  {momentum factor}
  Write ('Momentum factor? : ');
  Readln (net.alpha);
  Writeln (net.out_f,'Momentum factor : ',net.alpha:5:3);
  Writeln ('Momentum factor : ',net.alpha:5:3);
  {# of input nodes} {BP_INPUT_NODES_NUM}
  Write ('Number of input nodes? : ');
  Readln (net.n_input);
  Writeln (net.out_f,'# of input nodes : ',net.n_input:3);
  Writeln ('# of input nodes : ',net.n_input:3);
  {# of hidden nodes in vector}
  Write ('Number of hidden nodes? 
: ');
  Readln (net.n_hidden);
  Writeln (net.out_f,'# of hidden nodes : ',net.n_hidden:3);
  Writeln ('# of hidden nodes : ',net.n_hidden:3);
  {# of output nodes}
  Write ('Number of output nodes? : ');
  Readln (net.n_output);
  Writeln (net.out_f,'# of output nodes : ',net.n_output:3);
  Writeln ('# of output nodes : ',net.n_output:3);
  {error tolerance}
  Write ('Error tolerance? : ');
  Readln (net.errtol);
  Writeln (net.out_f,'Error tolerance : ',net.errtol:5:3);
  Writeln ('Error tolerance : ',net.errtol:5:3);
  Writeln;
  Write ('Name of data file : ');
  Readln (net.data_fname);
  Writeln (net.out_f,'Data file : ',net.data_fname:15);
  Writeln ('Data file : ',net.data_fname:15);
  Write ('Number of training runs? : ');
  Readln (net.training_iterations);
  Writeln (net.out_f,'# of iterations : ',net.training_iterations:3);
  Writeln ('# of iterations : ',net.training_iterations:3);
  Writeln;
END; {}

PROCEDURE BP_set_net_params_from_file (VAR net : BP_net_; VAR Fname : STRING);
{ Use the following format for data entries in this section: !