HØit Nr. 1-97

A Versatile VME Digital Neural Network for Control and Pattern Recognition Applications


Maxim Minerskjöld*, Thomas Lindblad**, Clark S. Lindsey**

*) Østfold College and Department of Physics, Frescati
**) Royal Institute of Technology, Stockholm, Sweden.


Abstract: We describe a versatile neural network hardware implementation of the NeuraLogix NLX420 on a VME-module. This implementation provides a general purpose, digital network that can be used in a wide variety of scientific and technical applications yet is simple and cost-effective. Less than 20 µs is required for a neural computation with 16 bit precision. The computational time is independent of the number of neurons, up to 16 per layer, since all neurons run in parallel.

1. Introduction

Although neural networks have been studied for many years, it is only through the use of modern computers (and advances in neural network theory) that artificial implementations have become practical for study and application. However, although there are many examples of neural network codes running on von Neumann computers, there are still few commercially available neural networks implemented in hardware. Such devices would take full advantage of the neural network concept and provide powerful computational capabilities, including massive parallelism, redundancy, robustness and the desired "slack in operation".

Over the years, the family tree of artificial neural networks has grown to include a wide variety of architectures and training paradigms. Since most studies are done by computer simulations, the results have been biased somewhat by the features of serial processing. A very common type of neural net is the feed-forward net with two or more layers. In this architecture, the processed result of one neuron is sent only to the neurons in the subsequent layer. Such networks can be trained by algorithms such as the well-known back-propagation method [1].

In this paper we describe a versatile neural network hardware implementation on a VME-module. This implementation provides a general purpose, digital network that can be used in a wide variety of scientific and technical applications yet is simple and cost-effective. The VME-card is based on two NeuraLogix NLX420 Neural Processor Slices (NPS) [2]. The "slice" concept implies versatile interconnectivity and expansion of networks with multiple chips. The chip accepts 1, 4, 8 or 16-bit integers as inputs, and a single chip can operate at 300 million interconnections per second [2].

2. General considerations

The manufacturer of the NLX420 NPS circuit supplies a PC-AT evaluation card with 1 to 4 chips [4], but this card has several disadvantages. Although it is well suited for getting to know the NPS, the system is very slow because net inputs are presented to the chip one by one in a handshaking manner. This mode considerably slows down the performance of the chip: at least 100 µs is added for a transfer of 16 net inputs from the host CPU to the chip. Although this may not be a problem during training, the propagation time during forward processing will be slow. Therefore, we decided to build a new card that would take full advantage of the NPS's speed.

Due to the VME bus's wide use in data acquisition and control systems, we decided to implement the chips on a VME card. We use a single height Eurocard (100x160 mm). Because of the limited space, the "glue logic" and the address decoding and other necessary signals for the VME bus handling are generated in two programmable logic chips [5]. The base address and the access address mode can be changed at any time to resize and/or modify the neural network module.

Possible requirements for the interface included 8 and 16 bit data transfers and more than 30 instructions. The programmable logic gives the user the possibility to make the interface as fast or slow as the system requires. In this case, speed has been emphasized. We concluded that fewer than 32 neurons were required for most of our applications. Hence, the present VME card is equipped with only two NPS chips, one per 16-neuron layer.

3. The NLX420 Neural Processor Slice

The NPS is a CMOS VLSI circuit providing a building block for digital neural networks. Networks can be expanded by adding more chips to increase the number of processing elements or neurons (16 per chip) and/or layers. Each of the 16 neurons can be utilized once for maximum processing speed or used in a time multiplexed mode to emulate multiple neurons and minimize hardware. The latter will, of course, result in a loss of processing speed. Transfer functions are fully configurable by the user via piece-wise linear functions in a lookup table. If a sigmoid transfer function is required, the user puts the corresponding function values into the 16 internal lookup RAM locations. Since the resulting transfer function is never ideal [2], a reduction of precision occurs. The neural processing cycle itself is initiated by setting up the external weight RAM and the control registers. Further details on the NPS are found in the data sheet [3] and pertinent application notes and manuals [3-4].
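
As an illustration of the lookup-table approach, the following sketch fills a 16-entry table with samples of a sigmoid and evaluates it by linear interpolation between neighbouring entries. The input range of -8 to 8, the evenly spaced breakpoints and all function names are assumptions made for this example; the actual NLX420 table format is defined in the data sheet [3].

```python
import math

TABLE_SIZE = 16            # the NPS provides 16 lookup RAM locations
X_MIN, X_MAX = -8.0, 8.0   # assumed input range, not taken from the data sheet


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def build_table(func=sigmoid):
    """Sample func at TABLE_SIZE evenly spaced breakpoints (assumed layout)."""
    step = (X_MAX - X_MIN) / (TABLE_SIZE - 1)
    return [func(X_MIN + i * step) for i in range(TABLE_SIZE)]


def lookup(table, x):
    """Piece-wise linear evaluation between neighbouring table entries."""
    x = min(max(x, X_MIN), X_MAX)                 # clamp to the table range
    pos = (x - X_MIN) / (X_MAX - X_MIN) * (TABLE_SIZE - 1)
    i = min(int(pos), TABLE_SIZE - 2)             # index of the left breakpoint
    frac = pos - i
    return table[i] + frac * (table[i + 1] - table[i])


table = build_table()
# Largest deviation from the ideal sigmoid on a fine grid over the table range.
max_error = max(abs(lookup(table, x / 10.0) - sigmoid(x / 10.0))
                for x in range(-80, 81))
```

In this model, `max_error` gives a rough feel for the precision lost by approximating the sigmoid with 16 table entries.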

4. Description of the VME module and design

The module is configured as a 2 layer neural network with 16 neurons in each layer. The configuration can also be described as: 1-32 inputs, 16 neurons in the hidden layer and 16 outputs. The neural network on the module can be configured in many ways within these limitations. The two chips are hardwired by a 16 bit bus with corresponding control signals and this cannot be changed. However, if only a one layer network is desired, the second chip has a bypass option. The basic elements of the module are shown below.
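
The configuration limits can be summarized in a small functional model. This is only a sketch of the arithmetic of a two-layer feed-forward pass, not of the chip internals: the saturation to signed 16-bit values, the identity transfer function used by default, and the names `layer` and `forward` are assumptions for illustration.

```python
MAX_INPUTS = 32   # the module accepts 1-32 net inputs
NEURONS = 16      # neurons per NPS chip, i.e. per layer


def saturate16(value):
    """Clamp to the signed 16-bit range (assumed overflow behaviour)."""
    return max(-32768, min(32767, value))


def layer(inputs, weights, transfer=lambda s: s):
    """One layer: each neuron forms a weighted sum, then applies the transfer function."""
    if len(weights) > NEURONS:
        raise ValueError("at most 16 neurons per layer")
    return [saturate16(transfer(sum(w * x for w, x in zip(row, inputs))))
            for row in weights]


def forward(inputs, w_hidden, w_output, transfer=lambda s: s):
    """Two-layer pass; an empty w_output models the bypass of the second chip."""
    if not 1 <= len(inputs) <= MAX_INPUTS:
        raise ValueError("the module takes 1 to 32 inputs")
    hidden = layer(inputs, w_hidden, transfer)
    return layer(hidden, w_output, transfer) if w_output else hidden


w1 = [[1, -1], [2, 0]]   # 2 hidden neurons, 2 inputs each
w2 = [[1, 1]]            # 1 output neuron fed by both hidden neurons
two_layer = forward([3, 4], w1, w2)
one_layer = forward([3, 4], w1, [])
```

Passing an empty weight list for the second layer returns the first-layer outputs directly, which mirrors the one layer configuration mentioned above.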

Figure 1. Schematic view of the 2 layer digital neural network VME module. The RAM buffer is located next to the left weight RAM. The two external RAMs hold the weight values and are controlled by CNT (count enable) and CLR (clear). These signals are switched off during the configuration setup. The outputs from the NLX420 are all tri-state and controlled by OE. The signals SEL(3:0) select the 16 internal neuron output registers. The VME interface and the control circuits are programmable logic chips.

During the design, we wanted to speed up the operation of the NPS chip. We found that an increase in speed was possible by choosing the structure suggested by NeuraLogix [2] for synchronous mode and adding some electronics. The first step was to put a RAM buffer, which holds the net inputs, in front of the first chip. This external RAM buffer, together with the external status and control registers, behaves like an NPS chip as seen from the first layer. When the bit STB1 is set to logic "1", the contents of the RAM buffer are synchronously clocked into the NPS chip.

This is the fastest mode of operation of the NPS, and it requires no further control or supervision from the host computer. The idea is to start the neuron process and, as soon as the net inputs are downloaded, change the content of the RAM buffer and then read the neuron outputs that were processed meanwhile. This "push&pull" strategy minimizes the time lost in loading and retrieving data to and from the neural network, and the processing speed is virtually doubled compared with the original PC-card.

Each NPS has an external weight RAM with its own counter for address generation. The network and the weight RAM are configured through the VME interface. Once configured, the system reset is activated and each address generator is cleared with CLR to the starting address 00h. Both layers are synchronized to the system clock and the system reset MRST. Layer 2 receives its net data from the neuron outputs of the first layer. The final output of the neural network is taken from the layer 2 chip.

Before a neuron cycle takes place, the number of inputs and neurons must be written to the internal counters. At the beginning of a cycle, STB1 must go high. At the end of a cycle, the NPS has to be frozen by asserting the signal FRZ. Then the neuron output registers selected by the external OE and SEL(3:0) bits can be read.

5. The VME interface and the control registers

Since the VME bus is used, a small interface had to be designed and constructed. The VME interface with DTACK, IACK and interrupt [7] was made and tested with a development system for high-density programmable logic [6]. The programmable logic decodes the 24 bit wide VME address lines and allows both 8 and 16 bit transfers. The base address for the module is FE2000h and the 8 least significant address lines provide 256 instructions for the neural network module. An 8-bit status register monitors the signals coming from the NPS chips in layers 1 and 2. The control register asserts the following signals: FRZ, MRST, STB1 (freeze, master reset and stand-by).
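
The address decoding described above can be sketched as follows. The base address and the 8-bit instruction field come from the text; the assumption that the module occupies the 256 consecutive addresses starting at the base, and the function name `decode`, are illustrative only. The meaning of the individual instruction numbers is not modelled.

```python
BASE_ADDRESS = 0xFE2000    # module base address given in the text
ADDRESS_MASK = 0xFFFFFF    # 24 VME address lines
INSTRUCTION_MASK = 0xFF    # the 8 least significant lines select an instruction


def decode(address):
    """Return the instruction number if the address selects the module, else None."""
    address &= ADDRESS_MASK
    # Compare the upper 16 address bits with the base address (assumed layout).
    if (address & ~INSTRUCTION_MASK) != BASE_ADDRESS:
        return None
    return address & INSTRUCTION_MASK
```

For example, `decode(0xFE2010)` yields instruction 16 in this model, while an address outside the 256-byte window returns `None`.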

The control register has two tasks. The first is to control the internal neural processes and the second is to supervise the communication between the RAM buffer and the first NPS. This second feature makes it possible for the first layer to receive the net inputs synchronously, which improves performance. The host computer can then deal with other tasks instead of waiting to transfer data to the neural network.

6. Timing

Referring to the timing diagram in fig. 2, the signal marked with NORDY1 indicates the neuron busy signal. Point A indicates where a new neuron cycle begins. Here STB1 is asserted to logic '1' in the control register. The net inputs are loaded automatically. Points B and C show when the host computer checks the status of NORDY1. When NORDY1 is '1', at D, new net inputs are loaded to the RAM buffer and the neuron outputs are read. This also triggers the second chip to get the neuron output result from the first chip. At E the second NPS chip starts its neuron cycle at the same time as new input data is downloaded to the RAM buffer. Hence, no time is wasted waiting for the chips to finish the neuron process. Instead this time is used for other tasks. The procedure is repeated for the next new neuron cycle.
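
The host-side sequence A to D can be sketched as a polling loop. The class below is a purely hypothetical software stand-in for the module: its method names, the number of busy polls and the placeholder computation are assumptions, and it models only the first layer. It serves to show that the RAM buffer can be refilled while the previously latched inputs are still in use.

```python
class FakeModule:
    """Stand-in for the VME module; all behaviour here is hypothetical."""

    def __init__(self, busy_polls=2):
        self.buffer = []            # RAM buffer holding the next net inputs
        self.latched = []           # inputs already clocked into the layer 1 chip
        self.busy_polls = busy_polls
        self._remaining = 0

    def load_buffer(self, inputs):
        self.buffer = list(inputs)

    def strobe(self):
        """Model STB1 = 1: clock the buffer into the chip and start a cycle."""
        self.latched = list(self.buffer)
        self._remaining = self.busy_polls

    def ready(self):
        """Model the NORDY1 check: True once the neuron cycle has finished."""
        if self._remaining > 0:
            self._remaining -= 1
            return False
        return True

    def read_outputs(self):
        # Placeholder computation; a real module returns the neuron outputs.
        return [sum(self.latched)]


def run(module, patterns):
    """Process a list of input patterns with overlapped loading and reading."""
    results = []
    if not patterns:
        return results
    module.load_buffer(patterns[0])
    for nxt in list(patterns[1:]) + [None]:
        module.strobe()                  # point A: start a neuron cycle
        while not module.ready():        # points B and C: poll the status
            pass
        if nxt is not None:
            module.load_buffer(nxt)      # point D: load the next net inputs ...
        results.append(module.read_outputs())   # ... and read the outputs
    return results


outputs = run(FakeModule(), [[1, 2], [3, 4], [5]])
```

In a real system the empty polling loop would be replaced by useful host work, which is the point of the strategy.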

Figure 2. Timing diagram of the VME module communication and processes. The signals AS, DS0, W/R and the clock are typical VME bus signals. STB1, NORDY1 and CTS1 are on-board signals. Letters A-E indicate points in time. The additional text with respective arrows marks periods in time.

7. Operation, programs and measurements of the neural network

The NPS has no on-chip learning. A back-propagation (BP) program in the host processor determines the proper weights. However, the chip can be used as a co-processor, for example, by doing the forward pass in the NPS.

Two assembler programs [9] were developed, one for initialization and configuration of the NNW module and one for operation. Some additional programs were also developed in QBASIC to rewrite the assembler program transfer function setup file. The assembler programs are compiled into S-format and transferred on a serial cable from the PC to the VME controller [10]. A higher level C program is in development.

Figure 3. The slope of the line is 1 µs per input, which is the time to collect one input. The intercept with the Y axis is 16 µs. The errors are within 0.2 µs.

Tests indicate that the NPS chip has a constant processing time for different network configurations. The processing time did not change with the number of neurons, nor when the transfer function was bypassed or changed. The total processing time depends only on the number of inputs and scales linearly with it. It is easy to calculate the time consumption from fig. 3.
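
Such a calculation can be written as a one-line linear model. The two constants are the values read from the caption of fig. 3 (an offset of 16 µs and 1 µs per input, with a stated error of about 0.2 µs); the function name and the range check are assumptions for this sketch, and the result should be treated as an estimate only.

```python
OVERHEAD_US = 16.0    # Y-axis intercept given in the caption of fig. 3
PER_INPUT_US = 1.0    # slope: time to collect one input


def processing_time_us(n_inputs):
    """Estimated processing time in microseconds for n_inputs net inputs."""
    if not 1 <= n_inputs <= 32:
        raise ValueError("the module accepts 1 to 32 inputs")
    return OVERHEAD_US + PER_INPUT_US * n_inputs
```

With these constants the estimate does not depend on the number of neurons, in line with the measurements described above.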

Measurements with a LogicScope [8] show that one neuron cycle (cf. fig. 2) takes about 16 µs and is independent of the number of neurons used (up to 16). This paper shows that a simple yet powerful neural network tool can be constructed. It is 2-5 times slower than systems like the ETANN but is considerably cheaper. Even if the NPS is not run with a 20 MHz clock [3], it provides performance suitable for many signal processing and control applications. The goal of building a fast network with a relatively old chip was achieved.

Acknowledgement

We would like to thank Paul Basehore and Mike Ziemacki for making available the source code for the NPS development system.

References

  1. McClelland, J.L., D.E. Rumelhart and the PDP Research Group, 1986. Parallel Distributed Processing. Cambridge: MIT Press.
  2. Neural Processor Slice Development System. NeuraLogix, Inc. 800 Charot Av., Suite 112, San Jose, CA, USA.
  3. Data Sheet NLX420. 1992 NeuraLogix, Inc.
  4. NPSDEVS PC card. 1992 NeuraLogix, Inc.
  5. Lattice Data Book 1994. Lattice ispLSI1016, p.2-61.
  6. pLSI/ispLSI Development System. Lattice Semiconductor Group.
  7. The VMEbus Handbook, 3rd edition, by Wade D. Peterson. VFEA International Trade Association, 10229 N. Scottsdale Road, Suite B, Scottsdale, AZ 85253 USA.
  8. TLS216 LogicScope. Tektronix, Inc.
  9. Quelo A68K assembler.
  10. FORCE CPU-1. SYS68K/CPU-1 User's manual. First Edition March 1983. FORCE Computers advanced systems. Freischuetzstrasse 92, D-8000 Muenchen, Germany.



Copyright: 1996, Høgskolen i Østfold. Last Update: 28.06.97, Thomas Malt.