[TOC]
This project contains simple implementations of two well-known neural network algorithms: David Rumelhart's error back-propagation (EBP) supervised learning algorithm and Teuvo Kohonen's self-organising map (SOM) unsupervised learning algorithm.
My goal is to show programmers how the mathematics in scientific papers is transformed into prototype code. This is an essential skill for all programmers who work in science and engineering. As such, the code here is a direct translation of the mathematical descriptions given by Rumelhart in his paper Learning Internal Representations by Error Propagation (LIR) and by Kohonen in his paper The Self-Organising Map (SOM). Do note that this project is a work in progress. I will be making regular updates to both the code and this description, until the project reaches maturity.
To learn the theory underlying these two networks, read LIR (about 30 pages) and SOM (about 20 pages). To understand the modern, vectorised implementation, read chapter 4 Multilayer Feedforward Networks (about 70 pages) and chapter 7 Matching and Self-Organizing Networks (about 60 pages) of Jacek Zurada's classic textbook Introduction to Artificial Neural Systems (ANS). LIR is written by a psychologist, SOM by a neuroscientist, and ANS by an electrical engineer. Studying these sources will give you a diverse perspective on this broad, multi-disciplinary subject.
This project is implemented in standard C, and it is self-contained; it makes no references to third-party libraries. The EBP implementation includes XOR-2 and Encoder-8, two of the several example networks described by Rumelhart. The SOM implementation includes the minimum spanning tree (MST) example network described by Kohonen. For pedagogical clarity, the code contains few error checks and even fewer optimisations, because checks and speed-up tricks obscure the algorithm. This intentional shoddiness is tolerable, since the code is meant for instructional use, not production use.
An artificial neuron produces its output as follows: first, it sums the weighted input values; then, when the net input exceeds a certain threshold, it transforms the net input with an activation function. The adjustable threshold and the activation function are the sources of the neuron's non-linear behaviour. In implementations, the threshold is treated as though it were an input weight connected to a fixed-output, phantom neuron in the previous layer. The figure below shows the artificial neuron model. All practical neural network algorithms use this neuron arranged in layers. Some algorithms, like the SOM, use a single layer of neurons, but most, like the EBP, use multiple layers.
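To make the neuron model concrete, here is a minimal C sketch of a single neuron computing its output under the phantom-neuron convention described above. The function and parameter names are mine for illustration; they are not the project's identifiers.

```c
#include <math.h>

/* unipolar sigmoid activation: squashes the net input into (0, 1) */
static double activate(double net)
{
    return 1.0 / (1.0 + exp(-net));
}

/* Output of one neuron with n inputs x[0..n-1] and n + 1 weights:
 * w[0..n-1] are the input weights and w[n] is the threshold (bias),
 * treated as a weight connected to a phantom neuron whose output is 1.0. */
static double neuron(const double *w, const double *x, int n)
{
    double net = w[n] * 1.0;          /* bias contribution */
    for (int i = 0; i < n; i++)
        net += w[i] * x[i];           /* weighted sum of the inputs */
    return activate(net);             /* non-linear transformation */
}
```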
LIR describes the EBP algorithm using the summation notation. For instance, the output of neuron $j$ in layer $l$ is $o_{lj} = f(net_{lj})$, where the net input $net_{lj} = \sum_i w_{lji}\, o_{(l-1)i}$ is the weighted sum over the outputs of the upstream layer $l - 1$. In keeping with the LIR description, `lir.c` implements the algorithm using `for` loops. Neural networks literature indexes neurons in layers using `a[j][i]`, as opposed to the traditional C convention `a[i][j]`. Programmers beware. In pseudocode, `o[l][j]` is computed like this:
```
for p in P                        # data patterns
  for l in L                      # processing layers
    for j in N[l]                 # nodes in current layer l
      net = 0.0
      for i in N[l - 1]           # nodes in upstream layer l - 1
        net += w[l][j][i] * o[l - 1][i]
      o[l][j] = f(net)
```
That is a whole lot of `for` loops. But it is an honest translation of Rumelhart's algorithm to C.
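For illustration, the nested loops above might be transcribed into C roughly as follows for a single data pattern. This is only a sketch under assumed data structures (jagged arrays `o` and `w`, a layer-size table `N`, and the phantom bias input omitted for brevity); `lir.c` defines its own types and names.

```c
/* Forward pass for one pattern. o[0] holds the input pattern; for each
 * processing layer l, o[l][j] = f(sum_i w[l][j][i] * o[l-1][i]).
 * N[l] is the number of nodes in layer l, L is the number of processing
 * layers, and f is the layer-wide activation function. */
static void forward(double ***w, double **o, const int *N, int L,
                    double (*f)(double))
{
    for (int l = 1; l <= L; l++)               /* processing layers */
        for (int j = 0; j < N[l]; j++) {       /* nodes in layer l */
            double net = 0.0;
            for (int i = 0; i < N[l - 1]; i++) /* upstream layer l - 1 */
                net += w[l][j][i] * o[l - 1][i];
            o[l][j] = f(net);
        }
}
```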
On the other hand, every implementation of back-propagation you will find on the Internet, be it a prototype or a production version, is a vectorised version. Indeed, the most popular machine learning framework, TensorFlow, is so named because it is implemented using tensors. Simply put, tensors are dimensional extensions of matrices, just as matrices are extensions of vectors and vectors are extensions of scalars.
In a vectorised version, like that described in ANS, the data pattern is the vector `p`, the weights of layer `l` form the matrix `w[l]`, the output of layer `l` is the vector `o[l]`, and so on. A matrix-based implementation is well suited to modern GPUs, which are equipped with powerful matrix manipulation pipelines (because transformations in 3D computer graphics are implemented using matrices). The matrix-based implementation is also far more compact and easier to understand. The loopy pseudocode above reduces to the following vectorised pseudocode:
```
for p in P
  o[0] = p
  for l in L
    o[l] = f(w[l] * o[l - 1])
```
Despite all the advantages of the matrix-based EBP implementation, I chose the `for` loop version given in LIR, so as to show explicitly how the equations are realised in code. Unlike LIR, however, SOM presents its winner-takes-all competitive algorithm using the more common matrix notation. As such, I give here a vectorised implementation for SOM.
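For comparison, here is a sketch of the same forward pass in its matrix-vector form, using a plain row-major matrix-vector product in C. Again, the names and storage layout are assumptions for illustration, not the project's code.

```c
/* y = W x, where W is rows-by-cols and stored row-major */
static void matvec(const double *W, const double *x, double *y,
                   int rows, int cols)
{
    for (int j = 0; j < rows; j++) {
        double s = 0.0;
        for (int i = 0; i < cols; i++)
            s += W[j * cols + i] * x[i];
        y[j] = s;
    }
}

/* vectorised forward pass: o[l] = f(w[l] * o[l-1]), with o[0] = p */
static void forwardvec(double **w, double **o, const int *N, int L,
                       double (*f)(double))
{
    for (int l = 1; l <= L; l++) {
        matvec(w[l], o[l - 1], o[l], N[l], N[l - 1]);
        for (int j = 0; j < N[l]; j++)
            o[l][j] = f(o[l][j]);
    }
}
```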
The project is structured thus:
```
~/nn/
  LICENSE            # MIT license
  Makefile           # build script
  README.md          # this document
  bin/               # binaries directory
  test.sh            # test script
  csv.[ch]           # CSV utility
  dat/               # CSV data directory
    lir-enc8*.csv    # encoder problem from LIR
    lir-xor2*.csv    # XOR problem from LIR
    som-mst*.csv     # minimum spanning tree problem from SOM
    som-rgb*.csv     # RGB colour classification problem
  etc.[ch]           # network utilities
  lir.[ch]           # LIR implementation
  lirmain.c          # LIR main()
  som.[ch]           # SOM implementation
  sommain.c          # SOM main()
  vec.[ch]           # vector algebra utilities
```
The programmes are written for Unix-like operating systems. I do not program on Windows; neither should you. To compile and run the programmes, type in the following at a Unix command prompt:
```
$ cd ~/nn
$ make clean all
...
$ ./lir lir-xor2
...
$ ./lir lir-enc8
...
$ ./som som-mst
...
$ ./som som-rgb
...
```
Almost every statement in the `lir.[ch]` and `som.[ch]` modules is commented. The comments cite LIR, SOM, and ANS by chapter, section, equation, and page, thus allowing you to trace the C functions back to their source equations. And to aid tracing, I named the network parameters as close as practicable to the respective authors' notation.
The procedure `run()` in `*main.c` first loads from the `dat/` data directory the CSV configuration file of the specified network, say `dat/lir-xor2.csv`. This configuration file specifies the network architecture and the training parameters:
- LIR:
  - `name`—name of the network (also the base name of the CSV files)
  - `C`—number of training cycles
  - `L`—number of processing layers
  - `I`—number of input taps
  - `N`—number of nodes per processing layer, separated by `|`
  - `f`—layer-wide activation function (`...u` for unipolar; `...b` for bipolar)
  - `eta`—learning rate
  - `alpha`—momentum factor
  - `epsilon`—RMS error criterion
  - `P`—number of data patterns
  - `shuffle`—shuffle pattern presentation order
- SOM:
  - `name`—name of the network (also the base name of the CSV files)
  - `C`—number of training cycles
  - `I`—number of input taps
  - `W`—number of nodes in the $x$ direction
  - `H`—number of nodes in the $y$ direction
  - `dist`—distance measure (`inner` for inner product; `euclidean` for Euclidean distance)
  - `alpha`—learning factor
  - `epsilon`—RMS error criterion
  - `P`—number of data patterns
  - `shuffle`—shuffle pattern presentation order
Using these network parameters, `run()` creates a network, loads the pattern vectors, and trains the network. During training, the current RMS error is reported every few cycles. Upon completion of training, `run()` prints out the final weights. The pattern vectors are specified in their respective CSV files, one row per pattern.
The module `etc.[ch]` implements utilities common to both EBP and SOM networks, such as the initial weights randomiser. This module also contains the various activation functions used by the EBP network. Each activation function has a unipolar version and a bipolar version.
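As a sketch of what such a unipolar/bipolar pair looks like (the actual function names and signatures in `etc.[ch]` may differ):

```c
#include <math.h>

/* unipolar sigmoid: output lies in the open interval (0, 1) */
double sigmoidu(double net)
{
    return 1.0 / (1.0 + exp(-net));
}

/* bipolar sigmoid: output lies in the open interval (-1, +1) */
double sigmoidb(double net)
{
    return 2.0 / (1.0 + exp(-net)) - 1.0;
}
```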
The SOM network does not use activation functions; instead, it uses vector-space distance measures. The inner product (cosine similarity) measure is implemented by the `vecinner()` function and the Euclidean distance measure is implemented by the `veceuclidean()` function, both of which are defined in the `vec.[ch]` module. The module `vec.[ch]` implements vector and matrix manipulation utilities. Refer to chapter 7 Vector Algebra and chapter 8 Matrices and Vector Spaces of Mathematical Methods for Physics and Engineering, Riley (2006).
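The two distance measures amount to the following computations. These are sketches only; the declarations of `vecinner()` and `veceuclidean()` in `vec.[ch]` may well differ.

```c
#include <math.h>

/* inner product of the n-vectors u and v; a larger value means the
 * vectors point in more similar directions */
double inner(const double *u, const double *v, int n)
{
    double s = 0.0;
    for (int i = 0; i < n; i++)
        s += u[i] * v[i];
    return s;
}

/* Euclidean distance between the n-vectors u and v; a smaller value
 * means the vectors are closer together */
double euclidean(const double *u, const double *v, int n)
{
    double s = 0.0;
    for (int i = 0; i < n; i++) {
        double d = u[i] - v[i];
        s += d * d;
    }
    return sqrt(s);
}
```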
The module `csv.[ch]` implements a simple CSV parser described in section 4.1 Comma-Separated Values of The Practice of Programming, Kernighan (1999).
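In spirit, the parser splits each input line at the commas, something like the toy sketch below. The real `csv.[ch]` module, following Kernighan, is more careful about quoted fields, field growth, and line endings; the function and its name here are only an illustration.

```c
#include <string.h>

/* Split line in place at commas, storing at most maxf field pointers
 * into field[]; returns the number of fields found. Modifies line. */
int splitcsv(char *line, char **field, int maxf)
{
    int nf = 0;
    char *p = line;
    while (nf < maxf) {
        field[nf++] = p;
        char *comma = strchr(p, ',');
        if (comma == NULL)
            break;
        *comma = '\0';
        p = comma + 1;
    }
    return nf;
}
```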
I chose the C programming language for these reasons. C is a small, simple imperative language, so there are few abstractions to distract us from our main purpose. C is also very close to the hardware; it is but a thin coat of syntactic sugar atop assembly, which makes it one of the fastest high-level languages. Direct access to hardware, speed, and simplicity are why C became the canonical system programming language over the decades. Exploiting C's strengths and coping with its many traps makes the programmer more mechanically sympathetic, a trait many modern programmers lost long ago. Those who use modern, GPU-based deep learning frameworks will benefit from knowing C. Lastly, this year, 2022, is C's 50th anniversary, and I wish to honour this long-lived language that is still blazing trails, despite its age. It is remarkable that C has changed very little over the past five decades. This unrivalled stability is a testament to the far-reaching vision of its designer, Dennis Ritchie.
If you want to experiment with your own EBP network, create `lir-yours.csv`, `lir-yours-i.csv`, and `lir-yours-t.csv` as described above, place the CSV files in the directory `./dat/`, and type `./lir lir-yours` at the command prompt. For a SOM network, use `som-yours.csv` and `som-yours-i.csv`. The `lir-` and `som-` prefixes are there only to keep the CSV files organised by network architecture.
When creating your own data files for EBP networks, normalise the input and target vector components to the closed interval of the chosen activation function: $[0, 1]$ for the unipolar functions, or $[-1, +1]$ for the bipolar ones.
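If your raw data lie in some other range, a simple min-max rescaling will do. The sketch below is generic; the helper name is mine, not the project's.

```c
/* Rescale x linearly from the data range [lo, hi] to the target
 * interval [a, b], for example [0, 1] for unipolar activations or
 * [-1, +1] for bipolar ones. Assumes hi > lo. */
double rescale(double x, double lo, double hi, double a, double b)
{
    return a + (b - a) * (x - lo) / (hi - lo);
}
```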
Remember that both the `lir` and `som` programmes in this project accept only CSV-formatted UNIX text files. Given the minimal error checking in the code, non-CSV files, binary files, and Windows text files will likely crash the programmes.
I present an informal description of EBP and SOM algorithms in this section. I give just enough explanation to enable you to navigate the code. If you want more details, see my article How Artificial Intelligence Works. Better still, read the original papers by Rumelhart and Kohonen.
EBP is a connectionist reimagining of the well known gradient descent optimisation technique invented by Cauchy in 1847. The innovation that EBP brings is the efficient, parallel implementation, which is a common trait of neural network algorithms. By "parallel", I mean the parallelism within each layer of the network, which is exploited by all modern, GPU-based, vectorised implementations. Such optimisation techniques are beyond the scope, and contrary to the pedagogical purpose, of this project.
EBP, like all other neural networks, is constructed from artificial neurons arranged in layers, as shown in the figure below. A neuron is just a non-linear function that maps the weighted sum of its inputs to an output value, based on a bias value: $o_j = f\bigl(\sum_i w_{ji}\, x_i + b_j\bigr)$, where $x_i$ are the inputs, $w_{ji}$ the weights, $b_j$ the bias, and $f$ the activation function.
EBP, like all neural networks, learns incrementally the structure and distribution of the patterns in the training data set. During the forward pass, an input pattern is presented to the network, and each layer in turn transforms the responses of its upstream layer, as described by the pseudocode above.
When the neuron responses reach the last (output) layer of the network, the output vector is compared against the target vector that accompanies the training pattern, and the difference between the two is the output-layer error vector.
This commences the backward pass of the learning process. We use this output-layer error vector to compute that layer's weight adjustment matrix: each weight adjustment is $\Delta w_{ji} = \eta\, \delta_j\, o_i + \alpha\, \Delta w_{ji}'$, where $\delta_j$ is the error signal of output neuron $j$, $o_i$ is the response of upstream neuron $i$, $\eta$ is the learning rate, $\alpha$ is the momentum factor, and $\Delta w_{ji}'$ is the previous adjustment.
Next, we propagate the output-layer error signal, which was caused by a training data pattern, backward through the network, one layer at a time. We shall denote the output layer's adjacent upstream layer with the subscript $l - 1$, in keeping with the indexing used earlier. That hidden layer's error signal is obtained by propagating the downstream error signals back through the connecting weights: $\delta_{(l-1)i} = f'(net_{(l-1)i}) \sum_j \delta_{lj}\, w_{lji}$.
Using the weight adjustment equation given above, we accumulate the weight adjustments for each layer in turn, all the way back to the first processing layer. Applying these adjustments completes one backward pass; presenting every pattern in the data set in this manner completes one training cycle.
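The whole backward pass condenses to a few nested loops. The sketch below assumes the unipolar sigmoid, whose derivative is conveniently $o(1 - o)$, and the same jagged-array layout as the forward-pass sketch earlier; `lir.c` organises its data differently.

```c
/* Backward pass for one pattern. d[l][j] is the error signal of node j
 * in layer l, t[] is the target vector, and dw[][][] holds the previous
 * weight adjustments for the momentum term. */
static void backward(double ***w, double ***dw, double **o, double **d,
                     const double *t, const int *N, int L,
                     double eta, double alpha)
{
    /* output-layer error signal: (target - output) * f'(net) */
    for (int j = 0; j < N[L]; j++)
        d[L][j] = (t[j] - o[L][j]) * o[L][j] * (1.0 - o[L][j]);

    for (int l = L; l >= 1; l--) {
        /* propagate the error signal to the upstream layer */
        if (l > 1)
            for (int i = 0; i < N[l - 1]; i++) {
                double s = 0.0;
                for (int j = 0; j < N[l]; j++)
                    s += d[l][j] * w[l][j][i];
                d[l - 1][i] = s * o[l - 1][i] * (1.0 - o[l - 1][i]);
            }
        /* adjust the weights: eta * delta * o, plus momentum */
        for (int j = 0; j < N[l]; j++)
            for (int i = 0; i < N[l - 1]; i++) {
                dw[l][j][i] = eta * d[l][j] * o[l - 1][i]
                            + alpha * dw[l][j][i];
                w[l][j][i] += dw[l][j][i];
            }
    }
}
```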
SOM is a connectionist version of the well known k-means clustering technique. The innovation that SOM brings, not surprisingly, is the efficient, parallel implementation.
EBP is a simple algorithm. SOM is at least an order of magnitude simpler. SOM consists of one layer of neurons, called the map, as shown in the figure below. Being an unsupervised algorithm, it requires no target vectors. The neurons in the map are arranged either in a rectangular grid or a hexagonal grid. For simplicity, we shall consider a rectangular grid.
The underlying principle of SOM is the idea of winner-takes-all. An input pattern vector is presented to every neuron in the map, and each neuron computes the distance between its weight vector and the pattern, using the configured distance measure. The neuron closest to the pattern wins the competition.
In this way, each input pattern picks out a winning neuron, and the winner's weights are nudged toward that pattern, by an amount governed by the learning factor alpha.
In a simple competitive learning network, we update only the winning neuron's weights. But in an SOM network, we deal with a neighbourhood of neurons centred on the winner: every neuron inside the neighbourhood has its weights adjusted toward the input pattern, which is what gives the map its topological ordering.
Repeating this process for all the patterns in the data set completes one learning cycle. Usually, an SOM network requires a large number of such cycles (the parameter C) before the map settles into a stable ordering.
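Put together, one SOM learning step for a single pattern looks roughly like this. The sketch assumes a W-by-H rectangular map stored as a jagged array `m`, squared Euclidean distance for the competition, and a square neighbourhood of radius `r`; `som.c` uses its own vectorised structures and names.

```c
#include <float.h>
#include <stdlib.h>

/* One SOM learning step: find the winning neuron for pattern x (n
 * components), then nudge the winner and its neighbours toward x. */
static void somstep(double ***m, const double *x, int W, int H, int n,
                    double alpha, int r)
{
    int bx = 0, by = 0;
    double best = DBL_MAX;

    /* competition: the neuron whose weight vector is closest to x wins */
    for (int i = 0; i < W; i++)
        for (int j = 0; j < H; j++) {
            double dist = 0.0;
            for (int k = 0; k < n; k++) {
                double e = m[i][j][k] - x[k];
                dist += e * e;
            }
            if (dist < best) {
                best = dist;
                bx = i;
                by = j;
            }
        }

    /* adaptation: update every neuron inside the neighbourhood */
    for (int i = 0; i < W; i++)
        for (int j = 0; j < H; j++)
            if (abs(i - bx) <= r && abs(j - by) <= r)
                for (int k = 0; k < n; k++)
                    m[i][j][k] += alpha * (x[k] - m[i][j][k]);
}
```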
My goal for this project is to show IT programmers how to convert equations into lines of code, using a subject that is near and dear to them—AI programming.
The most effective way to learn AI programming, and to become a proficient practitioner, is to study the AI algorithms and their theoretical foundations. The code shows you how the implementation works, but to understand the theory, you must read the original sources. Armed with the informal, theoretical overview given above, you might find tackling scientific publications somewhat easier. In any event, you must study the theoretical concepts of AI on your own; start with the classic publications listed below.
There is a trend among young STEMers to underestimate the value of old publications, mistakenly believing that old means irrelevant. If that were so, no one would hold Archimedes, Newton, or Einstein in high esteem. STEM textbooks can be categorised as modern, classic, and vintage. Most modern texts strive to cover all the important, recent advances and applications, so their theoretical presentations necessarily take a less prominent role. Modern texts are an excellent way to keep abreast of the latest developments in the field. Some classic texts, especially those that were published just after the emergence of a groundbreaking idea, tend to be the best if one wishes to study that idea in depth. And given that they were published at the birth of an idea, their pages are not polluted with application examples and practice guides, so they are easier to read for novices interested in the underlying theory. But after about a century, all texts begin to show their age and they become vintage: their then-current examples become outdated and their stale notations grate on modern sensibilities. Still, there are several vintage texts that are good reads, much as Shakespeare still is. So, do not dismiss an old textbook by the publication date imprinted inside its front cover.
- Theory:
- Learning Internal Representations by Error Propagation, Rumelhart (1986)
- The Self-Organising Map, Kohonen (1990)
- Introduction to Artificial Neural Systems, Zurada (1992)
- Mathematical Methods for Physics and Engineering, Riley (2006)
- Practice:
- The Practice of Programming, Kernighan (1999)
- The C Programming Language, Kernighan (1989)