units. Subsequently, an error is calculated (for example, the mean squared error associated with the network's prediction) and minimised (e.g. by adjusting the weights and biases of the neurons) through techniques such as backpropagation powered by stochastic gradient descent. Finally, upon completion of training, only the forward pass is used to make predictions.

2.5 Convolutional Neural Networks: Theory and Application

While MLPs are powerful models, they are limited in their ability to observe context between input variables. With variable context considered a key predicate of this research, a more sophisticated neural network model was required. Accordingly, substantial investigation into CNN approaches and architectures was undertaken as a precursor to the thesis' architecture.

2.5.1 General Background

A relatively recent innovation, CNNs are a class of deep neural networks that have been successfully applied to computer vision problems with state-of-the-art results, best demonstrated in the ILSVRC ImageNet competition (Russakovsky et al. 2015). The neuron connection patterns are modelled to approximate the structure and organisation of the animal visual cortex. As in animal models, in order to better identify local patterns, individual cortical neurons only respond to stimuli in a confined spatial region, the receptive field; these receptive fields overlap partially such that they cover the entire visual input, and hence can process a picture in aggregate (Wikipedia 2017). Much like in MLPs, each layer's purpose within a CNN is to identify patterns within the data. The key difference is that, because of the filter, convolution and pooling architecture, these neuron clusters have the ability to recognise contextual data, such as adjacent pixels within an image (or adjacent words within a sentence). The first layers within a CNN detect simple features that can be recognised and interpreted relatively easily, e.g. an edge of an object.
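As a concrete illustration of this filter sweep, the sketch below convolves a hand-rolled vertical-edge filter over a toy image and then max-pools the resulting feature map. It is a minimal NumPy illustration of the mechanics only, not the architecture used in this thesis; the function names and the toy image are illustrative assumptions.

```python
import numpy as np

def conv2d(image, kernel):
    """'Valid' sliding-window filter (strictly a cross-correlation,
    the operation deep learning libraries call convolution): the
    filter sweeps over every position at which it fits entirely
    inside the image, producing a smaller feature map."""
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool(fmap, size=2):
    """Non-overlapping max pooling: keep only the strongest
    activation in each size x size window, shrinking the map."""
    oh, ow = fmap.shape[0] // size, fmap.shape[1] // size
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = fmap[i * size:(i + 1) * size,
                             j * size:(j + 1) * size].max()
    return out

# Sobel-style filter: responds where intensity drops left-to-right.
edge_filter = np.array([[1., 0., -1.],
                        [2., 0., -2.],
                        [1., 0., -1.]])

# Toy 6x6 image: bright left half, dark right half -> one vertical edge.
image = np.zeros((6, 6))
image[:, :3] = 1.0

fmap = conv2d(image, edge_filter)   # 4x4 map, strongest along the edge
pooled = max_pool(fmap)             # 2x2 summary of the feature map
```

The feature map responds only where the filter straddles the bright/dark boundary, which is exactly the location-independence described above: the same small filter finds the edge wherever it occurs in the image.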
Subsequent layers detect features of features, e.g. a corner, and these are finally aggregated to form a representation of the object as a whole, e.g. a table. The precise location of a feature is of no consequence, as filters are designed to sweep (or convolve) over the image until its entirety has been examined. Pooling serves a related function, combining the outputs of filters/neuron clusters from one layer into a single neuron input for the subsequent layer. Max pooling, for example, identifies the maximum value within each convolution window and then presents it to the next neuron layer, sometimes as part of a further convolution and filter set. Finally, a set of dense layers, of the same basic construction as a multi-layer perceptron, is used to make final predictions (typically classifications) about an input. A significant benefit of CNNs is that, when compared to other computer vision approaches, they require relatively little manual feature engineering (adapted from (Wikipedia 2017)). This independence from human intervention, feature design and engineering is of substantial practical and theoretical benefit. Given this thesis' subject matter and time constraints, this capability was particularly attractive.

2.5.2 Architectures

A variety of architectures exist within the CNN family, with convolutions, pooling, filters and non-sigmoidal activations prevalent. One of the first successful iterations was developed by LeCun et al. (1990), the best known being the LeNet architecture, applied to the task of reading addresses, postal codes etc. (CS 2017). Development continued sporadically in the field until AlexNet, developed by Krizhevsky et al. (2012). In 2012 AlexNet entered the ImageNet ILSVRC challenge (Russakovsky et al. 2015) and demonstrated performance far in excess o