Understanding Deep Learning Convolutional Neural Network
This tutorial echoes a post by our dear partner Tan Chin Luh on LinkedIn:
--
I believe many of you might not agree with using software like Scilab, Matlab or Octave for deep learning, and I agree to a certain extent. Tools like Theano, Torch or TensorFlow are much better suited to deep-learning networks, thanks to their GPU support, computation-graph programming model, and strong back-end support.
However, Scilab can be good for understanding the basics of a deep-learning network and for creating quick prototypes of a system. In this post, I will share some Scilab code to create a simple CNN and implement it in a GUI that detects handwriting in an image.
The zip file above contains Scilab scripts for creating a CNN. The code is modified from the DeepLearnToolbox by Rasmus Berg Palm. Do take note that this module is outdated and no longer maintained by its author, as tools such as the few mentioned above are much better for this purpose. Again, I converted this module to Scilab purely to understand CNNs better, from scratch: how the convolutions in each layer work, and how feed-forward and back-propagation tune the kernel (filter) coefficients to make the network usable. A good explanation of the original toolbox can be found in the article Understanding the DeepLearnToolbox CNN Example by Chris McCormick. As that article explains the code well, I will just add some missing parts in this post as a complement to it.
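To make the feed-forward step concrete, here is a minimal sketch of one convolution layer followed by a sigmoid activation and 2x2 mean pooling, which are the building blocks the scripts implement. The variable names below are mine for illustration, not the names used in the toolbox scripts.

```scilab
// Minimal sketch of one CNN convolution + pooling step (illustrative only;
// variable names are not taken from the toolbox scripts).
imgIn = rand(28, 28);              // dummy 28x28 input map (e.g. one MNIST digit)
k     = rand(5, 5) - 0.5;          // 5x5 kernel with random initial coefficients
b     = 0;                         // bias term for this feature map

// "valid" convolution: the 28x28 map shrinks to 24x24
z = conv2(imgIn, k, "valid") + b;

// sigmoid activation (the non-linearity used in the original toolbox)
a = 1 ./ (1 + exp(-z));

// 2x2 mean pooling (sub-sampling layer): 24x24 -> 12x12
pooled = zeros(12, 12);
for i = 1:12
    for j = 1:12
        blk = a(2*i-1:2*i, 2*j-1:2*j);
        pooled(i, j) = mean(blk);
    end
end
```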
Training The CNN
The provided example uses the MNIST dataset. Please note that training runs slowly; it may take minutes for 1 epoch and hours for multiple epochs. In the zip file, I have attached CNNs trained for 1 epoch and for 33 epochs for testing purposes.
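For orientation, the training flow in the original DeepLearnToolbox example looks roughly like the sketch below. The function names (cnnsetup, cnntrain) and the opts fields are taken from the MATLAB toolbox, not from the zip file; the Scilab scripts follow the same structure but may name things differently.

```scilab
// Hedged sketch of the training flow, mirroring the original DeepLearnToolbox
// example. cnnsetup / cnntrain and the opts fields are assumptions based on the
// MATLAB toolbox; check the scripts in the zip for the actual Scilab names.

// train_x: 28x28xN MNIST images scaled to [0,1]; train_y: 10xN one-hot labels.

// Network: input -> conv (6 maps, 5x5) -> pool (2) -> conv (12 maps, 5x5) -> pool (2)
cnn.layers = list( ..
    struct('type', 'i'), ..
    struct('type', 'c', 'outputmaps', 6, 'kernelsize', 5), ..
    struct('type', 's', 'scale', 2), ..
    struct('type', 'c', 'outputmaps', 12, 'kernelsize', 5), ..
    struct('type', 's', 'scale', 2));

opts.alpha     = 1;     // learning rate
opts.batchsize = 50;    // mini-batch size
opts.numepochs = 1;     // one pass over the MNIST training images

cnn = cnnsetup(cnn, train_x, train_y);        // initialise kernels and biases
cnn = cnntrain(cnn, train_x, train_y, opts);  // mini-batch gradient descent
```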
With the IPCV module, we can visualize the CNN filters in both the spatial and frequency domains.
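As an illustration, one way to view a 5x5 kernel in the frequency domain is to zero-pad it and take its 2-D FFT magnitude. This is only a sketch: the kernel k here is a random stand-in for a trained filter, and the display call assumes IPCV's imshow accepts a double image scaled to [0,1].

```scilab
// Visualise a 5x5 kernel in the frequency domain (illustrative sketch).
k = rand(5, 5) - 0.5;            // stand-in for a trained kernel from the CNN

P = zeros(64, 64);               // zero-pad to get a smoother spectrum
P(1:5, 1:5) = k;

F = fftshift(abs(fft(P)));       // 2-D FFT magnitude with the DC component centred
imshow(F / max(F));              // scale to [0,1] for display with IPCV's imshow
```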
Figure 1: CNN layer 2 filter coefficients in the frequency domain - initial values assigned randomly (left), after 1 epoch over all data (middle), and after 33 epochs over all data (right).
Figure 2: CNN layer 4 filter coefficients in the frequency domain - initial values assigned randomly (left), after 1 epoch over all data (middle), and after 33 epochs over all data (right).
Running The Handwriting Digits Recognition GUI
Figure 3: A GUI to load the pre-trained CNN and an image, and to recognize the digits in the image.
There are 2 pre-trained CNNs provided in the zip file: one trained for 1 epoch, the other for 33 epochs. There are also 2 sample images, test1.jpg and test2.jpg. You will notice that test2.jpg, shown in Figure 3, works perfectly, while test1.jpg only works if you increase the pre-processing parameter in the edit box to around 7. This parameter is actually the size of the dilation structuring element used to thicken the thin handwriting. For more reading on the dilation operation and other morphological operations in Scilab, you can refer to the IPCV blogs.
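For reference, the kind of pre-processing that the edit box value controls can be sketched as below. This is an assumption about the GUI's internals, not a copy of its code: imread, rgb2gray, imdilate and imshow are IPCV functions, but the threshold value, the structuring element built from ones(), and the overall call sequence are illustrative choices of mine.

```scilab
// Hedged sketch of the "pre-process" step: dilate thin handwriting strokes
// before segmentation and recognition. The GUI script may differ in detail.
S  = imread('test1.jpg');            // one of the sample images in the zip
G  = rgb2gray(S);                    // convert to grey-scale
BW = bool2s(G < 128);                // dark strokes -> 1, background -> 0 (threshold is illustrative)

seSize = 7;                          // the value entered in the edit box
se  = ones(seSize, seSize);          // square structuring element (assumed form)
BW2 = imdilate(BW, se);              // thicken the strokes

imshow(BW2);
```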