Why deep learning performs better than classical machine learning approaches

25 junio, 2020 Artzai Picón Ruiz


In recent years, deep learning techniques have demonstrated their ability to surpass traditional machine learning methods at complex pattern recognition tasks. In this post we will try to explain the reasons for this.

The traditional machine learning paradigm is based on feature extraction and feature selection. These features are normally designed by experts with good domain knowledge. In this sense, computer-vision-based machine learning systems made use of, for example, textural features such as Gabor filter banks or Local Binary Patterns (LBP) to extract textural information, and histogram-based features or color model transformations to describe the color in the image.

For example, in our seminal work on [plant disease identification], we extracted both textural features (LBP) and color features (mean and variance in the Lab color model) from automatically extracted candidate regions that appeared to be damaged. This algorithm required manually designing visual disease descriptors that could reduce the high dimensionality of an image to a set of numbers describing each blob in the image that might contain a disease.
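As an illustration (a minimal sketch, not the original system's code), this is the kind of hand-crafted blob descriptor described above: a basic 8-neighbour LBP histogram concatenated with per-channel mean and variance. The real pipeline computed the color statistics on Lab channels; here they are simply computed on whatever three channels the patch has:

```python
import numpy as np

def lbp_histogram(gray):
    """Basic 8-neighbour Local Binary Pattern histogram (256 bins)."""
    c = gray[1:-1, 1:-1]
    # 8 neighbours of each interior pixel, clockwise from top-left
    neighbours = [gray[:-2, :-2], gray[:-2, 1:-1], gray[:-2, 2:],
                  gray[1:-1, 2:], gray[2:, 2:], gray[2:, 1:-1],
                  gray[2:, :-2], gray[1:-1, :-2]]
    codes = np.zeros_like(c, dtype=np.uint8)
    for bit, n in enumerate(neighbours):
        codes |= (n >= c).astype(np.uint8) << bit  # one bit per comparison
    hist, _ = np.histogram(codes, bins=256, range=(0, 256))
    return hist / hist.sum()          # normalised texture descriptor

def color_stats(img):
    """Mean and variance per channel (the real system used Lab channels)."""
    return np.concatenate([img.mean(axis=(0, 1)), img.var(axis=(0, 1))])

def blob_descriptor(img):
    """Texture + color feature vector for one candidate region."""
    gray = img.mean(axis=2)
    return np.concatenate([lbp_histogram(gray), color_stats(img)])

patch = np.random.rand(32, 32, 3)     # stand-in for one candidate blob
feat = blob_descriptor(patch)
print(feat.shape)                     # 256 LBP bins + 6 color stats
```

Every candidate blob is thus reduced to a fixed-length vector that a classical classifier can consume, which is exactly the dimensionality reduction the paragraph above refers to.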

As the reader can imagine, adding new diseases became progressively more complicated and, moreover, the descriptive power of the designed descriptors was not strong enough to deal with more complex or subtle diseases.

We will detail the evolution of plant disease algorithms in a future post. For now, it is worth mentioning that the diagram of our plant disease algorithm, before we moved to deep learning, needed a DIN A1 sheet to be printed and obtained 0.80 balanced accuracy for three different diseases, whereas, when we moved to a [deep learning system], we greatly simplified the algorithm diagram and obtained 0.91 balanced accuracy. We are now achieving 0.98 accuracy for this task.

Let’s go back to the main question: why can domain-tailored features, together with strong machine learning algorithms such as XGBoost, SVMs or Random Forests, not compete with the representation-learning capabilities of deep learning?

The answer lies in the very definition of deep learning. According to Yoshua Bengio, “deep learning methods aim at learning feature hierarchies with features from higher levels of the hierarchy formed by the composition of lower level features. Automatically learning features at multiple levels of abstraction allow a system to learn complex functions mapping the input to the output directly from data, without depending completely on human-crafted features.” This means that deep learning methods can learn the features most relevant to the requested task and thus solve the problem more effectively.

In the case of natural images, the [ImageNet] classification dataset consists of more than 1,000 classes with more than 1,000 images per class, and the task is to identify which of the 1,000 classes is present in the image. After training on this task, the features that were automatically learnt by the model are, at the very least, curious.

In the figure below you can see the filters that a convolutional neural network learns when it is trained on the ImageNet task. As can be seen, the filters extracted from the first layer of the network correspond to edge detection filters and color-opponency channels, reminiscent of opponent-process theory and of the frequency analysis performed in the visual cortex. Moreover, the filters learned by the network resemble the Gabor filter banks used for texture detection and the Lab color channels that were used in classical computer vision systems to manually design visual features to feed machine learning models.
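For reference, this is roughly what those classical, hand-designed filters look like. The sketch below generates a small Gabor filter bank in NumPy; the kernel size and the orientation/wavelength grid are illustrative choices, not the exact banks used in the systems mentioned above:

```python
import numpy as np

def gabor_kernel(size, theta, wavelength, sigma):
    """Real part of a Gabor filter: a Gaussian-windowed sinusoid."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    x_t = x * np.cos(theta) + y * np.sin(theta)    # rotate coordinates
    y_t = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(x_t**2 + y_t**2) / (2 * sigma**2))
    carrier = np.cos(2 * np.pi * x_t / wavelength)
    return envelope * carrier

# A small bank: 4 orientations x 2 wavelengths, as in classical texture pipelines
bank = [gabor_kernel(15, theta, wl, sigma=3.0)
        for theta in np.linspace(0, np.pi, 4, endpoint=False)
        for wl in (4.0, 8.0)]
print(len(bank), bank[0].shape)   # 8 filters of 15x15
```

Visually, these oriented, band-limited kernels are strikingly similar to the first-layer filters a CNN discovers on its own, which is precisely the point of the comparison above.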

The capability of the stacked convolutional layers of a deep neural network to extract filters that hierarchically describe a given input in order to fulfil a given task can be applied to many kinds of data other than images. Recently we published a [work] in which we analysed the benefits of applying a deep learning approach, rather than a machine learning one, to the electrocardiogram (ECG) signal of a patient in cardiac arrest, to identify whether the signal is suitable for defibrillation by an automated external defibrillator (AED).

In our study, we replicated a previous study that analysed the best combination of classical signal features, defined over the last decades for shockable-rhythm identification by the many research groups and companies working on resuscitation. These manually designed features had been crafted to capture characteristics a human expert would look for, such as waveform irregularity, the absence of narrow or wide QRS complexes (the ECG waveform deflections associated with ventricular contraction), smaller bandwidths and higher ventricular rates, among others.
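To make the idea of such hand-crafted ECG features concrete, here is a toy, heavily simplified sketch of two of them: a ventricular-rate estimate and a dominant-frequency (bandwidth-style) measure. The synthetic sinusoid, sampling rate and threshold are illustrative assumptions standing in for a real ECG segment, not the clinical algorithms used in the study:

```python
import numpy as np

fs = 250                              # sampling rate in Hz (assumed)
t = np.arange(0, 8, 1 / fs)           # 8 s of signal
ecg = np.sin(2 * np.pi * 2.5 * t)     # crude stand-in: 2.5 Hz ~ 150 beats/min

def ventricular_rate(sig, fs, thresh=0.8):
    """Count upward threshold crossings and convert to beats per minute."""
    above = sig > thresh
    crossings = np.sum(~above[:-1] & above[1:])
    return crossings * 60 * fs / len(sig)

def dominant_frequency(sig, fs):
    """Peak of the magnitude spectrum, a crude bandwidth-style feature."""
    spectrum = np.abs(np.fft.rfft(sig - sig.mean()))
    freqs = np.fft.rfftfreq(len(sig), 1 / fs)
    return freqs[np.argmax(spectrum)]

print(ventricular_rate(ecg, fs))      # ~150 bpm on this synthetic signal
print(dominant_frequency(ecg, fs))    # ~2.5 Hz on this synthetic signal
```

A classical shock/no-shock classifier would be fed a vector of dozens of such scalar measurements, each one encoding a property a clinician had previously identified as relevant.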

Instead of using the pre-existing classical features, we decided to design a new deep neural network based on convolutional layers, to automatically learn the filters that are optimal for our problem, and recurrent layers, to model the temporal dependencies of the extracted filters.
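The architectural idea can be sketched in a few lines of NumPy: a 1-D convolutional stage extracts local waveform features, and a simple recurrent layer accumulates them over time. The weights here are random and the layer sizes invented purely to show the data flow; in the real network they are learned by backpropagation:

```python
import numpy as np

rng = np.random.default_rng(0)

def conv1d(x, kernels):
    """Valid 1-D convolution of signal x with each filter, plus ReLU."""
    k = kernels.shape[1]
    windows = np.lib.stride_tricks.sliding_window_view(x, k)
    return np.maximum(windows @ kernels.T, 0.0)   # (time, n_filters) feature maps

def simple_rnn(feats, w_in, w_rec):
    """Minimal recurrent layer: hidden state mixes current input and past state."""
    h = np.zeros(w_rec.shape[0])
    for f in feats:                    # one step per time position
        h = np.tanh(w_in @ f + w_rec @ h)
    return h                           # final state summarises the whole window

x = rng.standard_normal(500)           # stand-in for a 2 s ECG window at 250 Hz
kernels = rng.standard_normal((8, 16)) # 8 learnable filters of length 16
feats = conv1d(x, kernels)             # (485, 8) feature sequence
h = simple_rnn(feats, rng.standard_normal((4, 8)) * 0.1,
               rng.standard_normal((4, 4)) * 0.1)
print(feats.shape, h.shape)
```

A final dense layer on top of `h` would then output the shock/no-shock decision; the key point is that the filters themselves are parameters, optimised for the task rather than designed by hand.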

The first aspect of interest we found was that the deep learning method surpassed, by far, the accuracy of the other methods, while requiring less time to reach a correct decision. This was even more important for the detection of out-of-hospital cardiac arrests, whose data are much more complicated because emergency services do not arrive immediately.

But the most important finding came when we analysed the separability of the high-level description (embedding) that the deep neural network had learnt during training. This is clearly seen in Figure 3. When we project the 20 tailored features that were selected as the best ones for the machine learning approach onto a 2D map and compare them with the 20 features automatically extracted by the deep neural network, we can appreciate that the DNN-extracted features present greater separability than the hand-crafted ones.
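The kind of comparison behind that figure can be sketched with synthetic data: project 20-dimensional feature vectors down to 2D (here with PCA, one common choice of projection) and score how far apart the two classes sit relative to their spread. The Gaussian features below are invented stand-ins for illustration, not the study's data:

```python
import numpy as np

rng = np.random.default_rng(1)

def pca_2d(X):
    """Project feature vectors onto the top-2 principal components."""
    Xc = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ vt[:2].T

def separability(X2, y):
    """Ratio of between-class mean distance to average within-class spread."""
    m0, m1 = X2[y == 0].mean(axis=0), X2[y == 1].mean(axis=0)
    spread = 0.5 * (X2[y == 0].std() + X2[y == 1].std())
    return np.linalg.norm(m0 - m1) / spread

# Synthetic stand-ins: "crafted" features overlap between classes far more
# than "learned" features, whose class means sit further apart.
y = np.repeat([0, 1], 200)
crafted = rng.standard_normal((400, 20)) + y[:, None] * 0.5
learned = rng.standard_normal((400, 20)) + y[:, None] * 3.0

print(separability(pca_2d(crafted), y))  # lower: clouds overlap
print(separability(pca_2d(learned), y))  # higher: well separated
```

On the study's real embeddings the same qualitative picture emerged: the learned features formed far more separable clusters in the 2D projection than the selected hand-crafted ones.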


The great descriptive power of DNN architectures, and their capability to hierarchically describe complex and unstructured data at higher levels of abstraction, make them capable of extracting features that are optimized to fulfil the given task.

These algorithms are able to extract the features that best fulfil the task without human intervention, which can be of critical importance. Some months ago, MIT published a [paper] on breast cancer detection using mammography images that had already been analysed by doctors. However, to train the DNN they did not use the radiologist's report as the ground-truth diagnosis but the patient's health status five years later.

The resulting models extracted patterns and a high-level description of the image that allow cancer risk to be assessed five years before the diagnosis could be made. This is impossible with hand-tailored features based on previous human knowledge. In a totally different scenario, the reinforcement learning systems (also deep learning) that beat humans at chess, Go or StarCraft learn and exploit strategies totally unexpected to humans, which makes them unbeatable.

Deep learning techniques allow the best internal representation of the data for a given task to be extracted automatically. This representation can exploit information currently unknown to or ignored by humans, with great flexibility and performance.

The most amazing thing is that this is valid not only for agricultural images and medical signals but for any other domain, such as industry or quality control, among others.

You can find more details.

About Artzai Picón Ruiz

Dr. Artzai Picón (Industrial Engineer (2002), Biomedical Engineer (2012), European PhD from the University of the Basque Country (2009)) is a principal researcher at TECNALIA in the field of computer vision and artificial intelligence, both in international research projects and in bespoke applied research for industry. His research activity focuses on the development and real-world deployment of artificial intelligence technologies, especially in the field of photonics (computer vision, spectral analysis, remote spectroscopy, …), based on deep learning techniques and Bayesian models.

He has produced numerous publications as well as ten patents, five of which are being exploited by international companies. In 2003 he received the award for best final-degree project for the development of a facial recognition system. In 2006 he received the international ONCE research award for new technologies for visually impaired people. In 2007 he worked at the Centre for Image Processing and Analysis (CIPA), where he completed a large part of his doctoral thesis in the area of advanced processing and segmentation of vector images, specifically hyperspectral images.

In 2011 the industrial application of the developments from his thesis received the third European EARTO innovation prize, while in 2014 the development of an intelligent biological image search system in which he participated won the first EARTO prize.