Transfer Learning with Deep Convolutional Networks

Rodrigo Sierra Vargas
4 min read · Jun 27, 2020


Transfer Learning Using VGG-16 and CIFAR-10


Image classification has been widely studied, driven by the desire to automate repetitive tasks and perform them more accurately than ever. Thanks to that enormous body of work, we can now use pre-trained models to perform classification on different data sets, leveraging previous achievements to build new models that train faster and deliver far better results than models created from scratch. This work creates a new model from a well-known convolutional neural network, VGG-16 [2], to classify images from the CIFAR-10 data set [1].


The main objective when taking a pre-trained model is to extract the features learned by the base model and add new layers specific to the new problem. To that end, it is necessary to understand the base model: which of its layers can contribute to classifying the new images, and how many new layers, and of what size, are needed to learn the new features. Below is a series of trials with the VGG-16 [4] network to improve transfer learning for classifying CIFAR-10 images.

Materials and Methods

The following four techniques for transfer learning are commonly used to achieve good results:

  • The first is to freeze the source CNN’s weights, remove the original fully connected layers, and add a new fully connected layer, so that the original weights serve purely for feature extraction.
  • The second technique is to fine-tune the top layers of the source CNN and freeze the bottom layers, on the assumption that the bottom layers are very generic and can be used for any kind of image dataset [3].
  • The third technique is to fine-tune the entire network’s weights using a very small learning rate to avoid losing the source weights, then remove the last fully connected layers and add another layer to suit the target dataset.
  • The last technique is to use the CNN’s original architecture without importing weights, that is, to initialize the weights from scratch. The point of this technique is to use a well-known architecture that has performed well on large datasets.
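The first two techniques can be sketched in a few lines of Keras. This is a minimal illustration, not the exact configuration used in the trials below: the head sizes are arbitrary, and the `block5` layer-name prefix is how the stock Keras VGG-16 application names its top block.

```python
from tensorflow.keras.applications import VGG16
from tensorflow.keras import layers, models

# Technique 1: freeze the source CNN and train only a new fully connected head.
base = VGG16(weights='imagenet', include_top=False, input_shape=(32, 32, 3))
base.trainable = False  # pre-trained weights are used purely for feature extraction

model = models.Sequential([
    base,
    layers.Flatten(),
    layers.Dense(256, activation='relu'),   # illustrative head size
    layers.Dense(10, activation='softmax'), # CIFAR-10 has 10 classes
])

# Technique 2: fine-tune only the top block; the generic bottom layers stay frozen.
base.trainable = True
for layer in base.layers:
    layer.trainable = layer.name.startswith('block5')
```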

Another useful technique to speed up training is batch normalization, which normalizes features to zero mean; in Keras we can do this by adding a BatchNormalization layer after the base model [6].
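In code, this is a single extra layer between the base model and the new head. A minimal sketch (pre-trained weights are omitted here to keep the example light; in practice you would pass `weights='imagenet'`):

```python
from tensorflow.keras.applications import VGG16
from tensorflow.keras import layers, models

base = VGG16(weights=None, include_top=False, input_shape=(32, 32, 3))  # use weights='imagenet' in practice
base.trainable = False

model = models.Sequential([
    base,
    layers.BatchNormalization(),  # normalizes the base model's features to zero mean, unit variance
    layers.Flatten(),
    layers.Dense(10, activation='softmax'),
])
```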


Trials were run in Kaggle’s notebooks to take advantage of free GPU usage, which accelerates training and allows quicker iteration on the new network.

The first experiments followed the first technique, using all 5 blocks of the original VGG-16 model and replacing only the last dense layer, ending with the following results:

Blue line (Training loss-accuracy) and Red line (Validation loss-accuracy)

Loss and Accuracy from VGG-16 with all layers

The final numbers for this training were:

loss: 0.2242 - accuracy: 0.9223 - val_loss: 0.5086 - val_accuracy: 0.8456

As we can see, the model’s behavior on the validation set must be improved, since its loss grows and its accuracy is lower by the end of training.

After that, following the second technique, the last two layers of the base model were cut off and the same final layers from the previous trial were added, with the following results:
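One way to cut layers off the base model in Keras is to rebuild it from an intermediate layer’s output. A sketch under the assumption that "last two layers" means the final pooling and convolution of the stock VGG-16 application (the post does not name the exact cut point):

```python
from tensorflow.keras.applications import VGG16
from tensorflow.keras import Model

full = VGG16(weights='imagenet', include_top=False, input_shape=(32, 32, 3))

# Drop the last two layers of the convolutional base (block5_pool and
# block5_conv3 here) by taking an earlier layer's output as the new endpoint.
truncated = Model(inputs=full.input, outputs=full.layers[-3].output)
truncated.trainable = False  # then attach the same dense head as before
```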

Blue line (Training loss-accuracy) and Red line (Validation loss-accuracy)

Loss and Accuracy from first three blocks of VGG-16

with the following final results:

loss: 0.0025 - accuracy: 0.9993 - val_loss: 0.6435 - val_accuracy: 0.8923



These trials show that we need to explore different characteristics of the base model when doing transfer learning, because its features can be highly specific, especially in the last layers; it has been shown that models resolve their tasks there and store features that can lead to poor performance on new sets of images.

Although the second trial ended with better numbers, the validation loss behaves better in the first, staying closer to the training loss; in the second, after the 10th epoch the validation loss increases steadily, which is not desirable.
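A common way to handle a validation loss that starts climbing mid-training is early stopping. A sketch with the Keras callback (the patience value here is an arbitrary choice):

```python
from tensorflow.keras.callbacks import EarlyStopping

# Stop once val_loss has not improved for 3 consecutive epochs, and roll
# back to the best weights seen so far.
early_stop = EarlyStopping(monitor='val_loss', patience=3,
                           restore_best_weights=True)

# model.fit(x_train, y_train, validation_data=(x_val, y_val),
#           epochs=30, callbacks=[early_stop])
```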

Although the last model improves on the first, there is still work to do: 89% validation accuracy is not the best result for transfer learning with VGG-16 on the CIFAR-10 data set, and many notebooks on Kaggle and other platforms reach validation accuracy greater than 95%. However, this can serve as the first iteration of other transfer learning tasks.

Literature cited: