KISS: Backbones in PyTorch
Keep It Simple, Stupid: CNN Backbones
Many modern CNNs (convolutional neural networks) use a backbone to extract feature maps as the first step in their architecture: for example, some flavour of ResNet-50.
Helpfully, many of these backbones are implemented for us in torchvision, even with pre-trained models made available. This means we can often write simple code that looks like the following:
This makes it very easy to get started with building models that use the feature maps from the final layer of ResNet. However, many network architectures need to tap into other layers of ResNet as well. One example is feature pyramids, where feature maps at different scales are extracted from different layers of the backbone.
Many models published on GitHub therefore re-implement e.g. a ResNet backbone from scratch and copy it into their repo. However, there is an easier way!
As long as we trust the API stability of the torchvision package (and we can pin our dependency versions to guarantee this), we can re-use the backbones that torchvision provides.
To do this, we simply create our own module that holds the pretrained model internally, then write a forward pass that calls the same layers, in the same order, as the original model. The only difference is that we return the outputs of more layers than the original model does.
Here is an example of extracting multiple layers of feature maps from the torchvision ResNet backbone without re-implementing ResNet from scratch.
In this way, we can spend a lot less time debugging our backbones and get on with solving the problem at hand. Furthermore, users of the code can be confident that the backbone is implemented in the same way across different models.
For a full worked example, this is the same technique I used in my implementation of FCOS, which uses a feature pyramid.