# Model training tips & tricks

## Limiting a GPU’s memory consumption

By default, TensorFlow allocates all of a GPU’s memory to training, which prevents other TensorFlow processes from running on the same machine.

A flexible way to limit memory usage is to call `deeplabcut.train_network(..., allow_growth=True)`, which grows the GPU memory region dynamically as it is needed. A stricter option is to explicitly cap GPU usage at a fraction of the available memory. For example, allocating at most 1/4 of the total memory can be done as follows:

```python
import tensorflow as tf

# cap this process at 25% of the GPU's total memory (TensorFlow 1.x API)
gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.25)
sess = tf.Session(config=tf.ConfigProto(gpu_options=gpu_options))
```
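
If you prefer the `allow_growth` route, here is a minimal sketch (the `config_path` value is a hypothetical placeholder for your project’s config.yaml):

```python
import deeplabcut

config_path = '/path/to/project/config.yaml'  # hypothetical project path

# let TensorFlow claim GPU memory incrementally instead of all at once
deeplabcut.train_network(config_path, shuffle=1, allow_growth=True)
```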

## Using custom image augmentation

Image augmentation is the process of artificially expanding the training set by applying various transformations to images (e.g., rotation or rescaling) in order to make models more robust and more accurate (read our primer for more information). Although data augmentation is performed automatically by DeepLabCut, the default values (see the augmentation variables in the default `pose_cfg.yaml` file) can be readily overridden prior to training.

Another, data-efficient option we discuss is based on a method called active learning; see this blog post for further details.

When you run `create_training_dataset`, you have several options for which type of augmentation to use:

```python
# other valid augmenter_type values include 'default', 'scalecrop',
# 'tensorpack', and 'deterministic'
deeplabcut.create_training_dataset(configpath, augmenter_type='imgaug')
```

When you pass `augmenter_type`, the underlying loader files being called are in the DeepLabCut/DeepLabCut repository; you can look there to see which types of augmentation are available to you (or edit those files to add more). Moreover, you can add more options to the `pose_cfg.yaml` file. Below is a simple script you can modify and run to automatically edit the correct `pose_cfg.yaml` and add more augmentation to the `imgaug` loader (or open the file and edit it yourself):

```python
import deeplabcut

config_path = '/path/to/project/config.yaml'  # hypothetical project path

# the first returned path is the train/pose_cfg.yaml of the shuffle
train_pose_config = deeplabcut.return_train_network_path(config_path)[0]
# augmentation options to add/override in the imgaug loader
augs = {
    "gaussian_noise": True,
    "elastic_transform": True,
    "rotation": 180,
    "covering": True,
    "motion_blur": True,
}
deeplabcut.auxiliaryfunctions.edit_config(
    train_pose_config,
    augs,
)
```
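
After editing, training proceeds as usual; a minimal sketch (again with a hypothetical `config_path`):

```python
import deeplabcut

config_path = '/path/to/project/config.yaml'  # same project as above

# train with the augmented settings now stored in pose_cfg.yaml
deeplabcut.train_network(config_path, shuffle=1)
```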

An in-depth tutorial on image augmentation and training hyperparameters can be found here.

## Evaluating intermediate (and all) snapshots

The latest snapshot stored during training is not necessarily the one that yields the best performance. You should therefore evaluate ALL snapshots and select the best: set the `snapshotindex` parameter to `all` in the config.yaml before running evaluation.
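
A minimal sketch of that workflow, reusing the `edit_config` helper shown earlier (the `config_path` is a hypothetical placeholder):

```python
import deeplabcut

config_path = '/path/to/project/config.yaml'  # hypothetical project path

# evaluate every stored snapshot rather than only the latest one
deeplabcut.auxiliaryfunctions.edit_config(config_path, {"snapshotindex": "all"})
deeplabcut.evaluate_network(config_path, plotting=True)
```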

## What neural network should I use? (Trade-offs, speed, performance, and considerations)

With the release of even more network options, you now have to decide what to use! This additional flexibility is hopefully helpful, but we want to give you some guidance on where to start.

TL;DR: ResNet-50 gives you the best performance for most everything; MobileNetV2-1.0 is much faster, needs less GPU memory to train, and is nearly as accurate.

You always select the network type when you create a training dataset, i.e., standard DLC: `deeplabcut.create_training_dataset(config, net_type='resnet_50')`, or maDLC: `deeplabcut.create_multianimaltraining_dataset(config, net_type='dlcrnet_ms5')`. There is nothing else you need to change.


### ResNets

In Mathis et al. 2018 we benchmarked three networks: ResNet-50, ResNet-101, and ResNet-101ws. For ALL lab applications we tested, ResNet-50 was enough. For all the demo videos on www.deeplabcut.org, the backbones are ResNet-50s. Thus, we recommend making this your go-to workhorse for data analysis. Here is a figure from the paper; see panel “B” (the networks are all within a few pixels of each other on the open-field dataset):

This is also one of the main result figures, generated with ResNet-50. BLUE is training, RED is testing, and BLACK is our best human-level performance; 10 pixels is the width of the mouse nose, so anything under that is good performance for us on this task!

Here are also some speed statistics for analyzing videos with ResNet-50; see https://www.biorxiv.org/content/early/2018/10/30/457242 for more details:

So, why use a ResNet-101 or even a ResNet-152? If you have a much more challenging problem, like multiple humans dancing, these are a good option. You should then also set `intermediate_supervision=True` in the `pose_cfg.yaml` of that shuffle folder (before you train) for best performance. Note that for ResNet-50 this does NOT help, and can hurt.
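
For example, here is a sketch of flipping that flag programmatically rather than opening the file (hypothetical `config_path`; `return_train_network_path` and `edit_config` as used earlier on this page):

```python
import deeplabcut

config_path = '/path/to/project/config.yaml'  # hypothetical project path

# the first returned path is the train/pose_cfg.yaml of the shuffle to train
train_pose_config = deeplabcut.return_train_network_path(config_path)[0]
deeplabcut.auxiliaryfunctions.edit_config(
    train_pose_config,
    {"intermediate_supervision": True},  # for ResNet-101/152; hurts ResNet-50
)
```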

### When should I use a MobileNet?

MobileNets are fast to train, more memory efficient, and faster at analysis (inference): on CPUs they are up to 4x faster than ResNet-50, and on GPUs up to 2x faster! So, if you don’t have a GPU (or have a GPU with little memory) and don’t want to use Google Colab, etc., these are a great starting point.

They are smaller/shallower networks, though, so you don’t want to push in very large images. Be sure to use `deeplabcut.DownSampleVideo` on your data (which is frankly never a bad idea).
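
For instance (a sketch; the `width`/`height` keyword names follow the `DownSampleVideo` docstring, so double-check the signature of your installed version):

```python
import deeplabcut

# downsample a (hypothetical) video; width=-1 preserves the aspect ratio
deeplabcut.DownSampleVideo('/path/to/video.mp4', width=-1, height=256)
```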

Additionally, these are good options for running on “live” video, i.e., if you want to give real-time feedback in an experiment, you can crop to a smaller area around the animal and run this rather fast!

So, how fast are they?

Here is a comparison of four MobileNetV2 variants to ResNet-50 and ResNet-101 (darkest red); read more here: https://arxiv.org/abs/1909.11229

### When should I use an EfficientNet?

Built with inverted residual blocks like MobileNets, but more powerful than ResNets thanks to optimal depth/width/resolution scaling, EfficientNets are an excellent choice if you want both speed and performance. They do require more careful handling, though! Especially for small datasets, you will need to tune the batch size and learning rates. We therefore suggest these for more advanced users, or those willing to run experiments to find the best settings. Here is the speed comparison, and for performance see our latest work at http://horse10.deeplabcut.org
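
As a sketch of that kind of tuning (the `batch_size` and `multi_step` keys are standard pose_cfg.yaml fields, but the values shown are placeholders to experiment with, not recommendations):

```python
import deeplabcut

config_path = '/path/to/project/config.yaml'  # hypothetical project path

train_pose_config = deeplabcut.return_train_network_path(config_path)[0]
deeplabcut.auxiliaryfunctions.edit_config(
    train_pose_config,
    {
        "batch_size": 8,  # placeholder; tune for your GPU memory and dataset
        # learning-rate schedule as [learning_rate, until_iteration] pairs
        "multi_step": [[0.0005, 10000], [0.0001, 50000]],
    },
)
```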

### How can I compare them?

Great question! The best way to do this is to use the same train/test split (generated in `create_training_dataset`) with different models. As of version 2.1+, there is a new function that lets you do this easily: instead of `create_training_dataset`, run `create_training_model_comparison` (see the docstring via `deeplabcut.create_training_model_comparison?`, or run the Project Manager GUI, `deeplabcut.launch_dlc()`, for assistance).
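
A minimal sketch (hypothetical `config_path`; the `net_types` argument is described in the docstring, so check `deeplabcut.create_training_model_comparison?` for the exact signature):

```python
import deeplabcut

config_path = '/path/to/project/config.yaml'  # hypothetical project path

# create shuffles that share the same train/test split across architectures
deeplabcut.create_training_model_comparison(
    config_path,
    num_shuffles=1,
    net_types=['resnet_50', 'mobilenet_v2_1.0'],
)
```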