DeepLabCut Model Zoo: SuperAnimal models#
SuperAnimal in DeepLabCut PyTorch!#
This notebook demos how to use our SuperAnimal models within DeepLabCut 3.0! Please read more in Ye et al. Nature Communications 2024 about the available SuperAnimal models, and follow along below!
Let's get going: install the latest version of DeepLabCut in Colab:#
Also, be sure you are connected to a GPU: go to the menu, click Runtime > Change Runtime Type > select "GPU".
!pip install --pre deeplabcut
PLEASE, click "restart runtime" from the output above before proceeding!
import os
from pathlib import Path
import matplotlib.pyplot as plt
import pandas as pd
from PIL import Image
import deeplabcut
import deeplabcut.utils.auxiliaryfunctions as auxiliaryfunctions
from deeplabcut.pose_estimation_pytorch.apis import (
    superanimal_analyze_images,
)
from deeplabcut.modelzoo import build_weight_init
from deeplabcut.modelzoo.utils import (
    create_conversion_table,
    read_conversion_table_from_csv,
)
from deeplabcut.modelzoo.video_inference import video_inference_superanimal
from deeplabcut.utils.pseudo_label import keypoint_matching
Zero-shot Image & Video Inference#
SuperAnimal models are foundation animal pose models. They can be used for zero-shot predictions without further training on the data. In this section, we show how to use SuperAnimal models to predict pose from images (given an image folder) and output the predicted images (with pose) into another destination folder.
Zero-shot image inference#
If you have a single image you want to test, upload it here!
Upload the images you want to predict#
from google.colab import files
uploaded = files.upload()
for filepath, content in uploaded.items():
    print(f"User uploaded file '{filepath}' with length {len(content)} bytes")

image_path = os.path.abspath(filepath)
image_name = os.path.splitext(image_path)[0]
# If this cell fails (e.g., when using Safari in place of Google Chrome),
# manually upload your image via the Files menu to the left
# and define `image_path` yourself with right click > copy path on the image:
#
# image_path = "/path/to/my/image.png"
# image_name = os.path.splitext(image_path)[0]
Select a SuperAnimal name and corresponding model architecture#
Check Our Docs on SuperAnimals to learn more!
# @markdown ---
# @markdown SuperAnimal Configurations
superanimal_name = "superanimal_topviewmouse" #@param ["superanimal_topviewmouse", "superanimal_quadruped"]
model_name = "hrnet_w32" #@param ["hrnet_w32", "resnet_50"]
detector_name = "fasterrcnn_resnet50_fpn_v2" #@param ["fasterrcnn_resnet50_fpn_v2", "fasterrcnn_mobilenet_v3_large_fpn"]
# @markdown ---
# @markdown What is the maximum number of animals you expect to have in an image
max_individuals = 3 # @param {type:"slider", min:1, max:30, step:1}
# Note you need to enter max_individuals correctly to get the correct number of predictions in the image.
_ = superanimal_analyze_images(
    superanimal_name,
    model_name,
    detector_name,
    image_path,
    max_individuals,
    out_folder="/content/",
)
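To quickly check the result, you can display an annotated image written to the output folder. This is a minimal sketch, assuming the predicted images are saved as .png files directly in /content/; adjust the glob pattern if your output names differ.
import glob

# Sketch: show the most recent annotated image from the output folder.
annotated_images = sorted(glob.glob("/content/*.png"))
if annotated_images:
    plt.imshow(Image.open(annotated_images[-1]))
    plt.axis("off")
    plt.show()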
Zero-shot Video Inference#
This can be done with or without video adaptation (running without adaptation is faster, but the model is not fine-tuned on your data in a self-supervised manner!).
Upload a video you want to predict#
from google.colab import files
uploaded = files.upload()
for filepath, content in uploaded.items():
    print(f"User uploaded file '{filepath}' with length {len(content)} bytes")

video_path = os.path.abspath(filepath)
video_name = os.path.splitext(video_path)[0]
# If this cell fails (e.g., when using Safari in place of Google Chrome),
# manually upload your video via the Files menu to the left
# and define `video_path` yourself with right click > copy path on the video.
Choose the SuperAnimal and the model name#
# @markdown ---
# @markdown SuperAnimal Configurations
superanimal_name = "superanimal_topviewmouse" #@param ["superanimal_topviewmouse", "superanimal_quadruped"]
model_name = "hrnet_w32" #@param ["hrnet_w32", "resnet_50"]
detector_name = "fasterrcnn_resnet50_fpn_v2" #@param ["fasterrcnn_resnet50_fpn_v2", "fasterrcnn_mobilenet_v3_large_fpn"]
# @markdown ---
# @markdown What is the maximum number of animals you expect to have in an image
max_individuals = 3 # @param {type:"slider", min:1, max:30, step:1}
Zero-shot Video Inference without video adaptation#
The labeled video (and pose predictions for the video) are saved in "/content/", with the labeled video name being {your_video_name}_superanimal_{superanimal_name}_hrnetw32_labeled.mp4.
_ = video_inference_superanimal(
    videos=video_path,
    superanimal_name=superanimal_name,
    model_name=model_name,
    detector_name=detector_name,
    video_adapt=False,
    max_individuals=max_individuals,
    dest_folder="/content/",
)
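Optionally, you can load the saved pose predictions with pandas for further analysis. A sketch, assuming the predictions are stored as an .h5 file in the destination folder; adjust the glob pattern if your file names differ.
import glob

# Sketch: locate and inspect a predictions file saved next to the labeled video.
pred_files = sorted(glob.glob("/content/*.h5"))
if pred_files:
    predictions = pd.read_hdf(pred_files[0])
    print(predictions.head())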
Zero-shot Video Inference with video adaptation (unsupervised)#
The labeled video (and pose predictions for the video) are saved in "/content/", with the labeled video name being {your_video_name}_superanimal_{superanimal_name}_hrnetw32_labeled_after_adapt.mp4.
_ = video_inference_superanimal(
    videos=[video_path],
    superanimal_name=superanimal_name,
    model_name=model_name,
    detector_name=detector_name,
    video_adapt=True,
    max_individuals=max_individuals,
    pseudo_threshold=0.1,
    bbox_threshold=0.9,
    detector_epochs=1,
    pose_epochs=1,
    dest_folder="/content/",
)
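You can also preview the adapted labeled video inline in Colab. A minimal sketch; the glob pattern follows the naming convention described above and may need adjusting.
import glob
from base64 import b64encode
from IPython.display import HTML, display

adapted_videos = glob.glob("/content/*_labeled_after_adapt.mp4")
if adapted_videos:
    with open(adapted_videos[0], "rb") as f:
        data_url = "data:video/mp4;base64," + b64encode(f.read()).decode()
    display(HTML(f'<video width="600" controls src="{data_url}"></video>'))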
Training with SuperAnimal#
In this section, we compare different ways to train models in DeepLabCut 3.0, with or without using SuperAnimal-pretrained models. You can compare the evaluation results and get a sense of each baseline. We have the following baselines:
ImageNet transfer learning (training without superanimal)
SuperAnimal transfer learning (baseline 1)
SuperAnimal naive fine-tuning (baseline 2)
SuperAnimal memory-replay fine-tuning (baseline 3)
This is done on one of your DeepLabCut projects! If you don't have a DeepLabCut project that you can use SuperAnimal models with, you can always use the example openfield dataset available in the DeepLabCut repository or the Tri-Mouse dataset available on Zenodo.
Preparing the DeepLabCut Project#
First, place your DeepLabCut project folder into your Google Drive! i.e., move the folder named "Project-YourName-TheDate" into Google Drive.
# Now, let's link to your GoogleDrive. Run this cell and follow the
# authorization instructions:
from google.colab import drive
drive.mount('/content/drive')
You will need to edit the project path in the config.yaml file to be set to your Google Drive link!
Typically, this will be in the format: /content/drive/MyDrive/yourProjectFolderName. You can obtain this path by going to the file navigator in the left pane, finding your DeepLabCut project folder, clicking on the vertical ... next to the folder name and selecting "Copy path".
If the drive folder is not immediately visible after mounting the drive, refresh the available files!
# TODO: Update the `project_path` to be the path of your DeepLabCut project!
project_path = Path("/content/drive/MyDrive/my-project-2024-07-17")
config_path = str(project_path / "config.yaml")
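A quick optional sanity check that the paths above point at a real project before moving on:
# Optional: fail early if the project path or config is wrong.
assert project_path.exists(), f"Project folder not found: {project_path}"
assert (project_path / "config.yaml").exists(), "No config.yaml in the project folder!"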
Then, use the panel below to select the appropriate SuperAnimal model for your project (don't forget to run the cell)!
# @markdown ---
# @markdown SuperAnimal Configurations
superanimal_name = "superanimal_topviewmouse" #@param ["superanimal_topviewmouse", "superanimal_quadruped"]
model_name = "hrnet_w32" #@param ["hrnet_w32", "resnet_50"]
detector_name = "fasterrcnn_resnet50_fpn_v2" #@param ["fasterrcnn_resnet50_fpn_v2", "fasterrcnn_mobilenet_v3_large_fpn"]
Comparison between different training baselines#
Definition of a data split: the unique combination of training images and testing images. We create a data split named split 0; all baselines share this split to make comparisons fair.
split 0 -> shared by all baselines
shuffle 0 (split0) -> imagenet transfer learning
shuffle 1 (split0) -> superanimal transfer learning
shuffle 2 (split0) -> superanimal naive fine-tuning
shuffle 3 (split0) -> superanimal memory-replay fine-tuning
What is the difference between baselines?#
Transfer learning For canonical task-agnostic transfer learning, the encoder learns universal visual features from a large pre-training dataset, and a randomly initialized decoder is used to learn the pose from the downstream dataset.
Fine-tuning For task-aware fine-tuning, both the encoder and the decoder learn task-related visual-pose features in the pre-training datasets, and the decoder is fine-tuned to update pose priors in downstream datasets. Crucially, the network has pose-estimation-specific weights.
ImageNet transfer-learning The encoder was pre-trained on ImageNet. The decoder is trained from scratch on the downstream tasks.
SuperAnimal transfer-learning The encoder was pre-trained first on ImageNet, then on the pose datasets we collected. The decoder is trained from scratch on downstream tasks.
SuperAnimal naive fine-tuning Both the encoder and the decoder were pre-trained on the pose datasets we collected. In downstream datasets, we only fine-tune the convolutional channels that correspond to the annotated keypoints in the downstream datasets. This introduces catastrophic forgetting for keypoints that are not annotated in the downstream datasets.
SuperAnimal memory-replay fine-tuning If we apply fine-tuning with SuperAnimal without further care, the models will forget keypoints that are not annotated in the downstream datasets. To mitigate this, we mix the annotations with zero-shot predictions of SuperAnimal models to create a dataset that "replays" the memory of the SuperAnimal keypoints.
imagenet_transfer_learning_shuffle = 0
superanimal_transfer_learning_shuffle = 1
superanimal_naive_finetune_shuffle = 2
superanimal_memory_replay_shuffle = 3
deeplabcut.create_training_dataset(
    config_path,
    Shuffles=[imagenet_transfer_learning_shuffle],
    net_type=f"top_down_{model_name}",
    detector_type=detector_name,
    engine=deeplabcut.Engine.PYTORCH,
    userfeedback=False,
)
ImageNet transfer learning#
Historically, transfer learning with ImageNet weights assumed no "animal pose task priors" in the pretrained model, a paradigm adopted from task-agnostic transfer learning.
You can change the number of epochs you want to train for. How long training will take depends on many parameters, including the number of images in your dataset, the resolution of the images, and the number of epochs you train for.
# Note we skip the detector training to save time.
# For Top-Down models, the evaluation by default uses ground-truth bounding
# boxes. But to train a model that can be used to run inference on videos and
# images, you have to set detector_epochs > 0.
deeplabcut.train_network(
    config_path,
    detector_epochs=0,
    epochs=50,
    save_epochs=10,
    batch_size=64,  # if you get a CUDA OOM error when training on a GPU, reduce to 32, 16, ...!
    displayiters=10,
    shuffle=imagenet_transfer_learning_shuffle,
)
Now let's evaluate the performance of our trained models.
deeplabcut.evaluate_network(config_path, Shuffles=[imagenet_transfer_learning_shuffle])
Transfer learning with SuperAnimal weights#
First, we prepare the training shuffle for transfer-learning with SuperAnimal weights. As we've already created a shuffle with a train/test split that we want to reuse, we use deeplabcut.create_training_dataset_from_existing_split to keep the same train/test indices as in the ImageNet transfer learning shuffle.
We specify that we want to initialize the model weights with the selected SuperAnimal model, but without keeping the decoding layers (this is called transfer learning)!
weight_init = build_weight_init(
    cfg=auxiliaryfunctions.read_config(config_path),
    super_animal=superanimal_name,
    model_name=model_name,
    detector_name=detector_name,
    with_decoder=False,
)

deeplabcut.create_training_dataset_from_existing_split(
    config_path,
    from_shuffle=imagenet_transfer_learning_shuffle,
    shuffles=[superanimal_transfer_learning_shuffle],
    engine=deeplabcut.Engine.PYTORCH,
    net_type=f"top_down_{model_name}",
    detector_type=detector_name,
    weight_init=weight_init,
    userfeedback=False,
)
Then, we launch the training for transfer-learning with SuperAnimal weights.
deeplabcut.train_network(
    config_path,
    detector_epochs=0,
    epochs=50,
    save_epochs=10,
    batch_size=64,  # if you get a CUDA OOM error when training on a GPU, reduce to 32, 16, ...!
    displayiters=10,
    shuffle=superanimal_transfer_learning_shuffle,
)
Finally, we evaluate the model obtained by transfer-learning with SuperAnimal weights.
deeplabcut.evaluate_network(config_path, Shuffles=[superanimal_transfer_learning_shuffle])
Fine-tuning with SuperAnimal (without keeping full SuperAnimal keypoints)#
Setup the weight init and dataset#
First, we do keypoint matching. This step makes it possible to understand the correspondence between the existing annotations and the SuperAnimal annotations. This step produces 3 outputs:
The confusion matrix
The conversion table
Pseudo predictions over the whole dataset
What is keypoint matching?#
Because SuperAnimal models have pre-defined keypoints that are potentially different from your annotations, we proposed this algorithm to minimize the gap between the model and the dataset. We use our model to perform zero-shot inference on the whole dataset. This gives pairs of predictions and ground truth for every image. Then, we cast the matching between the models' predictions (2D coordinates) and the ground truth as bipartite matching, using the Euclidean distance as the cost between pairs of keypoints. We then solve the matching using the Hungarian algorithm. Thus, for every image, we end up with a matching matrix where 1 counts for a match and 0 for a non-match. Because the models' predictions can be noisy from image to image, we average the aforementioned matching matrix across all the images and perform another bipartite matching, resulting in the final keypoint conversion table between the model and the dataset. Note that the quality of the matching will impact the performance of the model, especially for zero-shot. In the case where, e.g., the annotation nose is mistakenly converted to keypoint tail and vice versa, the model will have to unlearn the channels that correspond to nose and tail (see also the case study in Mathis et al.).
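To make this concrete, here is a toy illustration of the matching step (not the DeepLabCut internals): hypothetical ground-truth keypoints are matched to hypothetical SuperAnimal predictions by solving a bipartite matching over Euclidean distances with the Hungarian algorithm.
import numpy as np
from scipy.optimize import linear_sum_assignment

# Hypothetical single-image keypoints, for illustration only.
gt = {"snout": (10.0, 12.0), "tailbase": (80.0, 75.0)}
pred = {"nose": (11.0, 13.0), "tail_base": (79.0, 76.0), "left_ear": (20.0, 9.0)}

gt_names, pred_names = list(gt), list(pred)
cost = np.array(
    [[np.hypot(gx - px, gy - py) for (px, py) in pred.values()]
     for (gx, gy) in gt.values()]
)
rows, cols = linear_sum_assignment(cost)  # Hungarian algorithm
for r, c in zip(rows, cols):
    print(f"{gt_names[r]} -> {pred_names[c]} (distance: {cost[r, c]:.1f})")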
keypoint_matching(
    config_path,
    superanimal_name,
    model_name,
    detector_name,
    copy_images=True,
)
conversion_table_path = project_path / "memory_replay" / "conversion_table.csv"
confusion_matrix_path = project_path / "memory_replay" / "confusion_matrix.png"
# You can visualize the pseudo predictions, or do pose embedding clustering etc.
pseudo_prediction_path = project_path / "memory_replay" / "pseudo_predictions.json"
Display the confusion matrix#
The x axis lists the keypoints in the existing annotations. The y axis lists the keypoints in the SuperAnimal keypoint space. Darker colors encode stronger correspondence between the human annotations and the SuperAnimal annotations.
confusion_matrix_image = Image.open(confusion_matrix_path)
plt.imshow(confusion_matrix_image)
plt.axis('off') # Hide the axes for better view
plt.show()
Display the conversion table#
The gt column lists the keypoint names in the existing dataset. The MasterName column lists the corresponding keypoints in the SuperAnimal keypoint space.
df = pd.read_csv(conversion_table_path)
df = df.dropna()
df
Adding the Conversion Table to your project's config.yaml file#
Once you've run keypoint matching, you can add the conversion table to your project's config.yaml file, and edit it if there are some matches you think are wrong. As an example, for a top-view mouse dataset with 4 bodyparts labeled ('snout', 'leftear', 'rightear', 'tailbase'), the conversion table mapping project bodyparts to SuperAnimal bodyparts would be added as:
# Conversion tables to fine-tune SuperAnimal weights
SuperAnimalConversionTables:
  superanimal_topviewmouse:
    snout: nose
    leftear: left_ear
    rightear: right_ear
    tailbase: tail_base
create_conversion_table(
    config=config_path,
    super_animal=superanimal_name,
    project_to_super_animal=read_conversion_table_from_csv(
        conversion_table_path
    ),
)
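After running the cell above, you can verify that the table was written to the project configuration. A small sketch; the SuperAnimalConversionTables key follows the config.yaml example above.
# Sketch: read the config back and print the stored conversion table.
cfg = auxiliaryfunctions.read_config(config_path)
print(cfg.get("SuperAnimalConversionTables", {}).get(superanimal_name))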
Prepare the training shuffle and weight initialization for (naive) fine-tuning with SuperAnimal weights#
Then, when you call build_weight_init with with_decoder=True, the conversion table in your project's config.yaml is used to get predictions for the correct bodyparts.
weight_init = build_weight_init(
    cfg=auxiliaryfunctions.read_config(config_path),
    super_animal=superanimal_name,
    model_name=model_name,
    detector_name=detector_name,
    with_decoder=True,
)

deeplabcut.create_training_dataset_from_existing_split(
    config_path,
    from_shuffle=imagenet_transfer_learning_shuffle,
    shuffles=[superanimal_naive_finetune_shuffle],
    engine=deeplabcut.Engine.PYTORCH,
    net_type=f"top_down_{model_name}",
    detector_type=detector_name,
    weight_init=weight_init,
    userfeedback=False,
)
Launch the training for (naive) fine-tuning with SuperAnimal#
deeplabcut.train_network(
    config_path,
    detector_epochs=0,
    epochs=50,
    save_epochs=10,
    batch_size=64,  # if you get a CUDA OOM error when training on a GPU, reduce to 32, 16, ...!
    displayiters=10,
    shuffle=superanimal_naive_finetune_shuffle,
)
Evaluate the model obtained by (naive) fine-tuning with SuperAnimal#
deeplabcut.evaluate_network(
    config_path,
    Shuffles=[superanimal_naive_finetune_shuffle],
)
Memory-replay fine-tuning with SuperAnimal (keeping full SuperAnimal keypoints)#
Catastrophic forgetting describes a classic problem in continual learning: a model gradually loses its ability to solve previous tasks after it learns to solve new ones. Fine-tuning a SuperAnimal model falls into the category of continual learning: the downstream dataset defines potentially different keypoints than those learned by the models. Thus, the models might forget the keypoints they learned and only pick up those defined in the target dataset. Here, retraining with the original dataset and the new one is not a feasible option, as datasets cannot be easily shared and more computational resources would be required. To counter that, we treat zero-shot inference of the model as a memory buffer that stores knowledge from the original model. When we fine-tune a SuperAnimal model, we replace the model-predicted keypoints with the ground-truth annotations, resulting in hybrid learning of old and new knowledge. The quality of the zero-shot predictions can vary, so we use the prediction confidence (0.7) as a threshold to filter out low-confidence predictions. With the threshold set to 1, memory-replay fine-tuning becomes naive fine-tuning.
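As a toy illustration of the confidence filter described above (not the DeepLabCut internals), here is how pseudo-label keypoints below the threshold would be dropped; the data is made up.
# Hypothetical zero-shot predictions: keypoint -> ((x, y), confidence).
pseudo_predictions = {
    "nose": ((11.0, 13.0), 0.92),
    "tail_base": ((79.0, 76.0), 0.35),  # low confidence: dropped below 0.7
}
pseudo_threshold = 0.7
kept = {
    name: xy
    for name, (xy, conf) in pseudo_predictions.items()
    if conf >= pseudo_threshold
}
print(kept)  # only high-confidence pseudo keypoints enter the replay labels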
Prepare the training shuffle and weight initialization for memory-replay fine-tuning with SuperAnimal#
weight_init = build_weight_init(
    cfg=auxiliaryfunctions.read_config(config_path),
    super_animal=superanimal_name,
    model_name=model_name,
    detector_name=detector_name,
    with_decoder=True,
    memory_replay=True,
)

deeplabcut.create_training_dataset_from_existing_split(
    config_path,
    from_shuffle=imagenet_transfer_learning_shuffle,
    shuffles=[superanimal_memory_replay_shuffle],
    engine=deeplabcut.Engine.PYTORCH,
    net_type=f"top_down_{model_name}",
    detector_type=detector_name,
    weight_init=weight_init,
    userfeedback=False,
)
Launch the training for memory-replay fine-tuning with SuperAnimal#
deeplabcut.train_network(
    config_path,
    detector_epochs=0,
    epochs=50,
    save_epochs=10,
    batch_size=64,  # if you get a CUDA OOM error when training on a GPU, reduce to 32, 16, ...!
    displayiters=10,
    shuffle=superanimal_memory_replay_shuffle,
)
Evaluate the model obtained by memory-replay fine-tuning with SuperAnimal#
deeplabcut.evaluate_network(config_path, Shuffles=[superanimal_memory_replay_shuffle])
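Finally, to compare the four baselines side by side, you can list the evaluation result files produced for each shuffle. A sketch; the exact layout of the evaluation folders can differ across DeepLabCut versions, so adjust the glob pattern as needed.
import glob

# Sketch: collect evaluation CSVs from the project for a side-by-side look.
result_files = sorted(
    glob.glob(str(project_path / "evaluation-results*" / "**" / "*.csv"), recursive=True)
)
for path in result_files:
    print(path)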