DeepLabCut benchmark#

For further information and the leaderboard, see the official homepage.

High Level API#

When implementing your own benchmarks, the most important functions are directly accessible under the deeplabcut.benchmark package.

deeplabcut.benchmark.evaluate(include_benchmarks: Container[str] | None = None, results: ResultCollection | None = None, on_error='return') → ResultCollection#

Run evaluation for all benchmarks and methods.

Note that in order for your custom benchmark to be included during evaluation, the following conditions need to be met:

  • The benchmark subclasses one of the benchmark definitions in benchmark.benchmarks

  • The benchmark is registered by applying the @benchmark.register decorator to the class

  • The benchmark was imported. This is done automatically for all benchmarks that are defined in submodules or subpackages of the benchmark.submissions module. For all other locations, make sure to manually import the packages before calling the evaluate() function.

Args:
include_benchmarks:

If None, run all benchmarks that were discovered. If a container is passed, only include methods that were defined on benchmarks with the specified names. E.g., include_benchmarks = ["trimouse"] would only evaluate methods of the trimouse benchmark dataset.

on_error:

See the documentation of benchmark.base.Benchmark.evaluate().

Returns:

A collection of all results, which can be printed or exported to pd.DataFrame or json file formats.
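
As a rough usage sketch (the commented-out submission module name is an assumption; only the documented signature above is used), evaluating a single benchmark might look like this:

```python
import deeplabcut.benchmark as benchmark

# Hypothetical: a submission defined outside benchmark.submissions must be
# imported explicitly so that its @benchmark.register decorator runs before
# evaluation. The module name below is illustrative only.
# import my_project.trimouse_submission  # noqa: F401

# Evaluate only the "trimouse" benchmark; pass include_benchmarks=None
# (the default) to run every discovered benchmark.
results = benchmark.evaluate(include_benchmarks=["trimouse"], on_error="return")

# The returned ResultCollection can be printed directly, or exported to a
# pd.DataFrame or json file as noted above.
print(results)
```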

deeplabcut.benchmark.register(cls)#

Add a benchmark to the list of evaluations to run.

Apply this function as a decorator to a class. Note that the class needs to be a subclass of the benchmark.base.Benchmark base class.

In most situations, it will be a subclass of one of the pre-defined benchmarks in benchmark.benchmarks.

Raises:

ValueError if the decorator is applied to a class that is not a subclass of benchmark.base.Benchmark.
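
A minimal sketch of a registered submission, assuming a subclass of one of the pre-defined benchmarks (the class name and model name are hypothetical, and the body of get_predictions() is omitted; see the baseline submissions in the benchmark repository for the exact prediction format):

```python
import deeplabcut.benchmark as benchmark
from deeplabcut.benchmark.benchmarks import TriMouseBenchmark

@benchmark.register
class MyTriMouseSubmission(TriMouseBenchmark):
    """Hypothetical submission to the tri-mouse benchmark."""

    def names(self):
        # Unique key describing this submission, e.g. the model name
        # (a collection of names may be required by the base-class contract).
        return "my-model-v1"

    def get_predictions(self):
        # Return predictions for all images in the benchmark, in the format
        # expected by the evaluation code (omitted here; see the baselines in
        # DeepLabCut/benchmark for concrete examples).
        raise NotImplementedError
```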

Available benchmark definitions#

See the official benchmark page for a full overview of the available datasets. A benchmark submission should contain a result for at least one of these benchmarks. For an example of how to implement a benchmark submission, refer to the baselines in the DeepLabCut benchmark repo.

Definitions of the official DeepLabCut benchmark tasks.

See benchmark.deeplabcut.org for a current leaderboard with models and metrics for each of these benchmarks. Submissions can be made by opening a pull request in the benchmark repository:

DeepLabCut/benchmark

class deeplabcut.benchmark.benchmarks.FishBenchmark#

Bases: Benchmark

Dataset with multiple fish, filmed from a top view.

Schools of inland silversides (Menidia beryllina, n=14 individuals per school) were recorded in the Lauder Lab at Harvard University while swimming at 15 speeds (0.5 to 8 body lengths per second (BL/s), in 0.5 BL/s increments) in a flow tank with a total working section of 28 x 28 x 40 cm, as described in previous work, at a constant temperature (18±1°C) and salinity (33 ppt), and at a Reynolds number of approximately 10,000 (based on BL). Dorsal views of steady swimming across these speeds were recorded by high-speed video cameras (FASTCAM Mini AX50, Photron USA, San Diego, CA, USA) at 60-125 frames per second (feeding videos at 60 fps, swimming-only videos at 125 fps). The dorsal view was recorded from above the swim tunnel, and a floating Plexiglas panel at the water surface prevented surface ripples from interfering with the videos. Five keypoints were labeled (tip, gill, peduncle, dorsal fin tip, caudal tip). 100 frames were labeled, making this a real-world sized laboratory dataset.

Introduced in Lauer et al. “Multi-animal pose estimation, identification and tracking with DeepLabCut.” Nature Methods 19, no. 4 (2022): 496-504.

Methods

  • evaluate(name[, on_error]): Evaluate this benchmark with all registered methods.

  • get_predictions(): Return predictions for all images in the benchmark.

  • names(): A unique key to describe this submission, e.g. the model name.

  • compute_pose_map

  • compute_pose_rmse

class deeplabcut.benchmark.benchmarks.MarmosetBenchmark#

Bases: Benchmark

Dataset with two marmosets.

All animal procedures were overseen by veterinary staff of the MIT and Broad Institute Department of Comparative Medicine, in compliance with the NIH Guide for the Care and Use of Laboratory Animals, and approved by the MIT and Broad Institute animal care and use committees. Video of common marmosets (Callithrix jacchus) was collected in the laboratory of Guoping Feng at MIT. Marmosets were recorded with Kinect V2 cameras (Microsoft) at 1080p resolution and a frame rate of 30 Hz. After acquisition, images to be used for training the network were manually cropped to 1000 x 1000 pixels or smaller. The dataset comprises 7,600 labeled frames from 40 different marmosets collected from 3 different colonies (in different facilities). Each cage contains a pair of marmosets, one of which had light blue dye applied to its tufts. One human annotator labeled the 15 marker points on each animal present in the frame (frames contained either one or two animals).

Introduced in Lauer et al. “Multi-animal pose estimation, identification and tracking with DeepLabCut.” Nature Methods 19, no. 4 (2022): 496-504.

Methods

  • evaluate(name[, on_error]): Evaluate this benchmark with all registered methods.

  • get_predictions(): Return predictions for all images in the benchmark.

  • names(): A unique key to describe this submission, e.g. the model name.

  • compute_pose_map

  • compute_pose_rmse

class deeplabcut.benchmark.benchmarks.ParentingMouseBenchmark#

Bases: Benchmark

Dataset with three mice: one parenting adult and two pups.

Parenting behavior is a pup-directed behavior observed in adult mice, involving complex motor actions directed towards the benefit of the offspring. These experiments were carried out in the laboratory of Catherine Dulac at Harvard University. The behavioral assay was performed in the home cage of singly housed adult female mice under dark/red-light conditions. For these videos, the adult mouse was monitored for several minutes in the cage, followed by the introduction of a pup (4 days old) in one corner of the cage. The behavior of the adult and pup was monitored for a duration of 15 minutes. Video was recorded at 30 Hz using either a Microsoft LifeCam camera (Part#: 6CH-00001) at a resolution of 1280 x 720 pixels or a Geovision camera (model no.: GV-BX4700-3V), also at 30 frames per second, at a resolution of 704 x 480 pixels. A human annotator labeled the same 12 body points on the adult animal as in the tri-mouse dataset, and five body points along the pup's spine. Initially, only the two end points were labeled; intermediate points were added by interpolation and their positions were manually adjusted if necessary. All surgical and experimental procedures for mice were in accordance with the National Institutes of Health Guide for the Care and Use of Laboratory Animals and approved by the Harvard Institutional Animal Care and Use Committee. 542 frames were labeled, making this a real-world sized laboratory dataset.

Introduced in Lauer et al. “Multi-animal pose estimation, identification and tracking with DeepLabCut.” Nature Methods 19, no. 4 (2022): 496-504.

Methods

  • evaluate(name[, on_error]): Evaluate this benchmark with all registered methods.

  • get_predictions(): Return predictions for all images in the benchmark.

  • names(): A unique key to describe this submission, e.g. the model name.

  • compute_pose_map

  • compute_pose_rmse

class deeplabcut.benchmark.benchmarks.TriMouseBenchmark#

Bases: Benchmark

Dataset with three mice recorded with a top-view camera.

Three wild-type (C57BL/6J) male mice ran on a paper spool following odor trails (Mathis et al. 2018). These experiments were carried out in the laboratory of Venkatesh N. Murthy at Harvard University. Data were recorded at 30 Hz and a resolution of 640 x 480 pixels with a Point Grey Firefly (FMVU-03MTM-CS) camera. One human annotator was instructed to localize the 12 keypoints (snout, left ear, right ear, shoulder, four spine points, tail base, and three tail points). All surgical and experimental procedures for mice were in accordance with the National Institutes of Health Guide for the Care and Use of Laboratory Animals and approved by the Harvard Institutional Animal Care and Use Committee. 161 frames were labeled, making this a real-world sized laboratory dataset.

Introduced in Lauer et al. “Multi-animal pose estimation, identification and tracking with DeepLabCut.” Nature Methods 19, no. 4 (2022): 496-504.

Methods

  • evaluate(name[, on_error]): Evaluate this benchmark with all registered methods.

  • get_predictions(): Return predictions for all images in the benchmark.

  • names(): A unique key to describe this submission, e.g. the model name.

  • compute_pose_map

  • compute_pose_rmse

Metric calculation#

Evaluation metrics for the DeepLabCut benchmark.

deeplabcut.benchmark.metrics.calc_map_from_obj(eval_results_obj, h5_file, metadata_file, oks_sigma=0.1, margin=0, symmetric_kpts=None, drop_kpts=None)#

Calculate mean average precision (mAP) based on predictions.

deeplabcut.benchmark.metrics.calc_rmse_from_obj(eval_results_obj, h5_file, metadata_file, drop_kpts=None)#

Calculate prediction errors (RMSE) for submissions.

deeplabcut.benchmark.metrics.conv_obj_to_assemblies(eval_results_obj, keypoint_names)#

Convert predictions to deeplabcut assemblies.
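
A minimal sketch of how the two metric helpers might be combined for a single submission (the wrapper function is hypothetical and only the documented signatures above are used; the exact structure of the returned values and of the prediction object is not shown here):

```python
from deeplabcut.benchmark import metrics

def score_submission(predictions_obj, h5_file, metadata_file):
    """Hypothetical helper: compute both benchmark metrics for one submission.

    `predictions_obj` must already be in the object format expected by the
    metric functions; `h5_file` and `metadata_file` point to the ground-truth
    annotations and metadata shipped with the benchmark dataset.
    """
    # Root-mean-square error of the predicted keypoints.
    rmse = metrics.calc_rmse_from_obj(predictions_obj, h5_file, metadata_file)
    # Mean average precision, using the default OKS sigma of 0.1.
    mean_ap = metrics.calc_map_from_obj(
        predictions_obj, h5_file, metadata_file, oks_sigma=0.1
    )
    # Return values are passed through as-is; see the respective docstrings
    # for their exact structure.
    return rmse, mean_ap
```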