holocron.models¶
The models subpackage contains definitions of models for addressing different tasks, including: image classification, pixelwise semantic segmentation, object detection, instance segmentation, person keypoint detection and video classification.
The following models are available:
Classification¶
Classification models expect a 4D image tensor as input (N x C x H x W) and return a 2D output (N x K). The output holds the classification scores for each of the K output classes.
import holocron.models as models
darknet19 = models.darknet19(num_classes=10)
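To illustrate how an (N x K) score output is typically consumed, here is a minimal pure-Python sketch. It assumes the model returns raw, unnormalized scores, which this page does not state; `softmax`, `predict_labels` and the sample values are hypothetical helpers and data, not part of holocron.

```python
import math


def softmax(scores):
    """Convert one row of raw class scores into probabilities."""
    shift = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_labels(batch_scores):
    """Return (label, probability) for each of the N rows of an (N x K) score table."""
    results = []
    for row in batch_scores:
        probs = softmax(row)
        label = max(range(len(probs)), key=probs.__getitem__)
        results.append((label, probs[label]))
    return results


# Hypothetical scores for N=2 images and K=3 classes
batch_scores = [[2.0, 0.5, -1.0], [0.1, 0.2, 3.0]]
predictions = predict_labels(batch_scores)
```

With an actual model output, the same logic would usually be applied along the class dimension of the tensor rather than on nested lists.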
ResNet¶
- holocron.models.resnet18(pretrained=False, progress=True, **kwargs)[source]¶
ResNet-18 from “Deep Residual Learning for Image Recognition”
- Parameters:
- Returns:
classification model
- Return type:
- holocron.models.resnet34(pretrained=False, progress=True, **kwargs)[source]¶
ResNet-34 from “Deep Residual Learning for Image Recognition”
- Parameters:
- Returns:
classification model
- Return type:
- holocron.models.resnet50(pretrained=False, progress=True, **kwargs)[source]¶
ResNet-50 from “Deep Residual Learning for Image Recognition”
- Parameters:
- Returns:
classification model
- Return type:
- holocron.models.resnet101(pretrained=False, progress=True, **kwargs)[source]¶
ResNet-101 from “Deep Residual Learning for Image Recognition”
- Parameters:
- Returns:
classification model
- Return type:
- holocron.models.resnet152(pretrained=False, progress=True, **kwargs)[source]¶
ResNet-152 from “Deep Residual Learning for Image Recognition”
- Parameters:
- Returns:
classification model
- Return type:
- holocron.models.resnext50_32x4d(pretrained=False, progress=True, **kwargs)[source]¶
ResNeXt-50 from “Aggregated Residual Transformations for Deep Neural Networks”
- Parameters:
- Returns:
classification model
- Return type:
- holocron.models.resnext101_32x8d(pretrained=False, progress=True, **kwargs)[source]¶
ResNeXt-101 from “Aggregated Residual Transformations for Deep Neural Networks”
- Parameters:
- Returns:
classification model
- Return type:
- holocron.models.rexnet1_0x(pretrained=False, progress=True, **kwargs)[source]¶
ReXNet-1.0x from “ReXNet: Diminishing Representational Bottleneck on Convolutional Neural Network”
- Parameters:
- Returns:
classification model
- Return type:
- holocron.models.rexnet1_3x(pretrained=False, progress=True, **kwargs)[source]¶
ReXNet-1.3x from “ReXNet: Diminishing Representational Bottleneck on Convolutional Neural Network”
- Parameters:
- Returns:
classification model
- Return type:
- holocron.models.rexnet1_5x(pretrained=False, progress=True, **kwargs)[source]¶
ReXNet-1.5x from “ReXNet: Diminishing Representational Bottleneck on Convolutional Neural Network”
- Parameters:
- Returns:
classification model
- Return type:
- holocron.models.rexnet2_0x(pretrained=False, progress=True, **kwargs)[source]¶
ReXNet-2.0x from “ReXNet: Diminishing Representational Bottleneck on Convolutional Neural Network”
- Parameters:
- Returns:
classification model
- Return type:
- holocron.models.rexnet2_2x(pretrained=False, progress=True, **kwargs)[source]¶
ReXNet-2.2x from “ReXNet: Diminishing Representational Bottleneck on Convolutional Neural Network”
- Parameters:
- Returns:
classification model
- Return type:
Res2Net¶
- class holocron.models.Res2Net(block, layers, num_classes=1000, zero_init_residual=False, groups=1, width_per_group=26, scale=4, replace_stride_with_dilation=None, norm_layer=None)[source]¶
Implements a Res2Net model as described in https://arxiv.org/pdf/1904.01169.pdf
- Parameters:
block (torch.nn.Module) – class constructor to be used for residual blocks
layers (list<int>) – layout of layers
num_classes (int) – number of output classes
zero_init_residual (bool) – whether the residual connections should be initialized at zero
groups (int) – number of convolution groups
width_per_group (int) – number of channels per group
scale (int) – scaling ratio within blocks
replace_stride_with_dilation (list<bool>) – whether stride should be traded for dilation
norm_layer (torch.nn.Module) – norm layer to be used
- holocron.models.res2net(depth, num_classes, width_per_group=26, scale=4, pretrained=False, progress=True, **kwargs)[source]¶
Instantiate a Res2Net model
- Parameters:
depth (int) – depth of the model
num_classes (int) – number of output classes
scale (int) – number of branches for cascade convolutions
pretrained (bool) – whether the model should load pretrained weights (ImageNet training)
progress (bool) – whether a progress bar should be displayed while downloading pretrained weights
**kwargs – optional arguments of torchvision.models.resnet.ResNet
- Returns:
loaded PyTorch model
- Return type:
model (torch.nn.Module)
Res2NeXt¶
- holocron.models.res2next(depth, num_classes, width_per_group=4, scale=4, pretrained=False, progress=True, **kwargs)[source]¶
Instantiate a Res2NeXt model
- Parameters:
depth (int) – depth of the model
num_classes (int) – number of output classes
scale (int) – number of branches for cascade convolutions
pretrained (bool) – whether the model should load pretrained weights (ImageNet training)
progress (bool) – whether a progress bar should be displayed while downloading pretrained weights
**kwargs – optional arguments of torchvision.models.resnet.ResNet
- Returns:
loaded PyTorch model
- Return type:
model (torch.nn.Module)
Darknet¶
- holocron.models.darknet24(pretrained=False, progress=True, **kwargs)[source]¶
Darknet-24 from “You Only Look Once: Unified, Real-Time Object Detection”
- Parameters:
- Returns:
classification model
- Return type:
- holocron.models.darknet19(pretrained=False, progress=True, **kwargs)[source]¶
Darknet-19 from “YOLO9000: Better, Faster, Stronger”
- Parameters:
- Returns:
classification model
- Return type:
- holocron.models.darknet53(pretrained=False, progress=True, **kwargs)[source]¶
Darknet-53 from “YOLOv3: An Incremental Improvement”
- Parameters:
- Returns:
classification model
- Return type:
Object Detection¶
Object detection models expect a 4D image tensor as input (N x C x H x W) and return a list of dictionaries. Each dictionary has 3 keys: box coordinates, classification probability and classification label.
import holocron.models as models
yolov2 = models.yolov2(num_classes=10)
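As a rough sketch of how such a list of dictionaries might be post-processed, the snippet below drops low-confidence boxes. The key names `"boxes"`, `"scores"` and `"labels"` are assumptions made for illustration (this page only says there are three keys), and `filter_detections` is a hypothetical helper.

```python
def filter_detections(detections, min_score=0.5):
    """Keep only the boxes whose classification probability reaches min_score."""
    filtered = []
    for det in detections:  # one dictionary at a time
        keep = [i for i, score in enumerate(det["scores"]) if score >= min_score]
        filtered.append({
            "boxes": [det["boxes"][i] for i in keep],
            "scores": [det["scores"][i] for i in keep],
            "labels": [det["labels"][i] for i in keep],
        })
    return filtered


# Hypothetical detections (key names and values are assumptions)
detections = [
    {
        "boxes": [[0.1, 0.1, 0.4, 0.5], [0.2, 0.3, 0.9, 0.8]],
        "scores": [0.9, 0.3],
        "labels": [1, 7],
    },
    {"boxes": [], "scores": [], "labels": []},
]
confident = filter_detections(detections, min_score=0.5)
```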
YOLO¶
- holocron.models.yolov1(pretrained=False, progress=True, pretrained_backbone=True, **kwargs)[source]¶
YOLO model from “You Only Look Once: Unified, Real-Time Object Detection”.
YOLO is distinctive in that it makes its predictions on a grid (of the same size as the last feature map). For each grid cell, the model predicts classification scores and a fixed number of boxes (default: 2). Each box in the cell gets 5 predictions: an objectness score and 4 coordinates. The 4 coordinates are the 2-D coordinates of the predicted box center (relative to the cell), and the width and height of the predicted box (relative to the whole image).
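The grid layout described above can be sketched in plain Python. This is an illustration under stated assumptions, not holocron's decoding code: `prediction_size` and `decode_box` are hypothetical helpers, the class count of 20 is arbitrary, and the corner format of the returned box is a choice made for this example.

```python
def prediction_size(grid_size, num_boxes, num_classes):
    """Number of values predicted per image: S * S * (B * 5 + K)."""
    return grid_size * grid_size * (num_boxes * 5 + num_classes)


def decode_box(row, col, box, grid_size):
    """Map one cell-level prediction to image-relative corner coordinates.

    box = (x, y, w, h): x, y are relative to the cell, w, h to the whole image.
    Returns (xmin, ymin, xmax, ymax), each expressed as a fraction of the image.
    """
    x, y, w, h = box
    center_x = (col + x) / grid_size
    center_y = (row + y) / grid_size
    return (center_x - w / 2, center_y - h / 2, center_x + w / 2, center_y + h / 2)


# 7x7 grid, 2 boxes per cell, 20 classes (the class count is an assumption)
num_values = prediction_size(7, 2, 20)
# A box centered in the middle of cell (row=3, col=3), covering half the image
corners = decode_box(3, 3, (0.5, 0.5, 0.5, 0.5), grid_size=7)
```

Because the center is stored relative to its cell, the cell indices have to be added back before dividing by the grid size, whereas the width and height are already expressed at image scale.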
For training, YOLO uses a multi-part loss whose components are computed by:
\[\mathcal{L}_{coords} = \sum\limits_{i=0}^{S^2} \sum\limits_{j=0}^{B} \mathbb{1}_{ij}^{obj} \Big[ (x_{ij} - \hat{x}_{ij})^2 + (y_{ij} - \hat{y}_{ij})^2 + (\sqrt{w_{ij}} - \sqrt{\hat{w}_{ij}})^2 + (\sqrt{h_{ij}} - \sqrt{\hat{h}_{ij}})^2 \Big]\]where \(S\) is the size of the output feature map (7 for an input size of \((448, 448)\)), \(B\) is the number of anchor boxes per grid cell (default: 2), \(\mathbb{1}_{ij}^{obj}\) equals 1 if a ground truth (GT) center falls inside the i-th grid cell and the j-th box has the highest IoU with that ground truth among the anchor boxes of the cell, and 0 otherwise, \((x_{ij}, y_{ij}, w_{ij}, h_{ij})\) are the coordinates of the ground truth assigned to the j-th anchor box of the i-th grid cell, and \((\hat{x}_{ij}, \hat{y}_{ij}, \hat{w}_{ij}, \hat{h}_{ij})\) are the coordinate predictions for the j-th anchor box of the i-th grid cell.
\[\mathcal{L}_{objectness} = \sum\limits_{i=0}^{S^2} \sum\limits_{j=0}^{B} \Big[ \mathbb{1}_{ij}^{obj} \Big(C_{ij} - \hat{C}_{ij} \Big)^2 + \lambda_{noobj} \mathbb{1}_{ij}^{noobj} \Big(C_{ij} - \hat{C}_{ij} \Big)^2 \Big]\]where \(\lambda_{noobj}\) is a positive coefficient (default: 0.5), \(\mathbb{1}_{ij}^{noobj} = 1 - \mathbb{1}_{ij}^{obj}\), \(C_{ij}\) equals the Intersection over Union between the j-th anchor box in the i-th grid cell and its matched ground truth box, or 0 if that box is not matched with any ground truth, and \(\hat{C}_{ij}\) is the objectness score of the j-th anchor box in the i-th grid cell.
\[\mathcal{L}_{classification} = \sum\limits_{i=0}^{S^2} \mathbb{1}_{i}^{obj} \sum\limits_{c \in classes} (p_i(c) - \hat{p}_i(c))^2\]where \(\mathbb{1}_{i}^{obj}\) equals 1 if a GT center falls inside the i-th grid cell and 0 otherwise, \(p_i(c)\) equals 1 if the ground truth assigned to the i-th cell belongs to class \(c\) and 0 otherwise, and \(\hat{p}_i(c)\) is the predicted probability of class \(c\) in the i-th cell.
And the full loss is given by:
\[\mathcal{L}_{YOLOv1} = \lambda_{coords} \cdot \mathcal{L}_{coords} + \mathcal{L}_{objectness} + \mathcal{L}_{classification}\]where \(\lambda_{coords}\) is a positive coefficient (default: 5).
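The three components and their weighted sum can be checked on a toy example with the plain-Python sketch below. It is not holocron's loss implementation: the `yolov1_loss` function, the cell dictionary layout and all numeric values are hypothetical, and the matching of boxes to ground truths (the indicator values) is assumed to have been done beforehand.

```python
import math


def yolov1_loss(cells, lambda_coords=5.0, lambda_noobj=0.5):
    """Sum the three YOLOv1 loss components over a list of grid cells.

    Each cell is a dict with:
      "boxes": one (obj, target, pred, c_target, c_pred) tuple per box, where obj
               is the indicator for a matched box, target and pred are (x, y, w, h)
               tuples (target may be None when obj is 0), c_target is the target
               objectness and c_pred the predicted objectness
      "has_object": indicator that a GT center falls inside the cell
      "p_target", "p_pred": target and predicted class probabilities
    """
    coords = objectness = classification = 0.0
    for cell in cells:
        for obj, target, pred, c_target, c_pred in cell["boxes"]:
            if obj:
                (x, y, w, h), (px, py, pw, ph) = target, pred
                coords += (x - px) ** 2 + (y - py) ** 2
                coords += (math.sqrt(w) - math.sqrt(pw)) ** 2
                coords += (math.sqrt(h) - math.sqrt(ph)) ** 2
                objectness += (c_target - c_pred) ** 2
            else:
                # Unmatched boxes only contribute a down-weighted objectness term
                objectness += lambda_noobj * (c_target - c_pred) ** 2
        if cell["has_object"]:
            classification += sum(
                (p - q) ** 2 for p, q in zip(cell["p_target"], cell["p_pred"])
            )
    return lambda_coords * coords + objectness + classification


# Hypothetical grid with two cells, 2 boxes per cell and 2 classes
cells = [
    {
        "boxes": [
            (1, (0.5, 0.5, 0.25, 0.25), (0.0, 0.5, 1.0, 0.25), 0.25, 0.5),
            (0, None, (0.5, 0.5, 0.5, 0.5), 0.0, 0.5),
        ],
        "has_object": True,
        "p_target": [1.0, 0.0],
        "p_pred": [0.5, 0.5],
    },
    {
        "boxes": [
            (0, None, (0.5, 0.5, 0.5, 0.5), 0.0, 0.5),
            (0, None, (0.5, 0.5, 0.5, 0.5), 0.0, 0.0),
        ],
        "has_object": False,
        "p_target": [0.0, 0.0],
        "p_pred": [0.5, 0.5],
    },
]
total_loss = yolov1_loss(cells)
```

Taking square roots of the widths and heights means that a given absolute error weighs more on small boxes than on large ones, while the coordinate coefficient makes localization errors dominate the sum.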
- Parameters:
- Returns:
detection module
- Return type:
- holocron.models.yolov2(pretrained=False, progress=True, pretrained_backbone=True, **kwargs)[source]¶
YOLOv2 model from “YOLO9000: Better, Faster, Stronger”.
YOLOv2 improves upon YOLO by raising the number of boxes predicted per grid cell (default: 5), introducing bounding box priors and predicting class scores for each anchor box in the grid cell.
For training, YOLOv2 uses the same multi-part loss as YOLO apart from its classification loss:
\[\mathcal{L}_{classification} = \sum\limits_{i=0}^{S^2} \sum\limits_{j=0}^{B} \mathbb{1}_{ij}^{obj} \sum\limits_{c \in classes} (p_{ij}(c) - \hat{p}_{ij}(c))^2\]where \(S\) is the size of the output feature map (13 for an input size of \((416, 416)\)), \(B\) is the number of anchor boxes per grid cell (default: 5), \(\mathbb{1}_{ij}^{obj}\) equals 1 if a GT center falls inside the i-th grid cell and the j-th box has the highest IoU with that ground truth among the anchor boxes of the cell, and 0 otherwise, \(p_{ij}(c)\) equals 1 if the ground truth assigned to the j-th anchor box of the i-th cell belongs to class \(c\) and 0 otherwise, and \(\hat{p}_{ij}(c)\) is the predicted probability of class \(c\) for the j-th anchor box in the i-th cell.
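Since class scores are now predicted per anchor box rather than per cell, the classification term runs over every matched anchor. A minimal sketch, assuming the indicator values have already been computed; `yolov2_classification_loss` and the sample data are hypothetical and do not reflect holocron's internals.

```python
def yolov2_classification_loss(anchors):
    """Sum squared class-probability errors over matched anchor boxes only.

    anchors: iterable of (obj, p_target, p_pred) tuples, one per anchor box of
    every grid cell, where obj is the indicator for a matched anchor box.
    """
    loss = 0.0
    for obj, p_target, p_pred in anchors:
        if obj:
            loss += sum((p - q) ** 2 for p, q in zip(p_target, p_pred))
    return loss


# Number of anchor boxes for a 13x13 grid with 5 anchor boxes per cell
num_anchors = 13 * 13 * 5

# Hypothetical anchors with 2 classes: one matched, one unmatched
anchors = [
    (1, [1.0, 0.0], [0.5, 0.5]),
    (0, [0.0, 0.0], [0.5, 0.5]),
]
classification_loss = yolov2_classification_loss(anchors)
```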
- Parameters:
- Returns:
detection module
- Return type:
Semantic Segmentation¶
Semantic segmentation models expect a 4D image tensor as input (N x C x H x W) and return a classification score tensor of size (N x K x Ho x Wo).
import holocron.models as models
unet = models.unet(num_classes=10)
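To show how a (K x Ho x Wo) score volume for a single image can be turned into a per-pixel label map, here is a small sketch on nested lists. `label_map` and the sample scores are hypothetical; with a real output tensor, an argmax over the class dimension would play the same role.

```python
def label_map(scores):
    """Turn a (K x Ho x Wo) score volume for one image into an (Ho x Wo) label map."""
    num_classes = len(scores)
    height, width = len(scores[0]), len(scores[0][0])
    return [
        [max(range(num_classes), key=lambda k: scores[k][i][j]) for j in range(width)]
        for i in range(height)
    ]


# Hypothetical scores for K=2 classes on a 2x2 output
scores = [
    [[0.9, 0.2], [0.4, 0.1]],  # class 0
    [[0.1, 0.8], [0.6, 0.3]],  # class 1
]
labels = label_map(scores)
```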
U-Net¶
- holocron.models.unet(pretrained=False, progress=True, **kwargs)[source]¶
U-Net from “U-Net: Convolutional Networks for Biomedical Image Segmentation”
- Parameters:
- Returns:
semantic segmentation model
- Return type:
- holocron.models.unetp(pretrained=False, progress=True, **kwargs)[source]¶
UNet+ from “UNet++: A Nested U-Net Architecture for Medical Image Segmentation”
- Parameters:
- Returns:
semantic segmentation model
- Return type:
- holocron.models.unetpp(pretrained=False, progress=True, **kwargs)[source]¶
UNet++ from “UNet++: A Nested U-Net Architecture for Medical Image Segmentation”
- Parameters:
- Returns:
semantic segmentation model
- Return type:
- holocron.models.unet3p(pretrained=False, progress=True, **kwargs)[source]¶
UNet3+ from “UNet 3+: A Full-Scale Connected UNet For Medical Image Segmentation”
- Parameters:
- Returns:
semantic segmentation model
- Return type: