holocron.nn¶
An extension of PyTorch's torch.nn module that widens the range of available neural network building blocks.
Non-linear activations¶
HardMish¶
HardMish(inplace: bool = False)
Bases: _Activation
Implements the Hard Mish activation module from "H-Mish".
This activation is computed as follows:

$$f(x) = \frac{x}{2} \cdot \min(2, \max(0, x + 2))$$

| PARAMETER | DESCRIPTION |
|---|---|
| `inplace` | should the operation be performed inplace |
Source code in holocron/nn/modules/activation.py
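As a quick illustration, here is a minimal functional sketch of the formula above in plain PyTorch; it is not holocron's implementation (which also offers an inplace variant), just the same computation spelled out:

```python
import torch

def hard_mish(x: torch.Tensor) -> torch.Tensor:
    # f(x) = (x / 2) * min(2, max(0, x + 2))
    return 0.5 * x * (x + 2).clamp(min=0, max=2)

print(hard_mish(torch.linspace(-3, 3, 7)))
```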
NLReLU¶
Bases: _Activation
Implements the Natural-Logarithm ReLU activation module from "Natural-Logarithm-Rectified Activation Function in Convolutional Neural Networks".
This activation is computed as follows:

$$f(x) = \ln(1 + \beta \cdot \max(0, x))$$

| PARAMETER | DESCRIPTION |
|---|---|
| `beta` | the \(\beta\) parameter used in the NLReLU formula |
| `inplace` | should the operation be performed inplace |
Source code in holocron/nn/modules/activation.py
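A functional sketch of the same formula in plain PyTorch, for illustration only:

```python
import torch

def nl_relu(x: torch.Tensor, beta: float = 1.0) -> torch.Tensor:
    # f(x) = ln(1 + beta * max(0, x)), computed stably with log1p
    return torch.log1p(beta * torch.relu(x))

print(nl_relu(torch.linspace(-2, 2, 5)))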
FReLU¶
Bases: Module
Implements the Funnel activation module from "Funnel Activation for Visual Recognition".
This activation is computed as follows:

$$f(x) = \max(\mathbb{T}(x), x)$$

where \(\mathbb{T}\) is the spatial contextual feature extraction: a convolution filter of size kernel_size with same padding and a number of groups equal to the number of input channels, followed by a batch normalization.

| PARAMETER | DESCRIPTION |
|---|---|
| `in_channels` | number of input channels |
| `kernel_size` | size of the convolution filter |
Source code in holocron/nn/modules/activation.py
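A minimal sketch of the funnel condition built from the description above (depthwise convolution plus batch normalization); holocron's FReLU packages the same idea as a ready-made module:

```python
import torch
from torch import nn

class FunnelActivation(nn.Module):
    # y = max(x, T(x)); T is a depthwise conv (groups = channels) + batch norm
    def __init__(self, in_channels: int, kernel_size: int = 3) -> None:
        super().__init__()
        self.conv = nn.Conv2d(
            in_channels, in_channels, kernel_size,
            padding=kernel_size // 2, groups=in_channels, bias=False,
        )
        self.bn = nn.BatchNorm2d(in_channels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.max(x, self.bn(self.conv(x)))

out = FunnelActivation(16)(torch.randn(2, 16, 32, 32))  # same shape as input
```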
Loss functions¶
Loss¶
Loss(weight: float | list[float] | Tensor | None = None, ignore_index: int = -100, reduction: str = 'mean')
Bases: Module
Base loss class.

| PARAMETER | DESCRIPTION |
|---|---|
| `weight` | class weights for loss computation |
| `ignore_index` | specifies a target value that is ignored and does not contribute to the gradient |
| `reduction` | type of reduction to apply to the final loss |

| RAISES | DESCRIPTION |
|---|---|
| `NotImplementedError` | if the reduction method is not supported |
Source code in holocron/nn/modules/loss.py
FocalLoss¶
Bases: Loss
Implementation of Focal Loss as described in "Focal Loss for Dense Object Detection".
While the weighted cross-entropy is described by:

$$CE(p_t) = -\alpha_t \log(p_t)$$

where \(\alpha_t\) is the loss weight of class \(t\), and \(p_t\) is the predicted probability of class \(t\), the focal loss introduces a modulating factor:

$$FL(p_t) = -\alpha_t (1 - p_t)^\gamma \log(p_t)$$

where \(\gamma\) is a positive focusing parameter.

| PARAMETER | DESCRIPTION |
|---|---|
| `gamma` | exponent parameter of the focal loss |
| `**kwargs` | keyword args of `Loss` |
Source code in holocron/nn/modules/loss.py
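A usage sketch, assuming FocalLoss is exported at the top level of holocron.nn (as this page's title suggests) and that inputs follow the torch.nn.CrossEntropyLoss convention:

```python
import torch
from holocron.nn import FocalLoss

criterion = FocalLoss(gamma=2.0, reduction="mean")
logits = torch.randn(8, 10)           # (N, num_classes) raw scores
targets = torch.randint(0, 10, (8,))  # (N,) class indices
loss = criterion(logits, targets)
```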
MultiLabelCrossEntropy¶
ComplementCrossEntropy¶
Bases: Loss
Implements the complement cross-entropy loss from "Imbalanced Image Classification with Complement Cross Entropy".

| PARAMETER | DESCRIPTION |
|---|---|
| `gamma` | smoothing factor |
| `**kwargs` | keyword args of `Loss` |
Source code in holocron/nn/modules/loss.py
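Usage mirrors FocalLoss above, with the same hedges about the import path and input convention:

```python
import torch
from holocron.nn import ComplementCrossEntropy

criterion = ComplementCrossEntropy()  # default smoothing factor
loss = criterion(torch.randn(8, 10), torch.randint(0, 10, (8,)))
```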
MutualChannelLoss¶
MutualChannelLoss(weight: float | list[float] | Tensor | None = None, ignore_index: int = -100, reduction: str = 'mean', xi: int = 2, alpha: float = 1)
Bases: Loss
Implements the mutual channel loss from "The Devil is in the Channels: Mutual-Channel Loss for Fine-Grained Image Classification".

| PARAMETER | DESCRIPTION |
|---|---|
| `weight` | class weights for loss computation |
| `ignore_index` | specifies a target value that is ignored and does not contribute to the gradient |
| `reduction` | type of reduction to apply to the final loss |
| `xi` | number of features per class |
| `alpha` | diversity factor |
Source code in holocron/nn/modules/loss.py
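A hedged usage sketch: per the paper, the loss operates on feature maps carrying xi channels per class, so the input below has num_classes * xi channels. The exact input contract of holocron's module may differ; treat this as an assumption.

```python
import torch
from holocron.nn import MutualChannelLoss

num_classes, xi = 10, 2
criterion = MutualChannelLoss(xi=xi)
feats = torch.randn(8, num_classes * xi, 7, 7)  # hypothetical penultimate feature maps
targets = torch.randint(0, num_classes, (8,))
loss = criterion(feats, targets)
```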
DiceLoss¶
DiceLoss(weight: float | list[float] | Tensor | None = None, gamma: float = 1.0, eps: float = 1e-08)
Bases: Loss
Implements the Dice loss from "V-Net: Fully Convolutional Neural Networks for Volumetric Medical Image Segmentation".

| PARAMETER | DESCRIPTION |
|---|---|
| `weight` | class weights for loss computation |
| `gamma` | recall/precision control parameter |
| `eps` | small value added to avoid division by zero |
Source code in holocron/nn/modules/loss.py
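A segmentation-style usage sketch, assuming logits of shape (N, C, H, W) and integer masks of shape (N, H, W):

```python
import torch
from holocron.nn import DiceLoss

criterion = DiceLoss(gamma=1.0)
logits = torch.randn(2, 4, 16, 16)        # per-pixel class scores
masks = torch.randint(0, 4, (2, 16, 16))  # per-pixel class indices
loss = criterion(logits, masks)
```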
PolyLoss¶
Bases: Loss
Implements the Poly-1 loss from "PolyLoss: A Polynomial Expansion Perspective of Classification Loss Functions".

| PARAMETER | DESCRIPTION |
|---|---|
| `*args` | positional args of `Loss` |
| `eps` | the \(\epsilon_1\) parameter from the paper |
| `**kwargs` | keyword args of `Loss` |
Source code in holocron/nn/modules/loss.py
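A usage sketch under the same import-path assumption; the eps value here is purely illustrative:

```python
import torch
from holocron.nn import PolyLoss

criterion = PolyLoss(eps=2.0)  # epsilon_1 coefficient of the first polynomial term
loss = criterion(torch.randn(8, 10), torch.randint(0, 10, (8,)))
```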
Loss wrappers¶
ClassBalancedWrapper¶
Bases: Module
Implementation of the class-balanced loss as described in "Class-Balanced Loss Based on Effective Number of Samples".
Given a loss function \(\mathcal{L}\), the class-balanced loss is described by:

$$CB(p, y) = \frac{1 - \beta}{1 - \beta^{n_y}} \mathcal{L}(p, y)$$

where \(p\) is the predicted probability for class \(y\), \(n_y\) is the number of training samples for class \(y\), and \(\beta\) is an exponential factor.

| PARAMETER | DESCRIPTION |
|---|---|
| `criterion` | loss module |
| `num_samples` | number of samples for each class |
| `beta` | rebalancing exponent |
Source code in holocron/nn/modules/loss.py
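A sketch of wrapping a standard criterion with class-balanced reweighting; the sample counts are hypothetical, and the positional argument order follows the parameter table above:

```python
import torch
from torch import nn
from holocron.nn import ClassBalancedWrapper

num_samples = torch.tensor([5000, 500, 50])  # hypothetical long-tailed class counts
criterion = ClassBalancedWrapper(nn.CrossEntropyLoss(), num_samples, beta=0.99)
loss = criterion(torch.randn(4, 3), torch.randint(0, 3, (4,)))
```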
Convolution layers¶
NormConv2d¶
NormConv2d(in_channels: int, out_channels: int, kernel_size: int, stride: int = 1, padding: int = 0, dilation: int = 1, groups: int = 1, bias: bool = True, padding_mode: Literal['zeros', 'reflect', 'replicate', 'circular'] = 'zeros', eps: float = 1e-14)
Bases: _NormConvNd
Implements the normalized convolution module from "Normalized Convolutional Neural Network".
In the simplest case, the output value of the layer with input size \((N, C_{in}, H, W)\) and output \((N, C_{out}, H_{out}, W_{out})\) can be precisely described as:

$$out(N_i, C_{out_j}) = bias(C_{out_j}) + \sum_{k=0}^{C_{in}-1} weight(C_{out_j}, k) \star \frac{input(N_i, k) - \mu(N_i, k)}{\sqrt{\sigma^2(N_i, k) + \epsilon}}$$

where \(\star\) is the valid 2D cross-correlation operator, \(\mu(N_i, k)\) and \(\sigma^2(N_i, k)\) are the mean and variance of \(input(N_i, k)\) over all slices, \(N\) is the batch size, \(C\) denotes the number of channels, \(H\) is the height of the input planes in pixels, and \(W\) is the width in pixels.

| PARAMETER | DESCRIPTION |
|---|---|
| `in_channels` | Number of channels in the input image |
| `out_channels` | Number of channels produced by the convolution |
| `kernel_size` | Size of the convolving kernel |
| `stride` | Stride of the convolution. |
| `padding` | Zero-padding added to both sides of the input. |
| `dilation` | Spacing between kernel elements. |
| `groups` | Number of blocked connections from input channels to output channels. |
| `bias` | If `True`, adds a learnable bias to the output. |
| `padding_mode` | padding mode among `'zeros'`, `'reflect'`, `'replicate'` and `'circular'`. |
| `eps` | a value added to the denominator for numerical stability. |
Source code in holocron/nn/modules/conv.py
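The constructor mirrors torch.nn.Conv2d, so a usage sketch (assuming the top-level export) looks like:

```python
import torch
from holocron.nn import NormConv2d

conv = NormConv2d(3, 16, 3, padding=1)
out = conv(torch.randn(2, 3, 32, 32))  # -> (2, 16, 32, 32)
```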
Add2d¶
Add2d(in_channels: int, out_channels: int, kernel_size: int, stride: int = 1, padding: int = 0, dilation: int = 1, groups: int = 1, bias: bool = True, padding_mode: Literal['zeros', 'reflect', 'replicate', 'circular'] = 'zeros', normalize_slices: bool = False, eps: float = 1e-14)
Bases: _NormConvNd
Implements the adder module from "AdderNet: Do We Really Need Multiplications in Deep Learning?".
In the simplest case, the output value of the layer at position \((m, n)\) in channel \(c\), with filter \(F\) of spatial size \((d, d)\), input size \((C_{in}, H, W)\) and output \((C_{out}, H, W)\), can be precisely described as:

$$out(m, n, c) = - \sum_{i=0}^{d} \sum_{j=0}^{d} \sum_{k=0}^{C_{in}} |X(m + i, n + j, k) - F(i, j, k, c)|$$

where \(C\) denotes the number of channels, \(H\) is the height of the input planes in pixels, and \(W\) is the width in pixels.

| PARAMETER | DESCRIPTION |
|---|---|
| `in_channels` | Number of channels in the input image |
| `out_channels` | Number of channels produced by the convolution |
| `kernel_size` | Size of the convolving kernel |
| `stride` | Stride of the convolution. |
| `padding` | Zero-padding added to both sides of the input. |
| `dilation` | Spacing between kernel elements. |
| `groups` | Number of blocked connections from input channels to output channels. |
| `bias` | If `True`, adds a learnable bias to the output. |
| `padding_mode` | padding mode among `'zeros'`, `'reflect'`, `'replicate'` and `'circular'`. |
| `normalize_slices` | whether slices should be normalized before performing cross-correlation. |
| `eps` | a value added to the denominator for numerical stability. |
Source code in holocron/nn/modules/conv.py
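Add2d keeps the familiar convolution-style signature, so a usage sketch under the same assumptions:

```python
import torch
from holocron.nn import Add2d

adder = Add2d(3, 16, 3, padding=1)
out = adder(torch.randn(2, 3, 32, 32))  # -> (2, 16, 32, 32)
```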
SlimConv2d¶
SlimConv2d(in_channels: int, kernel_size: int, stride: int = 1, padding: int = 0, dilation: int = 1, groups: int = 1, bias: bool = True, padding_mode: Literal['zeros', 'reflect', 'replicate', 'circular'] = 'zeros', r: int = 32, L: int = 2)
Bases: Module
Implements the convolution module from "SlimConv: Reducing Channel Redundancy in Convolutional Neural Networks by Weights Flipping".
First, we compute channel-wise weights as follows:

$$z(c) = \frac{1}{H \cdot W} \sum_{i=1}^{H} \sum_{j=1}^{W} X_{c,i,j}$$

where \(X \in \mathbb{R}^{C \times H \times W}\) is the input tensor, \(H\) is height in pixels, and \(W\) is width in pixels.

$$w = \sigma(F_{fc2}(\delta(F_{fc1}(z))))$$

where \(z \in \mathbb{R}^{C}\) contains channel-wise statistics, \(\sigma\) refers to the sigmoid function, \(\delta\) refers to the ReLU function, \(F_{fc1}\) is a convolution operation with kernel of size \((1, 1)\) with \(max(C/r, L)\) output channels followed by batch normalization, and \(F_{fc2}\) is a plain convolution operation with kernel of size \((1, 1)\) with \(C\) output channels.
We then proceed with reconstructing and transforming both pathways:

$$X_{top} = X \odot w$$
$$X_{bot} = X \odot \check{w}$$

where \(\odot\) refers to the element-wise multiplication and \(\check{w}\) is the channel-wise reverse-flip of \(w\).

$$T = F_{top}(X_{top}^{(1)} + X_{top}^{(2)})$$
$$B = F_{bot}(X_{bot}^{(1)} + X_{bot}^{(2)})$$

where \(X^{(1)}\) and \(X^{(2)}\) are the channel-wise first and second halves of \(X\), \(F_{top}\) is a convolution of kernel size \((3, 3)\), and \(F_{bot}\) is a convolution of kernel size \((1, 1)\) reducing channels by half, followed by a convolution of kernel size \((3, 3)\).
Finally we fuse both pathways to yield the output:

$$Y = T \oplus B$$

where \(\oplus\) is the channel-wise concatenation.

| PARAMETER | DESCRIPTION |
|---|---|
| `in_channels` | Number of channels in the input image |
| `kernel_size` | Size of the convolving kernel |
| `stride` | Stride of the convolution. |
| `padding` | Zero-padding added to both sides of the input. |
| `dilation` | Spacing between kernel elements. |
| `groups` | Number of blocked connections from input channels to output channels. |
| `bias` | If `True`, adds a learnable bias to the output. |
| `padding_mode` | padding mode among `'zeros'`, `'reflect'`, `'replicate'` and `'circular'`. |
| `r` | squeezing divider. |
| `L` | minimum number of squeezed channels. |
Source code in holocron/nn/modules/conv.py
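Note that the constructor takes no out_channels argument: the output channel count follows from the two channel-reducing pathways described above. A usage sketch, under the same export assumption:

```python
import torch
from holocron.nn import SlimConv2d

conv = SlimConv2d(64, 3, padding=1)   # no out_channels argument
out = conv(torch.randn(2, 64, 32, 32))
print(out.shape)  # fewer channels than the input, per the pathway reductions
```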
PyConv2d¶
PyConv2d(in_channels: int, out_channels: int, kernel_size: int, num_levels: int = 2, padding: int = 0, groups: list[int] | None = None, **kwargs: Any)
Bases: ModuleList
Implements the convolution module from "Pyramidal Convolution: Rethinking Convolutional Neural Networks for Visual Recognition".

| PARAMETER | DESCRIPTION |
|---|---|
| `in_channels` | Number of channels in the input image |
| `out_channels` | Number of channels produced by the convolution |
| `kernel_size` | Size of the convolving kernel |
| `num_levels` | number of stacks in the pyramid. |
| `padding` | Zero-padding added to both sides of the input. |
| `groups` | Number of blocked connections from input channels to output channels. |
| `kwargs` | keyword args of the underlying `torch.nn.Conv2d` layers |
Source code in holocron/nn/modules/conv.py
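A usage sketch, assuming the top-level export:

```python
import torch
from holocron.nn import PyConv2d

# a 2-level pyramid of convolutions producing 16 output channels
conv = PyConv2d(3, 16, kernel_size=3, num_levels=2, padding=1)
out = conv(torch.randn(2, 3, 32, 32))
```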
Involution2d¶
Involution2d(in_channels: int, kernel_size: int, padding: int = 0, stride: int = 1, groups: int = 1, dilation: int = 1, reduction_ratio: float = 1)
Bases: Module
Implements the convolution module from "Involution: Inverting the Inherence of Convolution for Visual Recognition", adapted from the PyTorch implementation proposed in the paper.

| PARAMETER | DESCRIPTION |
|---|---|
| `in_channels` | Number of channels in the input image |
| `kernel_size` | Size of the convolving kernel |
| `padding` | Zero-padding added to both sides of the input. |
| `stride` | Stride of the convolution. |
| `groups` | Number of blocked connections from input channels to output channels. |
| `dilation` | Spacing between kernel elements. |
| `reduction_ratio` | reduction ratio of the channels used to generate the kernel |
Source code in holocron/nn/modules/conv.py
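Since involution generates its kernels from the input, there is no out_channels argument. A usage sketch:

```python
import torch
from holocron.nn import Involution2d

inv = Involution2d(16, kernel_size=3, padding=1)
out = inv(torch.randn(2, 16, 32, 32))  # involution preserves the channel count
```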
Regularization layers¶
DropBlock2d¶
Bases: Module
Implements the DropBlock module from "DropBlock: A regularization method for convolutional networks".

| PARAMETER | DESCRIPTION |
|---|---|
| `p` | probability of dropping an activation value |
| `block_size` | size of each block that is expanded from the sampled mask |
| `inplace` | whether the operation should be done inplace |
Source code in holocron/nn/modules/dropblock.py
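A usage sketch; the keyword names follow the parameter table above, and the training-mode comment is an assumption based on dropout-style layers:

```python
import torch
from holocron.nn import DropBlock2d

drop = DropBlock2d(p=0.1, block_size=3)
drop.train()  # like dropout, presumably a no-op in eval mode
out = drop(torch.randn(2, 16, 32, 32))
```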
Downsampling¶
ConcatDownsample2d¶
ConcatDownsample2d(scale_factor: int)
Bases: Module
Implements a lossless downsampling operation described in "YOLO9000: Better, Faster, Stronger" by stacking adjacent information on the channel dimension.

| PARAMETER | DESCRIPTION |
|---|---|
| `scale_factor` | spatial scaling factor |
Source code in holocron/nn/modules/downsample.py
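Since the operation is lossless, every spatial reduction is compensated by channel growth. A usage sketch:

```python
import torch
from holocron.nn import ConcatDownsample2d

down = ConcatDownsample2d(scale_factor=2)
out = down(torch.randn(2, 8, 32, 32))
print(out.shape)  # expected: (2, 32, 16, 16) -- channels x4, spatial dims halved
```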
GlobalAvgPool2d¶
GlobalAvgPool2d(flatten: bool = False)
Bases: Module
Fast implementation of global average pooling from "TResNet: High Performance GPU-Dedicated Architecture".

| PARAMETER | DESCRIPTION |
|---|---|
| `flatten` | whether spatial dimensions should be squeezed |
Source code in holocron/nn/modules/downsample.py
GlobalMaxPool2d¶
GlobalMaxPool2d(flatten: bool = False)
Bases: Module
Fast implementation of global max pooling from "TResNet: High Performance GPU-Dedicated Architecture".

| PARAMETER | DESCRIPTION |
|---|---|
| `flatten` | whether spatial dimensions should be squeezed |
Source code in holocron/nn/modules/downsample.py
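Both global pooling layers share the same interface; a sketch of the effect of flatten (expected shapes in comments, under the usual export assumption):

```python
import torch
from holocron.nn import GlobalAvgPool2d, GlobalMaxPool2d

x = torch.randn(2, 16, 32, 32)
print(GlobalAvgPool2d()(x).shape)              # expected: (2, 16, 1, 1)
print(GlobalMaxPool2d(flatten=True)(x).shape)  # expected: (2, 16)
```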
BlurPool2d¶
Bases: Module
Ross Wightman's implementation of the blur pooling module described in "Making Convolutional Networks Shift-Invariant Again".

| PARAMETER | DESCRIPTION |
|---|---|
| `channels` | Number of input channels |
| `kernel_size` | binomial filter size for blurring; currently supports 3 (default) and 5. |
| `stride` | downsampling filter stride |
Source code in holocron/nn/modules/downsample.py
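A usage sketch; keyword names follow the parameter table above:

```python
import torch
from holocron.nn import BlurPool2d

blur = BlurPool2d(channels=16, kernel_size=3, stride=2)
out = blur(torch.randn(2, 16, 32, 32))  # anti-aliased downsampling, spatial dims halved
```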
SPP¶
Bases: ModuleList
SPP layer from "Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition".

| PARAMETER | DESCRIPTION |
|---|---|
| `kernel_sizes` | kernel sizes of each pooling |
Source code in holocron/nn/modules/downsample.py
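A usage sketch, with the assumption (common to SPP blocks) that the pooled maps are stacked along the channel dimension:

```python
import torch
from holocron.nn import SPP

spp = SPP([5, 9, 13])  # one pooling branch per kernel size
out = spp(torch.randn(2, 16, 32, 32))
print(out.shape)       # channel count grows with the number of branches
```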
ZPool¶
ZPool(dim: int = 1)
Bases: Module
Z-pool layer from "Rotate to Attend: Convolutional Triplet Attention Module".

| PARAMETER | DESCRIPTION |
|---|---|
| `dim` | dimension to pool across |
Source code in holocron/nn/modules/downsample.py
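For illustration, a functional sketch of Z-pool as defined in the triplet attention paper (concatenating max- and mean-pooled slices along the chosen dimension), not holocron's implementation:

```python
import torch

def z_pool(x: torch.Tensor, dim: int = 1) -> torch.Tensor:
    # stacks the max- and mean-reduced slices along `dim` (2 output slices)
    return torch.cat([x.amax(dim, keepdim=True), x.mean(dim, keepdim=True)], dim=dim)

print(z_pool(torch.randn(2, 16, 32, 32)).shape)  # -> (2, 2, 32, 32)
```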
Attention¶
SAM¶
SAM(in_channels: int)
Bases: Module
SAM layer from "CBAM: Convolutional Block Attention Module", modified in "YOLOv4: Optimal Speed and Accuracy of Object Detection".

| PARAMETER | DESCRIPTION |
|---|---|
| `in_channels` | input channels |
Source code in holocron/nn/modules/attention.py
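A usage sketch; as an attention layer, SAM is expected to reweight features without changing their shape:

```python
import torch
from holocron.nn import SAM

sam = SAM(16)
out = sam(torch.randn(2, 16, 32, 32))  # attention-reweighted features, same shape
```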
LambdaLayer¶
LambdaLayer(in_channels: int, out_channels: int, dim_k: int, n: int | None = None, r: int | None = None, num_heads: int = 4, dim_u: int = 1)
Bases: Module
Lambda layer from "LambdaNetworks: Modeling long-range interactions without attention". The implementation was adapted from lucidrains.

| PARAMETER | DESCRIPTION |
|---|---|
| `in_channels` | input channels |
| `out_channels` | output channels |
| `dim_k` | key dimension |
| `n` | number of input pixels |
| `r` | receptive field for relative positional encoding |
| `num_heads` | number of attention heads |
| `dim_u` | intra-depth dimension |
Source code in holocron/nn/modules/lambda_layer.py
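A hedged usage sketch: in lambda-layer implementations derived from lucidrains', out_channels must be divisible by num_heads, and passing r (an odd receptive field) avoids fixing the input size n; both constraints are assumptions here.

```python
import torch
from holocron.nn import LambdaLayer

lam = LambdaLayer(in_channels=32, out_channels=32, dim_k=16, r=7, num_heads=4)
out = lam(torch.randn(2, 32, 32, 32))  # -> (2, 32, 32, 32)
```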
TripletAttention¶
Bases: Module
Triplet attention layer from "Rotate to Attend: Convolutional Triplet Attention Module". This implementation is based on the one from the paper's authors.
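A usage sketch, assuming a no-argument constructor as in the paper authors' implementation:

```python
import torch
from holocron.nn import TripletAttention

attn = TripletAttention()
out = attn(torch.randn(2, 16, 32, 32))  # same shape, attention applied along three axes
```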