holocron.nn¶
An addition to PyTorch's torch.nn module, extending the range of available neural network building blocks.
Non-linear activations¶
- class holocron.nn.SiLU[source]¶
Implements the SiLU activation from “Gaussian Error Linear Units (GELUs)” (also known as Swish).
This activation is computed as follows:
\[f(x) = x \cdot \sigma(x)\]
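A minimal usage sketch checking the closed form above (it assumes holocron.nn.SiLU applies element-wise, like other torch.nn activations):
>>> import torch
>>> from holocron.nn import SiLU
>>> act = SiLU()
>>> x = torch.randn(2, 4)
>>> out = act(x)
>>> # expected to match the closed form x * sigmoid(x)
>>> torch.allclose(out, x * torch.sigmoid(x))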
- class holocron.nn.Mish[source]¶
Implements the Mish activation module from “Mish: A Self Regularized Non-Monotonic Neural Activation Function”
This activation is computed as follows:
\[f(x) = x \cdot \tanh(\ln(1 + e^x))\]
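A usage sketch mirroring the formula (torch.nn.functional.softplus computes \(\ln(1 + e^x)\); the element-wise behaviour is assumed):
>>> import torch
>>> import torch.nn.functional as F
>>> from holocron.nn import Mish
>>> x = torch.randn(2, 4)
>>> out = Mish()(x)
>>> # expected to match x * tanh(softplus(x))
>>> torch.allclose(out, x * torch.tanh(F.softplus(x)))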
- class holocron.nn.NLReLU(inplace=False)[source]¶
Implements the Natural-Logarithm ReLU activation module from “Natural-Logarithm-Rectified Activation Function in Convolutional Neural Networks”
This activation is computed as follows:
\[f(x) = \ln(1 + \beta \cdot \max(0, x))\]
- Parameters:
inplace (bool) – should the operation be performed inplace
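A usage sketch following the formula above, assuming \(\beta = 1\) as the default scaling (torch.log1p computes \(\ln(1 + x)\)):
>>> import torch
>>> from holocron.nn import NLReLU
>>> act = NLReLU(inplace=False)
>>> x = torch.randn(2, 4)
>>> out = act(x)
>>> # with beta = 1, expected to match log1p(relu(x))
>>> torch.allclose(out, torch.log1p(torch.relu(x)))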
Loss functions¶
- class holocron.nn.FocalLoss(gamma=2, **kwargs)[source]¶
Implementation of Focal Loss as described in “Focal Loss for Dense Object Detection”.
The weighted cross-entropy is described by:
\[CE(p_t) = -\alpha_t \log(p_t)\]where \(\alpha_t\) is the loss weight of class \(t\), and \(p_t\) is the predicted probability of class \(t\).
The focal loss introduces a modulating factor to down-weight well-classified examples:
\[FL(p_t) = -\alpha_t (1 - p_t)^\gamma \log(p_t)\]where \(\gamma\) is a positive focusing parameter.
- Parameters:
gamma (float, optional) – exponent parameter of the focal loss
weight (torch.Tensor[K], optional) – class weight for loss computation
ignore_index (int, optional) – specifies a target value that is ignored and does not contribute to the gradient
reduction (str, optional) – type of reduction to apply to the final loss
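A usage sketch, assuming the criterion consumes raw class scores of shape (N, K) and integer class targets, as with torch.nn.CrossEntropyLoss:
>>> import torch
>>> from holocron.nn import FocalLoss
>>> criterion = FocalLoss(gamma=2)
>>> logits = torch.randn(8, 5)          # raw scores for 5 classes
>>> target = torch.randint(0, 5, (8,))  # integer class indices
>>> loss = criterion(logits, target)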
- class holocron.nn.MultiLabelCrossEntropy(**kwargs)[source]¶
Implementation of the cross-entropy loss for multi-label targets
- Parameters:
weight (torch.Tensor[K], optional) – class weight for loss computation
ignore_index (int, optional) – specifies a target value that is ignored and does not contribute to the gradient
reduction (str, optional) – type of reduction to apply to the final loss
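A usage sketch, assuming targets are given as per-class probabilities with the same shape (N, K) as the class scores (this target layout is an assumption, not stated in the signature):
>>> import torch
>>> from holocron.nn import MultiLabelCrossEntropy
>>> criterion = MultiLabelCrossEntropy()
>>> logits = torch.randn(8, 5)
>>> target = torch.rand(8, 5)
>>> target /= target.sum(dim=1, keepdim=True)  # soft multi-label targets
>>> loss = criterion(logits, target)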
- class holocron.nn.LabelSmoothingCrossEntropy(eps=0.1, **kwargs)[source]¶
Implementation of the cross-entropy loss with label smoothing on hard targets as described in “Attention Is All You Need”
- Parameters:
eps (float, optional) – smoothing factor
weight (torch.Tensor[K], optional) – class weight for loss computation
ignore_index (int, optional) – specifies a target value that is ignored and does not contribute to the gradient
reduction (str, optional) – type of reduction to apply to the final loss
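A usage sketch with a smoothing factor of 0.1, assuming the same (N, K) scores / integer target interface as the other losses above:
>>> import torch
>>> from holocron.nn import LabelSmoothingCrossEntropy
>>> criterion = LabelSmoothingCrossEntropy(eps=0.1)
>>> logits = torch.randn(8, 5)
>>> target = torch.randint(0, 5, (8,))
>>> loss = criterion(logits, target)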
Loss wrappers¶
- class holocron.nn.MixupLoss(criterion)[source]¶
Implements a Mixup wrapper as described in “mixup: Beyond Empirical Risk Minimization”
- Parameters:
criterion (callable) – initial criterion to be used on normal sample & targets
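A construction sketch; only the wrapping of a base criterion is shown, since the forward interface for mixed targets is not detailed above (the wrapped nn.CrossEntropyLoss is an arbitrary choice):
>>> from torch import nn
>>> from holocron.nn import MixupLoss
>>> criterion = MixupLoss(nn.CrossEntropyLoss())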
Convolution layers¶
- class holocron.nn.NormConv2d(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, bias=True, padding_mode='zeros', eps=1e-14)[source]¶
Implements the normalized convolution module from “Normalized Convolutional Neural Network”.
In the simplest case, the output value of the layer with input size \((N, C_{in}, H, W)\) and output \((N, C_{out}, H_{out}, W_{out})\) can be precisely described as:
\[out(N_i, C_{out_j}) = bias(C_{out_j}) + \sum_{k = 0}^{C_{in} - 1} weight(C_{out_j}, k) \star \frac{input(N_i, k) - \mu(N_i, k)}{\sqrt{\sigma^2(N_i, k) + \epsilon}}\]where \(\star\) is the valid 2D cross-correlation operator, \(\mu(N_i, k)\) and \(\sigma^2(N_i, k)\) are the mean and variance of \(input(N_i, k)\) over all slices, \(N\) is a batch size, \(C\) denotes a number of channels, \(H\) is a height of input planes in pixels, and \(W\) is width in pixels.
- Parameters:
in_channels (int) – Number of channels in the input image
out_channels (int) – Number of channels produced by the convolution
kernel_size (int or tuple) – Size of the convolving kernel
stride (int or tuple, optional) – Stride of the convolution. Default: 1
padding (int or tuple, optional) – Zero-padding added to both sides of the input. Default: 0
dilation (int or tuple, optional) – Spacing between kernel elements. Default: 1
groups (int, optional) – Number of blocked connections from input channels to output channels. Default: 1
bias (bool, optional) – If True, adds a learnable bias to the output. Default: True
padding_mode (string, optional) – 'zeros', 'reflect', 'replicate' or 'circular'. Default: 'zeros'
eps (float, optional) – a value added to the denominator for numerical stability. Default: 1e-14
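A usage sketch based on the signature above (output spatial sizes are assumed to follow the usual Conv2d shape rules):
>>> import torch
>>> from holocron.nn import NormConv2d
>>> conv = NormConv2d(3, 16, 3, padding=1)
>>> x = torch.rand(2, 3, 32, 32)
>>> out = conv(x)   # expected shape: (2, 16, 32, 32)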
- class holocron.nn.Add2d(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, bias=True, padding_mode='zeros', normalize_slices=False, eps=1e-14)[source]¶
Implements the adder module from “AdderNet: Do We Really Need Multiplications in Deep Learning?”.
In the simplest case, the output value of the layer at position \((m, n)\) in channel \(c\) with filter \(F\) of spatial size \((d, d)\), input size \((C_{in}, H, W)\) and output \((C_{out}, H, W)\) can be precisely described as:
\[out(m, n, c) = - \sum\limits_{i=0}^d \sum\limits_{j=0}^d \sum\limits_{k=0}^{C_{in}} |X(m + i, n + j, k) - F(i, j, k, c)|\]where \(C\) denotes a number of channels, \(H\) is a height of input planes in pixels, and \(W\) is width in pixels.
- Parameters:
in_channels (int) – Number of channels in the input image
out_channels (int) – Number of channels produced by the convolution
kernel_size (int or tuple) – Size of the convolving kernel
stride (int or tuple, optional) – Stride of the convolution. Default: 1
padding (int or tuple, optional) – Zero-padding added to both sides of the input. Default: 0
dilation (int or tuple, optional) – Spacing between kernel elements. Default: 1
groups (int, optional) – Number of blocked connections from input channels to output channels. Default: 1
bias (bool, optional) – If True, adds a learnable bias to the output. Default: True
padding_mode (string, optional) – 'zeros', 'reflect', 'replicate' or 'circular'. Default: 'zeros'
normalize_slices (bool, optional) – whether slices should be normalized before performing cross-correlation. Default: False
eps (float, optional) – a value added to the denominator for numerical stability. Default: 1e-14
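A usage sketch based on the signature above; as for NormConv2d, output spatial sizes are assumed to follow the usual Conv2d shape rules:
>>> import torch
>>> from holocron.nn import Add2d
>>> adder = Add2d(3, 16, 3, padding=1)
>>> x = torch.rand(2, 3, 32, 32)
>>> out = adder(x)   # expected shape: (2, 16, 32, 32)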
- class holocron.nn.SlimConv2d(in_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, bias=True, padding_mode='zeros', r=32, L=2)[source]¶
Implements the convolution module from “SlimConv: Reducing Channel Redundancy in Convolutional Neural Networks by Weights Flipping”.
First, we compute channel-wise weights as follows:
\[z(c) = \frac{1}{H \cdot W} \sum\limits_{i=1}^H \sum\limits_{j=1}^W X_{c,i,j}\]where \(X \in \mathbb{R}^{C \times H \times W}\) is the input tensor, \(H\) is height in pixels, and \(W\) is width in pixels.
\[w = \sigma(F_{fc2}(\delta(F_{fc1}(z))))\]where \(z \in \mathbb{R}^{C}\) contains channel-wise statistics, \(\sigma\) refers to the sigmoid function, \(\delta\) refers to the ReLU function, \(F_{fc1}\) is a convolution operation with kernel of size \((1, 1)\) with \(max(C/r, L)\) output channels followed by batch normalization, and \(F_{fc2}\) is a plain convolution operation with kernel of size \((1, 1)\) with \(C\) output channels.
We then proceed with reconstructing and transforming both pathways:
\[X_{top} = X \odot w\]\[X_{bot} = X \odot \check{w}\]where \(\odot\) refers to the element-wise multiplication and \(\check{w}\) is the channel-wise reverse-flip of \(w\).
\[T_{top} = F_{top}(X_{top}^{(1)} + X_{top}^{(2)})\]\[T_{bot} = F_{bot}(X_{bot}^{(1)} + X_{bot}^{(2)})\]where \(X^{(1)}\) and \(X^{(2)}\) are the channel-wise first and second halves of \(X\), \(F_{top}\) is a convolution of kernel size \((3, 3)\), and \(F_{bot}\) is a convolution of kernel size \((1, 1)\) reducing channels by half, followed by a convolution of kernel size \((3, 3)\).
Finally we fuse both pathways to yield the output:
\[Y = T_{top} \oplus T_{bot}\]where \(\oplus\) is the channel-wise concatenation.
- Parameters:
in_channels (int) – Number of channels in the input image
kernel_size (int or tuple) – Size of the convolving kernel
stride (int or tuple, optional) – Stride of the convolution. Default: 1
padding (int or tuple, optional) – Zero-padding added to both sides of the input. Default: 0
dilation (int or tuple, optional) – Spacing between kernel elements. Default: 1
groups (int, optional) – Number of blocked connections from input channels to output channels. Default: 1
bias (bool, optional) – If True, adds a learnable bias to the output. Default: True
padding_mode (string, optional) – 'zeros', 'reflect', 'replicate' or 'circular'. Default: 'zeros'
r (int, optional) – squeezing divider. Default: 32
L (int, optional) – minimum squeezed channels. Default: 2
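A usage sketch based on the signature above. No out_channels argument is exposed: following the description, the concatenation of \(T_{top}\) (half the input channels) and \(T_{bot}\) (a further halved pathway) yields fewer channels than the input, but the exact count is not stated above and is left unchecked here:
>>> import torch
>>> from holocron.nn import SlimConv2d
>>> conv = SlimConv2d(64, 3, padding=1)
>>> x = torch.rand(2, 64, 32, 32)
>>> out = conv(x)   # channel count follows the top/bottom pathway widths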
Regularization layers¶
- class holocron.nn.DropBlock2d(p, block_size, inplace=False)[source]¶
Implements the DropBlock module from “DropBlock: A regularization method for convolutional networks”
- Parameters:
p (float) – probability of dropping activation value
block_size (int) – size of the spatial blocks to drop
inplace (bool, optional) – should the operation be performed inplace
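A usage sketch; as with other dropout-style layers, the block dropping is assumed to be active only in training mode:
>>> import torch
>>> from holocron.nn import DropBlock2d
>>> drop = DropBlock2d(0.1, 3)
>>> x = torch.rand(2, 8, 16, 16)
>>> out = drop(x)   # same shape as the input, with contiguous blocks zeroed out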
Downsampling¶
- class holocron.nn.ConcatDownsample2d(scale_factor)[source]¶
Implements a lossless downsampling operation described in “YOLO9000: Better, Faster, Stronger” by stacking adjacent information on the channel dimension.
- Parameters:
scale_factor (int) – spatial scaling factor
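A usage sketch: with a scale factor of 2, each 2×2 spatial block is stacked on the channel dimension, so spatial sizes are halved and the channel count is multiplied by 4:
>>> import torch
>>> from holocron.nn import ConcatDownsample2d
>>> down = ConcatDownsample2d(2)
>>> x = torch.rand(2, 8, 16, 16)
>>> out = down(x)   # expected shape: (2, 32, 8, 8)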