Interpretability methods
Class activation map
The class activation map gives you the importance of each region of a feature map on a model's output. More specifically, a class activation map is relative to:
- the layer at which it is computed (e.g. the N-th layer of your model)
- the model's classification output (e.g. the raw logits of the model)
- the class index to focus on
With TorchCAM, the target layer is selected when you create your CAM extractor. To retrieve a map, you then call the extractor with a class index and, for gradient-based methods, the model logits.
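Choosing the class index is often the first practical step. The snippet below is a minimal, framework-free sketch of picking the top-1 class from raw logits; the helper name `top_class_index` and the logit values are illustrative assumptions, not part of TorchCAM.

```python
def top_class_index(logits):
    """Return the index of the largest logit (the top-1 predicted class)."""
    return max(range(len(logits)), key=lambda i: logits[i])


# Hypothetical raw logits for a 4-class model (illustrative values only)
logits = [0.3, 2.1, -1.0, 0.7]
class_idx = top_class_index(logits)
```

With a PyTorch output tensor of shape `(1, num_classes)`, the equivalent would typically be `out.squeeze(0).argmax().item()`.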
Activation-based methods
Methods related to activation-based class activation maps.
torchcam.methods.CAM
CAM(model: Module, target_layer: Module | str | list[Module | str] | None = None, fc_layer: Module | str | None = None, input_shape: tuple[int, ...] = (3, 224, 224), **kwargs: Any)
Implements a class activation map extractor as described in "Learning Deep Features for Discriminative Localization".
The Class Activation Map (CAM) is defined for image classification models that have global pooling at the end of the visual feature extraction block. The localization map is computed as follows:
\[L^{(c)}_{CAM}(x, y) = ReLU\Big(\sum\limits_k w_k^{(c)} A_k(x, y)\Big)\]
where \(A_k(x, y)\) is the activation of node \(k\) in the target layer of the model at position \((x, y)\), and \(w_k^{(c)}\) is the weight corresponding to class \(c\) for unit \(k\) in the fully connected layer.
Example
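
    import torch
    from torchvision.models import get_model, get_model_weights
    from torchcam.methods import CAM

    model = get_model("resnet18", weights=get_model_weights("resnet18").DEFAULT).eval()
    # Placeholder input (assumption): replace with a preprocessed image batch
    input_tensor = torch.rand(1, 3, 224, 224)
    with CAM(model, 'layer4', 'fc') as cam_extractor:
        # The forward pass lets the extractor record the target layer activations
        with torch.inference_mode():
            out = model(input_tensor)
        cam = cam_extractor(class_idx=100)
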
| PARAMETER | DESCRIPTION |
|---|---|
| model | input model |
| target_layer | either the target layer itself or its name, or a list of those |
| fc_layer | either the fully connected layer itself or its name |
| input_shape | shape of the expected input tensor excluding the batch dimension |
| RAISES | DESCRIPTION |
|---|---|
| ValueError | if the argument is invalid |
| TypeError | if the argument type is invalid |
Source code in torchcam/methods/activation.py
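To make the CAM definition above concrete, here is a small framework-free sketch of the weighted sum of activation maps for one class. It assumes a final ReLU that keeps only positive evidence; the function names and toy values are illustrative and are not TorchCAM's implementation.

```python
def relu(value):
    """Keep positive values, clamp the rest to zero."""
    return value if value > 0 else 0.0


def class_activation_map(activations, fc_weights, class_idx):
    """activations: K maps of size H x W; fc_weights[c][k]: weight of unit k for class c."""
    weights = fc_weights[class_idx]
    height, width = len(activations[0]), len(activations[0][0])
    return [
        [
            relu(sum(w * fmap[y][x] for w, fmap in zip(weights, activations)))
            for x in range(width)
        ]
        for y in range(height)
    ]


# Two toy 2x2 activation maps (A_0, A_1) and weights for two classes
activations = [
    [[1.0, 0.0], [0.0, 2.0]],
    [[0.0, 3.0], [1.0, 0.0]],
]
fc_weights = [[1.0, -1.0], [0.5, 0.5]]
cam = class_activation_map(activations, fc_weights, class_idx=0)
```

For class 0, positions where the second map dominates receive a negative sum and are zeroed by the ReLU.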
torchcam.methods.ScoreCAM
ScoreCAM(model: Module, target_layer: Module | str | list[Module | str] | None = None, batch_size: int = 32, input_shape: tuple[int, ...] = (3, 224, 224), **kwargs: Any)
Implements a class activation map extractor as described in "Score-CAM: Score-Weighted Visual Explanations for Convolutional Neural Networks".
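The localization map is computed as follows:
\[L^{(c)}_{Score-CAM}(x, y) = ReLU\Big(\sum\limits_k w_k^{(c)} A_k(x, y)\Big)\]
with the coefficient \(w_k^{(c)}\) being defined as:
\[w_k^{(c)} = softmax\Big(Y^{(c)}(M_k) - Y^{(c)}(X_b)\Big)_k\]
where \(A_k(x, y)\) is the activation of node \(k\) in the target layer of the model at position \((x, y)\), \(Y^{(c)}(X)\) is the model output score for class \(c\) before softmax for input \(X\), \(X_b\) is a baseline image, and \(M_k\) is defined as follows:
\[M_k = \frac{U(A_k) - \min U(A_k)}{\max U(A_k) - \min U(A_k)} \odot X\]
where \(\odot\) refers to the element-wise multiplication, \(U\) is the upsampling operation, and the minimum and maximum are taken over the spatial positions of the upsampled map.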
Example

    import torch
    from torchvision.models import get_model, get_model_weights
    from torchcam.methods import ScoreCAM

    model = get_model("resnet18", weights=get_model_weights("resnet18").DEFAULT).eval()
    # Placeholder input (assumption): replace with a preprocessed image batch
    input_tensor = torch.rand(1, 3, 224, 224)
    with ScoreCAM(model, 'layer4') as cam_extractor:
        with torch.inference_mode():
            out = model(input_tensor)
        cam = cam_extractor(class_idx=100)

| PARAMETER | DESCRIPTION |
|---|---|
| model | input model |
| target_layer | either the target layer itself or its name, or a list of those |
| batch_size | batch size used to forward masked inputs |
| input_shape | shape of the expected input tensor excluding the batch dimension |
Source code in torchcam/methods/activation.py
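The two building blocks of the Score-CAM weights described above, min-max normalization of an upsampled activation map and a softmax over score gains, can be sketched without any framework. The scores passed to `score_cam_weights` are hypothetical stand-ins for model forward passes on masked inputs; all names here are illustrative assumptions.

```python
import math


def min_max_normalize(fmap):
    """Rescale a 2D map to [0, 1]; a constant map becomes all zeros."""
    flat = [v for row in fmap for v in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        return [[0.0 for _ in row] for row in fmap]
    return [[(v - lo) / (hi - lo) for v in row] for row in fmap]


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def score_cam_weights(masked_scores, baseline_score):
    """masked_scores[k] stands for Y^(c)(M_k); baseline_score for Y^(c)(X_b)."""
    return softmax([score - baseline_score for score in masked_scores])


mask = min_max_normalize([[2.0, 4.0], [6.0, 10.0]])
# Hypothetical class scores obtained for three masked inputs
weights = score_cam_weights([1.0, 3.0, 2.0], baseline_score=0.5)
```

The channel whose mask preserves the class score best receives the largest weight, and the weights sum to one.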
torchcam.methods.SSCAM
SSCAM(model: Module, target_layer: Module | str | list[Module | str] | None = None, batch_size: int = 32, num_samples: int = 35, std: float = 2.0, input_shape: tuple[int, ...] = (3, 224, 224), **kwargs: Any)
Implements a class activation map extractor as described in "SS-CAM: Smoothed Score-CAM for Sharper Visual Feature Localization".
The localization map is computed as follows:
\[L^{(c)}_{SS-CAM}(x, y) = ReLU\Big(\sum\limits_k w_k^{(c)} A_k(x, y)\Big)\]
with the coefficient \(w_k^{(c)}\) being defined as:
\[w_k^{(c)} = softmax\Bigg(\frac{1}{N} \sum\limits_{i=1}^N \Big(Y^{(c)}(M_k) - Y^{(c)}(X_b)\Big)\Bigg)_k\]
where \(N\) is the number of samples used to smooth the weights, \(A_k(x, y)\) is the activation of node \(k\) in the target layer of the model at position \((x, y)\), \(Y^{(c)}(X)\) is the model output score for class \(c\) before softmax for input \(X\), \(X_b\) is a baseline image, and \(M_k\) is defined as follows:
\[M_k = \Bigg(\frac{U(A_k) - \min U(A_k)}{\max U(A_k) - \min U(A_k)} + \delta\Bigg) \odot X\]
where \(\odot\) refers to the element-wise multiplication, \(U\) is the upsampling operation, and \(\delta \sim \mathcal{N}(0, \sigma^2)\) is random noise, drawn anew for each of the \(N\) samples, that follows a zero-mean Gaussian distribution with a standard deviation of \(\sigma\).
Example

    import torch
    from torchvision.models import get_model, get_model_weights
    from torchcam.methods import SSCAM

    model = get_model("resnet18", weights=get_model_weights("resnet18").DEFAULT).eval()
    # Placeholder input (assumption): replace with a preprocessed image batch
    input_tensor = torch.rand(1, 3, 224, 224)
    with SSCAM(model, 'layer4') as cam_extractor:
        with torch.inference_mode():
            out = model(input_tensor)
        cam = cam_extractor(class_idx=100)

| PARAMETER | DESCRIPTION |
|---|---|
| model | input model |
| target_layer | either the target layer itself or its name, or a list of those |
| batch_size | batch size used to forward masked inputs |
| num_samples | number of noisy samples used for weight computation |
| std | standard deviation of the noise added to the normalized activation |
| input_shape | shape of the expected input tensor excluding the batch dimension |
Source code in torchcam/methods/activation.py
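The smoothing step of SS-CAM, averaging the score gain over several noisy copies of a normalized mask, can be illustrated with the standard library alone. In this sketch `toy_score` is a hypothetical stand-in for a model forward pass on the masked input, and the helper names are assumptions made for illustration.

```python
import random


def smoothed_score_gain(score_fn, mask, baseline_score, num_samples, std, rng):
    """Average score_fn(noisy mask) - baseline_score over Gaussian perturbations."""
    total = 0.0
    for _ in range(num_samples):
        # Add zero-mean Gaussian noise to every position of the normalized mask
        noisy = [[v + rng.gauss(0.0, std) for v in row] for row in mask]
        total += score_fn(noisy) - baseline_score
    return total / num_samples


def toy_score(mask):
    """Hypothetical class score: simply the sum of the mask values."""
    return sum(v for row in mask for v in row)


rng = random.Random(0)  # seeded for reproducibility
normalized_mask = [[0.0, 0.5], [0.5, 1.0]]
gain = smoothed_score_gain(toy_score, normalized_mask, 0.0, num_samples=35, std=2.0, rng=rng)
```

The per-channel gains computed this way would then go through a softmax to produce the channel weights. The `num_samples=35` and `std=2.0` values mirror the defaults in the `SSCAM` signature.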
torchcam.methods.ISCAM
ISCAM(model: Module, target_layer: Module | str | list[Module | str] | None = None, batch_size: int = 32, num_samples: int = 10, input_shape: tuple[int, ...] = (3, 224, 224), **kwargs: Any)
Implements a class activation map extractor as described in "IS-CAM: Integrated Score-CAM for axiomatic-based explanations".
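The localization map is computed as follows:
\[L^{(c)}_{ISS-CAM}(x, y) = ReLU\Big(\sum\limits_k w_k^{(c)} A_k(x, y)\Big)\]
with the coefficient \(w_k^{(c)}\) being defined as:
\[w_k^{(c)} = softmax\Bigg(\frac{1}{N} \sum\limits_{i=1}^N \Big(Y^{(c)}(M_i) - Y^{(c)}(X_b)\Big)\Bigg)_k\]
where \(N\) is the number of samples used to smooth the weights, \(A_k(x, y)\) is the activation of node \(k\) in the target layer of the model at position \((x, y)\), \(Y^{(c)}(X)\) is the model output score for class \(c\) before softmax for input \(X\), \(X_b\) is a baseline image, and \(M_i\) is defined as follows:
\[M_i = \sum\limits_{j=0}^{i-1} \frac{j}{N} \frac{U(A_k) - \min U(A_k)}{\max U(A_k) - \min U(A_k)} \odot X\]
where \(\odot\) refers to the element-wise multiplication and \(U\) is the upsampling operation.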
Example

    import torch
    from torchvision.models import get_model, get_model_weights
    from torchcam.methods import ISCAM

    model = get_model("resnet18", weights=get_model_weights("resnet18").DEFAULT).eval()
    # Placeholder input (assumption): replace with a preprocessed image batch
    input_tensor = torch.rand(1, 3, 224, 224)
    with ISCAM(model, 'layer4') as cam_extractor:
        with torch.inference_mode():
            out = model(input_tensor)
        cam = cam_extractor(class_idx=100)

| PARAMETER | DESCRIPTION |
|---|---|
| model | input model |
| target_layer | either the target layer itself or its name, or a list of those |
| batch_size | batch size used to forward masked inputs |
| num_samples | number of noisy samples used for weight computation |
| input_shape | shape of the expected input tensor excluding the batch dimension |
Source code in torchcam/methods/activation.py
Gradient-based methods
Methods related to gradient-based class activation maps.
torchcam.methods.GradCAM
GradCAM(model: Module, target_layer: Module | str | list[Module | str] | None = None, input_shape: tuple[int, ...] = (3, 224, 224), **kwargs: Any)
Implements a class activation map extractor as described in "Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization".
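The localization map is computed as follows:
\[L^{(c)}_{Grad-CAM}(x, y) = ReLU\Big(\sum\limits_k w_k^{(c)} A_k(x, y)\Big)\]
with the coefficient \(w_k^{(c)}\) being defined as:
\[w_k^{(c)} = \frac{1}{H \cdot W} \sum\limits_{i=1}^H \sum\limits_{j=1}^W \frac{\partial Y^{(c)}}{\partial A_k(i, j)}\]
where \(A_k(x, y)\) is the activation of node \(k\) in the target layer of the model at position \((x, y)\), \(H\) and \(W\) are the height and width of the activation map, and \(Y^{(c)}\) is the model output score for class \(c\) before softmax.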
Example
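
    import torch
    from torchvision.models import get_model, get_model_weights
    from torchcam.methods import GradCAM

    model = get_model("resnet18", weights=get_model_weights("resnet18").DEFAULT).eval()
    # Placeholder input (assumption): replace with a preprocessed image batch
    input_tensor = torch.rand(1, 3, 224, 224)
    with GradCAM(model, 'layer4') as cam_extractor:
        # Gradients are required, so the forward pass is not wrapped in torch.inference_mode()
        scores = model(input_tensor)
        cam = cam_extractor(class_idx=100, scores=scores)
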
| PARAMETER | DESCRIPTION |
|---|---|
| model | input model |
| target_layer | either the target layer itself or its name, or a list of those |
| input_shape | shape of the expected input tensor excluding the batch dimension |
Source code in torchcam/methods/gradient.py
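In Grad-CAM, each channel weight is the spatial average of the class-score gradient with respect to that channel's activations. The following framework-free sketch shows only that averaging step; the gradient values are made up and the function name is an illustrative assumption.

```python
def grad_cam_weights(gradients):
    """gradients[k][i][j] stands for dY^(c)/dA_k(i, j); returns one weight per channel."""
    weights = []
    for grad_map in gradients:
        values = [g for row in grad_map for g in row]
        # Spatial mean over the H x W positions of the channel
        weights.append(sum(values) / len(values))
    return weights


# Toy gradients for two channels of size 2x2
gradients = [
    [[1.0, 3.0], [0.0, 0.0]],
    [[-2.0, 0.0], [0.0, 2.0]],
]
weights = grad_cam_weights(gradients)
```

Here the second channel gets a zero weight because its positive and negative gradients cancel out, which is one motivation for the position-aware variants documented below.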
torchcam.methods.GradCAMpp
GradCAMpp(model: Module, target_layer: Module | str | list[Module | str] | None = None, input_shape: tuple[int, ...] = (3, 224, 224), **kwargs: Any)
Implements a class activation map extractor as described in "Grad-CAM++: Improved Visual Explanations for Deep Convolutional Networks".
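The localization map is computed as follows:
\[L^{(c)}_{Grad-CAM++}(x, y) = \sum\limits_k w_k^{(c)} A_k(x, y)\]
with the coefficient \(w_k^{(c)}\) being defined as:
\[w_k^{(c)} = \sum\limits_{i=1}^H \sum\limits_{j=1}^W \alpha_k^{(c)}(i, j) \cdot ReLU\Big(\frac{\partial Y^{(c)}}{\partial A_k(i, j)}\Big)\]
where \(A_k(x, y)\) is the activation of node \(k\) in the target layer of the model at position \((x, y)\), \(Y^{(c)}\) is the model output score for class \(c\) before softmax, and \(\alpha_k^{(c)}(i, j)\) being defined as:
\[\alpha_k^{(c)}(i, j) = \frac{\frac{\partial^2 Y^{(c)}}{(\partial A_k(i,j))^2}}{2 \cdot \frac{\partial^2 Y^{(c)}}{(\partial A_k(i,j))^2} + \sum\limits_{a,b} A_k(a,b) \cdot \frac{\partial^3 Y^{(c)}}{(\partial A_k(i,j))^3}}\]
if \(\frac{\partial Y^{(c)}}{\partial A_k(i, j)} = 1\) else \(0\).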
Example
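
    import torch
    from torchvision.models import get_model, get_model_weights
    from torchcam.methods import GradCAMpp

    model = get_model("resnet18", weights=get_model_weights("resnet18").DEFAULT).eval()
    # Placeholder input (assumption): replace with a preprocessed image batch
    input_tensor = torch.rand(1, 3, 224, 224)
    with GradCAMpp(model, 'layer4') as cam_extractor:
        # Gradients are required, so the forward pass is not wrapped in torch.inference_mode()
        scores = model(input_tensor)
        cam = cam_extractor(class_idx=100, scores=scores)
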
| PARAMETER | DESCRIPTION |
|---|---|
| model | input model |
| target_layer | either the target layer itself or its name, or a list of those |
| input_shape | shape of the expected input tensor excluding the batch dimension |
Source code in torchcam/methods/gradient.py
torchcam.methods.SmoothGradCAMpp
SmoothGradCAMpp(model: Module, target_layer: Module | str | list[Module | str] | None = None, num_samples: int = 4, std: float = 0.3, input_shape: tuple[int, ...] = (3, 224, 224), **kwargs: Any)
Implements a class activation map extractor as described in "Smooth Grad-CAM++: An Enhanced Inference Level Visualization Technique for Deep Convolutional Neural Network Models" with a personal correction to the paper (alpha coefficient numerator).
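The localization map is computed as follows:
\[L^{(c)}_{Smooth Grad-CAM++}(x, y) = \sum\limits_k w_k^{(c)} A_k(x, y)\]
with the coefficient \(w_k^{(c)}\) being defined as:
\[w_k^{(c)} = \sum\limits_{i=1}^H \sum\limits_{j=1}^W \alpha_k^{(c)}(i, j) \cdot ReLU\Big(\frac{\partial Y^{(c)}}{\partial A_k(i, j)}\Big)\]
where \(A_k(x, y)\) is the activation of node \(k\) in the target layer of the model at position \((x, y)\), \(Y^{(c)}\) is the model output score for class \(c\) before softmax, and \(\alpha_k^{(c)}(i, j)\) being defined as:
\[\alpha_k^{(c)}(i, j) = \frac{\frac{1}{n} \sum\limits_{m=1}^n D^{(c, 2)}_k(i, j)}{\frac{2}{n} \sum\limits_{m=1}^n D^{(c, 2)}_k(i, j) + \sum\limits_{a,b} A_k(a,b) \cdot \frac{1}{n} \sum\limits_{m=1}^n D^{(c, 3)}_k(i, j)}\]
if \(\frac{\partial Y^{(c)}}{\partial A_k(i, j)} = 1\) else \(0\). Here \(D^{(c, p)}_k(i, j)\) refers to the p-th partial derivative of the class score of class \(c\) with respect to the activation of node \(k\) at position \((i, j)\), evaluated on the \(m\)-th noisy sample, and \(n\) is the number of samples used to get the gradient estimate.
Please note the difference in the numerator of \(\alpha_k^{(c)}(i, j)\), which is actually \(\frac{1}{n} \sum\limits_{m=1}^n D^{(c, 1)}_k(i,j)\) in the paper.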
Example

    import torch
    from torchvision.models import get_model, get_model_weights
    from torchcam.methods import SmoothGradCAMpp

    model = get_model("resnet18", weights=get_model_weights("resnet18").DEFAULT).eval()
    # Placeholder input (assumption): replace with a preprocessed image batch
    input_tensor = torch.rand(1, 3, 224, 224)
    with SmoothGradCAMpp(model, 'layer4') as cam_extractor:
        scores = model(input_tensor)
        cam = cam_extractor(class_idx=100)

| PARAMETER | DESCRIPTION |
|---|---|
| model | input model |
| target_layer | either the target layer itself or its name, or a list of those |
| num_samples | number of samples to use for smoothing |
| std | standard deviation of the noise |
| input_shape | shape of the expected input tensor excluding the batch dimension |
Source code in torchcam/methods/gradient.py
torchcam.methods.XGradCAM
XGradCAM(model: Module, target_layer: Module | str | list[Module | str] | None = None, input_shape: tuple[int, ...] = (3, 224, 224), **kwargs: Any)
Implements a class activation map extractor as described in "Axiom-based Grad-CAM: Towards Accurate Visualization and Explanation of CNNs".
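The localization map is computed as follows:
\[L^{(c)}_{XGrad-CAM}(x, y) = ReLU\Big(\sum\limits_k w_k^{(c)} A_k(x, y)\Big)\]
with the coefficient \(w_k^{(c)}\) being defined as:
\[w_k^{(c)} = \sum\limits_{i=1}^H \sum\limits_{j=1}^W \Big(\frac{\partial Y^{(c)}}{\partial A_k(i, j)} \cdot \frac{A_k(i, j)}{\sum\limits_{m=1}^H \sum\limits_{n=1}^W A_k(m, n)}\Big)\]
where \(A_k(x, y)\) is the activation of node \(k\) in the target layer of the model at position \((x, y)\), and \(Y^{(c)}\) is the model output score for class \(c\) before softmax.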
Example
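
    import torch
    from torchvision.models import get_model, get_model_weights
    from torchcam.methods import XGradCAM

    model = get_model("resnet18", weights=get_model_weights("resnet18").DEFAULT).eval()
    # Placeholder input (assumption): replace with a preprocessed image batch
    input_tensor = torch.rand(1, 3, 224, 224)
    with XGradCAM(model, 'layer4') as cam_extractor:
        # Gradients are required, so the forward pass is not wrapped in torch.inference_mode()
        scores = model(input_tensor)
        cam = cam_extractor(class_idx=100, scores=scores)
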
| PARAMETER | DESCRIPTION |
|---|---|
| model | input model |
| target_layer | either the target layer itself or its name, or a list of those |
| input_shape | shape of the expected input tensor excluding the batch dimension |
Source code in torchcam/methods/gradient.py
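XGrad-CAM weights each gradient by the share of the channel's total activation found at that position. The sketch below illustrates this activation-weighted sum in plain Python; it assumes each channel has a non-zero activation sum, and the names and values are illustrative rather than TorchCAM's implementation.

```python
def xgrad_cam_weights(activations, gradients):
    """Return one weight per channel: sum of gradient * activation / total activation."""
    weights = []
    for act_map, grad_map in zip(activations, gradients):
        # Assumption: the channel is not entirely zero, otherwise this divides by zero
        act_sum = sum(a for row in act_map for a in row)
        weighted = sum(
            g * a
            for grad_row, act_row in zip(grad_map, act_map)
            for g, a in zip(grad_row, act_row)
        )
        weights.append(weighted / act_sum)
    return weights


# One toy 2x2 channel: gradients at inactive positions do not contribute
activations = [[[1.0, 3.0], [0.0, 0.0]]]
gradients = [[[2.0, 4.0], [5.0, 5.0]]]
weights = xgrad_cam_weights(activations, gradients)
```

The large gradients on the bottom row are ignored because the activations there are zero.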
torchcam.methods.LayerCAM
LayerCAM(model: Module, target_layer: Module | str | list[Module | str] | None = None, input_shape: tuple[int, ...] = (3, 224, 224), **kwargs: Any)
Implements a class activation map extractor as described in "LayerCAM: Exploring Hierarchical Class Activation Maps for Localization".
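The localization map is computed as follows:
\[L^{(c)}_{Layer-CAM}(x, y) = ReLU\Big(\sum\limits_k w_k^{(c)}(x, y) \cdot A_k(x, y)\Big)\]
with the coefficient \(w_k^{(c)}(x, y)\) being defined as:
\[w_k^{(c)}(x, y) = ReLU\Big(\frac{\partial Y^{(c)}}{\partial A_k(x, y)}\Big)\]
where \(A_k(x, y)\) is the activation of node \(k\) in the target layer of the model at position \((x, y)\), and \(Y^{(c)}\) is the model output score for class \(c\) before softmax.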
Example
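
    import torch
    from torchvision.models import get_model, get_model_weights
    from torchcam.methods import LayerCAM

    model = get_model("resnet18", weights=get_model_weights("resnet18").DEFAULT).eval()
    # Placeholder input (assumption): replace with a preprocessed image batch
    input_tensor = torch.rand(1, 3, 224, 224)
    with LayerCAM(model, 'layer4') as cam_extractor:
        # Gradients are required, so the forward pass is not wrapped in torch.inference_mode()
        scores = model(input_tensor)
        cams = cam_extractor(class_idx=100, scores=scores)
        # Merge the per-layer maps into a single map
        fused_cam = cam_extractor.fuse_cams(cams)
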
| PARAMETER | DESCRIPTION |
|---|---|
| model | input model |
| target_layer | either the target layer itself or its name, or a list of those |
| input_shape | shape of the expected input tensor excluding the batch dimension |
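Unlike the channel-level weights of the other gradient-based methods, LayerCAM uses one weight per spatial position. The following plain-Python sketch illustrates that idea under the assumption that positive gradients act as position-wise weights and that the summed map is passed through a final ReLU; names and values are illustrative.

```python
def layer_cam(activations, gradients):
    """Sum ReLU(gradient) * activation over channels, then clamp negatives to zero."""
    height, width = len(activations[0]), len(activations[0][0])
    cam = [[0.0] * width for _ in range(height)]
    for act_map, grad_map in zip(activations, gradients):
        for y in range(height):
            for x in range(width):
                # Only positive gradients contribute at each position
                cam[y][x] += max(grad_map[y][x], 0.0) * act_map[y][x]
    return [[max(v, 0.0) for v in row] for row in cam]


# Two toy channels of size 1x2
activations = [[[1.0, 2.0]], [[3.0, 4.0]]]
gradients = [[[0.5, -1.0]], [[-2.0, 0.25]]]
cam = layer_cam(activations, gradients)
```

Each position keeps only the channels whose gradient is positive there, so the two positions end up driven by different channels.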