mmlearn.modules.layers.embedding.ConvEmbed

class ConvEmbed(channels, strides, img_size=224, in_chans=3, batch_norm=True)

Bases: Module

3x3 convolution stems for ViT, following the ViTC models.

This module builds a convolutional stem for Vision Transformers (ViT) with intermediate ReLU activations and, when batch_norm is True, intermediate batch normalization.

Parameters:
  • channels (list[int]) – List of channel sizes, one per convolution layer.

  • strides (list[int]) – List of stride sizes, one per convolution layer.

  • img_size (int, optional, default=224) – Size of the input image (assumed to be square).

  • in_chans (int, optional, default=3) – Number of input channels in the image.

  • batch_norm (bool, optional, default=True) – Whether to include batch normalization after each convolution layer.
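How strides and img_size interact can be illustrated with the standard convolution output-size formula. The sketch below is not taken from the mmlearn source: it assumes every layer uses a 3x3 kernel with padding 1, and the helper names (conv_out_size, stem_output_grid) and the stride values are hypothetical, chosen only for illustration.

```python
from math import prod


def conv_out_size(size: int, stride: int, kernel: int = 3, padding: int = 1) -> int:
    """Standard convolution output-size formula (dilation 1)."""
    return (size + 2 * padding - kernel) // stride + 1


def stem_output_grid(img_size: int, strides: list[int]) -> int:
    """Side length of the feature map after applying each stride in turn.

    Assumes a 3x3 kernel with padding 1 at every layer (an assumption,
    not something this page documents).
    """
    size = img_size
    for s in strides:
        size = conv_out_size(size, s)
    return size


# Hypothetical configuration: four stride-2 layers on a 224x224 image.
strides = [2, 2, 2, 2]
grid = stem_output_grid(224, strides)
num_tokens = grid * grid  # positions a ViT would see if each one becomes a token

print(grid, num_tokens)
```

Under these assumptions, an image size divisible by the product of the strides gives a grid side of img_size divided by that product, so the strides play the role that patch size plays in a standard patch embedding.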

forward(x)

Forward pass through the convolutional embedding layers.

Return type: Tensor