mmlearn.modules.layers.embedding

Embedding layers.

Functions

get_1d_sincos_pos_embed(embed_dim, grid_size, cls_token=False)

Generate 1D sine-cosine positional embeddings.

Parameters:
  • embed_dim (int) – The dimension of the embeddings.

  • grid_size (int) – The size of the grid.

  • cls_token (bool, optional, default=False) – Whether to include a class token in the embeddings.

Returns:

pos_embed – Positional embeddings with shape [grid_size, embed_dim] or [1 + grid_size, embed_dim] if cls_token is True.

Return type:

np.ndarray
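A minimal NumPy sketch of what this function computes, mirroring the documented signature. It makes several assumptions borrowed from the MAE reference code and not confirmed from the mmlearn source: the frequency schedule is 1 / 10000^(2i / embed_dim), sines fill the first half of each row and cosines the second, and the class-token row is all zeros and prepended at index 0. It also assumes embed_dim is even.

```python
import numpy as np


def get_1d_sincos_pos_embed(embed_dim: int, grid_size: int, cls_token: bool = False) -> np.ndarray:
    """Sketch only: encode positions 0 .. grid_size - 1 as [grid_size, embed_dim] rows."""
    # Assumed convention (MAE-style): sines in the first half, cosines in the second.
    assert embed_dim % 2 == 0, "embed_dim must be even"
    pos = np.arange(grid_size, dtype=np.float64)                             # [grid_size]
    omega = 1.0 / 10000 ** (np.arange(embed_dim // 2) / (embed_dim / 2.0))  # [embed_dim // 2]
    angles = np.outer(pos, omega)                                             # [grid_size, embed_dim // 2]
    pos_embed = np.concatenate([np.sin(angles), np.cos(angles)], axis=1)
    if cls_token:
        # Assumed: the class token gets an all-zero row prepended at index 0.
        pos_embed = np.concatenate([np.zeros((1, embed_dim)), pos_embed], axis=0)
    return pos_embed


print(get_1d_sincos_pos_embed(16, 10).shape)                  # (10, 16)
print(get_1d_sincos_pos_embed(16, 10, cls_token=True).shape)  # (11, 16)
```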

get_1d_sincos_pos_embed_from_grid(embed_dim, pos)

Generate 1D sine-cosine positional embeddings from a grid.

Parameters:
  • embed_dim (int) – The dimension of the embeddings.

  • pos (np.ndarray) – An array of positions to encode, with shape [M,].

Returns:

emb – Positional embeddings with shape [M, embed_dim].

Return type:

np.ndarray
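The core encoding can be sketched as follows, assuming the same MAE-style convention: each position p is multiplied by embed_dim / 2 frequencies ω_i = 1 / 10000^(2i / embed_dim), and the resulting angles are passed through sin (first half of the row) and cos (second half). This is an illustrative re-implementation under those assumptions, not the library's code.

```python
import numpy as np


def get_1d_sincos_pos_embed_from_grid(embed_dim: int, pos: np.ndarray) -> np.ndarray:
    """Sketch only: map M positions to an [M, embed_dim] array."""
    assert embed_dim % 2 == 0, "embed_dim must be even (half sin, half cos)"
    # Assumed frequencies: omega_i = 1 / 10000^(2i / embed_dim), i = 0 .. embed_dim/2 - 1.
    omega = np.arange(embed_dim // 2, dtype=np.float64) / (embed_dim / 2.0)
    omega = 1.0 / 10000**omega                                    # [embed_dim // 2]

    pos = np.asarray(pos, dtype=np.float64).reshape(-1)           # flatten to [M]
    angles = np.outer(pos, omega)                                 # [M, embed_dim // 2]

    return np.concatenate([np.sin(angles), np.cos(angles)], axis=1)  # [M, embed_dim]


emb = get_1d_sincos_pos_embed_from_grid(8, np.arange(5))
print(emb.shape)  # (5, 8)
```

Because the positions are flattened first, a multi-dimensional array of coordinates is also accepted, which is what lets a 2D grid be encoded one coordinate at a time.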

get_2d_sincos_pos_embed(embed_dim, grid_size, cls_token=False)

Generate 2D sine-cosine positional embeddings.

Parameters:
  • embed_dim (int) – The dimension of the embeddings.

  • grid_size (int) – The size of the grid (both height and width).

  • cls_token (bool, optional, default=False) – Whether to include a class token in the embeddings.

Returns:

pos_embed – Positional embeddings with shape [grid_size*grid_size, embed_dim] or [1 + grid_size*grid_size, embed_dim] if cls_token is True.

Return type:

np.ndarray
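A sketch of the 2D case under stated assumptions: half of the channels encode the column coordinate and the other half the row coordinate, each with the 1D sine-cosine scheme, and the class-token row (if requested) is all zeros. The helper `_sincos_1d` is hypothetical and stands in for the 1D encoder; the requirement that embed_dim be divisible by 4 follows from this split and may not match mmlearn's exact checks.

```python
import numpy as np


def _sincos_1d(dim: int, pos: np.ndarray) -> np.ndarray:
    # Hypothetical helper: flatten positions to [M], return [M, dim] (sines | cosines).
    omega = 1.0 / 10000 ** (np.arange(dim // 2) / (dim / 2.0))
    angles = np.outer(np.asarray(pos, dtype=np.float64).reshape(-1), omega)
    return np.concatenate([np.sin(angles), np.cos(angles)], axis=1)


def get_2d_sincos_pos_embed(embed_dim: int, grid_size: int, cls_token: bool = False) -> np.ndarray:
    """Sketch only: half of embed_dim encodes the column index, half the row index."""
    assert embed_dim % 4 == 0, "each axis gets embed_dim // 2 channels, split into sin/cos"
    coords = np.arange(grid_size, dtype=np.float64)
    xs, ys = np.meshgrid(coords, coords)  # xs: column index, ys: row index, each [G, G]
    grid = np.stack([xs, ys]).reshape(2, 1, grid_size, grid_size)  # documented grid shape
    pos_embed = np.concatenate(
        [_sincos_1d(embed_dim // 2, grid[0]), _sincos_1d(embed_dim // 2, grid[1])], axis=1
    )  # [grid_size * grid_size, embed_dim], patches in row-major order
    if cls_token:
        # Assumed: all-zero class-token row prepended at index 0.
        pos_embed = np.concatenate([np.zeros((1, embed_dim)), pos_embed], axis=0)
    return pos_embed


pe = get_2d_sincos_pos_embed(32, 14, cls_token=True)
print(pe.shape)  # (197, 32)
```

With grid_size=14 (a 224-pixel image cut into 16-pixel patches) and cls_token=True, the result has 1 + 14 * 14 = 197 rows.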

get_2d_sincos_pos_embed_from_grid(embed_dim, grid)

Generate 2D sine-cosine positional embeddings from a grid.

Parameters:
  • embed_dim (int) – The dimension of the embeddings.

  • grid (np.ndarray) – The grid of positions with shape [2, 1, grid_size, grid_size].

Returns:

emb – Positional embeddings with shape [grid_size*grid_size, embed_dim].

Return type:

np.ndarray
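A grid with the documented shape [2, 1, grid_size, grid_size] can be produced with `np.meshgrid` and `np.stack`, as in this sketch. Splitting embed_dim in half between grid[0] and grid[1] is an assumption (the MAE convention), and `_sincos_1d` is a hypothetical helper, not part of mmlearn.

```python
import numpy as np


def _sincos_1d(dim: int, pos: np.ndarray) -> np.ndarray:
    # Hypothetical helper: flatten positions to [M], return [M, dim] (sines | cosines).
    assert dim % 2 == 0
    omega = 1.0 / 10000 ** (np.arange(dim // 2) / (dim / 2.0))
    angles = np.outer(np.asarray(pos, dtype=np.float64).reshape(-1), omega)
    return np.concatenate([np.sin(angles), np.cos(angles)], axis=1)


def get_2d_sincos_pos_embed_from_grid(embed_dim: int, grid: np.ndarray) -> np.ndarray:
    """Sketch only: grid is [2, 1, G, G]; grid[0] and grid[1] hold the two coordinates."""
    assert embed_dim % 2 == 0
    emb_a = _sincos_1d(embed_dim // 2, grid[0])      # [G*G, embed_dim // 2]
    emb_b = _sincos_1d(embed_dim // 2, grid[1])      # [G*G, embed_dim // 2]
    return np.concatenate([emb_a, emb_b], axis=1)    # [G*G, embed_dim]


G = 4
coords = np.arange(G, dtype=np.float64)
grid = np.stack(np.meshgrid(coords, coords)).reshape(2, 1, G, G)
emb = get_2d_sincos_pos_embed_from_grid(8, grid)
print(emb.shape)  # (16, 8)
```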

Classes

ConvEmbed

3x3 convolutional stems for ViT, following the ViTC models.

PatchEmbed

Image to Patch Embedding.

class ConvEmbed(channels, strides, img_size=224, in_chans=3, batch_norm=True)

3x3 convolutional stems for ViT, following the ViTC models.

This module builds convolutional stems for Vision Transformers (ViT), with intermediate batch normalization (when batch_norm is True) and ReLU activations.

Parameters:
  • channels (list[int]) – List of channel sizes, one per convolution layer.

  • strides (list[int]) – List of strides, one per convolution layer.

  • img_size (int, optional, default=224) – Size of the input image (assumed to be square).

  • in_chans (int, optional, default=3) – Number of input channels in the image.

  • batch_norm (bool, optional, default=True) – Whether to include batch normalization after each convolution layer.

forward(x)

Forward pass through the convolutional embedding layers.

Return type:

Tensor
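A PyTorch sketch of a ViTC-style stem built from the documented constructor arguments. The exact layout is an assumption modeled on the early-convolution design rather than read from the mmlearn source: every layer except the last is a 3x3 convolution with padding 1, optional `BatchNorm2d`, and ReLU; the last layer is a 1x1 projection; and `forward` flattens the feature map into a token sequence of shape [B, num_patches, channels[-1]]. The class name `ConvEmbedSketch` and the `num_patches` attribute are hypothetical.

```python
import math

import torch
from torch import nn


class ConvEmbedSketch(nn.Module):
    """Hypothetical stem: (3x3 conv -> [BN] -> ReLU) x (n - 1), then a 1x1 conv."""

    def __init__(self, channels, strides, img_size=224, in_chans=3, batch_norm=True):
        super().__init__()
        assert len(channels) == len(strides), "one stride per conv layer"
        dims = [in_chans] + list(channels)
        layers = []
        for i in range(len(channels) - 1):
            # Conv bias is redundant when BatchNorm follows, so drop it in that case.
            layers.append(nn.Conv2d(dims[i], dims[i + 1], kernel_size=3,
                                    stride=strides[i], padding=1, bias=not batch_norm))
            if batch_norm:
                layers.append(nn.BatchNorm2d(dims[i + 1]))
            layers.append(nn.ReLU(inplace=True))
        # Final 1x1 projection to the embedding width (no norm or activation).
        layers.append(nn.Conv2d(dims[-2], dims[-1], kernel_size=1, stride=strides[-1]))
        self.stem = nn.Sequential(*layers)
        # Assumes img_size is divisible by the product of the strides.
        self.num_patches = (img_size // math.prod(strides)) ** 2

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.stem(x)                     # [B, C_out, H', W']
        return x.flatten(2).transpose(1, 2)  # [B, H' * W', C_out]


stem = ConvEmbedSketch(channels=[48, 96, 192, 384, 768], strides=[2, 2, 2, 2, 1])
tokens = stem(torch.randn(2, 3, 224, 224))
print(tokens.shape)  # torch.Size([2, 196, 768])
```

With four stride-2 layers the stem downsamples by 16, so a 224x224 input yields 14 x 14 = 196 tokens, the same count as PatchEmbed with its default patch_size=16.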

class PatchEmbed(img_size=224, patch_size=16, in_chans=3, embed_dim=768)

Image to Patch Embedding.

This module divides an image into patches and embeds them as a sequence of vectors.

Parameters:
  • img_size (int, optional, default=224) – Size of the input image (assumed to be square).

  • patch_size (int, optional, default=16) – Size of each image patch (assumed to be square).

  • in_chans (int, optional, default=3) – Number of input channels in the image.

  • embed_dim (int, optional, default=768) – Dimension of the output embeddings.

forward(x)

Forward pass to convert an image into patch embeddings.

Return type:

Tensor
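The patch-to-vector mapping described above is commonly implemented as a convolution whose kernel size and stride both equal patch_size, so each output position sees exactly one non-overlapping patch. The sketch below assumes that approach; the class name `PatchEmbedSketch` and the `num_patches` attribute are hypothetical. It returns tokens of shape [B, (img_size // patch_size)**2, embed_dim].

```python
import torch
from torch import nn


class PatchEmbedSketch(nn.Module):
    """Hypothetical patch embedding: one linear projection per non-overlapping patch."""

    def __init__(self, img_size=224, patch_size=16, in_chans=3, embed_dim=768):
        super().__init__()
        self.num_patches = (img_size // patch_size) ** 2
        # kernel_size == stride == patch_size: each output pixel covers exactly one patch.
        self.proj = nn.Conv2d(in_chans, embed_dim, kernel_size=patch_size, stride=patch_size)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.proj(x)                     # [B, embed_dim, H / patch, W / patch]
        return x.flatten(2).transpose(1, 2)  # [B, num_patches, embed_dim]


embed = PatchEmbedSketch()
out = embed(torch.randn(2, 3, 224, 224))
print(out.shape)  # torch.Size([2, 196, 768])
```

Assuming row-major patch ordering, these 196 tokens line up one-to-one with the 14 * 14 rows returned by get_2d_sincos_pos_embed(768, 14).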
