mmlearn.datasets.processors.tokenizers.Img2Seq

class Img2Seq(img_size, patch_size, n_channels, d_model)

Bases: Module

Convert a batch of images to a batch of sequences.

Parameters:
  • img_size (tuple of int) – The size of the input image.

  • patch_size (tuple of int) – The size of the patch.

  • n_channels (int) – The number of channels in the input image.

  • d_model (int) – The dimension of each element in the output sequence (d in the output shape).
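
As a rough illustration of how these parameters relate, the sketch below computes how many patches an image yields and how many values each flattened patch holds. It assumes non-overlapping patches and floor division at the image border, neither of which this page states. `patch_layout` is a hypothetical helper, not part of mmlearn.

```python
def patch_layout(img_size, patch_size, n_channels):
    """Return (number of patches, values per flattened patch).

    Assumption: patches do not overlap and any remainder at the image
    border is dropped (floor division). Img2Seq may behave differently.
    """
    n_rows = img_size[0] // patch_size[0]
    n_cols = img_size[1] // patch_size[1]
    n_patches = n_rows * n_cols
    patch_dim = patch_size[0] * patch_size[1] * n_channels
    return n_patches, patch_dim


print(patch_layout((224, 224), (16, 16), 3))  # (196, 768)
```

Under these assumptions, a 224 x 224 RGB image with 16 x 16 patches gives 196 patches of 768 values each, which a projection would then map to d_model.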

__call__(batch)

Convert a batch of images to a batch of sequences.

Parameters:

batch (torch.Tensor) – Batch of images of shape (b, h, w, c) where b is the batch size, h is the height, w is the width, and c is the number of channels.

Returns:

Batch of sequences of shape (b, s, d) where b is the batch size, s is the sequence length, and d is the dimension of the output sequence.

Return type:

torch.Tensor
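
The shape contract above, (b, h, w, c) in and (b, s, d) out, can be illustrated with a minimal patch-embedding sketch. `PatchSequenceSketch` is a hypothetical stand-in written for this example, not the Img2Seq implementation. It assumes non-overlapping patches, image sides divisible by the patch sides, a single linear projection, and no extra tokens or positional embeddings, so the sequence length it produces may differ from what Img2Seq returns.

```python
import torch
from torch import nn


class PatchSequenceSketch(nn.Module):
    """Hypothetical illustration only; not mmlearn's Img2Seq."""

    def __init__(self, img_size, patch_size, n_channels, d_model):
        super().__init__()
        self.patch_size = patch_size
        # Assumption: image sides are exact multiples of the patch sides.
        self.grid = (img_size[0] // patch_size[0], img_size[1] // patch_size[1])
        patch_dim = patch_size[0] * patch_size[1] * n_channels
        self.proj = nn.Linear(patch_dim, d_model)

    def forward(self, batch):
        b, h, w, c = batch.shape
        ph, pw = self.patch_size
        nh, nw = self.grid
        # Split height and width into (grid index, offset within patch).
        x = batch.reshape(b, nh, ph, nw, pw, c)
        # Bring the two grid axes together: (b, nh, nw, ph, pw, c).
        x = x.permute(0, 1, 3, 2, 4, 5)
        # Flatten each patch into one vector: (b, nh * nw, ph * pw * c).
        x = x.reshape(b, nh * nw, ph * pw * c)
        # Project every patch vector to d_model: (b, s, d).
        return self.proj(x)


images = torch.randn(2, 32, 32, 3)  # (b, h, w, c)
model = PatchSequenceSketch((32, 32), (8, 8), n_channels=3, d_model=64)
seq = model(images)
print(tuple(seq.shape))  # (2, 16, 64)
```

Here s is 16 because a 32 x 32 image holds a 4 x 4 grid of 8 x 8 patches, and d equals d_model.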