mmlearn.datasets.processors.masking

Token mask generators.

Functions

apply_masks(x, masks)

Apply masks to the input tensor by selecting the patches to keep based on the masks.

This function is primarily intended for use with i-JEPA.

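Parameters:
  • x – The input tensor from which patches are selected.

  • masks – The masks indicating which patches to keep.

Returns: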

The masked tensor where only the patches indicated by the masks are kept. The output tensor has shape (B * num_masks, N', D), where N' is the number of patches kept.

Return type:

torch.Tensor
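The selection described above can be illustrated with a minimal, dependency-free sketch. It assumes that each mask holds, for every example, the indices of the patches to keep (the mask format is an assumption here, not something this page states), and it uses nested Python lists in place of tensors. The name apply_masks_sketch is hypothetical and is not part of mmlearn.

```python
def apply_masks_sketch(x, masks):
    """Keep only the patches selected by each mask.

    x     : nested list of shape (B, N, D)
    masks : list of masks; each mask is a nested list of shape (B, N')
            holding the indices of the patches to keep (assumed format)
    Returns a nested list of shape (B * num_masks, N', D).
    """
    out = []
    for mask in masks:                      # one pass over the batch per mask
        for example, keep in zip(x, mask):  # per-example list of kept indices
            out.append([example[i] for i in keep])
    return out


# B=2 examples, N=4 patches, D=2 features per patch.
x = [
    [[0, 0], [1, 1], [2, 2], [3, 3]],
    [[4, 4], [5, 5], [6, 6], [7, 7]],
]
# Two masks, each keeping N'=2 patches per example.
masks = [
    [[0, 1], [2, 3]],
    [[1, 3], [0, 2]],
]
kept = apply_masks_sketch(x, masks)
print(len(kept), len(kept[0]), len(kept[0][0]))  # prints: 4 2 2
```

The results for each mask are stacked along the batch dimension, which is why the leading dimension becomes B * num_masks.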

Classes

BlockwiseImagePatchMaskGenerator

Blockwise image patch mask generator.

IJEPAMaskGenerator

Generates encoder and predictor masks for preprocessing.

RandomMaskGenerator

Random mask generator.

class BlockwiseImagePatchMaskGenerator(input_size, num_masking_patches, min_num_patches=4, max_num_patches=None, min_aspect_ratio=0.3, max_aspect_ratio=None)

Blockwise image patch mask generator.

This is primarily intended for the data2vec method.

Parameters:
  • input_size (Union[int, tuple[int, int]]) – The size of the input image. If an integer is provided, the image is assumed to be square.

  • num_masking_patches (int) – The number of patches to mask.

  • min_num_patches (int, default=4) – The minimum number of patches to mask.

  • max_num_patches (int, default=None) – The maximum number of patches to mask.

  • min_aspect_ratio (float, default=0.3) – The minimum aspect ratio of the patch.

  • max_aspect_ratio (float, default=None) – The maximum aspect ratio of the patch.

__call__()

Generate a random mask.

Returns a random mask of shape (nb_patches, nb_patches) based on the configuration where the number of patches to be masked is num_masking_patches.

Returns:

mask – A mask of shape (nb_patches, nb_patches)

Return type:

torch.Tensor
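To show how the constructor parameters can interact, here is a rough sketch of a common blockwise masking scheme of the kind used in BEiT-style pretraining. It is not necessarily the algorithm mmlearn implements. The function name blockwise_mask_sketch, the seed argument, the retry count of 10, and the fallback defaults for max_num_patches and max_aspect_ratio are all assumptions made for illustration.

```python
import math
import random


def blockwise_mask_sketch(height, width, num_masking_patches,
                          min_num_patches=4, max_num_patches=None,
                          min_aspect_ratio=0.3, max_aspect_ratio=None,
                          seed=0):
    """Mask rectangular blocks until about num_masking_patches are masked."""
    rng = random.Random(seed)
    # Assumed fallbacks when the optional arguments are left as None.
    if max_num_patches is None:
        max_num_patches = num_masking_patches
    if max_aspect_ratio is None:
        max_aspect_ratio = 1.0 / min_aspect_ratio
    log_ratio = (math.log(min_aspect_ratio), math.log(max_aspect_ratio))

    mask = [[0] * width for _ in range(height)]
    masked = 0
    while masked < num_masking_patches:
        # Never add more patches than the remaining budget allows.
        budget = min(num_masking_patches - masked, max_num_patches)
        added = 0
        for _ in range(10):  # a few attempts to place one block
            area = rng.uniform(min_num_patches, budget)
            ratio = math.exp(rng.uniform(*log_ratio))  # log-uniform ratio
            h = int(round(math.sqrt(area * ratio)))
            w = int(round(math.sqrt(area / ratio)))
            if not (0 < h < height and 0 < w < width):
                continue
            top = rng.randint(0, height - h)
            left = rng.randint(0, width - w)
            # Only count cells that are not masked yet.
            cells = [(i, j)
                     for i in range(top, top + h)
                     for j in range(left, left + w)
                     if mask[i][j] == 0]
            if 0 < len(cells) <= budget:
                for i, j in cells:
                    mask[i][j] = 1
                added = len(cells)
                break
        if added == 0:  # no block could be placed; stop early
            break
        masked += added
    return mask


mask = blockwise_mask_sketch(14, 14, num_masking_patches=75)
num_masked = sum(sum(row) for row in mask)
print(num_masked)  # at most 75
```

Sampling the aspect ratio log-uniformly makes tall and wide blocks equally likely, which is the usual reason for working in log space here.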

get_shape()

Get the shape of the input.

Returns:

The shape of the input as a tuple (height, width).

Return type:

tuple[int, int]

class IJEPAMaskGenerator(input_size=(224, 224), patch_size=16, min_keep=10, allow_overlap=False, enc_mask_scale=(0.85, 1.0), pred_mask_scale=(0.15, 0.2), aspect_ratio=(0.75, 1.0), nenc=1, npred=4)

Generates encoder and predictor masks for preprocessing.

This class generates masks dynamically for batches of examples.

Parameters:
  • input_size (tuple[int, int], default=(224, 224)) – Input image size.

  • patch_size (int, default=16) – Size of each patch.

  • min_keep (int, default=10) – Minimum number of patches to keep.

  • allow_overlap (bool, default=False) – Whether to allow overlap between encoder and predictor masks.

  • enc_mask_scale (tuple[float, float], default=(0.85, 1.0)) – Scale range for encoder mask.

  • pred_mask_scale (tuple[float, float], default=(0.15, 0.2)) – Scale range for predictor mask.

  • aspect_ratio (tuple[float, float], default=(0.75, 1.0)) – Aspect ratio range for mask blocks.

  • nenc (int, default=1) – Number of encoder masks to generate.

  • npred (int, default=4) – Number of predictor masks to generate.

__call__(batch_size=1)

Generate encoder and predictor masks for a batch of examples.

Parameters:

batch_size (int, default=1) – The batch size for which to generate masks.

Returns:

A dictionary of encoder masks and predictor masks.

Return type:

dict[str, Any]
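As a rough illustration of how input_size, patch_size, a scale range, and aspect_ratio could translate into a block size on the patch grid, the sketch below follows the general recipe of the published I-JEPA method. mmlearn's internals may differ. The function name sample_block_size_sketch, the rng argument, and the clamping rule are assumptions.

```python
import math
import random


def sample_block_size_sketch(input_size=(224, 224), patch_size=16,
                             scale=(0.15, 0.2), aspect_ratio=(0.75, 1.0),
                             rng=None):
    """Sample a block (height, width) in patches from scale and aspect ratio."""
    rng = rng or random.Random(0)
    # Patch grid: a 224x224 image with 16x16 patches gives a 14x14 grid.
    grid_h = input_size[0] // patch_size
    grid_w = input_size[1] // patch_size

    s = rng.uniform(*scale)           # fraction of the grid to cover
    ar = rng.uniform(*aspect_ratio)   # block height / block width
    num_patches = int(grid_h * grid_w * s)

    h = int(round(math.sqrt(num_patches * ar)))
    w = int(round(math.sqrt(num_patches / ar)))
    # Assumed clamp: keep the block strictly smaller than the grid.
    h = max(1, min(h, grid_h - 1))
    w = max(1, min(w, grid_w - 1))
    return (grid_h, grid_w), (h, w)


grid, block = sample_block_size_sketch()
print(grid, block)
```

With the default pred_mask_scale of (0.15, 0.2), each predictor block covers roughly 29 to 39 of the 196 patches, while the default enc_mask_scale of (0.85, 1.0) yields a block close to the full grid.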

allow_overlap: bool = False
aspect_ratio: tuple[float, float] = (0.75, 1.0)
enc_mask_scale: tuple[float, float] = (0.85, 1.0)
input_size: tuple[int, int] = (224, 224)
min_keep: int = 10
nenc: int = 1
npred: int = 4
patch_size: int = 16
pred_mask_scale: tuple[float, float] = (0.15, 0.2)

class RandomMaskGenerator(probability)

Random mask generator.

Generates a random mask over input tokens, masking each token with the given probability. This is intended for tasks like masked language modeling.

Parameters:

probability (float) – Probability of masking a token.

__call__(inputs, tokenizer, special_tokens_mask=None)

Generate a random mask.

Randomly masks tokens in inputs, each with the configured probability.

Parameters:
  • inputs (torch.Tensor) – The encoded inputs.

  • tokenizer (PreTrainedTokenizer) – The tokenizer.

  • special_tokens_mask (Optional[torch.Tensor], default=None) – Mask for special tokens.

Return type:

tuple[Tensor, Tensor, Tensor]
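The core idea of probability-based token masking can be sketched without any dependencies. This is only an illustration: the function name random_token_mask_sketch and the seed argument are hypothetical, exempting special tokens is an assumption inferred from the special_tokens_mask parameter, and the sketch does not cover how masked tokens are replaced or how the three returned tensors are built.

```python
import random


def random_token_mask_sketch(token_ids, probability,
                             special_tokens_mask=None, seed=0):
    """Return one boolean per token: True where the token is masked."""
    rng = random.Random(seed)
    if special_tokens_mask is None:
        special_tokens_mask = [0] * len(token_ids)
    # Assumption: special tokens (flag 1) are never selected for masking.
    return [
        (not is_special) and rng.random() < probability
        for is_special in special_tokens_mask
    ]


token_ids = [101, 7592, 2088, 2003, 2307, 102]  # illustrative ids
special = [1, 0, 0, 0, 0, 1]                    # first and last are special
mask = random_token_mask_sketch(token_ids, 0.15, special)
all_masked = random_token_mask_sketch(token_ids, 1.0, special)
none_masked = random_token_mask_sketch(token_ids, 0.0, special)
print(all_masked)  # prints: [False, True, True, True, True, False]
```

Each non-special token is drawn separately, so the number of masked tokens varies from call to call around probability times the number of eligible tokens.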
