mmlearn.datasets.processors.masking
Token mask generators.
Functions
- apply_masks(x, masks)
Apply masks to the input tensor by selecting the patches to keep based on the masks.
This function is primarily intended to be used for i-JEPA.
- Parameters:
x (torch.Tensor) – Input tensor of shape (B, N, D).
masks (Union[torch.Tensor, list[torch.Tensor]]) – A mask tensor or a list of mask tensors, each of shape (N,), (1, N), or (B, N).
- Returns:
The masked tensor where only the patches indicated by the masks are kept. The output tensor has shape (B * num_masks, N', D), where N' is the number of patches kept.
- Return type:
torch.Tensor
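The shape contract of apply_masks can be illustrated with a minimal sketch. It assumes each mask holds the integer indices of the patches to keep, as in the reference I-JEPA code; `select_patches` is a hypothetical stand-in written for this example, not the library's implementation.

```python
import torch


def select_patches(x, masks):
    """Hypothetical stand-in for apply_masks; assumes masks hold indices of patches to keep."""
    if isinstance(masks, torch.Tensor):
        masks = [masks]
    batch_size, _, dim = x.shape
    kept = []
    for m in masks:
        if m.dim() == 1:
            m = m.unsqueeze(0)  # (N',) -> (1, N')
        m = m.expand(batch_size, -1)  # (1, N') or (B, N') -> (B, N')
        index = m.unsqueeze(-1).expand(-1, -1, dim)  # (B, N', D)
        kept.append(torch.gather(x, dim=1, index=index))
    # Results for the different masks are stacked along the batch dimension.
    return torch.cat(kept, dim=0)  # (B * num_masks, N', D)


x = torch.arange(2 * 6 * 3, dtype=torch.float32).reshape(2, 6, 3)  # B=2, N=6, D=3
masks = [torch.tensor([0, 2, 4]), torch.tensor([[1, 3, 5]])]
out = select_patches(x, masks)
```

With two masks that each keep three patches, `out` has shape (4, 3, 3): the first two rows come from the first mask and the last two from the second.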
Classes
BlockwiseImagePatchMaskGenerator | Blockwise image patch mask generator.
IJEPAMaskGenerator | Generates encoder and predictor masks for preprocessing.
RandomMaskGenerator | Random mask generator.
- class BlockwiseImagePatchMaskGenerator(input_size, num_masking_patches, min_num_patches=4, max_num_patches=None, min_aspect_ratio=0.3, max_aspect_ratio=None)
Blockwise image patch mask generator.
This is primarily intended for the data2vec method.
- Parameters:
input_size (Union[int, tuple[int, int]]) – The size of the input image. If an integer is provided, the image is assumed to be square.
num_masking_patches (int) – The number of patches to mask.
min_num_patches (int, default=4) – The minimum number of patches to mask.
max_num_patches (int, default=None) – The maximum number of patches to mask.
min_aspect_ratio (float, default=0.3) – The minimum aspect ratio of the patch.
max_aspect_ratio (float, default=None) – The maximum aspect ratio of the patch.
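How these parameters might interact can be sketched with a BEiT-style procedure: rectangular blocks are added until the masking budget is reached. This is an assumption about the general technique, not a copy of the library's code. `blockwise_mask` and its `seed` argument are hypothetical, `max_num_patches` is omitted for brevity, and the default `max_aspect_ratio = 1 / min_aspect_ratio` is likewise assumed.

```python
import math
import random


def blockwise_mask(height, width, num_masking_patches, min_num_patches=4,
                   min_aspect_ratio=0.3, max_aspect_ratio=None, seed=0):
    """Hypothetical BEiT-style sketch: add rectangular blocks until the budget is reached."""
    rng = random.Random(seed)
    max_aspect_ratio = max_aspect_ratio or 1.0 / min_aspect_ratio  # assumed default
    log_ratio = (math.log(min_aspect_ratio), math.log(max_aspect_ratio))
    mask = [[0] * width for _ in range(height)]
    masked = 0
    while masked < num_masking_patches:
        budget = num_masking_patches - masked
        added = 0
        for _ in range(10):  # a few attempts to place one block
            area = rng.uniform(min_num_patches, max(min_num_patches, budget))
            ratio = math.exp(rng.uniform(*log_ratio))  # log-uniform aspect ratio
            h = int(round(math.sqrt(area * ratio)))
            w = int(round(math.sqrt(area / ratio)))
            if not (0 < h < height and 0 < w < width):
                continue
            top = rng.randint(0, height - h)
            left = rng.randint(0, width - w)
            # Only cells that are not masked yet count against the budget.
            cells = [(r, c) for r in range(top, top + h)
                     for c in range(left, left + w) if mask[r][c] == 0]
            if 0 < len(cells) <= budget:
                for r, c in cells:
                    mask[r][c] = 1
                added = len(cells)
                break
        if added == 0:
            break  # no block fit within the remaining budget
        masked += added
    return mask


mask = blockwise_mask(14, 14, num_masking_patches=75)
total = sum(map(sum, mask))  # never exceeds num_masking_patches
```

In this sketch the total can fall short of `num_masking_patches` when no further block fits the remaining budget, which is why the budget acts as an upper bound rather than an exact count.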
- class IJEPAMaskGenerator(input_size=(224, 224), patch_size=16, min_keep=10, allow_overlap=False, enc_mask_scale=(0.85, 1.0), pred_mask_scale=(0.15, 0.2), aspect_ratio=(0.75, 1.0), nenc=1, npred=4)
Generates encoder and predictor masks for preprocessing.
This class generates masks dynamically for batches of examples.
- Parameters:
input_size (tuple[int, int], default=(224, 224)) – Input image size.
patch_size (int, default=16) – Size of each patch.
min_keep (int, default=10) – Minimum number of patches to keep.
allow_overlap (bool, default=False) – Whether to allow overlap between encoder and predictor masks.
enc_mask_scale (tuple[float, float], default=(0.85, 1.0)) – Scale range for encoder mask.
pred_mask_scale (tuple[float, float], default=(0.15, 0.2)) – Scale range for predictor mask.
aspect_ratio (tuple[float, float], default=(0.75, 1.0)) – Aspect ratio range for mask blocks.
nenc (int, default=1) – Number of encoder masks to generate.
npred (int, default=4) – Number of predictor masks to generate.
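The scale and aspect-ratio ranges above can be read as follows, in a rough sketch modelled on the reference I-JEPA block sampling. `sample_block_size` is a hypothetical helper, and sampling the encoder block with a unit aspect ratio is an assumption carried over from that reference code, not something this page confirms.

```python
import math
import random


def sample_block_size(grid_h, grid_w, scale, aspect_ratio, rng):
    """Hypothetical helper: pick a block (h, w) in patches from scale and aspect-ratio ranges."""
    area_fraction = rng.uniform(*scale)  # fraction of the patch grid to cover
    ratio = rng.uniform(*aspect_ratio)   # height / width of the block
    num_patches = int(grid_h * grid_w * area_fraction)
    h = int(round(math.sqrt(num_patches * ratio)))
    w = int(round(math.sqrt(num_patches / ratio)))
    # Clamp so the block stays strictly smaller than the grid.
    return max(1, min(h, grid_h - 1)), max(1, min(w, grid_w - 1))


input_size, patch_size = (224, 224), 16
grid_h, grid_w = input_size[0] // patch_size, input_size[1] // patch_size  # 14 x 14 patches
rng = random.Random(0)
pred_h, pred_w = sample_block_size(grid_h, grid_w, (0.15, 0.2), (0.75, 1.0), rng)
enc_h, enc_w = sample_block_size(grid_h, grid_w, (0.85, 1.0), (1.0, 1.0), rng)
```

With the default sizes the grid holds 196 patches, so a predictor block covers roughly 29 to 39 of them, while the encoder block spans nearly the whole grid.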
- class RandomMaskGenerator(probability)
Random mask generator.
Returns a random mask of shape (nb_patches, nb_patches) based on the configuration where the number of patches to be masked is num_masking_patches. This is intended to be used for tasks like masked language modeling.
- Parameters:
probability (float) – Probability of masking a token.
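The role of `probability` can be sketched as independent per-token sampling, as is common in masked language modeling. `random_token_mask`, its flat list output, and the `special_positions` argument are all hypothetical choices for this illustration and do not describe the class's actual call signature or return shape.

```python
import random


def random_token_mask(num_tokens, probability, special_positions=(), seed=0):
    """Hypothetical sketch: mask each token independently with the given probability."""
    rng = random.Random(seed)
    special = set(special_positions)  # positions that are never masked (assumed convention)
    return [i not in special and rng.random() < probability
            for i in range(num_tokens)]


none_masked = random_token_mask(8, probability=0.0)
all_masked = random_token_mask(8, probability=1.0, special_positions=(0, 7))
some_masked = random_token_mask(8, probability=0.15)
```

At the extremes the behaviour is deterministic: a probability of 0 masks nothing, and a probability of 1 masks every token except the protected positions.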