mmlearn.modules.encoders.clip module
Wrappers and interfaces for CLIP models.
- class HFCLIPTextEncoder(model_name_or_path, pretrained=True, pooling_layer=None, freeze_layers=False, freeze_layer_norm=True, peft_config=None, model_config_kwargs=None)
Bases: Module
Wrapper around the CLIPTextModel from HuggingFace.
- Parameters:
model_name_or_path (str) – The HuggingFace model name or a local path from which to load the model.
pretrained (bool, default=True) – Whether to load the pretrained weights.
pooling_layer (Optional[torch.nn.Module], optional, default=None) – Pooling layer to apply to the last hidden state of the model.
freeze_layers (Union[int, float, list[int], bool], default=False) – Whether to freeze layers of the model and which layers to freeze. If True, all model layers are frozen. If it is an integer N, the first N layers of the model are frozen. If it is a float N, the first N percent of the layers are frozen. If it is a list of integers, the layers at the indices in the list are frozen.
freeze_layer_norm (bool, default=True) – Whether to freeze the layer normalization layers of the model.
peft_config (Optional[PeftConfig], optional, default=None) – The configuration from the peft library to use to wrap the model for parameter-efficient fine-tuning.
model_config_kwargs (Optional[dict[str, Any]], optional, default=None) – Additional keyword arguments to pass to the model configuration.
- Warns:
UserWarning – If both peft_config and freeze_layers are set, in which case peft_config overrides the freeze_layers setting.
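The `freeze_layers` rules above can be read as a mapping from the argument to a set of layer indices. The sketch below illustrates that mapping in plain Python; it is not mmlearn's code. The helper name `resolve_frozen_layers` is hypothetical, and it assumes a float is a fraction between 0 and 1 of the encoder's layers, rounded down.

```python
from typing import Union


def resolve_frozen_layers(
    freeze_layers: Union[int, float, list[int], bool], num_layers: int
) -> set[int]:
    """Hypothetical helper: map a ``freeze_layers`` value to layer indices.

    Assumes a float is a fraction in [0, 1] of ``num_layers``, rounded down.
    """
    # Check bool before int: bool is a subclass of int in Python.
    if isinstance(freeze_layers, bool):
        return set(range(num_layers)) if freeze_layers else set()
    if isinstance(freeze_layers, int):
        # Freeze the first N layers, capped at the encoder depth.
        return set(range(min(freeze_layers, num_layers)))
    if isinstance(freeze_layers, float):
        # Assumption: 0.5 means "the first half of the layers".
        return set(range(min(int(freeze_layers * num_layers), num_layers)))
    # A list of integers: freeze exactly the listed layer indices.
    return set(freeze_layers)
```

Under those assumptions, `resolve_frozen_layers(0.5, 12)` selects layers 0 through 5 of a 12-layer encoder, and `resolve_frozen_layers(3, 12)` selects layers 0, 1 and 2. Checking `bool` before `int` matters because `True` is also an `int` in Python.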
- class HFCLIPTextEncoderWithProjection(model_name_or_path, pretrained=True, use_all_token_embeddings=False, freeze_layers=False, freeze_layer_norm=True, peft_config=None, model_config_kwargs=None)
Bases: Module
Wrapper around the CLIPTextModelWithProjection from HuggingFace.
- Parameters:
model_name_or_path (str) – The HuggingFace model name or a local path from which to load the model.
pretrained (bool, default=True) – Whether to load the pretrained weights.
use_all_token_embeddings (bool, default=False) – Whether to use all token embeddings for the text. If False, the first token embedding is used.
freeze_layers (Union[int, float, list[int], bool], default=False) – Whether to freeze layers of the model and which layers to freeze. If True, all model layers are frozen. If it is an integer N, the first N layers of the model are frozen. If it is a float N, the first N percent of the layers are frozen. If it is a list of integers, the layers at the indices in the list are frozen.
freeze_layer_norm (bool, default=True) – Whether to freeze the layer normalization layers of the model.
peft_config (Optional[PeftConfig], optional, default=None) – The configuration from the peft library to use to wrap the model for parameter-efficient fine-tuning.
model_config_kwargs (Optional[dict[str, Any]], optional, default=None) – Additional keyword arguments to pass to the model configuration.
- Warns:
UserWarning – If both peft_config and freeze_layers are set, in which case peft_config overrides the freeze_layers setting.
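Every class in this module documents the same precedence rule between `peft_config` and `freeze_layers`. A minimal sketch of how such a check could behave, assuming that "set" means a truthy `freeze_layers` value; the helper `effective_freeze_layers` is hypothetical and only mirrors the documented warning, not mmlearn's internals.

```python
import warnings
from typing import Any, Optional, Union

FreezeSpec = Union[int, float, list[int], bool]


def effective_freeze_layers(peft_config: Optional[Any], freeze_layers: FreezeSpec) -> FreezeSpec:
    """Hypothetical check mirroring the documented peft_config precedence."""
    if peft_config is not None and freeze_layers:
        # Both are set: warn and let peft_config win, as the docs describe.
        warnings.warn(
            "Both `peft_config` and `freeze_layers` are set; "
            "`peft_config` overrides `freeze_layers`.",
            UserWarning,
            stacklevel=2,
        )
        return False
    # No conflict: keep the caller's freeze_layers setting unchanged.
    return freeze_layers
```

Returning `False` models the documented override: in this sketch, no layers are frozen manually once a PEFT configuration is supplied.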
- class HFCLIPVisionEncoder(model_name_or_path, pretrained=True, pooling_layer=None, freeze_layers=False, freeze_layer_norm=True, patch_dropout_rate=0.0, patch_dropout_shuffle=False, patch_dropout_bias=None, peft_config=None, model_config_kwargs=None)
Bases: Module
Wrapper around the CLIPVisionModel from HuggingFace.
- Parameters:
model_name_or_path (str) – The HuggingFace model name or a local path from which to load the model.
pretrained (bool, default=True) – Whether to load the pretrained weights.
pooling_layer (Optional[torch.nn.Module], optional, default=None) – Pooling layer to apply to the last hidden state of the model.
freeze_layers (Union[int, float, list[int], bool], default=False) – Whether to freeze layers of the model and which layers to freeze. If True, all model layers are frozen. If it is an integer N, the first N layers of the model are frozen. If it is a float N, the first N percent of the layers are frozen. If it is a list of integers, the layers at the indices in the list are frozen.
freeze_layer_norm (bool, default=True) – Whether to freeze the layer normalization layers of the model.
patch_dropout_rate (float, default=0.0) – The proportion of patch embeddings to drop out.
patch_dropout_shuffle (bool, default=False) – Whether to shuffle the patches when applying patch dropout.
patch_dropout_bias (Optional[float], optional, default=None) – The bias to apply to the patch dropout mask.
peft_config (Optional[PeftConfig], optional, default=None) – The configuration from the peft library to use to wrap the model for parameter-efficient fine-tuning.
model_config_kwargs (Optional[dict[str, Any]], optional, default=None) – Additional keyword arguments to pass to the model configuration.
- Warns:
UserWarning – If both peft_config and freeze_layers are set, in which case peft_config overrides the freeze_layers setting.
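Patch dropout keeps a random subset of patch tokens so the vision transformer processes a shorter sequence during training. The following sketch works on one image's token list and covers only `patch_dropout_rate` and `patch_dropout_shuffle`; `patch_dropout_bias` is not modeled. It assumes the first token is the class token and is always kept, and that without shuffling the kept patches stay in their original order. The function name `patch_dropout` is hypothetical.

```python
import random
from typing import Optional, Sequence, TypeVar

T = TypeVar("T")


def patch_dropout(
    tokens: Sequence[T],
    rate: float,
    shuffle: bool = False,
    rng: Optional[random.Random] = None,
) -> list[T]:
    """Hypothetical patch dropout for one image's token sequence.

    Assumes tokens[0] is the class token, which is always kept; only the
    patch tokens in tokens[1:] can be dropped.
    """
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    if rng is None:
        rng = random.Random()
    cls_token, patches = tokens[0], list(tokens[1:])
    # Keep at least one patch whenever the image has any patches.
    num_keep = min(len(patches), max(1, int(len(patches) * (1.0 - rate))))
    # Draw a random subset of patch positions; rng.sample returns them in random order.
    keep = rng.sample(range(len(patches)), num_keep)
    if not shuffle:
        # Assumption: without shuffling, kept patches retain their original order.
        keep.sort()
    return [cls_token] + [patches[i] for i in keep]
```

With the default `patch_dropout_rate=0.0` and no shuffling, the sketch returns the sequence unchanged, which is consistent with patch dropout being off by default.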
- class HFCLIPVisionEncoderWithProjection(model_name_or_path, pretrained=True, use_all_token_embeddings=False, patch_dropout_rate=0.0, patch_dropout_shuffle=False, patch_dropout_bias=None, freeze_layers=False, freeze_layer_norm=True, peft_config=None, model_config_kwargs=None)
Bases: Module
Wrapper around the CLIPVisionModelWithProjection class from HuggingFace.
- Parameters:
model_name_or_path (str) – The HuggingFace model name or a local path from which to load the model.
pretrained (bool, default=True) – Whether to load the pretrained weights.
use_all_token_embeddings (bool, default=False) – Whether to use all token embeddings for the image. If False, the first token embedding is used.
freeze_layers (Union[int, float, list[int], bool], default=False) – Whether to freeze layers of the model and which layers to freeze. If True, all model layers are frozen. If it is an integer N, the first N layers of the model are frozen. If it is a float N, the first N percent of the layers are frozen. If it is a list of integers, the layers at the indices in the list are frozen.
freeze_layer_norm (bool, default=True) – Whether to freeze the layer normalization layers of the model.
patch_dropout_rate (float, default=0.0) – The proportion of patch embeddings to drop out.
patch_dropout_shuffle (bool, default=False) – Whether to shuffle the patches when applying patch dropout.
patch_dropout_bias (Optional[float], optional, default=None) – The bias to apply to the patch dropout mask.
peft_config (Optional[PeftConfig], optional, default=None) – The configuration from the peft library to use to wrap the model for parameter-efficient fine-tuning.
model_config_kwargs (Optional[dict[str, Any]], optional, default=None) – Additional keyword arguments to pass to the model configuration.
- Warns:
UserWarning – If both peft_config and freeze_layers are set, in which case peft_config overrides the freeze_layers setting.
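For the projection variant, `use_all_token_embeddings` decides which token embeddings reach the projection. The sketch below uses plain lists: with the default `False` only the first (class) token is projected, otherwise every token is. The function `project_tokens` and the bias-free weight matrix are illustrative assumptions, and any normalization CLIP applies before the projection is omitted.

```python
from typing import Sequence

Vector = list[float]


def project_tokens(
    last_hidden_state: Sequence[Vector],
    projection: Sequence[Vector],
    use_all_token_embeddings: bool = False,
) -> list[Vector]:
    """Hypothetical sketch: pick token embeddings, then apply a linear projection.

    ``projection`` is an (out_dim x hidden_dim) weight matrix with no bias term.
    """
    # Default: only the first (class) token; otherwise every token.
    if use_all_token_embeddings:
        tokens = list(last_hidden_state)
    else:
        tokens = [last_hidden_state[0]]
    # Matrix-vector product for each selected token.
    return [[sum(w * x for w, x in zip(row, token)) for row in projection] for token in tokens]
```

With `use_all_token_embeddings=False` the output holds one vector per image, the form usually compared against text embeddings in a contrastive loss; setting it to `True` yields one projected vector per token.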