mmlearn.modules.encoders.vision.TimmViT
- class TimmViT(model_name, modality='RGB', projection_dim=768, pretrained=True, freeze_layers=False, freeze_layer_norm=True, peft_config=None, model_kwargs=None)
Bases: Module
Vision Transformer model from timm.
- Parameters:
model_name (str) – The name of the model to use.
modality (str, default="RGB") – The modality of the input data. This lets the model be used with different image modalities, e.g. RGB or depth.
projection_dim (int, default=768) – The dimension of the projection head.
pretrained (bool, default=True) – Whether to use the pretrained weights.
freeze_layers (Union[int, float, list[int], bool], default=False) – Whether to freeze layers and, if so, which ones, given as a bool, an int, a float, or a list of ints.
freeze_layer_norm (bool, default=True) – Whether to freeze the layer norm.
peft_config (Optional[PeftConfig], optional, default=None) – The configuration from the peft library to use to wrap the model for parameter-efficient finetuning.
model_kwargs (Optional[dict[str, Any]], default=None) – Additional keyword arguments for the model.
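A minimal sketch of how the parameters above might be passed to the constructor. The import path is taken from the page title; the model name "vit_base_patch16_224" is an illustrative choice (any timm ViT name should work in its place), and pretrained=False is set here only to avoid a weight download. The import is guarded so the snippet still runs when mmlearn is not installed.

```python
# Keyword arguments mirroring the documented signature of TimmViT.
# "vit_base_patch16_224" is an assumed example model name, not a default.
timm_vit_kwargs = {
    "model_name": "vit_base_patch16_224",
    "modality": "RGB",
    "projection_dim": 768,
    "pretrained": False,       # documented default is True; False skips the download
    "freeze_layers": False,
    "freeze_layer_norm": True,
    "peft_config": None,       # optionally a config object from the peft library
    "model_kwargs": None,      # optionally extra keyword arguments for the model
}

try:
    # Import path assumed from the page title.
    from mmlearn.modules.encoders.vision import TimmViT
except ImportError:
    TimmViT = None  # mmlearn is not available in this environment

# Build the encoder only when the class could be imported.
encoder = TimmViT(**timm_vit_kwargs) if TimmViT is not None else None
```

Keeping the arguments in a dictionary makes it easy to vary a single setting, such as freeze_layers or peft_config, without rewriting the call.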