mmlearn.modules.layers.attention.Attention
- class Attention(dim, num_heads=8, qkv_bias=False, qk_scale=None, attn_drop=0.0, proj_drop=0.0)
Bases: Module
Multi-head Self-Attention Mechanism.
- Parameters:
dim (int) – Number of input dimensions.
num_heads (int, optional, default=8) – Number of attention heads.
qkv_bias (bool, optional, default=False) – If True, adds a learnable bias to the query, key, value projections.
qk_scale (Optional[float], optional, default=None) – Override the default scale factor for the dot-product attention.
attn_drop (float, optional, default=0.0) – Dropout probability for the attention weights.
proj_drop (float, optional, default=0.0) – Dropout probability for the output of the attention layer.
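A minimal sketch of how these constructor arguments are typically wired into a ViT-style attention block. The fused qkv projection, head_dim, and scale names below are illustrative assumptions, not necessarily the mmlearn implementation:

from torch import nn

class AttentionSketch(nn.Module):
    # Illustrative constructor only; mirrors the documented parameters.
    def __init__(self, dim, num_heads=8, qkv_bias=False, qk_scale=None,
                 attn_drop=0.0, proj_drop=0.0):
        super().__init__()
        self.num_heads = num_heads
        head_dim = dim // num_heads
        # qk_scale, if given, overrides the usual 1/sqrt(head_dim) scaling.
        self.scale = qk_scale if qk_scale is not None else head_dim ** -0.5
        # qkv_bias toggles the learnable bias on the fused query/key/value projection.
        self.qkv = nn.Linear(dim, dim * 3, bias=qkv_bias)
        self.attn_drop = nn.Dropout(attn_drop)   # dropout on attention weights
        self.proj = nn.Linear(dim, dim)          # output projection
        self.proj_drop = nn.Dropout(proj_drop)   # dropout on the projected output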
- forward(x)
Forward pass through the multi-head self-attention module.
- Parameters:
x (torch.Tensor) – Input tensor of shape (batch_sz, seq_len, dim).
- Returns:
The output tensor and the attention weights.
- Return type: tuple[torch.Tensor, torch.Tensor]
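Usage sketch, following the module path in the heading above. The attention-weight shape noted in the comments assumes the usual per-head layout and is not taken from the source:

import torch
from mmlearn.modules.layers.attention import Attention

attn = Attention(dim=64, num_heads=8, qkv_bias=True)
x = torch.randn(2, 16, 64)       # (batch_sz, seq_len, dim)
out, attn_weights = attn(x)      # output tensor and attention weights
print(out.shape)                 # torch.Size([2, 16, 64])
print(attn_weights.shape)        # typically (batch_sz, num_heads, seq_len, seq_len)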