mmlearn.modules.layers.attention.Attention
- class Attention(dim, num_heads=8, qkv_bias=False, qk_scale=None, attn_drop=0.0, proj_drop=0.0)
Bases: Module
Multi-head Self-Attention Mechanism.
- Parameters:
dim (int) – Number of input dimensions.
num_heads (int, optional, default=8) – Number of attention heads.
qkv_bias (bool, optional, default=False) – If True, adds a learnable bias to the query, key, value projections.
qk_scale (Optional[float], optional, default=None) – Override the default scale factor for the dot-product attention.
attn_drop (float, optional, default=0.0) – Dropout probability for the attention weights.
proj_drop (float, optional, default=0.0) – Dropout probability for the output of the attention layer.
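A minimal sketch of how these constructor arguments are typically wired into a ViT-style attention block. The fused qkv projection, head_dim, and scale names below are illustrative assumptions, not necessarily the mmlearn implementation:

from torch import nn

class AttentionSketch(nn.Module):
    # Illustrative constructor only; mirrors the documented parameters.
    def __init__(self, dim, num_heads=8, qkv_bias=False, qk_scale=None,
                 attn_drop=0.0, proj_drop=0.0):
        super().__init__()
        self.num_heads = num_heads
        head_dim = dim // num_heads
        # qk_scale, if given, overrides the usual 1/sqrt(head_dim) scaling.
        self.scale = qk_scale if qk_scale is not None else head_dim ** -0.5
        # qkv_bias toggles the learnable bias on the fused query/key/value projection.
        self.qkv = nn.Linear(dim, dim * 3, bias=qkv_bias)
        self.attn_drop = nn.Dropout(attn_drop)   # dropout on attention weights
        self.proj = nn.Linear(dim, dim)          # output projection
        self.proj_drop = nn.Dropout(proj_drop)   # dropout on the projected output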
- forward(x)
Forward pass through the multi-head self-attention module.
- Parameters:
x (torch.Tensor) – Input tensor of shape (batch_sz, seq_len, dim).
- Returns:
The output tensor and the attention weights.
- Return type: tuple[torch.Tensor, torch.Tensor]
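Usage sketch, following the module path in the heading above. The attention-weight shape noted in the comments assumes the usual per-head layout and is not taken from the source:

import torch
from mmlearn.modules.layers.attention import Attention

attn = Attention(dim=64, num_heads=8, qkv_bias=True)
x = torch.randn(2, 16, 64)       # (batch_sz, seq_len, dim)
out, attn_weights = attn(x)      # output tensor and attention weights
print(out.shape)                 # torch.Size([2, 16, 64])
print(attn_weights.shape)        # typically (batch_sz, num_heads, seq_len, seq_len)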