mmlearn.modules.layers.attention.Attention

class Attention(dim, num_heads=8, qkv_bias=False, qk_scale=None, attn_drop=0.0, proj_drop=0.0)[source]

Bases: Module

Multi-head Self-Attention Mechanism.

Parameters:
  • dim (int) – Number of input dimensions.

  • num_heads (int, optional, default=8) – Number of attention heads.

  • qkv_bias (bool, optional, default=False) – If True, adds a learnable bias to the query, key, value projections.

  • qk_scale (Optional[float], optional, default=None) – Override the default scale factor for the dot-product attention.

  • attn_drop (float, optional, default=0.0) – Dropout probability for the attention weights.

  • proj_drop (float, optional, default=0.0) – Dropout probability for the output of the attention layer.
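Example: a minimal construction sketch. The hyperparameter values are illustrative choices, not defaults from this module.

>>> from mmlearn.modules.layers.attention import Attention
>>> # dim is typically chosen divisible by num_heads (here 768 / 12 = 64 dims per head)
>>> attn = Attention(dim=768, num_heads=12, qkv_bias=True, attn_drop=0.1, proj_drop=0.1)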

Methods

forward(x)[source]

Forward pass through the multi-head self-attention module.

Parameters:
  • x (torch.Tensor) – Input tensor of shape (batch_sz, seq_len, dim).

Returns:
  The output tensor and the attention weights.

Return type:
  tuple[torch.Tensor, torch.Tensor]
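
Example: a minimal forward-pass sketch. The attention-weight shape shown follows the usual multi-head convention, (batch_sz, num_heads, seq_len, seq_len), and is an assumption about this implementation rather than documented behaviour.

>>> import torch
>>> from mmlearn.modules.layers.attention import Attention
>>> attn = Attention(dim=768, num_heads=12)
>>> x = torch.randn(4, 197, 768)  # (batch_sz, seq_len, dim)
>>> out, attn_weights = attn(x)
>>> out.shape
torch.Size([4, 197, 768])
>>> attn_weights.shape  # assumed (batch_sz, num_heads, seq_len, seq_len)
torch.Size([4, 12, 197, 197])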