mmlearn.modules.layers.transformer_block.Block
- class Block(dim, num_heads, mlp_ratio=4.0, qkv_bias=False, qk_scale=None, drop=0.0, attn_drop=0.0, drop_path=0.0, act_layer=<class 'torch.nn.modules.activation.GELU'>, norm_layer=<class 'torch.nn.modules.normalization.LayerNorm'>)
Bases: Module
Transformer Block.
This module implements a Transformer block consisting of self-attention, normalization layers, and a feedforward multi-layer perceptron (MLP); see the usage sketch after the parameter list below.
- Parameters:
dim (int) – The input and output dimension of the block.
num_heads (int) – Number of attention heads.
mlp_ratio (float, optional, default=4.0) – Ratio of hidden dimension to the input dimension in the MLP.
qkv_bias (bool, optional, default=False) – If True, add a learnable bias to the query, key, value projections.
qk_scale (Optional[float], optional, default=None) – Override default qk scale of head_dim ** -0.5 if set.
drop (float, optional, default=0.0) – Dropout probability for the output of attention and MLP layers.
attn_drop (float, optional, default=0.0) – Dropout probability for the attention scores.
drop_path (float, optional, default=0.0) – Stochastic depth rate, a form of layer dropout.
act_layer (Callable[..., torch.nn.Module], optional, default=nn.GELU) – Activation layer in the MLP.
norm_layer (Callable[..., torch.nn.Module], optional, default=torch.nn.LayerNorm) – Normalization layer.
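A minimal usage sketch, assuming a ViT-style forward pass that accepts and returns a tensor of shape (batch_size, sequence_length, dim); the exact forward signature and return value are not documented on this page and may differ in the actual mmlearn implementation.

```python
import torch

from mmlearn.modules.layers.transformer_block import Block

# Instantiate a block with typical ViT-Base settings.
block = Block(dim=768, num_heads=12, mlp_ratio=4.0, qkv_bias=True, drop=0.1)

# Assumed input layout: (batch_size, sequence_length, dim).
x = torch.randn(2, 197, 768)

# Assumed to return a tensor of the same shape; some implementations
# additionally return attention weights.
out = block(x)
print(out.shape)  # expected: torch.Size([2, 197, 768])
```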