API

transformers_lab.feed_forward

Feed forward neural network module.

FeedForward

Position-wise Feed Forward Network used in Transformer.

__init__(d_model, d_ff, init_weight_fn=xavier_init)

Initialize the feed-forward module.

Parameters

d_model : int
    Model dimensionality
d_ff : int
    Hidden layer dimensionality
init_weight_fn : Callable[[int, int], np.ndarray]
    Weight initialization function. Defaults to xavier_init.

forward(x)

Run forward pass.

Parameters

x : np.ndarray
    Shape (seq_len, d_model)

Returns:

np.ndarray
    Shape (seq_len, d_model)

relu(x)

ReLU activation function.

xavier_init(n1, n2, n_heads=None)

Xavier Glorot initialization for weight matrices.

Parameters

n1 : int
    Input dimension.
n2 : int
    Output dimension.
n_heads : int, optional
    If provided, returns shape (n_heads, n1, n2). If None, returns shape (n1, n2).

Returns:

np.ndarray
    Shape (n_heads, n1, n2) or (n1, n2).
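The module's source isn't reproduced in this reference, but a minimal NumPy sketch consistent with the documented signatures and shapes might look like this (the uniform Glorot variant and the absence of bias terms are assumptions, not confirmed by the docs):

```python
import numpy as np

def xavier_init(n1, n2, n_heads=None):
    # Assumed uniform Glorot variant: samples in [-limit, limit] with
    # limit = sqrt(6 / (n1 + n2)), keeping activation variance roughly constant.
    limit = np.sqrt(6.0 / (n1 + n2))
    shape = (n1, n2) if n_heads is None else (n_heads, n1, n2)
    return np.random.uniform(-limit, limit, size=shape)

def relu(x):
    # Element-wise max(0, x).
    return np.maximum(0, x)

class FeedForward:
    def __init__(self, d_model, d_ff, init_weight_fn=xavier_init):
        self.w1 = init_weight_fn(d_model, d_ff)  # expand to hidden size
        self.w2 = init_weight_fn(d_ff, d_model)  # project back to d_model

    def forward(self, x):
        # FFN(x) = ReLU(x W1) W2, applied position-wise over (seq_len, d_model).
        return relu(x @ self.w1) @ self.w2
```

Because both matrix products act on the last axis, the same weights are applied independently at every sequence position, which is what "position-wise" means here.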

transformers_lab.layer_norm

Layer normalisation implementation.

LayerNorm

Layer normalization implementation.

__call__(x)

Apply layer normalization.

Parameters

x : np.ndarray
    Input tensor of shape (..., d_model).

Returns:

np.ndarray
    Normalized tensor with the same shape as input.

__init__(d_model, eps=1e-06)

Initialize the LayerNorm module.

Parameters

d_model : int
    Dimensionality of the input.
eps : float, optional
    Value added to the denominator for numerical stability. Default is 1e-6.

Attributes:

gamma : np.ndarray
    Scale parameter of shape (d_model,).
beta : np.ndarray
    Shift parameter of shape (d_model,).
eps : float
    Numerical stability constant.
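A minimal sketch consistent with the documented attributes and shapes (assuming normalization is taken over the last axis, and gamma/beta initialized to ones/zeros):

```python
import numpy as np

class LayerNorm:
    def __init__(self, d_model, eps=1e-6):
        self.gamma = np.ones(d_model)   # scale parameter, shape (d_model,)
        self.beta = np.zeros(d_model)   # shift parameter, shape (d_model,)
        self.eps = eps

    def __call__(self, x):
        # Normalize each position over the feature (last) axis,
        # then apply the learned scale and shift.
        mean = x.mean(axis=-1, keepdims=True)
        var = x.var(axis=-1, keepdims=True)
        return self.gamma * (x - mean) / np.sqrt(var + self.eps) + self.beta
```

With the default parameters, each row of the output has (approximately) zero mean and unit variance; eps guards against division by zero for constant inputs.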

transformers_lab.multihead_attention

Compute multi-head-attention.

multi_head_attention(x, w_q, w_k, w_v, w_o, n_heads, x_cross=None, mask=None)

Compute multi-head attention.

Parameters

x : np.ndarray
    Input tensor of shape (seq_len, d_model)
w_q : np.ndarray
    Query weights of shape (n_heads, d_model, d_k)
w_k : np.ndarray
    Key weights of shape (n_heads, d_model, d_k)
w_v : np.ndarray
    Value weights of shape (n_heads, d_model, d_k)
w_o : np.ndarray
    Output projection matrix of shape (d_model, d_model)
n_heads : int
    Number of attention heads
x_cross : np.ndarray, optional
    Encoder output of shape (src_seq_len, d_model). If provided, keys and values come from x_cross (cross-attention). If None, keys and values come from x (self-attention).
mask : np.ndarray, optional
    Mask of shape (seq_len, seq_len). Use make_causal_mask() for masked self-attention in the decoder.

Returns:

np.ndarray
    Output tensor of shape (seq_len, d_model)

scaled_dot_product_attention(q, k, v, mask=None)

Compute scaled dot-product attention.

Parameters

q : np.ndarray
    Query matrix of shape (seq_len, d_k)
k : np.ndarray
    Key matrix of shape (seq_len, d_k)
v : np.ndarray
    Value matrix of shape (seq_len, d_k)
mask : np.ndarray, optional
    Mask to apply during attention computation. Must be of shape (seq_len, seq_len).

Returns:

np.ndarray
    Output matrix of shape (seq_len, d_k)
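Taken together, the two functions in this module might be sketched as follows. The boolean mask convention (True = position may be attended to), the explicit per-head loop, and the assumption that d_k equals d_model / n_heads (so the concatenated heads match w_o) are illustrative choices, not confirmed by the docs:

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row max before exponentiating for numerical stability.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(q, k, v, mask=None):
    d_k = q.shape[-1]
    # Similarity scores, scaled by sqrt(d_k) to keep their variance bounded.
    scores = q @ k.T / np.sqrt(d_k)
    if mask is not None:
        # Assumed convention: boolean mask, True = may attend.
        scores = np.where(mask, scores, -1e9)
    weights = softmax(scores, axis=-1)  # each row sums to 1
    return weights @ v

def multi_head_attention(x, w_q, w_k, w_v, w_o, n_heads, x_cross=None, mask=None):
    # Keys/values come from x_cross for cross-attention, else from x itself.
    kv_src = x if x_cross is None else x_cross
    heads = []
    for h in range(n_heads):
        q = x @ w_q[h]        # (seq_len, d_k)
        k = kv_src @ w_k[h]   # (kv_seq_len, d_k)
        v = kv_src @ w_v[h]
        heads.append(scaled_dot_product_attention(q, k, v, mask))
    # Concatenate heads back to d_model, then apply the output projection.
    return np.concatenate(heads, axis=-1) @ w_o
```

With a lower-triangular causal mask, position 0 can only attend to itself, so its attention output equals its own value vector.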

transformers_lab.positional_encoding

Positional encoding.

sinusoidal_positional_encoding(seq_len, d_model)

Compute sinusoidal positional encoding.

Parameters

seq_len : int
    Length of the input sequence
d_model : int
    Dimensionality of the model

Returns:

np.ndarray
    Positional encoding matrix of shape (seq_len, d_model)
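A sketch of the standard sinusoidal formulation, which matches the documented signature (assuming d_model is even; whether the library handles odd dimensions is not stated here):

```python
import numpy as np

def sinusoidal_positional_encoding(seq_len, d_model):
    # pe[pos, 2i]   = sin(pos / 10000^(2i / d_model))
    # pe[pos, 2i+1] = cos(pos / 10000^(2i / d_model))
    pos = np.arange(seq_len)[:, None]          # (seq_len, 1)
    i = np.arange(0, d_model, 2)[None, :]      # even feature indices
    angle = pos / np.power(10000.0, i / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angle)  # even columns
    pe[:, 1::2] = np.cos(angle)  # odd columns
    return pe
```

At position 0 every sine column is 0 and every cosine column is 1, which is a quick sanity check on the layout.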

transformers_lab.self_attention

Compute self-attention.

scaled_dot_product_attention(q, k, v, mask=None)

Compute scaled dot-product attention.

Parameters

q : np.ndarray
    Query matrix of shape (seq_len, d_k)
k : np.ndarray
    Key matrix of shape (seq_len, d_k)
v : np.ndarray
    Value matrix of shape (seq_len, d_k)
mask : np.ndarray, optional
    Mask to apply during attention computation. Must be of shape (seq_len, seq_len).

Returns:

np.ndarray
    Output matrix of shape (seq_len, d_k)

self_attention(x, w_q, w_k, w_v, mask=None)

Compute single-head self-attention.

Parameters

x : np.ndarray
    Input embeddings of shape (seq_len, d_model)
w_q : np.ndarray
    Query projection matrix of shape (d_model, d_k)
w_k : np.ndarray
    Key projection matrix of shape (d_model, d_k)
w_v : np.ndarray
    Value projection matrix of shape (d_model, d_k)
mask : np.ndarray, optional
    Mask of shape (seq_len, seq_len). Defaults to None.

Returns:

np.ndarray
    Output of shape (seq_len, d_k)
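A minimal single-head sketch consistent with the documented shapes (the boolean mask convention, True = may attend, is an assumption):

```python
import numpy as np

def softmax(x, axis=-1):
    z = x - x.max(axis=axis, keepdims=True)  # stability shift
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, w_q, w_k, w_v, mask=None):
    # Project the same input into queries, keys, and values.
    q, k, v = x @ w_q, x @ w_k, x @ w_v       # each (seq_len, d_k)
    scores = q @ k.T / np.sqrt(q.shape[-1])   # (seq_len, seq_len)
    if mask is not None:
        scores = np.where(mask, scores, -1e9)  # assumed: True = attend
    return softmax(scores, axis=-1) @ v        # (seq_len, d_k)
```

This is the single-head building block; the multihead_attention module repeats the same computation per head and concatenates the results.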

transformers_lab.softmax

Softmax is used to:

  • transform arbitrary real-valued scores into probabilities
  • ensure all values are positive
  • ensure the probabilities sum to 1
  • normalize scores along a specified axis

In Transformers, softmax is used to convert similarity scores (attention scores) into attention weights.

softmax(x, axis=-1)

Compute the softmax function in a numerically stable way.

The softmax function transforms arbitrary real-valued scores into positive probabilities that sum to 1 along the specified axis.

In Transformers, it is commonly used to convert attention similarity scores into attention weights.

:param x: NumPy array containing the input scores.
:param axis: Axis along which the normalization is applied. Defaults to the last axis.
:return: NumPy array of the same shape as x containing the normalized probabilities.
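The numerically stable computation described above can be sketched as:

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtracting the max along the axis leaves the result unchanged
    # (softmax is shift-invariant) but prevents overflow in np.exp
    # for large scores.
    z = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(z)
    return e / np.sum(e, axis=axis, keepdims=True)
```

A naive exp(x) / sum(exp(x)) would overflow for scores around 1000; the shifted version handles them without issue.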