paddlets.models.forecasting.dl.transformer

class TransformerModel(in_chunk_len: int, out_chunk_len: int, skip_chunk_len: int = 0, sampling_stride: int = 1, loss_fn: Callable[..., paddle.Tensor] = <function mse_loss>, optimizer_fn: Callable[..., paddle.optimizer.optimizer.Optimizer] = <class 'paddle.optimizer.adam.Adam'>, optimizer_params: Dict[str, Any] = {'learning_rate': 0.001}, eval_metrics: List[str] = [], callbacks: List[paddlets.models.common.callbacks.callbacks.Callback] = [], batch_size: int = 128, max_epochs: int = 10, verbose: int = 1, patience: int = 4, seed: Union[None, int] = None, d_model: int = 8, nhead: int = 4, num_encoder_layers: int = 1, num_decoder_layers: int = 1, dim_feedforward: int = 64, activation: str = 'relu', dropout_rate: float = 0.1, custom_encoder: Optional[paddle.fluid.dygraph.layers.Layer] = None, custom_decoder: Optional[paddle.fluid.dygraph.layers.Layer] = None)

Bases: PaddleBaseModelImpl

Transformer [1] is a deep learning model introduced in 2017. It is an encoder-decoder architecture whose core component is multi-head attention, which captures dependencies within the input sequence and within the output sequence (self-attention) as well as dependencies between the input and output sequences (encoder-decoder attention).

[1] Vaswani A, et al. “Attention Is All You Need”, https://arxiv.org/abs/1706.03762
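
For reference, the attention operation described above can be written out as in the cited paper [1]. The symbols Q, K, V (query, key, and value matrices), d_k (per-head key dimension), h (number of heads), and the projection matrices W are the paper's notation, not names used by this class. Mapping h to nhead and d_k to d_model / nhead is an assumption based on the parameter descriptions.

```latex
\begin{aligned}
\operatorname{Attention}(Q, K, V) &= \operatorname{softmax}\!\left(\frac{Q K^{\top}}{\sqrt{d_k}}\right) V \\
\operatorname{head}_i &= \operatorname{Attention}\!\left(Q W_i^{Q},\; K W_i^{K},\; V W_i^{V}\right), \quad i = 1, \dots, h \\
\operatorname{MultiHead}(Q, K, V) &= \operatorname{Concat}(\operatorname{head}_1, \dots, \operatorname{head}_h)\, W^{O}
\end{aligned}
```

In self-attention, Q, K, and V all come from the same sequence. In encoder-decoder attention, Q comes from the decoder and K, V come from the encoder output.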

Parameters
  • in_chunk_len (int) – The size of the lookback window, i.e. the number of time steps fed to the model.

  • out_chunk_len (int) – The size of the forecasting horizon, i.e. the number of time steps output by the model.

  • skip_chunk_len (int) – Optional. The number of time steps between in_chunk and out_chunk in a single sample. The skip chunk is used neither as a feature (i.e. X) nor as a label (i.e. Y). By default, no time steps are skipped.

  • sampling_stride (int) – Sampling interval between two adjacent samples.

  • loss_fn (Callable[..., paddle.Tensor]|None) – Loss function.

  • optimizer_fn (Callable[..., Optimizer]) – Optimizer algorithm.

  • optimizer_params (Dict[str, Any]) – Optimizer parameters.

  • eval_metrics (List[str]) – Evaluation metrics of model.

  • callbacks (List[Callback]) – Customized callback functions.

  • batch_size (int) – Number of samples per batch.

  • max_epochs (int) – Max epochs during training.

  • verbose (int) – Verbosity mode.

  • patience (int) – Number of epochs to wait for an improvement before training is stopped early.

  • seed (int|None) – Global random seed.

  • d_model (int) – The expected feature size for the input/output of the transformer’s encoder/decoder.

  • nhead (int) – The number of heads in the multi-head attention mechanism.

  • num_encoder_layers (int) – The number of encoder layers in the encoder.

  • num_decoder_layers (int) – The number of decoder layers in the decoder.

  • dim_feedforward (int) – The dimension of the feedforward network model.

  • activation (str) – The activation function of the encoder/decoder intermediate layer. Supported values are “relu” and “gelu”.

  • dropout_rate (float) – Fraction of neurons affected by Dropout.

  • custom_encoder (paddle.nn.Layer|None) – A custom user-provided encoder module for the transformer.

  • custom_decoder (paddle.nn.Layer|None) – A custom user-provided decoder module for the transformer.
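
The parameters above can be collected into a keyword dictionary and passed to the constructor. The following is a minimal sketch, not an official example: the helper names `make_config` and `build_model` are hypothetical, the chosen values are illustrative, and the import path is taken from the module name at the top of this page. The import is done lazily so the configuration helper can be used without paddlets installed.

```python
from typing import Any, Dict


def make_config() -> Dict[str, Any]:
    """Return illustrative constructor arguments (hypothetical helper)."""
    return {
        "in_chunk_len": 24,       # look back 24 time steps
        "out_chunk_len": 12,      # forecast 12 time steps
        "skip_chunk_len": 0,      # no gap between input and output chunks
        "sampling_stride": 1,
        "optimizer_params": {"learning_rate": 1e-3},
        "batch_size": 128,
        "max_epochs": 10,
        "patience": 4,
        "seed": 42,
        "d_model": 8,
        "nhead": 4,               # divides d_model evenly (8 / 4 = 2 per head)
        "num_encoder_layers": 1,
        "num_decoder_layers": 1,
        "dim_feedforward": 64,
        "activation": "relu",     # documented values: "relu" or "gelu"
        "dropout_rate": 0.1,
    }


def build_model(config: Dict[str, Any]):
    """Instantiate TransformerModel from a config dict (requires paddlets)."""
    # Imported here so that make_config() works without paddlets installed.
    from paddlets.models.forecasting.dl.transformer import TransformerModel

    return TransformerModel(**config)
```

With paddlets installed, `model = build_model(make_config())` would create the model. Training and forecasting then go through the methods inherited from the base class, which are not documented in this section.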

_in_chunk_len

The size of the lookback window, i.e. the number of time steps fed to the model.

Type

int

_out_chunk_len

The size of the forecasting horizon, i.e. the number of time steps output by the model.

Type

int

_skip_chunk_len

Optional. The number of time steps between in_chunk and out_chunk in a single sample. The skip chunk is used neither as a feature (i.e. X) nor as a label (i.e. Y). By default, no time steps are skipped.

Type

int

_sampling_stride

Sampling interval between two adjacent samples.

Type

int
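
To make the four windowing attributes above concrete, here is a minimal pure-Python sketch of how a series could be cut into (X, Y) samples. It assumes that samples start at the beginning of the series and that sampling_stride is the offset between the starts of adjacent samples. The function `make_samples` is hypothetical, and the exact boundaries paddlets produces may differ.

```python
from typing import List, Sequence, Tuple


def make_samples(
    series: Sequence[float],
    in_chunk_len: int,
    out_chunk_len: int,
    skip_chunk_len: int = 0,
    sampling_stride: int = 1,
) -> List[Tuple[List[float], List[float]]]:
    """Slice a series into (in_chunk, out_chunk) pairs (illustrative only)."""
    # Total span covered by one sample: input, skipped gap, then output.
    window = in_chunk_len + skip_chunk_len + out_chunk_len
    samples = []
    for start in range(0, len(series) - window + 1, sampling_stride):
        x = list(series[start:start + in_chunk_len])
        y_start = start + in_chunk_len + skip_chunk_len
        y = list(series[y_start:y_start + out_chunk_len])
        samples.append((x, y))
    return samples


# Ten time steps, 3 in, 1 skipped, 2 out, a new sample every 2 steps.
samples = make_samples(list(range(10)), 3, 2, skip_chunk_len=1, sampling_stride=2)
for x, y in samples:
    print(x, "->", y)
```

Note that the skipped time step (for example, index 3 in the first sample) appears in neither X nor Y of that sample.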

_loss_fn

Loss function.

Type

Callable[..., paddle.Tensor]|None

_optimizer_fn

Optimizer algorithm.

Type

Callable[..., Optimizer]

_optimizer_params

Optimizer parameters.

Type

Dict[str, Any]

_eval_metrics

Evaluation metrics of model.

Type

List[str]

_callbacks

Customized callback functions.

Type

List[Callback]

_batch_size

Number of samples per batch.

Type

int

_max_epochs

Max epochs during training.

Type

int

_verbose

Verbosity mode.

Type

int

_patience

Number of epochs to wait for an improvement before training is stopped early.

Type

int
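
The interaction between patience and the training loop can be sketched as follows. This is a generic patience-based early-stopping rule, assuming a lower validation loss counts as an improvement. It is not the paddlets implementation, and `early_stop_epoch` is a hypothetical name.

```python
from typing import Optional, Sequence


def early_stop_epoch(val_losses: Sequence[float], patience: int) -> Optional[int]:
    """Return the 0-based epoch at which training would stop, or None."""
    best = float("inf")
    wait = 0  # epochs elapsed since the last improvement
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None  # ran through all epochs without exhausting patience


losses = [1.0, 0.9, 0.95, 0.93, 0.92, 0.91, 0.8]
print(early_stop_epoch(losses, patience=4))
print(early_stop_epoch(losses, patience=5))
```

With patience=4, the four epochs after the best loss of 0.9 bring no improvement, so training stops before the later drop to 0.8 is seen. A larger patience would have reached it.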

_seed

Global random seed.

Type

int|None

_stop_training

Type

bool

_d_model

The expected feature size for the input/output of the transformer’s encoder/decoder.

Type

int

_nhead

The number of heads in the multi-head attention mechanism.

Type

int
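
In the standard multi-head attention formulation, the model dimension is split evenly across heads, so d_model is expected to be divisible by nhead. The check below is a small sketch under that assumption. `head_dim` is a hypothetical helper, not part of paddlets.

```python
def head_dim(d_model: int, nhead: int) -> int:
    """Return the per-head feature size, assuming an even split across heads."""
    if d_model <= 0 or nhead <= 0:
        raise ValueError("d_model and nhead must be positive")
    if d_model % nhead != 0:
        raise ValueError(
            f"d_model ({d_model}) must be divisible by nhead ({nhead})"
        )
    return d_model // nhead


# The documented defaults: d_model=8, nhead=4.
print(head_dim(8, 4))
```

Running such a check before constructing the model gives a clearer error message than a failure deep inside the attention layer.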

_num_encoder_layers

The number of encoder layers in the encoder.

Type

int

_num_decoder_layers

The number of decoder layers in the decoder.

Type

int

_dim_feedforward

The dimension of the feedforward network model.

Type

int

_activation

The activation function of the encoder/decoder intermediate layer. Supported values are “relu” and “gelu”.

Type

str

_dropout_rate

Fraction of neurons affected by Dropout.

Type

float
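
As an illustration of what the dropout fraction means, here is a pure-Python sketch of the common "inverted dropout" convention: during training each value is zeroed with probability equal to the rate, and survivors are scaled up so the expected value is unchanged. That the model follows this exact convention is an assumption, and `inverted_dropout` is a hypothetical helper.

```python
import random
from typing import List, Optional, Sequence


def inverted_dropout(
    values: Sequence[float], rate: float, seed: Optional[int] = None
) -> List[float]:
    """Zero each value with probability `rate`; scale survivors by 1 / (1 - rate)."""
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    rng = random.Random(seed)  # local generator, so results are reproducible
    scale = 1.0 / (1.0 - rate)
    return [0.0 if rng.random() < rate else v * scale for v in values]


activations = [1.0] * 1000
dropped = inverted_dropout(activations, rate=0.1, seed=0)
print(sum(1 for v in dropped if v == 0.0), "of", len(dropped), "values zeroed")
```

With the default rate of 0.1, roughly one in ten activations is zeroed per training step. At inference time dropout is normally disabled.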

_custom_encoder

A custom user-provided encoder module for the transformer.

Type

paddle.nn.Layer|None

_custom_decoder

A custom user-provided decoder module for the transformer.

Type

paddle.nn.Layer|None