Get Started

1. Install PaddleTS

PaddleTS is built on PaddlePaddle and requires PaddlePaddle 2.3 or later. Please refer to the official guide for installing PaddlePaddle. If PaddlePaddle is already installed, skip this step.

Next, install PaddleTS with pip:

pip install paddlets

Alternatively, install PaddlePaddle and PaddleTS together with a single pip command:

pip install paddlets[all]

Once installed successfully, you can import PaddleTS in your code:

import paddlets
print(paddlets.__version__)

2. Build TSDataset

TSDataset is one of the fundamental classes in PaddleTS. It is designed as a first-class citizen of the library to represent time series data and is widely used by other modules. Currently, it supports the representation of:

Univariate time series with or without covariates
Multivariate time series with or without covariates

TSDataset needs a proper time index which can either be of type pandas.DatetimeIndex or of type pandas.RangeIndex (representing sequential data without specific timestamps).

2.1. Built-in TSDataset

PaddleTS comes with a set of publicly available time series datasets, which can be easily accessed as TSDataset objects.

from paddlets.datasets.repository import get_dataset, dataset_list
print(f"built-in datasets: {dataset_list()}")

#built-in datasets: ['UNI_WTH', 'ETTh1', 'ETTm1', 'ECL', 'WTH']

Call get_dataset to access a built-in dataset; it returns a TSDataset object built from that dataset. The UNI_WTH dataset is a univariate dataset containing weather data from 2010 to 2014, where WetBulbCelsius represents the wet bulb temperature.

dataset = get_dataset('UNI_WTH')
print(type(dataset))

#<class 'paddlets.datasets.tsdataset.TSDataset'>

dataset.plot()

[Figure: plot of the UNI_WTH dataset]

2.2. Customized TSDataset

One can also build a TSDataset from a pandas.DataFrame or a CSV file.

import pandas as pd
import numpy as np
from paddlets import TSDataset

x = np.linspace(-np.pi, np.pi, 200)
sinx = np.sin(x) * 4 + np.random.randn(200)

df = pd.DataFrame(
    {
        'time_col': pd.date_range('2022-01-01', periods=200, freq='1h'),
        'value': sinx
    }
)
custom_dataset = TSDataset.load_from_dataframe(
    df,  # a pandas.DataFrame; to load from a CSV file path, use TSDataset.load_from_csv
    time_col='time_col',
    target_cols='value',
    freq='1h'
)
custom_dataset.plot()

[Figure: plot of the custom dataset]

To learn more about TSDataset, refer to Dataset.

3. Explore Data

To get a brief overview, simply call TSDataset.summary.

dataset.summary()

The output above gives a broad picture of the dataset. In particular, when the missing entry in the summary is not zero, it is usually necessary to fill the missing values before feeding the dataset to a model.
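As a library-independent illustration of one common way to fill gaps, the sketch below forward-fills missing values in a plain Python list. The helper name forward_fill is hypothetical and is not a PaddleTS API; it only shows the idea of carrying the last observed value forward.

```python
import math

def forward_fill(values):
    """Replace each NaN with the most recent non-NaN value (hypothetical helper)."""
    filled, last = [], None
    for v in values:
        if isinstance(v, float) and math.isnan(v):
            # Leading NaNs stay NaN because there is nothing to carry forward yet.
            filled.append(last if last is not None else v)
        else:
            filled.append(v)
            last = v
    return filled

series = [1.0, float("nan"), float("nan"), 4.0, float("nan")]
filled_series = forward_fill(series)
print(filled_series)  # [1.0, 1.0, 1.0, 4.0, 4.0]
```

Forward filling is only one option; interpolation or a constant fill may suit other datasets better.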

We can also do some advanced analysis about the dataset with the functionalities from the analysis module. For example, we can perform FFT on a column of the dataset as shown below.

#FFT
from paddlets.analysis import FFT
fft = FFT()
res = fft(dataset, columns='WetBulbCelsius')
fft.plot()
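To make the idea behind a frequency analysis concrete, here is a minimal sketch that finds the dominant period of a toy hourly signal with a naive discrete Fourier transform. It uses only the Python standard library and does not reproduce the PaddleTS FFT class; the function name dft_magnitudes and the toy signal are assumptions made for illustration.

```python
import cmath
import math

def dft_magnitudes(signal):
    """Return |X[k]| for k = 0..n//2 using a naive O(n^2) discrete Fourier transform."""
    n = len(signal)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(signal)))
        for k in range(n // 2 + 1)
    ]

# Toy hourly signal: four days of data with a 24-hour cycle.
n = 96
signal = [math.sin(2 * math.pi * t / 24) for t in range(n)]
mags = dft_magnitudes(signal)

# Skip k = 0 (the mean) and pick the strongest frequency bin.
peak_bin = max(range(1, len(mags)), key=lambda k: mags[k])
dominant_period = n / peak_bin
print(dominant_period)  # 24.0
```

A strong peak at a 24-step period suggests a daily cycle, which can guide the choice of lookback window later on.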

To learn more about FFT, refer to Analysis.

4. Train Model and Make Forecasting

This section shows how to train a deep neural network model for time series forecasting and how to forecast with the trained model.

4.1. Create the training, validation, and testing datasets

train_dataset, val_test_dataset = dataset.split(0.7)
val_dataset, test_dataset = val_test_dataset.split(0.5)
train_dataset.plot(add_data=[val_dataset,test_dataset], labels=['Val', 'Test'])

[Figure: training, validation, and test splits]
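The two ratio-based splits above yield roughly 70% training, 15% validation, and 15% test data, in chronological order. The following sketch reproduces that arithmetic on a plain list; split_by_ratio is a hypothetical helper, and the exact rounding that TSDataset.split applies is not assumed here.

```python
def split_by_ratio(series, ratio):
    """Split a chronologically ordered list at round(len * ratio) (hypothetical helper)."""
    cut = round(len(series) * ratio)
    return series[:cut], series[cut:]

series = list(range(1000))
train, val_test = split_by_ratio(series, 0.7)
val, test = split_by_ratio(val_test, 0.5)
print(len(train), len(val), len(test))  # 700 150 150
```

Because the data is never shuffled, every validation point comes after every training point, which avoids leaking future information into training.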

4.2. Train the model

We will use the built-in MLPRegressor model as an example to show how to train a model.

Initialize an MLPRegressor instance with two required parameters:

in_chunk_len: the size of the lookback window, i.e. the number of time steps fed to the model
out_chunk_len: the size of the forecasting horizon, i.e. the number of time steps output by the model

There are also optional parameters when initializing the MLPRegressor instance, such as max_epochs and optimizer_params.

from paddlets.models.forecasting import MLPRegressor
mlp = MLPRegressor(
    in_chunk_len = 7 * 24,
    out_chunk_len = 24,
    max_epochs=100
)
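To illustrate how in_chunk_len and out_chunk_len shape the training samples, the sketch below slides a window over a short series and pairs each lookback window with the horizon that follows it. The helper make_samples is hypothetical, and a stride of one step is an assumption; it is not the PaddleTS sampling code.

```python
def make_samples(series, in_chunk_len, out_chunk_len):
    """Slide a window over the series and return (past, future) pairs."""
    samples = []
    last_start = len(series) - in_chunk_len - out_chunk_len
    for start in range(last_start + 1):
        past = series[start:start + in_chunk_len]
        future = series[start + in_chunk_len:start + in_chunk_len + out_chunk_len]
        samples.append((past, future))
    return samples

series = list(range(10))
samples = make_samples(series, in_chunk_len=4, out_chunk_len=2)
print(len(samples))  # 5
print(samples[0])    # ([0, 1, 2, 3], [4, 5])
```

With in_chunk_len = 7 * 24 and out_chunk_len = 24 on hourly data, each sample would pair one week of history with the following day.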

Now, we can train the model with train_dataset and optional val_dataset.

mlp.fit(train_dataset, val_dataset)

To learn more about MLPRegressor, refer to Models.

4.3. Make Forecasting

Next, we can forecast with the trained model. The length of the predicted result equals out_chunk_len.

subset_test_pred_dataset = mlp.predict(val_dataset)
subset_test_pred_dataset.plot()

[Figure: predicted values]

Plot the predicted results and ground-truth values for comparison.

subset_test_dataset, _ = test_dataset.split(len(subset_test_pred_dataset.target))
subset_test_dataset.plot(add_data=subset_test_pred_dataset, labels=['Pred'])

[Figure: predicted values against ground truth]

When the desired prediction length is longer than the forecasting horizon of the fitted model (i.e. out_chunk_len), we can call recursive_predict. For example, with the previously loaded UNI_WTH dataset, suppose we want to forecast WetBulbCelsius for the next 96 hours while the forecasting horizon of the fitted model is 24 hours. The following code illustrates the use of recursive_predict in this case:

subset_test_pred_dataset = mlp.recursive_predict(val_dataset, 24 * 4)
subset_test_dataset, _ = test_dataset.split(len(subset_test_pred_dataset.target))
subset_test_dataset.plot(add_data=subset_test_pred_dataset, labels=['Pred'])

[Figure: 96-hour recursive predictions against ground truth]
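The general idea of recursive forecasting can be sketched without any library: predict one chunk, append it to the history, and predict again until the desired horizon is covered. The functions recursive_forecast and naive_model below are hypothetical stand-ins and make no claim about how recursive_predict is implemented internally.

```python
def recursive_forecast(history, predict_chunk, in_chunk_len, out_chunk_len, horizon):
    """Forecast `horizon` steps by repeatedly feeding predictions back as input."""
    extended = list(history)
    forecast = []
    while len(forecast) < horizon:
        window = extended[-in_chunk_len:]          # most recent lookback window
        chunk = predict_chunk(window, out_chunk_len)
        forecast.extend(chunk)
        extended.extend(chunk)                     # predictions become future inputs
    return forecast[:horizon]

def naive_model(window, out_chunk_len):
    """Stand-in model: continue the last observed step-to-step difference."""
    step = window[-1] - window[-2]
    return [window[-1] + step * (i + 1) for i in range(out_chunk_len)]

history = [0, 1, 2, 3, 4, 5, 6, 7]
forecast = recursive_forecast(history, naive_model, in_chunk_len=4, out_chunk_len=3, horizon=7)
print(forecast)  # [8, 9, 10, 11, 12, 13, 14]
```

Since later chunks are predicted from earlier predictions rather than observations, errors tend to accumulate as the horizon grows.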

5. Evaluation and Backtest

In addition to visually comparing the predicted results with the ground-truth values, we can also evaluate the performance of the model by computing built-in metrics.

from paddlets.metrics import MAE
mae = MAE()
mae(subset_test_dataset, subset_test_pred_dataset)

#{'WetBulbCelsius': 0.6734366664042076}
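For reference, the mean absolute error is the average of the absolute differences between true and predicted values. The sketch below computes it on two short lists; mean_absolute_error is a hypothetical helper written for this illustration and is independent of the paddlets.metrics.MAE class.

```python
def mean_absolute_error(y_true, y_pred):
    """Average of |true - predicted| over aligned observations."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

y_true = [10.0, 12.0, 11.0, 13.0]
y_pred = [11.0, 12.0, 9.0, 14.0]
mae_value = mean_absolute_error(y_true, y_pred)
print(mae_value)  # 1.0
```

The result is in the same unit as the target, so an MAE of 1.0 here means the forecast is off by one unit on average.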

We can also evaluate the performance of the model on the whole test dataset by calling backtest.

from paddlets.utils import backtest
metrics_score = backtest(
    data=val_test_dataset,
    model=mlp,
    start=0.5,
    predict_window=24,
    stride=24,
    metric=mae
)
print(f"mae: {metrics_score}")

#mae: 1.3767653357878213
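A rolling backtest can be pictured as a sequence of forecast windows that begin at a start position and advance by a fixed stride. The sketch below lists those windows and scores a last-value forecast over them. The helpers backtest_windows and rolling_mae are hypothetical, and the way the fractional start is turned into an index is an assumption; the actual backtest function may treat these parameters differently.

```python
def backtest_windows(n, start, predict_window, stride):
    """Return [begin, end) index pairs of forecast windows (illustrative only)."""
    begin = round(n * start) if isinstance(start, float) else start
    windows = []
    while begin + predict_window <= n:
        windows.append((begin, begin + predict_window))
        begin += stride
    return windows

def rolling_mae(series, start, predict_window, stride):
    """Average absolute error of a last-value forecast over all windows."""
    errors = []
    for begin, end in backtest_windows(len(series), start, predict_window, stride):
        last_seen = series[begin - 1]  # only data before the window is visible
        errors.extend(abs(y - last_seen) for y in series[begin:end])
    return sum(errors) / len(errors)

series = list(range(20))
windows = backtest_windows(len(series), 0.5, predict_window=4, stride=4)
print(windows)  # [(10, 14), (14, 18)]
score = rolling_mae(series, 0.5, predict_window=4, stride=4)
print(score)    # 2.5
```

Setting stride equal to predict_window, as in the call above, produces windows that neither overlap nor leave gaps.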

To learn more about backtest, refer to backtest.

6. Covariates

In addition to the univariate or multivariate target time series, PaddleTS also allows users to supply more contextual information in the form of covariates.

Covariates can be one of the following three types:

known_covariate: variables whose future values are known or can be forecast, e.g. weather forecasts
observed_covariate: variables that can only be observed in historical data, e.g. measured temperatures
static_covariate: constant variables

6.2. Customized Covariates

We can also build a TSDataset with only covariates from a pandas.DataFrame or a CSV file.

import pandas as pd
from paddlets import TSDataset
df = pd.DataFrame(
    {
        'time_col': pd.date_range(
            dataset.target.time_index[0],
            periods=len(dataset.target),
            freq=dataset.freq
        ),
        'cov1': [i for i in range(len(dataset.target))]
    }
)
dataset_cus_cov = TSDataset.load_from_dataframe(
    df,
    time_col='time_col',
    known_cov_cols='cov1',
    freq=dataset.freq
)
print(dataset_cus_cov)

[Figure: printed covariate-only TSDataset]

Then we can concatenate this TSDataset with an existing TSDataset to produce a new TSDataset with both the target and covariate time series.

dataset_cus_target_cov = TSDataset.concat([dataset, dataset_cus_cov])
print(dataset_cus_target_cov)

[Figure: printed TSDataset with target and covariate]

7. Train Model with Covariates

We use RNNBlockRegressor as an example to show how to build a model using a TSDataset with covariates.

from paddlets.models.forecasting import RNNBlockRegressor
rnn_reg = RNNBlockRegressor(
    in_chunk_len = 7 * 24,
    out_chunk_len = 24,
    skip_chunk_len = 0,
    sampling_stride = 24,
    max_epochs = 100
)

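Create the training, validation, and testing datasets from a dataset with generated date-related covariates:

from paddlets.transform import TimeFeatureGenerator
dataset_gen_target_cov = TimeFeatureGenerator(feature_cols=['dayofyear', 'weekofyear', 'is_workday']).fit_transform(dataset)
train_dataset, val_test_dataset = dataset_gen_target_cov.split(0.8)
val_dataset, test_dataset = val_test_dataset.split(0.5)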

Normalize the datasets with StandardScaler from paddlets.transform:

from paddlets.transform import StandardScaler
scaler = StandardScaler()
scaler.fit(train_dataset)
train_dataset_scaled = scaler.transform(train_dataset)
val_test_dataset_scaled = scaler.transform(val_test_dataset)
val_dataset_scaled = scaler.transform(val_dataset)
test_dataset_scaled = scaler.transform(test_dataset)
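Note that the scaler above is fitted on the training data only and then applied to every split. The sketch below shows the same pattern with a tiny z-score scaler for plain lists. SimpleStandardScaler is a hypothetical class written for this illustration; it is not the PaddleTS StandardScaler and makes no claim about its internals.

```python
import statistics

class SimpleStandardScaler:
    """Z-score scaler for a list of floats (hypothetical, for illustration)."""

    def fit(self, values):
        # Statistics come from the training data only.
        self.mean_ = statistics.fmean(values)
        self.std_ = statistics.pstdev(values)
        return self

    def transform(self, values):
        return [(v - self.mean_) / self.std_ for v in values]

    def inverse_transform(self, values):
        return [v * self.std_ + self.mean_ for v in values]

train = [2.0, 4.0, 6.0, 8.0]
val = [10.0, 12.0]

scaler = SimpleStandardScaler().fit(train)
train_scaled = scaler.transform(train)
val_scaled = scaler.transform(val)  # reuses the training mean and std
restored_val = scaler.inverse_transform(val_scaled)
```

Fitting on the training split alone keeps validation and test statistics out of the model's inputs, and inverse_transform maps scaled values back to the original unit.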

Now, we can fit the model and evaluate the performance:

rnn_reg.fit(train_dataset_scaled, val_dataset_scaled)

from paddlets.utils import backtest
metrics_score = backtest(
    data=val_test_dataset_scaled,
    model=rnn_reg,
    start=0.5,
    predict_window=24,
    stride=24,
    metric=mae
)
print(f"mae: {metrics_score}")

#mae: 0.3021404146482557

8. Pipeline

Let’s wrap up everything from the previous sections into a pipeline to create an end-to-end solution.

from paddlets.pipeline import Pipeline
from paddlets.transform import TimeFeatureGenerator

train_dataset, val_test_dataset = dataset.split(0.8)
val_dataset, test_dataset = val_test_dataset.split(0.5)

Here we initialize a Pipeline instance to accommodate the date-related covariate generation, data normalization, and model training.

pipe = Pipeline([
    (TimeFeatureGenerator, {"feature_cols": ['dayofyear', 'weekofyear', 'is_workday'], "extend_points": 24}),
    (StandardScaler, {}),
    (RNNBlockRegressor, {
        "in_chunk_len": 7 * 24,
        "out_chunk_len": 24,
        "skip_chunk_len": 0,
        "sampling_stride": 24,
        "max_epochs": 100
    })
])
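The list of (class, parameters) pairs above describes a chain in which each transform feeds the next and the final step is the model. A toy version of that chaining pattern is sketched below. MiniPipeline, AddOffset, and MeanModel are hypothetical classes invented for this example; they illustrate the pattern only and are not how the PaddleTS Pipeline is implemented.

```python
class AddOffset:
    """Toy transform: add a constant to every value."""
    def __init__(self, offset):
        self.offset = offset
    def fit(self, data):
        return self
    def transform(self, data):
        return [x + self.offset for x in data]

class MeanModel:
    """Toy model: always predict the mean of the training data."""
    def fit(self, data):
        self.mean_ = sum(data) / len(data)
        return self
    def predict(self, data):
        return self.mean_

class MiniPipeline:
    """Instantiate (class, kwargs) steps; the last step is treated as the model."""
    def __init__(self, steps):
        self.steps = [cls(**kwargs) for cls, kwargs in steps]
    def fit(self, data):
        for transform in self.steps[:-1]:
            data = transform.fit(data).transform(data)
        self.steps[-1].fit(data)
        return self
    def predict(self, data):
        for transform in self.steps[:-1]:
            data = transform.transform(data)
        return self.steps[-1].predict(data)

mini_pipe = MiniPipeline([(AddOffset, {"offset": 10}), (MeanModel, {})]).fit([1, 2, 3])
prediction = mini_pipe.predict([1, 2, 3])
print(prediction)  # 12.0
```

Bundling the steps this way ensures the same transforms run, in the same order, at both training and prediction time.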

Next, we can fit the pipeline and evaluate the performance:

pipe.fit(train_dataset, val_dataset)

from paddlets.utils import backtest
metrics_score = backtest(
    data=val_test_dataset,
    model=pipe,
    start=0.5,
    predict_window=24,
    stride=24,
    metric=mae
)
print(f"mae: {metrics_score}")

#mae: 4.992150762390378

To learn more about Pipeline, refer to Pipeline.

9. AutoTS

AutoTS is an automated machine learning tool for PaddleTS.

It frees the user from selecting hyperparameters for PaddleTS models or PaddleTS pipelines.

from paddlets.automl.autots import AutoTS
from paddlets.models.forecasting import MLPRegressor
from paddlets.datasets.repository import get_dataset
tsdataset = get_dataset("UNI_WTH")

Here we initialize an AutoTS model with MLPRegressor, setting in_chunk_len to 96 and out_chunk_len to 2.

autots_model = AutoTS(MLPRegressor, 96, 2)

Next, we can train the AutoTS model and use it to make predictions, just like a PaddleTS model.

AutoTS has a built-in recommended search space for PaddleTS models, so hyperparameter optimization for this MLPRegressor runs over the default search space, and the best parameters found during optimization are used to fit the MLPRegressor.

autots_model.fit(tsdataset)
predicted_tsdataset = autots_model.predict(tsdataset)

AutoTS also allows us to obtain the best parameters found during the optimization process.

best_param = autots_model.best_param
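To show what searching a space for the best parameters means in the simplest case, the sketch below runs an exhaustive grid search over a small hand-written search space. This is a swapped-in technique for illustration: the search algorithm AutoTS actually uses is not described here, and grid_search, toy_validation_loss, and the parameter names in search_space are all hypothetical.

```python
import itertools

def grid_search(search_space, score_fn):
    """Try every combination in the search space and return the lowest-scoring one."""
    names = list(search_space)
    best_param, best_score = None, float("inf")
    for combo in itertools.product(*(search_space[name] for name in names)):
        param = dict(zip(names, combo))
        score = score_fn(param)
        if score < best_score:
            best_param, best_score = param, score
    return best_param, best_score

# Hypothetical search space and a toy stand-in for a validation loss.
search_space = {"hidden_size": [16, 32, 64], "learning_rate": [0.001, 0.01, 0.1]}

def toy_validation_loss(param):
    return abs(param["hidden_size"] - 32) + abs(param["learning_rate"] - 0.01)

best_param, best_score = grid_search(search_space, toy_validation_loss)
print(best_param)  # {'hidden_size': 32, 'learning_rate': 0.01}
```

In a real setting the score function would train a model with the given parameters and return its validation loss, which is what makes automated search expensive.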

To learn more about AutoTS, refer to AutoTS.