Representation Model Tutorial

The representation model (TS2Vect) is one of the self-supervised models, mainly hoping to learn a general feature expression for downstream tasks. The current mainstream self-supervised learning mainly includes Generative-based and Contrastive-based methods, TS2Vect is a Self-Supervised Model Based on Contrastive Method

The use of self-supervised models is divided into two phases:

Pre-training with unlabeled data, independent of downstream tasks
Fine-tune on downstream tasks using labeled data

TS2Vect follows the usage paradigm of self-supervised models:

Representational model training
Use the output of the representation model for the downstream task (the downstream task of the current case is the prediction task)

A minimal example

Below minimal example uses a built-in TS2vect model to illustrate the basic usage.

1. The first stage:

Training of the representation model

Output of training set and test set representation results

1.1 Prepare the data

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

from paddlets.datasets import TimeSeries, TSDataset
from paddlets.models.representation.dl.ts2vec import TS2Vec
from paddlets.models.common.callbacks.callbacks import Callback
from paddlets.datasets.repository import get_dataset, dataset_list

# 1 prepare the data
# Single target prediction target_cols is one column, multi-target prediction target_cols is multiple columns
data = get_dataset('ETTh1') # target_cols: 'OT'
data, _ = data.split('2016-09-22 06:00:00')
train_data, test_data = data.split('2016-09-21 05:00:00')

1.2 Training of the representation model

# initialize the TS2Vect object
ts2vec = TS2Vec(
 segment_size=200, # maximum sequence length
 repr_dims=320,
 batch_size=32,
 max_epochs=20,
)

# training
ts2vec.fit(train_data)

1.3 Output of training set and test set representation results

# output_shape: [n_instance, n_timestamps, repr_dims]
# n_instance: number of instances
# n_timestamps: sequence length
# repr_dims: the representation dimension
sliding_len = 100 # Use past sliding_len length points to infer the representation of the current point in time
all_reprs = ts2vec.encode(data, sliding_len=sliding_len)
split_tag = len(train_data['OT'])
train_reprs = all_reprs[:, :split_tag]
test_reprs = all_reprs[:, split_tag:]

2. The second stage:

Build training and test samples for regression models

training and prediction

2.1 Build training and test samples for regression models

# generate samples
def generate_pred_samples(features, data, pred_len, drop=0):
    n = data.shape[1]
    features = features[:, :-pred_len]
    labels = np.stack([ data[:, i:1+n+i-pred_len] for i in range(pred_len)], axis=2)[:, 1:]
    features = features[:, drop:]
    labels = labels[:, drop:]
    return features.reshape(-1, features.shape[-1]), \
             labels.reshape(-1, labels.shape[2]*labels.shape[3])

pre_len = 24 # prediction lengths

# generate training samples
train_to_numpy = train_data.to_numpy()
train_to_numpy = np.expand_dims(train_to_numpy.T, -1) # keep the same dimensions as the encode output
train_features, train_labels = generate_pred_samples(train_reprs, train_to_numpy, pre_len, drop=sliding_len)

# generate test samples
test_to_numpy = test_data.to_numpy()
test_to_numpy = np.expand_dims(test_to_numpy.T, -1)
test_features, test_labels = generate_pred_samples(test_reprs, test_to_numpy, pre_len)

2.2 Training and prediction

# training
from sklearn.linear_model import Ridge
lr = Ridge(alpha=0.1)
lr.fit(train_features, train_labels)

# predict
test_pred = lr.predict(test_features)