Representation Model Tutorial
The representation model is a self-supervised model whose goal is to learn a general feature representation that can be reused by downstream tasks. Mainstream self-supervised learning falls into generative-based and contrastive-based methods; TS2Vec is a self-supervised model based on the contrastive method.
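As a rough intuition for the contrastive idea, here is a generic InfoNCE-style sketch (not TS2Vec's actual hierarchical contrastive loss): representations of two views of the same segment are pulled together, while other segments in the batch are pushed apart.

```python
import numpy as np

def info_nce(anchor, positive, temperature=0.1):
    """Generic InfoNCE-style contrastive loss over a batch of vectors."""
    # cosine similarity between every anchor and every candidate positive
    a = anchor / np.linalg.norm(anchor, axis=1, keepdims=True)
    p = positive / np.linalg.norm(positive, axis=1, keepdims=True)
    logits = a @ p.T / temperature  # (batch, batch)
    # the matching pair sits on the diagonal; treat it as the "correct class"
    logits -= logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))

batch = np.random.default_rng(0).normal(size=(8, 16))
noisy = batch + 0.01 * np.random.default_rng(1).normal(size=(8, 16))
loss = info_nce(batch, noisy)  # small: views of the same item agree
```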
- Currently supported repr model: TS2Vec
- The use of self-supervised models is divided into two phases:
Pre-training on unlabeled data, independent of any downstream task
Fine-tuning on a downstream task using labeled data
- The representation model follows the usage paradigm of self-supervised models:
Train the representation model
Feed the representation model's output to the downstream task (in this tutorial, a forecasting task)
- To accommodate both beginners and experienced developers, there are two ways to use representation models:
A pipeline that combines the representation model and the downstream task, which is friendly to beginners
Decoupling the representation model from the downstream task, showing in detail how the two are combined
1. Method one
A pipeline that combines the representation model and downstream tasks
- Currently supported pipeline: ReprForecasting
1.1 Prepare the data
import numpy as np
np.random.seed(2022)
import pandas as pd
import paddle
paddle.seed(2022)
from paddlets.models.representation import TS2Vec
from paddlets.datasets.repository import get_dataset
from paddlets.models.representation import ReprForecasting
data = get_dataset('ETTh1')
data, _ = data.split('2016-09-22 06:00:00')
train_data, test_data = data.split('2016-09-21 05:00:00')
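The timestamps above suggest that split keeps everything up to and including the given timestamp in the first part. The plain-pandas sketch below (toy hourly data with hypothetical values, not the TSDataset API) illustrates that boundary arithmetic under this assumption:

```python
import numpy as np
import pandas as pd

# toy hourly series standing in for ETTh1's 'OT' column
idx = pd.date_range("2016-09-20 00:00:00", periods=72, freq="h")
df = pd.DataFrame({"OT": np.arange(72)}, index=idx)

# everything up to (and including) the cut point forms the training part,
# everything after it forms the test part
train = df.loc[:"2016-09-21 05:00:00"]
test = df.loc["2016-09-21 06:00:00":]
sizes = (len(train), len(test))  # 30 training hours, 42 test hours
```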
1.2 Training
For more information about ReprForecasting, please check the ReprForecasting API documentation.
ts2vec_params = {"segment_size": 200,
"repr_dims": 320,
"batch_size": 32,
"sampling_stride": 200,
"max_epochs": 20}
model = ReprForecasting(in_chunk_len=200,
out_chunk_len=24,
sampling_stride=1,
repr_model=TS2Vec,
repr_model_params=ts2vec_params)
model.fit(train_data)
1.3 Prediction
model.predict(train_data)
1.4 Backtest
from paddlets.utils.backtest import backtest
score, predicts = backtest(
data,
model,
start="2016-09-21 06:00:00",
predict_window=24,
stride=24,
return_predicts=True)
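Conceptually, backtest slides a fixed prediction window over the held-out range, advancing by stride points each step. The plain-Python loop below sketches that idea (a simplified illustration, not the paddlets implementation):

```python
# simplified rolling backtest: predict `predict_window` points at a time,
# then advance the anchor by `stride` until the series is exhausted
def rolling_windows(series_len, start, predict_window, stride):
    windows = []
    anchor = start
    while anchor + predict_window <= series_len:
        windows.append((anchor, anchor + predict_window))
        anchor += stride
    return windows

# with window == stride the windows tile the range without overlap
windows = rolling_windows(series_len=100, start=0, predict_window=24, stride=24)
```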
1.5 Save and load
# save the model
model.save(path="/tmp/repr_test/")
# load the model
model = ReprForecasting.load(path="/tmp/repr_test/")
2. Method two
Decoupling the representation model from the downstream task. This method is divided into two stages: the first stage trains the representation model and computes the representations; the second stage trains the downstream task on those representations and makes predictions.
The first stage:
Training of the representation model
Output of training set and test set representation results
The second stage:
Build training and test samples for regression models
training and prediction
2.1 Prepare the data
import numpy as np
np.random.seed(2022)
import pandas as pd
import paddle
paddle.seed(2022)
from paddlets.models.representation.dl.ts2vec import TS2Vec
from paddlets.datasets.repository import get_dataset
data = get_dataset('ETTh1')
data, _ = data.split('2016-09-22 06:00:00')
train_data, test_data = data.split('2016-09-21 05:00:00')
2.2 Training of the representation model
# initialize the TS2Vec object
ts2vec = TS2Vec(
segment_size=200,
repr_dims=320,
batch_size=32,
max_epochs=20,
)
# training
ts2vec.fit(train_data)
2.3 Output of training set and test set representation results
sliding_len = 200 # use the past sliding_len points to infer the representation of each time point
all_reprs = ts2vec.encode(data, sliding_len=sliding_len)
split_tag = len(train_data['OT'])
train_reprs = all_reprs[:, :split_tag]
test_reprs = all_reprs[:, split_tag:]
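ts2vec.encode returns an array shaped roughly (n_instances, n_timestamps, repr_dims), so the split above slices along the time axis. The toy numpy snippet below (illustrative sizes, not the real repr_dims of 320) shows what that slicing does:

```python
import numpy as np

# stand-in for the encode output: 1 instance, 100 time steps, 8 repr dims
all_reprs = np.zeros((1, 100, 8))
split_tag = 70  # number of time steps belonging to the training part

train_reprs = all_reprs[:, :split_tag]  # (1, 70, 8)
test_reprs = all_reprs[:, split_tag:]   # (1, 30, 8)
```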
2.4 Build training and test samples for regression models
# generate samples
def generate_pred_samples(features, data, pred_len, drop=0):
    n = data.shape[1]
    # drop the last pred_len points: they have no complete future window
    features = features[:, :-pred_len]
    # for each remaining time point, stack the next pred_len raw values as its label
    labels = np.stack([data[:, i:1 + n + i - pred_len] for i in range(pred_len)], axis=2)[:, 1:]
    # optionally drop the first `drop` points, whose receptive field is incomplete
    features = features[:, drop:]
    labels = labels[:, drop:]
    return features.reshape(-1, features.shape[-1]), \
           labels.reshape(-1, labels.shape[2] * labels.shape[3])
pre_len = 24 # prediction length
# generate training samples
train_to_numpy = train_data.to_numpy()
train_to_numpy = np.expand_dims(train_to_numpy, 0) # keep the same dimensions as the encode output
train_features, train_labels = generate_pred_samples(train_reprs, train_to_numpy, pre_len, drop=sliding_len)
# generate test samples
test_to_numpy = test_data.to_numpy()
test_to_numpy = np.expand_dims(test_to_numpy, 0)
test_features, test_labels = generate_pred_samples(test_reprs, test_to_numpy, pre_len)
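To see how the shapes flow through generate_pred_samples, here is a toy run with small synthetic arrays (sizes chosen for illustration; the function is re-declared so the snippet is self-contained):

```python
import numpy as np

def generate_pred_samples(features, data, pred_len, drop=0):
    n = data.shape[1]
    features = features[:, :-pred_len]
    labels = np.stack([data[:, i:1 + n + i - pred_len] for i in range(pred_len)], axis=2)[:, 1:]
    features = features[:, drop:]
    labels = labels[:, drop:]
    return features.reshape(-1, features.shape[-1]), \
           labels.reshape(-1, labels.shape[2] * labels.shape[3])

# toy input: 1 instance, 10 time steps, 4 repr dims, 2 raw target columns
reprs = np.random.rand(1, 10, 4)
raw = np.random.rand(1, 10, 2)

feats, labels = generate_pred_samples(reprs, raw, pred_len=3)
# feats:  (7, 4)  one sample per usable time step
# labels: (7, 6)  next 3 steps x 2 columns, flattened per sample
```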
2.5 Training and prediction
# training
from sklearn.linear_model import Ridge
lr = Ridge(alpha=0.1)
lr.fit(train_features, train_labels)
# predict
test_pred = lr.predict(test_features)
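The fitted Ridge model can then be scored, for example with mean squared error. The sketch below uses random stand-in arrays with the same layout as the generated samples (8 feature dims, pre_len x n_columns = 6 flattened label values), not the real TS2Vec representations:

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(2022)
train_features = rng.normal(size=(50, 8))
train_labels = rng.normal(size=(50, 6))  # e.g. 3 future steps x 2 columns
test_features = rng.normal(size=(10, 8))
test_labels = rng.normal(size=(10, 6))

lr = Ridge(alpha=0.1)
lr.fit(train_features, train_labels)

test_pred = lr.predict(test_features)    # same (n_samples, 6) layout as the labels
mse = mean_squared_error(test_labels, test_pred)
```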