Third-Party and User-Defined Transforms
1. Third-Party Transform
(The current version supports sklearn-style dataset transformations.)
PaddleTS provides the make_ts_transform function, which wraps general-purpose third-party data transformation modules so that they can be used in paddlets.
from paddlets.datasets.repository import get_dataset

# Load a built-in sample dataset and inspect it before transforming.
dataset = get_dataset('UNI_WTH')
print(dataset)

from paddlets.transform import make_ts_transform
from sklearn.preprocessing import MaxAbsScaler

# Wrap the sklearn scaler so that it consumes and produces a TSDataset.
ts_max_abs_scaler = make_ts_transform(
    MaxAbsScaler,
    drop_origin_columns=True,
    per_col_transform=True
)
dataset = ts_max_abs_scaler.fit_transform(dataset)
print(dataset)
As shown in the above example, make_ts_transform turns the sklearn MaxAbsScaler class into a ts_max_abs_scaler object.
make_ts_transform wraps MaxAbsScaler automatically, so the wrapped object supports transforming time-series data while retaining the scaler's core functionality.
The interfaces of ts_max_abs_scaler consume a TSDataset and produce another TSDataset.
A transformation object built by make_ts_transform can be used just like a built-in data transformation module: it has the same interface and functions.
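To make concrete what the wrapped scaler computes, here is a minimal pure-Python sketch of max-abs scaling applied column by column. It does not use paddlets or sklearn; the helper names fit_max_abs and transform_max_abs and the sample columns are hypothetical and exist only for illustration.

```python
# Pure-Python sketch of per-column max-abs scaling (no paddlets or sklearn needed).
# fit_max_abs / transform_max_abs are hypothetical helper names, not library APIs.
from typing import Dict, List


def fit_max_abs(columns: Dict[str, List[float]]) -> Dict[str, float]:
    """Learn one scale per column: the largest absolute value in that column."""
    scales = {}
    for name, values in columns.items():
        max_abs = max(abs(v) for v in values)
        # Avoid division by zero for all-zero columns by falling back to 1.0.
        scales[name] = max_abs if max_abs != 0 else 1.0
    return scales


def transform_max_abs(
    columns: Dict[str, List[float]], scales: Dict[str, float]
) -> Dict[str, List[float]]:
    """Divide every value by its column's scale, mapping the column into [-1, 1]."""
    return {
        name: [v / scales[name] for v in values]
        for name, values in columns.items()
    }


data = {"temperature": [-4.0, 2.0, 8.0], "humidity": [10.0, 50.0, 40.0]}
scales = fit_max_abs(data)
scaled = transform_max_abs(data, scales)
print(scaled)
# {'temperature': [-0.5, 0.25, 1.0], 'humidity': [0.2, 1.0, 0.8]}
```

Each column is scaled by its own maximum absolute value, so the columns do not influence one another.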
The example below illustrates its usage in a pipeline.
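from paddlets.transform import KSigma, TimeFeatureGenerator

# Built-in transforms and the wrapped scaler share the same interface,
# so they can be applied one after another.
# The column names are illustrative; use columns that exist in your dataset.
transform_list = [
    KSigma(cols=["observed_a", "observed_b", "known_c", "known_d"], k=1),
    TimeFeatureGenerator(),
    ts_max_abs_scaler
]
for transformer in transform_list:
    dataset = transformer.fit_transform(dataset)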
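The chaining pattern in this loop can be tried without paddlets. The sketch below uses two hypothetical toy transforms (AddOffset and ScaleByMaxAbs, neither of which is part of paddlets) that operate on plain lists. It shows that any object exposing fit_transform can take part in the loop, and that the order of the list affects the result.

```python
# Toy illustration of chaining transforms through a shared fit_transform interface.
# AddOffset and ScaleByMaxAbs are hypothetical classes, not part of paddlets.
from typing import List


class AddOffset:
    """Shift every value by a fixed offset."""

    def __init__(self, offset: float) -> None:
        self.offset = offset

    def fit_transform(self, values: List[float]) -> List[float]:
        return [v + self.offset for v in values]


class ScaleByMaxAbs:
    """Divide every value by the largest absolute value in the input."""

    def fit_transform(self, values: List[float]) -> List[float]:
        max_abs = max(abs(v) for v in values) or 1.0  # guard against all zeros
        return [v / max_abs for v in values]


def run_pipeline(transformers: list, values: List[float]) -> List[float]:
    """Feed the output of each transform into the next one."""
    for transformer in transformers:
        values = transformer.fit_transform(values)
    return values


raw = [-4.0, 0.0, 2.0]
shift_then_scale = run_pipeline([AddOffset(2.0), ScaleByMaxAbs()], raw)
scale_then_shift = run_pipeline([ScaleByMaxAbs(), AddOffset(2.0)], raw)
print(shift_then_scale)  # [-0.5, 0.5, 1.0]
print(scale_then_shift)  # [1.0, 2.0, 2.5]
```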
2. User-Defined Transform
The make_ts_transform function can also wrap user-defined data transformation modules into paddlets.
from sklearn.preprocessing import MaxAbsScaler

# A user-defined scaler that prints a message before delegating to sklearn.
class MyMaxAbsScaler(MaxAbsScaler):
    def fit(self, dataset):
        print("MyMaxAbsScaler fit start!")
        return super().fit(dataset)

# Wrap the user-defined class exactly like a third-party one.
ts_my_max_abs_scaler = make_ts_transform(
    MyMaxAbsScaler,
    drop_origin_columns=True,
    per_col_transform=True
)
dataset = ts_my_max_abs_scaler.fit_transform(dataset)
print(dataset)
make_ts_transform can therefore wrap both general-purpose third-party and user-defined data transformation modules into paddlets.
Not every third-party data transformation module is suitable for processing time-series data, and user-defined modules are even harder to control.
For this reason, make_ts_transform includes logic that checks the validity of the functions it wraps, based on the input dataset and the output dataset.
This prevents unsuitable modules from causing failures in later processing steps, where they would be harder to troubleshoot.
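The exact checks performed by make_ts_transform are not described here. As a rough illustration of the idea of validating a transform by comparing its input and output, the sketch below assumes one plausible rule: a transform must return one numeric row per input row, so that every output row still corresponds to a timestamp. The function check_transform_output and both sample transforms are hypothetical.

```python
# Hypothetical input/output validity check; NOT the actual paddlets logic.
# Assumed rule: the output must keep one numeric row per input row.
from typing import Callable, List


def check_transform_output(
    transform: Callable[[List[List[float]]], object],
    sample_rows: List[List[float]],
) -> None:
    """Raise ValueError if `transform` breaks the assumed row-preserving rule."""
    output = transform(sample_rows)
    if not isinstance(output, (list, tuple)):
        raise ValueError("transform must return a list or tuple of rows")
    if len(output) != len(sample_rows):
        raise ValueError(f"expected {len(sample_rows)} rows, got {len(output)}")
    for row in output:
        if not all(isinstance(v, (int, float)) for v in row):
            raise ValueError("every output value must be numeric")


def halve(rows: List[List[float]]) -> List[List[float]]:
    """A well-behaved transform: same shape, numeric values."""
    return [[v / 2 for v in row] for row in rows]


def drop_last_row(rows: List[List[float]]) -> List[List[float]]:
    """A badly behaved transform: it loses one time step."""
    return rows[:-1]


sample = [[1.0, 2.0], [3.0, 4.0]]
check_transform_output(halve, sample)  # passes silently

try:
    check_transform_output(drop_last_row, sample)
    rejected, message = False, ""
except ValueError as exc:
    rejected, message = True, str(exc)
print(rejected, message)  # True expected 2 rows, got 1
```

Rejecting a transform when it is wrapped, rather than letting it fail deep inside a later step, keeps the error message close to its cause.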