Third-party Model
PaddleTS allows users to implement time series models based on third party models and verify the feasibility and performance efficiently.
scikit-learn is currently the only supported third-party library for PaddleTS.
1. Make Time Series Model Based On Third-party Model
PaddleTS provides make_ml_model interface that allows users to build time series models by simply specifying a third party model class and relevant parameters without extra development.
1.1 Minimal Example
Below is an example of how to make time series models based on sklearn.neighbors.KNeighborsRegressor .
from paddlets.datasets.repository import get_dataset
from paddlets.models.forecasting.ml.ml_model_wrapper import make_ml_model
from sklearn.neighbors import KNeighborsRegressor
# prepare data
paddlets_ds = get_dataset("UNI_WTH")
# make model based on sklearn.neighbors.KNeighborsRegressor
model = make_ml_model(
in_chunk_len=3,
out_chunk_len=1,
model_class=KNeighborsRegressor
)
# fit
model.fit(train_data=paddlets_ds)
# predict
predicted_ds = model.predict(paddlets_ds)
# WetBulbCelsius
# 2014-01-01 -1.72
1.2 Convert MLDataLoader to Trainable / Predictable ndarray
The third-party library such as scikit-learn usually accepts numpy.ndarray data as fit and predict method inputs, while PaddleTS uses paddlets.models.forecasting.ml.adapter.ml_dataloader.MLDataLoader to represent trainable / predictable time series data. Thus, make_ml_model provides 2 optional arguments udf_ml_dataloader_to_fit_ndarray and udf_ml_dataloader_to_predict_ndarray allow users to convert MLDataLoader to an numpy.ndarray object.
By default, make_ml_model uses default_ml_dataloader_to_fit_ndarray and default_ml_dataloader_to_predict_ndarray to convert MLDataLoader to numpy.ndarray for fit and predict method, respectively. Also, users are able to develop user-defined convert functions to get expected trainable / predictable output.
from paddlets.datasets.repository import get_dataset
from paddlets.models.forecasting.ml.adapter.ml_dataloader import MLDataLoader
from paddlets.models.forecasting.ml.ml_model_wrapper import make_ml_model
from sklearn.neighbors import KNeighborsRegressor
# prepare data
paddlets_ds = get_dataset("UNI_WTH")
# develop user-defined convert functions
def udf_ml_dataloader_to_fit_ndarray(
ml_dataloader: MLDataLoader,
model_init_params: Dict[str, Any],
in_chunk_len: int,
skip_chunk_len: int,
out_chunk_len: int
):
# build and return converted numpy.ndarray object that sklearn model fit method accepts.
pass
def udf_ml_dataloader_to_predict_ndarray(
ml_dataloader: MLDataLoader,
model_init_params: Dict[str, Any],
in_chunk_len: int,
skip_chunk_len: int,
out_chunk_len: int
):
# build and return converted numpy.ndarray object that sklearn model predict method accepts.
pass
# pass the above 2 udf arguments to make_ml_model
model = make_ml_model(
in_chunk_len=3,
out_chunk_len=1,
model_class=KNeighborsRegressor,
udf_ml_dataloader_to_fit_ndarray=udf_ml_dataloader_to_fit_ndarray,
udf_ml_dataloader_to_fit_ndarray=udf_ml_dataloader_to_predict_ndarray
)
# fit
model.fit(train_data=paddlets_ds)
# predict
predicted_ds = model.predict(paddlets_ds)
2. Multi-step forecasting
The time series models also support multi-timestep forecasting by calling recursive_predict .
from paddlets.datasets.repository import get_dataset
from paddlets.models.forecasting.ml.ml_model_wrapper import make_ml_model
# prepare data
paddlets_ds = get_dataset("UNI_WTH")
# make model
model = make_ml_model(
in_chunk_len=3,
out_chunk_len=1,
model_class=KNeighborsRegressor
)
# fit
model.fit(train_data=paddlets_ds)
# recursively predict
recursively_predicted_ds = model.recursive_predict(tsdataset=paddlets_ds, predict_length=4)
# WetBulbCelsius
# 2014-01-01 00:00:00 -1.72
# 2014-01-01 01:00:00 -1.88
# 2014-01-01 02:00:00 -2.18
# 2014-01-01 03:00:00 -2.44