paddlets.models.ml_model_wrapper

class MLModelBaseWrapper(model_class: Type, in_chunk_len: int, out_chunk_len: int = 1, skip_chunk_len: int = 0, sampling_stride: int = 1, model_init_params: Optional[Dict[str, Any]] = None, fit_params: Optional[Dict[str, Any]] = None, predict_params: Optional[Dict[str, Any]] = None)[source]

Bases: MLBaseModel

Time series model base wrapper for third party models.

Parameters

model_class (Type) – Class type of the third party model.
in_chunk_len (int) – The size of the lookback window, i.e., the number of time steps fed to the model.
out_chunk_len (int) – The size of the forecasting horizon, i.e., the number of time steps output by the model.
skip_chunk_len (int, optional) – The number of time steps between in_chunk and out_chunk for a single sample. The skip chunk is neither used as a feature (i.e. X) nor a label (i.e. Y) for a single sample. By default, it will NOT skip any time steps.
sampling_stride (int, optional) – The number of time steps between the starts of consecutive samples. More precisely, let t be the time index of the target time series, t[i] be the start time of the i-th sample, and t[i+1] be the start time of the (i+1)-th sample; then sampling_stride equals t[i+1] - t[i].
model_init_params (Dict[str, Any], optional) – All params for initializing the third-party model.
fit_params (Dict[str, Any], optional) – All params for fitting the third-party model, except x_train / y_train.
predict_params (Dict[str, Any], optional) – All params for predicting with the third-party model, except x_test / y_test.
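The interaction of in_chunk_len, skip_chunk_len, out_chunk_len and sampling_stride can be illustrated with a minimal sketch in plain Python. The helper sample_windows is hypothetical and is not part of paddlets; it only enumerates the index ranges implied by the parameter descriptions above, assuming samples start at index 0 and only complete samples are kept.

```python
from typing import List, Tuple


def sample_windows(
    n_steps: int,
    in_chunk_len: int,
    out_chunk_len: int = 1,
    skip_chunk_len: int = 0,
    sampling_stride: int = 1,
) -> List[Tuple[range, range]]:
    """Return (in_chunk, out_chunk) index ranges for each sample.

    Hypothetical helper for illustration only; not a paddlets function.
    """
    # Total number of time steps covered by one sample.
    span = in_chunk_len + skip_chunk_len + out_chunk_len
    windows = []
    # Consecutive samples start sampling_stride steps apart.
    for start in range(0, n_steps - span + 1, sampling_stride):
        in_chunk = range(start, start + in_chunk_len)
        # The skip chunk sits between in_chunk and out_chunk and is unused.
        out_start = start + in_chunk_len + skip_chunk_len
        out_chunk = range(out_start, out_start + out_chunk_len)
        windows.append((in_chunk, out_chunk))
    return windows


windows = sample_windows(
    n_steps=10, in_chunk_len=3, out_chunk_len=2, skip_chunk_len=1, sampling_stride=2
)
for in_chunk, out_chunk in windows:
    print(list(in_chunk), "->", list(out_chunk))
```

With these values each sample spans 6 time steps, index 3 of the first sample is skipped, and the start indices of consecutive samples differ by sampling_stride.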

class SklearnModelWrapper(model_class: Type, in_chunk_len: int, out_chunk_len: int, skip_chunk_len: int = 0, sampling_stride: int = 1, model_init_params: Optional[Dict[str, Any]] = None, fit_params: Optional[Dict[str, Any]] = None, predict_params: Optional[Dict[str, Any]] = None, udf_ml_dataloader_to_fit_ndarray: Optional[Callable] = None, udf_ml_dataloader_to_predict_ndarray: Optional[Callable] = None)[source]

Bases: MLModelBaseWrapper

Time series model wrapper for sklearn third party models.

Parameters

model_class (Type) – Class type of the third party model.
in_chunk_len (int) – The size of the lookback window, i.e., the number of time steps fed to the model.
out_chunk_len (int) – The size of the forecasting horizon, i.e., the number of time steps output by the model.
skip_chunk_len (int, optional) – The number of time steps between in_chunk and out_chunk for a single sample. The skip chunk is neither used as a feature (i.e. X) nor a label (i.e. Y) for a single sample. By default, it will NOT skip any time steps.
sampling_stride (int, optional) – The number of time steps between the starts of consecutive samples. More precisely, let t be the time index of the target time series, t[i] be the start time of the i-th sample, and t[i+1] be the start time of the (i+1)-th sample; then sampling_stride equals t[i+1] - t[i].
model_init_params (Dict[str, Any], optional) – All params for initializing the third-party model.
fit_params (Dict[str, Any], optional) – All params for fitting the third-party model, except x_train / y_train.
predict_params (Dict[str, Any], optional) – All params for predicting with the third-party model, except x_test / y_test.
udf_ml_dataloader_to_fit_ndarray (Callable, optional) – User-defined function for converting an MLDataLoader object to a numpy.ndarray object that can be processed by the fit method of the third-party model.
udf_ml_dataloader_to_predict_ndarray (Callable, optional) – User-defined function for converting an MLDataLoader object to a numpy.ndarray object that can be processed by the predict method of the third-party model.

fit(train_data: TSDataset, valid_data: Optional[TSDataset] = None) → None[source]

Fit a machine learning model.

Parameters

train_data (TSDataset) – training dataset.
valid_data (TSDataset, optional) – validation dataset.

predict(tsdataset: TSDataset) → TSDataset[source]

Make prediction.

Parameters: tsdataset (TSDataset) – TSDataset to predict.
Returns: TSDataset with predictions.
Return type: TSDataset

default_sklearn_ml_dataloader_to_fit_ndarray(ml_dataloader: MLDataLoader, model_init_params: Dict[str, Any], in_chunk_len: int, skip_chunk_len: int, out_chunk_len: int) → Tuple[ndarray, Optional[ndarray]][source]

Default function for converting MLDataLoader to a numpy array that can be used for fitting the sklearn model.

Parameters

ml_dataloader (MLDataLoader) – MLDataLoader object to be converted.
model_init_params (Dict) – Parameters used to initialize the sklearn model; may be used during conversion.
in_chunk_len (int) – The size of the lookback window, i.e., the number of time steps fed to the model. May be used during conversion.
skip_chunk_len (int) – The number of time steps between in_chunk and out_chunk for a single sample. The skip chunk is neither used as a feature (i.e. X) nor a label (i.e. Y) for a single sample. A value of 0 means no time steps are skipped. May be used during conversion.
out_chunk_len (int) – The size of the forecasting horizon, i.e., the number of time steps output by the model. May be used during conversion.

Returns

Converted numpy arrays. The first and second elements of the tuple represent x_train and y_train, respectively.

Return type

Tuple[np.ndarray, Optional[np.ndarray]]
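A user-defined converter with the same signature can be sketched as follows. This is only an illustration under stated assumptions: the loader is replaced by a plain list of dictionaries, and the keys "past_target" and "future_target" as well as the 3-dim batch shapes are hypothetical stand-ins, not the confirmed structure of MLDataLoader batches.

```python
from typing import Any, Dict, Iterable, Optional, Tuple

import numpy as np


def my_dataloader_to_fit_ndarray(
    ml_dataloader: Iterable[Dict[str, np.ndarray]],
    model_init_params: Dict[str, Any],
    in_chunk_len: int,
    skip_chunk_len: int,
    out_chunk_len: int,
) -> Tuple[np.ndarray, Optional[np.ndarray]]:
    """Flatten each windowed sample into one feature row (illustrative only)."""
    xs, ys = [], []
    for batch in ml_dataloader:
        # Hypothetical keys; assumed shapes are
        # (batch_size, in_chunk_len, n_cols) and (batch_size, out_chunk_len, n_cols).
        past = batch["past_target"]
        future = batch["future_target"]
        # sklearn estimators expect 2-dim X of shape (n_samples, n_features).
        xs.append(past.reshape(past.shape[0], -1))
        ys.append(future.reshape(future.shape[0], -1))
    x_train = np.vstack(xs)
    y_train = np.vstack(ys)
    # A single-column target is passed as a 1-dim array.
    if y_train.shape[1] == 1:
        y_train = y_train.ravel()
    return x_train, y_train


# Stand-in for a data loader: two batches with 4 and 2 samples.
rng = np.random.default_rng(0)
fake_loader = [
    {"past_target": rng.normal(size=(4, 3, 1)), "future_target": rng.normal(size=(4, 1, 1))},
    {"past_target": rng.normal(size=(2, 3, 1)), "future_target": rng.normal(size=(2, 1, 1))},
]
x_train, y_train = my_dataloader_to_fit_ndarray(fake_loader, {}, 3, 0, 1)
print(x_train.shape, y_train.shape)
```

A function of this shape could be passed as udf_ml_dataloader_to_fit_ndarray, provided it reads the actual batch structure produced by the real loader.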

default_sklearn_ml_dataloader_to_predict_ndarray(ml_dataloader: MLDataLoader, model_init_params: Dict[str, Any], in_chunk_len: int, skip_chunk_len: int, out_chunk_len: int) → Tuple[ndarray, Optional[ndarray]][source]

Default function for converting MLDataLoader to a numpy array that can be used for prediction by the sklearn model.

Parameters

ml_dataloader (MLDataLoader) – MLDataLoader object to be converted.
model_init_params (Dict) – Parameters used to initialize the sklearn model; may be used during conversion.
in_chunk_len (int) – The size of the lookback window, i.e., the number of time steps fed to the model. May be used during conversion.
skip_chunk_len (int) – The number of time steps between in_chunk and out_chunk for a single sample. The skip chunk is neither used as a feature (i.e. X) nor a label (i.e. Y) for a single sample. A value of 0 means no time steps are skipped. May be used during conversion.
out_chunk_len (int) – The size of the forecasting horizon, i.e., the number of time steps output by the model. May be used during conversion.

Returns

Converted numpy arrays. The first and second elements of the tuple represent x and y, respectively, where y is optional.

Return type

Tuple[np.ndarray, Optional[np.ndarray]]

class PyodModelWrapper(model_class: Type, in_chunk_len: int, sampling_stride: int = 1, model_init_params: Optional[Dict[str, Any]] = None, predict_params: Optional[Dict[str, Any]] = None, udf_ml_dataloader_to_fit_ndarray: Optional[Callable] = None, udf_ml_dataloader_to_predict_ndarray: Optional[Callable] = None)[source]

Bases: MLModelBaseWrapper

Time series model wrapper for pyod third party models.

Parameters

model_class (Type) – Class type of the third party model.
in_chunk_len (int) – The size of the lookback window, i.e., the number of time steps fed to the model.
sampling_stride (int, optional) – The number of time steps between the starts of consecutive samples. More precisely, let t be the time index of the target time series, t[i] be the start time of the i-th sample, and t[i+1] be the start time of the (i+1)-th sample; then sampling_stride equals t[i+1] - t[i].
model_init_params (Dict[str, Any], optional) – All params for initializing the third-party model.
predict_params (Dict[str, Any], optional) – All params for predicting with the third-party model, except x_test / y_test.
udf_ml_dataloader_to_fit_ndarray (Callable, optional) – User-defined function for converting an MLDataLoader object to a numpy.ndarray object that can be processed by the fit method of the third-party model.
udf_ml_dataloader_to_predict_ndarray (Callable, optional) – User-defined function for converting an MLDataLoader object to a numpy.ndarray object that can be processed by the predict method of the third-party model.

predict_score(tsdataset: TSDataset) → ndarray[source]

Predict raw anomaly scores of tsdataset using the fitted model. Outliers are assigned higher scores.

Parameters: tsdataset (TSDataset) – The input samples for which anomaly scores will be computed.
Returns: numpy array of shape (n_samples,) containing the anomaly scores of the input samples.
Return type: np.ndarray
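The "higher score means more anomalous" convention can be illustrated with a toy scorer in NumPy. The function toy_anomaly_scores below is a hypothetical stand-in (distance from the column-wise mean) and is neither a pyod model nor the implementation of predict_score; the 75th-percentile threshold is likewise an arbitrary choice for the example.

```python
import numpy as np


def toy_anomaly_scores(x: np.ndarray) -> np.ndarray:
    """Score each row by its Euclidean distance from the column-wise mean.

    Stand-in scorer for illustration; real detectors compute scores differently.
    """
    center = x.mean(axis=0)
    return np.linalg.norm(x - center, axis=1)


# Three points near the origin and one obvious outlier.
x = np.array([[0.0, 0.1], [0.1, 0.0], [-0.1, 0.0], [5.0, 5.0]])
scores = toy_anomaly_scores(x)  # shape (n_samples,)

# Turn raw scores into binary labels with a simple percentile threshold.
threshold = np.percentile(scores, 75)
labels = (scores > threshold).astype(int)
print(scores.shape, labels)
```

Raw scores leave the choice of threshold to the caller, which is the usual reason to expose them alongside hard predictions.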

fit(train_data: TSDataset, valid_data: Optional[TSDataset] = None) → None[source]

Fit a machine learning model.

Parameters

train_data (TSDataset) – training dataset.
valid_data (TSDataset, optional) – validation dataset. Not used, present for API consistency by convention.

predict(tsdataset: TSDataset) → TSDataset[source]

Make prediction.

Parameters: tsdataset (TSDataset) – TSDataset to predict.
Returns: TSDataset with predictions.
Return type: TSDataset

default_pyod_ml_dataloader_to_fit_ndarray(ml_dataloader: MLDataLoader, model_init_params: Dict[str, Any], in_chunk_len: int) → Tuple[ndarray, Optional[ndarray]][source]

Default function for converting MLDataLoader to a numpy array that can be used for fitting the pyod model.

This function removes the in_chunk_len dimension from the passed data. The reason is that all pyod models require X.shape == (n_samples, n_features), where n_samples corresponds to batch_size and n_features corresponds to observed_cov_col_num (in the paddlets context, anomaly detection models define n_samples as batch_size and n_features as observed_cov_col_num). However, the samples built by the data adapter are 3-dim ndarrays of shape (batch_size, in_chunk_len, observed_cov_col_num), so the in_chunk_len dimension needs to be flattened (i.e. removed) to produce a 2-dim array.
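The shape change described above can be sketched in NumPy. This sketch assumes in_chunk_len == 1, so that dropping the middle axis loses no data; how the default function treats larger windows is not stated here, so that case is left out.

```python
import numpy as np

batch_size, in_chunk_len, observed_cov_col_num = 5, 1, 3

# 3-dim samples as described: (batch_size, in_chunk_len, observed_cov_col_num).
samples_3d = np.arange(
    batch_size * in_chunk_len * observed_cov_col_num, dtype=float
).reshape(batch_size, in_chunk_len, observed_cov_col_num)

# Assumption for this sketch: the window holds a single time step.
assert samples_3d.shape[1] == 1

# Drop the in_chunk_len axis to get (n_samples, n_features).
samples_2d = samples_3d.reshape(batch_size, observed_cov_col_num)
print(samples_3d.shape, "->", samples_2d.shape)
```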

Parameters

ml_dataloader (MLDataLoader) – MLDataLoader object to be converted.
model_init_params (Dict) – Parameters used to initialize the pyod model; may be used during conversion.
in_chunk_len (int) – The size of the lookback window, i.e., the number of time steps fed to the model. May be used during conversion.

Returns

Converted numpy arrays. The first and second elements of the tuple represent x_train and y_train, respectively.

Return type

Tuple[np.ndarray, Optional[np.ndarray]]

default_pyod_ml_dataloader_to_predict_ndarray(ml_dataloader: MLDataLoader, model_init_params: Dict[str, Any], in_chunk_len: int) → Tuple[ndarray, Optional[ndarray]][source]

Default function for converting MLDataLoader to a numpy array that can be used for prediction by the pyod model.

Parameters

ml_dataloader (MLDataLoader) – MLDataLoader object to be converted.
model_init_params (Dict) – Parameters used to initialize the pyod model; may be used during conversion.
in_chunk_len (int) – The size of the lookback window, i.e., the number of time steps fed to the model. May be used during conversion.

Returns

Converted numpy arrays. The first and second elements of the tuple represent x and y, respectively, where y is optional.

Return type

Tuple[np.ndarray, Optional[np.ndarray]]

make_ml_model(model_class: Type, in_chunk_len: int, out_chunk_len: int = 1, skip_chunk_len: int = 0, sampling_stride: int = 1, model_init_params: Optional[Dict[str, Any]] = None, fit_params: Optional[Dict[str, Any]] = None, predict_params: Optional[Dict[str, Any]] = None, udf_ml_dataloader_to_fit_ndarray: Optional[Callable] = None, udf_ml_dataloader_to_predict_ndarray: Optional[Callable] = None) → MLModelBaseWrapper[source]

Make a wrapped time series model based on a third-party model.

Parameters

model_class (Type) – Class type of the third party model.
in_chunk_len (int) – The size of the lookback window, i.e., the number of time steps fed to the model.
out_chunk_len (int) – The size of the forecasting horizon, i.e., the number of time steps output by the model.
skip_chunk_len (int, optional) – The number of time steps between in_chunk and out_chunk for a single sample. The skip chunk is neither used as a feature (i.e. X) nor a label (i.e. Y) for a single sample. By default, it will NOT skip any time steps.
sampling_stride (int, optional) – The number of time steps between the starts of consecutive samples. More precisely, let t be the time index of the target time series, t[i] be the start time of the i-th sample, and t[i+1] be the start time of the (i+1)-th sample; then sampling_stride equals t[i+1] - t[i].
model_init_params (Dict[str, Any], optional) – All params for initializing the third-party model.
fit_params (Dict[str, Any], optional) – All params for fitting the third-party model, except x_train / y_train.
predict_params (Dict[str, Any], optional) – All params for predicting with the third-party model, except x_test / y_test.
udf_ml_dataloader_to_fit_ndarray (Callable, optional) – User-defined function for converting an MLDataLoader object to a numpy.ndarray object that can be processed by the fit method of the third-party model. Any third-party model that accepts numpy arrays as fit inputs can use this function to build the training data.
udf_ml_dataloader_to_predict_ndarray (Callable, optional) – User-defined function for converting an MLDataLoader object to a numpy.ndarray object that can be processed by the predict method of the third-party model. Any third-party model that accepts numpy arrays as predict inputs can use this function to build the prediction data.

Returns

Wrapped time series model object. Currently, SklearnModelWrapper and PyodModelWrapper are supported.

Return type

MLModelBaseWrapper
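One plausible way for such a factory to choose between the two supported wrappers is to inspect the top-level package of model_class. The sketch below illustrates that idea only; the dispatch rule, the pick_wrapper helper, and the stand-in classes are all assumptions and are not taken from the paddlets source.

```python
from typing import Dict, Type


class SklearnLikeWrapper:
    """Stand-in for a wrapper around sklearn models (hypothetical)."""


class PyodLikeWrapper:
    """Stand-in for a wrapper around pyod models (hypothetical)."""


# Assumed mapping from a model's top-level package name to a wrapper class.
_WRAPPERS: Dict[str, Type] = {
    "sklearn": SklearnLikeWrapper,
    "pyod": PyodLikeWrapper,
}


def pick_wrapper(model_class: Type) -> Type:
    """Select a wrapper class from the model's top-level package name."""
    top_level = model_class.__module__.split(".")[0]
    if top_level not in _WRAPPERS:
        raise ValueError(f"Unsupported third-party package: {top_level!r}")
    return _WRAPPERS[top_level]


# Fake model classes that only mimic the module paths of real libraries.
FakeKNN = type("KNN", (), {"__module__": "pyod.models.knn"})
FakeRidge = type("Ridge", (), {"__module__": "sklearn.linear_model"})

print(pick_wrapper(FakeKNN).__name__)
print(pick_wrapper(FakeRidge).__name__)
```

Dispatching on the package name keeps the caller's code identical for both families of models, at the cost of rejecting classes from packages the factory does not know.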