paddlets.automl.autots
- class AutoTS(estimator: Union[str, Type[BaseModel], List[Union[str, Type[BaseTransform], Type[BaseModel]]]], in_chunk_len: int, out_chunk_len: int, skip_chunk_len: int = 0, sampling_stride: int = 1, search_space: Union[str, dict] = 'auto', search_alg: str = 'auto', resampling_strategy: str = 'auto', split_ratio: Union[str, float] = 'auto', k_fold: Union[str, int] = 'auto', metric: str = 'auto', mode: str = 'auto', refit: bool = True, ensemble: bool = False, local_dir: Optional[str] = None, n_jobs: int = -1, verbose: int = 4)[source]
Bases: BaseModel
The AutoTS class. AutoTS is an automated machine learning tool for PaddleTS. It frees the user from selecting hyperparameters for PaddleTS models or PaddleTS pipelines.
- Parameters
estimator (Union[str, Type[BaseModel], List[Union[str, Type[BaseTransform], Type[BaseModel]]]]) – A class of a paddlets model or a list of classes consisting of several paddlets transformers and a paddlets model
in_chunk_len (int) – The size of the lookback window, i.e., the number of time steps fed to the model.
out_chunk_len (int) – The size of the forecasting horizon, i.e., the number of time steps output by the model.
skip_chunk_len (int) – Optional, the number of time steps between in_chunk and out_chunk for a single sample. The skip chunk is neither used as a feature (i.e. X) nor a label (i.e. Y) for a single sample. By default, it will NOT skip any time steps.
sampling_stride (int) – Sampling intervals between two adjacent samples.
search_space (Union[str, dict]) – The hyperparameter domain to be searched. If search_space is “auto”, the default search space will be used.
search_alg (str) – The algorithm for optimization. Supported algorithms are “auto”, “Random”, “CMAES”, “TPE”, “CFO”, “BlendSearch”, “Bayes”. When the algorithm is “auto”, search_alg is set to “TPE”, based on experimental experience.
resampling_strategy (str) – The resampling strategy. Supported resampling strategies are “auto”, “cv”, “holdout”. When the strategy is “auto”, resampling_strategy is set to “holdout” and split_ratio is set to DEFAULT_SPLIT_RATIO by default.
split_ratio (Union[str, float]) – The proportion of the dataset included in the validation split for holdout. The split_ratio should be in the range of (0, 1). When the split_ratio is “auto”, split_ratio is set to DEFAULT_SPLIT_RATIO by default. Note that the split_ratio will be ignored if valid_tsdataset is provided to AutoTS.fit().
k_fold (Union[str, int]) – Number of folds for cv. The k_fold should be in the range of (0, 10]. When the k_fold is “auto”, k_fold is set to DEFAULT_K_FOLD by default. Note that the k_fold will be ignored if valid_tsdataset is provided to AutoTS.fit().
metric (str) – The name of the metric. The specified metric will be used to calculate the validation loss reported to the search algorithm. Supported metrics are “mae”, “mse”, “logloss”. When the metric is “auto”, metric is set to “mae” by default.
mode (str) – Whether the metric is minimized or maximized. Supported modes are “min”, “max”. When the mode is “auto”, mode is set to “min” by default.
refit (bool) – Whether to refit the model with the best parameters on the full training data. If refit is True, the AutoTS object can be used to predict. If refit is False, the AutoTS object can be used to get the best parameters, but cannot make predictions.
local_dir (str) – Local dir to save training results and log to. Defaults to ./.
ensemble (bool) – Not supported yet. This feature will be available in a future release.
n_jobs (int) – Not supported yet. This feature will be available in a future release.
verbose (int) – Not supported yet. This feature will be available in a future release.
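The “auto” rules listed above can be summarized in a few lines of plain Python. This is only an illustrative sketch, not the library's code: resolve_auto is a hypothetical helper, and because this page does not give the values of DEFAULT_SPLIT_RATIO and DEFAULT_K_FOLD, they appear below as placeholder strings.

```python
# Placeholders: the page names these constants but does not state their values.
DEFAULT_SPLIT_RATIO = "DEFAULT_SPLIT_RATIO"
DEFAULT_K_FOLD = "DEFAULT_K_FOLD"


def resolve_auto(search_alg="auto", resampling_strategy="auto",
                 split_ratio="auto", k_fold="auto",
                 metric="auto", mode="auto"):
    """Hypothetical helper mirroring the documented "auto" defaults."""
    # Documented ranges: split_ratio in (0, 1), k_fold in (0, 10].
    if split_ratio != "auto" and not 0 < split_ratio < 1:
        raise ValueError("split_ratio should be in the range (0, 1)")
    if k_fold != "auto" and not 0 < k_fold <= 10:
        raise ValueError("k_fold should be in the range (0, 10]")

    def pick(value, default):
        # Replace "auto" with the documented default, keep anything else.
        return default if value == "auto" else value

    return {
        "search_alg": pick(search_alg, "TPE"),
        "resampling_strategy": pick(resampling_strategy, "holdout"),
        "split_ratio": pick(split_ratio, DEFAULT_SPLIT_RATIO),
        "k_fold": pick(k_fold, DEFAULT_K_FOLD),
        "metric": pick(metric, "mae"),
        "mode": pick(mode, "min"),
    }


resolved = resolve_auto(metric="mse")
```

Here resolved keeps the explicitly chosen metric “mse” and falls back to “TPE”, “holdout” and “min” for the options left at “auto”.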
Examples
>>> from paddlets.automl.autots import AutoTS
>>> from paddlets.models.forecasting import MLPRegressor
>>> from paddlets.datasets.repository import get_dataset
>>> tsdataset = get_dataset("UNI_WTH")
>>> autots_model = AutoTS(MLPRegressor, 96, 2)
>>> autots_model.fit(tsdataset)
>>> predicted_tsdataset = autots_model.predict(tsdataset)
>>> best_param = autots_model.best_param
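To make the roles of in_chunk_len, skip_chunk_len, out_chunk_len and sampling_stride concrete, the following plain-Python sketch enumerates the time-step indices that each sample would cover. It is an illustration under assumptions: sample_windows is a hypothetical helper, and the exact alignment PaddleTS uses when slicing a series (for example, where the first sample starts) is not stated on this page.

```python
def sample_windows(n_steps, in_chunk_len, out_chunk_len,
                   skip_chunk_len=0, sampling_stride=1):
    """Return (input_range, output_range) index pairs for each sample.

    Hypothetical illustration only; not taken from the PaddleTS source.
    """
    # One sample spans the lookback window, the skipped gap and the horizon.
    span = in_chunk_len + skip_chunk_len + out_chunk_len
    windows = []
    # Adjacent samples start sampling_stride time steps apart.
    for start in range(0, n_steps - span + 1, sampling_stride):
        in_end = start + in_chunk_len
        out_start = in_end + skip_chunk_len  # skipped steps are neither X nor Y
        windows.append((range(start, in_end),
                        range(out_start, out_start + out_chunk_len)))
    return windows


# A series of 10 steps, 4 input steps, 1 skipped step, 2 output steps, stride 2.
windows = sample_windows(10, in_chunk_len=4, out_chunk_len=2,
                         skip_chunk_len=1, sampling_stride=2)
for x_idx, y_idx in windows:
    print(list(x_idx), "->", list(y_idx))
```

With these settings the sketch yields two samples: inputs 0 to 3 with targets 5 and 6, then inputs 2 to 5 with targets 7 and 8. Step 4 (and later step 6) falls in the skip chunk and is used as neither feature nor label.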
- fit(train_tsdataset: Union[TSDataset, List[TSDataset]], valid_tsdataset: Union[TSDataset, List[TSDataset]] = None, n_trials: int = 20, cpu_resource: float = 1.0, gpu_resource: float = 0, max_concurrent_trials: int = 1)[source]
Fit the estimator with the given tsdataset. During fitting, the search algorithm suggests configurations from the hyperparameter search space, and the best-performing configuration is then selected. If refit is True, fit() also refits the model with the best parameters on the full training data.
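The suggest-evaluate-select loop described here can be sketched without PaddleTS. The example below swaps in plain random search and a toy loss function for the real search algorithm and model training; random_search, toy_loss and the search space are hypothetical names used only for illustration.

```python
import random


def random_search(objective, search_space, n_trials=20, mode="min", seed=0):
    """Suggest n_trials configurations at random and keep the best one."""
    rng = random.Random(seed)
    best_param, best_score = None, None
    for _ in range(n_trials):
        # "Suggest" one configuration from the search space.
        param = {name: rng.choice(choices)
                 for name, choices in search_space.items()}
        # Evaluate it; in AutoTS this would be the validation metric.
        score = objective(param)
        if best_score is None:
            better = True
        elif mode == "min":
            better = score < best_score
        else:
            better = score > best_score
        if better:
            best_param, best_score = param, score
    return best_param, best_score


# Toy stand-ins for a model's hyperparameters and its validation loss.
space = {"hidden": [16, 32, 64], "lr": [1e-3, 1e-2, 1e-1]}


def toy_loss(param):
    return abs(param["hidden"] - 32) + abs(param["lr"] - 1e-2)


best_param, best_score = random_search(toy_loss, space, n_trials=20)
```

Raising n_trials lets the search try more configurations at a higher cost, which is the same trade-off that the n_trials argument of fit() controls.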
- Parameters
train_tsdataset (Union[TSDataset, List[TSDataset]]) – Training dataset.
valid_tsdataset (Union[TSDataset, List[TSDataset]], optional) – Validation dataset.
n_trials (int) – The number of configurations suggested by the search algorithm.
cpu_resource (float) – CPU resources to allocate per trial.
gpu_resource (float) – GPU resources to allocate per trial. Note that GPUs will not be assigned if you do not specify them here.
max_concurrent_trials (int) – The maximum number of trials running concurrently.
- Returns
Refitted estimator.
- Return type
- predict(tsdataset: TSDataset) → TSDataset[source]
Make prediction.
- Parameters
tsdataset – Data to be predicted.
- Returns
Predicted results of calling self.predict on the refitted estimator.
- Return type
TSDataset
- property best_param
Return the best parameters in optimization.
- Returns
The dict of the best parameters.
- Return type
Dict
- best_estimator()[source]
Return the best estimator found during optimization.
- Returns
The best estimator found during optimization.
- Return type
estimator
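The methods above can be combined as in the following sketch, which uses only the names documented on this page. It is a hedged illustration rather than a recommended recipe: tune_and_forecast is a hypothetical wrapper, the chosen option values are arbitrary, and the datasets are assumed to be TSDataset objects prepared by the caller.

```python
def tune_and_forecast(train_tsdataset, valid_tsdataset, n_trials=10):
    """Search hyperparameters, then forecast with the refitted model."""
    # Imported lazily so that defining this function does not require paddlets.
    from paddlets.automl.autots import AutoTS
    from paddlets.models.forecasting import MLPRegressor

    autots_model = AutoTS(
        MLPRegressor,
        96,                   # in_chunk_len
        2,                    # out_chunk_len
        search_alg="Random",  # one of the documented search algorithms
        metric="mse",
        mode="min",
        refit=True,           # required for predict() to work afterwards
    )
    # Passing a validation set means split_ratio and k_fold are ignored.
    autots_model.fit(train_tsdataset, valid_tsdataset, n_trials=n_trials)

    forecast = autots_model.predict(valid_tsdataset)
    return autots_model.best_param, autots_model.best_estimator(), forecast
```

Note that best_param is a property while best_estimator() is a method call. With refit=False, only best_param would be usable, since the object then cannot make predictions.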