etna.auto.Auto#

class Auto(target_metric: BaseMetric, horizon: int, metric_aggregation: Literal['median', 'mean', 'std', 'notna_size', 'percentile_5', 'percentile_25', 'percentile_75', 'percentile_95'] = 'mean', backtest_params: dict | None = None, experiment_folder: str | None = None, pool: Pool | PoolGenerator | List[BasePipeline] = Pool.no_freq_super_fast, generate_params: Dict[str, Any] | None = None, runner: AbstractRunner | None = None, storage: BaseStorage | None = None, metrics: List[BaseMetric] | None = None)[source]#

Bases: AutoBase

Automatic pipeline selection from a predefined or custom pipeline pool.

Note

This class requires the auto extension to be installed. Read more about this on the installation page.

Note

Class initialization can be slow when default pools are used, because pretrained models have to be downloaded.

Initialize Auto class.

Parameters:
  • target_metric (BaseMetric) – Metric to optimize.

  • horizon (int) – Horizon to forecast for.

  • metric_aggregation (Literal['median', 'mean', 'std', 'notna_size', 'percentile_5', 'percentile_25', 'percentile_75', 'percentile_95']) – Aggregation method for per-segment metrics. By default, mean aggregation is used.

  • backtest_params (dict | None) – Custom parameters for backtest instead of default backtest parameters.

  • experiment_folder (str | None) – Name for saving experiment results; it determines the name of the optuna study. By default, isn’t set.

  • pool (Pool | PoolGenerator | List[BasePipeline]) – Pool of pipelines to choose from. By default, the no_freq_super_fast pool is used. For a description of all available pools, see the Pool docs.

  • generate_params (Dict[str, Any] | None) – Dictionary with parameters to fill pool templates. Available parameters are timestamp_column, chronos_device and timesfm_device. For a full description, see the Pool docs; for a usage example, see the 205-automl notebook.

  • runner (AbstractRunner | None) – Runner to use for distributed training. By default, LocalRunner is used.

  • storage (BaseStorage | None) – Optuna storage to use. By default, sqlite storage is used.

  • metrics (List[BaseMetric] | None) – List of metrics to compute. By default, Sign, SMAPE, MAE, MSE, MedAE metrics are used.
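To illustrate what metric_aggregation does, here is a minimal pure-Python sketch (not etna code) of reducing per-segment metric values to a single score; the segment names and values are hypothetical:

```python
import statistics

# Hypothetical per-segment SMAPE values produced by a backtest
per_segment_smape = {"segment_a": 12.0, "segment_b": 8.0, "segment_c": 10.0}

values = list(per_segment_smape.values())

# A few of the aggregation modes accepted by ``metric_aggregation``
aggregations = {
    "mean": statistics.mean(values),      # the default aggregation
    "median": statistics.median(values),
    "std": statistics.pstdev(values),
}

print(aggregations["mean"])  # 10.0
```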

Methods

fit(ts[, timeout, n_trials, initializer, ...])

Start automatic pipeline selection.

get_configs()

Get pipelines from pool.

objective(ts, target_metric, ...[, ...])

Optuna objective wrapper for the pool stage.

summary()

Get Auto trials summary.

top_k([k])

Get top k pipelines with the best metric value.

fit(ts: TSDataset, timeout: int | None = None, n_trials: int | None = None, initializer: _Initializer | None = None, callback: _Callback | None = None, **kwargs) BasePipeline[source]#

Start automatic pipeline selection.

There are two stages:

  • Pool stage: trying every pipeline in the pool

  • Tuning stage: tuning the tune_size best pipelines from the previous stage using Tune.

The tuning stage starts only if the limits on n_trials and timeout aren’t exceeded. Tuning goes from the best pipeline to the worst, and the trial limits (n_trials, timeout) are divided evenly between the pipelines. If there is no limit on the number of trials, only the first pipeline will be tuned until the user stops the process.
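The even split of the trial budget between tuned pipelines can be sketched as plain arithmetic (an illustration of the rule described above, not etna’s actual implementation; the tune_size and n_trials values are hypothetical):

```python
# Hypothetical limits passed to ``fit``
n_trials = 100   # total optuna trials available for the tuning stage
tune_size = 4    # number of best pool pipelines to tune

# The trial budget is divided evenly between the tuned pipelines
trials_per_pipeline = n_trials // tune_size
print(trials_per_pipeline)  # 25

# With no limit set, only the best pipeline is tuned (until interrupted)
n_trials = None
pipelines_tuned = tune_size if n_trials is not None else 1
print(pipelines_tuned)  # 1
```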

Parameters:
  • ts (TSDataset) – TSDataset to fit on.

  • timeout (int | None) – Timeout for optuna. Note that this is the timeout for each worker. By default, isn’t set.

  • n_trials (int | None) – Number of trials for optuna. Note that this is the number of trials for each worker. By default, isn’t set.

  • initializer (_Initializer | None) – Object that is called before each pipeline backtest, can be used to initialize loggers.

  • callback (_Callback | None) – Object that is called after each pipeline backtest, can be used to log extra metrics.

  • **kwargs – The parameter tune_size (default: 0) determines how many pipelines to fit during the tuning stage. Other parameters are passed to optuna.study.Study.optimize().

Return type:

BasePipeline

get_configs() List[Dict[str, Any]][source]#

Get pipelines from pool.

Return type:

List[Dict[str, Any]]

static objective(ts: TSDataset, target_metric: BaseMetric, metric_aggregation: Literal['median', 'mean', 'std', 'notna_size', 'percentile_5', 'percentile_25', 'percentile_75', 'percentile_95'], metrics: List[BaseMetric], backtest_params: dict, config_mapping: Dict[str, dict], initializer: _Initializer | None = None, callback: _Callback | None = None) Callable[[Trial], float][source]#

Optuna objective wrapper for the pool stage.

Parameters:
  • ts (TSDataset) – TSDataset to fit on.

  • target_metric (BaseMetric) – Metric to optimize.

  • metric_aggregation (Literal['median', 'mean', 'std', 'notna_size', 'percentile_5', 'percentile_25', 'percentile_75', 'percentile_95']) – Aggregation method for per-segment metrics.

  • metrics (List[BaseMetric]) – List of metrics to compute.

  • backtest_params (dict) – Custom parameters for backtest instead of default backtest parameters.

  • initializer (_Initializer | None) – Object that is called before each pipeline backtest, can be used to initialize loggers.

  • callback (_Callback | None) – Object that is called after each pipeline backtest, can be used to log extra metrics.

  • config_mapping (Dict[str, dict]) – Mapping from config hashes to configs.

Returns:

function that runs the specified trial and returns its evaluated score

Return type:

objective
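The wrapper pattern above (a closure that captures the data and configs, and returns a function that scores one trial) can be sketched without etna. All names below are hypothetical stand-ins, and a plain dict replaces the optuna Trial for illustration:

```python
def make_objective(config_mapping, evaluate):
    """Return a function that scores the config chosen by a trial.

    ``config_mapping`` maps config hashes to configs; ``evaluate`` is a
    stand-in for backtesting a config and aggregating the target metric.
    """
    def _objective(trial):
        # Hypothetical stand-in for the trial selecting a config hash
        config_hash = trial["hash"]
        return evaluate(config_mapping[config_hash])
    return _objective

config_mapping = {"a1": {"model": "naive"}, "b2": {"model": "prophet"}}
objective = make_objective(
    config_mapping,
    evaluate=lambda cfg: 7.1 if cfg["model"] == "prophet" else 9.5,
)
print(objective({"hash": "b2"}))  # 7.1
```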

summary() DataFrame[source]#

Get Auto trials summary.

There are columns:

  • hash: hash of the pipeline;

  • pipeline: pipeline object;

  • metrics: columns with metrics’ values;

  • elapsed_time: fitting time of the pipeline (doesn’t include model initialization);

  • state: state of the trial;

  • study: name of the study in which the trial was made.

Returns:

dataframe with detailed info on each performed trial

Return type:

study_dataframe
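A rough sketch of the shape of the returned table, using the columns documented above with invented values (plain dicts instead of a real DataFrame; the metric column name depends on the metrics you configured):

```python
# Hypothetical rows mirroring the documented summary columns
summary_rows = [
    {"hash": "a1", "pipeline": "<Pipeline>", "SMAPE": 9.5,
     "elapsed_time": 3.2, "state": "COMPLETE", "study": "pool"},
    {"hash": "b2", "pipeline": "<Pipeline>", "SMAPE": 7.1,
     "elapsed_time": 5.7, "state": "COMPLETE", "study": "pool"},
]

columns = ["hash", "pipeline", "SMAPE", "elapsed_time", "state", "study"]

# Every row carries all documented columns
assert all(set(columns) <= set(row) for row in summary_rows)
```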

top_k(k: int = 5) List[BasePipeline][source]#

Get top k pipelines with the best metric value.

Only complete and non-duplicate studies are taken into account.

Parameters:

k (int) – Number of pipelines to return.

Returns:

List of top k pipelines.

Return type:

List[BasePipeline]
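As an illustration of the selection logic (not etna’s internals), the sketch below picks the top k entries from summary-like rows: it deduplicates by pipeline hash, sorts ascending by metric value (assuming a metric such as SMAPE that is minimized), and keeps the k best. Rows and hashes are hypothetical:

```python
# Hypothetical summary rows: (pipeline hash, aggregated SMAPE)
rows = [
    ("a1", 9.5),
    ("b2", 7.1),
    ("a1", 9.5),   # duplicate trial of the same pipeline
    ("c3", 12.0),
    ("d4", 8.4),
]

def top_k_hashes(rows, k=5):
    """Deduplicate by hash, sort ascending by metric, return the k best."""
    best = {}
    for h, metric in rows:
        if h not in best or metric < best[h]:
            best[h] = metric
    ranked = sorted(best, key=best.get)
    return ranked[:k]

print(top_k_hashes(rows, k=2))  # ['b2', 'd4']
```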