etna.pipeline.BasePipeline
- class BasePipeline(horizon: int)
Bases: BaseMixin, AbstractSaveable
Base class for pipeline.
Create instance of BasePipeline with given parameters.
- Parameters:
horizon (int) – Number of timestamps in the future for forecasting
Methods
backtest(ts, metrics[, n_folds, mode, ...]) – Run backtest with the pipeline.
fit(ts[, save_ts]) – Fit the Pipeline.
forecast([ts, prediction_interval, ...]) – Make a forecast of the next points of a dataset.
get_historical_forecasts(ts[, n_folds, ...]) – Estimate forecast for each fold on the historical dataset.
load(path) – Load an object.
params_to_tune() – Get hyperparameter grid to tune.
predict(ts[, start_timestamp, ...]) – Make in-sample predictions on dataset in a given range.
save(path) – Save the object.
set_params(**params) – Return new object instance with modified parameters.
to_dict() – Collect all information about etna object in dict.
Attributes
This class stores its __init__ parameters as attributes.
- backtest(ts: TSDataset, metrics: List[BaseMetric], n_folds: int | List[FoldMask] = 5, mode: str | None = None, aggregate_metrics: bool = False, n_jobs: int = 1, refit: bool | int = True, stride: int | None = None, joblib_params: Dict[str, Any] | None = None, forecast_params: Dict[str, Any] | None = None) → Dict[str, DataFrame | List[TSDataset] | List[Self]]
Run backtest with the pipeline.
If refit != True and some component of the pipeline doesn’t support forecasting with a gap, this component will raise an exception.
- Parameters:
ts (TSDataset) – Dataset to fit models in backtest
metrics (List[BaseMetric]) – List of metrics to compute for each fold
n_folds (int | List[FoldMask]) – Number of folds or the list of fold masks
mode (str | None) – Train generation policy: ‘expand’ or ‘constant’. Works only if n_folds is an integer. By default, it is set to ‘expand’.
aggregate_metrics (bool) – If True, aggregate metrics over folds; otherwise return raw metrics
n_jobs (int) – Number of jobs to run in parallel
refit (bool | int) – Determines how often the pipeline should be retrained during iteration over folds.
If True: pipeline is retrained on each fold.
If False: pipeline is trained only on the first fold.
If value: int: pipeline is trained every value folds starting from the first.
stride (int | None) – Number of points between folds. Works only if n_folds is an integer. By default, it is set to horizon.
joblib_params (Dict[str, Any] | None) – Additional parameters for joblib.Parallel
forecast_params (Dict[str, Any] | None) – Additional parameters for forecast()
- Returns:
Dictionary with backtest results. It contains the metrics dataframe, a list of TSDatasets with the forecast for each fold in ascending fold order, a dataframe with information about folds, and a list of pipelines for each fold in ascending fold order.
- Return type:
Dict[str, DataFrame | List[TSDataset] | List[Self]]
- Raises:
ValueError – If mode is set when n_folds is List[FoldMask].
ValueError – If stride is set when n_folds is List[FoldMask].
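The refit options and the two ValueError conditions above can be illustrated with a small standalone sketch. It does not use etna and is not the library's implementation; the helper names validate_fold_args and refit_schedule are hypothetical and exist only for this illustration.

```python
from typing import List, Optional, Sequence, Union


def validate_fold_args(
    n_folds: Union[int, Sequence[object]],
    mode: Optional[str],
    stride: Optional[int],
) -> None:
    """Hypothetical helper: mirror the documented ValueError conditions."""
    if not isinstance(n_folds, int):
        # n_folds is a list of fold masks, so mode and stride must stay unset.
        if mode is not None:
            raise ValueError("mode must not be set when n_folds is a list of fold masks")
        if stride is not None:
            raise ValueError("stride must not be set when n_folds is a list of fold masks")


def refit_schedule(n_folds: int, refit: Union[bool, int]) -> List[bool]:
    """Hypothetical helper: for each fold, would the pipeline be retrained?"""
    if refit is True:
        return [True] * n_folds
    if refit is False:
        return [fold == 0 for fold in range(n_folds)]
    # Integer value: retrain every `refit` folds, starting from the first.
    return [fold % refit == 0 for fold in range(n_folds)]


if __name__ == "__main__":
    print(refit_schedule(5, True))
    print(refit_schedule(5, False))
    print(refit_schedule(5, 2))
```

With refit=2 and five folds, this sketch marks the first, third, and fifth folds for retraining, which is one reading of "every value folds starting from the first".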
- abstract fit(ts: TSDataset, save_ts: bool = True) → BasePipeline
Fit the Pipeline.
- Parameters:
ts (TSDataset)
save_ts (bool)
- Returns:
Fitted Pipeline instance
- Return type:
BasePipeline
- forecast(ts: TSDataset | None = None, prediction_interval: bool = False, quantiles: Sequence[float] = (0.025, 0.975), n_folds: int = 3, return_components: bool = False) → TSDataset
Make a forecast of the next points of a dataset.
The result of forecasting starts from the last point of ts, not including it.
- Parameters:
ts (TSDataset | None) – Dataset to forecast. If not given, the dataset given during fit() is used.
prediction_interval (bool) – If True, returns prediction interval for forecast
quantiles (Sequence[float]) – Levels of prediction distribution. By default, the 2.5% and 97.5% quantiles are taken to form a 95% prediction interval
n_folds (int) – Number of folds to use in the backtest for prediction interval estimation
return_components (bool) – If True additionally returns forecast components
- Returns:
Dataset with predictions
- Raises:
NotImplementedError – Adding target components is not currently implemented
- Return type:
TSDataset
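Two details of forecast() are easy to check by hand: the forecast covers the horizon points that follow the last point of ts (excluding that point), and the default quantiles (0.025, 0.975) span a 95% interval. The sketch below is a minimal illustration that assumes a daily frequency; forecast_dates and interval_coverage are hypothetical names, not etna functions.

```python
from datetime import date, timedelta
from typing import List, Sequence


def forecast_dates(
    last_date: date,
    horizon: int,
    step: timedelta = timedelta(days=1),  # assumption: daily frequency
) -> List[date]:
    """Dates covered by a forecast that starts right after `last_date`."""
    return [last_date + step * (i + 1) for i in range(horizon)]


def interval_coverage(quantiles: Sequence[float] = (0.025, 0.975)) -> float:
    """Nominal coverage of the interval between the outermost quantiles."""
    return max(quantiles) - min(quantiles)


if __name__ == "__main__":
    # The last observed date itself is not part of the forecast.
    print(forecast_dates(date(2024, 1, 31), horizon=3))
    print(round(interval_coverage(), 3))
```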
- get_historical_forecasts(ts: TSDataset, n_folds: int | List[FoldMask] = 5, mode: str | None = None, n_jobs: int = 1, refit: bool | int = True, stride: int | None = None, joblib_params: Dict[str, Any] | None = None, forecast_params: Dict[str, Any] | None = None) → List[TSDataset]
Estimate forecast for each fold on the historical dataset.
If refit != True and some component of the pipeline doesn’t support forecasting with a gap, this component will raise an exception.
- Parameters:
ts (TSDataset) – Dataset to fit models in backtest
n_folds (int | List[FoldMask]) – Number of folds or the list of fold masks
mode (str | None) – Train generation policy: ‘expand’ or ‘constant’. Works only if n_folds is an integer. By default, it is set to ‘expand’.
n_jobs (int) – Number of jobs to run in parallel
refit (bool | int) – Determines how often the pipeline should be retrained during iteration over folds.
If True: pipeline is retrained on each fold.
If False: pipeline is trained only on the first fold.
If value: int: pipeline is trained every value folds starting from the first.
stride (int | None) – Number of points between folds. Works only if n_folds is an integer. By default, it is set to horizon.
joblib_params (Dict[str, Any] | None) – Additional parameters for joblib.Parallel
forecast_params (Dict[str, Any] | None) – Additional parameters for forecast()
- Returns:
List of TSDataset with forecast for each fold on the historical dataset.
- Raises:
ValueError – If mode is set when n_folds is List[FoldMask].
ValueError – If stride is set when n_folds is List[FoldMask].
- Return type:
List[TSDataset]
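To make the interaction of n_folds, mode, and stride concrete, here is a standalone sketch of one plausible fold layout over integer positions. It rests on assumptions that the text above does not state: the last test window ends at the last point, consecutive test windows start stride points apart, ‘expand’ keeps the train start fixed, and ‘constant’ keeps the train length fixed. fold_layout is a hypothetical helper, not etna's implementation.

```python
from typing import List, Optional, Tuple

Fold = Tuple[range, range]  # (train positions, test positions)


def fold_layout(
    n_points: int,
    horizon: int,
    n_folds: int,
    mode: Optional[str] = None,
    stride: Optional[int] = None,
) -> List[Fold]:
    """Hypothetical sketch of train/test windows for an integer n_folds."""
    mode = "expand" if mode is None else mode        # documented default
    stride = horizon if stride is None else stride   # documented default
    # Assumption: the last test window ends exactly at the last point.
    first_test_start = n_points - horizon - (n_folds - 1) * stride
    if first_test_start <= 0:
        raise ValueError("not enough points for the requested folds")
    folds: List[Fold] = []
    for i in range(n_folds):
        test_start = first_test_start + i * stride
        if mode == "expand":
            train_start = 0            # train window grows with each fold
        elif mode == "constant":
            train_start = i * stride   # train window keeps a fixed length
        else:
            raise ValueError("mode must be 'expand' or 'constant'")
        folds.append(
            (range(train_start, test_start), range(test_start, test_start + horizon))
        )
    return folds


if __name__ == "__main__":
    for train, test in fold_layout(n_points=20, horizon=3, n_folds=3):
        print(f"train={train.start}..{train.stop - 1} test={test.start}..{test.stop - 1}")
```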
- abstract classmethod load(path: Path) → Self
Load an object.
- Parameters:
path (Path) – Path to load object from.
- Return type:
Self
- abstract params_to_tune() → Dict[str, BaseDistribution]
Get hyperparameter grid to tune.
- Returns:
Grid with hyperparameters.
- Return type:
Dict[str, BaseDistribution]
- predict(ts: TSDataset, start_timestamp: Timestamp | int | str | None = None, end_timestamp: Timestamp | int | str | None = None, prediction_interval: bool = False, quantiles: Sequence[float] = (0.025, 0.975), return_components: bool = False) → TSDataset
Make in-sample predictions on dataset in a given range.
Currently, when segments start with different timestamps, we only guarantee to work with start_timestamp >= beginning of all segments.
Parameters start_timestamp and end_timestamp of type str are converted into pd.Timestamp.
- Parameters:
ts (TSDataset) – Dataset to make predictions on.
start_timestamp (Timestamp | int | str | None) – First timestamp of the prediction range to return; should be >= the first timestamp in ts. The beginning of each segment is expected to be <= start_timestamp. If not set, the first timestamp where each segment began is taken.
end_timestamp (Timestamp | int | str | None) – Last timestamp of the prediction range to return. If not set, the last timestamp of ts is taken. The value is expected to be less than or equal to the last timestamp in ts.
prediction_interval (bool) – If True, returns prediction interval for forecast.
quantiles (Sequence[float]) – Levels of prediction distribution. By default 2.5% and 97.5% taken to form a 95% prediction interval.
return_components (bool) – If True additionally returns forecast components
- Returns:
Dataset with predictions in [start_timestamp, end_timestamp] range.
- Raises:
ValueError – Incorrect type of start_timestamp or end_timestamp is used according to ts.freq
ValueError – Value of end_timestamp is less than start_timestamp.
ValueError – Value of start_timestamp goes before point where each segment started.
ValueError – Value of end_timestamp goes after the last timestamp.
NotImplementedError – Adding target components is not currently implemented
- Return type:
TSDataset
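The defaults and range checks described for predict() can be summarized in a short standalone sketch. It uses integer timestamps for simplicity and does not call etna; resolve_predict_range and its arguments are hypothetical names chosen for this illustration.

```python
from typing import Dict, Optional, Tuple


def resolve_predict_range(
    segment_starts: Dict[str, int],
    last_timestamp: int,
    start_timestamp: Optional[int] = None,
    end_timestamp: Optional[int] = None,
) -> Tuple[int, int]:
    """Hypothetical helper: apply the documented defaults and range checks."""
    # First timestamp at which every segment has already started.
    common_start = max(segment_starts.values())
    start = common_start if start_timestamp is None else start_timestamp
    end = last_timestamp if end_timestamp is None else end_timestamp
    if start < common_start:
        raise ValueError("start_timestamp goes before the point where each segment started")
    if end > last_timestamp:
        raise ValueError("end_timestamp goes after the last timestamp")
    if end < start:
        raise ValueError("end_timestamp is less than start_timestamp")
    return start, end


if __name__ == "__main__":
    starts = {"segment_a": 0, "segment_b": 5}
    print(resolve_predict_range(starts, last_timestamp=30))
    print(resolve_predict_range(starts, 30, start_timestamp=10, end_timestamp=20))
```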
- abstract save(path: Path)
Save the object.
- Parameters:
path (Path) – Path to save object to.
- set_params(**params: dict) → Self
Return new object instance with modified parameters.
The method also allows changing parameters of nested objects within the current object. For example, it is possible to change parameters of a model in a Pipeline.
Nested parameters are expected to be in a <component_1>.<...>.<parameter> form, where components are separated by a dot.
- Parameters:
**params (dict) – Estimator parameters
- Returns:
New instance with changed parameters
- Return type:
Self
Examples
>>> from etna.pipeline import Pipeline
>>> from etna.models import NaiveModel
>>> from etna.transforms import AddConstTransform
>>> model = NaiveModel(lag=1)
>>> transforms = [AddConstTransform(in_column="target", value=1)]
>>> pipeline = Pipeline(model, transforms=transforms, horizon=3)
>>> pipeline.set_params(**{"model.lag": 3, "transforms.0.value": 2})
Pipeline(model = NaiveModel(lag = 3, ), transforms = [AddConstTransform(in_column = 'target', value = 2, inplace = True, out_column = None, )], horizon = 3, )
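The dotted-path convention used by set_params can be sketched on plain nested dicts and lists. This is only an illustration of how a key such as "transforms.0.value" might be resolved; set_nested is a hypothetical helper and not how etna implements set_params. Like set_params, it returns a new object and leaves the original untouched.

```python
import copy
from typing import Any, Dict


def set_nested(params: Dict[str, Any], updates: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: return a copy of `params` with dotted-path updates applied."""
    result = copy.deepcopy(params)  # keep the original unchanged
    for dotted_key, value in updates.items():
        *parents, leaf = dotted_key.split(".")
        node: Any = result
        for part in parents:
            # Numeric components index into lists, others into dicts.
            node = node[int(part)] if isinstance(node, list) else node[part]
        if isinstance(node, list):
            node[int(leaf)] = value
        else:
            node[leaf] = value
    return result


if __name__ == "__main__":
    config = {
        "model": {"lag": 1},
        "transforms": [{"in_column": "target", "value": 1}],
        "horizon": 3,
    }
    print(set_nested(config, {"model.lag": 3, "transforms.0.value": 2}))
```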