etna.ensembles.VotingEnsemble
- class VotingEnsemble(pipelines: List[BasePipeline], weights: List[float] | Literal['auto'] | None = None, regressor: DecisionTreeRegressor | ExtraTreeRegressor | RandomForestRegressor | ExtraTreesRegressor | GradientBoostingRegressor | CatBoostRegressor | None = None, n_folds: int = 3, n_jobs: int = 1, joblib_params: Dict[str, Any] | None = None)
- Bases: EnsembleMixin, SaveEnsembleMixin, BasePipeline
- VotingEnsemble is a pipeline that forecasts future values as a weighted average of its pipelines' forecasts.
- Examples
    >>> from etna.datasets import generate_ar_df
    >>> from etna.datasets import TSDataset
    >>> from etna.ensembles import VotingEnsemble
    >>> from etna.models import NaiveModel
    >>> from etna.models import ProphetModel
    >>> from etna.pipeline import Pipeline
    >>> df = generate_ar_df(periods=30, start_time="2021-06-01", ar_coef=[1.2], n_segments=3)
    >>> ts = TSDataset(df, "D")
    >>> prophet_pipeline = Pipeline(model=ProphetModel(), transforms=[], horizon=7)
    >>> naive_pipeline = Pipeline(model=NaiveModel(lag=10), transforms=[], horizon=7)
    >>> ensemble = VotingEnsemble(
    ...     pipelines=[prophet_pipeline, naive_pipeline],
    ...     weights=[0.7, 0.3]
    ... )
    >>> _ = ensemble.fit(ts=ts)
    >>> forecast = ensemble.forecast()
    >>> forecast
    segment    segment_0 segment_1 segment_2
    feature       target    target    target
    timestamp
    2021-07-01     -8.84   -186.67    130.99
    2021-07-02     -8.96   -198.16    138.81
    2021-07-03     -9.57   -212.48    148.48
    2021-07-04    -10.48   -229.16    160.13
    2021-07-05    -11.20   -248.93    174.39
    2021-07-06    -12.47   -281.90    197.82
    2021-07-07    -13.51   -307.02    215.73
- Init VotingEnsemble.
- Parameters:
- pipelines (List[BasePipeline]) – List of pipelines that should be used in ensemble 
- weights (List[float] | Literal['auto'] | None) – List of pipelines' weights.
- If None, use uniform weights.
- If List[float], use these weights for the base estimators; the weights are normalized automatically.
- If “auto”, use the importances of the base estimators' forecasts as the weights of the base estimators.
 
- regressor (DecisionTreeRegressor | ExtraTreeRegressor | RandomForestRegressor | ExtraTreesRegressor | GradientBoostingRegressor | CatBoostRegressor | None) – Regression model with fit/predict interface which will be used to evaluate weights of the base estimators. It should have a feature_importances_ property (e.g. all tree-based regressors in sklearn).
- n_folds (int) – Number of folds to use in the backtest. Backtest is used to obtain the forecasts from the base estimators; forecasts will be used to evaluate the estimator’s weights. 
- n_jobs (int) – Number of jobs to run in parallel 
- joblib_params (Dict[str, Any] | None) – Additional parameters for joblib.Parallel
 
- Raises:
- ValueError: – If the number of the pipelines is less than 2 or pipelines have different horizons. 
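The weighting rule described for weights can be illustrated without the library. The following is a minimal sketch, not etna code: normalize_weights and weighted_average are hypothetical helpers, and it assumes that normalization means dividing each weight by the sum of all weights and that each pipeline's forecast is a plain list of floats of the same length.

```python
from typing import List, Optional, Sequence


def normalize_weights(weights: Optional[Sequence[float]], n_pipelines: int) -> List[float]:
    """Return weights that sum to 1; None means uniform weights."""
    if weights is None:
        return [1.0 / n_pipelines] * n_pipelines
    if len(weights) != n_pipelines:
        raise ValueError("Expected one weight per pipeline")
    total = sum(weights)
    if total <= 0:
        raise ValueError("Weights must have a positive sum")
    # Assumption: normalization divides by the sum of the weights.
    return [w / total for w in weights]


def weighted_average(
    forecasts: Sequence[Sequence[float]],
    weights: Optional[Sequence[float]] = None,
) -> List[float]:
    """Combine per-pipeline forecasts point by point."""
    norm = normalize_weights(weights, len(forecasts))
    horizon = len(forecasts[0])
    return [sum(w * f[step] for w, f in zip(norm, forecasts)) for step in range(horizon)]


# Two hypothetical pipelines forecasting a 3-step horizon.
forecast_a = [10.0, 12.0, 14.0]
forecast_b = [20.0, 22.0, 24.0]
combined = weighted_average([forecast_a, forecast_b], weights=[7, 3])
uniform = weighted_average([forecast_a, forecast_b])
```

Passing weights=[7, 3] gives the same result as weights=[0.7, 0.3] under this assumption, because only the ratio between the weights matters after normalization.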
 - Methods
- backtest(ts, metrics[, n_folds, mode, ...]) – Run backtest with the pipeline.
- fit(ts[, save_ts]) – Fit pipelines in ensemble.
- forecast([ts, prediction_interval, ...]) – Make a forecast of the next points of a dataset.
- get_historical_forecasts(ts[, n_folds, ...]) – Estimate forecast for each fold on the historical dataset.
- load(path[, ts]) – Load an object.
- params_to_tune() – Get hyperparameter grid to tune.
- predict(ts[, start_timestamp, ...]) – Make in-sample predictions on dataset in a given range.
- save(path) – Save the object.
- set_params(**params) – Return new object instance with modified parameters.
- to_dict() – Collect all information about etna object in dict.
 - Attributes
- This class stores its __init__ parameters as attributes.
 - backtest(ts: TSDataset, metrics: List[Metric], n_folds: int | List[FoldMask] = 5, mode: str | None = None, aggregate_metrics: bool = False, n_jobs: int = 1, refit: bool | int = True, stride: int | None = None, joblib_params: Dict[str, Any] | None = None, forecast_params: Dict[str, Any] | None = None) -> Tuple[DataFrame, DataFrame, DataFrame]
- Run backtest with the pipeline.
- If refit != True and some component of the pipeline doesn’t support forecasting with gap, this component will raise an exception.
- Parameters:
- ts (TSDataset) – Dataset to fit models in backtest 
- metrics (List[Metric]) – List of metrics to compute for each fold 
- n_folds (int | List[FoldMask]) – Number of folds or the list of fold masks 
- mode (str | None) – Train generation policy: ‘expand’ or ‘constant’. Works only if n_folds is an integer. By default, is set to ‘expand’.
- aggregate_metrics (bool) – If True, aggregate metrics over folds; otherwise return raw metrics
- n_jobs (int) – Number of jobs to run in parallel 
- refit (bool | int) – Determines how often pipeline should be retrained during iteration over folds.
- If True: pipeline is retrained on each fold.
- If False: pipeline is trained only on the first fold.
- If value: int: pipeline is trained every value folds starting from the first.
 
- stride (int | None) – Number of points between folds. Works only if n_folds is an integer. By default, is set to horizon.
- joblib_params (Dict[str, Any] | None) – Additional parameters for joblib.Parallel
- forecast_params (Dict[str, Any] | None) – Additional parameters for forecast()
 
- Returns:
- Metrics dataframe, forecast dataframe and dataframe with information about folds 
- Return type:
- metrics_df, forecast_df, fold_info_df 
- Raises:
- ValueError: – If mode is set when n_folds is List[FoldMask].
- ValueError: – If stride is set when n_folds is List[FoldMask].
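The three refit settings can be read as a per-fold retraining schedule. The sketch below is one reading of the description of refit, not etna's implementation; refit_schedule is a hypothetical helper that returns, for each fold, whether the pipeline would be retrained on it.

```python
from typing import List, Union


def refit_schedule(n_folds: int, refit: Union[bool, int]) -> List[bool]:
    """Return, for each fold, whether the pipeline is retrained on it."""
    # Booleans are handled first because bool is a subclass of int in Python.
    if refit is True:
        return [True] * n_folds
    if refit is False:
        # Train only on the first fold and reuse the fitted pipeline afterwards.
        return [fold == 0 for fold in range(n_folds)]
    if refit <= 0:
        raise ValueError("Integer refit must be positive")
    # Retrain every `refit` folds, starting from the first fold.
    return [fold % refit == 0 for fold in range(n_folds)]


every_fold = refit_schedule(5, True)
first_only = refit_schedule(5, False)
every_second = refit_schedule(5, 2)
```

Under this reading, folds that are not retrained forecast with a gap between the end of the training data and the start of the fold, which is why components without gap support raise an exception when refit != True.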
 
 
 - fit(ts: TSDataset, save_ts: bool = True) -> VotingEnsemble
- Fit pipelines in ensemble. 
 - forecast(ts: TSDataset | None = None, prediction_interval: bool = False, quantiles: Sequence[float] = (0.025, 0.975), n_folds: int = 3, return_components: bool = False) -> TSDataset
- Make a forecast of the next points of a dataset.
- The result of forecasting starts from the last point of ts, not including it.
- Parameters:
- ts (TSDataset | None) – Dataset to forecast. If not given, the dataset given during fit() is used.
- prediction_interval (bool) – If True, returns prediction interval for forecast
- quantiles (Sequence[float]) – Levels of prediction distribution. By default, 2.5% and 97.5% are taken to form a 95% prediction interval
- n_folds (int) – Number of folds to use in the backtest for prediction interval estimation
- return_components (bool) – If True, additionally returns forecast components
 
- Returns:
- Dataset with predictions 
- Raises:
- NotImplementedError: – Adding target components is not currently implemented 
- Return type:
- TSDataset
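The default quantiles (0.025, 0.975) bound a 95% interval because 0.975 - 0.025 = 0.95. This page does not say how the borders are computed, so the sketch below is only one plausible illustration: it assumes normally distributed forecast errors whose spread is estimated from residuals collected over backtest folds. interval_borders and the residual values are hypothetical.

```python
from statistics import NormalDist, pstdev
from typing import Dict, Sequence


def interval_borders(
    point_forecast: float,
    residuals: Sequence[float],
    quantiles: Sequence[float] = (0.025, 0.975),
) -> Dict[float, float]:
    """Map each quantile level to a border around the point forecast."""
    # Assumption: residuals are roughly normal with zero mean.
    sigma = pstdev(residuals)
    standard_normal = NormalDist()
    return {q: point_forecast + sigma * standard_normal.inv_cdf(q) for q in quantiles}


# Hypothetical residuals (forecast minus actual) gathered over backtest folds.
residuals = [-1.5, 0.5, 2.0, -0.5, -0.5]
borders = interval_borders(point_forecast=100.0, residuals=residuals)
lower, upper = borders[0.025], borders[0.975]
```

With symmetric quantile levels the two borders sit at equal distances from the point forecast; more folds give more residuals and therefore a steadier estimate of the spread.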
 
 - get_historical_forecasts(ts: TSDataset, n_folds: int | List[FoldMask] = 5, mode: str | None = None, n_jobs: int = 1, refit: bool | int = True, stride: int | None = None, joblib_params: Dict[str, Any] | None = None, forecast_params: Dict[str, Any] | None = None) -> DataFrame
- Estimate forecast for each fold on the historical dataset.
- If refit != True and some component of the pipeline doesn’t support forecasting with gap, this component will raise an exception.
- Parameters:
- ts (TSDataset) – Dataset to fit models in backtest 
- n_folds (int | List[FoldMask]) – Number of folds or the list of fold masks 
- mode (str | None) – Train generation policy: ‘expand’ or ‘constant’. Works only if n_folds is an integer. By default, is set to ‘expand’.
- n_jobs (int) – Number of jobs to run in parallel 
- refit (bool | int) – Determines how often pipeline should be retrained during iteration over folds.
- If True: pipeline is retrained on each fold.
- If False: pipeline is trained only on the first fold.
- If value: int: pipeline is trained every value folds starting from the first.
 
- stride (int | None) – Number of points between folds. Works only if n_folds is an integer. By default, is set to horizon.
- joblib_params (Dict[str, Any] | None) – Additional parameters for joblib.Parallel
- forecast_params (Dict[str, Any] | None) – Additional parameters for forecast()
 
- Returns:
- Forecast dataframe 
- Raises:
- ValueError: – If mode is set when n_folds is List[FoldMask].
- ValueError: – If stride is set when n_folds is List[FoldMask].
 
- Return type:
- DataFrame
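How mode and stride shape the folds when n_folds is an integer can be sketched on integer positions. This is an illustration under stated assumptions, not etna's fold generator: make_folds is a hypothetical helper, the last test window is assumed to end at the last point of the series, and ‘constant’ is assumed to keep the training length of the first fold.

```python
from typing import List, Optional, Tuple

Fold = Tuple[range, range]  # (train positions, test positions)


def make_folds(
    n_points: int,
    horizon: int,
    n_folds: int,
    mode: str = "expand",
    stride: Optional[int] = None,
) -> List[Fold]:
    """Lay out train/test windows over positions 0..n_points-1."""
    if mode not in ("expand", "constant"):
        raise ValueError("mode must be 'expand' or 'constant'")
    stride = horizon if stride is None else stride  # default stride equals horizon
    # Assumption: the last test window ends at the end of the series.
    first_test_start = n_points - horizon - (n_folds - 1) * stride
    if first_test_start <= 0:
        raise ValueError("Not enough points for the requested folds")
    folds = []
    for k in range(n_folds):
        test_start = first_test_start + k * stride
        # 'expand' always trains from position 0; 'constant' shifts the start
        # by the same stride so the training length stays fixed.
        train_start = 0 if mode == "expand" else k * stride
        folds.append((range(train_start, test_start), range(test_start, test_start + horizon)))
    return folds


expanding = make_folds(n_points=20, horizon=3, n_folds=3)
sliding = make_folds(n_points=20, horizon=3, n_folds=3, mode="constant", stride=2)
```

In the expanding layout each training window is longer than the previous one by stride points; in the sliding layout every training window has the same length.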
 
 - classmethod load(path: Path, ts: TSDataset | None = None) -> Self
- Load an object.
- Warning: This method uses the dill module, which is not secure. It is possible to construct malicious data which will execute arbitrary code during loading. Never load data that could have come from an untrusted source, or that could have been tampered with.
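One way to detect tampering before calling load is to compare the file's digest with a value recorded when the file was saved. The sketch below uses only the standard library; file_sha256, is_untampered and the file name ensemble.zip are hypothetical, and the check only helps if the recorded digest itself is stored somewhere trusted. It does not make a file from an untrusted source safe to load.

```python
import hashlib
import hmac
import tempfile
from pathlib import Path


def file_sha256(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_untampered(path: Path, expected_hex: str) -> bool:
    """Compare the current digest with the trusted one in constant time."""
    return hmac.compare_digest(file_sha256(path), expected_hex)


with tempfile.TemporaryDirectory() as tmp:
    archive = Path(tmp) / "ensemble.zip"  # hypothetical saved object
    archive.write_bytes(b"saved ensemble bytes")
    trusted_digest = file_sha256(archive)  # record this at save time

    ok_before = is_untampered(archive, trusted_digest)
    archive.write_bytes(b"modified bytes")  # simulate tampering
    ok_after = is_untampered(archive, trusted_digest)
```

Only when the check passes would the path be handed to load.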
 - params_to_tune() -> Dict[str, BaseDistribution]
- Get hyperparameter grid to tune.
- Currently, returns an empty dict, but could have a proper implementation in the future.
- Returns:
- Grid with hyperparameters. 
- Return type:
- Dict[str, BaseDistribution]
 
 - predict(ts: TSDataset, start_timestamp: Timestamp | int | str | None = None, end_timestamp: Timestamp | int | str | None = None, prediction_interval: bool = False, quantiles: Sequence[float] = (0.025, 0.975), return_components: bool = False) -> TSDataset
- Make in-sample predictions on dataset in a given range.
- Currently, when segments start at different timestamps, this is only guaranteed to work with start_timestamp >= beginning of all segments.
- Parameters start_timestamp and end_timestamp of type str are converted into pd.Timestamp.
- Parameters:
- ts (TSDataset) – Dataset to make predictions on. 
- start_timestamp (Timestamp | int | str | None) – First timestamp of the prediction range to return; should be >= the first timestamp in ts. The beginning of each segment is expected to be <= start_timestamp. If not set, the first timestamp where each segment has begun is taken.
- end_timestamp (Timestamp | int | str | None) – Last timestamp of the prediction range to return; if not set, the last timestamp of ts is taken. The value is expected to be less than or equal to the last timestamp in ts.
- prediction_interval (bool) – If True, returns prediction interval for forecast.
- quantiles (Sequence[float]) – Levels of prediction distribution. By default, 2.5% and 97.5% are taken to form a 95% prediction interval.
- return_components (bool) – If True, additionally returns forecast components.
 
- Returns:
- Dataset with predictions in the [start_timestamp, end_timestamp] range.
- Raises:
- ValueError: – Incorrect type of start_timestamp or end_timestamp is used according to ts.freq.
- ValueError: – Value of end_timestamp is less than start_timestamp.
- ValueError: – Value of start_timestamp goes before the point where each segment started.
- ValueError: – Value of end_timestamp goes after the last timestamp.
- NotImplementedError: – Adding target components is not currently implemented 
 
- Return type:
- TSDataset
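The defaults and range checks described for start_timestamp and end_timestamp can be sketched on plain dates. This is an interpretation of the parameter descriptions, not etna code: resolve_range and the segment spans are hypothetical, and "the first timestamp where each segment began" is read here as the latest of the segments' first timestamps.

```python
from datetime import date
from typing import Dict, Optional, Tuple


def resolve_range(
    segment_spans: Dict[str, Tuple[date, date]],
    start: Optional[date] = None,
    end: Optional[date] = None,
) -> Tuple[date, date]:
    """Fill in default borders and validate the requested prediction range."""
    # Assumption: the default start is the first point at which all segments exist.
    latest_begin = max(first for first, _ in segment_spans.values())
    last_timestamp = max(last for _, last in segment_spans.values())
    start = latest_begin if start is None else start
    end = last_timestamp if end is None else end
    if end < start:
        raise ValueError("end_timestamp is less than start_timestamp")
    if start < latest_begin:
        raise ValueError("start_timestamp goes before the point where each segment started")
    if end > last_timestamp:
        raise ValueError("end_timestamp goes after the last timestamp")
    return start, end


# Hypothetical segments that begin on different days.
segment_spans = {
    "segment_0": (date(2021, 6, 1), date(2021, 6, 30)),
    "segment_1": (date(2021, 6, 5), date(2021, 6, 30)),
}
default_range = resolve_range(segment_spans)
custom_range = resolve_range(segment_spans, start=date(2021, 6, 10), end=date(2021, 6, 20))
```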
 
 - set_params(**params: dict) -> Self
- Return new object instance with modified parameters.
- Method also allows changing parameters of nested objects within the current object. For example, it is possible to change parameters of a model in a Pipeline.
- Nested parameters are expected to be in a <component_1>.<...>.<parameter> form, where components are separated by a dot.
- Parameters:
- **params (dict) – Estimator parameters 
- Returns:
- New instance with changed parameters 
- Return type:
- Self 
 - Examples
    >>> from etna.pipeline import Pipeline
    >>> from etna.models import NaiveModel
    >>> from etna.transforms import AddConstTransform
    >>> model = NaiveModel(lag=1)
    >>> transforms = [AddConstTransform(in_column="target", value=1)]
    >>> pipeline = Pipeline(model, transforms=transforms, horizon=3)
    >>> pipeline.set_params(**{"model.lag": 3, "transforms.0.value": 2})
    Pipeline(model = NaiveModel(lag = 3, ), transforms = [AddConstTransform(in_column = 'target', value = 2, inplace = True, out_column = None, )], horizon = 3, )
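The dotted form of nested parameters can be illustrated on a plain nested structure. The sketch below is not how etna resolves parameters internally; set_nested is a hypothetical helper that walks a dict/list configuration along a dotted key and, like set_params, returns a modified copy instead of changing the original.

```python
import copy
from typing import Any, Dict


def set_nested(config: Dict[str, Any], dotted_key: str, value: Any) -> Dict[str, Any]:
    """Return a copy of config with the value at dotted_key replaced."""
    updated = copy.deepcopy(config)
    *parents, leaf = dotted_key.split(".")
    node: Any = updated
    for part in parents:
        # A list node is indexed by position, a dict node by key.
        node = node[int(part)] if isinstance(node, list) else node[part]
    if isinstance(node, list):
        node[int(leaf)] = value
    else:
        node[leaf] = value
    return updated


# Hypothetical dict mirror of the pipeline from the example above.
config = {
    "model": {"lag": 1},
    "transforms": [{"in_column": "target", "value": 1}],
    "horizon": 3,
}
new_config = set_nested(set_nested(config, "model.lag", 3), "transforms.0.value", 2)
```

The component 0 in transforms.0.value selects the first transform by position, which is why list elements can be addressed with the same dotted syntax as named attributes.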