etna.transforms.IForestOutlierTransform

class IForestOutlierTransform(in_column: str, ignore_flag_column: str | None = None, features_to_use: Sequence[str] | None = None, features_to_ignore: Sequence[str] | None = None, ignore_missing: bool = False, n_estimators: int = 100, max_samples: int | float | Literal['auto'] = 'auto', contamination: float | Literal['auto'] = 'auto', max_features: int | float = 1.0, bootstrap: bool = False, n_jobs: int | None = None, random_state: int | RandomState | None = None, verbose: int = 0)

Bases: OutliersTransform

Transform that uses get_anomalies_isolation_forest() to find anomalies in data.

Create instance of IForestOutlierTransform.

Parameters:
  • in_column (str) – Name of the column in which to search for anomalies

  • ignore_flag_column (str | None) – Name of the column used to skip values during the outlier check

  • features_to_use (Sequence[str] | None) – List of feature column names to use for anomaly detection

  • features_to_ignore (Sequence[str] | None) – List of feature column names to exclude from anomaly detection

  • ignore_missing (bool) – Whether to ignore missing values inside a series

  • n_estimators (int) – The number of base estimators in the ensemble

  • max_samples (int | float | Literal['auto']) –

    The number of samples to draw from X to train each base estimator.
    • If int, then draw max_samples samples.

    • If float, then draw max_samples * X.shape[0] samples.

    • If “auto”, then max_samples=min(256, n_samples).

    If max_samples is larger than the number of samples provided, all samples will be used for all trees (no sampling).

  • contamination (float | Literal['auto']) –

    The amount of contamination of the data set, i.e. the proportion of outliers in the data set. Used when fitting to define the threshold on the scores of the samples.

    • If ‘auto’, the threshold is determined as in the original paper.

    • If float, the contamination should be in the range (0, 0.5].

  • max_features (int | float) –

    The number of features to draw from X to train each base estimator.
    • If int, then draw max_features features.

    • If float, then draw max(1, int(max_features * n_features_in_)) features.

    Note: using a float less than 1.0 or an integer less than the number of features enables feature subsampling and leads to a longer runtime.

  • bootstrap (bool) –

    • If True, individual trees are fit on random subsets of the training data sampled with replacement.

    • If False, sampling without replacement is performed.

  • n_jobs (int | None) –

    The number of jobs to run in parallel for both fit and predict.
    • None means 1 unless in a joblib.parallel_backend context.

    • -1 means using all processors.

  • random_state (int | RandomState | None) – Controls the pseudo-randomness of the selection of the feature and split values for each branching step and each tree in the forest.

  • verbose (int) – Controls the verbosity of the tree building process.
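
The sizing rules for max_samples and max_features, and the accepted range for contamination, can be written out as plain Python. The sketch below only restates the rules listed above. The helper names are hypothetical, this is not the code the transform runs, and truncating the float case of max_samples with int() is an assumption.

```python
from typing import Union


def resolve_max_samples(max_samples: Union[int, float, str], n_samples: int) -> int:
    """Return the number of rows drawn per tree under the rules listed above."""
    if max_samples == "auto":
        return min(256, n_samples)
    if isinstance(max_samples, int):
        # A request larger than the data falls back to using every row.
        return min(max_samples, n_samples)
    # Float: a fraction of the available rows (truncation is an assumption).
    return int(max_samples * n_samples)


def resolve_max_features(max_features: Union[int, float], n_features: int) -> int:
    """Return the number of features drawn per tree under the rules listed above."""
    if isinstance(max_features, int):
        return max_features
    return max(1, int(max_features * n_features))


def check_contamination(contamination: Union[float, str]) -> None:
    """Raise ValueError unless contamination is 'auto' or lies in (0, 0.5]."""
    if contamination == "auto":
        return
    if not 0 < contamination <= 0.5:
        raise ValueError("contamination must be 'auto' or in the range (0, 0.5]")


print(resolve_max_samples("auto", 1000))  # 256
print(resolve_max_samples(0.5, 1000))     # 500
print(resolve_max_features(0.5, 3))       # 1
```

With the defaults (max_samples='auto', max_features=1.0), each tree sees at most 256 rows and every feature, so feature subsampling is off.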

Notes

For more details on these parameters, see the documentation of the Isolation Forest algorithm:

Documentation for Isolation Forest.

Methods

detect_outliers(ts)

Call get_anomalies_isolation_forest() with the parameters of this instance.

fit(ts)

Fit the transform.

fit_transform(ts)

Fit and transform TSDataset.

get_regressors_info()

Return the list with regressors created by the transform.

inverse_transform(ts)

Inverse transform TSDataset.

load(path)

Load an object.

params_to_tune()

Get default grid for tuning hyperparameters.

save(path)

Save the object.

set_params(**params)

Return new object instance with modified parameters.

to_dict()

Collect all information about etna object in dict.

transform(ts)

Transform TSDataset inplace.

Attributes

This class stores its __init__ parameters as attributes.

original_values

Backward compatibility property.

outliers_timestamps

Backward compatibility property.

detect_outliers(ts: TSDataset) → Dict[str, Series]

Call get_anomalies_isolation_forest() with the parameters of this instance.

Parameters:

ts (TSDataset) – Dataset to process

Returns:

Dict of outliers in format {segment: [outliers_timestamps]}

Return type:

Dict[str, Series]

fit(ts: TSDataset) → OutliersTransform

Fit the transform.

Parameters:

ts (TSDataset) – Dataset to fit the transform on.

Returns:

The fitted transform instance.

Return type:

OutliersTransform

fit_transform(ts: TSDataset) → TSDataset

Fit and transform TSDataset.

May be reimplemented, but this is not recommended.

Parameters:

ts (TSDataset) – TSDataset to transform.

Returns:

Transformed TSDataset.

Return type:

TSDataset

get_regressors_info() → List[str]

Return the list with regressors created by the transform.

Returns:

List with regressors created by the transform.

Return type:

List[str]

inverse_transform(ts: TSDataset) → TSDataset

Inverse transform TSDataset.

Apply the _inverse_transform method.

Parameters:

ts (TSDataset) – TSDataset to be inverse transformed.

Returns:

TSDataset after applying inverse transformation.

Return type:

TSDataset

classmethod load(path: Path) → Self

Load an object.

Warning

This method uses the dill module, which is not secure. It is possible to construct malicious data that executes arbitrary code during loading. Never load data that could have come from an untrusted source or that could have been tampered with.

Parameters:

path (Path) – Path to load object from.

Returns:

Loaded object.

Return type:

Self
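
One way to notice tampering before calling load is to compare the file's digest with a value recorded when the object was saved. This is a minimal sketch using only the standard library. The helpers file_sha256 and is_untampered are hypothetical names, not part of etna, and the file below is a stand-in for a real saved object.

```python
import hashlib
import hmac
import tempfile
from pathlib import Path


def file_sha256(path: Path, chunk_size: int = 65536) -> str:
    """Return the hex SHA-256 digest of the file at `path`."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        # Read in chunks so large files do not have to fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_untampered(path: Path, expected_digest: str) -> bool:
    """Check the file against a digest recorded when it was saved."""
    return hmac.compare_digest(file_sha256(path), expected_digest)


# Demo with a stand-in file; a real workflow would record the digest right after save().
with tempfile.TemporaryDirectory() as tmp:
    saved = Path(tmp) / "transform.zip"
    saved.write_bytes(b"stand-in for a saved object")
    recorded = file_sha256(saved)

    unchanged = is_untampered(saved, recorded)
    saved.write_bytes(b"modified contents")
    changed = not is_untampered(saved, recorded)

print(unchanged, changed)
```

A matching digest only shows that the file has not changed since the digest was recorded. It does not make a file from an untrusted source safe to load.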

params_to_tune() → Dict[str, BaseDistribution]

Get default grid for tuning hyperparameters.

This grid tunes parameters: n_estimators, max_samples, contamination, max_features, bootstrap. Other parameters are expected to be set by the user.

Returns:

Grid to tune.

Return type:

Dict[str, BaseDistribution]

save(path: Path)

Save the object.

Parameters:

path (Path) – Path to save object to.

set_params(**params: dict) → Self

Return new object instance with modified parameters.

The method also allows changing parameters of nested objects within the current object. For example, it is possible to change the parameters of a model in a Pipeline.

Nested parameters are expected to be in a <component_1>.<...>.<parameter> form, where components are separated by a dot.

Parameters:

**params (dict) – Estimator parameters

Returns:

New instance with changed parameters

Return type:

Self

Examples

>>> from etna.pipeline import Pipeline
>>> from etna.models import NaiveModel
>>> from etna.transforms import AddConstTransform
>>> model = NaiveModel(lag=1)
>>> transforms = [AddConstTransform(in_column="target", value=1)]
>>> pipeline = Pipeline(model, transforms=transforms, horizon=3)
>>> pipeline.set_params(**{"model.lag": 3, "transforms.0.value": 2})
Pipeline(model = NaiveModel(lag = 3, ), transforms = [AddConstTransform(in_column = 'target', value = 2, inplace = True, out_column = None, )], horizon = 3, )
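
The dotted-key addressing used above can be illustrated on plain dicts and lists. This is a simplified sketch of the idea, not etna's implementation: apply_nested_params and _key are hypothetical helpers, and the nested config is a stand-in for a pipeline. Like set_params, it returns a new object and leaves the original untouched.

```python
import copy
from typing import Any, Dict


def _key(node: Any, part: str) -> Any:
    # Components such as "0" in "transforms.0.value" index into a list.
    return int(part) if isinstance(node, list) else part


def apply_nested_params(config: Any, params: Dict[str, Any]) -> Any:
    """Return a deep copy of `config` with dotted-key updates applied."""
    result = copy.deepcopy(config)
    for dotted_key, value in params.items():
        *parents, last = dotted_key.split(".")
        node = result
        # Walk down to the container that holds the final component.
        for part in parents:
            node = node[_key(node, part)]
        node[_key(node, last)] = value
    return result


config = {
    "model": {"lag": 1},
    "transforms": [{"in_column": "target", "value": 1}],
    "horizon": 3,
}
updated = apply_nested_params(config, {"model.lag": 3, "transforms.0.value": 2})
print(updated)
# {'model': {'lag': 3}, 'transforms': [{'in_column': 'target', 'value': 2}], 'horizon': 3}
```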

to_dict()

Collect all information about etna object in dict.

transform(ts: TSDataset) → TSDataset

Transform TSDataset inplace.

Parameters:

ts (TSDataset) – Dataset to transform.

Returns:

Transformed TSDataset.

Return type:

TSDataset

property original_values: Dict[str, Series] | None

Backward compatibility property.

property outliers_timestamps: Dict[str, List[Timestamp]] | Dict[str, List[int]] | None

Backward compatibility property.