etna.analysis.get_anomalies_isolation_forest

get_anomalies_isolation_forest(ts: TSDataset, in_column: str = 'target', features_to_use: Sequence[str] | None = None, features_to_ignore: Sequence[str] | None = None, ignore_missing: bool = False, n_estimators: int = 100, max_samples: int | float | Literal['auto'] = 'auto', contamination: float | Literal['auto'] = 'auto', max_features: int | float = 1.0, bootstrap: bool = False, n_jobs: int | None = None, random_state: int | RandomState | None = None, verbose: int = 0, index_only: bool = True) → Dict[str, List[Timestamp] | List[int] | Series]

Get point outliers in time series using the Isolation Forest algorithm.

Documentation for Isolation Forest.

Parameters:

ts (TSDataset) – TSDataset with timeseries data
in_column (str) – Name of the column in which to search for anomalies
features_to_use (Sequence[str] | None) – List of feature column names to use for anomaly detection
features_to_ignore (Sequence[str] | None) – List of feature column names to exclude from anomaly detection
ignore_missing (bool) – Whether to ignore missing values inside a series
n_estimators (int) – The number of base estimators in the ensemble
max_samples (int | float | Literal['auto']) –
The number of samples to draw from X to train each base estimator
- If int, then draw max_samples samples.
- If float, then draw max_samples * X.shape[0] samples.
- If “auto”, then max_samples=min(256, n_samples).
If max_samples is larger than the number of samples provided, all samples will be used for all trees (no sampling).
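The rules above can be sketched as a small function. This is only an illustration: `resolve_max_samples` is a hypothetical helper, not part of etna, and truncating the float case with `int()` is an assumption.

```python
from typing import Union


def resolve_max_samples(max_samples: Union[int, float, str], n_samples: int) -> int:
    """Hypothetical helper: number of rows each tree is trained on."""
    if max_samples == "auto":
        return min(256, n_samples)
    if isinstance(max_samples, float):
        # A float is treated as a fraction of the available rows
        # (truncation towards zero is an assumption of this sketch).
        return int(max_samples * n_samples)
    # An int is an absolute count; asking for more rows than exist
    # means every row is used (no sampling).
    return min(max_samples, n_samples)


print(resolve_max_samples("auto", 1000))  # 256
print(resolve_max_samples(0.5, 1000))     # 500
print(resolve_max_samples(5000, 1000))    # 1000
```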
contamination (float | Literal['auto']) –
The amount of contamination of the data set, i.e. the proportion of outliers in the data set. Used when fitting to define the threshold on the scores of the samples.
- If ‘auto’, the threshold is determined as in the original paper.
- If float, the contamination should be in the range (0, 0.5].
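To give a feel for what a float contamination means, the sketch below flags roughly that fraction of points, taking those with the lowest anomaly scores. It is a simplification under stated assumptions: `flag_outliers` is a hypothetical helper, the scores are made up, and "lower score means more abnormal" is assumed. The underlying estimator derives a score threshold rather than counting points.

```python
from typing import List, Sequence


def flag_outliers(scores: Sequence[float], contamination: float) -> List[bool]:
    """Hypothetical helper: mark the lowest-scoring fraction of points."""
    if not 0 < contamination <= 0.5:
        raise ValueError("contamination must be in the range (0, 0.5]")
    n_outliers = int(len(scores) * contamination)
    # Indices ordered from most to least anomalous (lowest score first).
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    flagged = set(order[:n_outliers])
    return [i in flagged for i in range(len(scores))]


# Made-up scores for ten points; two of them are clearly lower than the rest.
scores = [-0.1, -0.7, -0.2, -0.15, -0.12, -0.11, -0.13, -0.14, -0.16, -0.6]
print(flag_outliers(scores, contamination=0.2))
# [False, True, False, False, False, False, False, False, False, True]
```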
max_features (int | float) –
The number of features to draw from X to train each base estimator.
- If int, then draw max_features features.
- If float, then draw max(1, int(max_features * n_features_in_)) features.
Note: using a float less than 1.0 or an integer less than the number of features enables feature subsampling and leads to a longer runtime.
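The feature count follows directly from the formula above. A minimal sketch, where `resolve_max_features` is a hypothetical helper rather than an etna function:

```python
from typing import Union


def resolve_max_features(max_features: Union[int, float], n_features: int) -> int:
    """Hypothetical helper: number of features drawn for each tree."""
    if isinstance(max_features, float):
        # At least one feature is always drawn, even for tiny fractions.
        return max(1, int(max_features * n_features))
    return max_features


print(resolve_max_features(1.0, 4))  # 4 (default: all features, no subsampling)
print(resolve_max_features(0.5, 5))  # 2
print(resolve_max_features(0.1, 3))  # 1
```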
bootstrap (bool) –
- If True, individual trees are fit on random subsets of the training data sampled with replacement.
- If False, sampling without replacement is performed.
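The difference between the two modes can be illustrated with the standard library alone. This is only an analogy for the sampling step, not the code etna or the underlying estimator runs. The row indices and sample size are made up.

```python
import random

rng = random.Random(0)
rows = list(range(10))  # stand-in for the row indices of the training data

# bootstrap=True: sampling with replacement, so a row may appear more than once.
with_replacement = rng.choices(rows, k=5)

# bootstrap=False: sampling without replacement, so every drawn row is distinct.
without_replacement = rng.sample(rows, k=5)

print(with_replacement)
print(without_replacement)
```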
n_jobs (int | None) –
The number of jobs to run in parallel for both fit and predict.
- None means 1 unless in a joblib.parallel_backend context.
- -1 means using all processors.
random_state (int | RandomState | None) – Controls the pseudo-randomness of the selection of the feature and split values for each branching step and each tree in the forest.
verbose (int) – Controls the verbosity of the tree building process.
index_only (bool) – Whether to return only the outlier indices. If False, the outlier series are returned instead

Returns:

Dict of outliers in the format {segment: [outliers_timestamps]}

Return type:

Dict[str, List[Timestamp] | List[int] | Series]
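As a sketch of how a result in the {segment: [outliers_timestamps]} format might be consumed, the snippet below counts anomalies per segment. The dict is a hand-written stand-in, not real output of the function. The segment names are invented, and plain `datetime` objects stand in for the timestamps.

```python
from datetime import datetime

# Hand-written stand-in for a result obtained with index_only=True.
anomalies = {
    "segment_a": [datetime(2021, 1, 5), datetime(2021, 2, 11)],
    "segment_b": [],
}

# Number of flagged points per segment.
counts = {segment: len(points) for segment, points in anomalies.items()}

# Segments that contain at least one flagged point.
segments_with_anomalies = [segment for segment, n in counts.items() if n > 0]

print(counts)                   # {'segment_a': 2, 'segment_b': 0}
print(segments_with_anomalies)  # ['segment_a']
```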