etna.analysis.get_anomalies_isolation_forest

get_anomalies_isolation_forest(ts: TSDataset, in_column: str = 'target', features_to_use: Sequence[str] | None = None, features_to_ignore: Sequence[str] | None = None, ignore_missing: bool = False, n_estimators: int = 100, max_samples: int | float | Literal['auto'] = 'auto', contamination: float | Literal['auto'] = 'auto', max_features: int | float = 1.0, bootstrap: bool = False, n_jobs: int | None = None, random_state: int | RandomState | None = None, verbose: int = 0, index_only: bool = True) → Dict[str, List[Timestamp] | List[int] | Series]

Get point outliers in time series using the Isolation Forest algorithm.

The estimator parameters (n_estimators through verbose) follow scikit-learn's IsolationForest; see its documentation for details.

Parameters:
  • ts (TSDataset) – TSDataset with timeseries data

  • in_column (str) – Name of the column in which to search for anomalies

  • features_to_use (Sequence[str] | None) – List of feature column names to use for anomaly detection

  • features_to_ignore (Sequence[str] | None) – List of feature column names to exclude from anomaly detection

  • ignore_missing (bool) – Whether to ignore missing values inside a series

  • n_estimators (int) – The number of base estimators in the ensemble

  • max_samples (int | float | Literal['auto']) –

    The number of samples to draw from X to train each base estimator
    • If int, then draw max_samples samples.

    • If float, then draw max_samples * X.shape[0] samples.

    • If “auto”, then max_samples=min(256, n_samples).

    If max_samples is larger than the number of samples provided, all samples will be used for all trees (no sampling).

  • contamination (float | Literal['auto']) –

    The amount of contamination of the data set, i.e. the proportion of outliers in the data set. Used when fitting to define the threshold on the scores of the samples.

    • If ‘auto’, the threshold is determined as in the original paper.

    • If float, the contamination should be in the range (0, 0.5].

  • max_features (int | float) –

    The number of features to draw from X to train each base estimator.
    • If int, then draw max_features features.

    • If float, then draw max(1, int(max_features * n_features_in_)) features.

    Note: using a float less than 1.0 or an integer less than the number of features enables feature subsampling and leads to a longer runtime.

  • bootstrap (bool) –

    • If True, individual trees are fit on random subsets of the training data sampled with replacement.

    • If False, sampling without replacement is performed.

  • n_jobs (int | None) –

    The number of jobs to run in parallel for both fit and predict.
    • None means 1 unless in a joblib.parallel_backend context.

    • -1 means using all processors.

  • random_state (int | RandomState | None) – Controls the pseudo-randomness of the selection of the feature and split values for each branching step and each tree in the forest.

  • verbose (int) – Controls the verbosity of the tree building process.

  • index_only (bool) – Whether to return only the indices of the outliers. If False, the outlier points are returned as a Series for each segment
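
The sizing and threshold rules described for max_samples, max_features and contamination can be sketched in plain Python. This is an illustration of the rules as stated above, not the library's code: the helper names resolve_max_samples, resolve_max_features, percentile and flag_outliers are hypothetical, and the percentile-based threshold is an assumption about how a float contamination is turned into a cutoff on the anomaly scores.

```python
from typing import List, Sequence, Union


def resolve_max_samples(max_samples: Union[int, float, str], n_samples: int) -> int:
    """Number of rows drawn per tree, following the max_samples rules above."""
    if max_samples == "auto":
        return min(256, n_samples)
    if isinstance(max_samples, int):
        # Asking for more rows than exist means every row is used (no sampling).
        return min(max_samples, n_samples)
    return int(max_samples * n_samples)


def resolve_max_features(max_features: Union[int, float], n_features: int) -> int:
    """Number of columns drawn per tree, following the max_features rules above."""
    if isinstance(max_features, int):
        return max_features
    return max(1, int(max_features * n_features))


def percentile(values: Sequence[float], q: float) -> float:
    """Linear-interpolation percentile for q in [0, 100]."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (pos - lower)


def flag_outliers(scores: Sequence[float], contamination: float) -> List[bool]:
    """Assumed rule: flag scores below the contamination-quantile of all scores.

    Lower scores mean "more anomalous" in this sketch.
    """
    if not 0.0 < contamination <= 0.5:
        raise ValueError("contamination must be in the range (0, 0.5]")
    threshold = percentile(scores, 100 * contamination)
    return [score < threshold for score in scores]


print(resolve_max_samples("auto", 1000))   # "auto" caps at 256 rows
print(resolve_max_features(0.5, 5))        # half of 5 columns, truncated
print(flag_outliers([-0.9, -0.5, -0.45, -0.4], contamination=0.25))
```

With the default max_features=1.0 every feature is used, so feature subsampling (and its extra runtime) only applies when a smaller value is passed.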

Returns:

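Dict of outliers per segment. With index_only=True the format is {segment: [outlier_timestamps]}; with index_only=False each value is a Series of the outlier points.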

Return type:

Dict[str, List[Timestamp] | List[int] | Series]