etna.analysis.get_anomalies_isolation_forest
- get_anomalies_isolation_forest(ts: TSDataset, in_column: str = 'target', features_to_use: Sequence[str] | None = None, features_to_ignore: Sequence[str] | None = None, ignore_missing: bool = False, n_estimators: int = 100, max_samples: int | float | Literal['auto'] = 'auto', contamination: float | Literal['auto'] = 'auto', max_features: int | float = 1.0, bootstrap: bool = False, n_jobs: int | None = None, random_state: int | RandomState | None = None, verbose: int = 0, index_only: bool = True) -> Dict[str, List[Timestamp] | List[int] | Series]
- Get point outliers in time series using the Isolation Forest algorithm.
- Documentation for Isolation Forest.
- Parameters:
- ts (TSDataset) – TSDataset with timeseries data 
- in_column (str) – Name of the column in which to search for anomalies 
- features_to_use (Sequence[str] | None) – List of feature column names to use for anomaly detection 
- features_to_ignore (Sequence[str] | None) – List of feature column names to exclude from anomaly detection 
- ignore_missing (bool) – Whether to ignore missing values inside a series 
- n_estimators (int) – The number of base estimators in the ensemble 
- max_samples (int | float | Literal['auto']) – The number of samples to draw from X to train each base estimator.
- If int, then draw max_samples samples. 
- If float, then draw max_samples * X.shape[0] samples. 
- If “auto”, then max_samples=min(256, n_samples). 
 
 - If max_samples is larger than the number of samples provided, all samples will be used for all trees (no sampling). 
- contamination (float | Literal['auto']) – The amount of contamination of the data set, i.e. the proportion of outliers in the data set. Used when fitting to define the threshold on the scores of the samples.
- If 'auto', the threshold is determined as in the original paper. 
- If float, the contamination should be in the range (0, 0.5]. 
 
- max_features (int | float) – The number of features to draw from X to train each base estimator.
- If int, then draw max_features features. 
- If float, then draw max(1, int(max_features * n_features_in_)) features. 
 
 - Note: using a float less than 1.0 or an integer less than the number of features enables feature subsampling and leads to a longer runtime. 
- bootstrap (bool) – Whether the training data for each tree is sampled with replacement.
- If True, individual trees are fit on random subsets of the training data sampled with replacement. 
- If False, sampling without replacement is performed. 
 
- n_jobs (int | None) – The number of jobs to run in parallel for both fit and predict.
- None means 1 unless in a joblib.parallel_backend context. 
- -1 means using all processors. 
 
 
- random_state (int | RandomState | None) – Controls the pseudo-randomness of the selection of the feature and split values for each branching step and each tree in the forest. 
- verbose (int) – Controls the verbosity of the tree building process. 
- index_only (bool) – Whether to return only the indices of the outliers. If False, the outlier series are returned instead 
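
The sizing rules for max_samples and max_features listed above can be illustrated with a small standalone sketch. The helper names `resolve_max_samples` and `resolve_max_features` are hypothetical and are not part of etna or scikit-learn; they only restate the documented rules in plain Python, and truncating the float case of max_samples to an integer is an assumption made here so the result is a valid sample count.

```python
from typing import Union


def resolve_max_samples(max_samples: Union[int, float, str], n_samples: int) -> int:
    """Hypothetical helper: number of samples drawn per base estimator."""
    if max_samples == "auto":
        # "auto" caps the per-tree sample size at 256.
        return min(256, n_samples)
    if isinstance(max_samples, int):
        # A value larger than the data set means all samples are used.
        return min(max_samples, n_samples)
    # Float: a fraction of the available samples (truncation is an assumption).
    return int(max_samples * n_samples)


def resolve_max_features(max_features: Union[int, float], n_features: int) -> int:
    """Hypothetical helper: number of features drawn per base estimator."""
    if isinstance(max_features, int):
        return max_features
    # Float: a fraction of the features, but never fewer than one.
    return max(1, int(max_features * n_features))


# Example with 1000 rows and 4 feature columns, using the default arguments.
samples_per_tree = resolve_max_samples("auto", n_samples=1000)
features_per_tree = resolve_max_features(1.0, n_features=4)
```

With the defaults (max_samples='auto', max_features=1.0), each tree in this sketch would see at most 256 rows and every feature column, so feature subsampling stays disabled.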
 
- Returns:
- dict of outliers in format {segment: [outliers_timestamps]} 
- Return type: