etna.experimental.classification.feature_extraction.WEASELFeatureExtractor

class WEASELFeatureExtractor(padding_value: float | Literal['back_fill'], word_size: int = 4, ngram_range: Tuple[int, int] = (1, 2), n_bins: int = 4, window_sizes: List[int | float] | None = None, window_steps: List[int | float] | None = None, anova: bool = True, drop_sum: bool = True, norm_mean: bool = True, norm_std: bool = True, strategy: str = 'entropy', chi2_threshold: float = 2, sparse: bool = True, alphabet: List[str] | None = None)

Bases: BaseTimeSeriesFeatureExtractor

Class to extract features with WEASEL algorithm.
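As a rough illustration of what a WEASEL-style extractor computes, the sketch below turns each series into a bag of words: sliding windows, leading Fourier coefficients, per-coefficient binning into letters, then unigram and bigram counts. It is a deliberately simplified stand-in written in plain Python, not the etna or pyts implementation: it assumes a single window size, "uniform" binning, no ANOVA coefficient selection, no mean/std normalization and no chi2 filtering, and every function name in it is hypothetical.

```python
import cmath
import math
from collections import Counter
from typing import List, Sequence

ALPHABET = "abcdefghijklmnopqrstuvwxyz"


def sliding_windows(series: Sequence[float], size: int, step: int) -> List[List[float]]:
    """Cut a series into windows of `size` points, moving `step` points each time."""
    return [list(series[i:i + size]) for i in range(0, len(series) - size + 1, step)]


def fourier_values(window: Sequence[float], word_size: int, drop_sum: bool = True) -> List[float]:
    """Real and imaginary parts of the leading DFT coefficients, truncated to word_size values."""
    n = len(window)
    values: List[float] = []
    k = 1 if drop_sum else 0  # coefficient 0 is the sum of the window
    while len(values) < word_size:
        coef = sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(window))
        values.extend([coef.real, coef.imag])
        k += 1
    return values[:word_size]


def uniform_edges(column: Sequence[float], n_bins: int) -> List[float]:
    """Interior edges of n_bins equal-width bins spanning the column's range."""
    lo, hi = min(column), max(column)
    return [lo + (hi - lo) * i / n_bins for i in range(1, n_bins)]


def to_letter(value: float, edges: Sequence[float]) -> str:
    """Letter of the bin that `value` falls into (number of edges at or below it)."""
    return ALPHABET[sum(value >= edge for edge in edges)]


def bags_of_words(series_list: List[Sequence[float]], window_size: int, window_step: int,
                  word_size: int = 2, n_bins: int = 4) -> List[Counter]:
    """Hypothetical helper: unigram + bigram word counts per series, for one window size."""
    per_series = [[fourier_values(w, word_size) for w in sliding_windows(s, window_size, window_step)]
                  for s in series_list]
    # Learn one set of bin edges per Fourier value, pooling the windows of all series.
    pooled = [row for rows in per_series for row in rows]
    edges = [uniform_edges([row[j] for row in pooled], n_bins) for j in range(word_size)]
    bags = []
    for rows in per_series:
        words = ["".join(to_letter(row[j], edges[j]) for j in range(word_size)) for row in rows]
        bag = Counter(words)  # unigrams
        bag.update(" ".join(pair) for pair in zip(words, words[1:]))  # bigrams of consecutive words
        bags.append(bag)
    return bags
```

For two series of 20 points, windows of 8 points with step 4 give 4 words per series, so each bag holds 4 unigram occurrences plus 3 bigram occurrences; stacking such counts over a shared vocabulary is what yields one feature row per series.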

Note

This class requires the classification extension to be installed. Read more about this on the installation page.

Initialize WEASELFeatureExtractor with the given parameters.

Parameters:
  • padding_value (float | Literal['back_fill']) – Value used to pad the series to fit series_len; if equal to “back_fill”, the first value of the series is used.

  • word_size (int) – Size of each word.

  • ngram_range (Tuple[int, int]) – The lower and upper boundary of the range of ngrams.

  • n_bins (int) – The number of bins to produce. It must be between 2 and 26.

  • window_sizes (List[int | float] | None) – Size of the sliding windows. All the elements must be either integers or floats. In the latter case, each element represents a fraction of the length of each time series and must be between 0 and 1; the size of the sliding windows is computed as np.ceil(window_sizes * n_timestamps).

  • window_steps (List[int | float] | None) – Step of the sliding windows. If None, each window_step is equal to window_size, so the windows are non-overlapping. Otherwise, all the elements must be either integers or floats. In the latter case, each element represents a fraction of the length of each time series and must be between 0 and 1; the step of the sliding windows is computed as np.ceil(window_steps * n_timestamps).

  • anova (bool) – If True, the Fourier coefficient selection is done via a one-way ANOVA test. If False, the first Fourier coefficients are selected.

  • drop_sum (bool) – If True, the first Fourier coefficient (i.e. the sum of the subseries) is dropped. Otherwise, it is kept.

  • norm_mean (bool) – If True, center each subseries before scaling.

  • norm_std (bool) – If True, scale each subseries to unit variance.

  • strategy (str) – Strategy used to define the widths of the bins:
      ◦ ‘uniform’: All bins in each sample have identical widths
      ◦ ‘quantile’: All bins in each sample have the same number of points
      ◦ ‘normal’: Bin edges are quantiles from a standard normal distribution
      ◦ ‘entropy’: Bin edges are computed using information gain

  • chi2_threshold (float) – The threshold used to perform feature selection. Only the words with a chi2 statistic above this threshold will be kept.

  • sparse (bool) – Return a sparse matrix if True, else return an array.

  • alphabet (List[str] | None) – Alphabet to use. If None, the first n_bins letters of the Latin alphabet are used.
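Several of the parameters above describe small computations. The plain-Python sketch below shows one way to resolve window_sizes / window_steps into point counts and to compute bin edges for the "uniform", "quantile" and "normal" strategies. Both functions are hypothetical helpers, not etna code; the exact range check (0 < v <= 1), the use of linear-interpolation quantiles and the per-sample edge computation are assumptions. The "entropy" strategy is omitted because information gain is computed against class labels.

```python
import math
import statistics
from typing import List, Optional, Sequence, Tuple, Union

Number = Union[int, float]


def resolve_windows(window_sizes: Sequence[Number], window_steps: Optional[Sequence[Number]],
                    n_timestamps: int) -> Tuple[List[int], List[int]]:
    """Hypothetical helper: turn window_sizes / window_steps into integer numbers of points."""

    def to_points(values: Sequence[Number]) -> List[int]:
        if all(isinstance(v, int) for v in values):
            return [int(v) for v in values]
        if all(isinstance(v, float) and 0 < v <= 1 for v in values):
            # Fractions of the series length, rounded up like np.ceil(window_sizes * n_timestamps).
            return [math.ceil(v * n_timestamps) for v in values]
        raise ValueError("elements must be all integers or all floats between 0 and 1")

    sizes = to_points(window_sizes)
    # Without explicit steps, each step equals its window size, so windows do not overlap.
    steps = list(sizes) if window_steps is None else to_points(window_steps)
    return sizes, steps


def bin_edges(sample: Sequence[float], n_bins: int, strategy: str) -> List[float]:
    """Hypothetical helper: interior bin edges for the unsupervised strategies."""
    if strategy == "uniform":
        # Equal-width bins between the sample minimum and maximum.
        lo, hi = min(sample), max(sample)
        return [lo + (hi - lo) * i / n_bins for i in range(1, n_bins)]
    if strategy == "quantile":
        # Edges at sample quantiles, so each bin holds roughly the same number of points.
        return statistics.quantiles(sample, n=n_bins, method="inclusive")
    if strategy == "normal":
        # Edges at quantiles of the standard normal distribution, independent of the sample.
        return [statistics.NormalDist().inv_cdf(i / n_bins) for i in range(1, n_bins)]
    raise ValueError(f"unsupported strategy: {strategy}")
```

For example, with n_timestamps = 40, window_sizes = [0.25, 0.5] resolves to windows of 10 and 20 points, and with window_steps = None the steps are also 10 and 20.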

Methods

  • dump(path, *args, **kwargs) – Save the object.

  • fit(x[, y]) – Fit the feature extractor.

  • fit_transform(x[, y]) – Fit the feature extractor and extract features from the input data.

  • load(path, *args, **kwargs) – Load the object.

  • set_params(**params) – Return a new object instance with modified parameters.

  • to_dict() – Collect all information about the etna object in a dict.

  • transform(x) – Extract WEASEL features from the input data.

Attributes

This class stores its __init__ parameters as attributes.

dump(path: str, *args, **kwargs)

Save the object.

Parameters:

path (str) – Path to save the object to.

fit(x: List[ndarray], y: ndarray | None = None) → WEASELFeatureExtractor

Fit the feature extractor.

Parameters:
  • x (List[ndarray]) – List of arrays, one per time series.

  • y (ndarray | None) – Array of class labels.

Returns:

Fitted instance of the feature extractor.

Return type:

WEASELFeatureExtractor

fit_transform(x: List[ndarray], y: ndarray | None = None) → ndarray

Fit the feature extractor and extract features from the input data.

Parameters:
  • x (List[ndarray]) – List of arrays, one per time series.

  • y (ndarray | None) – Array of class labels.

Returns:

Transformed input data.

Return type:

ndarray
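The fit / transform / fit_transform contract can be illustrated with a hypothetical stand-in (ToyVocabularyExtractor is not part of etna and does not implement WEASEL): fit learns state from a list of 1-D series that may differ in length, transform reuses that state, and fit_transform is assumed here to be equivalent to fit followed by transform on the same data. It returns nested lists rather than an ndarray only to stay within the standard library.

```python
from collections import Counter
from typing import List, Optional, Sequence


class ToyVocabularyExtractor:
    """Hypothetical extractor: counts rounded values using a vocabulary learned in fit."""

    def __init__(self) -> None:
        self.vocabulary: List[int] = []

    def fit(self, x: List[Sequence[float]], y: Optional[Sequence[int]] = None) -> "ToyVocabularyExtractor":
        # Learn the sorted set of rounded values seen in training; y is unused in this toy.
        self.vocabulary = sorted({round(v) for series in x for v in series})
        return self  # returning self allows chaining, as in fit(x, y).transform(x)

    def transform(self, x: List[Sequence[float]]) -> List[List[int]]:
        # One row per series, one column per vocabulary entry; values unseen in fit are ignored.
        rows = []
        for series in x:
            counts = Counter(round(v) for v in series)
            rows.append([counts[token] for token in self.vocabulary])
        return rows

    def fit_transform(self, x: List[Sequence[float]], y: Optional[Sequence[int]] = None) -> List[List[int]]:
        return self.fit(x, y).transform(x)
```

Series of 3 and 2 points both map to rows of the same width, because the columns come from the state learned in fit rather than from each series.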

static load(path: str, *args, **kwargs)

Load the object.

Warning

This method uses the dill module, which is not secure. It is possible to construct malicious data which will execute arbitrary code during loading. Never load data that could have come from an untrusted source, or that could have been tampered with.

Parameters:

path (str) – Path to load the object from.

set_params(**params: dict) → Self

Return a new object instance with modified parameters.

The method also allows changing parameters of nested objects within the current object. For example, it is possible to change parameters of a model in a Pipeline.

Nested parameters are expected in the form <component_1>.<...>.<parameter>, where components are separated by dots.
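To make the dotted-key convention concrete, the hypothetical helper below (nest_params is not an etna function) expands keys such as "model.lag" or "transforms.0.value" into a nested mapping. It only shows how a key splits into components and a final parameter name; how etna then applies the values to the nested objects is not modelled here.

```python
from typing import Any, Dict


def nest_params(params: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: expand dotted keys such as "model.lag" into nested dictionaries."""
    nested: Dict[str, Any] = {}
    for dotted_key, value in params.items():
        # Every part except the last names a component; the last part is the parameter.
        *components, parameter = dotted_key.split(".")
        node = nested
        for component in components:
            node = node.setdefault(component, {})
        node[parameter] = value
    return nested
```

A key without dots, such as "word_size", has no components and stays a top-level parameter.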

Parameters:

**params (dict) – Estimator parameters

Returns:

New instance with changed parameters

Return type:

Self

Examples

>>> from etna.pipeline import Pipeline
>>> from etna.models import NaiveModel
>>> from etna.transforms import AddConstTransform
>>> model = NaiveModel(lag=1)
>>> transforms = [AddConstTransform(in_column="target", value=1)]
>>> pipeline = Pipeline(model, transforms=transforms, horizon=3)
>>> pipeline.set_params(**{"model.lag": 3, "transforms.0.value": 2})
Pipeline(model = NaiveModel(lag = 3, ), transforms = [AddConstTransform(in_column = 'target', value = 2, inplace = True, out_column = None, )], horizon = 3, )

to_dict()

Collect all information about the etna object in a dict.

transform(x: List[ndarray]) → ndarray

Extract WEASEL features from the input data.

Parameters:

x (List[ndarray]) – List of arrays, one per time series.

Returns:

Transformed input data.

Return type:

ndarray