etna.transforms.GaleShapleyFeatureSelectionTransform

class GaleShapleyFeatureSelectionTransform(relevance_table: RelevanceTable, top_k: int, features_to_use: List[str] | Literal['all'] = 'all', use_rank: bool = False, return_features: bool = False, **relevance_params)

Bases: BaseFeatureSelectionTransform

Transform that filters features with the Gale-Shapley matching algorithm according to the relevance table.

The transform works with any type of features; however, most models work only with regressors. Therefore, it is recommended to pass regressors into the feature selection transforms.

Notes

As input, we have a table of relevances of size \(N_{f} \times N_{s}\), where \(N_{f}\) is the number of features and \(N_{s}\) is the number of segments. The feature filtering procedure consists of \(\lceil \frac{k}{N_{s}} \rceil\) iterations, where \(k\) is the value of top_k.

Algorithm of each iteration:

  • build a matching between segments and features using the Gale–Shapley algorithm according to the relevance table; during the matching, segments send proposals to features;

  • select features to add by taking the matched feature for each segment;

  • add the selected features to the accumulated list of selected features, keeping the size of this list at most top_k;

  • remove added features from future consideration.
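
The iterations above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not etna's implementation: the relevance table is a nested dict relevance[segment][feature], higher scores mean more relevant, segments and features rank each other by that score, and when the last iteration would overshoot top_k the sketch keeps the matched features with the highest scores. The names gale_shapley and select_features are hypothetical.

```python
import math
from typing import Dict, List


def gale_shapley(
    segment_prefs: Dict[str, List[str]],
    feature_prefs: Dict[str, List[str]],
) -> Dict[str, str]:
    """Match segments to features; segments propose, features accept or reject."""
    # Position of each segment in every feature's preference list (lower is better).
    rank = {f: {s: i for i, s in enumerate(prefs)} for f, prefs in feature_prefs.items()}
    next_choice = {s: 0 for s in segment_prefs}  # index of the next feature to propose to
    holder: Dict[str, str] = {}  # feature -> segment currently matched to it
    free = list(segment_prefs)
    while free:
        segment = free.pop()
        prefs = segment_prefs[segment]
        if next_choice[segment] >= len(prefs):
            continue  # proposed to every feature already; the segment stays unmatched
        feature = prefs[next_choice[segment]]
        next_choice[segment] += 1
        current = holder.get(feature)
        if current is None:
            holder[feature] = segment
        elif rank[feature][segment] < rank[feature][current]:
            holder[feature] = segment  # the feature prefers the new proposer
            free.append(current)
        else:
            free.append(segment)  # rejected; the segment will try its next choice
    return {segment: feature for feature, segment in holder.items()}


def select_features(relevance: Dict[str, Dict[str, float]], top_k: int) -> List[str]:
    """Select up to top_k features from relevance[segment][feature] scores."""
    segments = list(relevance)
    remaining = sorted({f for scores in relevance.values() for f in scores})
    selected: List[str] = []
    for _ in range(math.ceil(top_k / len(segments))):
        if not remaining:
            break
        # Assumption: both sides rank each other by descending relevance.
        segment_prefs = {
            s: sorted(remaining, key=lambda f: relevance[s][f], reverse=True) for s in segments
        }
        feature_prefs = {
            f: sorted(segments, key=lambda s: relevance[s][f], reverse=True) for f in remaining
        }
        matching = gale_shapley(segment_prefs, feature_prefs)
        # Assumption: if top_k would be exceeded, keep the highest-scoring matches.
        matched = sorted(matching, key=lambda s: relevance[s][matching[s]], reverse=True)
        added = [matching[s] for s in matched][: top_k - len(selected)]
        selected.extend(added)
        remaining = [f for f in remaining if f not in added]
    return selected


relevance = {
    "segment_a": {"f1": 0.9, "f2": 0.5, "f3": 0.1},
    "segment_b": {"f1": 0.8, "f2": 0.7, "f3": 0.2},
}
print(select_features(relevance, top_k=3))  # ['f1', 'f2', 'f3']
```

With two segments and top_k=3 the sketch runs \(\lceil 3/2 \rceil = 2\) iterations: the first adds f1 and f2, and the second adds the only remaining feature, f3.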

Init GaleShapleyFeatureSelectionTransform.

Parameters:
  • relevance_table (RelevanceTable) – class used to build the relevance table

  • top_k (int) – number of features that should be selected from all the given ones

  • features_to_use (List[str] | Literal['all']) – columns of the dataset to select from; if the value “all” is given, all columns are used

  • use_rank (bool) – if True, use rank in relevance table computation

  • return_features (bool) – indicates whether the removed features should be returned to the dataset by inverse_transform.

Methods

fit(ts)

Fit the transform.

fit_transform(ts)

Fit and transform TSDataset.

get_regressors_info()

Return the list with regressors created by the transform.

inverse_transform(ts)

Inverse transform TSDataset.

load(path)

Load an object.

params_to_tune()

Get default grid for tuning hyperparameters.

save(path)

Save the object.

set_params(**params)

Return new object instance with modified parameters.

to_dict()

Collect all information about etna object in dict.

transform(ts)

Transform TSDataset inplace.

Attributes

This class stores its __init__ parameters as attributes.

fit(ts: TSDataset) → Transform

Fit the transform.

Parameters:

ts (TSDataset) – Dataset to fit the transform on.

Returns:

The fitted transform instance.

Return type:

Transform

fit_transform(ts: TSDataset) → TSDataset

Fit and transform TSDataset.

May be reimplemented, but this is not recommended.

Parameters:

ts (TSDataset) – TSDataset to transform.

Returns:

Transformed TSDataset.

Return type:

TSDataset

get_regressors_info() → List[str]

Return the list with regressors created by the transform.

Return type:

List[str]

inverse_transform(ts: TSDataset) → TSDataset

Inverse transform TSDataset.

Apply the _inverse_transform method.

Parameters:

ts (TSDataset) – TSDataset to be inverse transformed.

Returns:

TSDataset after applying inverse transformation.

Return type:

TSDataset

classmethod load(path: Path) → Self

Load an object.

Warning

This method uses the dill module, which is not secure. It is possible to construct malicious data that will execute arbitrary code during loading. Never load data that could have come from an untrusted source or that could have been tampered with.

Parameters:

path (Path) – Path to load object from.

Returns:

Loaded object.

Return type:

Self

params_to_tune() → Dict[str, BaseDistribution]

Get default grid for tuning hyperparameters.

This grid tunes the parameters top_k and use_rank. Other parameters are expected to be set by the user.

For the top_k parameter, the maximum suggested value is not greater than self.top_k.

Returns:

Grid to tune.

Return type:

Dict[str, BaseDistribution]

save(path: Path)

Save the object.

Parameters:

path (Path) – Path to save object to.

set_params(**params: dict) → Self

Return new object instance with modified parameters.

The method also allows changing parameters of nested objects within the current object. For example, it is possible to change parameters of a model in a Pipeline.

Nested parameters are expected to be in the form <component_1>.<...>.<parameter>, where components are separated by dots.

Parameters:

**params (dict) – Estimator parameters

Returns:

New instance with changed parameters

Return type:

Self

Examples

>>> from etna.pipeline import Pipeline
>>> from etna.models import NaiveModel
>>> from etna.transforms import AddConstTransform
>>> model = NaiveModel(lag=1)
>>> transforms = [AddConstTransform(in_column="target", value=1)]
>>> pipeline = Pipeline(model, transforms=transforms, horizon=3)
>>> pipeline.set_params(**{"model.lag": 3, "transforms.0.value": 2})
Pipeline(model = NaiveModel(lag = 3, ), transforms = [AddConstTransform(in_column = 'target', value = 2, inplace = True, out_column = None, )], horizon = 3, )
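
The dotted keys used in the example can be read as paths into nested components. The sketch below shows that reading on plain dicts and lists, which stand in for etna objects here; it is not how etna implements set_params, and the helpers split_nested_key and with_param are hypothetical names introduced for illustration.

```python
import copy
from typing import Any, List, Union


def split_nested_key(key: str) -> List[Union[str, int]]:
    """Split 'transforms.0.value' into ['transforms', 0, 'value']."""
    # Assumption: purely numeric components are treated as list indices.
    return [int(part) if part.isdigit() else part for part in key.split(".")]


def with_param(config: Any, key: str, value: Any) -> Any:
    """Return a modified copy of a nested dict/list; the original is left untouched."""
    updated = copy.deepcopy(config)
    *parents, last = split_nested_key(key)
    node = updated
    for part in parents:
        node = node[part]  # descend one component at a time
    node[last] = value
    return updated


# Plain-data stand-in for the pipeline in the example above.
config = {
    "model": {"lag": 1},
    "transforms": [{"in_column": "target", "value": 1}],
    "horizon": 3,
}
new_config = with_param(with_param(config, "model.lag", 3), "transforms.0.value", 2)
print(new_config["model"]["lag"], new_config["transforms"][0]["value"])  # 3 2
print(config["model"]["lag"])  # 1
```

Like set_params, the sketch returns a new object and leaves the original unchanged.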

to_dict()

Collect all information about etna object in dict.

transform(ts: TSDataset) → TSDataset

Transform TSDataset inplace.

Parameters:

ts (TSDataset) – Dataset to transform.

Returns:

Transformed TSDataset.

Return type:

TSDataset