etna.datasets.TSDataset#
- class TSDataset(df: DataFrame, freq: DateOffset | str | None, df_exog: DataFrame | None = None, known_future: Literal['all'] | Sequence = (), hierarchical_structure: HierarchicalStructure | None = None)[source]#
- Bases: - object - TSDataset is the main class for handling time series data. - It prepares the series for exploratory analysis and implements feature generation with Transforms and generation of future points. - Notes - TSDataset supports custom indexing and slicing. It may be done through this interface: - TSDataset[timestamp, segment, column] - If the dataset contains NaN values at the start of the period, those timestamps are removed. - During creation, segment is cast to string type. - Examples - >>> from etna.datasets import generate_const_df >>> df = generate_const_df(periods=30, start_time="2021-06-01", n_segments=2, scale=1) >>> ts = TSDataset(df, "D") >>> ts["2021-06-01":"2021-06-07", "segment_0", "target"] timestamp 2021-06-01 1.0 2021-06-02 1.0 2021-06-03 1.0 2021-06-04 1.0 2021-06-05 1.0 2021-06-06 1.0 2021-06-07 1.0 Freq: D, Name: (segment_0, target), dtype: float64 - >>> from etna.datasets import generate_ar_df >>> pd.options.display.float_format = '{:,.2f}'.format >>> df_to_forecast = generate_ar_df(100, start_time="2021-01-01", n_segments=1) >>> df_regressors = generate_ar_df(120, start_time="2021-01-01", n_segments=5) >>> df_regressors = df_regressors.pivot(index="timestamp", columns="segment").reset_index() >>> df_regressors.columns = ["timestamp"] + [f"regressor_{i}" for i in range(5)] >>> df_regressors["segment"] = "segment_0" >>> tsdataset = TSDataset(df=df_to_forecast, freq="D", df_exog=df_regressors, known_future="all") >>> tsdataset.head(5) segment segment_0 feature regressor_0 regressor_1 regressor_2 regressor_3 regressor_4 target timestamp 2021-01-01 1.62 -0.02 -0.50 -0.56 0.52 1.62 2021-01-02 1.01 -0.80 -0.81 0.38 -0.60 1.01 2021-01-03 0.48 0.47 -0.81 -1.56 -1.37 0.48 2021-01-04 -0.59 2.44 -2.21 -1.21 -0.69 -0.59 2021-01-05 0.28 0.58 -3.07 -1.45 0.77 0.28 - >>> from etna.datasets import generate_hierarchical_df >>> pd.options.display.width = 0 >>> df = generate_hierarchical_df(periods=100, n_segments=[2, 4], start_time="2021-01-01",) >>> df, 
hierarchical_structure = TSDataset.to_hierarchical_dataset(df=df, level_columns=["level_0", "level_1"]) >>> tsdataset = TSDataset(df=df, freq="D", hierarchical_structure=hierarchical_structure) >>> tsdataset.head(5) segment l0s0_l1s3 l0s1_l1s0 l0s1_l1s1 l0s1_l1s2 feature target target target target timestamp 2021-01-01 2.07 1.62 -0.45 -0.40 2021-01-02 0.59 1.01 0.78 0.42 2021-01-03 -0.24 0.48 1.18 -0.14 2021-01-04 -1.12 -0.59 1.77 1.82 2021-01-05 -1.40 0.28 0.68 0.48 - Init TSDataset. - Parameters:
- df (DataFrame) – dataframe with time series in a wide or long format: DataFrameFormat; it is expected that df has a feature named “target”
- freq (DateOffset | str | None) – frequency of timestamp in df, possible values: - pandas.DateOffset object for datetime timestamp
- pandas offset aliases for datetime timestamp 
- None for integer timestamp 
 
- df_exog (DataFrame | None) – dataframe with exogenous data in a wide or long format: DataFrameFormat
- known_future (Literal['all'] | Sequence) – columns in df_exog[known_future] that are regressors; if “all” is given, all columns are treated as regressors
- hierarchical_structure (HierarchicalStructure | None) – Structure of the levels in the hierarchy. If None, there is no hierarchical structure in the dataset. 
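The `TSDataset[timestamp, segment, column]` indexing described in the notes can be pictured on a plain pandas frame in wide format. The sketch below is a pandas-only illustration, not ETNA code: it assumes the wide layout shown in the examples (timestamp index, two-level `(segment, feature)` columns), and the frame `wide` with its values is made up for the demonstration.

```python
import pandas as pd

# Shortcut for building multi-level selections.
idx = pd.IndexSlice

# Hypothetical wide frame: timestamp index, (segment, feature) columns.
timestamps = pd.date_range("2021-06-01", periods=4, freq="D")
columns = pd.MultiIndex.from_product(
    [["segment_0", "segment_1"], ["target"]], names=["segment", "feature"]
)
wide = pd.DataFrame(
    [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]],
    index=timestamps,
    columns=columns,
)

# Rows are selected by a timestamp slice (both ends inclusive),
# columns by the (segment, feature) pair.
selected = wide.loc["2021-06-02":"2021-06-03", idx["segment_0", "target"]]
```

The result is a Series named `('segment_0', 'target')`, which mirrors the `Name: (segment_0, target)` line in the first example above.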
 
 - Methods - add_features_from_pandas(df_update[, ...]) - Update the dataset with the new columns from pandas dataframe. - add_prediction_intervals(prediction_intervals_df) - Add prediction intervals into dataset. - add_target_components(target_components_df) - Add target components into dataset. - create_from_misaligned(df, freq[, df_exog, ...]) - Make TSDataset from misaligned data by realigning it according to inferred alignment in df. - describe([segments]) - Overview of the dataset that returns a DataFrame. - drop_features(features[, drop_from_exog]) - Drop columns with features from the dataset. - Drop prediction intervals from dataset. - Drop target components from dataset. - fit_transform(transforms) - Fit and apply given transforms to the data. - get_level_dataset(target_level) - Generate new TSDataset on target level. - Get pandas.DataFrame with prediction intervals. - Get DataFrame with target components. - Check whether dataset has hierarchical structure. - head([n_rows]) - Return the first n_rows rows. - info([segments]) - Overview of the dataset that prints the result. - inverse_transform(transforms) - Apply inverse transform method of transforms to the data. - isnull() - Return a dataframe of flags indicating whether the corresponding element in the wide representation of the data is null. - Return names of the levels in the hierarchical structure. - make_future(future_steps[, transforms, ...]) - Return new TSDataset with features extended into the future. - plot([n_segments, column, segments, start, ...]) - Plot of random or chosen segments. - size() - Return size of TSDataset. - tail([n_rows]) - Return the last n_rows rows. - to_dataset(df) - Convert pandas dataframe to wide format. - to_flatten(df[, features]) - Return pandas DataFrame in a long format. - to_hierarchical_dataset(df, level_columns[, ...]) - Convert pandas dataframe from long hierarchical to ETNA Dataset format. - to_pandas([flatten, features]) - Return pandas DataFrame. 
- to_torch_dataset(make_samples[, dropna]) - Convert the TSDataset to a torch.Dataset. - train_test_split([train_start, train_end, ...]) - Split given df with train-test timestamp indices or size of test set. - transform(transforms) - Apply given transform to the data. - tsdataset_idx_slice([start_idx, end_idx]) - Return new TSDataset with integer-location based indexing. - update_features_from_pandas(df_update) - Update the existing columns in the dataset with the new values from pandas dataframe. - Attributes - Return current level of dataframe with exogenous data in hierarchical structure. - Return current level of dataframe in hierarchical structure. - Get list of all features across all segments in dataset. - Return string frequency of timestamp. - Return offset frequency of timestamp. - Shortcut for pd.core.indexing.IndexSlice - Return columns in df_exog that are initially regressors. - Get a tuple with prediction intervals names. - Get list of all regressors across all segments in dataset. - Get list of all segments in dataset. - Get tuple with target components names. - Return TSDataset timestamp index. - add_features_from_pandas(df_update: DataFrame, update_exog: bool = False, regressors: List[str] | None = None)[source]#
- Update the dataset with the new columns from pandas dataframe. - Before updating columns in df, columns of df_update will be cropped by the last timestamp in df. - Parameters:
- df_update (DataFrame) – Dataframe with the new columns in wide ETNA format. 
- update_exog (bool) – If True, update columns also in df_exog. If you wish to add new regressors in the dataset it is recommended to turn on this flag. 
- regressors (List[str] | None) – List of regressors in the passed dataframe. 
 
 
 - add_prediction_intervals(prediction_intervals_df: DataFrame)[source]#
- Add prediction intervals into dataset. - Parameters:
- prediction_intervals_df (DataFrame) – Dataframe in a wide format with prediction intervals 
- Raises:
- ValueError: – If dataset already contains prediction intervals 
- ValueError: – If prediction intervals names differ between segments 
 
 
 - add_target_components(target_components_df: DataFrame)[source]#
- Add target components into dataset. - Parameters:
- target_components_df (DataFrame) – Dataframe in a wide format with target components 
- Raises:
- ValueError: – If dataset already contains target components 
- ValueError: – If target components names differ between segments 
- ValueError: – If components don’t sum up to target 
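The last condition above, that components must sum up to the target, can be illustrated with a small check. This is a minimal sketch of that kind of validation, not the library's implementation: the helper `components_sum_to_target`, the tolerances, and the component column names are assumptions made for the example.

```python
import numpy as np
import pandas as pd


def components_sum_to_target(target, components, rtol=1e-5, atol=1e-8):
    """Return True when the row-wise sum of the components reproduces the target.

    The tolerances are illustrative defaults, not values taken from ETNA.
    """
    summed = components.sum(axis=1).to_numpy()
    return bool(np.allclose(summed, target.to_numpy(), rtol=rtol, atol=atol))


timestamps = pd.date_range("2021-01-01", periods=3, freq="D")
target = pd.Series([3.0, 5.0, 7.0], index=timestamps)

# Hypothetical component columns for a single segment.
good = pd.DataFrame(
    {"target_component_a": [1.0, 2.0, 3.0], "target_component_b": [2.0, 3.0, 4.0]},
    index=timestamps,
)
# Same components, but the last value breaks the decomposition.
bad = good.assign(target_component_b=[2.0, 3.0, 5.0])

good_is_valid = components_sum_to_target(target, good)
bad_is_valid = components_sum_to_target(target, bad)
```

A decomposition like `bad` is the situation in which the method is documented to raise `ValueError`.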
 
 
 - classmethod create_from_misaligned(df: DataFrame, freq: DateOffset | str | None, df_exog: DataFrame | None = None, known_future: Literal['all'] | Sequence = (), future_steps: int = 1, original_timestamp_name: str = 'external_timestamp') TSDataset[source]#
- Make TSDataset from misaligned data by realigning it according to inferred alignment in df. - This method: - Infers alignment using infer_alignment(); - Realigns df and df_exog with the inferred alignment using apply_alignment(); - Creates exog feature with original timestamp using make_timestamp_df_from_alignment(); - Creates TSDataset from these data. - This method doesn’t work with hierarchical_structure, because it doesn’t make much sense. - Parameters:
- df (DataFrame) – dataframe with time series in a long format: DataFrameFormat; it is expected that df has a feature named “target”
- freq (DateOffset | str | None) – frequency of timestamp in df, possible values: - pandas.DateOffset object for datetime timestamp
- pandas offset aliases for datetime timestamp 
- None for integer timestamp 
 
- df_exog (DataFrame | None) – dataframe with exogenous data in a long format: DataFrameFormat
- known_future (Literal['all'] | Sequence) – columns in df_exog[known_future] that are regressors; if “all” is given, all columns are treated as regressors
- future_steps (int) – determines by how many steps the original timestamp should be extended into the future before it is added to df_exog; expected to be positive
- original_timestamp_name (str) – name of the original timestamp column to add to df_exog
 
- Returns:
- Created TSDataset. 
- Raises:
- ValueError: – If future_steps is not positive.
- ValueError: – If original_timestamp_name intersects with columns in df_exog.
- ValueError: – If parameter df isn’t in a long format.
- ValueError: – If parameter df_exog is set and isn’t in a long format.
 
- Return type:
 
 - describe(segments: Sequence[str] | None = None) DataFrame[source]#
- Overview of the dataset that returns a DataFrame. - Method describes dataset in segment-wise fashion. Description columns: - start_timestamp: beginning of the segment, missing values in the beginning are ignored 
- end_timestamp: ending of the dataset, common for all segments 
- length: length according to start_timestamp and end_timestamp
- num_missing: number of missing values between start_timestamp and end_timestamp
- num_segments: total number of segments, common for all segments 
- num_exogs: number of exogenous features, common for all segments 
- num_regressors: number of exogenous features that are regressors, common for all segments 
- num_known_future: number of regressors that are known since creation, common for all segments 
- freq: frequency of the series, common for all segments 
 - Parameters:
- segments (Sequence[str] | None) – segments to show in overview, if None all segments are shown. 
- Returns:
- result_table – table with results of the overview 
- Return type:
- pd.DataFrame 
 - Examples - >>> from etna.datasets import generate_const_df >>> pd.options.display.expand_frame_repr = False >>> df = generate_const_df( ... periods=30, start_time="2021-06-01", ... n_segments=2, scale=1 ... ) >>> regressors_timestamp = pd.date_range(start="2021-06-01", periods=50) >>> df_regressors_1 = pd.DataFrame( ... {"timestamp": regressors_timestamp, "regressor_1": 1, "segment": "segment_0"} ... ) >>> df_regressors_2 = pd.DataFrame( ... {"timestamp": regressors_timestamp, "regressor_1": 2, "segment": "segment_1"} ... ) >>> df_exog = pd.concat([df_regressors_1, df_regressors_2], ignore_index=True) >>> ts = TSDataset(df, df_exog=df_exog, freq="D", known_future="all") >>> ts.describe() start_timestamp end_timestamp length num_missing num_segments num_exogs num_regressors num_known_future freq segments segment_0 2021-06-01 2021-06-30 30 0 2 1 1 1 D segment_1 2021-06-01 2021-06-30 30 0 2 1 1 1 D 
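The segment-wise columns `start_timestamp`, `length` and `num_missing` can be pictured with plain pandas. The sketch below follows the definitions listed above (leading missing values are ignored, the end is common to all segments); it is an illustration under those assumptions rather than ETNA's own code, and `describe_segment` and the sample data are hypothetical.

```python
import numpy as np
import pandas as pd

index = pd.date_range("2021-06-01", periods=6, freq="D")
# One target column per segment; segment_1 starts two days later.
wide = pd.DataFrame(
    {
        "segment_0": [1.0, 2.0, np.nan, 4.0, 5.0, 6.0],
        "segment_1": [np.nan, np.nan, 3.0, np.nan, 5.0, 6.0],
    },
    index=index,
)


def describe_segment(series):
    """Summarise one segment following the column definitions above."""
    start = series.first_valid_index()  # leading NaNs are ignored
    end = series.index.max()            # common end of the dataset
    window = series.loc[start:end]
    return {
        "start_timestamp": start,
        "end_timestamp": end,
        "length": len(window),
        "num_missing": int(window.isna().sum()),
    }


summary = {name: describe_segment(wide[name]) for name in wide.columns}
```

Here `segment_1` has a length of 4 and one missing value, because the two leading NaNs fall before its `start_timestamp`.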
 - drop_features(features: List[str], drop_from_exog: bool = False)[source]#
- Drop columns with features from the dataset. - Parameters:
- Raises:
- ValueError: – If the features list contains target or target components
 
 - fit_transform(transforms: Sequence[Transform])[source]#
- Fit and apply given transforms to the data. - Parameters:
- transforms (Sequence[Transform]) – 
 
 - get_prediction_intervals() DataFrame | None[source]#
- Get pandas.DataFrame with prediction intervals. - Returns:
- pandas.DataFrame with prediction intervals for target variable.
- Return type:
- DataFrame | None 
 
 - get_target_components() DataFrame | None[source]#
- Get DataFrame with target components. - Returns:
- Dataframe with target components 
- Return type:
- DataFrame | None 
 
 - head(n_rows: int = 5) DataFrame[source]#
- Return the first n_rows rows. - Mimics pandas method. - This function returns the first n_rows rows for the object based on position. It is useful for quickly testing if your object has the right type of data in it. - For negative values of n_rows, this function returns all rows except the last n_rows rows, equivalent to df[:-n_rows]. - Parameters:
- n_rows (int) – number of rows to select. 
- Returns:
- the first n_rows rows or 5 by default.
- Return type:
- pd.DataFrame 
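Since the method is described as mimicking pandas, the negative `n_rows` behaviour can be checked on a plain `pandas.DataFrame`. The short sketch below uses pandas only; the frame `df` is a made-up example.

```python
import pandas as pd

df = pd.DataFrame(
    {"target": [1.0, 2.0, 3.0, 4.0, 5.0]},
    index=pd.date_range("2021-06-01", periods=5, freq="D"),
)

# Positive n: the first two rows.
first_two = df.head(2)

# Negative n: every row except the last two, same as df[:-2].
all_but_last_two = df.head(-2)
```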
 
 - info(segments: Sequence[str] | None = None) None[source]#
- Overview of the dataset that prints the result. - Method describes dataset in segment-wise fashion. - Information about dataset in general: - num_segments: total number of segments 
- num_exogs: number of exogenous features 
- num_regressors: number of exogenous features that are regressors 
- num_known_future: number of regressors that are known since creation 
- freq: frequency of the dataset 
- end_timestamp: ending of the dataset 
 - Information about individual segments: - start_timestamp: beginning of the segment, missing values in the beginning are ignored 
- length: length according to start_timestamp and end_timestamp
- num_missing: number of missing values between start_timestamp and end_timestamp
 - Parameters:
- segments (Sequence[str] | None) – segments to show in overview, if None all segments are shown. 
- Return type:
- None 
 - Examples - >>> from etna.datasets import generate_const_df >>> df = generate_const_df( ... periods=30, start_time="2021-06-01", ... n_segments=2, scale=1 ... ) >>> regressors_timestamp = pd.date_range(start="2021-06-01", periods=50) >>> df_regressors_1 = pd.DataFrame( ... {"timestamp": regressors_timestamp, "regressor_1": 1, "segment": "segment_0"} ... ) >>> df_regressors_2 = pd.DataFrame( ... {"timestamp": regressors_timestamp, "regressor_1": 2, "segment": "segment_1"} ... ) >>> df_exog = pd.concat([df_regressors_1, df_regressors_2], ignore_index=True) >>> ts = TSDataset(df, df_exog=df_exog, freq="D", known_future="all") >>> ts.info() <class 'etna.datasets.TSDataset'> num_segments: 2 num_exogs: 1 num_regressors: 1 num_known_future: 1 freq: D end_timestamp: 2021-06-30 00:00:00 start_timestamp length num_missing segments segment_0 2021-06-01 30 0 segment_1 2021-06-01 30 0 
 - inverse_transform(transforms: Sequence[Transform])[source]#
- Apply inverse transform method of transforms to the data. - Applied in reversed order. - Parameters:
- transforms (Sequence[Transform]) – 
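Why the inverse transforms are applied in reversed order can be seen with two toy transforms. The classes below are hypothetical stand-ins written for this illustration; they are not ETNA `Transform` classes and work on plain lists.

```python
class AddConstant:
    """Toy transform: adds a constant, and subtracts it on inversion."""

    def __init__(self, constant):
        self.constant = constant

    def transform(self, values):
        return [v + self.constant for v in values]

    def inverse_transform(self, values):
        return [v - self.constant for v in values]


class Scale:
    """Toy transform: multiplies by a factor, and divides on inversion."""

    def __init__(self, factor):
        self.factor = factor

    def transform(self, values):
        return [v * self.factor for v in values]

    def inverse_transform(self, values):
        return [v / self.factor for v in values]


def apply_transforms(values, transforms):
    # Forward pass: first transform first.
    for transform in transforms:
        values = transform.transform(values)
    return values


def invert_transforms(values, transforms):
    # Backward pass: the last applied transform is undone first.
    for transform in reversed(transforms):
        values = transform.inverse_transform(values)
    return values


original = [1.0, 2.0, 3.0]
transforms = [AddConstant(10.0), Scale(2.0)]
transformed = apply_transforms(original, transforms)
restored = invert_transforms(transformed, transforms)
```

Undoing the steps in the forward order instead (subtract 10, then divide by 2) would not give back `original`, which is why the sequence is reversed.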
 
 - isnull() DataFrame[source]#
- Return a dataframe of flags indicating whether the corresponding element in the wide representation of the data is null. - The wide representation can be obtained with self.to_pandas(). - Returns:
- is_null dataframe 
- Return type:
- pd.DataFrame 
 
 - make_future(future_steps: int, transforms: Sequence[Transform] = (), tail_steps: int = 0) TSDataset[source]#
- Return new TSDataset with features extended into the future. - Notes - The result dataset doesn’t contain prediction intervals and target components. Some columns and modifications may be lost if a transformed dataset is used to make future. This behavior is due to the usage of an initial state of the dataset to compute the future. - Parameters:
- Returns:
- dataset with features extended into the future. 
- Return type:
 - Examples - >>> from etna.datasets import generate_const_df >>> df = generate_const_df( ... periods=30, start_time="2021-06-01", ... n_segments=2, scale=1 ... ) >>> df_regressors = pd.DataFrame({ ... "timestamp": list(pd.date_range("2021-06-01", periods=40))*2, ... "regressor_1": np.arange(80), "regressor_2": np.arange(80) + 5, ... "segment": ["segment_0"]*40 + ["segment_1"]*40 ... }) >>> ts = TSDataset( ... df, "D", df_exog=df_regressors, known_future="all" ... ) >>> ts.make_future(4) segment segment_0 segment_1 feature regressor_1 regressor_2 target regressor_1 regressor_2 target timestamp 2021-07-01 30 35 NaN 70 75 NaN 2021-07-02 31 36 NaN 71 76 NaN 2021-07-03 32 37 NaN 72 77 NaN 2021-07-04 33 38 NaN 73 78 NaN 
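The shape of the output above (targets are NaN, known-future regressors are filled in) can be reproduced for a single segment with plain pandas. This is a conceptual sketch under the assumption of a daily frequency and one segment; `extend_into_future` is a hypothetical helper, not the library's implementation.

```python
import numpy as np
import pandas as pd


def extend_into_future(df, df_exog, future_steps, freq="D"):
    """Build the future part of one segment: NaN target plus known regressors."""
    last_timestamp = df.index.max()
    # Generate future_steps timestamps that follow the last known one.
    future_index = pd.date_range(
        start=last_timestamp, periods=future_steps + 1, freq=freq
    )[1:]
    future = pd.DataFrame({"target": np.nan}, index=future_index)
    # Regressors are known in advance, so they can be looked up by timestamp.
    return future.join(df_exog, how="left")


df = pd.DataFrame(
    {"target": 1.0}, index=pd.date_range("2021-06-01", periods=30, freq="D")
)
df_exog = pd.DataFrame(
    {"regressor_1": np.arange(40)},
    index=pd.date_range("2021-06-01", periods=40, freq="D"),
)

future = extend_into_future(df, df_exog, future_steps=4)
```

With these inputs the future frame starts on 2021-07-01 and `regressor_1` takes the values 30 to 33, matching the `segment_0` columns of the example above.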
 - plot(n_segments: int = 10, column: str = 'target', segments: Sequence[str] | None = None, start: Timestamp | int | str | None = None, end: Timestamp | int | str | None = None, seed: int = 1, figsize: Tuple[int, int] = (10, 5))[source]#
- Plot of random or chosen segments. - Parameters:
- n_segments (int) – number of random segments to plot 
- column (str) – feature to plot 
- seed (int) – seed for local random state 
- start (Timestamp | int | str | None) – start plot from this timestamp 
- end (Timestamp | int | str | None) – end plot at this timestamp 
- figsize (Tuple[int, int]) – size of the figure per subplot with one segment in inches 
 
- Raises:
- ValueError: – Incorrect type of start or end is used according to freq
 
 - size() Tuple[int, int, int | None][source]#
- Return size of TSDataset. - The order of sizes is (number of time series, number of segments, number of features). 
 - tail(n_rows: int = 5) DataFrame[source]#
- Return the last n_rows rows. - Mimics pandas method. - This function returns the last n_rows rows from the object based on position. It is useful for quickly verifying data, for example, after sorting or appending rows. - For negative values of n_rows, this function returns all rows except the first n rows, equivalent to df[n_rows:]. - Parameters:
- n_rows (int) – number of rows to select. 
- Returns:
- the last n_rows rows or 5 by default.
- Return type:
- pd.DataFrame 
 
 - static to_dataset(df: DataFrame) DataFrame[source]#
- Convert pandas dataframe to wide format. - Columns “timestamp” and “segment” are required. - Parameters:
- df (DataFrame) – DataFrame with columns [“timestamp”, “segment”]. Other columns are considered features. Column “timestamp” is expected to be one of two types: integer or timestamp. 
- Return type:
 - Notes - During conversion, segment is cast to string type. - Examples - >>> from etna.datasets import generate_const_df >>> df = generate_const_df( ... periods=30, start_time="2021-06-01", ... n_segments=2, scale=1 ... ) >>> df.head(5) timestamp segment target 0 2021-06-01 segment_0 1.00 1 2021-06-02 segment_0 1.00 2 2021-06-03 segment_0 1.00 3 2021-06-04 segment_0 1.00 4 2021-06-05 segment_0 1.00 >>> df_wide = TSDataset.to_dataset(df) >>> df_wide.head(5) segment segment_0 segment_1 feature target target timestamp 2021-06-01 1.00 1.00 2021-06-02 1.00 1.00 2021-06-03 1.00 1.00 2021-06-04 1.00 1.00 2021-06-05 1.00 1.00 - >>> df_regressors = pd.DataFrame({ ... "timestamp": pd.date_range("2021-01-01", periods=10), ... "regressor_1": np.arange(10), "regressor_2": np.arange(10) + 5, ... "segment": ["segment_0"]*10 ... }) >>> TSDataset.to_dataset(df_regressors).head(5) segment segment_0 feature regressor_1 regressor_2 timestamp 2021-01-01 0 5 2021-01-02 1 6 2021-01-03 2 7 2021-01-04 3 8 2021-01-05 4 9 
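The long-to-wide conversion can be approximated with a pandas pivot. The sketch below is a pandas-only illustration of the resulting layout (timestamp index, `(segment, feature)` columns, segment cast to string); `long_to_wide` is a hypothetical helper and is not claimed to match what `to_dataset` does internally.

```python
import pandas as pd

long_df = pd.DataFrame(
    {
        "timestamp": list(pd.date_range("2021-06-01", periods=3, freq="D")) * 2,
        "segment": ["segment_0"] * 3 + ["segment_1"] * 3,
        "target": [1.0, 2.0, 3.0, 10.0, 20.0, 30.0],
    }
)


def long_to_wide(df):
    """Pivot a long frame into (segment, feature) columns indexed by timestamp."""
    df = df.copy()
    df["segment"] = df["segment"].astype(str)  # segment names become strings
    wide = df.pivot(index="timestamp", columns="segment")
    # pivot yields (feature, segment) columns; swap them to (segment, feature).
    wide = wide.swaplevel(axis=1)
    wide.columns.names = ["segment", "feature"]
    return wide.sort_index(axis=1)


wide_df = long_to_wide(long_df)
```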
 - static to_flatten(df: DataFrame, features: Literal['all'] | Sequence[str] = 'all') DataFrame[source]#
- Return pandas DataFrame in a long format. - The order of columns is (timestamp, segment, target, features in alphabetical order). - Parameters:
- Returns:
- dataframe with TSDataset data 
- Return type:
- pd.DataFrame 
 - Examples - >>> from etna.datasets import generate_const_df >>> df = generate_const_df( ... periods=30, start_time="2021-06-01", ... n_segments=2, scale=1 ... ) >>> df.head(5) timestamp segment target 0 2021-06-01 segment_0 1.00 1 2021-06-02 segment_0 1.00 2 2021-06-03 segment_0 1.00 3 2021-06-04 segment_0 1.00 4 2021-06-05 segment_0 1.00 >>> df_wide = TSDataset.to_dataset(df) >>> TSDataset.to_flatten(df_wide).head(5) timestamp segment target 0 2021-06-01 segment_0 1.0 1 2021-06-02 segment_0 1.0 2 2021-06-03 segment_0 1.0 3 2021-06-04 segment_0 1.0 4 2021-06-05 segment_0 1.0 
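The column order stated above (timestamp, segment, target, then the remaining features alphabetically) can be shown with a small pandas sketch. `wide_to_long` and the sample frame are hypothetical; this is an illustration of the output layout, not ETNA's implementation.

```python
import pandas as pd

timestamps = pd.date_range("2021-06-01", periods=2, freq="D", name="timestamp")
columns = pd.MultiIndex.from_tuples(
    [
        ("segment_0", "regressor_1"),
        ("segment_0", "target"),
        ("segment_1", "regressor_1"),
        ("segment_1", "target"),
    ],
    names=["segment", "feature"],
)
wide_df = pd.DataFrame(
    [[0, 1.0, 5, 10.0], [1, 2.0, 6, 20.0]], index=timestamps, columns=columns
)


def wide_to_long(wide):
    """Stack segments vertically and order columns as described above."""
    parts = []
    for segment in wide.columns.get_level_values("segment").unique():
        part = wide[segment].reset_index()  # timestamp becomes a column
        part.columns.name = None
        part["segment"] = segment
        parts.append(part)
    long = pd.concat(parts, ignore_index=True)
    leading = ["timestamp", "segment", "target"]
    others = sorted(c for c in long.columns if c not in leading)
    return long[leading + others]


long_df = wide_to_long(wide_df)
```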
 - static to_hierarchical_dataset(df: DataFrame, level_columns: List[str], keep_level_columns: bool = False, sep: str = '_', return_hierarchy: bool = True) Tuple[DataFrame, HierarchicalStructure | None][source]#
- Convert pandas dataframe from long hierarchical to ETNA Dataset format. - Parameters:
- df (DataFrame) – Dataframe in long hierarchical format with columns [timestamp, target] + [level_columns] + [other_columns] 
- level_columns (List[str]) – Columns of the dataframe that define the levels in the hierarchy, in order from top to bottom, i.e. [level_name_1, level_name_2, …]. Names of the columns will be used as names of the levels in the hierarchy. 
- keep_level_columns (bool) – If true, leave the level columns in the result dataframe. By default, level columns are concatenated into the “segment” column and dropped 
- sep (str) – String used to concatenate the level names 
- return_hierarchy (bool) – If true, returns the hierarchical structure 
 
- Returns:
- Dataframe in wide format and optionally hierarchical structure 
- Raises:
- ValueError – If level_columns is empty
- Return type:
- Tuple[DataFrame, HierarchicalStructure | None] 
 
 - to_pandas(flatten: bool = False, features: Literal['all'] | Sequence[str] = 'all') DataFrame[source]#
- Return pandas DataFrame. - Parameters:
- flatten (bool) – - If False, return dataframe in a wide format 
- If True, return dataframe in a long format, its order of columns is (timestamp, segment, target, features in alphabetical order). 
 
- features (Literal['all'] | Sequence[str]) – List of features to return. If “all”, return all the features in the dataset. 
 
- Returns:
- dataframe with TSDataset data 
- Return type:
- pd.DataFrame 
 - Examples - >>> from etna.datasets import generate_const_df >>> df = generate_const_df( ... periods=30, start_time="2021-06-01", ... n_segments=2, scale=1 ... ) >>> df.head(5) timestamp segment target 0 2021-06-01 segment_0 1.00 1 2021-06-02 segment_0 1.00 2 2021-06-03 segment_0 1.00 3 2021-06-04 segment_0 1.00 4 2021-06-05 segment_0 1.00 >>> ts = TSDataset(df, "D") >>> ts.to_pandas(True).head(5) timestamp segment target 0 2021-06-01 segment_0 1.00 1 2021-06-02 segment_0 1.00 2 2021-06-03 segment_0 1.00 3 2021-06-04 segment_0 1.00 4 2021-06-05 segment_0 1.00 >>> ts.to_pandas(False).head(5) segment segment_0 segment_1 feature target target timestamp 2021-06-01 1.00 1.00 2021-06-02 1.00 1.00 2021-06-03 1.00 1.00 2021-06-04 1.00 1.00 2021-06-05 1.00 1.00 
 - to_torch_dataset(make_samples: Callable[[DataFrame], Iterator[dict] | Iterable[dict]], dropna: bool = True) Dataset[source]#
- Convert the TSDataset to a torch.Dataset.
 - train_test_split(train_start: Timestamp | int | str | None = None, train_end: Timestamp | int | str | None = None, test_start: Timestamp | int | str | None = None, test_end: Timestamp | int | str | None = None, test_size: int | None = None) Tuple[TSDataset, TSDataset][source]#
- Split given df with train-test timestamp indices or size of test set. - In case of inconsistencies between test_size and (test_start, test_end), test_size is ignored. - During splitting all the features are kept in train and test parts including target, regressors, target components, prediction intervals. - Parameters:
- train_start (Timestamp | int | str | None) – start timestamp of new train dataset, if None first timestamp is used 
- train_end (Timestamp | int | str | None) – end timestamp of new train dataset, if None the timestamp previous to test_start is used
- test_start (Timestamp | int | str | None) – start timestamp of new test dataset, if None the timestamp next to train_end is used
- test_end (Timestamp | int | str | None) – end timestamp of new test dataset, if None last timestamp is used 
- test_size (int | None) – number of timestamps to use in test set 
 
- Returns:
- generated datasets 
- Return type:
- train, test 
- Raises:
- ValueError: – Incorrect type of train_start, train_end, test_start or test_end is used according to ts.freq
 - Examples - >>> from etna.datasets import generate_ar_df >>> pd.options.display.float_format = '{:,.2f}'.format >>> df = generate_ar_df(100, start_time="2021-01-01", n_segments=3) >>> ts = TSDataset(df, "D") >>> train_ts, test_ts = ts.train_test_split( ... train_start="2021-01-01", train_end="2021-02-01", ... test_start="2021-02-02", test_end="2021-02-07" ... ) >>> train_ts.tail(5) segment segment_0 segment_1 segment_2 feature target target target timestamp 2021-01-28 -2.06 2.03 1.51 2021-01-29 -2.33 0.83 0.81 2021-01-30 -1.80 1.69 0.61 2021-01-31 -2.49 1.51 0.85 2021-02-01 -2.89 0.91 1.06 >>> test_ts.head(5) segment segment_0 segment_1 segment_2 feature target target target timestamp 2021-02-02 -3.57 -0.32 1.72 2021-02-03 -4.42 0.23 3.51 2021-02-04 -5.09 1.02 3.39 2021-02-05 -5.10 0.40 2.15 2021-02-06 -6.22 0.92 0.97 
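When only `test_size` is supplied, the split can be thought of as cutting the last `test_size` timestamps off the end. The sketch below shows that idea on a plain pandas frame; it is an assumption-based illustration with a hypothetical helper `split_by_test_size`, and it does not cover the explicit `train_start`/`train_end`/`test_start`/`test_end` handling.

```python
import pandas as pd


def split_by_test_size(df, test_size):
    """Use the last test_size rows as test and everything before them as train."""
    if not 0 < test_size < len(df):
        raise ValueError("test_size must be between 1 and len(df) - 1")
    return df.iloc[:-test_size], df.iloc[-test_size:]


df = pd.DataFrame(
    {"target": range(10)},
    index=pd.date_range("2021-01-01", periods=10, freq="D"),
)

train, test = split_by_test_size(df, test_size=3)
```

The test part starts on the timestamp right after the end of the train part, which corresponds to the documented defaults for `train_end` and `test_start`.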
 - transform(transforms: Sequence[Transform])[source]#
- Apply given transform to the data. - Parameters:
- transforms (Sequence[Transform]) – 
 
 - tsdataset_idx_slice(start_idx: int | None = None, end_idx: int | None = None) TSDataset[source]#
- Return new TSDataset with integer-location based indexing. 
 - update_features_from_pandas(df_update: DataFrame)[source]#
- Update the existing columns in the dataset with the new values from pandas dataframe. - Before updating columns in df, columns of df_update will be cropped by the last timestamp in df. Columns in df_exog are not updated. If you wish to update df_exog, create a new instance of TSDataset. - Updating df with df_update with different corresponding column dtypes could lead to unexpected behaviour in different pandas versions. - Parameters:
- df_update (DataFrame) – Dataframe with new values in wide ETNA format. 
- Raises:
- ValueError: – If timestamps do not match 
- ValueError: – If there are columns in the update dataframe that are not present in the dataset 
- ValueError: – If there are duplicate features in the dataset (columns with the same name) 
 
 
 - property current_df_exog_level: str | None[source]#
- Return current level of dataframe with exogenous data in hierarchical structure. - Returns:
- Level of dataframe with exogenous data 
- Return type:
- str or None 
 
 - property current_df_level: str | None[source]#
- Return current level of dataframe in hierarchical structure. - Returns:
- Level of dataframe 
- Return type:
- str or None 
 
 - property features: List[str][source]#
- Get list of all features across all segments in dataset. - All features include initial exogenous data, generated features, target, target components, prediction intervals. The order of features in returned list isn’t specified. - Returns:
- List of features. 
 
 - property freq: str | None[source]#
- Return string frequency of timestamp. - Returns:
- String frequency of timestamp. 
- Return type:
- str or None 
 
 - property freq_offset: DateOffset | None[source]#
- Return offset frequency of timestamp. - Returns:
- Offset frequency of timestamp. 
- Return type:
- BaseOffset or None 
 
 - property known_future: List[str][source]#
- Return columns in df_exog that are initially regressors. - Returns:
- List of regressor columns 
 
 - property prediction_intervals_names: Tuple[str, ...][source]#
- Get a tuple with prediction intervals names. Return an empty tuple in the case of intervals absence. 
 - property regressors: List[str][source]#
- Get list of all regressors across all segments in dataset. - Examples - >>> from etna.datasets import generate_const_df >>> df = generate_const_df( ... periods=30, start_time="2021-06-01", ... n_segments=2, scale=1 ... ) >>> regressors_timestamp = pd.date_range(start="2021-06-01", periods=50) >>> df_regressors_1 = pd.DataFrame( ... {"timestamp": regressors_timestamp, "regressor_1": 1, "segment": "segment_0"} ... ) >>> df_regressors_2 = pd.DataFrame( ... {"timestamp": regressors_timestamp, "regressor_1": 2, "segment": "segment_1"} ... ) >>> df_exog = pd.concat([df_regressors_1, df_regressors_2], ignore_index=True) >>> ts = TSDataset( ... df, df_exog=df_exog, freq="D", known_future="all" ... ) >>> ts.regressors ['regressor_1'] 
 - property segments: List[str][source]#
- Get list of all segments in dataset. - Examples - >>> from etna.datasets import generate_const_df >>> df = generate_const_df( ... periods=30, start_time="2021-06-01", ... n_segments=2, scale=1 ... ) >>> ts = TSDataset(df, "D") >>> ts.segments ['segment_0', 'segment_1']