etna.datasets.DataFrameFormat

class DataFrameFormat(value)

Bases: str, Enum

Enum of the pd.DataFrame formats that can be used.

A dataframe in any of these formats stores:

  • Timestamps;

  • Segments;

  • Features. In this context, ‘target’ is also a feature.

Currently, two formats are supported:

  • Wide

    • Has index to store timestamps.

    • Columns have two levels named ‘segment’ and ‘feature’. Each column stores the values of a given feature in a given segment.

    • The list of columns isn’t empty.

    • All combinations of (segment, feature) are present in the columns.

  • Long

    • Has column ‘timestamp’ to store timestamps.

    • Has column ‘segment’ to store segments.

    • Has at least one more column besides ‘timestamp’ and ‘segment’.
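As an illustration of the two layouts described above, the following minimal sketch builds the same small dataset in both formats using plain pandas. The segment names, dates, and values are made up for the example.

```python
import pandas as pd

timestamps = pd.date_range("2021-01-01", periods=3, freq="D")

# Long format: 'timestamp' and 'segment' columns plus at least one feature column.
long_df = pd.DataFrame(
    {
        "timestamp": list(timestamps) * 2,
        "segment": ["a"] * 3 + ["b"] * 3,
        "target": [1.0, 2.0, 3.0, 10.0, 20.0, 30.0],
    }
)

# Wide format: timestamps in the index, columns with two levels
# named 'segment' and 'feature', one column per (segment, feature) pair.
columns = pd.MultiIndex.from_product(
    [["a", "b"], ["target"]], names=["segment", "feature"]
)
wide_df = pd.DataFrame(
    [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]],
    index=timestamps,
    columns=columns,
)
wide_df.index.name = "timestamp"
```

Both dataframes hold the same values; only the arrangement of timestamps, segments, and features differs.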

Currently, we don’t check the types of columns, in order to preserve compatibility, but it is expected that:

  • Timestamps have type int or pd.Timestamp. If they don’t, TSDataset converts them for you.

  • Segments have type str. If they don’t, TSDataset converts them for you.
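If you prefer to normalize the types yourself before handing the data over, a sketch in plain pandas could look like this. It only illustrates the expected types; it is not necessarily how TSDataset performs its own conversion, and the column values are invented for the example.

```python
import pandas as pd

raw = pd.DataFrame(
    {
        "timestamp": ["2021-01-01", "2021-01-02"],  # strings, not pd.Timestamp
        "segment": [1, 1],                          # integers, not str
        "target": [5.0, 6.0],
    }
)

# Convert timestamps to pd.Timestamp values and segments to str.
prepared = raw.assign(
    timestamp=pd.to_datetime(raw["timestamp"]),
    segment=raw["segment"].astype(str),
)
```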

Methods

determine(df)

Determine format of the given dataframe.

Attributes

wide

Wide format.

long

Long format.

classmethod determine(df: DataFrame) → DataFrameFormat

Determine format of the given dataframe.

Parameters:

df (DataFrame) – Dataframe whose format is to be inferred.

Returns:

Format of the given dataframe.

Raises:

  • ValueError – Given long dataframe doesn’t have required column ‘timestamp’

  • ValueError – Given long dataframe doesn’t have required column ‘segment’

  • ValueError – Given long dataframe doesn’t have any columns except for ‘timestamp’ and ‘segment’

  • ValueError – Given wide dataframe doesn’t have levels of columns [‘segment’, ‘feature’]

  • ValueError – Given wide dataframe doesn’t have any features

  • ValueError – Given wide dataframe doesn’t have all combinations of pairs (segment, feature)

Return type:

DataFrameFormat
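With etna installed, the documented call is DataFrameFormat.determine(df). To make the listed checks concrete without depending on the library, here is a rough sketch that restates them in plain pandas. The function name sketch_determine_format is hypothetical, and the rule used to choose between the wide and long branches (multi-level columns mean wide) is an assumption; the real implementation may differ.

```python
from itertools import product

import pandas as pd


def sketch_determine_format(df: pd.DataFrame) -> str:
    """Hypothetical restatement of the documented checks; not etna's code."""
    # Assumption: multi-level columns mark a candidate for the wide format.
    if isinstance(df.columns, pd.MultiIndex):
        if list(df.columns.names) != ["segment", "feature"]:
            raise ValueError("Wide dataframe lacks column levels ['segment', 'feature']")
        if len(df.columns) == 0:
            raise ValueError("Wide dataframe doesn't have any features")
        segments = df.columns.get_level_values("segment").unique()
        features = df.columns.get_level_values("feature").unique()
        # Every segment must carry every feature.
        if set(df.columns) != set(product(segments, features)):
            raise ValueError("Wide dataframe lacks some (segment, feature) pairs")
        return "wide"

    # Otherwise apply the long-format checks.
    if "timestamp" not in df.columns:
        raise ValueError("Long dataframe lacks required column 'timestamp'")
    if "segment" not in df.columns:
        raise ValueError("Long dataframe lacks required column 'segment'")
    if set(df.columns) <= {"timestamp", "segment"}:
        raise ValueError("Long dataframe has no columns besides 'timestamp' and 'segment'")
    return "long"


example_long = pd.DataFrame(
    {"timestamp": [1, 2], "segment": ["a", "a"], "target": [0.5, 0.7]}
)
print(sketch_determine_format(example_long))  # long
```

The sketch returns plain strings that match the enum values 'long' and 'wide' rather than DataFrameFormat members, so that it stays free of any etna import.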

long = 'long'

Long format.

wide = 'wide'

Wide format.