{ "cells": [ { "cell_type": "markdown", "id": "2c8984e0-0792-4cf8-b3c6-446b45b717f2", "metadata": { "pycharm": { "name": "#%% md\n" } }, "source": [ "# Embedding models\n", "\n", "[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/etna-team/etna/master?filepath=examples/210-embedding_models.ipynb)" ] }, { "cell_type": "markdown", "id": "94e7669f-de54-4df8-86ba-aa72c6d5fb55", "metadata": { "pycharm": { "name": "#%% md\n" } }, "source": [ "This notebooks contains examples with embedding models.\n", "\n", "**Table of contents**\n", "\n", "* [Using embedding models directly](#chapter1) \n", "* [Using embedding models with transforms](#chapter2)\n", " * [Baseline](#section_2_1)\n", " * [EmbeddingSegmentTransform](#section_2_2)\n", " * [EmbeddingWindowTransform](#section_2_3)\n", "* [Saving and loading models](#chapter3)\n", "* [Loading external pretrained models](#chapter4)" ] }, { "cell_type": "code", "execution_count": 1, "id": "bf32c6a9-f920-4888-ac9d-f4a1c454cd91", "metadata": { "pycharm": { "name": "#%%\n" }, "tags": [] }, "outputs": [], "source": [ "import warnings\n", "\n", "warnings.filterwarnings(\"ignore\")" ] }, { "cell_type": "markdown", "id": "d732e5b1-2c10-4de3-93ce-c6395ddbd4f1", "metadata": { "pycharm": { "name": "#%% md\n" } }, "source": [ "## 1. Using embedding models directly " ] }, { "cell_type": "markdown", "id": "4c63da5a-eed8-472b-9786-9884a5bb78d1", "metadata": { "pycharm": { "name": "#%% md\n" } }, "source": [ "We have two models to generate embeddings for time series: `TS2VecEmbeddingModel` and `TSTCCEmbeddingModel`.\n", "\n", "Each model has following methods:\n", "\n", "- `fit` to train model:\n", "- `encode_segment` to generate embeddings for the whole series. These features are regressors.\n", "- `encode_window` to generate embeddings for each timestamp. These features aren't regressors and lag transformation should be applied to them before using in forecasting.\n", "- `freeze` to enable or disable skipping training in `fit` method. It is useful, for example, when you have a pretrained model and you want only to generate embeddings without new training during `backtest`.\n", "- `save` and `load` to save and load pretrained models, respectively." ] }, { "cell_type": "code", "execution_count": 2, "id": "d5ec9757-dd5a-423c-9be1-e4835b4b2a03", "metadata": { "pycharm": { "name": "#%%\n" }, "tags": [] }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Seed set to 42\n" ] }, { "data": { "text/plain": [ "42" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from lightning.pytorch import seed_everything\n", "\n", "seed_everything(42, workers=True)" ] }, { "cell_type": "code", "execution_count": 3, "id": "f99c90c5-8a8b-481a-848f-ebcb00b22bb0", "metadata": { "pycharm": { "name": "#%%\n" }, "tags": [] }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
segmentsegment_0segment_1segment_2
featuretargettargettarget
timestamp
2001-01-011.6243451.462108-1.100619
2001-01-021.012589-0.5980330.044105
2001-01-030.484417-0.9204500.945695
2001-01-04-0.588551-1.3045041.448190
2001-01-050.276856-0.1707352.349046
\n", "
" ], "text/plain": [ "segment segment_0 segment_1 segment_2\n", "feature target target target\n", "timestamp \n", "2001-01-01 1.624345 1.462108 -1.100619\n", "2001-01-02 1.012589 -0.598033 0.044105\n", "2001-01-03 0.484417 -0.920450 0.945695\n", "2001-01-04 -0.588551 -1.304504 1.448190\n", "2001-01-05 0.276856 -0.170735 2.349046" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from etna.datasets import TSDataset\n", "from etna.datasets import generate_ar_df\n", "\n", "df = generate_ar_df(periods=10, start_time=\"2001-01-01\", n_segments=3)\n", "ts = TSDataset(df, freq=\"D\")\n", "ts.head()" ] }, { "cell_type": "markdown", "id": "9712e58c-73fe-475e-807b-ae082752fcf8", "metadata": { "pycharm": { "name": "#%% md\n" } }, "source": [ "Now let's work with models directly.\n", "\n", "They are expecting array with shapes\n", "(n_segments, n_timestamps, num_features). The example shows working with `TS2VecEmbeddingModel`, it is all the same with `TSTCCEmbeddingModel`." ] }, { "cell_type": "code", "execution_count": 4, "id": "05a191ee-17dd-4cb1-a993-73aee7706272", "metadata": { "pycharm": { "name": "#%%\n" }, "tags": [] }, "outputs": [ { "data": { "text/plain": [ "(3, 10, 1)" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "x = ts.df.values.reshape(ts.size()).transpose(1, 0, 2)\n", "x.shape" ] }, { "cell_type": "code", "execution_count": 5, "id": "0263277f-b642-4c1b-8f19-a42520d6d09e", "metadata": { "pycharm": { "name": "#%%\n" }, "tags": [] }, "outputs": [ { "data": { "text/plain": [ "(3, 2)" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from etna.transforms.embeddings.models import TS2VecEmbeddingModel\n", "from etna.transforms.embeddings.models import TSTCCEmbeddingModel\n", "\n", "model_ts2vec = TS2VecEmbeddingModel(input_dims=1, output_dims=2)\n", "model_ts2vec.fit(x, n_epochs=1)\n", "segment_embeddings = model_ts2vec.encode_segment(x)\n", "segment_embeddings.shape" ] }, { "cell_type": "markdown", "id": "26329bf0-e955-46ad-9962-4ea1295ef671", "metadata": { "pycharm": { "name": "#%% md\n" } }, "source": [ "As we are using `encode_segment` we get `output_dims` features consisting of one value for each segment.\n", "\n", "And what about `encode_window`?" ] }, { "cell_type": "code", "execution_count": 6, "id": "9a307886-cdf2-4e98-9a8e-3917741f287c", "metadata": { "pycharm": { "name": "#%%\n" }, "tags": [] }, "outputs": [ { "data": { "text/plain": [ "(3, 10, 2)" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "window_embeddings = model_ts2vec.encode_window(x)\n", "window_embeddings.shape" ] }, { "cell_type": "markdown", "id": "aded40fc-382c-4f7a-901b-8498f9258c3b", "metadata": { "pycharm": { "name": "#%% md\n" } }, "source": [ "We get `output_dims` features consisting of `n_timestamps` values for each segment." ] }, { "cell_type": "markdown", "id": "b2d599a0", "metadata": { "pycharm": { "name": "#%% md\n" } }, "source": [ "You can change some attributes of the model after initialization, for example `device`, `batch_size` or `num_workers`." ] }, { "cell_type": "code", "execution_count": 7, "id": "6ae8c79d", "metadata": { "pycharm": { "name": "#%%\n" } }, "outputs": [], "source": [ "model_ts2vec.device = \"cuda\"" ] }, { "cell_type": "markdown", "id": "ffbb2210-d77f-426e-91b2-3729544ce872", "metadata": { "pycharm": { "name": "#%% md\n" } }, "source": [ "## 2. Using embedding models with transforms " ] }, { "cell_type": "markdown", "id": "459e90a9-97fb-4922-bc6a-52500b3a132e", "metadata": { "pycharm": { "name": "#%% md\n" } }, "source": [ "In this section we will test our models on example." ] }, { "cell_type": "code", "execution_count": 8, "id": "b7827e25-4597-451a-88f8-5e0475556041", "metadata": { "pycharm": { "name": "#%%\n" }, "tags": [] }, "outputs": [], "source": [ "HORIZON = 6" ] }, { "cell_type": "markdown", "id": "17955757-7585-4db0-889b-dbd978339822", "metadata": { "pycharm": { "name": "#%% md\n" }, "tags": [] }, "source": [ "### 2.1 Baseline " ] }, { "cell_type": "markdown", "id": "8c92c86f-fce7-442b-a7f4-0c024344bec9", "metadata": { "pycharm": { "name": "#%% md\n" } }, "source": [ "Before working with embedding features, let's make forecasts using usual features." ] }, { "cell_type": "code", "execution_count": 9, "id": "21ac6694-1c3c-4fdc-a96a-3d0544ee90df", "metadata": { "pycharm": { "name": "#%%\n" }, "tags": [] }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
segmentM1000_MACROM1001_MACROM1002_MACROM1003_MACROM1004_MACROM1005_MACROM1006_MACROM1007_MACROM1008_MACROM1009_MACRO...M992_MACROM993_MACROM994_MACROM995_MACROM996_MACROM997_MACROM998_MACROM999_MACROM99_MICROM9_MICRO
featuretargettargettargettargettargettargettargettargettargettarget...targettargettargettargettargettargettargettargettargettarget
timestamp
0NaNNaNNaNNaNNaNNaNNaNNaNNaNNaN...NaNNaNNaNNaNNaNNaNNaNNaNNaNNaN
1NaNNaNNaNNaNNaNNaNNaNNaNNaNNaN...NaNNaNNaNNaNNaNNaNNaNNaNNaNNaN
2NaNNaNNaNNaNNaNNaNNaNNaNNaNNaN...NaNNaNNaNNaNNaNNaNNaNNaNNaNNaN
3NaNNaNNaNNaNNaNNaNNaNNaNNaNNaN...NaNNaNNaNNaNNaNNaNNaNNaNNaNNaN
4NaNNaNNaNNaNNaNNaNNaNNaNNaNNaN...NaNNaNNaNNaNNaNNaNNaNNaNNaNNaN
\n", "

5 rows × 1428 columns

\n", "
" ], "text/plain": [ "segment M1000_MACRO M1001_MACRO M1002_MACRO M1003_MACRO M1004_MACRO \\\n", "feature target target target target target \n", "timestamp \n", "0 NaN NaN NaN NaN NaN \n", "1 NaN NaN NaN NaN NaN \n", "2 NaN NaN NaN NaN NaN \n", "3 NaN NaN NaN NaN NaN \n", "4 NaN NaN NaN NaN NaN \n", "\n", "segment M1005_MACRO M1006_MACRO M1007_MACRO M1008_MACRO M1009_MACRO ... \\\n", "feature target target target target target ... \n", "timestamp ... \n", "0 NaN NaN NaN NaN NaN ... \n", "1 NaN NaN NaN NaN NaN ... \n", "2 NaN NaN NaN NaN NaN ... \n", "3 NaN NaN NaN NaN NaN ... \n", "4 NaN NaN NaN NaN NaN ... \n", "\n", "segment M992_MACRO M993_MACRO M994_MACRO M995_MACRO M996_MACRO M997_MACRO \\\n", "feature target target target target target target \n", "timestamp \n", "0 NaN NaN NaN NaN NaN NaN \n", "1 NaN NaN NaN NaN NaN NaN \n", "2 NaN NaN NaN NaN NaN NaN \n", "3 NaN NaN NaN NaN NaN NaN \n", "4 NaN NaN NaN NaN NaN NaN \n", "\n", "segment M998_MACRO M999_MACRO M99_MICRO M9_MICRO \n", "feature target target target target \n", "timestamp \n", "0 NaN NaN NaN NaN \n", "1 NaN NaN NaN NaN \n", "2 NaN NaN NaN NaN \n", "3 NaN NaN NaN NaN \n", "4 NaN NaN NaN NaN \n", "\n", "[5 rows x 1428 columns]" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from etna.datasets import load_dataset\n", "\n", "ts = load_dataset(\"m3_monthly\")\n", "ts = TSDataset(ts.to_pandas(features=[\"target\"]), freq=None)\n", "ts.head()" ] }, { "cell_type": "code", "execution_count": 10, "id": "fe224f12-6b86-4513-8d61-3fa0cb895eb1", "metadata": { "pycharm": { "name": "#%%\n" }, "tags": [] }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "[Parallel(n_jobs=1)]: Done 1 tasks | elapsed: 5.0s\n", "[Parallel(n_jobs=1)]: Done 2 tasks | elapsed: 10.3s\n", "[Parallel(n_jobs=1)]: Done 3 tasks | elapsed: 15.5s\n", "[Parallel(n_jobs=1)]: Done 3 tasks | elapsed: 15.5s\n", "[Parallel(n_jobs=1)]: Done 1 tasks | elapsed: 0.8s\n", "[Parallel(n_jobs=1)]: Done 2 tasks | elapsed: 1.7s\n", "[Parallel(n_jobs=1)]: Done 3 tasks | elapsed: 2.5s\n", "[Parallel(n_jobs=1)]: Done 3 tasks | elapsed: 2.5s\n", "[Parallel(n_jobs=1)]: Done 1 tasks | elapsed: 0.0s\n", "[Parallel(n_jobs=1)]: Done 2 tasks | elapsed: 0.0s\n", "[Parallel(n_jobs=1)]: Done 3 tasks | elapsed: 0.1s\n", "[Parallel(n_jobs=1)]: Done 3 tasks | elapsed: 0.1s\n" ] } ], "source": [ "from etna.metrics import SMAPE\n", "from etna.models import CatBoostMultiSegmentModel\n", "from etna.pipeline import Pipeline\n", "from etna.transforms import LagTransform\n", "\n", "model = CatBoostMultiSegmentModel()\n", "\n", "lag_transform = LagTransform(in_column=\"target\", lags=list(range(HORIZON, HORIZON + 6)), out_column=\"lag\")\n", "\n", "pipeline = Pipeline(model=model, transforms=[lag_transform], horizon=HORIZON)\n", "metrics_df, _, _ = pipeline.backtest(ts, metrics=[SMAPE()], n_folds=3)" ] }, { "cell_type": "code", "execution_count": 11, "id": "bfbab09d-eb27-4529-8954-3dc0e471668a", "metadata": { "pycharm": { "name": "#%%\n" }, "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "SMAPE: 14.719683971886594\n" ] } ], "source": [ "print(\"SMAPE: \", metrics_df[\"SMAPE\"].mean())" ] }, { "cell_type": "markdown", "id": "efa358d6-c1a0-460a-b1a2-7a123b5b4eec", "metadata": { "pycharm": { "name": "#%% md\n" } }, "source": [ "### 2.2 EmbeddingSegmentTransform " ] }, { "cell_type": "markdown", "id": "8f7ca5fd-4186-4bf4-ac32-0bace7802ca9", "metadata": { "pycharm": { "name": "#%% md\n" } }, "source": [ "`EmbeddingSegmentTransform` calls models' `encode_segment` method inside." ] }, { "cell_type": "code", "execution_count": 12, "id": "f05f8f02-4d24-4438-ac15-9ed45e2e4f78", "metadata": { "pycharm": { "name": "#%%\n" }, "tags": [] }, "outputs": [], "source": [ "from etna.transforms import EmbeddingSegmentTransform\n", "from etna.transforms.embeddings.models import BaseEmbeddingModel\n", "\n", "\n", "def forecast_with_segment_embeddings(\n", " emb_model: BaseEmbeddingModel, training_params: dict = {}, n_folds: int = 3\n", ") -> float:\n", " model = CatBoostMultiSegmentModel()\n", "\n", " emb_transform = EmbeddingSegmentTransform(\n", " in_columns=[\"target\"], embedding_model=emb_model, training_params=training_params, out_column=\"emb\"\n", " )\n", " pipeline = Pipeline(model=model, transforms=[lag_transform, emb_transform], horizon=HORIZON)\n", " metrics_df, _, _ = pipeline.backtest(ts, metrics=[SMAPE()], n_folds=n_folds)\n", " smape_score = metrics_df[\"SMAPE\"].mean()\n", " return smape_score" ] }, { "cell_type": "markdown", "id": "6bc237e1-d2e3-48ee-99b5-ac35b957717b", "metadata": { "pycharm": { "name": "#%% md\n" } }, "source": [ "You can see training parameters of the model to pass it to transform.\n", "\n", "Let's begin with `TSTCCEmbeddingModel`" ] }, { "cell_type": "code", "execution_count": 13, "id": "3e6cc297-48bd-4614-bcbe-44e5732bf3a8", "metadata": { "pycharm": { "name": "#%%\n" }, "tags": [] }, "outputs": [ { "data": { "text/plain": [ "\u001B[0;31mSignature:\u001B[0m\n", "\u001B[0mTSTCCEmbeddingModel\u001B[0m\u001B[0;34m.\u001B[0m\u001B[0mfit\u001B[0m\u001B[0;34m(\u001B[0m\u001B[0;34m\u001B[0m\n", "\u001B[0;34m\u001B[0m \u001B[0mself\u001B[0m\u001B[0;34m,\u001B[0m\u001B[0;34m\u001B[0m\n", "\u001B[0;34m\u001B[0m \u001B[0mx\u001B[0m\u001B[0;34m:\u001B[0m \u001B[0mnumpy\u001B[0m\u001B[0;34m.\u001B[0m\u001B[0mndarray\u001B[0m\u001B[0;34m,\u001B[0m\u001B[0;34m\u001B[0m\n", "\u001B[0;34m\u001B[0m \u001B[0mn_epochs\u001B[0m\u001B[0;34m:\u001B[0m \u001B[0mint\u001B[0m \u001B[0;34m=\u001B[0m \u001B[0;36m40\u001B[0m\u001B[0;34m,\u001B[0m\u001B[0;34m\u001B[0m\n", "\u001B[0;34m\u001B[0m \u001B[0mlr\u001B[0m\u001B[0;34m:\u001B[0m \u001B[0mfloat\u001B[0m \u001B[0;34m=\u001B[0m \u001B[0;36m0.001\u001B[0m\u001B[0;34m,\u001B[0m\u001B[0;34m\u001B[0m\n", "\u001B[0;34m\u001B[0m \u001B[0mtemperature\u001B[0m\u001B[0;34m:\u001B[0m \u001B[0mfloat\u001B[0m \u001B[0;34m=\u001B[0m \u001B[0;36m0.2\u001B[0m\u001B[0;34m,\u001B[0m\u001B[0;34m\u001B[0m\n", "\u001B[0;34m\u001B[0m \u001B[0mlambda1\u001B[0m\u001B[0;34m:\u001B[0m \u001B[0mfloat\u001B[0m \u001B[0;34m=\u001B[0m \u001B[0;36m1\u001B[0m\u001B[0;34m,\u001B[0m\u001B[0;34m\u001B[0m\n", "\u001B[0;34m\u001B[0m \u001B[0mlambda2\u001B[0m\u001B[0;34m:\u001B[0m \u001B[0mfloat\u001B[0m \u001B[0;34m=\u001B[0m \u001B[0;36m0.7\u001B[0m\u001B[0;34m,\u001B[0m\u001B[0;34m\u001B[0m\n", "\u001B[0;34m\u001B[0m \u001B[0mverbose\u001B[0m\u001B[0;34m:\u001B[0m \u001B[0mbool\u001B[0m \u001B[0;34m=\u001B[0m \u001B[0;32mFalse\u001B[0m\u001B[0;34m,\u001B[0m\u001B[0;34m\u001B[0m\n", "\u001B[0;34m\u001B[0m\u001B[0;34m)\u001B[0m \u001B[0;34m->\u001B[0m \u001B[0;34m'TSTCCEmbeddingModel'\u001B[0m\u001B[0;34m\u001B[0m\u001B[0;34m\u001B[0m\u001B[0m\n", "\u001B[0;31mDocstring:\u001B[0m\n", "Fit TSTCC embedding model.\n", "\n", "Parameters\n", "----------\n", "x:\n", " data with shapes (n_segments, n_timestamps, input_dims).\n", "n_epochs:\n", " The number of epochs. When this reaches, the training stops.\n", "lr:\n", " The learning rate.\n", "temperature:\n", " Temperature in NTXentLoss.\n", "lambda1:\n", " The relative weight of the first item in the loss (temporal contrasting loss).\n", "lambda2:\n", " The relative weight of the second item in the loss (contextual contrasting loss).\n", "verbose:\n", " Whether to print the training loss after each epoch.\n", "\u001B[0;31mFile:\u001B[0m ~/PycharmProjects/etna/etna/transforms/embeddings/models/tstcc.py\n", "\u001B[0;31mType:\u001B[0m function" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "?TSTCCEmbeddingModel.fit" ] }, { "cell_type": "code", "execution_count": 14, "id": "516b209e-7bd2-45c6-8db0-b1708ffda0fc", "metadata": { "pycharm": { "is_executing": true, "name": "#%%\n" }, "tags": [] }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "[Parallel(n_jobs=1)]: Done 1 tasks | elapsed: 1.0min\n", "[Parallel(n_jobs=1)]: Done 2 tasks | elapsed: 2.2min\n", "[Parallel(n_jobs=1)]: Done 3 tasks | elapsed: 3.3min\n", "[Parallel(n_jobs=1)]: Done 3 tasks | elapsed: 3.3min\n", "[Parallel(n_jobs=1)]: Done 1 tasks | elapsed: 4.4s\n", "[Parallel(n_jobs=1)]: Done 2 tasks | elapsed: 8.6s\n", "[Parallel(n_jobs=1)]: Done 3 tasks | elapsed: 13.0s\n", "[Parallel(n_jobs=1)]: Done 3 tasks | elapsed: 13.0s\n", "[Parallel(n_jobs=1)]: Done 1 tasks | elapsed: 0.0s\n", "[Parallel(n_jobs=1)]: Done 2 tasks | elapsed: 0.0s\n", "[Parallel(n_jobs=1)]: Done 3 tasks | elapsed: 0.1s\n", "[Parallel(n_jobs=1)]: Done 3 tasks | elapsed: 0.1s\n" ] } ], "source": [ "import torch\n", "\n", "device = \"cuda\" if torch.cuda.is_available() else \"cpu\"\n", "\n", "emb_model = TSTCCEmbeddingModel(input_dims=1, tc_hidden_dim=16, depth=3, output_dims=6, device=device)\n", "training_params = {\"n_epochs\": 10}\n", "smape_score = forecast_with_segment_embeddings(emb_model, training_params)" ] }, { "cell_type": "code", "execution_count": 15, "id": "e673ff95-fdc2-4751-9025-98940f73211d", "metadata": { "pycharm": { "is_executing": true, "name": "#%%\n" }, "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "SMAPE: 14.039866723276173\n" ] } ], "source": [ "print(\"SMAPE: \", smape_score)" ] }, { "cell_type": "markdown", "id": "e5e35346-6f16-4d42-8a33-b98bf5679046", "metadata": { "pycharm": { "name": "#%% md\n" } }, "source": [ "Better then without embeddings. Let's try `TS2VecEmbeddingModel`." ] }, { "cell_type": "code", "execution_count": 16, "id": "6b7e43f6-9ce3-4a3f-b6e9-70ed46b272e5", "metadata": { "pycharm": { "is_executing": true, "name": "#%%\n" }, "tags": [] }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "[Parallel(n_jobs=1)]: Done 1 tasks | elapsed: 38.8s\n", "[Parallel(n_jobs=1)]: Done 2 tasks | elapsed: 1.3min\n", "[Parallel(n_jobs=1)]: Done 3 tasks | elapsed: 1.9min\n", "[Parallel(n_jobs=1)]: Done 3 tasks | elapsed: 1.9min\n", "[Parallel(n_jobs=1)]: Done 1 tasks | elapsed: 3.1s\n", "[Parallel(n_jobs=1)]: Done 2 tasks | elapsed: 6.1s\n", "[Parallel(n_jobs=1)]: Done 3 tasks | elapsed: 9.1s\n", "[Parallel(n_jobs=1)]: Done 3 tasks | elapsed: 9.1s\n", "[Parallel(n_jobs=1)]: Done 1 tasks | elapsed: 0.0s\n", "[Parallel(n_jobs=1)]: Done 2 tasks | elapsed: 0.0s\n", "[Parallel(n_jobs=1)]: Done 3 tasks | elapsed: 0.1s\n", "[Parallel(n_jobs=1)]: Done 3 tasks | elapsed: 0.1s\n" ] } ], "source": [ "emb_model = TS2VecEmbeddingModel(input_dims=1, hidden_dims=16, depth=3, output_dims=6, device=device)\n", "training_params = {\"n_epochs\": 10}\n", "smape_score = forecast_with_segment_embeddings(emb_model, training_params)" ] }, { "cell_type": "code", "execution_count": 17, "id": "5688ea80-5d6c-414a-89a7-7ec144d09b4f", "metadata": { "pycharm": { "is_executing": true, "name": "#%%\n" }, "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "SMAPE: 13.66458975722417\n" ] } ], "source": [ "print(\"SMAPE: \", smape_score)" ] }, { "cell_type": "markdown", "id": "da58c577-3519-41a7-b63c-1da896121954", "metadata": { "pycharm": { "name": "#%% md\n" } }, "source": [ "Much better. Now let's try another transform." ] }, { "cell_type": "markdown", "id": "949a07ba-548f-4b09-bcba-a07f76c9d501", "metadata": { "pycharm": { "name": "#%% md\n" }, "tags": [] }, "source": [ "### 2.3 EmbeddingWindowTransform " ] }, { "cell_type": "markdown", "id": "dc3ef834-fcd7-4ee1-85e9-a8ace8dde8a4", "metadata": { "pycharm": { "name": "#%% md\n" } }, "source": [ "`EmbeddingWindowTransform` calls models' `encode_window` method inside. As we have discussed, these features are not regressors and should be used as lags for future." ] }, { "cell_type": "code", "execution_count": 18, "id": "b39d1abe-42f9-44ff-af5e-f93d08c0ac02", "metadata": { "pycharm": { "is_executing": true, "name": "#%%\n" }, "tags": [] }, "outputs": [], "source": [ "from etna.transforms import EmbeddingWindowTransform\n", "from etna.transforms import FilterFeaturesTransform\n", "\n", "\n", "def forecast_with_window_embeddings(emb_model: BaseEmbeddingModel, training_params: dict) -> float:\n", " model = CatBoostMultiSegmentModel()\n", "\n", " output_dims = emb_model.output_dims\n", "\n", " emb_transform = EmbeddingWindowTransform(\n", " in_columns=[\"target\"], embedding_model=emb_model, training_params=training_params, out_column=\"embedding_window\"\n", " )\n", " lag_emb_transforms = [\n", " LagTransform(in_column=f\"embedding_window_{i}\", lags=[HORIZON], out_column=f\"lag_emb_{i}\")\n", " for i in range(output_dims)\n", " ]\n", " filter_transforms = FilterFeaturesTransform(exclude=[f\"embedding_window_{i}\" for i in range(output_dims)])\n", "\n", " transforms = [lag_transform] + [emb_transform] + lag_emb_transforms + [filter_transforms]\n", "\n", " pipeline = Pipeline(model=model, transforms=transforms, horizon=HORIZON)\n", " metrics_df, _, _ = pipeline.backtest(ts, metrics=[SMAPE()], n_folds=3)\n", " smape_score = metrics_df[\"SMAPE\"].mean()\n", " return smape_score" ] }, { "cell_type": "code", "execution_count": 19, "id": "5e663aa0-778d-4393-80e6-7ef888210ec5", "metadata": { "pycharm": { "is_executing": true, "name": "#%%\n" }, "tags": [] }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "[Parallel(n_jobs=1)]: Done 1 tasks | elapsed: 1.4min\n", "[Parallel(n_jobs=1)]: Done 2 tasks | elapsed: 2.7min\n", "[Parallel(n_jobs=1)]: Done 3 tasks | elapsed: 4.2min\n", "[Parallel(n_jobs=1)]: Done 3 tasks | elapsed: 4.2min\n", "[Parallel(n_jobs=1)]: Done 1 tasks | elapsed: 18.4s\n", "[Parallel(n_jobs=1)]: Done 2 tasks | elapsed: 37.1s\n", "[Parallel(n_jobs=1)]: Done 3 tasks | elapsed: 55.4s\n", "[Parallel(n_jobs=1)]: Done 3 tasks | elapsed: 55.4s\n", "[Parallel(n_jobs=1)]: Done 1 tasks | elapsed: 0.0s\n", "[Parallel(n_jobs=1)]: Done 2 tasks | elapsed: 0.0s\n", "[Parallel(n_jobs=1)]: Done 3 tasks | elapsed: 0.1s\n", "[Parallel(n_jobs=1)]: Done 3 tasks | elapsed: 0.1s\n" ] } ], "source": [ "emb_model = TSTCCEmbeddingModel(input_dims=1, tc_hidden_dim=16, depth=3, output_dims=6, device=device)\n", "training_params = {\"n_epochs\": 10}\n", "smape_score = forecast_with_window_embeddings(emb_model, training_params)" ] }, { "cell_type": "code", "execution_count": 20, "id": "d07041ad-698c-4b07-b339-16aed9856129", "metadata": { "pycharm": { "is_executing": true, "name": "#%%\n" }, "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "SMAPE: 116.00285960604187\n" ] } ], "source": [ "print(\"SMAPE: \", smape_score)" ] }, { "cell_type": "markdown", "id": "41d78b2c-7574-4b0b-806c-adb5db324998", "metadata": { "pycharm": { "name": "#%% md\n" } }, "source": [ "Oops... What about `TS2VecEmbeddingModel`?" ] }, { "cell_type": "code", "execution_count": 21, "id": "fab4711f-7b9d-4263-abbd-05a3cedcaaef", "metadata": { "pycharm": { "is_executing": true, "name": "#%%\n" }, "tags": [] }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "[Parallel(n_jobs=1)]: Done 1 tasks | elapsed: 47.9s\n", "[Parallel(n_jobs=1)]: Done 2 tasks | elapsed: 1.7min\n", "[Parallel(n_jobs=1)]: Done 3 tasks | elapsed: 2.6min\n", "[Parallel(n_jobs=1)]: Done 3 tasks | elapsed: 2.6min\n", "[Parallel(n_jobs=1)]: Done 1 tasks | elapsed: 17.2s\n", "[Parallel(n_jobs=1)]: Done 2 tasks | elapsed: 34.0s\n", "[Parallel(n_jobs=1)]: Done 3 tasks | elapsed: 51.0s\n", "[Parallel(n_jobs=1)]: Done 3 tasks | elapsed: 51.0s\n", "[Parallel(n_jobs=1)]: Done 1 tasks | elapsed: 0.0s\n", "[Parallel(n_jobs=1)]: Done 2 tasks | elapsed: 0.0s\n", "[Parallel(n_jobs=1)]: Done 3 tasks | elapsed: 0.1s\n", "[Parallel(n_jobs=1)]: Done 3 tasks | elapsed: 0.1s\n" ] } ], "source": [ "emb_model = TS2VecEmbeddingModel(input_dims=1, hidden_dims=16, depth=3, output_dims=6, device=device)\n", "training_params = {\"n_epochs\": 10}\n", "smape_score = forecast_with_window_embeddings(emb_model, training_params)" ] }, { "cell_type": "code", "execution_count": 22, "id": "fd890c55-ea57-4f51-a2e1-320cc4111b46", "metadata": { "pycharm": { "is_executing": true, "name": "#%%\n" }, "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "SMAPE: 31.71456493613132\n" ] } ], "source": [ "print(\"SMAPE: \", smape_score)" ] }, { "cell_type": "markdown", "id": "3a73e55e-1e1b-4ca2-a42e-62a68142c517", "metadata": { "pycharm": { "name": "#%% md\n" } }, "source": [ "Window embeddings don't help with this dataset. It means that you should try both models and both transforms to get the best results." ] }, { "cell_type": "markdown", "id": "84fcd1b8-e61a-40d4-a80a-9c558637a8d4", "metadata": { "pycharm": { "name": "#%% md\n" } }, "source": [ "## 3. Saving and loading models \n" ] }, { "cell_type": "markdown", "id": "c5f0bc45-6388-4e92-bdc1-d8090af66b26", "metadata": { "pycharm": { "name": "#%% md\n" } }, "source": [ "If you have a pretrained embedding model and aren't going to train it on calling `fit`, you should \"freeze\" training loop. It is helpful for using the model inside transforms, which call `fit` method on each `fit` of the pipeline." ] }, { "cell_type": "code", "execution_count": 23, "id": "1d5fb109-b1c7-431c-a5bc-b2eeddc311f3", "metadata": { "pycharm": { "is_executing": true, "name": "#%%\n" }, "tags": [] }, "outputs": [], "source": [ "MODEL_PATH = \"model.zip\"" ] }, { "cell_type": "code", "execution_count": 24, "id": "24229c75-5e9a-4ff8-a7f4-1c723c62fc9e", "metadata": { "pycharm": { "is_executing": true, "name": "#%%\n" }, "tags": [] }, "outputs": [], "source": [ "emb_model.freeze()\n", "emb_model.save(MODEL_PATH)" ] }, { "cell_type": "markdown", "id": "e7a0275d-47ca-4aa4-a312-d02fcd06a7ae", "metadata": { "pycharm": { "name": "#%% md\n" } }, "source": [ "Now you are ready to load pretrained model. " ] }, { "cell_type": "code", "execution_count": 25, "id": "5d3f522a-dc2d-46d4-a28b-8a1524793874", "metadata": { "pycharm": { "is_executing": true, "name": "#%%\n" }, "tags": [] }, "outputs": [], "source": [ "model_loaded = TS2VecEmbeddingModel.load(MODEL_PATH)" ] }, { "cell_type": "markdown", "id": "98e5325a-abe1-4c88-9f2b-fb61ef5d110e", "metadata": { "pycharm": { "name": "#%% md\n" }, "tags": [] }, "source": [ "If you need to fine-tune pretrained model, you should \"unfreeze\" training loop. After that it will start fitting on calling `fit` method." ] }, { "cell_type": "code", "execution_count": 26, "id": "f9961758-6f5b-42f6-92f1-0aa68bb0a677", "metadata": { "pycharm": { "is_executing": true, "name": "#%%\n" }, "tags": [] }, "outputs": [], "source": [ "model_loaded.freeze(is_freezed=False)" ] }, { "cell_type": "markdown", "id": "9ff472a4", "metadata": { "pycharm": { "name": "#%% md\n" } }, "source": [ "To get information about whether model is \"freezed\" or not use `is_freezed` property." ] }, { "cell_type": "code", "execution_count": 27, "id": "eba6d010", "metadata": { "pycharm": { "is_executing": true, "name": "#%%\n" } }, "outputs": [ { "data": { "text/plain": [ "False" ] }, "execution_count": 27, "metadata": {}, "output_type": "execute_result" } ], "source": [ "model_loaded.is_freezed" ] }, { "cell_type": "markdown", "id": "5d5a6f56", "metadata": { "pycharm": { "name": "#%% md\n" } }, "source": [ "## 4. Loading external pretrained models \n", "\n" ] }, { "cell_type": "markdown", "id": "565729ea", "metadata": { "pycharm": { "name": "#%% md\n" } }, "source": [ "In this section we introduce our pretrained embedding models." ] }, { "cell_type": "code", "execution_count": 28, "id": "8d38bf52", "metadata": { "pycharm": { "is_executing": true, "name": "#%%\n" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
segmentm1m10m100m101m102m103m104m105m106m107...m90m91m92m93m94m95m96m97m98m99
featuretargettargettargettargettargettargettargettargettargettarget...targettargettargettargettargettargettargettargettargettarget
timestamp
0NaNNaNNaNNaNNaNNaNNaNNaNNaNNaN...NaNNaNNaNNaNNaNNaNNaNNaNNaNNaN
1NaNNaNNaNNaNNaNNaNNaNNaNNaNNaN...NaNNaNNaNNaNNaNNaNNaNNaNNaNNaN
2NaNNaNNaNNaNNaNNaNNaNNaNNaNNaN...NaNNaNNaNNaNNaNNaNNaNNaNNaNNaN
3NaNNaN4.0329.01341.0319.01419.0462.0921.03118.0...7301.04374.0803.0191.0124.0319.0270.036.0109.038.0
4NaNNaN40.0439.01258.0315.01400.0550.01060.02775.0...13980.03470.0963.0265.0283.0690.0365.031.0158.074.0
\n", "

5 rows × 366 columns

\n", "
" ], "text/plain": [ "segment m1 m10 m100 m101 m102 m103 m104 m105 m106 \\\n", "feature target target target target target target target target target \n", "timestamp \n", "0 NaN NaN NaN NaN NaN NaN NaN NaN NaN \n", "1 NaN NaN NaN NaN NaN NaN NaN NaN NaN \n", "2 NaN NaN NaN NaN NaN NaN NaN NaN NaN \n", "3 NaN NaN 4.0 329.0 1341.0 319.0 1419.0 462.0 921.0 \n", "4 NaN NaN 40.0 439.0 1258.0 315.0 1400.0 550.0 1060.0 \n", "\n", "segment m107 ... m90 m91 m92 m93 m94 m95 m96 \\\n", "feature target ... target target target target target target target \n", "timestamp ... \n", "0 NaN ... NaN NaN NaN NaN NaN NaN NaN \n", "1 NaN ... NaN NaN NaN NaN NaN NaN NaN \n", "2 NaN ... NaN NaN NaN NaN NaN NaN NaN \n", "3 3118.0 ... 7301.0 4374.0 803.0 191.0 124.0 319.0 270.0 \n", "4 2775.0 ... 13980.0 3470.0 963.0 265.0 283.0 690.0 365.0 \n", "\n", "segment m97 m98 m99 \n", "feature target target target \n", "timestamp \n", "0 NaN NaN NaN \n", "1 NaN NaN NaN \n", "2 NaN NaN NaN \n", "3 36.0 109.0 38.0 \n", "4 31.0 158.0 74.0 \n", "\n", "[5 rows x 366 columns]" ] }, "execution_count": 28, "metadata": {}, "output_type": "execute_result" } ], "source": [ "HORIZON = 12\n", "\n", "ts = load_dataset(\"tourism_monthly\")\n", "ts = TSDataset(ts.to_pandas(features=[\"target\"]), freq=None)\n", "ts.head()" ] }, { "cell_type": "markdown", "id": "70588951", "metadata": { "pycharm": { "name": "#%% md\n" } }, "source": [ "Our base pipeline with lags. " ] }, { "cell_type": "code", "execution_count": 29, "id": "3ed0d8d8", "metadata": { "pycharm": { "is_executing": true, "name": "#%%\n" } }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "[Parallel(n_jobs=1)]: Done 1 tasks | elapsed: 3.7s\n", "[Parallel(n_jobs=1)]: Done 1 tasks | elapsed: 3.7s\n", "[Parallel(n_jobs=1)]: Done 1 tasks | elapsed: 0.3s\n", "[Parallel(n_jobs=1)]: Done 1 tasks | elapsed: 0.3s\n", "[Parallel(n_jobs=1)]: Done 1 tasks | elapsed: 0.0s\n", "[Parallel(n_jobs=1)]: Done 1 tasks | elapsed: 0.0s\n" ] } ], "source": [ "model = CatBoostMultiSegmentModel()\n", "\n", "lag_transform = LagTransform(in_column=\"target\", lags=list(range(HORIZON, HORIZON + 6)), out_column=\"lag\")\n", "\n", "pipeline = Pipeline(model=model, transforms=[lag_transform], horizon=HORIZON)\n", "metrics_df, _, _ = pipeline.backtest(ts, metrics=[SMAPE()], n_folds=1)" ] }, { "cell_type": "code", "execution_count": 30, "id": "73c6f34c", "metadata": { "pycharm": { "is_executing": true, "name": "#%%\n" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "SMAPE: 18.80136468764402\n" ] } ], "source": [ "print(\"SMAPE: \", metrics_df[\"SMAPE\"].mean())" ] }, { "cell_type": "markdown", "id": "64b52daa", "metadata": { "pycharm": { "name": "#%% md\n" } }, "source": [ "It is often useful to encode segment by `SegmentEncoderTransform` when using multi-segment models like now." ] }, { "cell_type": "code", "execution_count": 31, "id": "56b8f36a", "metadata": { "pycharm": { "is_executing": true, "name": "#%%\n" } }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "[Parallel(n_jobs=1)]: Done 1 tasks | elapsed: 9.2s\n", "[Parallel(n_jobs=1)]: Done 1 tasks | elapsed: 9.2s\n", "[Parallel(n_jobs=1)]: Done 1 tasks | elapsed: 0.6s\n", "[Parallel(n_jobs=1)]: Done 1 tasks | elapsed: 0.6s\n", "[Parallel(n_jobs=1)]: Done 1 tasks | elapsed: 0.0s\n", "[Parallel(n_jobs=1)]: Done 1 tasks | elapsed: 0.0s\n" ] } ], "source": [ "from etna.transforms import SegmentEncoderTransform\n", "\n", "model = CatBoostMultiSegmentModel()\n", "\n", "lag_transform = LagTransform(in_column=\"target\", lags=list(range(HORIZON, HORIZON + 6)), out_column=\"lag\")\n", "segment_transform = SegmentEncoderTransform()\n", "\n", "pipeline = Pipeline(model=model, transforms=[lag_transform, segment_transform], horizon=HORIZON)\n", "metrics_df, _, _ = pipeline.backtest(ts, metrics=[SMAPE()], n_folds=1)" ] }, { "cell_type": "code", "execution_count": 32, "id": "05d839c2", "metadata": { "pycharm": { "is_executing": true, "name": "#%%\n" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "SMAPE: 18.719919206298737\n" ] } ], "source": [ "print(\"SMAPE: \", metrics_df[\"SMAPE\"].mean())" ] }, { "cell_type": "markdown", "id": "ccda7e72", "metadata": { "pycharm": { "name": "#%% md\n" } }, "source": [ "Segment embeddings from `EmbeddingSegmentTransform` can replace `SegmentEncoderTransform`'s feature. The main advantage of using segment embeddings is that you can forecast new segments by your trained pipeline. `SegmentEncoderTransform` can't work with segments that weren't present during training.\n", "\n", "To see available embedding models use `list_models` method of `TS2VecEmbeddingModel` or `TSTCCEmbeddingModel`" ] }, { "cell_type": "code", "execution_count": 33, "id": "fa270a0a", "metadata": { "pycharm": { "is_executing": true, "name": "#%%\n" } }, "outputs": [ { "data": { "text/plain": [ "['ts2vec_tiny']" ] }, "execution_count": 33, "metadata": {}, "output_type": "execute_result" } ], "source": [ "TS2VecEmbeddingModel.list_models()" ] }, { "cell_type": "markdown", "id": "04e8575b", "metadata": { "pycharm": { "name": "#%% md\n" } }, "source": [ "Let's load `ts2vec_tiny` pretrained model." ] }, { "cell_type": "code", "execution_count": 34, "id": "2834d4cb", "metadata": { "pycharm": { "is_executing": true, "name": "#%%\n" } }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "[Parallel(n_jobs=1)]: Done 1 tasks | elapsed: 6.9s\n", "[Parallel(n_jobs=1)]: Done 1 tasks | elapsed: 6.9s\n", "[Parallel(n_jobs=1)]: Done 1 tasks | elapsed: 1.8s\n", "[Parallel(n_jobs=1)]: Done 1 tasks | elapsed: 1.8s\n", "[Parallel(n_jobs=1)]: Done 1 tasks | elapsed: 0.0s\n", "[Parallel(n_jobs=1)]: Done 1 tasks | elapsed: 0.0s\n" ] } ], "source": [ "emb_model = TS2VecEmbeddingModel.load(path=\"ts2vec_model.zip\", model_name=\"ts2vec_tiny\")\n", "\n", "smape_score = forecast_with_segment_embeddings(emb_model, n_folds=1)" ] }, { "cell_type": "code", "execution_count": 35, "id": "837728f8", "metadata": { "pycharm": { "is_executing": true, "name": "#%%\n" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "SMAPE: 18.436162523492154\n" ] } ], "source": [ "print(\"SMAPE: \", smape_score)" ] }, { "cell_type": "markdown", "id": "ab62fb69", "metadata": { "pycharm": { "name": "#%% md\n" } }, "source": [ "We get better result compared to `SegmentEncoderTransform` and opportunity to use pipeline for new segments." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.15" } }, "nbformat": 4, "nbformat_minor": 5 }