{ "cells": [ { "cell_type": "markdown", "id": "c855b45e", "metadata": { "pycharm": { "name": "#%% md\n" } }, "source": [ "# AutoML\n", "\n", "[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/etna-team/etna/master?filepath=examples/205-automl.ipynb)" ] }, { "cell_type": "markdown", "id": "bca01a6c", "metadata": { "pycharm": { "name": "#%% md\n" } }, "source": [ "This notebook covers the AutoML utilities of the ETNA library.\n", "\n", "**Table of contents**\n", "\n", "- [Hyperparameters tuning](#chapter_1)\n", "    - [How Tune works](#section_1_1)\n", "    - [Example](#section_1_2)\n", "- [General AutoML](#chapter_2)\n", "    - [How Auto works](#section_2_1)\n", "    - [Example](#section_2_2)\n", "    - [Using custom pipeline pool](#section_2_3)\n", "- [Summary](#chapter_3)" ] }, { "cell_type": "code", "execution_count": 1, "id": "45f65253", "metadata": { "pycharm": { "name": "#%%\n" } }, "outputs": [], "source": [ "!pip install \"etna[auto, prophet]\" -q" ] }, { "cell_type": "code", "execution_count": 2, "id": "6f70e872", "metadata": { "pycharm": { "name": "#%%\n" } }, "outputs": [], "source": [ "import warnings\n", "\n", "warnings.filterwarnings(\"ignore\")" ] }, { "cell_type": "code", "execution_count": 3, "id": "4a371085", "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false }, "pycharm": { "name": "#%%\n" } }, "outputs": [], "source": [ "import pandas as pd\n", "\n", "from etna.datasets import TSDataset\n", "from etna.metrics import SMAPE\n", "from etna.models import LinearPerSegmentModel\n", "from etna.models import NaiveModel\n", "from etna.models import ProphetModel\n", "from etna.pipeline import Pipeline\n", "from etna.transforms import DateFlagsTransform\n", "from etna.transforms import LagTransform" ] }, { "cell_type": "code", "execution_count": 4, "id": "2f67d0d0", "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false }, "pycharm": { "name": "#%%\n" } }, "outputs": [], "source": [ "HORIZON = 14" ] }, { "cell_type": 
"markdown", "id": "c30553de", "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false }, "pycharm": { "name": "#%% md\n" } }, "source": [ "## 1. Hyperparameters tuning " ] }, { "cell_type": "markdown", "id": "e1d81449", "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false }, "pycharm": { "name": "#%% md\n" } }, "source": [ "It is a common task to tune the hyperparameters of an existing pipeline to improve its quality. For this purpose there is the `etna.auto.Tune` class, which is responsible for creating an [optuna](https://github.com/optuna/optuna) study to solve this problem.\n", "\n", "In the next sections we will see how it works and how to use it for your particular problems." ] }, { "cell_type": "markdown", "id": "a79e3a34", "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false }, "pycharm": { "name": "#%% md\n" } }, "source": [ "### 1.1 How `Tune` works " ] }, { "cell_type": "markdown", "id": "09a9cb75", "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false }, "pycharm": { "name": "#%% md\n" } }, "source": [ "During init, `Tune` accepts a `pipeline`, its tuning parameters (`params_to_tune`), an optimization metric (`target_metric`), the parameters of the backtest and the parameters of the optuna study.\n", "\n", "In `fit` the optuna study is created. During each trial a sample of parameters is generated from `params_to_tune` and applied to `pipeline`. After that, the new pipeline is evaluated in a backtest and the target metric is returned to the optuna framework." ] }, { "cell_type": "markdown", "id": "b5bcae1b", "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false }, "pycharm": { "name": "#%% md\n" } }, "source": [ "Let's look closer at the `params_to_tune` parameter. It expects a dictionary that maps parameter names to their distributions. But how should these parameter names be chosen?" 
] }, { "cell_type": "markdown", "id": "416208e3", "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false }, "pycharm": { "name": "#%% md\n" } }, "source": [ "#### 1.1.1 `set_params`" ] }, { "cell_type": "markdown", "id": "7b39811b", "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false }, "pycharm": { "name": "#%% md\n" } }, "source": [ "We are going to make a little detour to explain the `set_params` method, which is supported by ETNA pipelines, models and transforms. Given a dictionary of parameters, it creates a new object from an existing one with the given parameters changed." ] }, { "cell_type": "markdown", "id": "b1388880", "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false }, "pycharm": { "name": "#%% md\n" } }, "source": [ "First, we define some objects for our future examples." ] }, { "cell_type": "code", "execution_count": 5, "id": "19767c13", "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false }, "pycharm": { "name": "#%%\n" } }, "outputs": [], "source": [ "model = LinearPerSegmentModel()\n", "transforms = [\n", "    LagTransform(in_column=\"target\", lags=list(range(HORIZON, HORIZON + 10)), out_column=\"target_lag\"),\n", "    DateFlagsTransform(out_column=\"date_flags\"),\n", "]\n", "pipeline = Pipeline(model=model, transforms=transforms, horizon=HORIZON)" ] }, { "cell_type": "markdown", "id": "7aed68b5", "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false }, "pycharm": { "name": "#%% md\n" } }, "source": [ "Let's look at a simple example where we want to change the `fit_intercept` parameter of the `model`." 
] }, { "cell_type": "code", "execution_count": 6, "id": "4caacbcb", "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false }, "pycharm": { "name": "#%%\n" } }, "outputs": [ { "data": { "text/plain": [ "{'fit_intercept': True,\n", " 'kwargs': {},\n", " '_target_': 'etna.models.linear.LinearPerSegmentModel'}" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "model.to_dict()" ] }, { "cell_type": "code", "execution_count": 7, "id": "e80b47b3", "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false }, "pycharm": { "name": "#%%\n" } }, "outputs": [ { "data": { "text/plain": [ "{'fit_intercept': False,\n", " 'kwargs': {},\n", " '_target_': 'etna.models.linear.LinearPerSegmentModel'}" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "new_model_params = {\"fit_intercept\": False}\n", "new_model = model.set_params(**new_model_params)\n", "new_model.to_dict()" ] }, { "cell_type": "markdown", "id": "523251f2", "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false }, "pycharm": { "name": "#%% md\n" } }, "source": [ "Great! On the next step we want to change the `fit_intercept` of `model` inside the `pipeline`." 
] }, { "cell_type": "code", "execution_count": 8, "id": "6caccffb", "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false }, "pycharm": { "name": "#%%\n" } }, "outputs": [ { "data": { "text/plain": [ "{'model': {'fit_intercept': True,\n", " 'kwargs': {},\n", " '_target_': 'etna.models.linear.LinearPerSegmentModel'},\n", " 'transforms': [{'in_column': 'target',\n", " 'lags': [14, 15, 16, 17, 18, 19, 20, 21, 22, 23],\n", " 'out_column': 'target_lag',\n", " '_target_': 'etna.transforms.math.lags.LagTransform'},\n", " {'day_number_in_week': True,\n", " 'day_number_in_month': True,\n", " 'day_number_in_year': False,\n", " 'week_number_in_month': False,\n", " 'week_number_in_year': False,\n", " 'month_number_in_year': False,\n", " 'season_number': False,\n", " 'year_number': False,\n", " 'is_weekend': True,\n", " 'special_days_in_week': (),\n", " 'special_days_in_month': (),\n", " 'out_column': 'date_flags',\n", " '_target_': 'etna.transforms.timestamp.date_flags.DateFlagsTransform'}],\n", " 'horizon': 14,\n", " '_target_': 'etna.pipeline.pipeline.Pipeline'}" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pipeline.to_dict()" ] }, { "cell_type": "code", "execution_count": 9, "id": "cb5959d3", "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false }, "pycharm": { "name": "#%%\n" } }, "outputs": [ { "data": { "text/plain": [ "{'model': {'fit_intercept': False,\n", " 'kwargs': {},\n", " '_target_': 'etna.models.linear.LinearPerSegmentModel'},\n", " 'transforms': [{'in_column': 'target',\n", " 'lags': [14, 15, 16, 17, 18, 19, 20, 21, 22, 23],\n", " 'out_column': 'target_lag',\n", " '_target_': 'etna.transforms.math.lags.LagTransform'},\n", " {'day_number_in_week': True,\n", " 'day_number_in_month': True,\n", " 'day_number_in_year': False,\n", " 'week_number_in_month': False,\n", " 'week_number_in_year': False,\n", " 'month_number_in_year': False,\n", " 'season_number': False,\n", " 'year_number': 
False,\n", " 'is_weekend': True,\n", " 'special_days_in_week': (),\n", " 'special_days_in_month': (),\n", " 'out_column': 'date_flags',\n", " '_target_': 'etna.transforms.timestamp.date_flags.DateFlagsTransform'}],\n", " 'horizon': 14,\n", " '_target_': 'etna.pipeline.pipeline.Pipeline'}" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "new_pipeline_params = {\"model.fit_intercept\": False}\n", "new_pipeline = pipeline.set_params(**new_pipeline_params)\n", "new_pipeline.to_dict()" ] }, { "cell_type": "markdown", "id": "fe2ab629", "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false }, "pycharm": { "name": "#%% md\n" } }, "source": [ "Ok, it looks like we managed to do this. On the last step we are going to change `is_weekend` flag of `DateFlagsTransform` inside our `pipeline`." ] }, { "cell_type": "code", "execution_count": 10, "id": "d658e37b", "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false }, "pycharm": { "name": "#%%\n" } }, "outputs": [ { "data": { "text/plain": [ "{'model': {'fit_intercept': True,\n", " 'kwargs': {},\n", " '_target_': 'etna.models.linear.LinearPerSegmentModel'},\n", " 'transforms': [{'in_column': 'target',\n", " 'lags': [14, 15, 16, 17, 18, 19, 20, 21, 22, 23],\n", " 'out_column': 'target_lag',\n", " '_target_': 'etna.transforms.math.lags.LagTransform'},\n", " {'day_number_in_week': True,\n", " 'day_number_in_month': True,\n", " 'day_number_in_year': False,\n", " 'week_number_in_month': False,\n", " 'week_number_in_year': False,\n", " 'month_number_in_year': False,\n", " 'season_number': False,\n", " 'year_number': False,\n", " 'is_weekend': False,\n", " 'special_days_in_week': (),\n", " 'special_days_in_month': (),\n", " 'out_column': 'date_flags',\n", " '_target_': 'etna.transforms.timestamp.date_flags.DateFlagsTransform'}],\n", " 'horizon': 14,\n", " '_target_': 'etna.pipeline.pipeline.Pipeline'}" ] }, "execution_count": 10, "metadata": {}, "output_type": 
"execute_result" } ], "source": [ "new_pipeline_params = {\"transforms.1.is_weekend\": False}\n", "new_pipeline = pipeline.set_params(**new_pipeline_params)\n", "new_pipeline.to_dict()" ] }, { "cell_type": "markdown", "id": "cc89ba4a", "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false }, "pycharm": { "name": "#%% md\n" } }, "source": [ "As we can see, we managed to do this." ] }, { "cell_type": "markdown", "id": "304a8015", "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false }, "pycharm": { "name": "#%% md\n" } }, "source": [ "#### 1.1.2 `params_to_tune`" ] }, { "cell_type": "markdown", "id": "8686327c", "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false }, "pycharm": { "name": "#%% md\n" } }, "source": [ "Let's get back to our initial question about `params_to_tune`. In our optuna study we are going to sample each parameter value from its distribution and pass it into the `pipeline.set_params` method. So, the keys of `params_to_tune` should be valid keys for the `set_params` method.\n", "\n", "Distributions are taken from `etna.distributions` and they match the `optuna.Trial.suggest_` methods." 
] }, { "cell_type": "markdown", "id": "bad3f449", "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false }, "pycharm": { "name": "#%% md\n" } }, "source": [ "For example, something like this will be valid for our `pipeline` defined above:" ] }, { "cell_type": "code", "execution_count": 11, "id": "cc010639", "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false }, "pycharm": { "name": "#%%\n" } }, "outputs": [], "source": [ "from etna.distributions import CategoricalDistribution\n", "\n", "example_params_to_tune = {\n", "    \"model.fit_intercept\": CategoricalDistribution(choices=[False, True]),\n", "    \"transforms.1.is_weekend\": CategoricalDistribution(choices=[False, True]),\n", "}" ] }, { "cell_type": "markdown", "id": "9d401266", "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false }, "pycharm": { "name": "#%% md\n" } }, "source": [ "This custom dict could be passed into the `Tune` class. This will be shown in the [Example](#custom_params) below." ] }, { "cell_type": "markdown", "id": "d37306b8", "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false }, "pycharm": { "name": "#%% md\n" } }, "source": [ "There is some good news: it isn't necessary for our users to define `params_to_tune`, because we have a default grid for many of our classes. The default grid is available by calling the `params_to_tune` method on a pipeline, model or transform. 
Let's check our `pipeline`:" ] }, { "cell_type": "code", "execution_count": 12, "id": "6571b3e8", "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false }, "pycharm": { "name": "#%%\n" } }, "outputs": [ { "data": { "text/plain": [ "{'model.fit_intercept': CategoricalDistribution(choices=[False, True]),\n", " 'transforms.1.day_number_in_week': CategoricalDistribution(choices=[False, True]),\n", " 'transforms.1.day_number_in_month': CategoricalDistribution(choices=[False, True]),\n", " 'transforms.1.day_number_in_year': CategoricalDistribution(choices=[False, True]),\n", " 'transforms.1.week_number_in_month': CategoricalDistribution(choices=[False, True]),\n", " 'transforms.1.week_number_in_year': CategoricalDistribution(choices=[False, True]),\n", " 'transforms.1.month_number_in_year': CategoricalDistribution(choices=[False, True]),\n", " 'transforms.1.season_number': CategoricalDistribution(choices=[False, True]),\n", " 'transforms.1.year_number': CategoricalDistribution(choices=[False, True]),\n", " 'transforms.1.is_weekend': CategoricalDistribution(choices=[False, True])}" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pipeline.params_to_tune()" ] }, { "cell_type": "markdown", "id": "fb521dca", "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false }, "pycharm": { "name": "#%% md\n" } }, "source": [ "Now we are ready to use it in practice." 
] }, { "cell_type": "markdown", "id": "fb561f05", "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false }, "pycharm": { "name": "#%% md\n" } }, "source": [ "### 1.2 Example " ] }, { "cell_type": "markdown", "id": "a3b573d6", "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false }, "pycharm": { "name": "#%% md\n" } }, "source": [ "#### 1.2.1 Loading data" ] }, { "cell_type": "markdown", "id": "ae581eb6", "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false }, "pycharm": { "name": "#%% md\n" } }, "source": [ "Let's start by loading example data." ] }, { "cell_type": "code", "execution_count": 13, "id": "66f053a1", "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false }, "pycharm": { "name": "#%%\n" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
timestampsegmenttarget
02019-01-01segment_a170
12019-01-02segment_a243
22019-01-03segment_a267
32019-01-04segment_a287
42019-01-05segment_a279
\n", "
" ], "text/plain": [ " timestamp segment target\n", "0 2019-01-01 segment_a 170\n", "1 2019-01-02 segment_a 243\n", "2 2019-01-03 segment_a 267\n", "3 2019-01-04 segment_a 287\n", "4 2019-01-05 segment_a 279" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df = pd.read_csv(\"data/example_dataset.csv\")\n", "df.head()" ] }, { "cell_type": "code", "execution_count": 14, "id": "7421a616", "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false }, "pycharm": { "name": "#%%\n" } }, "outputs": [ { "data": { "image/png": "", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "full_ts = TSDataset(df, freq=\"D\")\n", "full_ts.plot()" ] }, { "cell_type": "markdown", "id": "ddc66d43", "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false }, "pycharm": { "name": "#%% md\n" } }, "source": [ "Let's divide the current dataset into train and validation parts. We will use the validation part later to check the final results." ] }, { "cell_type": "code", "execution_count": 15, "id": "9c76e694", "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false }, "pycharm": { "name": "#%%\n" } }, "outputs": [], "source": [ "ts, _ = full_ts.train_test_split(test_size=HORIZON * 5)" ] }, { "cell_type": "markdown", "id": "e3271b39", "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false }, "pycharm": { "name": "#%% md\n" } }, "source": [ "#### 1.2.2 Running `Tune`" ] }, { "cell_type": "markdown", "id": "ae2fbaed", "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false }, "pycharm": { "name": "#%% md\n" } }, "source": [ "We are going to define our `Tune` object:" ] }, { "cell_type": "code", "execution_count": 16, "id": "db305339", "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false }, "pycharm": { "name": "#%%\n" } }, "outputs": [], "source": [ "from etna.auto import Tune\n", "\n", "tune = Tune(pipeline=pipeline, target_metric=SMAPE(), horizon=HORIZON, backtest_params=dict(n_folds=5))" ] }, { "cell_type": "markdown", "id": "4a27b5ac", "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false }, "pycharm": { "name": "#%% md\n" } }, "source": [ "We used mostly default parameters for this example. But for your own experiments you might also want to set other parameters. 
\n", "\n", "For example, parameter `runner` allows you to run tuning in parallel on a local machine, and parameter `storage` makes it possible to store optuna results on a dedicated remote server.\n", "\n", "For a full list of parameters we advise you to check our documentation." ] }, { "cell_type": "markdown", "id": "ae2d89ca", "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false }, "pycharm": { "name": "#%% md\n" } }, "source": [ "Let's hide the logs of optuna, there are too many of them for a notebook." ] }, { "cell_type": "code", "execution_count": 17, "id": "22fbec9f", "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false }, "pycharm": { "name": "#%%\n" } }, "outputs": [], "source": [ "import optuna\n", "\n", "optuna.logging.set_verbosity(optuna.logging.CRITICAL)" ] }, { "cell_type": "markdown", "id": "8ad33ea7", "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false }, "pycharm": { "name": "#%% md\n" } }, "source": [ "Let's run the tuning" ] }, { "cell_type": "code", "execution_count": 18, "id": "0a4a7781", "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false }, "pycharm": { "name": "#%%\n" } }, "outputs": [], "source": [ "%%capture\n", "best_pipeline = tune.fit(ts=ts, n_trials=20)" ] }, { "cell_type": "markdown", "id": "fb6e5682", "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false }, "pycharm": { "name": "#%% md\n" } }, "source": [ "Command `%%capture` just hides the output." 
] }, { "cell_type": "markdown", "id": "cd45ff60", "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false }, "pycharm": { "name": "#%% md\n" } }, "source": [ "#### 1.2.3 Running `Tune` with custom `params_to_tune`" ] }, { "cell_type": "markdown", "id": "14fefb85", "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false }, "pycharm": { "name": "#%% md\n" } }, "source": [ "Recall that earlier we created a dict:\n", "```python\n", "example_params_to_tune = {\n", "    \"model.fit_intercept\": CategoricalDistribution(choices=[False, True]),\n", "    \"transforms.1.is_weekend\": CategoricalDistribution(choices=[False, True]),\n", "}\n", "```\n", "Now we can use these parameters when initializing `Tune`." ] }, { "cell_type": "code", "execution_count": 19, "id": "4157e5c6", "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false }, "pycharm": { "name": "#%%\n" } }, "outputs": [], "source": [ "tune_custom_params = Tune(\n", "    pipeline=pipeline,\n", "    target_metric=SMAPE(),\n", "    horizon=HORIZON,\n", "    backtest_params=dict(n_folds=5),\n", "    params_to_tune=example_params_to_tune,\n", ")" ] }, { "cell_type": "markdown", "id": "dcc67b63", "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false }, "pycharm": { "name": "#%% md\n" } }, "source": [ "Let's run the tuning with our custom parameters." 
] }, { "cell_type": "code", "execution_count": 20, "id": "996e0ee0", "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false }, "pycharm": { "name": "#%%\n" } }, "outputs": [], "source": [ "%%capture\n", "best_pipeline_custom_params = tune_custom_params.fit(ts=ts, n_trials=20)" ] }, { "cell_type": "markdown", "id": "f8bdaf7e", "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false }, "pycharm": { "name": "#%% md\n" } }, "source": [ "#### 1.2.4 Analysis" ] }, { "cell_type": "markdown", "id": "a84cdb9e", "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false }, "pycharm": { "name": "#%% md\n" } }, "source": [ "In the last section dedicated to `Tune` we will look at methods for analyzing the results." ] }, { "cell_type": "markdown", "id": "ea57eccc", "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false }, "pycharm": { "name": "#%% md\n" } }, "source": [ "First of all, there is the `summary` method, which shows the results of the optuna trials." ] }, { "cell_type": "code", "execution_count": 21, "id": "004dcb2a", "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false }, "pycharm": { "name": "#%%\n" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " 
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
MAE_meanMAE_medianMAE_notna_sizeMAE_percentile_25MAE_percentile_5MAE_percentile_75MAE_percentile_95MAE_stdMSE_meanMSE_median...Sign_notna_sizeSign_percentile_25Sign_percentile_5Sign_percentile_75Sign_percentile_95Sign_stdelapsed_timehashpipelinestate
025.66790426.7884724.020.38524315.14529432.07113334.6217178.2178651527.8132691356.477027...4.0-0.621429-0.672857-0.357143-0.2542860.1777121.586009f4f02e1d5f60b8f322a4a8a622dd1c1ePipeline(model = LinearPerSegmentModel(fit_int...1
126.60648228.2724294.020.55426215.19831534.32464935.6823228.9007251729.2957051599.030615...4.0-0.642857-0.745714-0.300000-0.2657140.2099561.7007753d7b7af16d71a36f3b935f69e113e22dPipeline(model = LinearPerSegmentModel(fit_int...1
226.42767026.8093574.020.23858911.88490232.99843840.43607811.7022702133.3439671997.594638...4.0-0.392857-0.581429-0.078571-0.0614290.2290171.0968777c7932114268832a5458acfecfb453fcPipeline(model = LinearPerSegmentModel(fit_int...1
334.58673335.6168314.029.79723316.15767440.40633051.57365314.7504083185.4610933324.924980...4.0-0.100000-0.3400000.0142860.0485710.1829460.989273b7ac5f7fcf9c8959626befe263a9d561Pipeline(model = LinearPerSegmentModel(fit_int...1
428.04663128.2796124.022.15313014.34940834.17311441.41768111.0905671875.9584392045.887754...4.0-0.585714-0.620000-0.250000-0.2328570.1799940.785666e928929f89156d88ef49e28abaf55847Pipeline(model = LinearPerSegmentModel(fit_int...1
531.71457834.6504414.025.44185814.66217940.92316044.65676712.6908272496.8626132344.530581...4.0-0.514286-0.788571-0.2714290.0371430.3437491.1415063b4311d41fcaab7307235ea23b6d4599Pipeline(model = LinearPerSegmentModel(fit_int...1
628.40101130.9374394.022.51739115.06496936.82105938.18605310.0364152160.7036191565.256313...4.0-0.621429-0.672857-0.257143-0.1885710.2132120.84920974065ebc11c81bed6a9819d026c7cd84Pipeline(model = LinearPerSegmentModel(fit_int...1
728.83440230.7827354.023.54048015.88396536.07665639.0571739.7683191887.3308681671.590780...4.0-0.657143-0.725714-0.250000-0.2328570.2253120.883154b0d0420255c6117045f8254bf8f377a0Pipeline(model = LinearPerSegmentModel(fit_int...1
827.58946828.2969734.021.66862314.02518334.21781840.16324510.7457561861.9653632038.946807...4.0-0.671429-0.705714-0.242857-0.2085710.2303500.87596625dcd8bb095f87a1ffc499fa6a83ef5dPipeline(model = LinearPerSegmentModel(fit_int...1
929.72015128.7134704.023.39534114.72615335.03828046.12350112.9424201975.6500041963.591771...4.0-0.657143-0.725714-0.192857-0.1757140.2534461.0439013f1ca1759261598081fa3bb2f32fe0acPipeline(model = LinearPerSegmentModel(fit_int...1
1025.59882222.9121854.019.47206011.71998829.03894643.23894913.3577561822.0760841734.654459...4.0-0.328571-0.431429-0.0142860.0200000.1963961.1312008363309e454e72993f86f10c7fc7c137Pipeline(model = LinearPerSegmentModel(fit_int...1
1125.59882222.9121854.019.47206011.71998829.03894643.23894913.3577561822.0760841734.654459...4.0-0.328571-0.431429-0.0142860.0200000.1963961.1312008363309e454e72993f86f10c7fc7c137Pipeline(model = LinearPerSegmentModel(fit_int...1
1225.59882222.9121854.019.47206011.71998829.03894643.23894913.3577561822.0760841734.654459...4.0-0.328571-0.431429-0.0142860.0200000.1963961.1312008363309e454e72993f86f10c7fc7c137Pipeline(model = LinearPerSegmentModel(fit_int...1
1325.59882222.9121854.019.47206011.71998829.03894643.23894913.3577561822.0760841734.654459...4.0-0.328571-0.431429-0.0142860.0200000.1963961.1312008363309e454e72993f86f10c7fc7c137Pipeline(model = LinearPerSegmentModel(fit_int...1
1425.59882222.9121854.019.47206011.71998829.03894643.23894913.3577561822.0760841734.654459...4.0-0.328571-0.431429-0.0142860.0200000.1963961.1312008363309e454e72993f86f10c7fc7c137Pipeline(model = LinearPerSegmentModel(fit_int...1
1525.59882222.9121854.019.47206011.71998829.03894643.23894913.3577561822.0760841734.654459...4.0-0.328571-0.431429-0.0142860.0200000.1963961.1312008363309e454e72993f86f10c7fc7c137Pipeline(model = LinearPerSegmentModel(fit_int...1
1625.59882222.9121854.019.47206011.71998829.03894643.23894913.3577561822.0760841734.654459...4.0-0.328571-0.431429-0.0142860.0200000.1963961.1312008363309e454e72993f86f10c7fc7c137Pipeline(model = LinearPerSegmentModel(fit_int...1
1725.59882222.9121854.019.47206011.71998829.03894643.23894913.3577561822.0760841734.654459...4.0-0.328571-0.431429-0.0142860.0200000.1963961.1312008363309e454e72993f86f10c7fc7c137Pipeline(model = LinearPerSegmentModel(fit_int...1
1825.28843026.2438694.018.91316814.51827132.61913134.7209758.6940621441.2969011366.199962...4.0-0.685714-0.754286-0.164286-0.1471430.2816681.0712876f595f4f43b323804c04d4cea49c169bPipeline(model = LinearPerSegmentModel(fit_int...1
1925.59882222.9121854.019.47206011.71998829.03894643.23894913.3577561822.0760841734.654459...4.0-0.328571-0.431429-0.0142860.0200000.1963961.1312008363309e454e72993f86f10c7fc7c137Pipeline(model = LinearPerSegmentModel(fit_int...1
\n", "

20 rows × 44 columns

\n", "
" ], "text/plain": [ " MAE_mean MAE_median MAE_notna_size MAE_percentile_25 \\\n", "0 25.667904 26.788472 4.0 20.385243 \n", "1 26.606482 28.272429 4.0 20.554262 \n", "2 26.427670 26.809357 4.0 20.238589 \n", "3 34.586733 35.616831 4.0 29.797233 \n", "4 28.046631 28.279612 4.0 22.153130 \n", "5 31.714578 34.650441 4.0 25.441858 \n", "6 28.401011 30.937439 4.0 22.517391 \n", "7 28.834402 30.782735 4.0 23.540480 \n", "8 27.589468 28.296973 4.0 21.668623 \n", "9 29.720151 28.713470 4.0 23.395341 \n", "10 25.598822 22.912185 4.0 19.472060 \n", "11 25.598822 22.912185 4.0 19.472060 \n", "12 25.598822 22.912185 4.0 19.472060 \n", "13 25.598822 22.912185 4.0 19.472060 \n", "14 25.598822 22.912185 4.0 19.472060 \n", "15 25.598822 22.912185 4.0 19.472060 \n", "16 25.598822 22.912185 4.0 19.472060 \n", "17 25.598822 22.912185 4.0 19.472060 \n", "18 25.288430 26.243869 4.0 18.913168 \n", "19 25.598822 22.912185 4.0 19.472060 \n", "\n", " MAE_percentile_5 MAE_percentile_75 MAE_percentile_95 MAE_std \\\n", "0 15.145294 32.071133 34.621717 8.217865 \n", "1 15.198315 34.324649 35.682322 8.900725 \n", "2 11.884902 32.998438 40.436078 11.702270 \n", "3 16.157674 40.406330 51.573653 14.750408 \n", "4 14.349408 34.173114 41.417681 11.090567 \n", "5 14.662179 40.923160 44.656767 12.690827 \n", "6 15.064969 36.821059 38.186053 10.036415 \n", "7 15.883965 36.076656 39.057173 9.768319 \n", "8 14.025183 34.217818 40.163245 10.745756 \n", "9 14.726153 35.038280 46.123501 12.942420 \n", "10 11.719988 29.038946 43.238949 13.357756 \n", "11 11.719988 29.038946 43.238949 13.357756 \n", "12 11.719988 29.038946 43.238949 13.357756 \n", "13 11.719988 29.038946 43.238949 13.357756 \n", "14 11.719988 29.038946 43.238949 13.357756 \n", "15 11.719988 29.038946 43.238949 13.357756 \n", "16 11.719988 29.038946 43.238949 13.357756 \n", "17 11.719988 29.038946 43.238949 13.357756 \n", "18 14.518271 32.619131 34.720975 8.694062 \n", "19 11.719988 29.038946 43.238949 13.357756 \n", "\n", " MSE_mean 
MSE_median ... Sign_notna_size Sign_percentile_25 \\\n", "0 1527.813269 1356.477027 ... 4.0 -0.621429 \n", "1 1729.295705 1599.030615 ... 4.0 -0.642857 \n", "2 2133.343967 1997.594638 ... 4.0 -0.392857 \n", "3 3185.461093 3324.924980 ... 4.0 -0.100000 \n", "4 1875.958439 2045.887754 ... 4.0 -0.585714 \n", "5 2496.862613 2344.530581 ... 4.0 -0.514286 \n", "6 2160.703619 1565.256313 ... 4.0 -0.621429 \n", "7 1887.330868 1671.590780 ... 4.0 -0.657143 \n", "8 1861.965363 2038.946807 ... 4.0 -0.671429 \n", "9 1975.650004 1963.591771 ... 4.0 -0.657143 \n", "10 1822.076084 1734.654459 ... 4.0 -0.328571 \n", "11 1822.076084 1734.654459 ... 4.0 -0.328571 \n", "12 1822.076084 1734.654459 ... 4.0 -0.328571 \n", "13 1822.076084 1734.654459 ... 4.0 -0.328571 \n", "14 1822.076084 1734.654459 ... 4.0 -0.328571 \n", "15 1822.076084 1734.654459 ... 4.0 -0.328571 \n", "16 1822.076084 1734.654459 ... 4.0 -0.328571 \n", "17 1822.076084 1734.654459 ... 4.0 -0.328571 \n", "18 1441.296901 1366.199962 ... 4.0 -0.685714 \n", "19 1822.076084 1734.654459 ... 
4.0 -0.328571 \n", "\n", " Sign_percentile_5 Sign_percentile_75 Sign_percentile_95 Sign_std \\\n", "0 -0.672857 -0.357143 -0.254286 0.177712 \n", "1 -0.745714 -0.300000 -0.265714 0.209956 \n", "2 -0.581429 -0.078571 -0.061429 0.229017 \n", "3 -0.340000 0.014286 0.048571 0.182946 \n", "4 -0.620000 -0.250000 -0.232857 0.179994 \n", "5 -0.788571 -0.271429 0.037143 0.343749 \n", "6 -0.672857 -0.257143 -0.188571 0.213212 \n", "7 -0.725714 -0.250000 -0.232857 0.225312 \n", "8 -0.705714 -0.242857 -0.208571 0.230350 \n", "9 -0.725714 -0.192857 -0.175714 0.253446 \n", "10 -0.431429 -0.014286 0.020000 0.196396 \n", "11 -0.431429 -0.014286 0.020000 0.196396 \n", "12 -0.431429 -0.014286 0.020000 0.196396 \n", "13 -0.431429 -0.014286 0.020000 0.196396 \n", "14 -0.431429 -0.014286 0.020000 0.196396 \n", "15 -0.431429 -0.014286 0.020000 0.196396 \n", "16 -0.431429 -0.014286 0.020000 0.196396 \n", "17 -0.431429 -0.014286 0.020000 0.196396 \n", "18 -0.754286 -0.164286 -0.147143 0.281668 \n", "19 -0.431429 -0.014286 0.020000 0.196396 \n", "\n", " elapsed_time hash \\\n", "0 1.586009 f4f02e1d5f60b8f322a4a8a622dd1c1e \n", "1 1.700775 3d7b7af16d71a36f3b935f69e113e22d \n", "2 1.096877 7c7932114268832a5458acfecfb453fc \n", "3 0.989273 b7ac5f7fcf9c8959626befe263a9d561 \n", "4 0.785666 e928929f89156d88ef49e28abaf55847 \n", "5 1.141506 3b4311d41fcaab7307235ea23b6d4599 \n", "6 0.849209 74065ebc11c81bed6a9819d026c7cd84 \n", "7 0.883154 b0d0420255c6117045f8254bf8f377a0 \n", "8 0.875966 25dcd8bb095f87a1ffc499fa6a83ef5d \n", "9 1.043901 3f1ca1759261598081fa3bb2f32fe0ac \n", "10 1.131200 8363309e454e72993f86f10c7fc7c137 \n", "11 1.131200 8363309e454e72993f86f10c7fc7c137 \n", "12 1.131200 8363309e454e72993f86f10c7fc7c137 \n", "13 1.131200 8363309e454e72993f86f10c7fc7c137 \n", "14 1.131200 8363309e454e72993f86f10c7fc7c137 \n", "15 1.131200 8363309e454e72993f86f10c7fc7c137 \n", "16 1.131200 8363309e454e72993f86f10c7fc7c137 \n", "17 1.131200 8363309e454e72993f86f10c7fc7c137 \n", "18 1.071287 
6f595f4f43b323804c04d4cea49c169b \n", "19 1.131200 8363309e454e72993f86f10c7fc7c137 \n", "\n", " pipeline state \n", "0 Pipeline(model = LinearPerSegmentModel(fit_int... 1 \n", "1 Pipeline(model = LinearPerSegmentModel(fit_int... 1 \n", "2 Pipeline(model = LinearPerSegmentModel(fit_int... 1 \n", "3 Pipeline(model = LinearPerSegmentModel(fit_int... 1 \n", "4 Pipeline(model = LinearPerSegmentModel(fit_int... 1 \n", "5 Pipeline(model = LinearPerSegmentModel(fit_int... 1 \n", "6 Pipeline(model = LinearPerSegmentModel(fit_int... 1 \n", "7 Pipeline(model = LinearPerSegmentModel(fit_int... 1 \n", "8 Pipeline(model = LinearPerSegmentModel(fit_int... 1 \n", "9 Pipeline(model = LinearPerSegmentModel(fit_int... 1 \n", "10 Pipeline(model = LinearPerSegmentModel(fit_int... 1 \n", "11 Pipeline(model = LinearPerSegmentModel(fit_int... 1 \n", "12 Pipeline(model = LinearPerSegmentModel(fit_int... 1 \n", "13 Pipeline(model = LinearPerSegmentModel(fit_int... 1 \n", "14 Pipeline(model = LinearPerSegmentModel(fit_int... 1 \n", "15 Pipeline(model = LinearPerSegmentModel(fit_int... 1 \n", "16 Pipeline(model = LinearPerSegmentModel(fit_int... 1 \n", "17 Pipeline(model = LinearPerSegmentModel(fit_int... 1 \n", "18 Pipeline(model = LinearPerSegmentModel(fit_int... 1 \n", "19 Pipeline(model = LinearPerSegmentModel(fit_int... 1 \n", "\n", "[20 rows x 44 columns]" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "tune.summary()" ] }, { "cell_type": "markdown", "id": "a95dc86a", "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false }, "pycharm": { "name": "#%% md\n" } }, "source": [ "Let's show only the columns we are interested in." ] }, { "cell_type": "code", "execution_count": 22, "id": "88427207", "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false }, "pycharm": { "name": "#%%\n" } }, "outputs": [ { "data": { "text/html": [ "
<div>\n", "<table>\n", "<thead>\n", "<tr><th></th><th>hash</th><th>pipeline</th><th>SMAPE_mean</th><th>elapsed_time</th><th>state</th></tr>\n", "</thead>\n", "<tbody>\n",
"<tr><th>19</th><td>8363309e454e72993f86f10c7fc7c137</td><td>Pipeline(model = LinearPerSegmentModel(fit_int...</td><td>8.556535</td><td>1.131200</td><td>1</td></tr>\n",
"<tr><th>17</th><td>8363309e454e72993f86f10c7fc7c137</td><td>Pipeline(model = LinearPerSegmentModel(fit_int...</td><td>8.556535</td><td>1.131200</td><td>1</td></tr>\n",
"<tr><th>16</th><td>8363309e454e72993f86f10c7fc7c137</td><td>Pipeline(model = LinearPerSegmentModel(fit_int...</td><td>8.556535</td><td>1.131200</td><td>1</td></tr>\n",
"<tr><th>15</th><td>8363309e454e72993f86f10c7fc7c137</td><td>Pipeline(model = LinearPerSegmentModel(fit_int...</td><td>8.556535</td><td>1.131200</td><td>1</td></tr>\n",
"<tr><th>14</th><td>8363309e454e72993f86f10c7fc7c137</td><td>Pipeline(model = LinearPerSegmentModel(fit_int...</td><td>8.556535</td><td>1.131200</td><td>1</td></tr>\n",
"<tr><th>13</th><td>8363309e454e72993f86f10c7fc7c137</td><td>Pipeline(model = LinearPerSegmentModel(fit_int...</td><td>8.556535</td><td>1.131200</td><td>1</td></tr>\n",
"<tr><th>12</th><td>8363309e454e72993f86f10c7fc7c137</td><td>Pipeline(model = LinearPerSegmentModel(fit_int...</td><td>8.556535</td><td>1.131200</td><td>1</td></tr>\n",
"<tr><th>10</th><td>8363309e454e72993f86f10c7fc7c137</td><td>Pipeline(model = LinearPerSegmentModel(fit_int...</td><td>8.556535</td><td>1.131200</td><td>1</td></tr>\n",
"<tr><th>11</th><td>8363309e454e72993f86f10c7fc7c137</td><td>Pipeline(model = LinearPerSegmentModel(fit_int...</td><td>8.556535</td><td>1.131200</td><td>1</td></tr>\n",
"<tr><th>2</th><td>7c7932114268832a5458acfecfb453fc</td><td>Pipeline(model = LinearPerSegmentModel(fit_int...</td><td>9.210183</td><td>1.096877</td><td>1</td></tr>\n",
"<tr><th>8</th><td>25dcd8bb095f87a1ffc499fa6a83ef5d</td><td>Pipeline(model = LinearPerSegmentModel(fit_int...</td><td>9.943658</td><td>0.875966</td><td>1</td></tr>\n",
"<tr><th>4</th><td>e928929f89156d88ef49e28abaf55847</td><td>Pipeline(model = LinearPerSegmentModel(fit_int...</td><td>9.946866</td><td>0.785666</td><td>1</td></tr>\n",
"<tr><th>0</th><td>f4f02e1d5f60b8f322a4a8a622dd1c1e</td><td>Pipeline(model = LinearPerSegmentModel(fit_int...</td><td>9.957781</td><td>1.586009</td><td>1</td></tr>\n",
"<tr><th>18</th><td>6f595f4f43b323804c04d4cea49c169b</td><td>Pipeline(model = LinearPerSegmentModel(fit_int...</td><td>10.061742</td><td>1.071287</td><td>1</td></tr>\n",
"<tr><th>1</th><td>3d7b7af16d71a36f3b935f69e113e22d</td><td>Pipeline(model = LinearPerSegmentModel(fit_int...</td><td>10.306909</td><td>1.700775</td><td>1</td></tr>\n",
"<tr><th>9</th><td>3f1ca1759261598081fa3bb2f32fe0ac</td><td>Pipeline(model = LinearPerSegmentModel(fit_int...</td><td>10.554444</td><td>1.043901</td><td>1</td></tr>\n",
"<tr><th>5</th><td>3b4311d41fcaab7307235ea23b6d4599</td><td>Pipeline(model = LinearPerSegmentModel(fit_int...</td><td>10.756703</td><td>1.141506</td><td>1</td></tr>\n",
"<tr><th>6</th><td>74065ebc11c81bed6a9819d026c7cd84</td><td>Pipeline(model = LinearPerSegmentModel(fit_int...</td><td>10.917164</td><td>0.849209</td><td>1</td></tr>\n",
"<tr><th>3</th><td>b7ac5f7fcf9c8959626befe263a9d561</td><td>Pipeline(model = LinearPerSegmentModel(fit_int...</td><td>11.255320</td><td>0.989273</td><td>1</td></tr>\n",
"<tr><th>7</th><td>b0d0420255c6117045f8254bf8f377a0</td><td>Pipeline(model = LinearPerSegmentModel(fit_int...</td><td>11.478760</td><td>0.883154</td><td>1</td></tr>\n",
"</tbody>\n", "</table>\n", "</div>
" ], "text/plain": [ " hash \\\n", "19 8363309e454e72993f86f10c7fc7c137 \n", "17 8363309e454e72993f86f10c7fc7c137 \n", "16 8363309e454e72993f86f10c7fc7c137 \n", "15 8363309e454e72993f86f10c7fc7c137 \n", "14 8363309e454e72993f86f10c7fc7c137 \n", "13 8363309e454e72993f86f10c7fc7c137 \n", "12 8363309e454e72993f86f10c7fc7c137 \n", "10 8363309e454e72993f86f10c7fc7c137 \n", "11 8363309e454e72993f86f10c7fc7c137 \n", "2 7c7932114268832a5458acfecfb453fc \n", "8 25dcd8bb095f87a1ffc499fa6a83ef5d \n", "4 e928929f89156d88ef49e28abaf55847 \n", "0 f4f02e1d5f60b8f322a4a8a622dd1c1e \n", "18 6f595f4f43b323804c04d4cea49c169b \n", "1 3d7b7af16d71a36f3b935f69e113e22d \n", "9 3f1ca1759261598081fa3bb2f32fe0ac \n", "5 3b4311d41fcaab7307235ea23b6d4599 \n", "6 74065ebc11c81bed6a9819d026c7cd84 \n", "3 b7ac5f7fcf9c8959626befe263a9d561 \n", "7 b0d0420255c6117045f8254bf8f377a0 \n", "\n", " pipeline SMAPE_mean \\\n", "19 Pipeline(model = LinearPerSegmentModel(fit_int... 8.556535 \n", "17 Pipeline(model = LinearPerSegmentModel(fit_int... 8.556535 \n", "16 Pipeline(model = LinearPerSegmentModel(fit_int... 8.556535 \n", "15 Pipeline(model = LinearPerSegmentModel(fit_int... 8.556535 \n", "14 Pipeline(model = LinearPerSegmentModel(fit_int... 8.556535 \n", "13 Pipeline(model = LinearPerSegmentModel(fit_int... 8.556535 \n", "12 Pipeline(model = LinearPerSegmentModel(fit_int... 8.556535 \n", "10 Pipeline(model = LinearPerSegmentModel(fit_int... 8.556535 \n", "11 Pipeline(model = LinearPerSegmentModel(fit_int... 8.556535 \n", "2 Pipeline(model = LinearPerSegmentModel(fit_int... 9.210183 \n", "8 Pipeline(model = LinearPerSegmentModel(fit_int... 9.943658 \n", "4 Pipeline(model = LinearPerSegmentModel(fit_int... 9.946866 \n", "0 Pipeline(model = LinearPerSegmentModel(fit_int... 9.957781 \n", "18 Pipeline(model = LinearPerSegmentModel(fit_int... 10.061742 \n", "1 Pipeline(model = LinearPerSegmentModel(fit_int... 10.306909 \n", "9 Pipeline(model = LinearPerSegmentModel(fit_int... 
10.554444 \n", "5 Pipeline(model = LinearPerSegmentModel(fit_int... 10.756703 \n", "6 Pipeline(model = LinearPerSegmentModel(fit_int... 10.917164 \n", "3 Pipeline(model = LinearPerSegmentModel(fit_int... 11.255320 \n", "7 Pipeline(model = LinearPerSegmentModel(fit_int... 11.478760 \n", "\n", " elapsed_time state \n", "19 1.131200 1 \n", "17 1.131200 1 \n", "16 1.131200 1 \n", "15 1.131200 1 \n", "14 1.131200 1 \n", "13 1.131200 1 \n", "12 1.131200 1 \n", "10 1.131200 1 \n", "11 1.131200 1 \n", "2 1.096877 1 \n", "8 0.875966 1 \n", "4 0.785666 1 \n", "0 1.586009 1 \n", "18 1.071287 1 \n", "1 1.700775 1 \n", "9 1.043901 1 \n", "5 1.141506 1 \n", "6 0.849209 1 \n", "3 0.989273 1 \n", "7 0.883154 1 " ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "tune.summary()[[\"hash\", \"pipeline\", \"SMAPE_mean\", \"elapsed_time\", \"state\"]].sort_values(\"SMAPE_mean\")" ] }, { "cell_type": "markdown", "id": "08736c75", "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false }, "pycharm": { "name": "#%% md\n" } }, "source": [ "As we can see, the summary contains duplicate rows according to the `hash` column. Some trials sampled the same hyperparameters and therefore have the same results. Such duplicates get special handling: they are skipped during optimization, and the previously computed metric values are returned.\n", "\n", "Duplicates in the summary can be eliminated using the `hash` column." ] }, { "cell_type": "code", "execution_count": 23, "id": "6433880c", "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false }, "pycharm": { "name": "#%%\n" } }, "outputs": [ { "data": { "text/html": [ "
<div>\n", "<table>\n", "<thead>\n", "<tr><th></th><th>hash</th><th>pipeline</th><th>SMAPE_mean</th><th>elapsed_time</th><th>state</th></tr>\n", "</thead>\n", "<tbody>\n",
"<tr><th>19</th><td>8363309e454e72993f86f10c7fc7c137</td><td>Pipeline(model = LinearPerSegmentModel(fit_int...</td><td>8.556535</td><td>1.131200</td><td>1</td></tr>\n",
"<tr><th>2</th><td>7c7932114268832a5458acfecfb453fc</td><td>Pipeline(model = LinearPerSegmentModel(fit_int...</td><td>9.210183</td><td>1.096877</td><td>1</td></tr>\n",
"<tr><th>8</th><td>25dcd8bb095f87a1ffc499fa6a83ef5d</td><td>Pipeline(model = LinearPerSegmentModel(fit_int...</td><td>9.943658</td><td>0.875966</td><td>1</td></tr>\n",
"<tr><th>4</th><td>e928929f89156d88ef49e28abaf55847</td><td>Pipeline(model = LinearPerSegmentModel(fit_int...</td><td>9.946866</td><td>0.785666</td><td>1</td></tr>\n",
"<tr><th>0</th><td>f4f02e1d5f60b8f322a4a8a622dd1c1e</td><td>Pipeline(model = LinearPerSegmentModel(fit_int...</td><td>9.957781</td><td>1.586009</td><td>1</td></tr>\n",
"<tr><th>18</th><td>6f595f4f43b323804c04d4cea49c169b</td><td>Pipeline(model = LinearPerSegmentModel(fit_int...</td><td>10.061742</td><td>1.071287</td><td>1</td></tr>\n",
"<tr><th>1</th><td>3d7b7af16d71a36f3b935f69e113e22d</td><td>Pipeline(model = LinearPerSegmentModel(fit_int...</td><td>10.306909</td><td>1.700775</td><td>1</td></tr>\n",
"<tr><th>9</th><td>3f1ca1759261598081fa3bb2f32fe0ac</td><td>Pipeline(model = LinearPerSegmentModel(fit_int...</td><td>10.554444</td><td>1.043901</td><td>1</td></tr>\n",
"<tr><th>5</th><td>3b4311d41fcaab7307235ea23b6d4599</td><td>Pipeline(model = LinearPerSegmentModel(fit_int...</td><td>10.756703</td><td>1.141506</td><td>1</td></tr>\n",
"<tr><th>6</th><td>74065ebc11c81bed6a9819d026c7cd84</td><td>Pipeline(model = LinearPerSegmentModel(fit_int...</td><td>10.917164</td><td>0.849209</td><td>1</td></tr>\n",
"<tr><th>3</th><td>b7ac5f7fcf9c8959626befe263a9d561</td><td>Pipeline(model = LinearPerSegmentModel(fit_int...</td><td>11.255320</td><td>0.989273</td><td>1</td></tr>\n",
"<tr><th>7</th><td>b0d0420255c6117045f8254bf8f377a0</td><td>Pipeline(model = LinearPerSegmentModel(fit_int...</td><td>11.478760</td><td>0.883154</td><td>1</td></tr>\n",
"</tbody>\n", "</table>\n", "</div>
" ], "text/plain": [ " hash \\\n", "19 8363309e454e72993f86f10c7fc7c137 \n", "2 7c7932114268832a5458acfecfb453fc \n", "8 25dcd8bb095f87a1ffc499fa6a83ef5d \n", "4 e928929f89156d88ef49e28abaf55847 \n", "0 f4f02e1d5f60b8f322a4a8a622dd1c1e \n", "18 6f595f4f43b323804c04d4cea49c169b \n", "1 3d7b7af16d71a36f3b935f69e113e22d \n", "9 3f1ca1759261598081fa3bb2f32fe0ac \n", "5 3b4311d41fcaab7307235ea23b6d4599 \n", "6 74065ebc11c81bed6a9819d026c7cd84 \n", "3 b7ac5f7fcf9c8959626befe263a9d561 \n", "7 b0d0420255c6117045f8254bf8f377a0 \n", "\n", " pipeline SMAPE_mean \\\n", "19 Pipeline(model = LinearPerSegmentModel(fit_int... 8.556535 \n", "2 Pipeline(model = LinearPerSegmentModel(fit_int... 9.210183 \n", "8 Pipeline(model = LinearPerSegmentModel(fit_int... 9.943658 \n", "4 Pipeline(model = LinearPerSegmentModel(fit_int... 9.946866 \n", "0 Pipeline(model = LinearPerSegmentModel(fit_int... 9.957781 \n", "18 Pipeline(model = LinearPerSegmentModel(fit_int... 10.061742 \n", "1 Pipeline(model = LinearPerSegmentModel(fit_int... 10.306909 \n", "9 Pipeline(model = LinearPerSegmentModel(fit_int... 10.554444 \n", "5 Pipeline(model = LinearPerSegmentModel(fit_int... 10.756703 \n", "6 Pipeline(model = LinearPerSegmentModel(fit_int... 10.917164 \n", "3 Pipeline(model = LinearPerSegmentModel(fit_int... 11.255320 \n", "7 Pipeline(model = LinearPerSegmentModel(fit_int... 
11.478760 \n", "\n", " elapsed_time state \n", "19 1.131200 1 \n", "2 1.096877 1 \n", "8 0.875966 1 \n", "4 0.785666 1 \n", "0 1.586009 1 \n", "18 1.071287 1 \n", "1 1.700775 1 \n", "9 1.043901 1 \n", "5 1.141506 1 \n", "6 0.849209 1 \n", "3 0.989273 1 \n", "7 0.883154 1 " ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "tune.summary()[[\"hash\", \"pipeline\", \"SMAPE_mean\", \"elapsed_time\", \"state\"]].sort_values(\"SMAPE_mean\").drop_duplicates(\n", " subset=\"hash\"\n", ")" ] }, { "cell_type": "markdown", "id": "19c46c87", "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false }, "pycharm": { "name": "#%% md\n" } }, "source": [ "The second method, `top_k`, is useful when you want to inspect the best pipelines tried, without duplicates." ] }, { "cell_type": "code", "execution_count": 24, "id": "2f77e3d5", "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false }, "pycharm": { "name": "#%%\n" } }, "outputs": [], "source": [ "top_3_pipelines = tune.top_k(k=3)" ] }, { "cell_type": "code", "execution_count": 25, "id": "7e39db49", "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false }, "pycharm": { "name": "#%%\n" } }, "outputs": [ { "data": { "text/plain": [ "[Pipeline(model = LinearPerSegmentModel(fit_intercept = True, ), transforms = [LagTransform(in_column = 'target', lags = [14, 15, 16, 17, 18, 19, 20, 21, 22, 23], out_column = 'target_lag', ), DateFlagsTransform(day_number_in_week = False, day_number_in_month = True, day_number_in_year = False, week_number_in_month = True, week_number_in_year = False, month_number_in_year = False, season_number = False, year_number = False, is_weekend = True, special_days_in_week = (), special_days_in_month = (), out_column = 'date_flags', in_column = None, )], horizon = 14, ),\n", " Pipeline(model = LinearPerSegmentModel(fit_intercept = True, ), transforms = [LagTransform(in_column = 'target', lags = [14, 15, 16, 17, 18, 19, 20, 21, 22, 
23], out_column = 'target_lag', ), DateFlagsTransform(day_number_in_week = False, day_number_in_month = True, day_number_in_year = False, week_number_in_month = True, week_number_in_year = False, month_number_in_year = False, season_number = False, year_number = False, is_weekend = False, special_days_in_week = (), special_days_in_month = (), out_column = 'date_flags', in_column = None, )], horizon = 14, ),\n", " Pipeline(model = LinearPerSegmentModel(fit_intercept = False, ), transforms = [LagTransform(in_column = 'target', lags = [14, 15, 16, 17, 18, 19, 20, 21, 22, 23], out_column = 'target_lag', ), DateFlagsTransform(day_number_in_week = True, day_number_in_month = False, day_number_in_year = True, week_number_in_month = False, week_number_in_year = False, month_number_in_year = False, season_number = False, year_number = True, is_weekend = False, special_days_in_week = (), special_days_in_month = (), out_column = 'date_flags', in_column = None, )], horizon = 14, )]" ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" } ], "source": [ "top_3_pipelines" ] }, { "cell_type": "markdown", "id": "69161b23", "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false }, "pycharm": { "name": "#%% md\n" } }, "source": [ "## 2. General AutoML " ] }, { "cell_type": "markdown", "id": "07081c2c", "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false }, "pycharm": { "name": "#%% md\n" } }, "source": [ "Hyperparameter tuning is useful, but it can be too narrow. In this section we turn to the general AutoML pipeline.\n", "ETNA provides the `etna.auto.Auto` class for automatic pipeline selection. It is useful for quickly creating a good baseline for your forecasting task." 
] }, { "cell_type": "markdown", "id": "5dfe064b", "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false }, "pycharm": { "name": "#%% md\n" } }, "source": [ "### 2.1 How `Auto` works " ] }, { "cell_type": "markdown", "id": "ad5c1149", "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false }, "pycharm": { "name": "#%% md\n" } }, "source": [ "`Auto` takes init parameters similar to those of `Tune`, but instead of a single `pipeline` it works with a `pool`. A pool is, in general, just a list of pipelines.\n", "\n", "During `fit` there are two stages:\n", "\n", "- pool stage,\n", "- tuning stage.\n", "\n", "The pool stage checks every pipeline in the given `pool`. For each pipeline we run a backtest and compute `target_metric`. The results are saved in an optuna study.\n", "\n", "The tuning stage takes the `tune_size` best pipelines according to the results of the pool stage and runs `Tune` with the default `params_to_tune` on them sequentially, from best to worst.\n", "\n", "The limits `n_trials` and `timeout` are shared between the pool and tuning stages. First, we run the pool stage with the given `n_trials` and `timeout`. After that, the remaining budget is divided equally among the `tune_size` tuning steps." ] }, { "cell_type": "markdown", "id": "e339e19e", "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false }, "pycharm": { "name": "#%% md\n" } }, "source": [ "### 2.2 Example " ] }, { "cell_type": "markdown", "id": "0d8d046e", "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false }, "pycharm": { "name": "#%% md\n" } }, "source": [ "We will move straight to the example." ] }, { "cell_type": "markdown", "id": "d1e4202e", "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false }, "pycharm": { "name": "#%% md\n" } }, "source": [ "We have prepared pools for different frequencies and with different running times. The frequency of our dataset is `D`, so we can choose the `D_super_fast` pool. 
See the docs of the `Pool` class to learn how to choose a prepared pool correctly." ] }, { "cell_type": "code", "execution_count": 26, "id": "dc2c6c0f", "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false }, "pycharm": { "name": "#%%\n" } }, "outputs": [], "source": [ "from etna.auto import Auto\n", "from etna.auto import Pool\n", "\n", "auto = Auto(\n", "    target_metric=SMAPE(), horizon=HORIZON, pool=Pool.D_super_fast, generate_params={}, backtest_params=dict(n_folds=1)\n", ")" ] }, { "cell_type": "markdown", "id": "b5297e51", "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false }, "pycharm": { "name": "#%% md\n" } }, "source": [ "Let's have a closer look at `generate_params`. It sets dynamic parameters for the pipelines. The default pools contain the `timestamp_column`, `chronos_device` and `timesfm_device` parameters.\n", "\n", "The `timestamp_column` parameter appears in some models and transforms (e.g. `statsforecast` models or `DateFlagsTransform`) and defines the column with timestamps. We use such columns in misaligned datasets (see the `307-working_with_misaligned_data` notebook).\n", "\n", "- If you have a regular `TSDataset` with `freq` != `None`, set `generate_params={}` or `generate_params=None`\n", "- If you have a `TSDataset` with an int index and an exogenous column (e.g. `external_timestamp`) with timestamps, set `generate_params={\"timestamp_column\": \"external_timestamp\"}`\n", "\n", "The `chronos_device` and `timesfm_device` parameters specify the `device` parameter of the corresponding pretrained model. By default, `Chronos`-like models use `device=\"auto\"` and `TimesFMModel` uses `device=\"gpu\"`. 
Because all models are initialized together, you can reduce the GPU workload by changing the device for one model type, e.g. by setting `generate_params={\"timesfm_device\": \"cpu\"}`." ] }, { "cell_type": "markdown", "id": "ea98123c", "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false }, "pycharm": { "name": "#%% md\n" } }, "source": [ "After initializing `Auto`, you can inspect the pipelines that will be fitted." ] }, { "cell_type": "code", "execution_count": 27, "id": "89ab5f4a", "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false }, "pycharm": { "name": "#%%\n" } }, "outputs": [ { "data": { "text/plain": [ "[{'model': {'lag': 1, '_target_': 'etna.models.naive.NaiveModel'},\n", " 'transforms': [],\n", " 'horizon': 14,\n", " '_target_': 'etna.pipeline.pipeline.Pipeline'},\n", " {'model': {'lag': 7, '_target_': 'etna.models.naive.NaiveModel'},\n", " 'transforms': [],\n", " 'horizon': 14,\n", " '_target_': 'etna.pipeline.pipeline.Pipeline'},\n", " {'model': {'lag': 14, '_target_': 'etna.models.naive.NaiveModel'},\n", " 'transforms': [],\n", " 'horizon': 14,\n", " '_target_': 'etna.pipeline.pipeline.Pipeline'},\n", " {'model': {'window': 3,\n", " 'seasonality': 7,\n", " '_target_': 'etna.models.seasonal_ma.SeasonalMovingAverageModel'},\n", " 'transforms': [{'in_column': 'target',\n", " 'strategy': 'forward_fill',\n", " 'window': -1,\n", " 'seasonality': 1,\n", " 'constant_value': 0,\n", " '_target_': 'etna.transforms.missing_values.imputation.TimeSeriesImputerTransform'},\n", " {'in_column': 'target',\n", " 'window_size': 21,\n", " 'distance_coef': 3,\n", " 'n_neighbors': 3,\n", " 'distance_func': 'absolute_difference',\n", " '_target_': 'etna.transforms.outliers.point_outliers.DensityOutliersTransform'},\n", " {'in_column': 'target',\n", " 'strategy': 'mean',\n", " 'window': -1,\n", " 'seasonality': 1,\n", " 'constant_value': 0,\n", " '_target_': 'etna.transforms.missing_values.imputation.TimeSeriesImputerTransform'}],\n", " 'horizon': 14,\n", " 
'_target_': 'etna.pipeline.pipeline.Pipeline'},\n", " {'model': {'window': 3,\n", " 'seasonality': ,\n", " '_target_': 'etna.models.deadline_ma.DeadlineMovingAverageModel'},\n", " 'transforms': [{'in_column': 'target',\n", " 'strategy': 'forward_fill',\n", " 'window': -1,\n", " 'seasonality': 1,\n", " 'constant_value': 0,\n", " '_target_': 'etna.transforms.missing_values.imputation.TimeSeriesImputerTransform'},\n", " {'in_column': 'target',\n", " 'window_size': 21,\n", " 'distance_coef': 3,\n", " 'n_neighbors': 3,\n", " 'distance_func': 'absolute_difference',\n", " '_target_': 'etna.transforms.outliers.point_outliers.DensityOutliersTransform'},\n", " {'in_column': 'target',\n", " 'strategy': 'mean',\n", " 'window': -1,\n", " 'seasonality': 1,\n", " 'constant_value': 0,\n", " '_target_': 'etna.transforms.missing_values.imputation.TimeSeriesImputerTransform'}],\n", " 'horizon': 14,\n", " '_target_': 'etna.pipeline.pipeline.Pipeline'},\n", " {'pipelines': [{'model': {'window': 1,\n", " 'seasonality': 14,\n", " '_target_': 'etna.models.seasonal_ma.SeasonalMovingAverageModel'},\n", " 'transforms': [],\n", " 'horizon': 14,\n", " '_target_': 'etna.pipeline.pipeline.Pipeline'},\n", " {'model': {'window': 2,\n", " 'seasonality': 7,\n", " '_target_': 'etna.models.seasonal_ma.SeasonalMovingAverageModel'},\n", " 'transforms': [],\n", " 'horizon': 14,\n", " '_target_': 'etna.pipeline.pipeline.Pipeline'},\n", " {'model': {'window': 7,\n", " 'seasonality': 7,\n", " '_target_': 'etna.models.seasonal_ma.SeasonalMovingAverageModel'},\n", " 'transforms': [],\n", " 'horizon': 14,\n", " '_target_': 'etna.pipeline.pipeline.Pipeline'}],\n", " 'regressor': {'_target_': 'sklearn.ensemble._forest.RandomForestRegressor',\n", " 'bootstrap': True,\n", " 'ccp_alpha': 0.0,\n", " 'criterion': 'squared_error',\n", " 'max_depth': None,\n", " 'max_features': 1.0,\n", " 'max_leaf_nodes': None,\n", " 'max_samples': None,\n", " 'min_impurity_decrease': 0.0,\n", " 'min_samples_leaf': 1,\n", " 
'min_samples_split': 2,\n", " 'min_weight_fraction_leaf': 0.0,\n", " 'monotonic_cst': None,\n", " 'n_estimators': 5,\n", " 'n_jobs': None,\n", " 'oob_score': False,\n", " 'random_state': None,\n", " 'verbose': 0,\n", " 'warm_start': False},\n", " 'n_folds': 3,\n", " 'n_jobs': 1,\n", " 'joblib_params': {'verbose': 11,\n", " 'backend': 'multiprocessing',\n", " 'mmap_mode': 'c'},\n", " '_target_': 'etna.ensembles.voting_ensemble.VotingEnsemble'},\n", " {'model': {'path_or_url': 'http://etna-github-prod.cdn-tinkoff.ru/chronos/chronos-bolt-tiny.zip',\n", " 'encoder_length': 2048,\n", " 'device': 'auto',\n", " 'dtype': 'float32',\n", " 'limit_prediction_length': False,\n", " 'batch_size': 128,\n", " 'cache_dir': '/Users/e.a.baturin/.etna/chronos-models/chronos-bolt',\n", " '_target_': 'etna.models.nn.chronos.chronos_bolt.ChronosBoltModel'},\n", " 'transforms': [{'in_column': 'target',\n", " 'strategy': 'mean',\n", " 'window': -1,\n", " 'seasonality': 1,\n", " 'constant_value': 0,\n", " '_target_': 'etna.transforms.missing_values.imputation.TimeSeriesImputerTransform'}],\n", " 'horizon': 14,\n", " '_target_': 'etna.pipeline.pipeline.Pipeline'},\n", " {'model': {'path_or_url': 'http://etna-github-prod.cdn-tinkoff.ru/chronos/chronos-bolt-mini.zip',\n", " 'encoder_length': 2048,\n", " 'device': 'auto',\n", " 'dtype': 'float32',\n", " 'limit_prediction_length': False,\n", " 'batch_size': 128,\n", " 'cache_dir': '/Users/e.a.baturin/.etna/chronos-models/chronos-bolt',\n", " '_target_': 'etna.models.nn.chronos.chronos_bolt.ChronosBoltModel'},\n", " 'transforms': [{'in_column': 'target',\n", " 'strategy': 'mean',\n", " 'window': -1,\n", " 'seasonality': 1,\n", " 'constant_value': 0,\n", " '_target_': 'etna.transforms.missing_values.imputation.TimeSeriesImputerTransform'}],\n", " 'horizon': 14,\n", " '_target_': 'etna.pipeline.pipeline.Pipeline'},\n", " {'model': {'path_or_url': 'http://etna-github-prod.cdn-tinkoff.ru/chronos/chronos-bolt-small.zip',\n", " 'encoder_length': 
2048,\n", " 'device': 'auto',\n", " 'dtype': 'float32',\n", " 'limit_prediction_length': False,\n", " 'batch_size': 128,\n", " 'cache_dir': '/Users/e.a.baturin/.etna/chronos-models/chronos-bolt',\n", " '_target_': 'etna.models.nn.chronos.chronos_bolt.ChronosBoltModel'},\n", " 'transforms': [{'in_column': 'target',\n", " 'strategy': 'mean',\n", " 'window': -1,\n", " 'seasonality': 1,\n", " 'constant_value': 0,\n", " '_target_': 'etna.transforms.missing_values.imputation.TimeSeriesImputerTransform'}],\n", " 'horizon': 14,\n", " '_target_': 'etna.pipeline.pipeline.Pipeline'}]" ] }, "execution_count": 27, "metadata": {}, "output_type": "execute_result" } ], "source": [ "auto.get_configs()" ] }, { "cell_type": "markdown", "id": "ae89f2e6", "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false }, "pycharm": { "name": "#%% md\n" } }, "source": [ "Let's start fitting. We can begin by running only the pool stage, which is what passing `tune_size=0` does here." ] }, { "cell_type": "code", "execution_count": 28, "id": "f2214a9e", "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false }, "pycharm": { "name": "#%%\n" } }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Passing a tuple of `past_key_values` is deprecated and will be removed in Transformers v4.48.0. You should pass an instance of `EncoderDecoderCache` instead, e.g. `past_key_values=EncoderDecoderCache.from_legacy_cache(past_key_values)`.\n" ] } ], "source": [ "%%capture\n", "best_pool_pipeline = auto.fit(ts=ts, tune_size=0)" ] }, { "cell_type": "code", "execution_count": 29, "id": "1fd516e0", "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false }, "pycharm": { "name": "#%%\n" } }, "outputs": [ { "data": { "text/html": [ "
<div>\n", "<table>\n", "<thead>\n", "<tr><th></th><th>hash</th><th>pipeline</th><th>SMAPE_mean</th><th>elapsed_time</th><th>state</th><th>study</th></tr>\n", "</thead>\n", "<tbody>\n",
"<tr><th>8</th><td>5545ff3ef3cb3b2fc879b100afd9ca30</td><td>Pipeline(model = ChronosBoltModel(path_or_url ...</td><td>7.897397</td><td>0.446171</td><td>1</td><td>pool</td></tr>\n",
"<tr><th>3</th><td>d8215d95e2c6c9a4b4fdacf3fa77dddc</td><td>Pipeline(model = NaiveModel(lag = 7, ), transf...</td><td>8.266340</td><td>0.035717</td><td>1</td><td>pool</td></tr>\n",
"<tr><th>6</th><td>8b532534025945b07dedac76ad1e6207</td><td>Pipeline(model = ChronosBoltModel(path_or_url ...</td><td>8.512619</td><td>0.419310</td><td>1</td><td>pool</td></tr>\n",
"<tr><th>0</th><td>07fbe9c7aeaf9f56701630c5897db4f2</td><td>Pipeline(model = SeasonalMovingAverageModel(wi...</td><td>9.134800</td><td>0.120682</td><td>1</td><td>pool</td></tr>\n",
"<tr><th>7</th><td>add1079869e9d79f001e8c2df6c64279</td><td>VotingEnsemble(pipelines = [Pipeline(model = S...</td><td>9.351593</td><td>0.075331</td><td>1</td><td>pool</td></tr>\n",
"<tr><th>2</th><td>7899a017183d4dc042a70d5b38e7ed96</td><td>Pipeline(model = ChronosBoltModel(path_or_url ...</td><td>9.369298</td><td>1.402410</td><td>1</td><td>pool</td></tr>\n",
"<tr><th>1</th><td>8b47488c737903536374d375f7c66f40</td><td>Pipeline(model = NaiveModel(lag = 14, ), trans...</td><td>9.580548</td><td>0.046769</td><td>1</td><td>pool</td></tr>\n",
"<tr><th>4</th><td>de05e6c98895bc7d9c54fefe8dedea7b</td><td>Pipeline(model = DeadlineMovingAverageModel(wi...</td><td>14.151098</td><td>0.142992</td><td>1</td><td>pool</td></tr>\n",
"<tr><th>5</th><td>53e90ae4cf7f1f71e6396107549c25ef</td><td>Pipeline(model = NaiveModel(lag = 1, ), transf...</td><td>22.155640</td><td>0.033679</td><td>1</td><td>pool</td></tr>\n",
"</tbody>\n", "</table>\n", "</div>
" ], "text/plain": [ " hash \\\n", "8 5545ff3ef3cb3b2fc879b100afd9ca30 \n", "3 d8215d95e2c6c9a4b4fdacf3fa77dddc \n", "6 8b532534025945b07dedac76ad1e6207 \n", "0 07fbe9c7aeaf9f56701630c5897db4f2 \n", "7 add1079869e9d79f001e8c2df6c64279 \n", "2 7899a017183d4dc042a70d5b38e7ed96 \n", "1 8b47488c737903536374d375f7c66f40 \n", "4 de05e6c98895bc7d9c54fefe8dedea7b \n", "5 53e90ae4cf7f1f71e6396107549c25ef \n", "\n", " pipeline SMAPE_mean \\\n", "8 Pipeline(model = ChronosBoltModel(path_or_url ... 7.897397 \n", "3 Pipeline(model = NaiveModel(lag = 7, ), transf... 8.266340 \n", "6 Pipeline(model = ChronosBoltModel(path_or_url ... 8.512619 \n", "0 Pipeline(model = SeasonalMovingAverageModel(wi... 9.134800 \n", "7 VotingEnsemble(pipelines = [Pipeline(model = S... 9.351593 \n", "2 Pipeline(model = ChronosBoltModel(path_or_url ... 9.369298 \n", "1 Pipeline(model = NaiveModel(lag = 14, ), trans... 9.580548 \n", "4 Pipeline(model = DeadlineMovingAverageModel(wi... 14.151098 \n", "5 Pipeline(model = NaiveModel(lag = 1, ), transf... 
22.155640 \n", "\n", " elapsed_time state study \n", "8 0.446171 1 pool \n", "3 0.035717 1 pool \n", "6 0.419310 1 pool \n", "0 0.120682 1 pool \n", "7 0.075331 1 pool \n", "2 1.402410 1 pool \n", "1 0.046769 1 pool \n", "4 0.142992 1 pool \n", "5 0.033679 1 pool " ] }, "execution_count": 29, "metadata": {}, "output_type": "execute_result" } ], "source": [ "auto.summary()[[\"hash\", \"pipeline\", \"SMAPE_mean\", \"elapsed_time\", \"state\", \"study\"]].sort_values(\"SMAPE_mean\")" ] }, { "cell_type": "code", "execution_count": 30, "id": "5fadbb8b", "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false }, "pycharm": { "name": "#%%\n" } }, "outputs": [], "source": [ "%%capture\n", "best_pool_metrics = best_pool_pipeline.backtest(ts=full_ts, metrics=[SMAPE()], n_folds=5)[\"metrics\"]" ] }, { "cell_type": "code", "execution_count": 31, "id": "161471e0", "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false }, "pycharm": { "name": "#%%\n" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Best pool SMAPE: 5.691\n" ] } ], "source": [ "best_pool_smape = best_pool_metrics[\"SMAPE\"].mean()\n", "print(f\"Best pool SMAPE: {best_pool_smape:.3f}\")" ] }, { "cell_type": "markdown", "id": "4496dea3", "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false }, "pycharm": { "name": "#%% md\n" } }, "source": [ "### 2.3 Using custom pipeline pool " ] }, { "cell_type": "markdown", "id": "4fa5bd32", "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false }, "pycharm": { "name": "#%% md\n" } }, "source": [ "We can define our own set of pipelines for the search." 
] }, { "cell_type": "code", "execution_count": 32, "id": "9c7a153c", "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false }, "pycharm": { "name": "#%%\n" } }, "outputs": [], "source": [ "custom_pool = [\n", " Pipeline(model=NaiveModel(lag=1), transforms=(), horizon=HORIZON),\n", " Pipeline(\n", " model=LinearPerSegmentModel(),\n", " transforms=[LagTransform(in_column=\"target\", lags=list(range(HORIZON, 2 * HORIZON)), out_column=\"target_lag\")],\n", " horizon=HORIZON,\n", " ),\n", " Pipeline(\n", " model=ProphetModel(),\n", " transforms=[],\n", " horizon=HORIZON,\n", " ),\n", "]" ] }, { "cell_type": "code", "execution_count": 33, "id": "15c0ed98", "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false }, "pycharm": { "name": "#%%\n" } }, "outputs": [], "source": [ "%%capture\n", "custom_auto = Auto(\n", " target_metric=SMAPE(),\n", " horizon=HORIZON,\n", " pool=custom_pool,\n", " backtest_params=dict(n_folds=1),\n", " storage=\"sqlite:///etna-auto-custom.db\",\n", ")\n", "best_custom_pool_pipeline = custom_auto.fit(ts=ts, tune_size=0)" ] }, { "cell_type": "code", "execution_count": 34, "id": "fe5977f8", "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false }, "pycharm": { "name": "#%%\n" } }, "outputs": [ { "data": { "text/html": [ "
<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr style=\"text-align: right;\"><th></th><th>hash</th><th>pipeline</th><th>SMAPE_mean</th><th>elapsed_time</th><th>state</th><th>study</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>1</th><td>6e2eb71d033b6d0607f5b6d0a7596ce9</td><td>Pipeline(model = ProphetModel(growth = 'linear...</td><td>7.784737</td><td>0.652359</td><td>1</td><td>pool</td></tr>\n",
"    <tr><th>0</th><td>d4b50dc4c1b7debb0355ebfbd9c39ffb</td><td>Pipeline(model = LinearPerSegmentModel(fit_int...</td><td>8.587004</td><td>0.156382</td><td>1</td><td>pool</td></tr>\n",
"    <tr><th>2</th><td>53e90ae4cf7f1f71e6396107549c25ef</td><td>Pipeline(model = NaiveModel(lag = 1, ), transf...</td><td>22.155640</td><td>0.034337</td><td>1</td><td>pool</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>
" ], "text/plain": [ " hash \\\n", "1 6e2eb71d033b6d0607f5b6d0a7596ce9 \n", "0 d4b50dc4c1b7debb0355ebfbd9c39ffb \n", "2 53e90ae4cf7f1f71e6396107549c25ef \n", "\n", " pipeline SMAPE_mean \\\n", "1 Pipeline(model = ProphetModel(growth = 'linear... 7.784737 \n", "0 Pipeline(model = LinearPerSegmentModel(fit_int... 8.587004 \n", "2 Pipeline(model = NaiveModel(lag = 1, ), transf... 22.155640 \n", "\n", " elapsed_time state study \n", "1 0.652359 1 pool \n", "0 0.156382 1 pool \n", "2 0.034337 1 pool " ] }, "execution_count": 34, "metadata": {}, "output_type": "execute_result" } ], "source": [ "custom_auto.summary()[[\"hash\", \"pipeline\", \"SMAPE_mean\", \"elapsed_time\", \"state\", \"study\"]].sort_values(\"SMAPE_mean\")" ] }, { "cell_type": "markdown", "id": "e51b46e3", "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false }, "pycharm": { "name": "#%% md\n" } }, "source": [ "We can continue our training. The pool stage is over and there will be only the tuning stage. If we don't want to wait forever we should limit the tuning by fixing `n_trials` or `timeout`. \n", "\n", "We also set some parameters for `optuna.Study.optimize`: \n", "\n", "- `gc_after_trial=True`: to prevent `fit` from increasing memory consumption\n", "- `catch=(Exception,)`: to prevent failing if some trials are erroneous." ] }, { "cell_type": "code", "execution_count": 35, "id": "80b27cda", "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false }, "pycharm": { "name": "#%%\n" } }, "outputs": [], "source": [ "%%capture\n", "best_custom_tuning_pipeline = custom_auto.fit(ts=ts, tune_size=2, n_trials=10, gc_after_trial=True, catch=(Exception,))" ] }, { "cell_type": "markdown", "id": "a9c8a6cf", "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false }, "pycharm": { "name": "#%% md\n" } }, "source": [ "Let's look at the results." 
] }, { "cell_type": "code", "execution_count": 36, "id": "1dda3b16", "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false }, "pycharm": { "name": "#%%\n" } }, "outputs": [ { "data": { "text/html": [ "
<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr style=\"text-align: right;\"><th></th><th>hash</th><th>pipeline</th><th>SMAPE_mean</th><th>elapsed_time</th><th>state</th><th>study</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>8</th><td>0da73fb28ad659adcd67e73d6a12ffcb</td><td>Pipeline(model = ProphetModel(growth = 'linear...</td><td>6.350313</td><td>0.681777</td><td>1</td><td>tuning/6e2eb71d033b6d0607f5b6d0a7596ce9</td></tr>\n",
"    <tr><th>7</th><td>06056568d43cc286f099fd767f9c4e01</td><td>Pipeline(model = ProphetModel(growth = 'linear...</td><td>6.827116</td><td>0.946632</td><td>1</td><td>tuning/6e2eb71d033b6d0607f5b6d0a7596ce9</td></tr>\n",
"    <tr><th>1</th><td>6e2eb71d033b6d0607f5b6d0a7596ce9</td><td>Pipeline(model = ProphetModel(growth = 'linear...</td><td>7.784737</td><td>0.652359</td><td>1</td><td>pool</td></tr>\n",
"    <tr><th>0</th><td>d4b50dc4c1b7debb0355ebfbd9c39ffb</td><td>Pipeline(model = LinearPerSegmentModel(fit_int...</td><td>8.587004</td><td>0.156382</td><td>1</td><td>pool</td></tr>\n",
"    <tr><th>4</th><td>d4b50dc4c1b7debb0355ebfbd9c39ffb</td><td>Pipeline(model = LinearPerSegmentModel(fit_int...</td><td>8.587004</td><td>0.181814</td><td>1</td><td>tuning/d4b50dc4c1b7debb0355ebfbd9c39ffb</td></tr>\n",
"    <tr><th>9</th><td>8eacbbe63deeee69548eb62734a428e1</td><td>Pipeline(model = ProphetModel(growth = 'linear...</td><td>8.728627</td><td>1.918130</td><td>1</td><td>tuning/6e2eb71d033b6d0607f5b6d0a7596ce9</td></tr>\n",
"    <tr><th>5</th><td>3a875591627f904e0d2f5b633ec986b1</td><td>Pipeline(model = LinearPerSegmentModel(fit_int...</td><td>9.145437</td><td>0.122133</td><td>1</td><td>tuning/d4b50dc4c1b7debb0355ebfbd9c39ffb</td></tr>\n",
"    <tr><th>2</th><td>53e90ae4cf7f1f71e6396107549c25ef</td><td>Pipeline(model = NaiveModel(lag = 1, ), transf...</td><td>22.155640</td><td>0.034337</td><td>1</td><td>pool</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>
" ], "text/plain": [ " hash \\\n", "8 0da73fb28ad659adcd67e73d6a12ffcb \n", "7 06056568d43cc286f099fd767f9c4e01 \n", "1 6e2eb71d033b6d0607f5b6d0a7596ce9 \n", "0 d4b50dc4c1b7debb0355ebfbd9c39ffb \n", "4 d4b50dc4c1b7debb0355ebfbd9c39ffb \n", "9 8eacbbe63deeee69548eb62734a428e1 \n", "5 3a875591627f904e0d2f5b633ec986b1 \n", "2 53e90ae4cf7f1f71e6396107549c25ef \n", "\n", " pipeline SMAPE_mean \\\n", "8 Pipeline(model = ProphetModel(growth = 'linear... 6.350313 \n", "7 Pipeline(model = ProphetModel(growth = 'linear... 6.827116 \n", "1 Pipeline(model = ProphetModel(growth = 'linear... 7.784737 \n", "0 Pipeline(model = LinearPerSegmentModel(fit_int... 8.587004 \n", "4 Pipeline(model = LinearPerSegmentModel(fit_int... 8.587004 \n", "9 Pipeline(model = ProphetModel(growth = 'linear... 8.728627 \n", "5 Pipeline(model = LinearPerSegmentModel(fit_int... 9.145437 \n", "2 Pipeline(model = NaiveModel(lag = 1, ), transf... 22.155640 \n", "\n", " elapsed_time state study \n", "8 0.681777 1 tuning/6e2eb71d033b6d0607f5b6d0a7596ce9 \n", "7 0.946632 1 tuning/6e2eb71d033b6d0607f5b6d0a7596ce9 \n", "1 0.652359 1 pool \n", "0 0.156382 1 pool \n", "4 0.181814 1 tuning/d4b50dc4c1b7debb0355ebfbd9c39ffb \n", "9 1.918130 1 tuning/6e2eb71d033b6d0607f5b6d0a7596ce9 \n", "5 0.122133 1 tuning/d4b50dc4c1b7debb0355ebfbd9c39ffb \n", "2 0.034337 1 pool " ] }, "execution_count": 36, "metadata": {}, "output_type": "execute_result" } ], "source": [ "custom_auto.summary()[[\"hash\", \"pipeline\", \"SMAPE_mean\", \"elapsed_time\", \"state\", \"study\"]].sort_values(\n", " \"SMAPE_mean\"\n", ").drop_duplicates(subset=(\"hash\", \"study\"))" ] }, { "cell_type": "code", "execution_count": 37, "id": "b1d4a0d4", "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false }, "pycharm": { "name": "#%%\n" } }, "outputs": [ { "data": { "text/plain": [ "[Pipeline(model = ProphetModel(growth = 'linear', changepoints = None, n_changepoints = 25, changepoint_range = 0.9187587557123996, yearly_seasonality = 
'auto', weekly_seasonality = 'auto', daily_seasonality = 'auto', holidays = None, seasonality_mode = 'multiplicative', seasonality_prior_scale = 7.780155576901417, holidays_prior_scale = 0.3860866271460545, changepoint_prior_scale = 0.01083670267174957, mcmc_samples = 0, interval_width = 0.8, uncertainty_samples = 1000, stan_backend = None, additional_seasonality_params = (), timestamp_column = None, ), transforms = [], horizon = 14, ),\n", " Pipeline(model = ProphetModel(growth = 'linear', changepoints = None, n_changepoints = 25, changepoint_range = 0.8635482199008357, yearly_seasonality = 'auto', weekly_seasonality = 'auto', daily_seasonality = 'auto', holidays = None, seasonality_mode = 'multiplicative', seasonality_prior_scale = 0.6431172050131991, holidays_prior_scale = 0.8663279761354559, changepoint_prior_scale = 0.029554483012804774, mcmc_samples = 0, interval_width = 0.8, uncertainty_samples = 1000, stan_backend = None, additional_seasonality_params = (), timestamp_column = None, ), transforms = [], horizon = 14, ),\n", " Pipeline(model = ProphetModel(growth = 'linear', changepoints = None, n_changepoints = 25, changepoint_range = 0.8, yearly_seasonality = 'auto', weekly_seasonality = 'auto', daily_seasonality = 'auto', holidays = None, seasonality_mode = 'additive', seasonality_prior_scale = 10.0, holidays_prior_scale = 10.0, changepoint_prior_scale = 0.05, mcmc_samples = 0, interval_width = 0.8, uncertainty_samples = 1000, stan_backend = None, additional_seasonality_params = (), timestamp_column = None, ), transforms = [], horizon = 14, ),\n", " Pipeline(model = LinearPerSegmentModel(fit_intercept = True, ), transforms = [LagTransform(in_column = 'target', lags = [14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27], out_column = 'target_lag', )], horizon = 14, ),\n", " Pipeline(model = ProphetModel(growth = 'linear', changepoints = None, n_changepoints = 25, changepoint_range = 0.8030327596160489, yearly_seasonality = 'auto', weekly_seasonality = 
'auto', daily_seasonality = 'auto', holidays = None, seasonality_mode = 'multiplicative', seasonality_prior_scale = 0.0163345876110695, holidays_prior_scale = 3.146730406166005, changepoint_prior_scale = 0.001718538897359816, mcmc_samples = 0, interval_width = 0.8, uncertainty_samples = 1000, stan_backend = None, additional_seasonality_params = (), timestamp_column = None, ), transforms = [], horizon = 14, )]" ] }, "execution_count": 37, "metadata": {}, "output_type": "execute_result" } ], "source": [ "custom_auto.top_k(k=5)" ] }, { "cell_type": "markdown", "id": "488a749b", "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false }, "pycharm": { "name": "#%% md\n" } }, "source": [ "If we look at the `study` column, we will see that the best trial from the tuning stage is better than the best trial from the pool stage. This means that the tuning stage was successful and improved the final result. \n", "\n", "Let's compare the best pipelines from the pool and tuning stages on the hold-out part of the initial `ts`." ] }, { "cell_type": "code", "execution_count": 38, "id": "5b5b5d22", "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false }, "pycharm": { "name": "#%%\n" } }, "outputs": [], "source": [ "%%capture\n", "best_custom_pool_metrics = best_custom_pool_pipeline.backtest(ts=full_ts, metrics=[SMAPE()], n_folds=5)[\"metrics\"]\n", "best_custom_tuning_metrics = best_custom_tuning_pipeline.backtest(ts=full_ts, metrics=[SMAPE()], n_folds=5)[\"metrics\"]" ] }, { "cell_type": "code", "execution_count": 39, "id": "ae1f4609", "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false }, "pycharm": { "name": "#%%\n" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Best pool SMAPE: 7.796\n", "Best tuning SMAPE: 7.712\n" ] } ], "source": [ "best_custom_pool_smape = best_custom_pool_metrics[\"SMAPE\"].mean()\n", "best_custom_tuning_smape = best_custom_tuning_metrics[\"SMAPE\"].mean()\n", "print(f\"Best pool SMAPE: {best_custom_pool_smape:.3f}\")\n", 
"print(f\"Best tuning SMAPE: {best_custom_tuning_smape:.3f}\")" ] }, { "cell_type": "markdown", "id": "d490acff", "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false }, "pycharm": { "name": "#%% md\n" } }, "source": [ "As we can see, the results after the tuning stage are a little bit better." ] }, { "cell_type": "markdown", "id": "3322d9c2", "metadata": { "pycharm": { "name": "#%% md\n" } }, "source": [ "## 3. Summary " ] }, { "cell_type": "markdown", "id": "39b4a081", "metadata": { "pycharm": { "name": "#%% md\n" } }, "source": [ "In this notebook we discussed how AutoML works in ETNA library and how to use it. There are two supported scenarios:\n", "\n", "- Tuning your existing pipeline;\n", "- Automatic search of the pipeline for your forecasting task." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.16" } }, "nbformat": 4, "nbformat_minor": 5 }