
.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_examples/preprocessing/plot_target_encoder.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end <sphx_glr_download_auto_examples_preprocessing_plot_target_encoder.py>`
        to download the full example code or to run this example in your browser via JupyterLite or Binder.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_examples_preprocessing_plot_target_encoder.py:


============================================
Comparing Target Encoder with Other Encoders
============================================

.. currentmodule:: sklearn.preprocessing

The :class:`TargetEncoder` uses the value of the target to encode each
categorical feature. In this example, we will compare three different approaches
for handling categorical features: :class:`TargetEncoder`,
:class:`OrdinalEncoder`, :class:`OneHotEncoder` and dropping the category.

.. note::
    `fit(X, y).transform(X)` does not equal `fit_transform(X, y)` because a
    cross fitting scheme is used in `fit_transform` for encoding. See the
    :ref:`User Guide <target_encoder>` for details.

.. GENERATED FROM PYTHON SOURCE LINES 18-22

.. code-block:: Python


    # Authors: The scikit-learn developers
    # SPDX-License-Identifier: BSD-3-Clause








.. GENERATED FROM PYTHON SOURCE LINES 23-27

Loading Data from OpenML
========================
First, we load the wine reviews dataset, where the target is the points given
be a reviewer:

.. GENERATED FROM PYTHON SOURCE LINES 27-34

.. code-block:: Python

    from sklearn.datasets import fetch_openml

    wine_reviews = fetch_openml(data_id=42074, as_frame=True)

    df = wine_reviews.frame
    df.head()






.. raw:: html

    <div class="output_subarea output_html rendered_html output_result">
    <div>
    <style scoped>
        .dataframe tbody tr th:only-of-type {
            vertical-align: middle;
        }

        .dataframe tbody tr th {
            vertical-align: top;
        }

        .dataframe thead th {
            text-align: right;
        }
    </style>
    <table border="1" class="dataframe">
      <thead>
        <tr style="text-align: right;">
          <th></th>
          <th>country</th>
          <th>description</th>
          <th>designation</th>
          <th>points</th>
          <th>price</th>
          <th>province</th>
          <th>region_1</th>
          <th>region_2</th>
          <th>variety</th>
          <th>winery</th>
        </tr>
      </thead>
      <tbody>
        <tr>
          <th>0</th>
          <td>US</td>
          <td>This tremendous 100% varietal wine hails from ...</td>
          <td>Martha's Vineyard</td>
          <td>96</td>
          <td>235.0</td>
          <td>California</td>
          <td>Napa Valley</td>
          <td>Napa</td>
          <td>Cabernet Sauvignon</td>
          <td>Heitz</td>
        </tr>
        <tr>
          <th>1</th>
          <td>Spain</td>
          <td>Ripe aromas of fig, blackberry and cassis are ...</td>
          <td>Carodorum Selección Especial Reserva</td>
          <td>96</td>
          <td>110.0</td>
          <td>Northern Spain</td>
          <td>Toro</td>
          <td>NaN</td>
          <td>Tinta de Toro</td>
          <td>Bodega Carmen Rodríguez</td>
        </tr>
        <tr>
          <th>2</th>
          <td>US</td>
          <td>Mac Watson honors the memory of a wine once ma...</td>
          <td>Special Selected Late Harvest</td>
          <td>96</td>
          <td>90.0</td>
          <td>California</td>
          <td>Knights Valley</td>
          <td>Sonoma</td>
          <td>Sauvignon Blanc</td>
          <td>Macauley</td>
        </tr>
        <tr>
          <th>3</th>
          <td>US</td>
          <td>This spent 20 months in 30% new French oak, an...</td>
          <td>Reserve</td>
          <td>96</td>
          <td>65.0</td>
          <td>Oregon</td>
          <td>Willamette Valley</td>
          <td>Willamette Valley</td>
          <td>Pinot Noir</td>
          <td>Ponzi</td>
        </tr>
        <tr>
          <th>4</th>
          <td>France</td>
          <td>This is the top wine from La Bégude, named aft...</td>
          <td>La Brûlade</td>
          <td>95</td>
          <td>66.0</td>
          <td>Provence</td>
          <td>Bandol</td>
          <td>NaN</td>
          <td>Provence red blend</td>
          <td>Domaine de la Bégude</td>
        </tr>
      </tbody>
    </table>
    </div>
    </div>
    <br />
    <br />

.. GENERATED FROM PYTHON SOURCE LINES 35-37

For this example, we use the following subset of numerical and categorical
features in the data. The target are continuous values from 80 to 100:

.. GENERATED FROM PYTHON SOURCE LINES 37-53

.. code-block:: Python

    numerical_features = ["price"]
    categorical_features = [
        "country",
        "province",
        "region_1",
        "region_2",
        "variety",
        "winery",
    ]
    target_name = "points"

    X = df[numerical_features + categorical_features]
    y = df[target_name]

    _ = y.hist()




.. image-sg:: /auto_examples/preprocessing/images/sphx_glr_plot_target_encoder_001.png
   :alt: plot target encoder
   :srcset: /auto_examples/preprocessing/images/sphx_glr_plot_target_encoder_001.png
   :class: sphx-glr-single-img





.. GENERATED FROM PYTHON SOURCE LINES 54-60

Training and Evaluating Pipelines with Different Encoders
=========================================================
In this section, we will evaluate pipelines with
:class:`~sklearn.ensemble.HistGradientBoostingRegressor` with different encoding
strategies. First, we list out the encoders we will be using to preprocess
the categorical features:

.. GENERATED FROM PYTHON SOURCE LINES 60-73

.. code-block:: Python

    from sklearn.compose import ColumnTransformer
    from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder, TargetEncoder

    categorical_preprocessors = [
        ("drop", "drop"),
        ("ordinal", OrdinalEncoder(handle_unknown="use_encoded_value", unknown_value=-1)),
        (
            "one_hot",
            OneHotEncoder(handle_unknown="ignore", max_categories=20, sparse_output=False),
        ),
        ("target", TargetEncoder(target_type="continuous")),
    ]








.. GENERATED FROM PYTHON SOURCE LINES 74-75

Next, we evaluate the models using cross validation and record the results:

.. GENERATED FROM PYTHON SOURCE LINES 75-119

.. code-block:: Python

    from sklearn.ensemble import HistGradientBoostingRegressor
    from sklearn.model_selection import cross_validate
    from sklearn.pipeline import make_pipeline

    n_cv_folds = 3
    max_iter = 20
    results = []


    def evaluate_model_and_store(name, pipe):
        result = cross_validate(
            pipe,
            X,
            y,
            scoring="neg_root_mean_squared_error",
            cv=n_cv_folds,
            return_train_score=True,
        )
        rmse_test_score = -result["test_score"]
        rmse_train_score = -result["train_score"]
        results.append(
            {
                "preprocessor": name,
                "rmse_test_mean": rmse_test_score.mean(),
                "rmse_test_std": rmse_train_score.std(),
                "rmse_train_mean": rmse_train_score.mean(),
                "rmse_train_std": rmse_train_score.std(),
            }
        )


    for name, categorical_preprocessor in categorical_preprocessors:
        preprocessor = ColumnTransformer(
            [
                ("numerical", "passthrough", numerical_features),
                ("categorical", categorical_preprocessor, categorical_features),
            ]
        )
        pipe = make_pipeline(
            preprocessor, HistGradientBoostingRegressor(random_state=0, max_iter=max_iter)
        )
        evaluate_model_and_store(name, pipe)









.. GENERATED FROM PYTHON SOURCE LINES 120-126

Native Categorical Feature Support
==================================
In this section, we build and evaluate a pipeline that uses native categorical
feature support in :class:`~sklearn.ensemble.HistGradientBoostingRegressor`,
which only supports up to 255 unique categories. In our dataset, the most of
the categorical features have more than 255 unique categories:

.. GENERATED FROM PYTHON SOURCE LINES 126-129

.. code-block:: Python

    n_unique_categories = df[categorical_features].nunique().sort_values(ascending=False)
    n_unique_categories





.. rst-class:: sphx-glr-script-out

 .. code-block:: none


    winery      14810
    region_1     1236
    variety       632
    province      455
    country        48
    region_2       18
    dtype: int64



.. GENERATED FROM PYTHON SOURCE LINES 130-134

To workaround the limitation above, we group the categorical features into
low cardinality and high cardinality features. The high cardinality features
will be target encoded and the low cardinality features will use the native
categorical feature in gradient boosting.

.. GENERATED FROM PYTHON SOURCE LINES 134-164

.. code-block:: Python

    high_cardinality_features = n_unique_categories[n_unique_categories > 255].index
    low_cardinality_features = n_unique_categories[n_unique_categories <= 255].index
    mixed_encoded_preprocessor = ColumnTransformer(
        [
            ("numerical", "passthrough", numerical_features),
            (
                "high_cardinality",
                TargetEncoder(target_type="continuous"),
                high_cardinality_features,
            ),
            (
                "low_cardinality",
                OrdinalEncoder(handle_unknown="use_encoded_value", unknown_value=-1),
                low_cardinality_features,
            ),
        ],
        verbose_feature_names_out=False,
    )

    # The output of the of the preprocessor must be set to pandas so the
    # gradient boosting model can detect the low cardinality features.
    mixed_encoded_preprocessor.set_output(transform="pandas")
    mixed_pipe = make_pipeline(
        mixed_encoded_preprocessor,
        HistGradientBoostingRegressor(
            random_state=0, max_iter=max_iter, categorical_features=low_cardinality_features
        ),
    )
    mixed_pipe






.. raw:: html

    <div class="output_subarea output_html rendered_html output_result">
    <style>#sk-container-id-80 {
      /* Definition of color scheme common for light and dark mode */
      --sklearn-color-text: #000;
      --sklearn-color-text-muted: #666;
      --sklearn-color-line: gray;
      /* Definition of color scheme for unfitted estimators */
      --sklearn-color-unfitted-level-0: #fff5e6;
      --sklearn-color-unfitted-level-1: #f6e4d2;
      --sklearn-color-unfitted-level-2: #ffe0b3;
      --sklearn-color-unfitted-level-3: chocolate;
      /* Definition of color scheme for fitted estimators */
      --sklearn-color-fitted-level-0: #f0f8ff;
      --sklearn-color-fitted-level-1: #d4ebff;
      --sklearn-color-fitted-level-2: #b3dbfd;
      --sklearn-color-fitted-level-3: cornflowerblue;
    }

    #sk-container-id-80.light {
      /* Specific color for light theme */
      --sklearn-color-text-on-default-background: black;
      --sklearn-color-background: white;
      --sklearn-color-border-box: black;
      --sklearn-color-icon: #696969;
    }

    #sk-container-id-80.dark {
      --sklearn-color-text-on-default-background: white;
      --sklearn-color-background: #111;
      --sklearn-color-border-box: white;
      --sklearn-color-icon: #878787;
    }

    #sk-container-id-80 {
      color: var(--sklearn-color-text);
    }

    #sk-container-id-80 pre {
      padding: 0;
    }

    #sk-container-id-80 input.sk-hidden--visually {
      border: 0;
      clip: rect(1px 1px 1px 1px);
      clip: rect(1px, 1px, 1px, 1px);
      height: 1px;
      margin: -1px;
      overflow: hidden;
      padding: 0;
      position: absolute;
      width: 1px;
    }

    #sk-container-id-80 div.sk-dashed-wrapped {
      border: 1px dashed var(--sklearn-color-line);
      margin: 0 0.4em 0.5em 0.4em;
      box-sizing: border-box;
      padding-bottom: 0.4em;
      background-color: var(--sklearn-color-background);
    }

    #sk-container-id-80 div.sk-container {
      /* jupyter's `normalize.less` sets `[hidden] { display: none; }`
         but bootstrap.min.css set `[hidden] { display: none !important; }`
         so we also need the `!important` here to be able to override the
         default hidden behavior on the sphinx rendered scikit-learn.org.
         See: https://github.com/scikit-learn/scikit-learn/issues/21755 */
      display: inline-block !important;
      position: relative;
    }

    #sk-container-id-80 div.sk-text-repr-fallback {
      display: none;
    }

    div.sk-parallel-item,
    div.sk-serial,
    div.sk-item {
      /* draw centered vertical line to link estimators */
      background-image: linear-gradient(var(--sklearn-color-text-on-default-background), var(--sklearn-color-text-on-default-background));
      background-size: 2px 100%;
      background-repeat: no-repeat;
      background-position: center center;
    }

    /* Parallel-specific style estimator block */

    #sk-container-id-80 div.sk-parallel-item::after {
      content: "";
      width: 100%;
      border-bottom: 2px solid var(--sklearn-color-text-on-default-background);
      flex-grow: 1;
    }

    #sk-container-id-80 div.sk-parallel {
      display: flex;
      align-items: stretch;
      justify-content: center;
      background-color: var(--sklearn-color-background);
      position: relative;
    }

    #sk-container-id-80 div.sk-parallel-item {
      display: flex;
      flex-direction: column;
    }

    #sk-container-id-80 div.sk-parallel-item:first-child::after {
      align-self: flex-end;
      width: 50%;
    }

    #sk-container-id-80 div.sk-parallel-item:last-child::after {
      align-self: flex-start;
      width: 50%;
    }

    #sk-container-id-80 div.sk-parallel-item:only-child::after {
      width: 0;
    }

    /* Serial-specific style estimator block */

    #sk-container-id-80 div.sk-serial {
      display: flex;
      flex-direction: column;
      align-items: center;
      background-color: var(--sklearn-color-background);
      padding-right: 1em;
      padding-left: 1em;
    }


    /* Toggleable style: style used for estimator/Pipeline/ColumnTransformer box that is
    clickable and can be expanded/collapsed.
    - Pipeline and ColumnTransformer use this feature and define the default style
    - Estimators will overwrite some part of the style using the `sk-estimator` class
    */

    /* Pipeline and ColumnTransformer style (default) */

    #sk-container-id-80 div.sk-toggleable {
      /* Default theme specific background. It is overwritten whether we have a
      specific estimator or a Pipeline/ColumnTransformer */
      background-color: var(--sklearn-color-background);
    }

    /* Toggleable label */
    #sk-container-id-80 label.sk-toggleable__label {
      cursor: pointer;
      display: flex;
      width: 100%;
      margin-bottom: 0;
      padding: 0.5em;
      box-sizing: border-box;
      text-align: center;
      align-items: center;
      justify-content: center;
      gap: 0.5em;
    }

    #sk-container-id-80 label.sk-toggleable__label .caption {
      font-size: 0.6rem;
      font-weight: lighter;
      color: var(--sklearn-color-text-muted);
    }

    #sk-container-id-80 label.sk-toggleable__label-arrow:before {
      /* Arrow on the left of the label */
      content: "▸";
      float: left;
      margin-right: 0.25em;
      color: var(--sklearn-color-icon);
    }

    #sk-container-id-80 label.sk-toggleable__label-arrow:hover:before {
      color: var(--sklearn-color-text);
    }

    /* Toggleable content - dropdown */

    #sk-container-id-80 div.sk-toggleable__content {
      display: none;
      text-align: left;
      /* unfitted */
      background-color: var(--sklearn-color-unfitted-level-0);
    }

    #sk-container-id-80 div.sk-toggleable__content.fitted {
      /* fitted */
      background-color: var(--sklearn-color-fitted-level-0);
    }

    #sk-container-id-80 div.sk-toggleable__content pre {
      margin: 0.2em;
      border-radius: 0.25em;
      color: var(--sklearn-color-text);
      /* unfitted */
      background-color: var(--sklearn-color-unfitted-level-0);
    }

    #sk-container-id-80 div.sk-toggleable__content.fitted pre {
      /* unfitted */
      background-color: var(--sklearn-color-fitted-level-0);
    }

    #sk-container-id-80 input.sk-toggleable__control:checked~div.sk-toggleable__content {
      /* Expand drop-down */
      display: block;
      width: 100%;
      overflow: visible;
    }

    #sk-container-id-80 input.sk-toggleable__control:checked~label.sk-toggleable__label-arrow:before {
      content: "▾";
    }

    /* Pipeline/ColumnTransformer-specific style */

    #sk-container-id-80 div.sk-label input.sk-toggleable__control:checked~label.sk-toggleable__label {
      color: var(--sklearn-color-text);
      background-color: var(--sklearn-color-unfitted-level-2);
    }

    #sk-container-id-80 div.sk-label.fitted input.sk-toggleable__control:checked~label.sk-toggleable__label {
      background-color: var(--sklearn-color-fitted-level-2);
    }

    /* Estimator-specific style */

    /* Colorize estimator box */
    #sk-container-id-80 div.sk-estimator input.sk-toggleable__control:checked~label.sk-toggleable__label {
      /* unfitted */
      background-color: var(--sklearn-color-unfitted-level-2);
    }

    #sk-container-id-80 div.sk-estimator.fitted input.sk-toggleable__control:checked~label.sk-toggleable__label {
      /* fitted */
      background-color: var(--sklearn-color-fitted-level-2);
    }

    #sk-container-id-80 div.sk-label label.sk-toggleable__label,
    #sk-container-id-80 div.sk-label label {
      /* The background is the default theme color */
      color: var(--sklearn-color-text-on-default-background);
    }

    /* On hover, darken the color of the background */
    #sk-container-id-80 div.sk-label:hover label.sk-toggleable__label {
      color: var(--sklearn-color-text);
      background-color: var(--sklearn-color-unfitted-level-2);
    }

    /* Label box, darken color on hover, fitted */
    #sk-container-id-80 div.sk-label.fitted:hover label.sk-toggleable__label.fitted {
      color: var(--sklearn-color-text);
      background-color: var(--sklearn-color-fitted-level-2);
    }

    /* Estimator label */

    #sk-container-id-80 div.sk-label label {
      font-family: monospace;
      font-weight: bold;
      line-height: 1.2em;
    }

    #sk-container-id-80 div.sk-label-container {
      text-align: center;
    }

    /* Estimator-specific */
    #sk-container-id-80 div.sk-estimator {
      font-family: monospace;
      border: 1px dotted var(--sklearn-color-border-box);
      border-radius: 0.25em;
      box-sizing: border-box;
      margin-bottom: 0.5em;
      /* unfitted */
      background-color: var(--sklearn-color-unfitted-level-0);
    }

    #sk-container-id-80 div.sk-estimator.fitted {
      /* fitted */
      background-color: var(--sklearn-color-fitted-level-0);
    }

    /* on hover */
    #sk-container-id-80 div.sk-estimator:hover {
      /* unfitted */
      background-color: var(--sklearn-color-unfitted-level-2);
    }

    #sk-container-id-80 div.sk-estimator.fitted:hover {
      /* fitted */
      background-color: var(--sklearn-color-fitted-level-2);
    }

    /* Specification for estimator info (e.g. "i" and "?") */

    /* Common style for "i" and "?" */

    .sk-estimator-doc-link,
    a:link.sk-estimator-doc-link,
    a:visited.sk-estimator-doc-link {
      float: right;
      font-size: smaller;
      line-height: 1em;
      font-family: monospace;
      background-color: var(--sklearn-color-unfitted-level-0);
      border-radius: 1em;
      height: 1em;
      width: 1em;
      text-decoration: none !important;
      margin-left: 0.5em;
      text-align: center;
      /* unfitted */
      border: var(--sklearn-color-unfitted-level-3) 1pt solid;
      color: var(--sklearn-color-unfitted-level-3);
    }

    .sk-estimator-doc-link.fitted,
    a:link.sk-estimator-doc-link.fitted,
    a:visited.sk-estimator-doc-link.fitted {
      /* fitted */
      background-color: var(--sklearn-color-fitted-level-0);
      border: var(--sklearn-color-fitted-level-3) 1pt solid;
      color: var(--sklearn-color-fitted-level-3);
    }

    /* On hover */
    div.sk-estimator:hover .sk-estimator-doc-link:hover,
    .sk-estimator-doc-link:hover,
    div.sk-label-container:hover .sk-estimator-doc-link:hover,
    .sk-estimator-doc-link:hover {
      /* unfitted */
      background-color: var(--sklearn-color-unfitted-level-3);
      border: var(--sklearn-color-fitted-level-0) 1pt solid;
      color: var(--sklearn-color-unfitted-level-0);
      text-decoration: none;
    }

    div.sk-estimator.fitted:hover .sk-estimator-doc-link.fitted:hover,
    .sk-estimator-doc-link.fitted:hover,
    div.sk-label-container:hover .sk-estimator-doc-link.fitted:hover,
    .sk-estimator-doc-link.fitted:hover {
      /* fitted */
      background-color: var(--sklearn-color-fitted-level-3);
      border: var(--sklearn-color-fitted-level-0) 1pt solid;
      color: var(--sklearn-color-fitted-level-0);
      text-decoration: none;
    }

    /* Span, style for the box shown on hovering the info icon */
    .sk-estimator-doc-link span {
      display: none;
      z-index: 9999;
      position: relative;
      font-weight: normal;
      right: .2ex;
      padding: .5ex;
      margin: .5ex;
      width: min-content;
      min-width: 20ex;
      max-width: 50ex;
      color: var(--sklearn-color-text);
      box-shadow: 2pt 2pt 4pt #999;
      /* unfitted */
      background: var(--sklearn-color-unfitted-level-0);
      border: .5pt solid var(--sklearn-color-unfitted-level-3);
    }

    .sk-estimator-doc-link.fitted span {
      /* fitted */
      background: var(--sklearn-color-fitted-level-0);
      border: var(--sklearn-color-fitted-level-3);
    }

    .sk-estimator-doc-link:hover span {
      display: block;
    }

    /* "?"-specific style due to the `<a>` HTML tag */

    #sk-container-id-80 a.estimator_doc_link {
      float: right;
      font-size: 1rem;
      line-height: 1em;
      font-family: monospace;
      background-color: var(--sklearn-color-unfitted-level-0);
      border-radius: 1rem;
      height: 1rem;
      width: 1rem;
      text-decoration: none;
      /* unfitted */
      color: var(--sklearn-color-unfitted-level-1);
      border: var(--sklearn-color-unfitted-level-1) 1pt solid;
    }

    #sk-container-id-80 a.estimator_doc_link.fitted {
      /* fitted */
      background-color: var(--sklearn-color-fitted-level-0);
      border: var(--sklearn-color-fitted-level-1) 1pt solid;
      color: var(--sklearn-color-fitted-level-1);
    }

    /* On hover */
    #sk-container-id-80 a.estimator_doc_link:hover {
      /* unfitted */
      background-color: var(--sklearn-color-unfitted-level-3);
      color: var(--sklearn-color-background);
      text-decoration: none;
    }

    #sk-container-id-80 a.estimator_doc_link.fitted:hover {
      /* fitted */
      background-color: var(--sklearn-color-fitted-level-3);
    }

    .estimator-table {
        font-family: monospace;
    }

    .estimator-table summary {
        padding: .5rem;
        cursor: pointer;
    }

    .estimator-table summary::marker {
        font-size: 0.7rem;
    }

    .estimator-table details[open] {
        padding-left: 0.1rem;
        padding-right: 0.1rem;
        padding-bottom: 0.3rem;
    }

    .estimator-table .parameters-table {
        margin-left: auto !important;
        margin-right: auto !important;
        margin-top: 0;
    }

    .estimator-table .parameters-table tr:nth-child(odd) {
        background-color: #fff;
    }

    .estimator-table .parameters-table tr:nth-child(even) {
        background-color: #f6f6f6;
    }

    .estimator-table .parameters-table tr:hover {
        background-color: #e0e0e0;
    }

    .estimator-table table td {
        border: 1px solid rgba(106, 105, 104, 0.232);
    }

    /*
        `table td`is set in notebook with right text-align.
        We need to overwrite it.
    */
    .estimator-table table td.param {
        text-align: left;
        position: relative;
        padding: 0;
    }

    .user-set td {
        color:rgb(255, 94, 0);
        text-align: left !important;
    }

    .user-set td.value {
        color:rgb(255, 94, 0);
        background-color: transparent;
    }

    .default td {
        color: black;
        text-align: left !important;
    }

    .user-set td i,
    .default td i {
        color: black;
    }

    /*
        Styles for parameter documentation links
        We need styling for visited so jupyter doesn't overwrite it
    */
    a.param-doc-link,
    a.param-doc-link:link,
    a.param-doc-link:visited {
        text-decoration: underline dashed;
        text-underline-offset: .3em;
        color: inherit;
        display: block;
        padding: .5em;
    }

    /* "hack" to make the entire area of the cell containing the link clickable */
    a.param-doc-link::before {
        position: absolute;
        content: "";
        inset: 0;
    }

    .param-doc-description {
        display: none;
        position: absolute;
        z-index: 9999;
        left: 0;
        padding: .5ex;
        margin-left: 1.5em;
        color: var(--sklearn-color-text);
        box-shadow: .3em .3em .4em #999;
        width: max-content;
        text-align: left;
        max-height: 10em;
        overflow-y: auto;

        /* unfitted */
        background: var(--sklearn-color-unfitted-level-0);
        border: thin solid var(--sklearn-color-unfitted-level-3);
    }

    /* Fitted state for parameter tooltips */
    .fitted .param-doc-description {
        /* fitted */
        background: var(--sklearn-color-fitted-level-0);
        border: thin solid var(--sklearn-color-fitted-level-3);
    }

    .param-doc-link:hover .param-doc-description {
        display: block;
    }

    .copy-paste-icon {
        background-image: url(data:image/svg+xml;base64,PHN2ZyB4bWxucz0iaHR0cDovL3d3dy53My5vcmcvMjAwMC9zdmciIHZpZXdCb3g9IjAgMCA0NDggNTEyIj48IS0tIUZvbnQgQXdlc29tZSBGcmVlIDYuNy4yIGJ5IEBmb250YXdlc29tZSAtIGh0dHBzOi8vZm9udGF3ZXNvbWUuY29tIExpY2Vuc2UgLSBodHRwczovL2ZvbnRhd2Vzb21lLmNvbS9saWNlbnNlL2ZyZWUgQ29weXJpZ2h0IDIwMjUgRm9udGljb25zLCBJbmMuLS0+PHBhdGggZD0iTTIwOCAwTDMzMi4xIDBjMTIuNyAwIDI0LjkgNS4xIDMzLjkgMTQuMWw2Ny45IDY3LjljOSA5IDE0LjEgMjEuMiAxNC4xIDMzLjlMNDQ4IDMzNmMwIDI2LjUtMjEuNSA0OC00OCA0OGwtMTkyIDBjLTI2LjUgMC00OC0yMS41LTQ4LTQ4bDAtMjg4YzAtMjYuNSAyMS41LTQ4IDQ4LTQ4ek00OCAxMjhsODAgMCAwIDY0LTY0IDAgMCAyNTYgMTkyIDAgMC0zMiA2NCAwIDAgNDhjMCAyNi41LTIxLjUgNDgtNDggNDhMNDggNTEyYy0yNi41IDAtNDgtMjEuNS00OC00OEwwIDE3NmMwLTI2LjUgMjEuNS00OCA0OC00OHoiLz48L3N2Zz4=);
        background-repeat: no-repeat;
        background-size: 14px 14px;
        background-position: 0;
        display: inline-block;
        width: 14px;
        height: 14px;
        cursor: pointer;
    }
    </style><body><div id="sk-container-id-80" class="sk-top-container"><div class="sk-text-repr-fallback"><pre>Pipeline(steps=[(&#x27;columntransformer&#x27;,
                     ColumnTransformer(transformers=[(&#x27;numerical&#x27;, &#x27;passthrough&#x27;,
                                                      [&#x27;price&#x27;]),
                                                     (&#x27;high_cardinality&#x27;,
                                                      TargetEncoder(target_type=&#x27;continuous&#x27;),
                                                      Index([&#x27;winery&#x27;, &#x27;region_1&#x27;, &#x27;variety&#x27;, &#x27;province&#x27;], dtype=&#x27;object&#x27;)),
                                                     (&#x27;low_cardinality&#x27;,
                                                      OrdinalEncoder(handle_unknown=&#x27;use_encoded_value&#x27;,
                                                                     unknown_value=-1),
                                                      Index([&#x27;country&#x27;, &#x27;region_2&#x27;], dtype=&#x27;object&#x27;))],
                                       verbose_feature_names_out=False)),
                    (&#x27;histgradientboostingregressor&#x27;,
                     HistGradientBoostingRegressor(categorical_features=Index([&#x27;country&#x27;, &#x27;region_2&#x27;], dtype=&#x27;object&#x27;),
                                                   max_iter=20, random_state=0))])</pre><b>In a Jupyter environment, please rerun this cell to show the HTML representation or trust the notebook. <br />On GitHub, the HTML representation is unable to render, please try loading this page with nbviewer.org.</b></div><div class="sk-container" hidden><div class="sk-item sk-dashed-wrapped"><div class="sk-label-container"><div class="sk-label  sk-toggleable"><input class="sk-toggleable__control sk-hidden--visually" id="sk-estimator-id-346" type="checkbox" ><label for="sk-estimator-id-346" class="sk-toggleable__label  sk-toggleable__label-arrow"><div><div>Pipeline</div></div><div><a class="sk-estimator-doc-link " rel="noreferrer" target="_blank" href="https://scikit-learn.org/1.8/modules/generated/sklearn.pipeline.Pipeline.html">?<span>Documentation for Pipeline</span></a><span class="sk-estimator-doc-link ">i<span>Not fitted</span></span></div></label><div class="sk-toggleable__content " data-param-prefix="">
            <div class="estimator-table">
                <details>
                    <summary>Parameters</summary>
                    <table class="parameters-table">
                      <tbody>
                    
            <tr class="user-set">
                <td><i class="copy-paste-icon"
                     onclick="copyToClipboard('steps',
                              this.parentElement.nextElementSibling)"
                ></i></td>
                <td class="param">
            <a class="param-doc-link"
                rel="noreferrer" target="_blank" href="https://scikit-learn.org/1.8/modules/generated/sklearn.pipeline.Pipeline.html#:~:text=steps,-list%20of%20tuples">
                steps
                <span class="param-doc-description">steps: list of tuples<br><br>List of (name of step, estimator) tuples that are to be chained in<br>sequential order. To be compatible with the scikit-learn API, all steps<br>must define `fit`. All non-last steps must also define `transform`. See<br>:ref:`Combining Estimators <combining_estimators>` for more details.</span>
            </a>
        </td>
                <td class="value">[(&#x27;columntransformer&#x27;, ...), (&#x27;histgradientboostingregressor&#x27;, ...)]</td>
            </tr>
    

            <tr class="default">
                <td><i class="copy-paste-icon"
                     onclick="copyToClipboard('transform_input',
                              this.parentElement.nextElementSibling)"
                ></i></td>
                <td class="param">
            <a class="param-doc-link"
                rel="noreferrer" target="_blank" href="https://scikit-learn.org/1.8/modules/generated/sklearn.pipeline.Pipeline.html#:~:text=transform_input,-list%20of%20str%2C%20default%3DNone">
                transform_input
                <span class="param-doc-description">transform_input: list of str, default=None<br><br>The names of the :term:`metadata` parameters that should be transformed by the<br>pipeline before passing it to the step consuming it.<br><br>This enables transforming some input arguments to ``fit`` (other than ``X``)<br>to be transformed by the steps of the pipeline up to the step which requires<br>them. Requirement is defined via :ref:`metadata routing <metadata_routing>`.<br>For instance, this can be used to pass a validation set through the pipeline.<br><br>You can only set this if metadata routing is enabled, which you<br>can enable using ``sklearn.set_config(enable_metadata_routing=True)``.<br><br>.. versionadded:: 1.6</span>
            </a>
        </td>
                <td class="value">None</td>
            </tr>
    

            <tr class="default">
                <td><i class="copy-paste-icon"
                     onclick="copyToClipboard('memory',
                              this.parentElement.nextElementSibling)"
                ></i></td>
                <td class="param">
            <a class="param-doc-link"
                rel="noreferrer" target="_blank" href="https://scikit-learn.org/1.8/modules/generated/sklearn.pipeline.Pipeline.html#:~:text=memory,-str%20or%20object%20with%20the%20joblib.Memory%20interface%2C%20default%3DNone">
                memory
                <span class="param-doc-description">memory: str or object with the joblib.Memory interface, default=None<br><br>Used to cache the fitted transformers of the pipeline. The last step<br>will never be cached, even if it is a transformer. By default, no<br>caching is performed. If a string is given, it is the path to the<br>caching directory. Enabling caching triggers a clone of the transformers<br>before fitting. Therefore, the transformer instance given to the<br>pipeline cannot be inspected directly. Use the attribute ``named_steps``<br>or ``steps`` to inspect estimators within the pipeline. Caching the<br>transformers is advantageous when fitting is time consuming. See<br>:ref:`sphx_glr_auto_examples_neighbors_plot_caching_nearest_neighbors.py`<br>for an example on how to enable caching.</span>
            </a>
        </td>
                <td class="value">None</td>
            </tr>
    

            <tr class="default">
                <td><i class="copy-paste-icon"
                     onclick="copyToClipboard('verbose',
                              this.parentElement.nextElementSibling)"
                ></i></td>
                <td class="param">
            <a class="param-doc-link"
                rel="noreferrer" target="_blank" href="https://scikit-learn.org/1.8/modules/generated/sklearn.pipeline.Pipeline.html#:~:text=verbose,-bool%2C%20default%3DFalse">
                verbose
                <span class="param-doc-description">verbose: bool, default=False<br><br>If True, the time elapsed while fitting each step will be printed as it<br>is completed.</span>
            </a>
        </td>
                <td class="value">False</td>
            </tr>
    
                      </tbody>
                    </table>
                </details>
            </div>
        </div></div></div><div class="sk-serial"><div class="sk-item sk-dashed-wrapped"><div class="sk-label-container"><div class="sk-label  sk-toggleable"><input class="sk-toggleable__control sk-hidden--visually" id="sk-estimator-id-347" type="checkbox" ><label for="sk-estimator-id-347" class="sk-toggleable__label  sk-toggleable__label-arrow"><div><div>columntransformer: ColumnTransformer</div></div><div><a class="sk-estimator-doc-link " rel="noreferrer" target="_blank" href="https://scikit-learn.org/1.8/modules/generated/sklearn.compose.ColumnTransformer.html">?<span>Documentation for columntransformer: ColumnTransformer</span></a></div></label><div class="sk-toggleable__content " data-param-prefix="columntransformer__">
            <div class="estimator-table">
                <details>
                    <summary>Parameters</summary>
                    <table class="parameters-table">
                      <tbody>
                    
            <tr class="user-set">
                <td><i class="copy-paste-icon"
                     onclick="copyToClipboard('transformers',
                              this.parentElement.nextElementSibling)"
                ></i></td>
                <td class="param">
            <a class="param-doc-link"
                rel="noreferrer" target="_blank" href="https://scikit-learn.org/1.8/modules/generated/sklearn.compose.ColumnTransformer.html#:~:text=transformers,-list%20of%20tuples">
                transformers
                <span class="param-doc-description">transformers: list of tuples<br><br>List of (name, transformer, columns) tuples specifying the<br>transformer objects to be applied to subsets of the data.<br><br>name : str<br>    Like in Pipeline and FeatureUnion, this allows the transformer and<br>    its parameters to be set using ``set_params`` and searched in grid<br>    search.<br>transformer : {'drop', 'passthrough'} or estimator<br>    Estimator must support :term:`fit` and :term:`transform`.<br>    Special-cased strings 'drop' and 'passthrough' are accepted as<br>    well, to indicate to drop the columns or to pass them through<br>    untransformed, respectively.<br>columns :  str, array-like of str, int, array-like of int,                 array-like of bool, slice or callable<br>    Indexes the data on its second axis. Integers are interpreted as<br>    positional columns, while strings can reference DataFrame columns<br>    by name.  A scalar string or int should be used where<br>    ``transformer`` expects X to be a 1d array-like (vector),<br>    otherwise a 2d array will be passed to the transformer.<br>    A callable is passed the input data `X` and can return any of the<br>    above. To select multiple columns by name or dtype, you can use<br>    :obj:`make_column_selector`.</span>
            </a>
        </td>
                <td class="value">[(&#x27;numerical&#x27;, ...), (&#x27;high_cardinality&#x27;, ...), ...]</td>
            </tr>
    

            <tr class="default">
                <td><i class="copy-paste-icon"
                     onclick="copyToClipboard('remainder',
                              this.parentElement.nextElementSibling)"
                ></i></td>
                <td class="param">
            <a class="param-doc-link"
                rel="noreferrer" target="_blank" href="https://scikit-learn.org/1.8/modules/generated/sklearn.compose.ColumnTransformer.html#:~:text=remainder,-%7B%27drop%27%2C%20%27passthrough%27%7D%20or%20estimator%2C%20default%3D%27drop%27">
                remainder
                <span class="param-doc-description">remainder: {'drop', 'passthrough'} or estimator, default='drop'<br><br>By default, only the specified columns in `transformers` are<br>transformed and combined in the output, and the non-specified<br>columns are dropped. (default of ``'drop'``).<br>By specifying ``remainder='passthrough'``, all remaining columns that<br>were not specified in `transformers`, but present in the data passed<br>to `fit` will be automatically passed through. This subset of columns<br>is concatenated with the output of the transformers. For dataframes,<br>extra columns not seen during `fit` will be excluded from the output<br>of `transform`.<br>By setting ``remainder`` to be an estimator, the remaining<br>non-specified columns will use the ``remainder`` estimator. The<br>estimator must support :term:`fit` and :term:`transform`.<br>Note that using this feature requires that the DataFrame columns<br>input at :term:`fit` and :term:`transform` have identical order.</span>
            </a>
        </td>
                <td class="value">&#x27;drop&#x27;</td>
            </tr>
    

            <tr class="default">
                <td><i class="copy-paste-icon"
                     onclick="copyToClipboard('sparse_threshold',
                              this.parentElement.nextElementSibling)"
                ></i></td>
                <td class="param">
            <a class="param-doc-link"
                rel="noreferrer" target="_blank" href="https://scikit-learn.org/1.8/modules/generated/sklearn.compose.ColumnTransformer.html#:~:text=sparse_threshold,-float%2C%20default%3D0.3">
                sparse_threshold
                <span class="param-doc-description">sparse_threshold: float, default=0.3<br><br>If the output of the different transformers contains sparse matrices,<br>these will be stacked as a sparse matrix if the overall density is<br>lower than this value. Use ``sparse_threshold=0`` to always return<br>dense.  When the transformed output consists of all dense data, the<br>stacked result will be dense, and this keyword will be ignored.</span>
            </a>
        </td>
                <td class="value">0.3</td>
            </tr>
    

            <tr class="default">
                <td><i class="copy-paste-icon"
                     onclick="copyToClipboard('n_jobs',
                              this.parentElement.nextElementSibling)"
                ></i></td>
                <td class="param">
            <a class="param-doc-link"
                rel="noreferrer" target="_blank" href="https://scikit-learn.org/1.8/modules/generated/sklearn.compose.ColumnTransformer.html#:~:text=n_jobs,-int%2C%20default%3DNone">
                n_jobs
                <span class="param-doc-description">n_jobs: int, default=None<br><br>Number of jobs to run in parallel.<br>``None`` means 1 unless in a :obj:`joblib.parallel_backend` context.<br>``-1`` means using all processors. See :term:`Glossary <n_jobs>`<br>for more details.</span>
            </a>
        </td>
                <td class="value">None</td>
            </tr>
    

            <tr class="default">
                <td><i class="copy-paste-icon"
                     onclick="copyToClipboard('transformer_weights',
                              this.parentElement.nextElementSibling)"
                ></i></td>
                <td class="param">
            <a class="param-doc-link"
                rel="noreferrer" target="_blank" href="https://scikit-learn.org/1.8/modules/generated/sklearn.compose.ColumnTransformer.html#:~:text=transformer_weights,-dict%2C%20default%3DNone">
                transformer_weights
                <span class="param-doc-description">transformer_weights: dict, default=None<br><br>Multiplicative weights for features per transformer. The output of the<br>transformer is multiplied by these weights. Keys are transformer names,<br>values the weights.</span>
            </a>
        </td>
                <td class="value">None</td>
            </tr>
    

            <tr class="default">
                <td><i class="copy-paste-icon"
                     onclick="copyToClipboard('verbose',
                              this.parentElement.nextElementSibling)"
                ></i></td>
                <td class="param">
            <a class="param-doc-link"
                rel="noreferrer" target="_blank" href="https://scikit-learn.org/1.8/modules/generated/sklearn.compose.ColumnTransformer.html#:~:text=verbose,-bool%2C%20default%3DFalse">
                verbose
                <span class="param-doc-description">verbose: bool, default=False<br><br>If True, the time elapsed while fitting each transformer will be<br>printed as it is completed.</span>
            </a>
        </td>
                <td class="value">False</td>
            </tr>
    

            <tr class="user-set">
                <td><i class="copy-paste-icon"
                     onclick="copyToClipboard('verbose_feature_names_out',
                              this.parentElement.nextElementSibling)"
                ></i></td>
                <td class="param">
            <a class="param-doc-link"
                rel="noreferrer" target="_blank" href="https://scikit-learn.org/1.8/modules/generated/sklearn.compose.ColumnTransformer.html#:~:text=verbose_feature_names_out,-bool%2C%20str%20or%20Callable%5B%5Bstr%2C%20str%5D%2C%20str%5D%2C%20default%3DTrue">
                verbose_feature_names_out
                <span class="param-doc-description">verbose_feature_names_out: bool, str or Callable[[str, str], str], default=True<br><br>- If True, :meth:`ColumnTransformer.get_feature_names_out` will prefix<br>  all feature names with the name of the transformer that generated that<br>  feature. It is equivalent to setting<br>  `verbose_feature_names_out="{transformer_name}__{feature_name}"`.<br>- If False, :meth:`ColumnTransformer.get_feature_names_out` will not<br>  prefix any feature names and will error if feature names are not<br>  unique.<br>- If ``Callable[[str, str], str]``,<br>  :meth:`ColumnTransformer.get_feature_names_out` will rename all the features<br>  using the name of the transformer. The first argument of the callable is the<br>  transformer name and the second argument is the feature name. The returned<br>  string will be the new feature name.<br>- If ``str``, it must be a string ready for formatting. The given string will<br>  be formatted using two field names: ``transformer_name`` and ``feature_name``.<br>  e.g. ``"{feature_name}__{transformer_name}"``. See :meth:`str.format` method<br>  from the standard library for more info.<br><br>.. versionadded:: 1.0<br><br>.. versionchanged:: 1.6<br>    `verbose_feature_names_out` can be a callable or a string to be formatted.</span>
            </a>
        </td>
                <td class="value">False</td>
            </tr>
    

            <tr class="default">
                <td><i class="copy-paste-icon"
                     onclick="copyToClipboard('force_int_remainder_cols',
                              this.parentElement.nextElementSibling)"
                ></i></td>
                <td class="param">
            <a class="param-doc-link"
                rel="noreferrer" target="_blank" href="https://scikit-learn.org/1.8/modules/generated/sklearn.compose.ColumnTransformer.html#:~:text=force_int_remainder_cols,-bool%2C%20default%3DFalse">
                force_int_remainder_cols
                <span class="param-doc-description">force_int_remainder_cols: bool, default=False<br><br>This parameter has no effect.<br><br>.. note::<br>    If you do not access the list of columns for the remainder columns<br>    in the `transformers_` fitted attribute, you do not need to set<br>    this parameter.<br><br>.. versionadded:: 1.5<br><br>.. versionchanged:: 1.7<br>   The default value for `force_int_remainder_cols` will change from<br>   `True` to `False` in version 1.7.<br><br>.. deprecated:: 1.7<br>   `force_int_remainder_cols` is deprecated and will be removed in 1.9.</span>
            </a>
        </td>
                <td class="value">&#x27;deprecated&#x27;</td>
            </tr>
    
                      </tbody>
                    </table>
                </details>
            </div>
        </div></div></div><div class="sk-parallel"><div class="sk-parallel-item"><div class="sk-item"><div class="sk-label-container"><div class="sk-label  sk-toggleable"><input class="sk-toggleable__control sk-hidden--visually" id="sk-estimator-id-348" type="checkbox" ><label for="sk-estimator-id-348" class="sk-toggleable__label  sk-toggleable__label-arrow"><div><div>numerical</div></div></label><div class="sk-toggleable__content " data-param-prefix="columntransformer__numerical__"><pre>[&#x27;price&#x27;]</pre></div></div></div><div class="sk-serial"><div class="sk-item"><div class="sk-estimator  sk-toggleable"><input class="sk-toggleable__control sk-hidden--visually" id="sk-estimator-id-349" type="checkbox" ><label for="sk-estimator-id-349" class="sk-toggleable__label  sk-toggleable__label-arrow"><div><div>passthrough</div></div></label><div class="sk-toggleable__content " data-param-prefix="columntransformer__numerical__"><pre>passthrough</pre></div></div></div></div></div></div><div class="sk-parallel-item"><div class="sk-item"><div class="sk-label-container"><div class="sk-label  sk-toggleable"><input class="sk-toggleable__control sk-hidden--visually" id="sk-estimator-id-350" type="checkbox" ><label for="sk-estimator-id-350" class="sk-toggleable__label  sk-toggleable__label-arrow"><div><div>high_cardinality</div></div></label><div class="sk-toggleable__content " data-param-prefix="columntransformer__high_cardinality__"><pre>Index([&#x27;winery&#x27;, &#x27;region_1&#x27;, &#x27;variety&#x27;, &#x27;province&#x27;], dtype=&#x27;object&#x27;)</pre></div></div></div><div class="sk-serial"><div class="sk-item"><div class="sk-estimator  sk-toggleable"><input class="sk-toggleable__control sk-hidden--visually" id="sk-estimator-id-351" type="checkbox" ><label for="sk-estimator-id-351" class="sk-toggleable__label  sk-toggleable__label-arrow"><div><div>TargetEncoder</div></div><div><a class="sk-estimator-doc-link " rel="noreferrer" target="_blank" href="https://scikit-learn.org/1.8/modules/generated/sklearn.preprocessing.TargetEncoder.html">?<span>Documentation for TargetEncoder</span></a></div></label><div class="sk-toggleable__content " data-param-prefix="columntransformer__high_cardinality__">
            <div class="estimator-table">
                <details>
                    <summary>Parameters</summary>
                    <table class="parameters-table">
                      <tbody>
                    
            <tr class="default">
                <td><i class="copy-paste-icon"
                     onclick="copyToClipboard('categories',
                              this.parentElement.nextElementSibling)"
                ></i></td>
                <td class="param">
            <a class="param-doc-link"
                rel="noreferrer" target="_blank" href="https://scikit-learn.org/1.8/modules/generated/sklearn.preprocessing.TargetEncoder.html#:~:text=categories,-%22auto%22%20or%20list%20of%20shape%20%28n_features%2C%29%20of%20array-like%2C%20default%3D%22auto%22">
                categories
                <span class="param-doc-description">categories: "auto" or list of shape (n_features,) of array-like, default="auto"<br><br>Categories (unique values) per feature:<br><br>- `"auto"` : Determine categories automatically from the training data.<br>- list : `categories[i]` holds the categories expected in the i-th column. The<br>  passed categories should not mix strings and numeric values within a single<br>  feature, and should be sorted in case of numeric values.<br><br>The used categories are stored in the `categories_` fitted attribute.</span>
            </a>
        </td>
                <td class="value">&#x27;auto&#x27;</td>
            </tr>
    

            <tr class="user-set">
                <td><i class="copy-paste-icon"
                     onclick="copyToClipboard('target_type',
                              this.parentElement.nextElementSibling)"
                ></i></td>
                <td class="param">
            <a class="param-doc-link"
                rel="noreferrer" target="_blank" href="https://scikit-learn.org/1.8/modules/generated/sklearn.preprocessing.TargetEncoder.html#:~:text=target_type,-%7B%22auto%22%2C%20%22continuous%22%2C%20%22binary%22%2C%20%22multiclass%22%7D%2C%20default%3D%22auto%22">
                target_type
                <span class="param-doc-description">target_type: {"auto", "continuous", "binary", "multiclass"}, default="auto"<br><br>Type of target.<br><br>- `"auto"` : Type of target is inferred with<br>  :func:`~sklearn.utils.multiclass.type_of_target`.<br>- `"continuous"` : Continuous target<br>- `"binary"` : Binary target<br>- `"multiclass"` : Multiclass target<br><br>.. note::<br>    The type of target inferred with `"auto"` may not be the desired target<br>    type used for modeling. For example, if the target consisted of integers<br>    between 0 and 100, then :func:`~sklearn.utils.multiclass.type_of_target`<br>    will infer the target as `"multiclass"`. In this case, setting<br>    `target_type="continuous"` will specify the target as a regression<br>    problem. The `target_type_` attribute gives the target type used by the<br>    encoder.<br><br>.. versionchanged:: 1.4<br>   Added the option 'multiclass'.</span>
            </a>
        </td>
                <td class="value">&#x27;continuous&#x27;</td>
            </tr>
    

            <tr class="default">
                <td><i class="copy-paste-icon"
                     onclick="copyToClipboard('smooth',
                              this.parentElement.nextElementSibling)"
                ></i></td>
                <td class="param">
            <a class="param-doc-link"
                rel="noreferrer" target="_blank" href="https://scikit-learn.org/1.8/modules/generated/sklearn.preprocessing.TargetEncoder.html#:~:text=smooth,-%22auto%22%20or%20float%2C%20default%3D%22auto%22">
                smooth
                <span class="param-doc-description">smooth: "auto" or float, default="auto"<br><br>The amount of mixing of the target mean conditioned on the value of the<br>category with the global target mean. A larger `smooth` value will put<br>more weight on the global target mean.<br>If `"auto"`, then `smooth` is set to an empirical Bayes estimate.</span>
            </a>
        </td>
                <td class="value">&#x27;auto&#x27;</td>
            </tr>
    

            <tr class="default">
                <td><i class="copy-paste-icon"
                     onclick="copyToClipboard('cv',
                              this.parentElement.nextElementSibling)"
                ></i></td>
                <td class="param">
            <a class="param-doc-link"
                rel="noreferrer" target="_blank" href="https://scikit-learn.org/1.8/modules/generated/sklearn.preprocessing.TargetEncoder.html#:~:text=cv,-int%2C%20default%3D5">
                cv
                <span class="param-doc-description">cv: int, default=5<br><br>Determines the number of folds in the :term:`cross fitting` strategy used in<br>:meth:`fit_transform`. For classification targets, `StratifiedKFold` is used<br>and for continuous targets, `KFold` is used.</span>
            </a>
        </td>
                <td class="value">5</td>
            </tr>
    

            <tr class="default">
                <td><i class="copy-paste-icon"
                     onclick="copyToClipboard('shuffle',
                              this.parentElement.nextElementSibling)"
                ></i></td>
                <td class="param">
            <a class="param-doc-link"
                rel="noreferrer" target="_blank" href="https://scikit-learn.org/1.8/modules/generated/sklearn.preprocessing.TargetEncoder.html#:~:text=shuffle,-bool%2C%20default%3DTrue">
                shuffle
                <span class="param-doc-description">shuffle: bool, default=True<br><br>Whether to shuffle the data in :meth:`fit_transform` before splitting into<br>folds. Note that the samples within each split will not be shuffled.</span>
            </a>
        </td>
                <td class="value">True</td>
            </tr>
    

            <tr class="default">
                <td><i class="copy-paste-icon"
                     onclick="copyToClipboard('random_state',
                              this.parentElement.nextElementSibling)"
                ></i></td>
                <td class="param">
            <a class="param-doc-link"
                rel="noreferrer" target="_blank" href="https://scikit-learn.org/1.8/modules/generated/sklearn.preprocessing.TargetEncoder.html#:~:text=random_state,-int%2C%20RandomState%20instance%20or%20None%2C%20default%3DNone">
                random_state
                <span class="param-doc-description">random_state: int, RandomState instance or None, default=None<br><br>When `shuffle` is True, `random_state` affects the ordering of the<br>indices, which controls the randomness of each fold. Otherwise, this<br>parameter has no effect.<br>Pass an int for reproducible output across multiple function calls.<br>See :term:`Glossary <random_state>`.</span>
            </a>
        </td>
                <td class="value">None</td>
            </tr>
    
                      </tbody>
                    </table>
                </details>
            </div>
        </div></div></div></div></div></div><div class="sk-parallel-item"><div class="sk-item"><div class="sk-label-container"><div class="sk-label  sk-toggleable"><input class="sk-toggleable__control sk-hidden--visually" id="sk-estimator-id-352" type="checkbox" ><label for="sk-estimator-id-352" class="sk-toggleable__label  sk-toggleable__label-arrow"><div><div>low_cardinality</div></div></label><div class="sk-toggleable__content " data-param-prefix="columntransformer__low_cardinality__"><pre>Index([&#x27;country&#x27;, &#x27;region_2&#x27;], dtype=&#x27;object&#x27;)</pre></div></div></div><div class="sk-serial"><div class="sk-item"><div class="sk-estimator  sk-toggleable"><input class="sk-toggleable__control sk-hidden--visually" id="sk-estimator-id-353" type="checkbox" ><label for="sk-estimator-id-353" class="sk-toggleable__label  sk-toggleable__label-arrow"><div><div>OrdinalEncoder</div></div><div><a class="sk-estimator-doc-link " rel="noreferrer" target="_blank" href="https://scikit-learn.org/1.8/modules/generated/sklearn.preprocessing.OrdinalEncoder.html">?<span>Documentation for OrdinalEncoder</span></a></div></label><div class="sk-toggleable__content " data-param-prefix="columntransformer__low_cardinality__">
            <div class="estimator-table">
                <details>
                    <summary>Parameters</summary>
                    <table class="parameters-table">
                      <tbody>
                    
            <tr class="default">
                <td><i class="copy-paste-icon"
                     onclick="copyToClipboard('categories',
                              this.parentElement.nextElementSibling)"
                ></i></td>
                <td class="param">
            <a class="param-doc-link"
                rel="noreferrer" target="_blank" href="https://scikit-learn.org/1.8/modules/generated/sklearn.preprocessing.OrdinalEncoder.html#:~:text=categories,-%27auto%27%20or%20a%20list%20of%20array-like%2C%20default%3D%27auto%27">
                categories
                <span class="param-doc-description">categories: 'auto' or a list of array-like, default='auto'<br><br>Categories (unique values) per feature:<br><br>- 'auto' : Determine categories automatically from the training data.<br>- list : ``categories[i]`` holds the categories expected in the ith<br>  column. The passed categories should not mix strings and numeric<br>  values, and should be sorted in case of numeric values.<br><br>The used categories can be found in the ``categories_`` attribute.</span>
            </a>
        </td>
                <td class="value">&#x27;auto&#x27;</td>
            </tr>
    

            <tr class="default">
                <td><i class="copy-paste-icon"
                     onclick="copyToClipboard('dtype',
                              this.parentElement.nextElementSibling)"
                ></i></td>
                <td class="param">
            <a class="param-doc-link"
                rel="noreferrer" target="_blank" href="https://scikit-learn.org/1.8/modules/generated/sklearn.preprocessing.OrdinalEncoder.html#:~:text=dtype,-number%20type%2C%20default%3Dnp.float64">
                dtype
                <span class="param-doc-description">dtype: number type, default=np.float64<br><br>Desired dtype of output.</span>
            </a>
        </td>
                <td class="value">&lt;class &#x27;numpy.float64&#x27;&gt;</td>
            </tr>
    

            <tr class="user-set">
                <td><i class="copy-paste-icon"
                     onclick="copyToClipboard('handle_unknown',
                              this.parentElement.nextElementSibling)"
                ></i></td>
                <td class="param">
            <a class="param-doc-link"
                rel="noreferrer" target="_blank" href="https://scikit-learn.org/1.8/modules/generated/sklearn.preprocessing.OrdinalEncoder.html#:~:text=handle_unknown,-%7B%27error%27%2C%20%27use_encoded_value%27%7D%2C%20default%3D%27error%27">
                handle_unknown
                <span class="param-doc-description">handle_unknown: {'error', 'use_encoded_value'}, default='error'<br><br>When set to 'error' an error will be raised in case an unknown<br>categorical feature is present during transform. When set to<br>'use_encoded_value', the encoded value of unknown categories will be<br>set to the value given for the parameter `unknown_value`. In<br>:meth:`inverse_transform`, an unknown category will be denoted as None.<br><br>.. versionadded:: 0.24</span>
            </a>
        </td>
                <td class="value">&#x27;use_encoded_value&#x27;</td>
            </tr>
    

            <tr class="user-set">
                <td><i class="copy-paste-icon"
                     onclick="copyToClipboard('unknown_value',
                              this.parentElement.nextElementSibling)"
                ></i></td>
                <td class="param">
            <a class="param-doc-link"
                rel="noreferrer" target="_blank" href="https://scikit-learn.org/1.8/modules/generated/sklearn.preprocessing.OrdinalEncoder.html#:~:text=unknown_value,-int%20or%20np.nan%2C%20default%3DNone">
                unknown_value
                <span class="param-doc-description">unknown_value: int or np.nan, default=None<br><br>When the parameter handle_unknown is set to 'use_encoded_value', this<br>parameter is required and will set the encoded value of unknown<br>categories. It has to be distinct from the values used to encode any of<br>the categories in `fit`. If set to np.nan, the `dtype` parameter must<br>be a float dtype.<br><br>.. versionadded:: 0.24</span>
            </a>
        </td>
                <td class="value">-1</td>
            </tr>
    

            <tr class="default">
                <td><i class="copy-paste-icon"
                     onclick="copyToClipboard('encoded_missing_value',
                              this.parentElement.nextElementSibling)"
                ></i></td>
                <td class="param">
            <a class="param-doc-link"
                rel="noreferrer" target="_blank" href="https://scikit-learn.org/1.8/modules/generated/sklearn.preprocessing.OrdinalEncoder.html#:~:text=encoded_missing_value,-int%20or%20np.nan%2C%20default%3Dnp.nan">
                encoded_missing_value
                <span class="param-doc-description">encoded_missing_value: int or np.nan, default=np.nan<br><br>Encoded value of missing categories. If set to `np.nan`, then the `dtype`<br>parameter must be a float dtype.<br><br>.. versionadded:: 1.1</span>
            </a>
        </td>
                <td class="value">nan</td>
            </tr>
    

            <tr class="default">
                <td><i class="copy-paste-icon"
                     onclick="copyToClipboard('min_frequency',
                              this.parentElement.nextElementSibling)"
                ></i></td>
                <td class="param">
            <a class="param-doc-link"
                rel="noreferrer" target="_blank" href="https://scikit-learn.org/1.8/modules/generated/sklearn.preprocessing.OrdinalEncoder.html#:~:text=min_frequency,-int%20or%20float%2C%20default%3DNone">
                min_frequency
                <span class="param-doc-description">min_frequency: int or float, default=None<br><br>Specifies the minimum frequency below which a category will be<br>considered infrequent.<br><br>- If `int`, categories with a smaller cardinality will be considered<br>  infrequent.<br><br>- If `float`, categories with a smaller cardinality than<br>  `min_frequency * n_samples`  will be considered infrequent.<br><br>.. versionadded:: 1.3<br>    Read more in the :ref:`User Guide <encoder_infrequent_categories>`.</span>
            </a>
        </td>
                <td class="value">None</td>
            </tr>
    

            <tr class="default">
                <td><i class="copy-paste-icon"
                     onclick="copyToClipboard('max_categories',
                              this.parentElement.nextElementSibling)"
                ></i></td>
                <td class="param">
            <a class="param-doc-link"
                rel="noreferrer" target="_blank" href="https://scikit-learn.org/1.8/modules/generated/sklearn.preprocessing.OrdinalEncoder.html#:~:text=max_categories,-int%2C%20default%3DNone">
                max_categories
                <span class="param-doc-description">max_categories: int, default=None<br><br>Specifies an upper limit to the number of output categories for each input<br>feature when considering infrequent categories. If there are infrequent<br>categories, `max_categories` includes the category representing the<br>infrequent categories along with the frequent categories. If `None`,<br>there is no limit to the number of output features.<br><br>`max_categories` do **not** take into account missing or unknown<br>categories. Setting `unknown_value` or `encoded_missing_value` to an<br>integer will increase the number of unique integer codes by one each.<br>This can result in up to `max_categories + 2` integer codes.<br><br>.. versionadded:: 1.3<br>    Read more in the :ref:`User Guide <encoder_infrequent_categories>`.</span>
            </a>
        </td>
                <td class="value">None</td>
            </tr>
    
                      </tbody>
                    </table>
                </details>
            </div>
        </div></div></div></div></div></div></div></div><div class="sk-item"><div class="sk-estimator  sk-toggleable"><input class="sk-toggleable__control sk-hidden--visually" id="sk-estimator-id-354" type="checkbox" ><label for="sk-estimator-id-354" class="sk-toggleable__label  sk-toggleable__label-arrow"><div><div>HistGradientBoostingRegressor</div></div><div><a class="sk-estimator-doc-link " rel="noreferrer" target="_blank" href="https://scikit-learn.org/1.8/modules/generated/sklearn.ensemble.HistGradientBoostingRegressor.html">?<span>Documentation for HistGradientBoostingRegressor</span></a></div></label><div class="sk-toggleable__content " data-param-prefix="histgradientboostingregressor__">
            <div class="estimator-table">
                <details>
                    <summary>Parameters</summary>
                    <table class="parameters-table">
                      <tbody>
                    
            <tr class="default">
                <td><i class="copy-paste-icon"
                     onclick="copyToClipboard('loss',
                              this.parentElement.nextElementSibling)"
                ></i></td>
                <td class="param">
            <a class="param-doc-link"
                rel="noreferrer" target="_blank" href="https://scikit-learn.org/1.8/modules/generated/sklearn.ensemble.HistGradientBoostingRegressor.html#:~:text=loss,-%7B%27squared_error%27%2C%20%27absolute_error%27%2C%20%27gamma%27%2C%20%27poisson%27%2C%20%27quantile%27%7D%2C%20%20%20%20%20%20%20%20%20%20%20%20%20default%3D%27squared_error%27">
                loss
                <span class="param-doc-description">loss: {'squared_error', 'absolute_error', 'gamma', 'poisson', 'quantile'},             default='squared_error'<br><br>The loss function to use in the boosting process. Note that the<br>"squared error", "gamma" and "poisson" losses actually implement<br>"half least squares loss", "half gamma deviance" and "half poisson<br>deviance" to simplify the computation of the gradient. Furthermore,<br>"gamma" and "poisson" losses internally use a log-link, "gamma"<br>requires ``y > 0`` and "poisson" requires ``y >= 0``.<br>"quantile" uses the pinball loss.<br><br>.. versionchanged:: 0.23<br>   Added option 'poisson'.<br><br>.. versionchanged:: 1.1<br>   Added option 'quantile'.<br><br>.. versionchanged:: 1.3<br>   Added option 'gamma'.</span>
            </a>
        </td>
                <td class="value">&#x27;squared_error&#x27;</td>
            </tr>
    

            <tr class="default">
                <td><i class="copy-paste-icon"
                     onclick="copyToClipboard('quantile',
                              this.parentElement.nextElementSibling)"
                ></i></td>
                <td class="param">
            <a class="param-doc-link"
                rel="noreferrer" target="_blank" href="https://scikit-learn.org/1.8/modules/generated/sklearn.ensemble.HistGradientBoostingRegressor.html#:~:text=quantile,-float%2C%20default%3DNone">
                quantile
                <span class="param-doc-description">quantile: float, default=None<br><br>If loss is "quantile", this parameter specifies which quantile to be estimated<br>and must be between 0 and 1.</span>
            </a>
        </td>
                <td class="value">None</td>
            </tr>
    

            <tr class="default">
                <td><i class="copy-paste-icon"
                     onclick="copyToClipboard('learning_rate',
                              this.parentElement.nextElementSibling)"
                ></i></td>
                <td class="param">
            <a class="param-doc-link"
                rel="noreferrer" target="_blank" href="https://scikit-learn.org/1.8/modules/generated/sklearn.ensemble.HistGradientBoostingRegressor.html#:~:text=learning_rate,-float%2C%20default%3D0.1">
                learning_rate
                <span class="param-doc-description">learning_rate: float, default=0.1<br><br>The learning rate, also known as *shrinkage*. This is used as a<br>multiplicative factor for the leaves values. Use ``1`` for no<br>shrinkage.</span>
            </a>
        </td>
                <td class="value">0.1</td>
            </tr>
    

            <tr class="user-set">
                <td><i class="copy-paste-icon"
                     onclick="copyToClipboard('max_iter',
                              this.parentElement.nextElementSibling)"
                ></i></td>
                <td class="param">
            <a class="param-doc-link"
                rel="noreferrer" target="_blank" href="https://scikit-learn.org/1.8/modules/generated/sklearn.ensemble.HistGradientBoostingRegressor.html#:~:text=max_iter,-int%2C%20default%3D100">
                max_iter
                <span class="param-doc-description">max_iter: int, default=100<br><br>The maximum number of iterations of the boosting process, i.e. the<br>maximum number of trees.</span>
            </a>
        </td>
                <td class="value">20</td>
            </tr>
    

            <tr class="default">
                <td><i class="copy-paste-icon"
                     onclick="copyToClipboard('max_leaf_nodes',
                              this.parentElement.nextElementSibling)"
                ></i></td>
                <td class="param">
            <a class="param-doc-link"
                rel="noreferrer" target="_blank" href="https://scikit-learn.org/1.8/modules/generated/sklearn.ensemble.HistGradientBoostingRegressor.html#:~:text=max_leaf_nodes,-int%20or%20None%2C%20default%3D31">
                max_leaf_nodes
                <span class="param-doc-description">max_leaf_nodes: int or None, default=31<br><br>The maximum number of leaves for each tree. Must be strictly greater<br>than 1. If None, there is no maximum limit.</span>
            </a>
        </td>
                <td class="value">31</td>
            </tr>
    

            <tr class="default">
                <td><i class="copy-paste-icon"
                     onclick="copyToClipboard('max_depth',
                              this.parentElement.nextElementSibling)"
                ></i></td>
                <td class="param">
            <a class="param-doc-link"
                rel="noreferrer" target="_blank" href="https://scikit-learn.org/1.8/modules/generated/sklearn.ensemble.HistGradientBoostingRegressor.html#:~:text=max_depth,-int%20or%20None%2C%20default%3DNone">
                max_depth
                <span class="param-doc-description">max_depth: int or None, default=None<br><br>The maximum depth of each tree. The depth of a tree is the number of<br>edges to go from the root to the deepest leaf.<br>Depth isn't constrained by default.</span>
            </a>
        </td>
                <td class="value">None</td>
            </tr>
    

            <tr class="default">
                <td><i class="copy-paste-icon"
                     onclick="copyToClipboard('min_samples_leaf',
                              this.parentElement.nextElementSibling)"
                ></i></td>
                <td class="param">
            <a class="param-doc-link"
                rel="noreferrer" target="_blank" href="https://scikit-learn.org/1.8/modules/generated/sklearn.ensemble.HistGradientBoostingRegressor.html#:~:text=min_samples_leaf,-int%2C%20default%3D20">
                min_samples_leaf
                <span class="param-doc-description">min_samples_leaf: int, default=20<br><br>The minimum number of samples per leaf. For small datasets with less<br>than a few hundred samples, it is recommended to lower this value<br>since only very shallow trees would be built.</span>
            </a>
        </td>
                <td class="value">20</td>
            </tr>
    

            <tr class="default">
                <td><i class="copy-paste-icon"
                     onclick="copyToClipboard('l2_regularization',
                              this.parentElement.nextElementSibling)"
                ></i></td>
                <td class="param">
            <a class="param-doc-link"
                rel="noreferrer" target="_blank" href="https://scikit-learn.org/1.8/modules/generated/sklearn.ensemble.HistGradientBoostingRegressor.html#:~:text=l2_regularization,-float%2C%20default%3D0">
                l2_regularization
                <span class="param-doc-description">l2_regularization: float, default=0<br><br>The L2 regularization parameter penalizing leaves with small hessians.<br>Use ``0`` for no regularization (default).</span>
            </a>
        </td>
                <td class="value">0.0</td>
            </tr>
    

            <tr class="default">
                <td><i class="copy-paste-icon"
                     onclick="copyToClipboard('max_features',
                              this.parentElement.nextElementSibling)"
                ></i></td>
                <td class="param">
            <a class="param-doc-link"
                rel="noreferrer" target="_blank" href="https://scikit-learn.org/1.8/modules/generated/sklearn.ensemble.HistGradientBoostingRegressor.html#:~:text=max_features,-float%2C%20default%3D1.0">
                max_features
                <span class="param-doc-description">max_features: float, default=1.0<br><br>Proportion of randomly chosen features in each and every node split.<br>This is a form of regularization, smaller values make the trees weaker<br>learners and might prevent overfitting.<br>If interaction constraints from `interaction_cst` are present, only allowed<br>features are taken into account for the subsampling.<br><br>.. versionadded:: 1.4</span>
            </a>
        </td>
                <td class="value">1.0</td>
            </tr>
    

            <tr class="default">
                <td><i class="copy-paste-icon"
                     onclick="copyToClipboard('max_bins',
                              this.parentElement.nextElementSibling)"
                ></i></td>
                <td class="param">
            <a class="param-doc-link"
                rel="noreferrer" target="_blank" href="https://scikit-learn.org/1.8/modules/generated/sklearn.ensemble.HistGradientBoostingRegressor.html#:~:text=max_bins,-int%2C%20default%3D255">
                max_bins
                <span class="param-doc-description">max_bins: int, default=255<br><br>The maximum number of bins to use for non-missing values. Before<br>training, each feature of the input array `X` is binned into<br>integer-valued bins, which allows for a much faster training stage.<br>Features with a small number of unique values may use less than<br>``max_bins`` bins. In addition to the ``max_bins`` bins, one more bin<br>is always reserved for missing values. Must be no larger than 255.</span>
            </a>
        </td>
                <td class="value">255</td>
            </tr>
    

            <tr class="user-set">
                <td><i class="copy-paste-icon"
                     onclick="copyToClipboard('categorical_features',
                              this.parentElement.nextElementSibling)"
                ></i></td>
                <td class="param">
            <a class="param-doc-link"
                rel="noreferrer" target="_blank" href="https://scikit-learn.org/1.8/modules/generated/sklearn.ensemble.HistGradientBoostingRegressor.html#:~:text=categorical_features,-array-like%20of%20%7Bbool%2C%20int%2C%20str%7D%20of%20shape%20%28n_features%29%20%20%20%20%20%20%20%20%20%20%20%20%20or%20shape%20%28n_categorical_features%2C%29%2C%20default%3D%27from_dtype%27">
                categorical_features
                <span class="param-doc-description">categorical_features: array-like of {bool, int, str} of shape (n_features)             or shape (n_categorical_features,), default='from_dtype'<br><br>Indicates the categorical features.<br><br>- None : no feature will be considered categorical.<br>- boolean array-like : boolean mask indicating categorical features.<br>- integer array-like : integer indices indicating categorical<br>  features.<br>- str array-like: names of categorical features (assuming the training<br>  data has feature names).<br>- `"from_dtype"`: dataframe columns with dtype "category" are<br>  considered to be categorical features. The input must be an object<br>  exposing a ``__dataframe__`` method such as pandas or polars<br>  DataFrames to use this feature.<br><br>For each categorical feature, there must be at most `max_bins` unique<br>categories. Negative values for categorical features encoded as numeric<br>dtypes are treated as missing values. All categorical values are<br>converted to floating point numbers. This means that categorical values<br>of 1.0 and 1 are treated as the same category.<br><br>Read more in the :ref:`User Guide <categorical_support_gbdt>` and<br>:ref:`sphx_glr_auto_examples_ensemble_plot_gradient_boosting_categorical.py`.<br><br>.. versionadded:: 0.24<br><br>.. versionchanged:: 1.2<br>   Added support for feature names.<br><br>.. versionchanged:: 1.4<br>   Added `"from_dtype"` option.<br><br>.. versionchanged:: 1.6<br>   The default value changed from `None` to `"from_dtype"`.</span>
            </a>
        </td>
                <td class="value">Index([&#x27;count...type=&#x27;object&#x27;)</td>
            </tr>
    

            <tr class="default">
                <td><i class="copy-paste-icon"
                     onclick="copyToClipboard('monotonic_cst',
                              this.parentElement.nextElementSibling)"
                ></i></td>
                <td class="param">
            <a class="param-doc-link"
                rel="noreferrer" target="_blank" href="https://scikit-learn.org/1.8/modules/generated/sklearn.ensemble.HistGradientBoostingRegressor.html#:~:text=monotonic_cst,-array-like%20of%20int%20of%20shape%20%28n_features%29%20or%20dict%2C%20default%3DNone">
                monotonic_cst
                <span class="param-doc-description">monotonic_cst: array-like of int of shape (n_features) or dict, default=None<br><br>Monotonic constraint to enforce on each feature are specified using the<br>following integer values:<br><br>- 1: monotonic increase<br>- 0: no constraint<br>- -1: monotonic decrease<br><br>If a dict with str keys, map feature to monotonic constraints by name.<br>If an array, the features are mapped to constraints by position. See<br>:ref:`monotonic_cst_features_names` for a usage example.<br><br>Read more in the :ref:`User Guide <monotonic_cst_gbdt>`.<br><br>.. versionadded:: 0.23<br><br>.. versionchanged:: 1.2<br>   Accept dict of constraints with feature names as keys.</span>
            </a>
        </td>
                <td class="value">None</td>
            </tr>
    

            <tr class="default">
                <td><i class="copy-paste-icon"
                     onclick="copyToClipboard('interaction_cst',
                              this.parentElement.nextElementSibling)"
                ></i></td>
                <td class="param">
            <a class="param-doc-link"
                rel="noreferrer" target="_blank" href="https://scikit-learn.org/1.8/modules/generated/sklearn.ensemble.HistGradientBoostingRegressor.html#:~:text=interaction_cst,-%7B%22pairwise%22%2C%20%22no_interactions%22%7D%20or%20sequence%20of%20lists/tuples/sets%20%20%20%20%20%20%20%20%20%20%20%20%20of%20int%2C%20default%3DNone">
                interaction_cst
                <span class="param-doc-description">interaction_cst: {"pairwise", "no_interactions"} or sequence of lists/tuples/sets             of int, default=None<br><br>Specify interaction constraints, the sets of features which can<br>interact with each other in child node splits.<br><br>Each item specifies the set of feature indices that are allowed<br>to interact with each other. If there are more features than<br>specified in these constraints, they are treated as if they were<br>specified as an additional set.<br><br>The strings "pairwise" and "no_interactions" are shorthands for<br>allowing only pairwise or no interactions, respectively.<br><br>For instance, with 5 features in total, `interaction_cst=[{0, 1}]`<br>is equivalent to `interaction_cst=[{0, 1}, {2, 3, 4}]`,<br>and specifies that each branch of a tree will either only split<br>on features 0 and 1 or only split on features 2, 3 and 4.<br><br>See :ref:`this example<ice-vs-pdp>` on how to use `interaction_cst`.<br><br>.. versionadded:: 1.2</span>
            </a>
        </td>
                <td class="value">None</td>
            </tr>
    

            <tr class="default">
                <td><i class="copy-paste-icon"
                     onclick="copyToClipboard('warm_start',
                              this.parentElement.nextElementSibling)"
                ></i></td>
                <td class="param">
            <a class="param-doc-link"
                rel="noreferrer" target="_blank" href="https://scikit-learn.org/1.8/modules/generated/sklearn.ensemble.HistGradientBoostingRegressor.html#:~:text=warm_start,-bool%2C%20default%3DFalse">
                warm_start
                <span class="param-doc-description">warm_start: bool, default=False<br><br>When set to ``True``, reuse the solution of the previous call to fit<br>and add more estimators to the ensemble. For results to be valid, the<br>estimator should be re-trained on the same data only.<br>See :term:`the Glossary <warm_start>`.</span>
            </a>
        </td>
                <td class="value">False</td>
            </tr>
    

            <tr class="default">
                <td><i class="copy-paste-icon"
                     onclick="copyToClipboard('early_stopping',
                              this.parentElement.nextElementSibling)"
                ></i></td>
                <td class="param">
            <a class="param-doc-link"
                rel="noreferrer" target="_blank" href="https://scikit-learn.org/1.8/modules/generated/sklearn.ensemble.HistGradientBoostingRegressor.html#:~:text=early_stopping,-%27auto%27%20or%20bool%2C%20default%3D%27auto%27">
                early_stopping
                <span class="param-doc-description">early_stopping: 'auto' or bool, default='auto'<br><br>If 'auto', early stopping is enabled if the sample size is larger than<br>10000 or if `X_val` and `y_val` are passed to `fit`. If True, early stopping<br>is enabled, otherwise early stopping is disabled.<br><br>.. versionadded:: 0.23</span>
            </a>
        </td>
                <td class="value">&#x27;auto&#x27;</td>
            </tr>
    

            <tr class="default">
                <td><i class="copy-paste-icon"
                     onclick="copyToClipboard('scoring',
                              this.parentElement.nextElementSibling)"
                ></i></td>
                <td class="param">
            <a class="param-doc-link"
                rel="noreferrer" target="_blank" href="https://scikit-learn.org/1.8/modules/generated/sklearn.ensemble.HistGradientBoostingRegressor.html#:~:text=scoring,-str%20or%20callable%20or%20None%2C%20default%3D%27loss%27">
                scoring
                <span class="param-doc-description">scoring: str or callable or None, default='loss'<br><br>Scoring method to use for early stopping. Only used if `early_stopping`<br>is enabled. Options:<br><br>- str: see :ref:`scoring_string_names` for options.<br>- callable: a scorer callable object (e.g., function) with signature<br>  ``scorer(estimator, X, y)``. See :ref:`scoring_callable` for details.<br>- `None`: the :ref:`coefficient of determination <r2_score>`<br>  (:math:`R^2`) is used.<br>- 'loss': early stopping is checked w.r.t the loss value.</span>
            </a>
        </td>
                <td class="value">&#x27;loss&#x27;</td>
            </tr>
    

            <tr class="default">
                <td><i class="copy-paste-icon"
                     onclick="copyToClipboard('validation_fraction',
                              this.parentElement.nextElementSibling)"
                ></i></td>
                <td class="param">
            <a class="param-doc-link"
                rel="noreferrer" target="_blank" href="https://scikit-learn.org/1.8/modules/generated/sklearn.ensemble.HistGradientBoostingRegressor.html#:~:text=validation_fraction,-int%20or%20float%20or%20None%2C%20default%3D0.1">
                validation_fraction
                <span class="param-doc-description">validation_fraction: int or float or None, default=0.1<br><br>Proportion (or absolute size) of training data to set aside as<br>validation data for early stopping. If None, early stopping is done on<br>the training data.<br>The value is ignored if either early stopping is not performed, e.g.<br>`early_stopping=False`, or if `X_val` and `y_val` are passed to fit.</span>
            </a>
        </td>
                <td class="value">0.1</td>
            </tr>
    

            <tr class="default">
                <td><i class="copy-paste-icon"
                     onclick="copyToClipboard('n_iter_no_change',
                              this.parentElement.nextElementSibling)"
                ></i></td>
                <td class="param">
            <a class="param-doc-link"
                rel="noreferrer" target="_blank" href="https://scikit-learn.org/1.8/modules/generated/sklearn.ensemble.HistGradientBoostingRegressor.html#:~:text=n_iter_no_change,-int%2C%20default%3D10">
                n_iter_no_change
                <span class="param-doc-description">n_iter_no_change: int, default=10<br><br>Used to determine when to "early stop". The fitting process is<br>stopped when none of the last ``n_iter_no_change`` scores are better<br>than the ``n_iter_no_change - 1`` -th-to-last one, up to some<br>tolerance. Only used if early stopping is performed.</span>
            </a>
        </td>
                <td class="value">10</td>
            </tr>
    

            <tr class="default">
                <td><i class="copy-paste-icon"
                     onclick="copyToClipboard('tol',
                              this.parentElement.nextElementSibling)"
                ></i></td>
                <td class="param">
            <a class="param-doc-link"
                rel="noreferrer" target="_blank" href="https://scikit-learn.org/1.8/modules/generated/sklearn.ensemble.HistGradientBoostingRegressor.html#:~:text=tol,-float%2C%20default%3D1e-7">
                tol
                <span class="param-doc-description">tol: float, default=1e-7<br><br>The absolute tolerance to use when comparing scores during early<br>stopping. The higher the tolerance, the more likely we are to early<br>stop: higher tolerance means that it will be harder for subsequent<br>iterations to be considered an improvement upon the reference score.</span>
            </a>
        </td>
                <td class="value">1e-07</td>
            </tr>
    

            <tr class="default">
                <td><i class="copy-paste-icon"
                     onclick="copyToClipboard('verbose',
                              this.parentElement.nextElementSibling)"
                ></i></td>
                <td class="param">
            <a class="param-doc-link"
                rel="noreferrer" target="_blank" href="https://scikit-learn.org/1.8/modules/generated/sklearn.ensemble.HistGradientBoostingRegressor.html#:~:text=verbose,-int%2C%20default%3D0">
                verbose
                <span class="param-doc-description">verbose: int, default=0<br><br>The verbosity level. If not zero, print some information about the<br>fitting process. ``1`` prints only summary info, ``2`` prints info per<br>iteration.</span>
            </a>
        </td>
                <td class="value">0</td>
            </tr>
    

            <tr class="user-set">
                <td><i class="copy-paste-icon"
                     onclick="copyToClipboard('random_state',
                              this.parentElement.nextElementSibling)"
                ></i></td>
                <td class="param">
            <a class="param-doc-link"
                rel="noreferrer" target="_blank" href="https://scikit-learn.org/1.8/modules/generated/sklearn.ensemble.HistGradientBoostingRegressor.html#:~:text=random_state,-int%2C%20RandomState%20instance%20or%20None%2C%20default%3DNone">
                random_state
                <span class="param-doc-description">random_state: int, RandomState instance or None, default=None<br><br>Pseudo-random number generator to control the subsampling in the<br>binning process, and the train/validation data split if early stopping<br>is enabled.<br>Pass an int for reproducible output across multiple function calls.<br>See :term:`Glossary <random_state>`.</span>
            </a>
        </td>
                <td class="value">0</td>
            </tr>
    
                      </tbody>
                    </table>
                </details>
            </div>
        </div></div></div></div></div></div></div><script>function copyToClipboard(text, element) {
        // Get the parameter prefix from the closest toggleable content
        const toggleableContent = element.closest('.sk-toggleable__content');
        const paramPrefix = toggleableContent ? toggleableContent.dataset.paramPrefix : '';
        const fullParamName = paramPrefix ? `${paramPrefix}${text}` : text;

        const originalStyle = element.style;
        const computedStyle = window.getComputedStyle(element);
        const originalWidth = computedStyle.width;
        const originalHTML = element.innerHTML.replace('Copied!', '');

        navigator.clipboard.writeText(fullParamName)
            .then(() => {
                element.style.width = originalWidth;
                element.style.color = 'green';
                element.innerHTML = "Copied!";

                setTimeout(() => {
                    element.innerHTML = originalHTML;
                    element.style = originalStyle;
                }, 2000);
            })
            .catch(err => {
                console.error('Failed to copy:', err);
                element.style.color = 'red';
                element.innerHTML = "Failed!";
                setTimeout(() => {
                    element.innerHTML = originalHTML;
                    element.style = originalStyle;
                }, 2000);
            });
        return false;
    }

    document.querySelectorAll('.copy-paste-icon').forEach(function(element) {
        const toggleableContent = element.closest('.sk-toggleable__content');
        const paramPrefix = toggleableContent ? toggleableContent.dataset.paramPrefix : '';
        const paramName = element.parentElement.nextElementSibling
            .textContent.trim().split(' ')[0];
        const fullParamName = paramPrefix ? `${paramPrefix}${paramName}` : paramName;

        element.setAttribute('title', fullParamName);
    });


    /**
     * Adapted from Skrub
     * https://github.com/skrub-data/skrub/blob/403466d1d5d4dc76a7ef569b3f8228db59a31dc3/skrub/_reporting/_data/templates/report.js#L789
     * @returns "light" or "dark"
     */
    function detectTheme(element) {
        const body = document.querySelector('body');

        // Check VSCode theme
        const themeKindAttr = body.getAttribute('data-vscode-theme-kind');
        const themeNameAttr = body.getAttribute('data-vscode-theme-name');

        if (themeKindAttr && themeNameAttr) {
            const themeKind = themeKindAttr.toLowerCase();
            const themeName = themeNameAttr.toLowerCase();

            if (themeKind.includes("dark") || themeName.includes("dark")) {
                return "dark";
            }
            if (themeKind.includes("light") || themeName.includes("light")) {
                return "light";
            }
        }

        // Check Jupyter theme
        if (body.getAttribute('data-jp-theme-light') === 'false') {
            return 'dark';
        } else if (body.getAttribute('data-jp-theme-light') === 'true') {
            return 'light';
        }

        // Guess based on a parent element's color
        const color = window.getComputedStyle(element.parentNode, null).getPropertyValue('color');
        const match = color.match(/^rgb\s*\(\s*(\d+)\s*,\s*(\d+)\s*,\s*(\d+)\s*\)\s*$/i);
        if (match) {
            const [r, g, b] = [
                parseFloat(match[1]),
                parseFloat(match[2]),
                parseFloat(match[3])
            ];

            // https://en.wikipedia.org/wiki/HSL_and_HSV#Lightness
            const luma = 0.299 * r + 0.587 * g + 0.114 * b;

            if (luma > 180) {
                // If the text is very bright we have a dark theme
                return 'dark';
            }
            if (luma < 75) {
                // If the text is very dark we have a light theme
                return 'light';
            }
            // Otherwise fall back to the next heuristic.
        }

        // Fallback to system preference
        return window.matchMedia('(prefers-color-scheme: dark)').matches ? 'dark' : 'light';
    }


    function forceTheme(elementId) {
        const estimatorElement = document.querySelector(`#${elementId}`);
        if (estimatorElement === null) {
            console.error(`Element with id ${elementId} not found.`);
        } else {
            const theme = detectTheme(estimatorElement);
            estimatorElement.classList.add(theme);
        }
    }

    forceTheme('sk-container-id-80');</script></body>
    </div>
    <br />
    <br />

.. GENERATED FROM PYTHON SOURCE LINES 165-166

Finally, we evaluate the pipeline using cross validation and record the results:

.. GENERATED FROM PYTHON SOURCE LINES 166-168

.. code-block:: Python

    evaluate_model_and_store("mixed_target", mixed_pipe)








.. GENERATED FROM PYTHON SOURCE LINES 169-172

Plotting the Results
====================
In this section, we display the results by plotting the test and train scores:

.. GENERATED FROM PYTHON SOURCE LINES 172-204

.. code-block:: Python

    import matplotlib.pyplot as plt
    import pandas as pd

    results_df = (
        pd.DataFrame(results).set_index("preprocessor").sort_values("rmse_test_mean")
    )

    fig, (ax1, ax2) = plt.subplots(
        1, 2, figsize=(12, 8), sharey=True, constrained_layout=True
    )
    xticks = range(len(results_df))
    name_to_color = dict(
        zip((r["preprocessor"] for r in results), ["C0", "C1", "C2", "C3", "C4"])
    )

    for subset, ax in zip(["test", "train"], [ax1, ax2]):
        mean, std = f"rmse_{subset}_mean", f"rmse_{subset}_std"
        data = results_df[[mean, std]].sort_values(mean)
        ax.bar(
            x=xticks,
            height=data[mean],
            yerr=data[std],
            width=0.9,
            color=[name_to_color[name] for name in data.index],
        )
        ax.set(
            title=f"RMSE ({subset.title()})",
            xlabel="Encoding Scheme",
            xticks=xticks,
            xticklabels=data.index,
        )




.. image-sg:: /auto_examples/preprocessing/images/sphx_glr_plot_target_encoder_002.png
   :alt: RMSE (Test), RMSE (Train)
   :srcset: /auto_examples/preprocessing/images/sphx_glr_plot_target_encoder_002.png
   :class: sphx-glr-single-img





.. GENERATED FROM PYTHON SOURCE LINES 205-229

When evaluating the predictive performance on the test set, dropping the
categories perform the worst and the target encoders performs the best. This
can be explained as follows:

- Dropping the categorical features makes the pipeline less expressive and
  underfitting as a result;
- Due to the high cardinality and to reduce the training time, the one-hot
  encoding scheme uses `max_categories=20` which prevents the features from
  expanding too much, which can result in underfitting.
- If we had not set `max_categories=20`, the one-hot encoding scheme would have
  likely made the pipeline overfitting as the number of features explodes with rare
  category occurrences that are correlated with the target by chance (on the training
  set only);
- The ordinal encoding imposes an arbitrary order to the features which are then
  treated as numerical values by the
  :class:`~sklearn.ensemble.HistGradientBoostingRegressor`. Since this
  model groups numerical features in 256 bins per feature, many unrelated categories
  can be grouped together and as a result overall pipeline can underfit;
- When using the target encoder, the same binning happens, but since the encoded
  values are statistically ordered by marginal association with the target variable,
  the binning use by the :class:`~sklearn.ensemble.HistGradientBoostingRegressor`
  makes sense and leads to good results: the combination of smoothed target
  encoding and binning works as a good regularizing strategy against
  overfitting while not limiting the expressiveness of the pipeline too much.


.. rst-class:: sphx-glr-timing

   **Total running time of the script:** (0 minutes 21.865 seconds)


.. _sphx_glr_download_auto_examples_preprocessing_plot_target_encoder.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: binder-badge

      .. image:: images/binder_badge_logo.svg
        :target: https://mybinder.org/v2/gh/scikit-learn/scikit-learn/1.8.X?urlpath=lab/tree/notebooks/auto_examples/preprocessing/plot_target_encoder.ipynb
        :alt: Launch binder
        :width: 150 px

    .. container:: lite-badge

      .. image:: images/jupyterlite_badge_logo.svg
        :target: ../../lite/lab/index.html?path=auto_examples/preprocessing/plot_target_encoder.ipynb
        :alt: Launch JupyterLite
        :width: 150 px

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: plot_target_encoder.ipynb <plot_target_encoder.ipynb>`

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: plot_target_encoder.py <plot_target_encoder.py>`

    .. container:: sphx-glr-download sphx-glr-download-zip

      :download:`Download zipped: plot_target_encoder.zip <plot_target_encoder.zip>`


.. include:: plot_target_encoder.recommendations


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_
