
.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_examples/model_selection/plot_grid_search_refit_callable.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end <sphx_glr_download_auto_examples_model_selection_plot_grid_search_refit_callable.py>`
        to download the full example code or to run this example in your browser via JupyterLite or Binder.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_examples_model_selection_plot_grid_search_refit_callable.py:


==================================================
Balance model complexity and cross-validated score
==================================================

This example demonstrates how to balance model complexity and cross-validated score by
finding a decent accuracy within 1 standard deviation of the best accuracy score while
minimising the number of :class:`~sklearn.decomposition.PCA` components [1]_. It uses
:class:`~sklearn.model_selection.GridSearchCV` with a custom refit callable to select
the optimal model.

The figure shows the trade-off between cross-validated score and the number
of PCA components. The balanced case is when `n_components=10` and `accuracy=0.88`,
which falls into the range within 1 standard deviation of the best accuracy
score.

References
----------
.. [1] Hastie, T., Tibshirani, R., Friedman, J. (2001). Model Assessment and
   Selection. The Elements of Statistical Learning (pp. 219-260). New York,
   NY, USA: Springer New York Inc.

.. GENERATED FROM PYTHON SOURCE LINES 23-37

.. code-block:: Python


    # Authors: The scikit-learn developers
    # SPDX-License-Identifier: BSD-3-Clause

    import matplotlib.pyplot as plt
    import numpy as np
    import polars as pl

    from sklearn.datasets import load_digits
    from sklearn.decomposition import PCA
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import GridSearchCV, ShuffleSplit
    from sklearn.pipeline import Pipeline








.. GENERATED FROM PYTHON SOURCE LINES 38-46

Introduction
------------

When tuning hyperparameters, we often want to balance model complexity and
performance. The "one-standard-error" rule is a common approach: select the simplest
model whose performance is within one standard error of the best model's performance.
This helps to avoid overfitting by preferring simpler models when their performance is
statistically comparable to more complex ones.

.. GENERATED FROM PYTHON SOURCE LINES 48-58

Helper functions
----------------

We define two helper functions:

1. `lower_bound`: Calculates the threshold for acceptable performance
   (best score - 1 std)

2. `best_low_complexity`: Selects the model with the fewest PCA components that
   exceeds this threshold

.. GENERATED FROM PYTHON SOURCE LINES 58-108

.. code-block:: Python



    def lower_bound(cv_results):
        """
        Calculate the lower bound within 1 standard deviation
        of the best `mean_test_scores`.

        Parameters
        ----------
        cv_results : dict of numpy(masked) ndarrays
            See attribute cv_results_ of `GridSearchCV`

        Returns
        -------
        float
            Lower bound within 1 standard deviation of the
            best `mean_test_score`.
        """
        best_score_idx = np.argmax(cv_results["mean_test_score"])

        return (
            cv_results["mean_test_score"][best_score_idx]
            - cv_results["std_test_score"][best_score_idx]
        )


    def best_low_complexity(cv_results):
        """
        Balance model complexity with cross-validated score.

        Parameters
        ----------
        cv_results : dict of numpy(masked) ndarrays
            See attribute cv_results_ of `GridSearchCV`.

        Return
        ------
        int
            Index of a model that has the fewest PCA components
            while has its test score within 1 standard deviation of the best
            `mean_test_score`.
        """
        threshold = lower_bound(cv_results)
        candidate_idx = np.flatnonzero(cv_results["mean_test_score"] >= threshold)
        best_idx = candidate_idx[
            cv_results["param_reduce_dim__n_components"][candidate_idx].argmin()
        ]
        return best_idx









.. GENERATED FROM PYTHON SOURCE LINES 109-119

Set up the pipeline and parameter grid
--------------------------------------

We create a pipeline with two steps:

1. Dimensionality reduction using PCA

2. Classification using LogisticRegression

We'll search over different numbers of PCA components to find the optimal complexity.

.. GENERATED FROM PYTHON SOURCE LINES 119-129

.. code-block:: Python


    pipe = Pipeline(
        [
            ("reduce_dim", PCA(random_state=42)),
            ("classify", LogisticRegression(random_state=42, C=0.01, max_iter=1000)),
        ]
    )

    param_grid = {"reduce_dim__n_components": [6, 8, 10, 15, 20, 25, 35, 45, 55]}








.. GENERATED FROM PYTHON SOURCE LINES 130-136

Perform the search with GridSearchCV
------------------------------------

We use `GridSearchCV` with our custom `best_low_complexity` function as the refit
parameter. This function will select the model with the fewest PCA components that
still performs within one standard deviation of the best model.

.. GENERATED FROM PYTHON SOURCE LINES 136-149

.. code-block:: Python


    grid = GridSearchCV(
        pipe,
        # Use a non-stratified CV strategy to make sure that the inter-fold
        # standard deviation of the test scores is informative.
        cv=ShuffleSplit(n_splits=30, random_state=0),
        n_jobs=1,  # increase this on your machine to use more physical cores
        param_grid=param_grid,
        scoring="accuracy",
        refit=best_low_complexity,
        return_train_score=True,
    )








.. GENERATED FROM PYTHON SOURCE LINES 150-152

Load the digits dataset and fit the model
-----------------------------------------

.. GENERATED FROM PYTHON SOURCE LINES 152-156

.. code-block:: Python


    X, y = load_digits(return_X_y=True)
    grid.fit(X, y)






.. raw:: html

    <div class="output_subarea output_html rendered_html output_result">
    <style>#sk-container-id-64 {
      /* Definition of color scheme common for light and dark mode */
      --sklearn-color-text: #000;
      --sklearn-color-text-muted: #666;
      --sklearn-color-line: gray;
      /* Definition of color scheme for unfitted estimators */
      --sklearn-color-unfitted-level-0: #fff5e6;
      --sklearn-color-unfitted-level-1: #f6e4d2;
      --sklearn-color-unfitted-level-2: #ffe0b3;
      --sklearn-color-unfitted-level-3: chocolate;
      /* Definition of color scheme for fitted estimators */
      --sklearn-color-fitted-level-0: #f0f8ff;
      --sklearn-color-fitted-level-1: #d4ebff;
      --sklearn-color-fitted-level-2: #b3dbfd;
      --sklearn-color-fitted-level-3: cornflowerblue;
    }

    #sk-container-id-64.light {
      /* Specific color for light theme */
      --sklearn-color-text-on-default-background: black;
      --sklearn-color-background: white;
      --sklearn-color-border-box: black;
      --sklearn-color-icon: #696969;
    }

    #sk-container-id-64.dark {
      --sklearn-color-text-on-default-background: white;
      --sklearn-color-background: #111;
      --sklearn-color-border-box: white;
      --sklearn-color-icon: #878787;
    }

    #sk-container-id-64 {
      color: var(--sklearn-color-text);
    }

    #sk-container-id-64 pre {
      padding: 0;
    }

    #sk-container-id-64 input.sk-hidden--visually {
      border: 0;
      clip: rect(1px 1px 1px 1px);
      clip: rect(1px, 1px, 1px, 1px);
      height: 1px;
      margin: -1px;
      overflow: hidden;
      padding: 0;
      position: absolute;
      width: 1px;
    }

    #sk-container-id-64 div.sk-dashed-wrapped {
      border: 1px dashed var(--sklearn-color-line);
      margin: 0 0.4em 0.5em 0.4em;
      box-sizing: border-box;
      padding-bottom: 0.4em;
      background-color: var(--sklearn-color-background);
    }

    #sk-container-id-64 div.sk-container {
      /* jupyter's `normalize.less` sets `[hidden] { display: none; }`
         but bootstrap.min.css set `[hidden] { display: none !important; }`
         so we also need the `!important` here to be able to override the
         default hidden behavior on the sphinx rendered scikit-learn.org.
         See: https://github.com/scikit-learn/scikit-learn/issues/21755 */
      display: inline-block !important;
      position: relative;
    }

    #sk-container-id-64 div.sk-text-repr-fallback {
      display: none;
    }

    div.sk-parallel-item,
    div.sk-serial,
    div.sk-item {
      /* draw centered vertical line to link estimators */
      background-image: linear-gradient(var(--sklearn-color-text-on-default-background), var(--sklearn-color-text-on-default-background));
      background-size: 2px 100%;
      background-repeat: no-repeat;
      background-position: center center;
    }

    /* Parallel-specific style estimator block */

    #sk-container-id-64 div.sk-parallel-item::after {
      content: "";
      width: 100%;
      border-bottom: 2px solid var(--sklearn-color-text-on-default-background);
      flex-grow: 1;
    }

    #sk-container-id-64 div.sk-parallel {
      display: flex;
      align-items: stretch;
      justify-content: center;
      background-color: var(--sklearn-color-background);
      position: relative;
    }

    #sk-container-id-64 div.sk-parallel-item {
      display: flex;
      flex-direction: column;
    }

    #sk-container-id-64 div.sk-parallel-item:first-child::after {
      align-self: flex-end;
      width: 50%;
    }

    #sk-container-id-64 div.sk-parallel-item:last-child::after {
      align-self: flex-start;
      width: 50%;
    }

    #sk-container-id-64 div.sk-parallel-item:only-child::after {
      width: 0;
    }

    /* Serial-specific style estimator block */

    #sk-container-id-64 div.sk-serial {
      display: flex;
      flex-direction: column;
      align-items: center;
      background-color: var(--sklearn-color-background);
      padding-right: 1em;
      padding-left: 1em;
    }


    /* Toggleable style: style used for estimator/Pipeline/ColumnTransformer box that is
    clickable and can be expanded/collapsed.
    - Pipeline and ColumnTransformer use this feature and define the default style
    - Estimators will overwrite some part of the style using the `sk-estimator` class
    */

    /* Pipeline and ColumnTransformer style (default) */

    #sk-container-id-64 div.sk-toggleable {
      /* Default theme specific background. It is overwritten whether we have a
      specific estimator or a Pipeline/ColumnTransformer */
      background-color: var(--sklearn-color-background);
    }

    /* Toggleable label */
    #sk-container-id-64 label.sk-toggleable__label {
      cursor: pointer;
      display: flex;
      width: 100%;
      margin-bottom: 0;
      padding: 0.5em;
      box-sizing: border-box;
      text-align: center;
      align-items: center;
      justify-content: center;
      gap: 0.5em;
    }

    #sk-container-id-64 label.sk-toggleable__label .caption {
      font-size: 0.6rem;
      font-weight: lighter;
      color: var(--sklearn-color-text-muted);
    }

    #sk-container-id-64 label.sk-toggleable__label-arrow:before {
      /* Arrow on the left of the label */
      content: "▸";
      float: left;
      margin-right: 0.25em;
      color: var(--sklearn-color-icon);
    }

    #sk-container-id-64 label.sk-toggleable__label-arrow:hover:before {
      color: var(--sklearn-color-text);
    }

    /* Toggleable content - dropdown */

    #sk-container-id-64 div.sk-toggleable__content {
      display: none;
      text-align: left;
      /* unfitted */
      background-color: var(--sklearn-color-unfitted-level-0);
    }

    #sk-container-id-64 div.sk-toggleable__content.fitted {
      /* fitted */
      background-color: var(--sklearn-color-fitted-level-0);
    }

    #sk-container-id-64 div.sk-toggleable__content pre {
      margin: 0.2em;
      border-radius: 0.25em;
      color: var(--sklearn-color-text);
      /* unfitted */
      background-color: var(--sklearn-color-unfitted-level-0);
    }

    #sk-container-id-64 div.sk-toggleable__content.fitted pre {
      /* unfitted */
      background-color: var(--sklearn-color-fitted-level-0);
    }

    #sk-container-id-64 input.sk-toggleable__control:checked~div.sk-toggleable__content {
      /* Expand drop-down */
      display: block;
      width: 100%;
      overflow: visible;
    }

    #sk-container-id-64 input.sk-toggleable__control:checked~label.sk-toggleable__label-arrow:before {
      content: "▾";
    }

    /* Pipeline/ColumnTransformer-specific style */

    #sk-container-id-64 div.sk-label input.sk-toggleable__control:checked~label.sk-toggleable__label {
      color: var(--sklearn-color-text);
      background-color: var(--sklearn-color-unfitted-level-2);
    }

    #sk-container-id-64 div.sk-label.fitted input.sk-toggleable__control:checked~label.sk-toggleable__label {
      background-color: var(--sklearn-color-fitted-level-2);
    }

    /* Estimator-specific style */

    /* Colorize estimator box */
    #sk-container-id-64 div.sk-estimator input.sk-toggleable__control:checked~label.sk-toggleable__label {
      /* unfitted */
      background-color: var(--sklearn-color-unfitted-level-2);
    }

    #sk-container-id-64 div.sk-estimator.fitted input.sk-toggleable__control:checked~label.sk-toggleable__label {
      /* fitted */
      background-color: var(--sklearn-color-fitted-level-2);
    }

    #sk-container-id-64 div.sk-label label.sk-toggleable__label,
    #sk-container-id-64 div.sk-label label {
      /* The background is the default theme color */
      color: var(--sklearn-color-text-on-default-background);
    }

    /* On hover, darken the color of the background */
    #sk-container-id-64 div.sk-label:hover label.sk-toggleable__label {
      color: var(--sklearn-color-text);
      background-color: var(--sklearn-color-unfitted-level-2);
    }

    /* Label box, darken color on hover, fitted */
    #sk-container-id-64 div.sk-label.fitted:hover label.sk-toggleable__label.fitted {
      color: var(--sklearn-color-text);
      background-color: var(--sklearn-color-fitted-level-2);
    }

    /* Estimator label */

    #sk-container-id-64 div.sk-label label {
      font-family: monospace;
      font-weight: bold;
      line-height: 1.2em;
    }

    #sk-container-id-64 div.sk-label-container {
      text-align: center;
    }

    /* Estimator-specific */
    #sk-container-id-64 div.sk-estimator {
      font-family: monospace;
      border: 1px dotted var(--sklearn-color-border-box);
      border-radius: 0.25em;
      box-sizing: border-box;
      margin-bottom: 0.5em;
      /* unfitted */
      background-color: var(--sklearn-color-unfitted-level-0);
    }

    #sk-container-id-64 div.sk-estimator.fitted {
      /* fitted */
      background-color: var(--sklearn-color-fitted-level-0);
    }

    /* on hover */
    #sk-container-id-64 div.sk-estimator:hover {
      /* unfitted */
      background-color: var(--sklearn-color-unfitted-level-2);
    }

    #sk-container-id-64 div.sk-estimator.fitted:hover {
      /* fitted */
      background-color: var(--sklearn-color-fitted-level-2);
    }

    /* Specification for estimator info (e.g. "i" and "?") */

    /* Common style for "i" and "?" */

    .sk-estimator-doc-link,
    a:link.sk-estimator-doc-link,
    a:visited.sk-estimator-doc-link {
      float: right;
      font-size: smaller;
      line-height: 1em;
      font-family: monospace;
      background-color: var(--sklearn-color-unfitted-level-0);
      border-radius: 1em;
      height: 1em;
      width: 1em;
      text-decoration: none !important;
      margin-left: 0.5em;
      text-align: center;
      /* unfitted */
      border: var(--sklearn-color-unfitted-level-3) 1pt solid;
      color: var(--sklearn-color-unfitted-level-3);
    }

    .sk-estimator-doc-link.fitted,
    a:link.sk-estimator-doc-link.fitted,
    a:visited.sk-estimator-doc-link.fitted {
      /* fitted */
      background-color: var(--sklearn-color-fitted-level-0);
      border: var(--sklearn-color-fitted-level-3) 1pt solid;
      color: var(--sklearn-color-fitted-level-3);
    }

    /* On hover */
    div.sk-estimator:hover .sk-estimator-doc-link:hover,
    .sk-estimator-doc-link:hover,
    div.sk-label-container:hover .sk-estimator-doc-link:hover,
    .sk-estimator-doc-link:hover {
      /* unfitted */
      background-color: var(--sklearn-color-unfitted-level-3);
      border: var(--sklearn-color-fitted-level-0) 1pt solid;
      color: var(--sklearn-color-unfitted-level-0);
      text-decoration: none;
    }

    div.sk-estimator.fitted:hover .sk-estimator-doc-link.fitted:hover,
    .sk-estimator-doc-link.fitted:hover,
    div.sk-label-container:hover .sk-estimator-doc-link.fitted:hover,
    .sk-estimator-doc-link.fitted:hover {
      /* fitted */
      background-color: var(--sklearn-color-fitted-level-3);
      border: var(--sklearn-color-fitted-level-0) 1pt solid;
      color: var(--sklearn-color-fitted-level-0);
      text-decoration: none;
    }

    /* Span, style for the box shown on hovering the info icon */
    .sk-estimator-doc-link span {
      display: none;
      z-index: 9999;
      position: relative;
      font-weight: normal;
      right: .2ex;
      padding: .5ex;
      margin: .5ex;
      width: min-content;
      min-width: 20ex;
      max-width: 50ex;
      color: var(--sklearn-color-text);
      box-shadow: 2pt 2pt 4pt #999;
      /* unfitted */
      background: var(--sklearn-color-unfitted-level-0);
      border: .5pt solid var(--sklearn-color-unfitted-level-3);
    }

    .sk-estimator-doc-link.fitted span {
      /* fitted */
      background: var(--sklearn-color-fitted-level-0);
      border: var(--sklearn-color-fitted-level-3);
    }

    .sk-estimator-doc-link:hover span {
      display: block;
    }

    /* "?"-specific style due to the `<a>` HTML tag */

    #sk-container-id-64 a.estimator_doc_link {
      float: right;
      font-size: 1rem;
      line-height: 1em;
      font-family: monospace;
      background-color: var(--sklearn-color-unfitted-level-0);
      border-radius: 1rem;
      height: 1rem;
      width: 1rem;
      text-decoration: none;
      /* unfitted */
      color: var(--sklearn-color-unfitted-level-1);
      border: var(--sklearn-color-unfitted-level-1) 1pt solid;
    }

    #sk-container-id-64 a.estimator_doc_link.fitted {
      /* fitted */
      background-color: var(--sklearn-color-fitted-level-0);
      border: var(--sklearn-color-fitted-level-1) 1pt solid;
      color: var(--sklearn-color-fitted-level-1);
    }

    /* On hover */
    #sk-container-id-64 a.estimator_doc_link:hover {
      /* unfitted */
      background-color: var(--sklearn-color-unfitted-level-3);
      color: var(--sklearn-color-background);
      text-decoration: none;
    }

    #sk-container-id-64 a.estimator_doc_link.fitted:hover {
      /* fitted */
      background-color: var(--sklearn-color-fitted-level-3);
    }

    .estimator-table {
        font-family: monospace;
    }

    .estimator-table summary {
        padding: .5rem;
        cursor: pointer;
    }

    .estimator-table summary::marker {
        font-size: 0.7rem;
    }

    .estimator-table details[open] {
        padding-left: 0.1rem;
        padding-right: 0.1rem;
        padding-bottom: 0.3rem;
    }

    .estimator-table .parameters-table {
        margin-left: auto !important;
        margin-right: auto !important;
        margin-top: 0;
    }

    .estimator-table .parameters-table tr:nth-child(odd) {
        background-color: #fff;
    }

    .estimator-table .parameters-table tr:nth-child(even) {
        background-color: #f6f6f6;
    }

    .estimator-table .parameters-table tr:hover {
        background-color: #e0e0e0;
    }

    .estimator-table table td {
        border: 1px solid rgba(106, 105, 104, 0.232);
    }

    /*
        `table td`is set in notebook with right text-align.
        We need to overwrite it.
    */
    .estimator-table table td.param {
        text-align: left;
        position: relative;
        padding: 0;
    }

    .user-set td {
        color:rgb(255, 94, 0);
        text-align: left !important;
    }

    .user-set td.value {
        color:rgb(255, 94, 0);
        background-color: transparent;
    }

    .default td {
        color: black;
        text-align: left !important;
    }

    .user-set td i,
    .default td i {
        color: black;
    }

    /*
        Styles for parameter documentation links
        We need styling for visited so jupyter doesn't overwrite it
    */
    a.param-doc-link,
    a.param-doc-link:link,
    a.param-doc-link:visited {
        text-decoration: underline dashed;
        text-underline-offset: .3em;
        color: inherit;
        display: block;
        padding: .5em;
    }

    /* "hack" to make the entire area of the cell containing the link clickable */
    a.param-doc-link::before {
        position: absolute;
        content: "";
        inset: 0;
    }

    .param-doc-description {
        display: none;
        position: absolute;
        z-index: 9999;
        left: 0;
        padding: .5ex;
        margin-left: 1.5em;
        color: var(--sklearn-color-text);
        box-shadow: .3em .3em .4em #999;
        width: max-content;
        text-align: left;
        max-height: 10em;
        overflow-y: auto;

        /* unfitted */
        background: var(--sklearn-color-unfitted-level-0);
        border: thin solid var(--sklearn-color-unfitted-level-3);
    }

    /* Fitted state for parameter tooltips */
    .fitted .param-doc-description {
        /* fitted */
        background: var(--sklearn-color-fitted-level-0);
        border: thin solid var(--sklearn-color-fitted-level-3);
    }

    .param-doc-link:hover .param-doc-description {
        display: block;
    }

    .copy-paste-icon {
        background-image: url(data:image/svg+xml;base64,PHN2ZyB4bWxucz0iaHR0cDovL3d3dy53My5vcmcvMjAwMC9zdmciIHZpZXdCb3g9IjAgMCA0NDggNTEyIj48IS0tIUZvbnQgQXdlc29tZSBGcmVlIDYuNy4yIGJ5IEBmb250YXdlc29tZSAtIGh0dHBzOi8vZm9udGF3ZXNvbWUuY29tIExpY2Vuc2UgLSBodHRwczovL2ZvbnRhd2Vzb21lLmNvbS9saWNlbnNlL2ZyZWUgQ29weXJpZ2h0IDIwMjUgRm9udGljb25zLCBJbmMuLS0+PHBhdGggZD0iTTIwOCAwTDMzMi4xIDBjMTIuNyAwIDI0LjkgNS4xIDMzLjkgMTQuMWw2Ny45IDY3LjljOSA5IDE0LjEgMjEuMiAxNC4xIDMzLjlMNDQ4IDMzNmMwIDI2LjUtMjEuNSA0OC00OCA0OGwtMTkyIDBjLTI2LjUgMC00OC0yMS41LTQ4LTQ4bDAtMjg4YzAtMjYuNSAyMS41LTQ4IDQ4LTQ4ek00OCAxMjhsODAgMCAwIDY0LTY0IDAgMCAyNTYgMTkyIDAgMC0zMiA2NCAwIDAgNDhjMCAyNi41LTIxLjUgNDgtNDggNDhMNDggNTEyYy0yNi41IDAtNDgtMjEuNS00OC00OEwwIDE3NmMwLTI2LjUgMjEuNS00OCA0OC00OHoiLz48L3N2Zz4=);
        background-repeat: no-repeat;
        background-size: 14px 14px;
        background-position: 0;
        display: inline-block;
        width: 14px;
        height: 14px;
        cursor: pointer;
    }
    </style><body><div id="sk-container-id-64" class="sk-top-container"><div class="sk-text-repr-fallback"><pre>GridSearchCV(cv=ShuffleSplit(n_splits=30, random_state=0, test_size=None, train_size=None),
                 estimator=Pipeline(steps=[(&#x27;reduce_dim&#x27;, PCA(random_state=42)),
                                           (&#x27;classify&#x27;,
                                            LogisticRegression(C=0.01,
                                                               max_iter=1000,
                                                               random_state=42))]),
                 n_jobs=1,
                 param_grid={&#x27;reduce_dim__n_components&#x27;: [6, 8, 10, 15, 20, 25, 35,
                                                          45, 55]},
                 refit=&lt;function best_low_complexity at 0x7fe86c9c3f60&gt;,
                 return_train_score=True, scoring=&#x27;accuracy&#x27;)</pre><b>In a Jupyter environment, please rerun this cell to show the HTML representation or trust the notebook. <br />On GitHub, the HTML representation is unable to render, please try loading this page with nbviewer.org.</b></div><div class="sk-container" hidden><div class="sk-item sk-dashed-wrapped"><div class="sk-label-container"><div class="sk-label fitted sk-toggleable"><input class="sk-toggleable__control sk-hidden--visually" id="sk-estimator-id-278" type="checkbox" ><label for="sk-estimator-id-278" class="sk-toggleable__label fitted sk-toggleable__label-arrow"><div><div>GridSearchCV</div></div><div><a class="sk-estimator-doc-link fitted" rel="noreferrer" target="_blank" href="https://scikit-learn.org/1.8/modules/generated/sklearn.model_selection.GridSearchCV.html">?<span>Documentation for GridSearchCV</span></a><span class="sk-estimator-doc-link fitted">i<span>Fitted</span></span></div></label><div class="sk-toggleable__content fitted" data-param-prefix="">
            <div class="estimator-table">
                <details>
                    <summary>Parameters</summary>
                    <table class="parameters-table">
                      <tbody>
                    
            <tr class="user-set">
                <td><i class="copy-paste-icon"
                     onclick="copyToClipboard('estimator',
                              this.parentElement.nextElementSibling)"
                ></i></td>
                <td class="param">
            <a class="param-doc-link"
                rel="noreferrer" target="_blank" href="https://scikit-learn.org/1.8/modules/generated/sklearn.model_selection.GridSearchCV.html#:~:text=estimator,-estimator%20object">
                estimator
                <span class="param-doc-description">estimator: estimator object<br><br>This is assumed to implement the scikit-learn estimator interface.<br>Either estimator needs to provide a ``score`` function,<br>or ``scoring`` must be passed.</span>
            </a>
        </td>
                <td class="value">Pipeline(step...m_state=42))])</td>
            </tr>
    

            <tr class="user-set">
                <td><i class="copy-paste-icon"
                     onclick="copyToClipboard('param_grid',
                              this.parentElement.nextElementSibling)"
                ></i></td>
                <td class="param">
            <a class="param-doc-link"
                rel="noreferrer" target="_blank" href="https://scikit-learn.org/1.8/modules/generated/sklearn.model_selection.GridSearchCV.html#:~:text=param_grid,-dict%20or%20list%20of%20dictionaries">
                param_grid
                <span class="param-doc-description">param_grid: dict or list of dictionaries<br><br>Dictionary with parameters names (`str`) as keys and lists of<br>parameter settings to try as values, or a list of such<br>dictionaries, in which case the grids spanned by each dictionary<br>in the list are explored. This enables searching over any sequence<br>of parameter settings.</span>
            </a>
        </td>
                <td class="value">{&#x27;reduce_dim__n_components&#x27;: [6, 8, ...]}</td>
            </tr>
    

            <tr class="user-set">
                <td><i class="copy-paste-icon"
                     onclick="copyToClipboard('scoring',
                              this.parentElement.nextElementSibling)"
                ></i></td>
                <td class="param">
            <a class="param-doc-link"
                rel="noreferrer" target="_blank" href="https://scikit-learn.org/1.8/modules/generated/sklearn.model_selection.GridSearchCV.html#:~:text=scoring,-str%2C%20callable%2C%20list%2C%20tuple%20or%20dict%2C%20default%3DNone">
                scoring
                <span class="param-doc-description">scoring: str, callable, list, tuple or dict, default=None<br><br>Strategy to evaluate the performance of the cross-validated model on<br>the test set.<br><br>If `scoring` represents a single score, one can use:<br><br>- a single string (see :ref:`scoring_string_names`);<br>- a callable (see :ref:`scoring_callable`) that returns a single value;<br>- `None`, the `estimator`'s<br>  :ref:`default evaluation criterion <scoring_api_overview>` is used.<br><br>If `scoring` represents multiple scores, one can use:<br><br>- a list or tuple of unique strings;<br>- a callable returning a dictionary where the keys are the metric<br>  names and the values are the metric scores;<br>- a dictionary with metric names as keys and callables as values.<br><br>See :ref:`multimetric_grid_search` for an example.</span>
            </a>
        </td>
                <td class="value">&#x27;accuracy&#x27;</td>
            </tr>
    

            <tr class="user-set">
                <td><i class="copy-paste-icon"
                     onclick="copyToClipboard('n_jobs',
                              this.parentElement.nextElementSibling)"
                ></i></td>
                <td class="param">
            <a class="param-doc-link"
                rel="noreferrer" target="_blank" href="https://scikit-learn.org/1.8/modules/generated/sklearn.model_selection.GridSearchCV.html#:~:text=n_jobs,-int%2C%20default%3DNone">
                n_jobs
                <span class="param-doc-description">n_jobs: int, default=None<br><br>Number of jobs to run in parallel.<br>``None`` means 1 unless in a :obj:`joblib.parallel_backend` context.<br>``-1`` means using all processors. See :term:`Glossary <n_jobs>`<br>for more details.<br><br>.. versionchanged:: v0.20<br>   `n_jobs` default changed from 1 to None</span>
            </a>
        </td>
                <td class="value">1</td>
            </tr>
    

            <tr class="user-set">
                <td><i class="copy-paste-icon"
                     onclick="copyToClipboard('refit',
                              this.parentElement.nextElementSibling)"
                ></i></td>
                <td class="param">
            <a class="param-doc-link"
                rel="noreferrer" target="_blank" href="https://scikit-learn.org/1.8/modules/generated/sklearn.model_selection.GridSearchCV.html#:~:text=refit,-bool%2C%20str%2C%20or%20callable%2C%20default%3DTrue">
                refit
                <span class="param-doc-description">refit: bool, str, or callable, default=True<br><br>Refit an estimator using the best found parameters on the whole<br>dataset.<br><br>For multiple metric evaluation, this needs to be a `str` denoting the<br>scorer that would be used to find the best parameters for refitting<br>the estimator at the end.<br><br>Where there are considerations other than maximum score in<br>choosing a best estimator, ``refit`` can be set to a function which<br>returns the selected ``best_index_`` given ``cv_results_``. In that<br>case, the ``best_estimator_`` and ``best_params_`` will be set<br>according to the returned ``best_index_`` while the ``best_score_``<br>attribute will not be available.<br><br>The refitted estimator is made available at the ``best_estimator_``<br>attribute and permits using ``predict`` directly on this<br>``GridSearchCV`` instance.<br><br>Also for multiple metric evaluation, the attributes ``best_index_``,<br>``best_score_`` and ``best_params_`` will only be available if<br>``refit`` is set and all of them will be determined w.r.t this specific<br>scorer.<br><br>See ``scoring`` parameter to know more about multiple metric<br>evaluation.<br><br>See :ref:`sphx_glr_auto_examples_model_selection_plot_grid_search_digits.py`<br>to see how to design a custom selection strategy using a callable<br>via `refit`.<br><br>See :ref:`this example<br><sphx_glr_auto_examples_model_selection_plot_grid_search_refit_callable.py>`<br>for an example of how to use ``refit=callable`` to balance model<br>complexity and cross-validated score.<br><br>.. versionchanged:: 0.20<br>    Support for callable added.</span>
            </a>
        </td>
                <td class="value">&lt;function bes...x7fe86c9c3f60&gt;</td>
            </tr>
    

            <tr class="user-set">
                <td><i class="copy-paste-icon"
                     onclick="copyToClipboard('cv',
                              this.parentElement.nextElementSibling)"
                ></i></td>
                <td class="param">
            <a class="param-doc-link"
                rel="noreferrer" target="_blank" href="https://scikit-learn.org/1.8/modules/generated/sklearn.model_selection.GridSearchCV.html#:~:text=cv,-int%2C%20cross-validation%20generator%20or%20an%20iterable%2C%20default%3DNone">
                cv
                <span class="param-doc-description">cv: int, cross-validation generator or an iterable, default=None<br><br>Determines the cross-validation splitting strategy.<br>Possible inputs for cv are:<br><br>- None, to use the default 5-fold cross validation,<br>- integer, to specify the number of folds in a `(Stratified)KFold`,<br>- :term:`CV splitter`,<br>- An iterable yielding (train, test) splits as arrays of indices.<br><br>For integer/None inputs, if the estimator is a classifier and ``y`` is<br>either binary or multiclass, :class:`StratifiedKFold` is used. In all<br>other cases, :class:`KFold` is used. These splitters are instantiated<br>with `shuffle=False` so the splits will be the same across calls.<br><br>Refer :ref:`User Guide <cross_validation>` for the various<br>cross-validation strategies that can be used here.<br><br>.. versionchanged:: 0.22<br>    ``cv`` default value if None changed from 3-fold to 5-fold.</span>
            </a>
        </td>
                <td class="value">ShuffleSplit(...ain_size=None)</td>
            </tr>
    

            <tr class="default">
                <td><i class="copy-paste-icon"
                     onclick="copyToClipboard('verbose',
                              this.parentElement.nextElementSibling)"
                ></i></td>
                <td class="param">
            <a class="param-doc-link"
                rel="noreferrer" target="_blank" href="https://scikit-learn.org/1.8/modules/generated/sklearn.model_selection.GridSearchCV.html#:~:text=verbose,-int">
                verbose
                <span class="param-doc-description">verbose: int<br><br>Controls the verbosity: the higher, the more messages.<br><br>- >1 : the computation time for each fold and parameter candidate is<br>  displayed;<br>- >2 : the score is also displayed;<br>- >3 : the fold and candidate parameter indexes are also displayed<br>  together with the starting time of the computation.</span>
            </a>
        </td>
                <td class="value">0</td>
            </tr>
    

            <tr class="default">
                <td><i class="copy-paste-icon"
                     onclick="copyToClipboard('pre_dispatch',
                              this.parentElement.nextElementSibling)"
                ></i></td>
                <td class="param">
            <a class="param-doc-link"
                rel="noreferrer" target="_blank" href="https://scikit-learn.org/1.8/modules/generated/sklearn.model_selection.GridSearchCV.html#:~:text=pre_dispatch,-int%2C%20or%20str%2C%20default%3D%272%2An_jobs%27">
                pre_dispatch
                <span class="param-doc-description">pre_dispatch: int, or str, default='2*n_jobs'<br><br>Controls the number of jobs that get dispatched during parallel<br>execution. Reducing this number can be useful to avoid an<br>explosion of memory consumption when more jobs get dispatched<br>than CPUs can process. This parameter can be:<br><br>- None, in which case all the jobs are immediately created and spawned. Use<br>  this for lightweight and fast-running jobs, to avoid delays due to on-demand<br>  spawning of the jobs<br>- An int, giving the exact number of total jobs that are spawned<br>- A str, giving an expression as a function of n_jobs, as in '2*n_jobs'</span>
            </a>
        </td>
                <td class="value">&#x27;2*n_jobs&#x27;</td>
            </tr>
    

            <tr class="default">
                <td><i class="copy-paste-icon"
                     onclick="copyToClipboard('error_score',
                              this.parentElement.nextElementSibling)"
                ></i></td>
                <td class="param">
            <a class="param-doc-link"
                rel="noreferrer" target="_blank" href="https://scikit-learn.org/1.8/modules/generated/sklearn.model_selection.GridSearchCV.html#:~:text=error_score,-%27raise%27%20or%20numeric%2C%20default%3Dnp.nan">
                error_score
                <span class="param-doc-description">error_score: 'raise' or numeric, default=np.nan<br><br>Value to assign to the score if an error occurs in estimator fitting.<br>If set to 'raise', the error is raised. If a numeric value is given,<br>FitFailedWarning is raised. This parameter does not affect the refit<br>step, which will always raise the error.</span>
            </a>
        </td>
                <td class="value">nan</td>
            </tr>
    

            <tr class="user-set">
                <td><i class="copy-paste-icon"
                     onclick="copyToClipboard('return_train_score',
                              this.parentElement.nextElementSibling)"
                ></i></td>
                <td class="param">
            <a class="param-doc-link"
                rel="noreferrer" target="_blank" href="https://scikit-learn.org/1.8/modules/generated/sklearn.model_selection.GridSearchCV.html#:~:text=return_train_score,-bool%2C%20default%3DFalse">
                return_train_score
                <span class="param-doc-description">return_train_score: bool, default=False<br><br>If ``False``, the ``cv_results_`` attribute will not include training<br>scores.<br>Computing training scores is used to get insights on how different<br>parameter settings impact the overfitting/underfitting trade-off.<br>However computing the scores on the training set can be computationally<br>expensive and is not strictly required to select the parameters that<br>yield the best generalization performance.<br><br>.. versionadded:: 0.19<br><br>.. versionchanged:: 0.21<br>    Default value was changed from ``True`` to ``False``</span>
            </a>
        </td>
                <td class="value">True</td>
            </tr>
    
                      </tbody>
                    </table>
                </details>
            </div>
        </div></div></div><div class="sk-parallel"><div class="sk-parallel-item"><div class="sk-item"><div class="sk-label-container"><div class="sk-label fitted sk-toggleable"><input class="sk-toggleable__control sk-hidden--visually" id="sk-estimator-id-279" type="checkbox" ><label for="sk-estimator-id-279" class="sk-toggleable__label fitted sk-toggleable__label-arrow"><div><div>best_estimator_: Pipeline</div></div></label><div class="sk-toggleable__content fitted" data-param-prefix="best_estimator___"></div></div><div class="sk-serial"><div class="sk-item"><div class="sk-serial"><div class="sk-item"><div class="sk-estimator fitted sk-toggleable"><input class="sk-toggleable__control sk-hidden--visually" id="sk-estimator-id-280" type="checkbox" ><label for="sk-estimator-id-280" class="sk-toggleable__label fitted sk-toggleable__label-arrow"><div><div>PCA</div></div><div><a class="sk-estimator-doc-link fitted" rel="noreferrer" target="_blank" href="https://scikit-learn.org/1.8/modules/generated/sklearn.decomposition.PCA.html">?<span>Documentation for PCA</span></a></div></label><div class="sk-toggleable__content fitted" data-param-prefix="best_estimator___reduce_dim__">
            <div class="estimator-table">
                <details>
                    <summary>Parameters</summary>
                    <table class="parameters-table">
                      <tbody>
                    
            <tr class="user-set">
                <td><i class="copy-paste-icon"
                     onclick="copyToClipboard('n_components',
                              this.parentElement.nextElementSibling)"
                ></i></td>
                <td class="param">
            <a class="param-doc-link"
                rel="noreferrer" target="_blank" href="https://scikit-learn.org/1.8/modules/generated/sklearn.decomposition.PCA.html#:~:text=n_components,-int%2C%20float%20or%20%27mle%27%2C%20default%3DNone">
                n_components
                <span class="param-doc-description">n_components: int, float or 'mle', default=None<br><br>Number of components to keep.<br>if n_components is not set all components are kept::<br><br>    n_components == min(n_samples, n_features)<br><br>If ``n_components == 'mle'`` and ``svd_solver == 'full'``, Minka's<br>MLE is used to guess the dimension. Use of ``n_components == 'mle'``<br>will interpret ``svd_solver == 'auto'`` as ``svd_solver == 'full'``.<br><br>If ``0 < n_components < 1`` and ``svd_solver == 'full'``, select the<br>number of components such that the amount of variance that needs to be<br>explained is greater than the percentage specified by n_components.<br><br>If ``svd_solver == 'arpack'``, the number of components must be<br>strictly less than the minimum of n_features and n_samples.<br><br>Hence, the None case results in::<br><br>    n_components == min(n_samples, n_features) - 1</span>
            </a>
        </td>
                <td class="value">25</td>
            </tr>
    

            <tr class="default">
                <td><i class="copy-paste-icon"
                     onclick="copyToClipboard('copy',
                              this.parentElement.nextElementSibling)"
                ></i></td>
                <td class="param">
            <a class="param-doc-link"
                rel="noreferrer" target="_blank" href="https://scikit-learn.org/1.8/modules/generated/sklearn.decomposition.PCA.html#:~:text=copy,-bool%2C%20default%3DTrue">
                copy
                <span class="param-doc-description">copy: bool, default=True<br><br>If False, data passed to fit are overwritten and running<br>fit(X).transform(X) will not yield the expected results,<br>use fit_transform(X) instead.</span>
            </a>
        </td>
                <td class="value">True</td>
            </tr>
    

            <tr class="default">
                <td><i class="copy-paste-icon"
                     onclick="copyToClipboard('whiten',
                              this.parentElement.nextElementSibling)"
                ></i></td>
                <td class="param">
            <a class="param-doc-link"
                rel="noreferrer" target="_blank" href="https://scikit-learn.org/1.8/modules/generated/sklearn.decomposition.PCA.html#:~:text=whiten,-bool%2C%20default%3DFalse">
                whiten
                <span class="param-doc-description">whiten: bool, default=False<br><br>When True (False by default) the `components_` vectors are multiplied<br>by the square root of n_samples and then divided by the singular values<br>to ensure uncorrelated outputs with unit component-wise variances.<br><br>Whitening will remove some information from the transformed signal<br>(the relative variance scales of the components) but can sometime<br>improve the predictive accuracy of the downstream estimators by<br>making their data respect some hard-wired assumptions.</span>
            </a>
        </td>
                <td class="value">False</td>
            </tr>
    

            <tr class="default">
                <td><i class="copy-paste-icon"
                     onclick="copyToClipboard('svd_solver',
                              this.parentElement.nextElementSibling)"
                ></i></td>
                <td class="param">
            <a class="param-doc-link"
                rel="noreferrer" target="_blank" href="https://scikit-learn.org/1.8/modules/generated/sklearn.decomposition.PCA.html#:~:text=svd_solver,-%7B%27auto%27%2C%20%27full%27%2C%20%27covariance_eigh%27%2C%20%27arpack%27%2C%20%27randomized%27%7D%2C%20%20%20%20%20%20%20%20%20%20%20%20default%3D%27auto%27">
                svd_solver
                <span class="param-doc-description">svd_solver: {'auto', 'full', 'covariance_eigh', 'arpack', 'randomized'},            default='auto'<br><br>"auto" :<br>    The solver is selected by a default 'auto' policy is based on `X.shape` and<br>    `n_components`: if the input data has fewer than 1000 features and<br>    more than 10 times as many samples, then the "covariance_eigh"<br>    solver is used. Otherwise, if the input data is larger than 500x500<br>    and the number of components to extract is lower than 80% of the<br>    smallest dimension of the data, then the more efficient<br>    "randomized" method is selected. Otherwise the exact "full" SVD is<br>    computed and optionally truncated afterwards.<br>"full" :<br>    Run exact full SVD calling the standard LAPACK solver via<br>    `scipy.linalg.svd` and select the components by postprocessing<br>"covariance_eigh" :<br>    Precompute the covariance matrix (on centered data), run a<br>    classical eigenvalue decomposition on the covariance matrix<br>    typically using LAPACK and select the components by postprocessing.<br>    This solver is very efficient for n_samples >> n_features and small<br>    n_features. It is, however, not tractable otherwise for large<br>    n_features (large memory footprint required to materialize the<br>    covariance matrix). Also note that compared to the "full" solver,<br>    this solver effectively doubles the condition number and is<br>    therefore less numerical stable (e.g. on input data with a large<br>    range of singular values).<br>"arpack" :<br>    Run SVD truncated to `n_components` calling ARPACK solver via<br>    `scipy.sparse.linalg.svds`. It requires strictly<br>    `0 < n_components < min(X.shape)`<br>"randomized" :<br>    Run randomized SVD by the method of Halko et al.<br><br>.. versionadded:: 0.18.0<br><br>.. versionchanged:: 1.5<br>    Added the 'covariance_eigh' solver.</span>
            </a>
        </td>
                <td class="value">&#x27;auto&#x27;</td>
            </tr>
    

            <tr class="default">
                <td><i class="copy-paste-icon"
                     onclick="copyToClipboard('tol',
                              this.parentElement.nextElementSibling)"
                ></i></td>
                <td class="param">
            <a class="param-doc-link"
                rel="noreferrer" target="_blank" href="https://scikit-learn.org/1.8/modules/generated/sklearn.decomposition.PCA.html#:~:text=tol,-float%2C%20default%3D0.0">
                tol
                <span class="param-doc-description">tol: float, default=0.0<br><br>Tolerance for singular values computed by svd_solver == 'arpack'.<br>Must be of range [0.0, infinity).<br><br>.. versionadded:: 0.18.0</span>
            </a>
        </td>
                <td class="value">0.0</td>
            </tr>
    

            <tr class="default">
                <td><i class="copy-paste-icon"
                     onclick="copyToClipboard('iterated_power',
                              this.parentElement.nextElementSibling)"
                ></i></td>
                <td class="param">
            <a class="param-doc-link"
                rel="noreferrer" target="_blank" href="https://scikit-learn.org/1.8/modules/generated/sklearn.decomposition.PCA.html#:~:text=iterated_power,-int%20or%20%27auto%27%2C%20default%3D%27auto%27">
                iterated_power
                <span class="param-doc-description">iterated_power: int or 'auto', default='auto'<br><br>Number of iterations for the power method computed by<br>svd_solver == 'randomized'.<br>Must be of range [0, infinity).<br><br>.. versionadded:: 0.18.0</span>
            </a>
        </td>
                <td class="value">&#x27;auto&#x27;</td>
            </tr>
    

            <tr class="default">
                <td><i class="copy-paste-icon"
                     onclick="copyToClipboard('n_oversamples',
                              this.parentElement.nextElementSibling)"
                ></i></td>
                <td class="param">
            <a class="param-doc-link"
                rel="noreferrer" target="_blank" href="https://scikit-learn.org/1.8/modules/generated/sklearn.decomposition.PCA.html#:~:text=n_oversamples,-int%2C%20default%3D10">
                n_oversamples
                <span class="param-doc-description">n_oversamples: int, default=10<br><br>This parameter is only relevant when `svd_solver="randomized"`.<br>It corresponds to the additional number of random vectors to sample the<br>range of `X` so as to ensure proper conditioning. See<br>:func:`~sklearn.utils.extmath.randomized_svd` for more details.<br><br>.. versionadded:: 1.1</span>
            </a>
        </td>
                <td class="value">10</td>
            </tr>
    

            <tr class="default">
                <td><i class="copy-paste-icon"
                     onclick="copyToClipboard('power_iteration_normalizer',
                              this.parentElement.nextElementSibling)"
                ></i></td>
                <td class="param">
            <a class="param-doc-link"
                rel="noreferrer" target="_blank" href="https://scikit-learn.org/1.8/modules/generated/sklearn.decomposition.PCA.html#:~:text=power_iteration_normalizer,-%7B%27auto%27%2C%20%27QR%27%2C%20%27LU%27%2C%20%27none%27%7D%2C%20default%3D%27auto%27">
                power_iteration_normalizer
                <span class="param-doc-description">power_iteration_normalizer: {'auto', 'QR', 'LU', 'none'}, default='auto'<br><br>Power iteration normalizer for randomized SVD solver.<br>Not used by ARPACK. See :func:`~sklearn.utils.extmath.randomized_svd`<br>for more details.<br><br>.. versionadded:: 1.1</span>
            </a>
        </td>
                <td class="value">&#x27;auto&#x27;</td>
            </tr>
    

            <tr class="user-set">
                <td><i class="copy-paste-icon"
                     onclick="copyToClipboard('random_state',
                              this.parentElement.nextElementSibling)"
                ></i></td>
                <td class="param">
            <a class="param-doc-link"
                rel="noreferrer" target="_blank" href="https://scikit-learn.org/1.8/modules/generated/sklearn.decomposition.PCA.html#:~:text=random_state,-int%2C%20RandomState%20instance%20or%20None%2C%20default%3DNone">
                random_state
                <span class="param-doc-description">random_state: int, RandomState instance or None, default=None<br><br>Used when the 'arpack' or 'randomized' solvers are used. Pass an int<br>for reproducible results across multiple function calls.<br>See :term:`Glossary <random_state>`.<br><br>.. versionadded:: 0.18.0</span>
            </a>
        </td>
                <td class="value">42</td>
            </tr>
    
                      </tbody>
                    </table>
                </details>
            </div>
        </div></div></div><div class="sk-item"><div class="sk-estimator fitted sk-toggleable"><input class="sk-toggleable__control sk-hidden--visually" id="sk-estimator-id-281" type="checkbox" ><label for="sk-estimator-id-281" class="sk-toggleable__label fitted sk-toggleable__label-arrow"><div><div>LogisticRegression</div></div><div><a class="sk-estimator-doc-link fitted" rel="noreferrer" target="_blank" href="https://scikit-learn.org/1.8/modules/generated/sklearn.linear_model.LogisticRegression.html">?<span>Documentation for LogisticRegression</span></a></div></label><div class="sk-toggleable__content fitted" data-param-prefix="best_estimator___classify__">
            <div class="estimator-table">
                <details>
                    <summary>Parameters</summary>
                    <table class="parameters-table">
                      <tbody>
                    
            <tr class="default">
                <td><i class="copy-paste-icon"
                     onclick="copyToClipboard('penalty',
                              this.parentElement.nextElementSibling)"
                ></i></td>
                <td class="param">
            <a class="param-doc-link"
                rel="noreferrer" target="_blank" href="https://scikit-learn.org/1.8/modules/generated/sklearn.linear_model.LogisticRegression.html#:~:text=penalty,-%7B%27l1%27%2C%20%27l2%27%2C%20%27elasticnet%27%2C%20None%7D%2C%20default%3D%27l2%27">
                penalty
                <span class="param-doc-description">penalty: {'l1', 'l2', 'elasticnet', None}, default='l2'<br><br>Specify the norm of the penalty:<br><br>- `None`: no penalty is added;<br>- `'l2'`: add a L2 penalty term and it is the default choice;<br>- `'l1'`: add a L1 penalty term;<br>- `'elasticnet'`: both L1 and L2 penalty terms are added.<br><br>.. warning::<br>   Some penalties may not work with some solvers. See the parameter<br>   `solver` below, to know the compatibility between the penalty and<br>   solver.<br><br>.. versionadded:: 0.19<br>   l1 penalty with SAGA solver (allowing 'multinomial' + L1)<br><br>.. deprecated:: 1.8<br>   `penalty` was deprecated in version 1.8 and will be removed in 1.10.<br>   Use `l1_ratio` instead. `l1_ratio=0` for `penalty='l2'`, `l1_ratio=1` for<br>   `penalty='l1'` and `l1_ratio` set to any float between 0 and 1 for<br>   `'penalty='elasticnet'`.</span>
            </a>
        </td>
                <td class="value">&#x27;deprecated&#x27;</td>
            </tr>
    

            <tr class="user-set">
                <td><i class="copy-paste-icon"
                     onclick="copyToClipboard('C',
                              this.parentElement.nextElementSibling)"
                ></i></td>
                <td class="param">
            <a class="param-doc-link"
                rel="noreferrer" target="_blank" href="https://scikit-learn.org/1.8/modules/generated/sklearn.linear_model.LogisticRegression.html#:~:text=C,-float%2C%20default%3D1.0">
                C
                <span class="param-doc-description">C: float, default=1.0<br><br>Inverse of regularization strength; must be a positive float.<br>Like in support vector machines, smaller values specify stronger<br>regularization. `C=np.inf` results in unpenalized logistic regression.<br>For a visual example on the effect of tuning the `C` parameter<br>with an L1 penalty, see:<br>:ref:`sphx_glr_auto_examples_linear_model_plot_logistic_path.py`.</span>
            </a>
        </td>
                <td class="value">0.01</td>
            </tr>
    

            <tr class="default">
                <td><i class="copy-paste-icon"
                     onclick="copyToClipboard('l1_ratio',
                              this.parentElement.nextElementSibling)"
                ></i></td>
                <td class="param">
            <a class="param-doc-link"
                rel="noreferrer" target="_blank" href="https://scikit-learn.org/1.8/modules/generated/sklearn.linear_model.LogisticRegression.html#:~:text=l1_ratio,-float%2C%20default%3D0.0">
                l1_ratio
                <span class="param-doc-description">l1_ratio: float, default=0.0<br><br>The Elastic-Net mixing parameter, with `0 <= l1_ratio <= 1`. Setting<br>`l1_ratio=1` gives a pure L1-penalty, setting `l1_ratio=0` a pure L2-penalty.<br>Any value between 0 and 1 gives an Elastic-Net penalty of the form<br>`l1_ratio * L1 + (1 - l1_ratio) * L2`.<br><br>.. warning::<br>   Certain values of `l1_ratio`, i.e. some penalties, may not work with some<br>   solvers. See the parameter `solver` below, to know the compatibility between<br>   the penalty and solver.<br><br>.. versionchanged:: 1.8<br>    Default value changed from None to 0.0.<br><br>.. deprecated:: 1.8<br>    `None` is deprecated and will be removed in version 1.10. Always use<br>    `l1_ratio` to specify the penalty type.</span>
            </a>
        </td>
                <td class="value">0.0</td>
            </tr>
    

            <tr class="default">
                <td><i class="copy-paste-icon"
                     onclick="copyToClipboard('dual',
                              this.parentElement.nextElementSibling)"
                ></i></td>
                <td class="param">
            <a class="param-doc-link"
                rel="noreferrer" target="_blank" href="https://scikit-learn.org/1.8/modules/generated/sklearn.linear_model.LogisticRegression.html#:~:text=dual,-bool%2C%20default%3DFalse">
                dual
                <span class="param-doc-description">dual: bool, default=False<br><br>Dual (constrained) or primal (regularized, see also<br>:ref:`this equation <regularized-logistic-loss>`) formulation. Dual formulation<br>is only implemented for l2 penalty with liblinear solver. Prefer `dual=False`<br>when n_samples > n_features.</span>
            </a>
        </td>
                <td class="value">False</td>
            </tr>
    

            <tr class="default">
                <td><i class="copy-paste-icon"
                     onclick="copyToClipboard('tol',
                              this.parentElement.nextElementSibling)"
                ></i></td>
                <td class="param">
            <a class="param-doc-link"
                rel="noreferrer" target="_blank" href="https://scikit-learn.org/1.8/modules/generated/sklearn.linear_model.LogisticRegression.html#:~:text=tol,-float%2C%20default%3D1e-4">
                tol
                <span class="param-doc-description">tol: float, default=1e-4<br><br>Tolerance for stopping criteria.</span>
            </a>
        </td>
                <td class="value">0.0001</td>
            </tr>
    

            <tr class="default">
                <td><i class="copy-paste-icon"
                     onclick="copyToClipboard('fit_intercept',
                              this.parentElement.nextElementSibling)"
                ></i></td>
                <td class="param">
            <a class="param-doc-link"
                rel="noreferrer" target="_blank" href="https://scikit-learn.org/1.8/modules/generated/sklearn.linear_model.LogisticRegression.html#:~:text=fit_intercept,-bool%2C%20default%3DTrue">
                fit_intercept
                <span class="param-doc-description">fit_intercept: bool, default=True<br><br>Specifies if a constant (a.k.a. bias or intercept) should be<br>added to the decision function.</span>
            </a>
        </td>
                <td class="value">True</td>
            </tr>
    

            <tr class="default">
                <td><i class="copy-paste-icon"
                     onclick="copyToClipboard('intercept_scaling',
                              this.parentElement.nextElementSibling)"
                ></i></td>
                <td class="param">
            <a class="param-doc-link"
                rel="noreferrer" target="_blank" href="https://scikit-learn.org/1.8/modules/generated/sklearn.linear_model.LogisticRegression.html#:~:text=intercept_scaling,-float%2C%20default%3D1">
                intercept_scaling
                <span class="param-doc-description">intercept_scaling: float, default=1<br><br>Useful only when the solver `liblinear` is used<br>and `self.fit_intercept` is set to `True`. In this case, `x` becomes<br>`[x, self.intercept_scaling]`,<br>i.e. a "synthetic" feature with constant value equal to<br>`intercept_scaling` is appended to the instance vector.<br>The intercept becomes<br>``intercept_scaling * synthetic_feature_weight``.<br><br>.. note::<br>    The synthetic feature weight is subject to L1 or L2<br>    regularization as all other features.<br>    To lessen the effect of regularization on synthetic feature weight<br>    (and therefore on the intercept) `intercept_scaling` has to be increased.</span>
            </a>
        </td>
                <td class="value">1</td>
            </tr>
    

            <tr class="default">
                <td><i class="copy-paste-icon"
                     onclick="copyToClipboard('class_weight',
                              this.parentElement.nextElementSibling)"
                ></i></td>
                <td class="param">
            <a class="param-doc-link"
                rel="noreferrer" target="_blank" href="https://scikit-learn.org/1.8/modules/generated/sklearn.linear_model.LogisticRegression.html#:~:text=class_weight,-dict%20or%20%27balanced%27%2C%20default%3DNone">
                class_weight
                <span class="param-doc-description">class_weight: dict or 'balanced', default=None<br><br>Weights associated with classes in the form ``{class_label: weight}``.<br>If not given, all classes are supposed to have weight one.<br><br>The "balanced" mode uses the values of y to automatically adjust<br>weights inversely proportional to class frequencies in the input data<br>as ``n_samples / (n_classes * np.bincount(y))``.<br><br>Note that these weights will be multiplied with sample_weight (passed<br>through the fit method) if sample_weight is specified.<br><br>.. versionadded:: 0.17<br>   *class_weight='balanced'*</span>
            </a>
        </td>
                <td class="value">None</td>
            </tr>
    

            <tr class="user-set">
                <td><i class="copy-paste-icon"
                     onclick="copyToClipboard('random_state',
                              this.parentElement.nextElementSibling)"
                ></i></td>
                <td class="param">
            <a class="param-doc-link"
                rel="noreferrer" target="_blank" href="https://scikit-learn.org/1.8/modules/generated/sklearn.linear_model.LogisticRegression.html#:~:text=random_state,-int%2C%20RandomState%20instance%2C%20default%3DNone">
                random_state
                <span class="param-doc-description">random_state: int, RandomState instance, default=None<br><br>Used when ``solver`` == 'sag', 'saga' or 'liblinear' to shuffle the<br>data. See :term:`Glossary <random_state>` for details.</span>
            </a>
        </td>
                <td class="value">42</td>
            </tr>
    

            <tr class="default">
                <td><i class="copy-paste-icon"
                     onclick="copyToClipboard('solver',
                              this.parentElement.nextElementSibling)"
                ></i></td>
                <td class="param">
            <a class="param-doc-link"
                rel="noreferrer" target="_blank" href="https://scikit-learn.org/1.8/modules/generated/sklearn.linear_model.LogisticRegression.html#:~:text=solver,-%7B%27lbfgs%27%2C%20%27liblinear%27%2C%20%27newton-cg%27%2C%20%27newton-cholesky%27%2C%20%27sag%27%2C%20%27saga%27%7D%2C%20%20%20%20%20%20%20%20%20%20%20%20%20default%3D%27lbfgs%27">
                solver
                <span class="param-doc-description">solver: {'lbfgs', 'liblinear', 'newton-cg', 'newton-cholesky', 'sag', 'saga'},             default='lbfgs'<br><br>Algorithm to use in the optimization problem. Default is 'lbfgs'.<br>To choose a solver, you might want to consider the following aspects:<br><br>- 'lbfgs' is a good default solver because it works reasonably well for a wide<br>  class of problems.<br>- For :term:`multiclass` problems (`n_classes >= 3`), all solvers except<br>  'liblinear' minimize the full multinomial loss, 'liblinear' will raise an<br>  error.<br>- 'newton-cholesky' is a good choice for<br>  `n_samples` >> `n_features * n_classes`, especially with one-hot encoded<br>  categorical features with rare categories. Be aware that the memory usage<br>  of this solver has a quadratic dependency on `n_features * n_classes`<br>  because it explicitly computes the full Hessian matrix.<br>- For small datasets, 'liblinear' is a good choice, whereas 'sag'<br>  and 'saga' are faster for large ones;<br>- 'liblinear' can only handle binary classification by default. To apply a<br>  one-versus-rest scheme for the multiclass setting one can wrap it with the<br>  :class:`~sklearn.multiclass.OneVsRestClassifier`.<br><br>.. warning::<br>   The choice of the algorithm depends on the penalty chosen (`l1_ratio=0`<br>   for L2-penalty, `l1_ratio=1` for L1-penalty and `0 < l1_ratio < 1` for<br>   Elastic-Net) and on (multinomial) multiclass support:<br><br>   ================= ======================== ======================<br>   solver            l1_ratio                 multinomial multiclass<br>   ================= ======================== ======================<br>   'lbfgs'           l1_ratio=0               yes<br>   'liblinear'       l1_ratio=1 or l1_ratio=0 no<br>   'newton-cg'       l1_ratio=0               yes<br>   'newton-cholesky' l1_ratio=0               yes<br>   'sag'             l1_ratio=0               yes<br>   'saga'            0<=l1_ratio<=1           yes<br>   ================= ======================== ======================<br><br>.. note::<br>   'sag' and 'saga' fast convergence is only guaranteed on features<br>   with approximately the same scale. You can preprocess the data with<br>   a scaler from :mod:`sklearn.preprocessing`.<br><br>.. seealso::<br>   Refer to the :ref:`User Guide <Logistic_regression>` for more<br>   information regarding :class:`LogisticRegression` and more specifically the<br>   :ref:`Table <logistic_regression_solvers>`<br>   summarizing solver/penalty supports.<br><br>.. versionadded:: 0.17<br>   Stochastic Average Gradient (SAG) descent solver. Multinomial support in<br>   version 0.18.<br>.. versionadded:: 0.19<br>   SAGA solver.<br>.. versionchanged:: 0.22<br>   The default solver changed from 'liblinear' to 'lbfgs' in 0.22.<br>.. versionadded:: 1.2<br>   newton-cholesky solver. Multinomial support in version 1.6.</span>
            </a>
        </td>
                <td class="value">&#x27;lbfgs&#x27;</td>
            </tr>
    

            <tr class="user-set">
                <td><i class="copy-paste-icon"
                     onclick="copyToClipboard('max_iter',
                              this.parentElement.nextElementSibling)"
                ></i></td>
                <td class="param">
            <a class="param-doc-link"
                rel="noreferrer" target="_blank" href="https://scikit-learn.org/1.8/modules/generated/sklearn.linear_model.LogisticRegression.html#:~:text=max_iter,-int%2C%20default%3D100">
                max_iter
                <span class="param-doc-description">max_iter: int, default=100<br><br>Maximum number of iterations taken for the solvers to converge.</span>
            </a>
        </td>
                <td class="value">1000</td>
            </tr>
    

            <tr class="default">
                <td><i class="copy-paste-icon"
                     onclick="copyToClipboard('verbose',
                              this.parentElement.nextElementSibling)"
                ></i></td>
                <td class="param">
            <a class="param-doc-link"
                rel="noreferrer" target="_blank" href="https://scikit-learn.org/1.8/modules/generated/sklearn.linear_model.LogisticRegression.html#:~:text=verbose,-int%2C%20default%3D0">
                verbose
                <span class="param-doc-description">verbose: int, default=0<br><br>For the liblinear and lbfgs solvers set verbose to any positive<br>number for verbosity.</span>
            </a>
        </td>
                <td class="value">0</td>
            </tr>
    

            <tr class="default">
                <td><i class="copy-paste-icon"
                     onclick="copyToClipboard('warm_start',
                              this.parentElement.nextElementSibling)"
                ></i></td>
                <td class="param">
            <a class="param-doc-link"
                rel="noreferrer" target="_blank" href="https://scikit-learn.org/1.8/modules/generated/sklearn.linear_model.LogisticRegression.html#:~:text=warm_start,-bool%2C%20default%3DFalse">
                warm_start
                <span class="param-doc-description">warm_start: bool, default=False<br><br>When set to True, reuse the solution of the previous call to fit as<br>initialization, otherwise, just erase the previous solution.<br>Useless for liblinear solver. See :term:`the Glossary <warm_start>`.<br><br>.. versionadded:: 0.17<br>   *warm_start* to support *lbfgs*, *newton-cg*, *sag*, *saga* solvers.</span>
            </a>
        </td>
                <td class="value">False</td>
            </tr>
    

            <tr class="default">
                <td><i class="copy-paste-icon"
                     onclick="copyToClipboard('n_jobs',
                              this.parentElement.nextElementSibling)"
                ></i></td>
                <td class="param">
            <a class="param-doc-link"
                rel="noreferrer" target="_blank" href="https://scikit-learn.org/1.8/modules/generated/sklearn.linear_model.LogisticRegression.html#:~:text=n_jobs,-int%2C%20default%3DNone">
                n_jobs
                <span class="param-doc-description">n_jobs: int, default=None<br><br>Does not have any effect.<br><br>.. deprecated:: 1.8<br>   `n_jobs` is deprecated in version 1.8 and will be removed in 1.10.</span>
            </a>
        </td>
                <td class="value">None</td>
            </tr>
    
                      </tbody>
                    </table>
                </details>
            </div>
        </div></div></div></div></div></div></div></div></div></div></div></div><script>function copyToClipboard(text, element) {
        // Get the parameter prefix from the closest toggleable content
        const toggleableContent = element.closest('.sk-toggleable__content');
        const paramPrefix = toggleableContent ? toggleableContent.dataset.paramPrefix : '';
        const fullParamName = paramPrefix ? `${paramPrefix}${text}` : text;

        const originalStyle = element.style;
        const computedStyle = window.getComputedStyle(element);
        const originalWidth = computedStyle.width;
        const originalHTML = element.innerHTML.replace('Copied!', '');

        navigator.clipboard.writeText(fullParamName)
            .then(() => {
                element.style.width = originalWidth;
                element.style.color = 'green';
                element.innerHTML = "Copied!";

                setTimeout(() => {
                    element.innerHTML = originalHTML;
                    element.style = originalStyle;
                }, 2000);
            })
            .catch(err => {
                console.error('Failed to copy:', err);
                element.style.color = 'red';
                element.innerHTML = "Failed!";
                setTimeout(() => {
                    element.innerHTML = originalHTML;
                    element.style = originalStyle;
                }, 2000);
            });
        return false;
    }

    document.querySelectorAll('.copy-paste-icon').forEach(function(element) {
        const toggleableContent = element.closest('.sk-toggleable__content');
        const paramPrefix = toggleableContent ? toggleableContent.dataset.paramPrefix : '';
        const paramName = element.parentElement.nextElementSibling
            .textContent.trim().split(' ')[0];
        const fullParamName = paramPrefix ? `${paramPrefix}${paramName}` : paramName;

        element.setAttribute('title', fullParamName);
    });


    /**
     * Adapted from Skrub
     * https://github.com/skrub-data/skrub/blob/403466d1d5d4dc76a7ef569b3f8228db59a31dc3/skrub/_reporting/_data/templates/report.js#L789
     * @returns "light" or "dark"
     */
    function detectTheme(element) {
        const body = document.querySelector('body');

        // Check VSCode theme
        const themeKindAttr = body.getAttribute('data-vscode-theme-kind');
        const themeNameAttr = body.getAttribute('data-vscode-theme-name');

        if (themeKindAttr && themeNameAttr) {
            const themeKind = themeKindAttr.toLowerCase();
            const themeName = themeNameAttr.toLowerCase();

            if (themeKind.includes("dark") || themeName.includes("dark")) {
                return "dark";
            }
            if (themeKind.includes("light") || themeName.includes("light")) {
                return "light";
            }
        }

        // Check Jupyter theme
        if (body.getAttribute('data-jp-theme-light') === 'false') {
            return 'dark';
        } else if (body.getAttribute('data-jp-theme-light') === 'true') {
            return 'light';
        }

        // Guess based on a parent element's color
        const color = window.getComputedStyle(element.parentNode, null).getPropertyValue('color');
        const match = color.match(/^rgb\s*\(\s*(\d+)\s*,\s*(\d+)\s*,\s*(\d+)\s*\)\s*$/i);
        if (match) {
            const [r, g, b] = [
                parseFloat(match[1]),
                parseFloat(match[2]),
                parseFloat(match[3])
            ];

            // https://en.wikipedia.org/wiki/HSL_and_HSV#Lightness
            const luma = 0.299 * r + 0.587 * g + 0.114 * b;

            if (luma > 180) {
                // If the text is very bright we have a dark theme
                return 'dark';
            }
            if (luma < 75) {
                // If the text is very dark we have a light theme
                return 'light';
            }
            // Otherwise fall back to the next heuristic.
        }

        // Fallback to system preference
        return window.matchMedia('(prefers-color-scheme: dark)').matches ? 'dark' : 'light';
    }


    function forceTheme(elementId) {
        const estimatorElement = document.querySelector(`#${elementId}`);
        if (estimatorElement === null) {
            console.error(`Element with id ${elementId} not found.`);
        } else {
            const theme = detectTheme(estimatorElement);
            estimatorElement.classList.add(theme);
        }
    }

    forceTheme('sk-container-id-64');</script></body>
    </div>
    <br />
    <br />

.. GENERATED FROM PYTHON SOURCE LINES 157-163

Visualize the results
---------------------

We'll create a bar chart showing the test scores for different numbers of PCA
components, along with horizontal lines indicating the best score and the
one-standard-deviation threshold.

.. GENERATED FROM PYTHON SOURCE LINES 163-328

.. code-block:: Python


    n_components = grid.cv_results_["param_reduce_dim__n_components"]
    test_scores = grid.cv_results_["mean_test_score"]

    # Create a polars DataFrame for better data manipulation and visualization
    results_df = pl.DataFrame(
        {
            "n_components": n_components,
            "mean_test_score": test_scores,
            "std_test_score": grid.cv_results_["std_test_score"],
            "mean_train_score": grid.cv_results_["mean_train_score"],
            "std_train_score": grid.cv_results_["std_train_score"],
            "mean_fit_time": grid.cv_results_["mean_fit_time"],
            "rank_test_score": grid.cv_results_["rank_test_score"],
        }
    )

    # Sort by number of components
    results_df = results_df.sort("n_components")

    # Calculate the lower bound threshold
    lower = lower_bound(grid.cv_results_)

    # Get the best model information
    best_index_ = grid.best_index_
    best_components = n_components[best_index_]
    best_score = grid.cv_results_["mean_test_score"][best_index_]

    # Add a column to mark the selected model
    results_df = results_df.with_columns(
        pl.when(pl.col("n_components") == best_components)
        .then(pl.lit("Selected"))
        .otherwise(pl.lit("Regular"))
        .alias("model_type")
    )

    # Get the number of CV splits from the results
    n_splits = sum(
        1
        for key in grid.cv_results_.keys()
        if key.startswith("split") and key.endswith("test_score")
    )

    # Extract individual scores for each split
    test_scores = np.array(
        [
            [grid.cv_results_[f"split{i}_test_score"][j] for i in range(n_splits)]
            for j in range(len(n_components))
        ]
    )
    train_scores = np.array(
        [
            [grid.cv_results_[f"split{i}_train_score"][j] for i in range(n_splits)]
            for j in range(len(n_components))
        ]
    )

    # Calculate mean and std of test scores
    mean_test_scores = np.mean(test_scores, axis=1)
    std_test_scores = np.std(test_scores, axis=1)

    # Find best score and threshold
    best_mean_score = np.max(mean_test_scores)
    threshold = best_mean_score - std_test_scores[np.argmax(mean_test_scores)]

    # Create a single figure for visualization
    fig, ax = plt.subplots(figsize=(12, 8))

    # Plot individual points
    for i, comp in enumerate(n_components):
        # Plot individual test points
        plt.scatter(
            [comp] * n_splits,
            test_scores[i],
            alpha=0.2,
            color="blue",
            s=20,
            label="Individual test scores" if i == 0 else "",
        )
        # Plot individual train points
        plt.scatter(
            [comp] * n_splits,
            train_scores[i],
            alpha=0.2,
            color="green",
            s=20,
            label="Individual train scores" if i == 0 else "",
        )

    # Plot mean lines with error bands
    plt.plot(
        n_components,
        np.mean(test_scores, axis=1),
        "-",
        color="blue",
        linewidth=2,
        label="Mean test score",
    )
    plt.fill_between(
        n_components,
        np.mean(test_scores, axis=1) - np.std(test_scores, axis=1),
        np.mean(test_scores, axis=1) + np.std(test_scores, axis=1),
        alpha=0.15,
        color="blue",
    )

    plt.plot(
        n_components,
        np.mean(train_scores, axis=1),
        "-",
        color="green",
        linewidth=2,
        label="Mean train score",
    )
    plt.fill_between(
        n_components,
        np.mean(train_scores, axis=1) - np.std(train_scores, axis=1),
        np.mean(train_scores, axis=1) + np.std(train_scores, axis=1),
        alpha=0.15,
        color="green",
    )

    # Add threshold lines
    plt.axhline(
        best_mean_score,
        color="#9b59b6",  # Purple
        linestyle="--",
        label="Best score",
        linewidth=2,
    )
    plt.axhline(
        threshold,
        color="#e67e22",  # Orange
        linestyle="--",
        label="Best score - 1 std",
        linewidth=2,
    )

    # Highlight selected model
    plt.axvline(
        best_components,
        color="#9b59b6",  # Purple
        alpha=0.2,
        linewidth=8,
        label="Selected model",
    )

    # Set titles and labels
    plt.xlabel("Number of PCA components", fontsize=12)
    plt.ylabel("Score", fontsize=12)
    plt.title("Model Selection: Balancing Complexity and Performance", fontsize=14)
    plt.grid(True, linestyle="--", alpha=0.7)
    plt.legend(
        bbox_to_anchor=(1.02, 1),
        loc="upper left",
        borderaxespad=0,
    )

    # Set axis properties
    plt.xticks(n_components)
    plt.ylim((0.85, 1.0))

    # # Adjust layout
    plt.tight_layout()




.. image-sg:: /auto_examples/model_selection/images/sphx_glr_plot_grid_search_refit_callable_001.png
   :alt: Model Selection: Balancing Complexity and Performance
   :srcset: /auto_examples/model_selection/images/sphx_glr_plot_grid_search_refit_callable_001.png
   :class: sphx-glr-single-img





.. GENERATED FROM PYTHON SOURCE LINES 329-334

Print the results
-----------------

We print information about the selected model, including its complexity and
performance. We also show a summary table of all models using polars.

.. GENERATED FROM PYTHON SOURCE LINES 334-363

.. code-block:: Python


    print("Best model selected by the one-standard-error rule:")
    print(f"Number of PCA components: {best_components}")
    print(f"Accuracy score: {best_score:.4f}")
    print(f"Best possible accuracy: {np.max(test_scores):.4f}")
    print(f"Accuracy threshold (best - 1 std): {lower:.4f}")

    # Create a summary table with polars
    summary_df = results_df.select(
        pl.col("n_components"),
        pl.col("mean_test_score").round(4).alias("test_score"),
        pl.col("std_test_score").round(4).alias("test_std"),
        pl.col("mean_train_score").round(4).alias("train_score"),
        pl.col("std_train_score").round(4).alias("train_std"),
        pl.col("mean_fit_time").round(3).alias("fit_time"),
        pl.col("rank_test_score").alias("rank"),
    )

    # Add a column to mark the selected model
    summary_df = summary_df.with_columns(
        pl.when(pl.col("n_components") == best_components)
        .then(pl.lit("*"))
        .otherwise(pl.lit(""))
        .alias("selected")
    )

    print("\nModel comparison table:")
    print(summary_df)





.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Best model selected by the one-standard-error rule:
    Number of PCA components: 25
    Accuracy score: 0.9643
    Best possible accuracy: 0.9944
    Accuracy threshold (best - 1 std): 0.9623

    Model comparison table:
    shape: (9, 8)
    ┌──────────────┬────────────┬──────────┬─────────────┬───────────┬──────────┬──────┬──────────┐
    │ n_components ┆ test_score ┆ test_std ┆ train_score ┆ train_std ┆ fit_time ┆ rank ┆ selected │
    │ ---          ┆ ---        ┆ ---      ┆ ---         ┆ ---       ┆ ---      ┆ ---  ┆ ---      │
    │ i64          ┆ f64        ┆ f64      ┆ f64         ┆ f64       ┆ f64      ┆ i32  ┆ str      │
    ╞══════════════╪════════════╪══════════╪═════════════╪═══════════╪══════════╪══════╪══════════╡
    │ 6            ┆ 0.8631     ┆ 0.0241   ┆ 0.8697      ┆ 0.0048    ┆ 0.092    ┆ 9    ┆          │
    │ 8            ┆ 0.9037     ┆ 0.0192   ┆ 0.9146      ┆ 0.0028    ┆ 0.083    ┆ 8    ┆          │
    │ 10           ┆ 0.9341     ┆ 0.0148   ┆ 0.9493      ┆ 0.0023    ┆ 0.058    ┆ 7    ┆          │
    │ 15           ┆ 0.95       ┆ 0.0162   ┆ 0.9662      ┆ 0.0022    ┆ 0.055    ┆ 6    ┆          │
    │ 20           ┆ 0.9563     ┆ 0.0144   ┆ 0.9759      ┆ 0.0019    ┆ 0.054    ┆ 5    ┆          │
    │ 25           ┆ 0.9643     ┆ 0.0126   ┆ 0.9836      ┆ 0.0014    ┆ 0.051    ┆ 4    ┆ *        │
    │ 35           ┆ 0.9685     ┆ 0.0115   ┆ 0.9903      ┆ 0.0013    ┆ 0.054    ┆ 3    ┆          │
    │ 45           ┆ 0.9711     ┆ 0.0093   ┆ 0.9926      ┆ 0.001     ┆ 0.058    ┆ 2    ┆          │
    │ 55           ┆ 0.9717     ┆ 0.0093   ┆ 0.993       ┆ 0.001     ┆ 0.06     ┆ 1    ┆          │
    └──────────────┴────────────┴──────────┴─────────────┴───────────┴──────────┴──────┴──────────┘




.. GENERATED FROM PYTHON SOURCE LINES 364-387

Conclusion
----------

The one-standard-error rule helps us select a simpler model (fewer PCA components)
while maintaining performance statistically comparable to the best model.
This approach can help prevent overfitting and improve model interpretability
and efficiency.

In this example, we've seen how to implement this rule using a custom refit
callable with :class:`~sklearn.model_selection.GridSearchCV`.

Key takeaways:

1. The one-standard-error rule provides a good rule of thumb to select simpler models

2. Custom refit callables in :class:`~sklearn.model_selection.GridSearchCV` allow for
   flexible model selection strategies

3. Visualizing both train and test scores helps identify potential overfitting

This approach can be applied to other model selection scenarios where balancing
complexity and performance is important, or in cases where a use-case specific
selection of the "best" model is desired.

.. GENERATED FROM PYTHON SOURCE LINES 387-390

.. code-block:: Python


    # Display the figure
    plt.show()








.. rst-class:: sphx-glr-timing

   **Total running time of the script:** (0 minutes 18.163 seconds)


.. _sphx_glr_download_auto_examples_model_selection_plot_grid_search_refit_callable.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: binder-badge

      .. image:: images/binder_badge_logo.svg
        :target: https://mybinder.org/v2/gh/scikit-learn/scikit-learn/1.8.X?urlpath=lab/tree/notebooks/auto_examples/model_selection/plot_grid_search_refit_callable.ipynb
        :alt: Launch binder
        :width: 150 px

    .. container:: lite-badge

      .. image:: images/jupyterlite_badge_logo.svg
        :target: ../../lite/lab/index.html?path=auto_examples/model_selection/plot_grid_search_refit_callable.ipynb
        :alt: Launch JupyterLite
        :width: 150 px

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: plot_grid_search_refit_callable.ipynb <plot_grid_search_refit_callable.ipynb>`

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: plot_grid_search_refit_callable.py <plot_grid_search_refit_callable.py>`

    .. container:: sphx-glr-download sphx-glr-download-zip

      :download:`Download zipped: plot_grid_search_refit_callable.zip <plot_grid_search_refit_callable.zip>`


.. include:: plot_grid_search_refit_callable.recommendations


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_
