
.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_examples/model_selection/plot_train_error_vs_test_error.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end <sphx_glr_download_auto_examples_model_selection_plot_train_error_vs_test_error.py>`
        to download the full example code or to run this example in your browser via JupyterLite or Binder.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_examples_model_selection_plot_train_error_vs_test_error.py:


=========================================================
Effect of model regularization on training and test error
=========================================================

In this example, we evaluate the impact of the regularization parameter in a
linear model called :class:`~sklearn.linear_model.ElasticNet`. To carry out this
evaluation, we plot a validation curve with
:class:`~sklearn.model_selection.ValidationCurveDisplay`. This curve shows the
training and test scores of the model for different values of the regularization
parameter.

Once we identify the optimal regularization parameter, we compare the true and
estimated coefficients of the model to determine if the model is able to recover
the coefficients from the noisy input data.

.. GENERATED FROM PYTHON SOURCE LINES 17-21

.. code-block:: Python


    # Authors: The scikit-learn developers
    # SPDX-License-Identifier: BSD-3-Clause








.. GENERATED FROM PYTHON SOURCE LINES 22-29

Generate sample data
--------------------

We generate a regression dataset that contains many features relative to the
number of samples. However, only 10% of the features are informative. In this context,
linear models exposing L1 penalization are commonly used to recover a sparse
set of coefficients.

.. GENERATED FROM PYTHON SOURCE LINES 29-46

.. code-block:: Python

    from sklearn.datasets import make_regression
    from sklearn.model_selection import train_test_split

    n_samples_train, n_samples_test, n_features = 150, 300, 500
    X, y, true_coef = make_regression(
        n_samples=n_samples_train + n_samples_test,
        n_features=n_features,
        n_informative=50,
        shuffle=False,
        noise=1.0,
        coef=True,
        random_state=42,
    )
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, train_size=n_samples_train, test_size=n_samples_test, shuffle=False
    )
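
As a quick sanity check of the setup above, the following sketch regenerates the
same dataset and counts the non-zero entries of the ground-truth coefficients. It
assumes the same `make_regression` arguments as above; the names `n_nonzero` and
`all_non_negative` are introduced here for illustration only.

.. code-block:: Python

    import numpy as np

    from sklearn.datasets import make_regression

    # Same arguments as in the example: 150 + 300 samples, 500 features.
    X, y, true_coef = make_regression(
        n_samples=450,
        n_features=500,
        n_informative=50,
        shuffle=False,
        noise=1.0,
        coef=True,
        random_state=42,
    )

    # Count how many ground-truth coefficients are non-zero (illustrative names).
    n_nonzero = int(np.count_nonzero(true_coef))
    all_non_negative = bool((true_coef >= 0).all())
    print(f"{n_nonzero} of {true_coef.size} true coefficients are non-zero")
    print(f"All true coefficients are non-negative: {all_non_negative}")

Because `shuffle=False`, the informative features should occupy the first columns
of `X`, which makes the later coefficient plots easier to read.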








.. GENERATED FROM PYTHON SOURCE LINES 47-61

Model definition
----------------

Here, we do not use a model that only exposes an L1 penalty. Instead, we use
an :class:`~sklearn.linear_model.ElasticNet` model that exposes both L1 and L2
penalties.

We set the `l1_ratio` parameter close to 1 so that the L1 penalty dominates and the
solution found by the model is still sparse. This type of model therefore tries to
find a sparse solution while also shrinking all coefficients towards zero.
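
For reference, the scikit-learn documentation gives the objective minimized by
:class:`~sklearn.linear_model.ElasticNet` in the following form. The symbols
:math:`\rho` (for `l1_ratio`) and :math:`n` (for the number of training samples)
are notation chosen here and do not appear elsewhere in this example:

.. math::

    \min_{w} \; \frac{1}{2n} \|y - Xw\|_2^2
    + \alpha \rho \|w\|_1
    + \frac{\alpha (1 - \rho)}{2} \|w\|_2^2

When :math:`\rho` is close to 1, the L1 term carries most of the penalty, which
favors sparse solutions, while the remaining L2 term still shrinks all
coefficients.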

In addition, we force the coefficients of the model to be non-negative since we know
that `make_regression` draws non-negative ground-truth coefficients. We use this
prior knowledge to get a better model.

.. GENERATED FROM PYTHON SOURCE LINES 61-67

.. code-block:: Python


    from sklearn.linear_model import ElasticNet

    enet = ElasticNet(l1_ratio=0.9, positive=True, max_iter=10_000)









.. GENERATED FROM PYTHON SOURCE LINES 68-88

Evaluate the impact of the regularization parameter
---------------------------------------------------

To evaluate the impact of the regularization parameter, we use a validation
curve. This curve shows the training and test scores of the model for different
values of the regularization parameter.

The regularization strength `alpha` scales the penalty applied to the coefficients of
the model: when it tends to zero, no regularization is applied and the model tries to
fit the training data with the least amount of error. However, this leads to
overfitting when features are noisy. When `alpha` increases, the model coefficients
are constrained, and thus the model cannot fit the training data as closely, which
avoids overfitting. However, if too much regularization is applied, the model
underfits the data and is not able to properly capture the signal.

The validation curve helps in finding a good trade-off between both extremes: the
model is regularized enough to avoid overfitting, yet remains flexible enough to
fit the signal. The :class:`~sklearn.model_selection.ValidationCurveDisplay`
allows us to display the training and validation scores across a range of `alpha`
values.
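
The trade-off described above can also be inspected numerically. The sketch below
is a minimal illustration, assuming the same data and model settings as in this
example; it uses :func:`~sklearn.model_selection.validation_curve` on a much
coarser grid (`coarse_alphas`, a name introduced here) and relies on the default
5-fold cross-validation.

.. code-block:: Python

    import numpy as np

    from sklearn.datasets import make_regression
    from sklearn.linear_model import ElasticNet
    from sklearn.model_selection import validation_curve

    # Rebuild the training set: the first 150 samples, without shuffling.
    X, y = make_regression(
        n_samples=450,
        n_features=500,
        n_informative=50,
        shuffle=False,
        noise=1.0,
        random_state=42,
    )
    X_train, y_train = X[:150], y[:150]

    # Coarse, illustrative grid; the example itself uses 60 values.
    coarse_alphas = np.logspace(-2, 1, 4)
    train_scores, test_scores = validation_curve(
        ElasticNet(l1_ratio=0.9, positive=True, max_iter=10_000),
        X_train,
        y_train,
        param_name="alpha",
        param_range=coarse_alphas,
        scoring="r2",
    )

    # Both arrays have shape (n_alphas, n_folds); average over the folds.
    train_mean = train_scores.mean(axis=1)
    test_mean = test_scores.mean(axis=1)
    for alpha, train_r2, test_r2 in zip(coarse_alphas, train_mean, test_mean):
        print(f"alpha={alpha:.0e}  train R^2={train_r2:.3f}  validation R^2={test_r2:.3f}")

A large gap between the training and validation scores at small `alpha` would
suggest overfitting, while low scores on both at large `alpha` would suggest
underfitting.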

.. GENERATED FROM PYTHON SOURCE LINES 88-122

.. code-block:: Python

    import numpy as np

    from sklearn.model_selection import ValidationCurveDisplay

    alphas = np.logspace(-5, 1, 60)
    disp = ValidationCurveDisplay.from_estimator(
        enet,
        X_train,
        y_train,
        param_name="alpha",
        param_range=alphas,
        scoring="r2",
        n_jobs=2,
        score_type="both",
    )
    disp.ax_.set(
        title=r"Validation Curve for ElasticNet (R$^2$ Score)",
        xlabel=r"alpha (regularization strength)",
        ylabel="R$^2$ Score",
    )

    test_scores_mean = disp.test_scores.mean(axis=1)
    idx_avg_max_test_score = np.argmax(test_scores_mean)
    disp.ax_.vlines(
        alphas[idx_avg_max_test_score],
        disp.ax_.get_ylim()[0],
        test_scores_mean[idx_avg_max_test_score],
        color="k",
        linewidth=2,
        linestyle="--",
        label=f"Optimum on test\n$\\alpha$ = {alphas[idx_avg_max_test_score]:.2e}",
    )
    _ = disp.ax_.legend(loc="lower right")




.. image-sg:: /auto_examples/model_selection/images/sphx_glr_plot_train_error_vs_test_error_001.png
   :alt: Validation Curve for ElasticNet (R$^2$ Score)
   :srcset: /auto_examples/model_selection/images/sphx_glr_plot_train_error_vs_test_error_001.png
   :class: sphx-glr-single-img





.. GENERATED FROM PYTHON SOURCE LINES 123-134

To find the optimal regularization parameter, we can select the value of `alpha`
that maximizes the validation score.
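
The selection rule can be illustrated on a small, made-up score matrix. All numbers
below are hypothetical and chosen only to show the mechanics: scores are averaged
over the cross-validation folds, and the `alpha` with the highest mean is kept.

.. code-block:: Python

    import numpy as np

    # Hypothetical grid and validation scores, shape (n_alphas, n_folds).
    toy_alphas = np.array([0.01, 0.1, 1.0, 10.0])
    toy_test_scores = np.array(
        [
            [0.60, 0.64],
            [0.80, 0.84],
            [0.88, 0.86],
            [0.50, 0.46],
        ]
    )

    # Average over folds, then pick the index of the best mean score.
    toy_mean_scores = toy_test_scores.mean(axis=1)
    best_idx = int(np.argmax(toy_mean_scores))
    best_alpha = float(toy_alphas[best_idx])
    print(f"Best alpha: {best_alpha} (mean validation score {toy_mean_scores[best_idx]:.2f})")

This mirrors how `idx_avg_max_test_score` is computed from `disp.test_scores` in
the code above.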

Coefficients comparison
-----------------------

Now that we have identified the optimal regularization parameter, we can compare the
true coefficients and the estimated coefficients.

First, let's set the regularization parameter to the optimal value and fit the
model on the training data. In addition, we'll show the test score for this model.

.. GENERATED FROM PYTHON SOURCE LINES 134-139

.. code-block:: Python

    enet.set_params(alpha=alphas[idx_avg_max_test_score]).fit(X_train, y_train)
    print(
        f"Test score: {enet.score(X_test, y_test):.3f}",
    )





.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Test score: 0.884




.. GENERATED FROM PYTHON SOURCE LINES 140-141

Now, we plot the true coefficients and the estimated coefficients.

.. GENERATED FROM PYTHON SOURCE LINES 141-158

.. code-block:: Python

    import matplotlib.pyplot as plt

    fig, axs = plt.subplots(ncols=2, figsize=(12, 6), sharex=True, sharey=True)
    for ax, coef, title in zip(axs, [true_coef, enet.coef_], ["True", "Model"]):
        ax.stem(coef)
        ax.set(
            title=f"{title} Coefficients",
            xlabel="Feature Index",
            ylabel="Coefficient Value",
        )
    fig.suptitle(
        "Comparison of the coefficients of the true generative model and \n"
        "the estimated elastic net coefficients"
    )

    plt.show()




.. image-sg:: /auto_examples/model_selection/images/sphx_glr_plot_train_error_vs_test_error_002.png
   :alt: Comparison of the coefficients of the true generative model and  the estimated elastic net coefficients, True Coefficients, Model Coefficients
   :srcset: /auto_examples/model_selection/images/sphx_glr_plot_train_error_vs_test_error_002.png
   :class: sphx-glr-single-img





.. GENERATED FROM PYTHON SOURCE LINES 159-167

While the original coefficients are sparse, the estimated coefficients are not
as sparse. The reason is that we fixed the `l1_ratio` parameter to 0.9. We could
force the model to get a sparser solution by increasing the `l1_ratio` parameter.

However, we observe that for the coefficients that are close to zero in the true
generative model, our model shrinks the corresponding estimates towards zero. So we
don't recover the true coefficients exactly, but we get a sensible outcome in line
with the performance obtained on the test set.
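
To explore how `l1_ratio` affects sparsity, one could count the non-zero estimated
coefficients for a few values of `l1_ratio`. The sketch below is only an
illustration under stated assumptions: it rebuilds the same training set, and it
uses a fixed `alpha=1.0`, which is a hypothetical value and not the optimum found
above. The resulting counts depend on that choice.

.. code-block:: Python

    import numpy as np

    from sklearn.datasets import make_regression
    from sklearn.linear_model import ElasticNet

    # Rebuild the training set: the first 150 samples, without shuffling.
    X, y = make_regression(
        n_samples=450,
        n_features=500,
        n_informative=50,
        shuffle=False,
        noise=1.0,
        random_state=42,
    )
    X_train, y_train = X[:150], y[:150]

    alpha = 1.0  # hypothetical value, chosen for illustration only
    nonzero_counts = {}
    for l1_ratio in (0.5, 0.9, 1.0):
        model = ElasticNet(
            alpha=alpha, l1_ratio=l1_ratio, positive=True, max_iter=10_000
        )
        model.fit(X_train, y_train)
        # Number of features that the fitted model keeps with a non-zero weight.
        nonzero_counts[l1_ratio] = int(np.count_nonzero(model.coef_))
        print(f"l1_ratio={l1_ratio}: {nonzero_counts[l1_ratio]} non-zero coefficients")

Note that `l1_ratio=1.0` corresponds to a pure L1 penalty, that is, the Lasso.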


.. rst-class:: sphx-glr-timing

   **Total running time of the script:** (0 minutes 5.170 seconds)


.. _sphx_glr_download_auto_examples_model_selection_plot_train_error_vs_test_error.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: binder-badge

      .. image:: images/binder_badge_logo.svg
        :target: https://mybinder.org/v2/gh/scikit-learn/scikit-learn/1.8.X?urlpath=lab/tree/notebooks/auto_examples/model_selection/plot_train_error_vs_test_error.ipynb
        :alt: Launch binder
        :width: 150 px

    .. container:: lite-badge

      .. image:: images/jupyterlite_badge_logo.svg
        :target: ../../lite/lab/index.html?path=auto_examples/model_selection/plot_train_error_vs_test_error.ipynb
        :alt: Launch JupyterLite
        :width: 150 px

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: plot_train_error_vs_test_error.ipynb <plot_train_error_vs_test_error.ipynb>`

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: plot_train_error_vs_test_error.py <plot_train_error_vs_test_error.py>`

    .. container:: sphx-glr-download sphx-glr-download-zip

      :download:`Download zipped: plot_train_error_vs_test_error.zip <plot_train_error_vs_test_error.zip>`


.. include:: plot_train_error_vs_test_error.recommendations


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_
