
.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_examples/gaussian_process/plot_gpr_noisy_targets.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end <sphx_glr_download_auto_examples_gaussian_process_plot_gpr_noisy_targets.py>`
        to download the full example code or to run this example in your browser via JupyterLite or Binder.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_examples_gaussian_process_plot_gpr_noisy_targets.py:


=========================================================
Gaussian Processes regression: basic introductory example
=========================================================

A simple one-dimensional regression example computed in two different ways:

1. A noise-free case
2. A noisy case with known noise-level per datapoint

In both cases, the kernel's parameters are estimated using the maximum
likelihood principle.

The figures illustrate the interpolating property of the Gaussian Process model
as well as its probabilistic nature in the form of a pointwise 95% confidence
interval.

Note that `alpha` is a parameter that controls the strength of the Tikhonov
regularization applied to the covariance matrix of the training points.

.. GENERATED FROM PYTHON SOURCE LINES 21-25

.. code-block:: Python


    # Authors: The scikit-learn developers
    # SPDX-License-Identifier: BSD-3-Clause








.. GENERATED FROM PYTHON SOURCE LINES 26-31

Dataset generation
------------------

We will start by generating a synthetic dataset. The true generative process
is defined as :math:`f(x) = x \sin(x)`.

.. GENERATED FROM PYTHON SOURCE LINES 31-36

.. code-block:: Python

    import numpy as np

    X = np.linspace(start=0, stop=10, num=1_000).reshape(-1, 1)
    y = np.squeeze(X * np.sin(X))








.. GENERATED FROM PYTHON SOURCE LINES 37-45

.. code-block:: Python

    import matplotlib.pyplot as plt

    plt.plot(X, y, label=r"$f(x) = x \sin(x)$", linestyle="dotted")
    plt.legend()
    plt.xlabel("$x$")
    plt.ylabel("$f(x)$")
    _ = plt.title("True generative process")




.. image-sg:: /auto_examples/gaussian_process/images/sphx_glr_plot_gpr_noisy_targets_001.png
   :alt: True generative process
   :srcset: /auto_examples/gaussian_process/images/sphx_glr_plot_gpr_noisy_targets_001.png
   :class: sphx-glr-single-img





.. GENERATED FROM PYTHON SOURCE LINES 46-55

We will use this dataset in the next experiments to illustrate how Gaussian
Process regression works.

Example with noise-free target
------------------------------

In this first example, we will use the true generative process without
adding any noise. To train the Gaussian Process regression, we will select
only a few samples.

.. GENERATED FROM PYTHON SOURCE LINES 55-59

.. code-block:: Python

    rng = np.random.RandomState(1)
    training_indices = rng.choice(np.arange(y.size), size=6, replace=False)
    X_train, y_train = X[training_indices], y[training_indices]








.. GENERATED FROM PYTHON SOURCE LINES 60-63

Now, we fit a Gaussian process on these few training data samples. We will
use a radial basis function (RBF) kernel multiplied by a constant kernel
whose parameter fits the amplitude.

.. GENERATED FROM PYTHON SOURCE LINES 63-71

.. code-block:: Python

    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import RBF

    kernel = 1 * RBF(length_scale=1.0, length_scale_bounds=(1e-2, 1e2))
    gaussian_process = GaussianProcessRegressor(kernel=kernel, n_restarts_optimizer=9)
    gaussian_process.fit(X_train, y_train)
    gaussian_process.kernel_





.. rst-class:: sphx-glr-script-out

 .. code-block:: none


    5.02**2 * RBF(length_scale=1.43)
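
As a minimal sketch of how such a fitted kernel can be inspected, the snippet
below refits a model and reads the amplitude and length scale from the product
kernel, then compares the log-marginal likelihood at the initial and at the
optimized hyperparameters. The fixed training inputs `X_demo` and the variable
names are assumptions made for illustration; they are not the random draw used
in this example, so the printed values will differ from the output above.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

# Assumption: a small, fixed training set chosen for illustration only.
X_demo = np.array([[1.0], [3.0], [5.0], [6.0], [7.0], [8.0]])
y_demo = np.squeeze(X_demo * np.sin(X_demo))

kernel = 1 * RBF(length_scale=1.0, length_scale_bounds=(1e-2, 1e2))
gp = GaussianProcessRegressor(kernel=kernel, random_state=0).fit(X_demo, y_demo)

# The fitted kernel is a product: k1 is the constant factor, k2 is the RBF.
amplitude = np.sqrt(gp.kernel_.k1.constant_value)
length_scale = gp.kernel_.k2.length_scale

# `theta` holds the log-transformed hyperparameters of a kernel.
lml_initial = gp.log_marginal_likelihood(kernel.theta)
lml_fitted = gp.log_marginal_likelihood(gp.kernel_.theta)

print(f"amplitude={amplitude:.2f}, length_scale={length_scale:.2f}")
print(f"log-marginal likelihood: initial={lml_initial:.2f}, fitted={lml_fitted:.2f}")
```

The optimizer starts from the initial hyperparameters, so the fitted
log-marginal likelihood should be at least as large as the initial one.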



.. GENERATED FROM PYTHON SOURCE LINES 72-75

After fitting our model, we see that the hyperparameters of the kernel have
been optimized. Now, we will use the fitted model to compute the mean
prediction over the full dataset and plot the 95% confidence interval.

.. GENERATED FROM PYTHON SOURCE LINES 75-92

.. code-block:: Python

    mean_prediction, std_prediction = gaussian_process.predict(X, return_std=True)

    plt.plot(X, y, label=r"$f(x) = x \sin(x)$", linestyle="dotted")
    plt.scatter(X_train, y_train, label="Observations")
    plt.plot(X, mean_prediction, label="Mean prediction")
    plt.fill_between(
        X.ravel(),
        mean_prediction - 1.96 * std_prediction,
        mean_prediction + 1.96 * std_prediction,
        alpha=0.5,
        label=r"95% confidence interval",
    )
    plt.legend()
    plt.xlabel("$x$")
    plt.ylabel("$f(x)$")
    _ = plt.title("Gaussian process regression on noise-free dataset")




.. image-sg:: /auto_examples/gaussian_process/images/sphx_glr_plot_gpr_noisy_targets_002.png
   :alt: Gaussian process regression on noise-free dataset
   :srcset: /auto_examples/gaussian_process/images/sphx_glr_plot_gpr_noisy_targets_002.png
   :class: sphx-glr-single-img





.. GENERATED FROM PYTHON SOURCE LINES 93-106

We see that for a prediction made on a data point close to one from the
training set, the 95% confidence interval is narrow. Whenever a sample
falls far from the training data, our model's prediction is less accurate and
less precise (higher uncertainty).
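
The behavior described above can be checked numerically. The following is a
small sketch, assuming a fixed set of training inputs (`X_demo`, chosen for
illustration and not the random draw used in this example). It derives the
1.96 factor from the standard normal distribution and compares the half-width
of the 95% interval at a training input with the half-width at a point lying
beyond the training range.

```python
from statistics import NormalDist

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

# Assumption: fixed training inputs chosen for illustration only.
X_demo = np.array([[1.0], [3.0], [5.0], [6.0], [7.0], [8.0]])
y_demo = np.squeeze(X_demo * np.sin(X_demo))

gp = GaussianProcessRegressor(kernel=1 * RBF(length_scale=1.0), random_state=0)
gp.fit(X_demo, y_demo)

# Two-sided 95% quantile of the standard normal distribution (about 1.96).
z = NormalDist().inv_cdf(0.975)

# First query is a training input, second lies 2 units past the last one.
X_query = np.array([[5.0], [10.0]])
mean, std = gp.predict(X_query, return_std=True)
half_width_near, half_width_far = z * std

print(f"z = {z:.3f}")
print(f"half-width at x=5:  {half_width_near:.4f}")
print(f"half-width at x=10: {half_width_far:.4f}")
```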

Example with noisy targets
--------------------------

We can repeat a similar experiment, this time adding noise to the target.
This will let us see the effect of the noise on the fitted model.

We add some random Gaussian noise to the target with an arbitrary
standard deviation.

.. GENERATED FROM PYTHON SOURCE LINES 106-109

.. code-block:: Python

    noise_std = 0.75
    y_train_noisy = y_train + rng.normal(loc=0.0, scale=noise_std, size=y_train.shape)








.. GENERATED FROM PYTHON SOURCE LINES 110-113

We create a similar Gaussian process model. In addition to the kernel, this
time we specify the parameter `alpha`, which can be interpreted as the
variance of Gaussian noise on the targets.

.. GENERATED FROM PYTHON SOURCE LINES 113-119

.. code-block:: Python

    gaussian_process = GaussianProcessRegressor(
        kernel=kernel, alpha=noise_std**2, n_restarts_optimizer=9
    )
    gaussian_process.fit(X_train, y_train_noisy)
    mean_prediction, std_prediction = gaussian_process.predict(X, return_std=True)
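
To make the role of `alpha` more concrete, here is a hedged sketch that
recomputes the posterior mean by hand, adding `alpha` to the diagonal of the
training covariance matrix, and compares it with the estimator's prediction.
The training inputs, the kernel hyperparameters and the use of
`optimizer=None` (which keeps the hyperparameters fixed) are assumptions made
for this illustration, not settings used in this example.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.RandomState(0)

# Assumption: fixed training inputs and noisy targets for illustration only.
X_demo = np.array([[1.0], [3.0], [5.0], [6.0], [7.0], [8.0]])
noise_std = 0.75
y_demo = np.squeeze(X_demo * np.sin(X_demo)) + rng.normal(
    scale=noise_std, size=X_demo.shape[0]
)

# optimizer=None keeps the kernel hyperparameters fixed, so the comparison
# below isolates the effect of `alpha`.
kernel = 5.0**2 * RBF(length_scale=1.5)
gp = GaussianProcessRegressor(kernel=kernel, alpha=noise_std**2, optimizer=None)
gp.fit(X_demo, y_demo)

X_query = np.linspace(0, 10, 5).reshape(-1, 1)

# Manual posterior mean: K_star @ (K + alpha * I)^{-1} @ y
K = kernel(X_demo)  # covariance between training inputs
K_star = kernel(X_query, X_demo)  # covariance between query and training inputs
regularized = K + noise_std**2 * np.eye(X_demo.shape[0])
manual_mean = K_star @ np.linalg.solve(regularized, y_demo)

print(np.allclose(manual_mean, gp.predict(X_query)))
```

Adding a positive value to the diagonal is what makes `alpha` act both as a
regularizer of the linear system and as a noise variance on the targets.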








.. GENERATED FROM PYTHON SOURCE LINES 120-121

Let's plot the mean prediction and the uncertainty region as before.

.. GENERATED FROM PYTHON SOURCE LINES 121-146

.. code-block:: Python

    plt.plot(X, y, label=r"$f(x) = x \sin(x)$", linestyle="dotted")
    plt.errorbar(
        X_train,
        y_train_noisy,
        noise_std,
        linestyle="None",
        color="tab:blue",
        marker=".",
        markersize=10,
        label="Observations",
    )
    plt.plot(X, mean_prediction, label="Mean prediction")
    plt.fill_between(
        X.ravel(),
        mean_prediction - 1.96 * std_prediction,
        mean_prediction + 1.96 * std_prediction,
        color="tab:orange",
        alpha=0.5,
        label=r"95% confidence interval",
    )
    plt.legend()
    plt.xlabel("$x$")
    plt.ylabel("$f(x)$")
    _ = plt.title("Gaussian process regression on a noisy dataset")




.. image-sg:: /auto_examples/gaussian_process/images/sphx_glr_plot_gpr_noisy_targets_003.png
   :alt: Gaussian process regression on a noisy dataset
   :srcset: /auto_examples/gaussian_process/images/sphx_glr_plot_gpr_noisy_targets_003.png
   :class: sphx-glr-single-img





.. GENERATED FROM PYTHON SOURCE LINES 147-151

The noise affects the predictions close to the training samples: the
predictive uncertainty near the training samples is larger because we
explicitly model a given level of target noise that is independent of the
input variable.
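
This effect can be illustrated with a short sketch that compares the
predictive standard deviation at the training inputs for a near noise-free
model (default `alpha`) and for a model with `alpha` set to the noise
variance. The training inputs and the fixed kernel hyperparameters
(`optimizer=None`) are assumptions made for illustration, so that only
`alpha` differs between the two models.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

# Assumption: fixed training inputs chosen for illustration only.
X_demo = np.array([[1.0], [3.0], [5.0], [6.0], [7.0], [8.0]])
y_demo = np.squeeze(X_demo * np.sin(X_demo))
noise_std = 0.75

# Assumption: hyperparameters are fixed so that only `alpha` differs.
kernel = 5.0**2 * RBF(length_scale=1.5)

gp_clean = GaussianProcessRegressor(kernel=kernel, optimizer=None)
gp_noisy = GaussianProcessRegressor(
    kernel=kernel, alpha=noise_std**2, optimizer=None
)

# The predictive standard deviation does not depend on the target values,
# so the same targets are reused for both models.
_, std_clean = gp_clean.fit(X_demo, y_demo).predict(X_demo, return_std=True)
_, std_noisy = gp_noisy.fit(X_demo, y_demo).predict(X_demo, return_std=True)

print("std at training inputs, default alpha:", np.round(std_clean, 4))
print("std at training inputs, alpha=noise_std**2:", np.round(std_noisy, 4))
```

`alpha` also accepts an array of shape `(n_samples,)`, which allows a
different known noise level for each datapoint.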


.. rst-class:: sphx-glr-timing

   **Total running time of the script:** (0 minutes 0.440 seconds)


.. _sphx_glr_download_auto_examples_gaussian_process_plot_gpr_noisy_targets.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: binder-badge

      .. image:: images/binder_badge_logo.svg
        :target: https://mybinder.org/v2/gh/scikit-learn/scikit-learn/1.8.X?urlpath=lab/tree/notebooks/auto_examples/gaussian_process/plot_gpr_noisy_targets.ipynb
        :alt: Launch binder
        :width: 150 px

    .. container:: lite-badge

      .. image:: images/jupyterlite_badge_logo.svg
        :target: ../../lite/lab/index.html?path=auto_examples/gaussian_process/plot_gpr_noisy_targets.ipynb
        :alt: Launch JupyterLite
        :width: 150 px

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: plot_gpr_noisy_targets.ipynb <plot_gpr_noisy_targets.ipynb>`

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: plot_gpr_noisy_targets.py <plot_gpr_noisy_targets.py>`

    .. container:: sphx-glr-download sphx-glr-download-zip

      :download:`Download zipped: plot_gpr_noisy_targets.zip <plot_gpr_noisy_targets.zip>`


.. include:: plot_gpr_noisy_targets.recommendations


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_
