
.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_examples/decomposition/plot_pca_iris.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end <sphx_glr_download_auto_examples_decomposition_plot_pca_iris.py>`
        to download the full example code or to run this example in your browser via JupyterLite or Binder.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_examples_decomposition_plot_pca_iris.py:


==================================================
Principal Component Analysis (PCA) on Iris Dataset
==================================================

This example applies Principal Component Analysis (PCA), a well-known decomposition
technique, to the
`Iris dataset <https://en.wikipedia.org/wiki/Iris_flower_data_set>`_.

The dataset has 4 features: sepal length, sepal width, petal length, and petal
width. We use PCA to project this 4-dimensional feature space onto a 3-dimensional
space.

.. GENERATED FROM PYTHON SOURCE LINES 13-17

.. code-block:: Python


    # Authors: The scikit-learn developers
    # SPDX-License-Identifier: BSD-3-Clause








.. GENERATED FROM PYTHON SOURCE LINES 18-25

Loading the Iris dataset
------------------------

The Iris dataset is directly available as part of scikit-learn. It can be loaded
using the :func:`~sklearn.datasets.load_iris` function. With the default parameters,
a :class:`~sklearn.utils.Bunch` object is returned, containing the data, the
target values, the feature names, and the target names.
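
As a minimal sketch of what the default call returns (that is, without the
``as_frame=True`` argument used in this example), the snippet below inspects a
few of the ``Bunch`` fields. The variable name ``bunch`` is only illustrative.

.. code-block:: Python

    from sklearn.datasets import load_iris

    # With default parameters, the data and target are NumPy arrays.
    bunch = load_iris()

    # Fields can be read as attributes or with dictionary-style keys.
    print(bunch.data.shape)  # (n_samples, n_features)
    print(bunch["feature_names"])
    print(bunch.target_names)
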

.. GENERATED FROM PYTHON SOURCE LINES 25-30

.. code-block:: Python

    from sklearn.datasets import load_iris

    iris = load_iris(as_frame=True)
    print(iris.keys())





.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    dict_keys(['data', 'target', 'frame', 'target_names', 'DESCR', 'feature_names', 'filename', 'data_module'])




.. GENERATED FROM PYTHON SOURCE LINES 31-35

Plot of pairs of features of the Iris dataset
---------------------------------------------

Let's first plot the pairs of features of the Iris dataset.

.. GENERATED FROM PYTHON SOURCE LINES 35-41

.. code-block:: Python

    import seaborn as sns

    # Rename classes using the iris target names
    iris.frame["target"] = iris.target_names[iris.target]
    _ = sns.pairplot(iris.frame, hue="target")




.. image-sg:: /auto_examples/decomposition/images/sphx_glr_plot_pca_iris_001.png
   :alt: plot pca iris
   :srcset: /auto_examples/decomposition/images/sphx_glr_plot_pca_iris_001.png
   :class: sphx-glr-single-img





.. GENERATED FROM PYTHON SOURCE LINES 42-60

Each data point on each scatter plot represents one of the 150 iris flowers
in the dataset, with the color indicating its type
(Setosa, Versicolor, or Virginica).

A pattern is already visible for the Setosa type, which is
easily identifiable by its short and wide sepal. Considering
only these two dimensions, sepal width and length, the Versicolor and
Virginica types still overlap.

The diagonal of the plot shows the distribution of each feature. We observe
that the petal width and the petal length are the most discriminative features
for the three types.
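
One way to put a number on this visual impression is a one-way ANOVA F-test per
feature, sketched below with :func:`~sklearn.feature_selection.f_classif`. This
check is added for illustration and is not part of the original example. Larger
F-values indicate features whose class means differ more relative to the
within-class spread. The names ``iris_data`` and ``ranking`` are illustrative.

.. code-block:: Python

    from sklearn.datasets import load_iris
    from sklearn.feature_selection import f_classif

    iris_data = load_iris()

    # One F-statistic (and p-value) per feature, computed across the 3 classes.
    f_values, p_values = f_classif(iris_data.data, iris_data.target)

    # Rank the features from the largest to the smallest F-value.
    ranking = sorted(
        zip(iris_data.feature_names, f_values),
        key=lambda item: item[1],
        reverse=True,
    )
    for name, f_value in ranking:
        print(f"{name}: F = {f_value:.1f}")
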

Plot a PCA representation
-------------------------
Let's apply Principal Component Analysis (PCA) to the Iris dataset
and then plot the irises along the first three principal components.
This makes it easier to differentiate among the three types.

.. GENERATED FROM PYTHON SOURCE LINES 60-101

.. code-block:: Python


    import matplotlib.pyplot as plt

    # unused but required import for doing 3d projections with matplotlib < 3.2
    import mpl_toolkits.mplot3d  # noqa: F401

    from sklearn.decomposition import PCA

    fig = plt.figure(1, figsize=(8, 6))
    ax = fig.add_subplot(111, projection="3d", elev=-150, azim=110)

    X_reduced = PCA(n_components=3).fit_transform(iris.data)
    scatter = ax.scatter(
        X_reduced[:, 0],
        X_reduced[:, 1],
        X_reduced[:, 2],
        c=iris.target,
        s=40,
    )

    ax.set(
        title="First three principal components",
        xlabel="1st Principal Component",
        ylabel="2nd Principal Component",
        zlabel="3rd Principal Component",
    )
    ax.xaxis.set_ticklabels([])
    ax.yaxis.set_ticklabels([])
    ax.zaxis.set_ticklabels([])

    # Add a legend
    legend1 = ax.legend(
        scatter.legend_elements()[0],
        iris.target_names.tolist(),
        loc="upper right",
        title="Classes",
    )
    ax.add_artist(legend1)

    plt.show()




.. image-sg:: /auto_examples/decomposition/images/sphx_glr_plot_pca_iris_002.png
   :alt: First three principal components
   :srcset: /auto_examples/decomposition/images/sphx_glr_plot_pca_iris_002.png
   :class: sphx-glr-single-img





.. GENERATED FROM PYTHON SOURCE LINES 102-105

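PCA creates 3 new features, each a linear combination of the 4 original
features. The components are ordered so that the first captures the largest
possible variance and each subsequent one captures the largest remaining variance
while being orthogonal to the previous ones. With this transformation, the three
species are already largely separated along the first principal component alone,
although Versicolor and Virginica still overlap slightly.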
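
To make the "linear combination" concrete, the following sketch (an
illustration, not part of the original example) refits the same PCA and inspects
the fitted attributes ``components_``, ``mean_`` and
``explained_variance_ratio_``. It also checks that the transformed data equals
the centered data projected onto the component vectors, which holds with the
default ``whiten=False``. The names ``X_pca`` and ``X_manual`` are illustrative.

.. code-block:: Python

    import numpy as np

    from sklearn.datasets import load_iris
    from sklearn.decomposition import PCA

    X = load_iris().data
    pca = PCA(n_components=3)
    X_pca = pca.fit_transform(X)

    # Each row of components_ holds the 4 weights that define one new feature.
    print(pca.components_.shape)

    # Fraction of the total variance captured by each component.
    print(pca.explained_variance_ratio_)

    # The projection is the centered data multiplied by the component weights.
    X_manual = (X - pca.mean_) @ pca.components_.T
    print(np.allclose(X_pca, X_manual))
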


.. rst-class:: sphx-glr-timing

   **Total running time of the script:** (0 minutes 2.060 seconds)


.. _sphx_glr_download_auto_examples_decomposition_plot_pca_iris.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: binder-badge

      .. image:: images/binder_badge_logo.svg
        :target: https://mybinder.org/v2/gh/scikit-learn/scikit-learn/1.8.X?urlpath=lab/tree/notebooks/auto_examples/decomposition/plot_pca_iris.ipynb
        :alt: Launch binder
        :width: 150 px

    .. container:: lite-badge

      .. image:: images/jupyterlite_badge_logo.svg
        :target: ../../lite/lab/index.html?path=auto_examples/decomposition/plot_pca_iris.ipynb
        :alt: Launch JupyterLite
        :width: 150 px

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: plot_pca_iris.ipynb <plot_pca_iris.ipynb>`

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: plot_pca_iris.py <plot_pca_iris.py>`

    .. container:: sphx-glr-download sphx-glr-download-zip

      :download:`Download zipped: plot_pca_iris.zip <plot_pca_iris.zip>`


.. include:: plot_pca_iris.recommendations


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_
