
.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_examples/preprocessing/plot_discretization_strategies.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end <sphx_glr_download_auto_examples_preprocessing_plot_discretization_strategies.py>`
        to download the full example code or to run this example in your browser via JupyterLite or Binder.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_examples_preprocessing_plot_discretization_strategies.py:


==========================================================
Demonstrating the different strategies of KBinsDiscretizer
==========================================================

This example presents the different strategies implemented in KBinsDiscretizer:

- 'uniform': The discretization is uniform in each feature, which means that
  the bin widths are constant in each dimension.
- 'quantile': The bin edges are placed at the quantiles of each feature, which
  means that each bin has approximately the same number of samples.
- 'kmeans': The discretization is based on the centroids of a KMeans clustering
  procedure run independently on each feature, so each value is assigned to
  the bin of its nearest 1D centroid.

The plot shows the regions where the discretized encoding is constant.
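
To make the difference between the strategies concrete, the following minimal
sketch fits ``KBinsDiscretizer`` on a single skewed feature and prints the
learned ``bin_edges_`` together with the number of samples per bin. The
exponential toy data, the random seed and the helper dictionaries ``edges``
and ``counts`` are assumptions made for illustration and are not part of this
example; ``quantile_method`` is left at its default, which may emit a
``FutureWarning`` on some scikit-learn versions.

.. code-block:: Python

    import numpy as np

    from sklearn.preprocessing import KBinsDiscretizer

    # Illustrative skewed 1-D feature (an assumption, not one of the
    # datasets plotted in this example).
    rng = np.random.RandomState(0)
    x = rng.exponential(scale=1.0, size=(200, 1))

    edges = {}
    counts = {}
    for strategy in ["uniform", "quantile", "kmeans"]:
        enc = KBinsDiscretizer(n_bins=4, encode="ordinal", strategy=strategy)
        codes = enc.fit_transform(x)
        # bin_edges_ holds one array of edges per feature
        edges[strategy] = enc.bin_edges_[0]
        # number of samples that fall into each of the 4 bins
        counts[strategy] = np.bincount(codes[:, 0].astype(int), minlength=4)
        print(strategy, np.round(edges[strategy], 2), counts[strategy])

With 'uniform' the gaps between consecutive edges are equal, while with
'quantile' the per-bin counts are nearly equal; the 'kmeans' edges depend on
where the data cluster.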

.. GENERATED FROM PYTHON SOURCE LINES 18-109



.. image-sg:: /auto_examples/preprocessing/images/sphx_glr_plot_discretization_strategies_001.png
   :alt: Input data, strategy='uniform', strategy='quantile', strategy='kmeans'
   :srcset: /auto_examples/preprocessing/images/sphx_glr_plot_discretization_strategies_001.png
   :class: sphx-glr-single-img





.. code-block:: Python


    # Authors: The scikit-learn developers
    # SPDX-License-Identifier: BSD-3-Clause

    import matplotlib.pyplot as plt
    import numpy as np

    from sklearn.datasets import make_blobs
    from sklearn.preprocessing import KBinsDiscretizer

    strategies = ["uniform", "quantile", "kmeans"]

    n_samples = 200
    centers_0 = np.array([[0, 0], [0, 5], [2, 4], [8, 8]])
    centers_1 = np.array([[0, 0], [3, 1]])

    # construct the datasets
    random_state = 42
    X_list = [
        np.random.RandomState(random_state).uniform(-3, 3, size=(n_samples, 2)),
        make_blobs(
            n_samples=[
                n_samples // 10,
                n_samples * 4 // 10,
                n_samples // 10,
                n_samples * 4 // 10,
            ],
            cluster_std=0.5,
            centers=centers_0,
            random_state=random_state,
        )[0],
        make_blobs(
            n_samples=[n_samples // 5, n_samples * 4 // 5],
            cluster_std=0.5,
            centers=centers_1,
            random_state=random_state,
        )[0],
    ]

    figure = plt.figure(figsize=(14, 9))
    i = 1
    for ds_cnt, X in enumerate(X_list):
        ax = plt.subplot(len(X_list), len(strategies) + 1, i)
        ax.scatter(X[:, 0], X[:, 1], edgecolors="k")
        if ds_cnt == 0:
            ax.set_title("Input data", size=14)

        xx, yy = np.meshgrid(
            np.linspace(X[:, 0].min(), X[:, 0].max(), 300),
            np.linspace(X[:, 1].min(), X[:, 1].max(), 300),
        )
        grid = np.c_[xx.ravel(), yy.ravel()]

        ax.set_xlim(xx.min(), xx.max())
        ax.set_ylim(yy.min(), yy.max())
        ax.set_xticks(())
        ax.set_yticks(())

        i += 1
        # transform the dataset with KBinsDiscretizer
        for strategy in strategies:
            enc = KBinsDiscretizer(
                n_bins=4,
                encode="ordinal",
                quantile_method="averaged_inverted_cdf",
                strategy=strategy,
            )
            enc.fit(X)
            grid_encoded = enc.transform(grid)

            ax = plt.subplot(len(X_list), len(strategies) + 1, i)

            # bins of the first feature (horizontal axis), drawn as vertical bands
            horizontal = grid_encoded[:, 0].reshape(xx.shape)
            ax.contourf(xx, yy, horizontal, alpha=0.5)
            # bins of the second feature (vertical axis), drawn as horizontal bands
            vertical = grid_encoded[:, 1].reshape(xx.shape)
            ax.contourf(xx, yy, vertical, alpha=0.5)

            ax.scatter(X[:, 0], X[:, 1], edgecolors="k")
            ax.set_xlim(xx.min(), xx.max())
            ax.set_ylim(yy.min(), yy.max())
            ax.set_xticks(())
            ax.set_yticks(())
            if ds_cnt == 0:
                ax.set_title(f"strategy='{strategy}'", size=14)

            i += 1

    plt.tight_layout()
    plt.show()

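The stripes in the figure correspond to the ordinal codes returned by
``transform``. As a minimal sketch (the uniform toy data mirror the first
dataset above; the names ``codes`` and ``regions`` are illustrative
assumptions), the snippet below checks that each feature is mapped to bin
indices between 0 and ``n_bins - 1`` and counts the distinct pairs of codes,
that is, the rectangular regions that contain at least one sample.

.. code-block:: Python

    import numpy as np

    from sklearn.preprocessing import KBinsDiscretizer

    # Same construction as the first dataset of the example.
    X = np.random.RandomState(42).uniform(-3, 3, size=(200, 2))

    enc = KBinsDiscretizer(n_bins=4, encode="ordinal", strategy="uniform")
    codes = enc.fit_transform(X)

    # Each column contains bin indices for one feature.
    print(np.unique(codes[:, 0]), np.unique(codes[:, 1]))

    # Each distinct row of codes is one occupied rectangular region;
    # with 4 bins per feature there can be at most 4 * 4 = 16 of them.
    regions = np.unique(codes, axis=0)
    print(len(regions))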

.. rst-class:: sphx-glr-timing

   **Total running time of the script:** (0 minutes 0.553 seconds)


.. _sphx_glr_download_auto_examples_preprocessing_plot_discretization_strategies.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: binder-badge

      .. image:: images/binder_badge_logo.svg
        :target: https://mybinder.org/v2/gh/scikit-learn/scikit-learn/1.8.X?urlpath=lab/tree/notebooks/auto_examples/preprocessing/plot_discretization_strategies.ipynb
        :alt: Launch binder
        :width: 150 px

    .. container:: lite-badge

      .. image:: images/jupyterlite_badge_logo.svg
        :target: ../../lite/lab/index.html?path=auto_examples/preprocessing/plot_discretization_strategies.ipynb
        :alt: Launch JupyterLite
        :width: 150 px

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: plot_discretization_strategies.ipynb <plot_discretization_strategies.ipynb>`

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: plot_discretization_strategies.py <plot_discretization_strategies.py>`

    .. container:: sphx-glr-download sphx-glr-download-zip

      :download:`Download zipped: plot_discretization_strategies.zip <plot_discretization_strategies.zip>`


.. include:: plot_discretization_strategies.recommendations


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_
