
.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_examples/cluster/plot_dbscan.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end <sphx_glr_download_auto_examples_cluster_plot_dbscan.py>`
        to download the full example code or to run this example in your browser via JupyterLite or Binder.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_examples_cluster_plot_dbscan.py:


===================================
Demo of DBSCAN clustering algorithm
===================================

DBSCAN (Density-Based Spatial Clustering of Applications with Noise) finds core
samples in regions of high density and expands clusters from them. This
algorithm works well on data whose clusters have similar density.

See the :ref:`sphx_glr_auto_examples_cluster_plot_cluster_comparison.py` example
for a demo of different clustering algorithms on 2D datasets.
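
As a rough illustration of what "core sample" means in the description above,
the sketch below counts how many points lie within ``eps`` of each point in a
small hand-made dataset and compares the result with what
:class:`~sklearn.cluster.DBSCAN` reports. The toy coordinates, the names
``X_toy``, ``eps_toy`` and ``min_samples_toy``, and the parameter values are
assumptions chosen for illustration; they are not part of this example's data.

.. code-block:: Python

    import numpy as np

    from sklearn.cluster import DBSCAN
    from sklearn.metrics import pairwise_distances

    # Hypothetical toy data: two short rows of points spaced 0.1 apart, plus
    # one isolated point far away from both rows.
    row = np.linspace(0.0, 0.5, 6)
    X_toy = np.vstack(
        [
            np.column_stack([row, np.zeros(6)]),
            np.column_stack([row + 5.0, np.zeros(6)]),
            [[10.0, 10.0]],
        ]
    )
    eps_toy, min_samples_toy = 0.25, 4

    # Count the points within eps of each point (the point itself included)
    # and keep those that reach min_samples.
    n_neighbors = (pairwise_distances(X_toy) <= eps_toy).sum(axis=1)
    core_by_hand = np.flatnonzero(n_neighbors >= min_samples_toy)

    db_toy = DBSCAN(eps=eps_toy, min_samples=min_samples_toy).fit(X_toy)
    print("Core samples (by hand):", core_by_hand)
    print("Core samples (DBSCAN): ", db_toy.core_sample_indices_)
    print("Labels:", db_toy.labels_)

In this sketch the end points of each row are not core samples, but they lie
within ``eps`` of one and therefore still receive a cluster label. The
isolated point has no neighbor other than itself and is labeled as noise.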

.. GENERATED FROM PYTHON SOURCE LINES 14-18

.. code-block:: Python


    # Authors: The scikit-learn developers
    # SPDX-License-Identifier: BSD-3-Clause








.. GENERATED FROM PYTHON SOURCE LINES 19-23

Data generation
---------------

We use :func:`~sklearn.datasets.make_blobs` to create three synthetic clusters.

.. GENERATED FROM PYTHON SOURCE LINES 23-34

.. code-block:: Python


    from sklearn.datasets import make_blobs
    from sklearn.preprocessing import StandardScaler

    centers = [[1, 1], [-1, -1], [1, -1]]
    X, labels_true = make_blobs(
        n_samples=750, centers=centers, cluster_std=0.4, random_state=0
    )

    X = StandardScaler().fit_transform(X)








.. GENERATED FROM PYTHON SOURCE LINES 35-36

We can visualize the resulting data:

.. GENERATED FROM PYTHON SOURCE LINES 36-42

.. code-block:: Python


    import matplotlib.pyplot as plt

    plt.scatter(X[:, 0], X[:, 1])
    plt.show()




.. image-sg:: /auto_examples/cluster/images/sphx_glr_plot_dbscan_001.png
   :alt: plot dbscan
   :srcset: /auto_examples/cluster/images/sphx_glr_plot_dbscan_001.png
   :class: sphx-glr-single-img





.. GENERATED FROM PYTHON SOURCE LINES 43-48

Compute DBSCAN
--------------

The labels assigned by :class:`~sklearn.cluster.DBSCAN` are available through
the ``labels_`` attribute. Noisy samples are given the label ``-1``.

.. GENERATED FROM PYTHON SOURCE LINES 48-64

.. code-block:: Python


    import numpy as np

    from sklearn import metrics
    from sklearn.cluster import DBSCAN

    db = DBSCAN(eps=0.3, min_samples=10).fit(X)
    labels = db.labels_

    # Number of clusters in labels, ignoring noise if present.
    n_clusters_ = len(set(labels)) - (1 if -1 in labels else 0)
    n_noise_ = list(labels).count(-1)

    print("Estimated number of clusters: %d" % n_clusters_)
    print("Estimated number of noise points: %d" % n_noise_)





.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Estimated number of clusters: 3
    Estimated number of noise points: 18




.. GENERATED FROM PYTHON SOURCE LINES 65-80

Clustering algorithms are fundamentally unsupervised learning methods.
However, since :func:`~sklearn.datasets.make_blobs` gives access to the true
labels of the synthetic clusters, it is possible to use evaluation metrics
that leverage this "supervised" ground truth information to quantify the
quality of the resulting clusters. Examples of such metrics are homogeneity,
completeness, V-measure, the Rand index, the adjusted Rand index and
adjusted mutual information (AMI).
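
To make the relationship between some of these scores concrete, here is a
minimal sketch on two hand-written labelings (``labels_true_toy`` and
``labels_pred_toy`` are made-up names and values, not taken from this
example). It checks that, with the default ``beta=1.0``, the V-measure is the
harmonic mean of homogeneity and completeness, and that renaming the predicted
clusters leaves the scores unchanged.

.. code-block:: Python

    import numpy as np

    from sklearn.metrics import homogeneity_completeness_v_measure

    # Hypothetical labelings of six samples.
    labels_true_toy = [0, 0, 0, 1, 1, 1]
    labels_pred_toy = [0, 0, 1, 1, 2, 2]

    h, c, v = homogeneity_completeness_v_measure(labels_true_toy, labels_pred_toy)
    print(f"Homogeneity: {h:.3f}, completeness: {c:.3f}, V-measure: {v:.3f}")

    # With beta=1.0 the V-measure is the harmonic mean of the other two scores.
    harmonic_mean = 2 * h * c / (h + c)
    print(f"Harmonic mean of homogeneity and completeness: {harmonic_mean:.3f}")

    # The scores depend on how samples are grouped, not on the label values.
    labels_pred_renamed = [7, 7, 3, 3, 5, 5]
    scores_renamed = homogeneity_completeness_v_measure(
        labels_true_toy, labels_pred_renamed
    )
    print("Scores unchanged after renaming:", np.allclose(scores_renamed, (h, c, v)))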

If the ground truth labels are not known, evaluation can only be performed
using the model's own results. In that case, the Silhouette Coefficient comes
in handy.
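
The following sketch shows how the Silhouette Coefficient can be computed
without any ground truth. For each sample it uses ``a``, the mean distance to
the other members of its own cluster, and ``b``, the mean distance to the
members of the nearest other cluster, and averages ``(b - a) / max(a, b)``
over all samples. The data ``X_toy`` and ``labels_toy`` are assumptions made
up for illustration.

.. code-block:: Python

    import numpy as np

    from sklearn.metrics import pairwise_distances, silhouette_score

    # Hypothetical data: two well-separated groups of three points each.
    X_toy = np.array(
        [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [5.0, 5.0], [5.0, 6.0], [6.0, 5.0]]
    )
    labels_toy = np.array([0, 0, 0, 1, 1, 1])

    distances = pairwise_distances(X_toy)
    per_sample = []
    for i, label in enumerate(labels_toy):
        same_cluster = labels_toy == label
        same_cluster[i] = False  # exclude the sample itself
        a = distances[i, same_cluster].mean()
        b = min(
            distances[i, labels_toy == other].mean()
            for other in np.unique(labels_toy)
            if other != label
        )
        per_sample.append((b - a) / max(a, b))

    manual_score = np.mean(per_sample)
    print(f"Silhouette by hand:     {manual_score:.3f}")
    print(f"silhouette_score value: {silhouette_score(X_toy, labels_toy):.3f}")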

For more information, see the
:ref:`sphx_glr_auto_examples_cluster_plot_adjusted_for_chance_measures.py`
example or the :ref:`clustering_evaluation` section of the user guide.

.. GENERATED FROM PYTHON SOURCE LINES 80-91

.. code-block:: Python


    print(f"Homogeneity: {metrics.homogeneity_score(labels_true, labels):.3f}")
    print(f"Completeness: {metrics.completeness_score(labels_true, labels):.3f}")
    print(f"V-measure: {metrics.v_measure_score(labels_true, labels):.3f}")
    print(f"Adjusted Rand Index: {metrics.adjusted_rand_score(labels_true, labels):.3f}")
    print(
        "Adjusted Mutual Information:"
        f" {metrics.adjusted_mutual_info_score(labels_true, labels):.3f}"
    )
    print(f"Silhouette Coefficient: {metrics.silhouette_score(X, labels):.3f}")





.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Homogeneity: 0.953
    Completeness: 0.883
    V-measure: 0.917
    Adjusted Rand Index: 0.952
    Adjusted Mutual Information: 0.916
    Silhouette Coefficient: 0.626




.. GENERATED FROM PYTHON SOURCE LINES 92-98

Plot results
------------

Core samples (large dots) and non-core samples (small dots) are color-coded
according to the assigned cluster. Samples tagged as noise are represented in
black.
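
Before plotting, it can help to see how the three groups of samples are
separated. The sketch below rebuilds the data with the same parameters as
above and derives one boolean mask each for core, non-core (border) and noise
samples from ``core_sample_indices_`` and ``labels_``. The names ending in
``_demo`` and the ``is_*`` masks are introduced here for illustration only.

.. code-block:: Python

    import numpy as np

    from sklearn.cluster import DBSCAN
    from sklearn.datasets import make_blobs
    from sklearn.preprocessing import StandardScaler

    # Same data generation and DBSCAN parameters as used in this example.
    X_demo, _ = make_blobs(
        n_samples=750,
        centers=[[1, 1], [-1, -1], [1, -1]],
        cluster_std=0.4,
        random_state=0,
    )
    X_demo = StandardScaler().fit_transform(X_demo)
    db_demo = DBSCAN(eps=0.3, min_samples=10).fit(X_demo)

    # Core samples are listed by index; noise samples carry the label -1.
    is_core = np.zeros(len(X_demo), dtype=bool)
    is_core[db_demo.core_sample_indices_] = True
    is_noise = db_demo.labels_ == -1
    # Border samples belong to a cluster without being core samples.
    is_border = ~is_core & ~is_noise

    print(f"Core samples:   {is_core.sum()}")
    print(f"Border samples: {is_border.sum()}")
    print(f"Noise samples:  {is_noise.sum()}")

The three masks are mutually exclusive and together cover every sample, which
is what allows each group to be drawn with its own marker size or color.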

.. GENERATED FROM PYTHON SOURCE LINES 98-133

.. code-block:: Python


    unique_labels = set(labels)
    core_samples_mask = np.zeros_like(labels, dtype=bool)
    core_samples_mask[db.core_sample_indices_] = True

    colors = [plt.cm.Spectral(each) for each in np.linspace(0, 1, len(unique_labels))]
    for k, col in zip(unique_labels, colors):
        if k == -1:
            # Black used for noise.
            col = [0, 0, 0, 1]

        class_member_mask = labels == k

        xy = X[class_member_mask & core_samples_mask]
        plt.plot(
            xy[:, 0],
            xy[:, 1],
            "o",
            markerfacecolor=tuple(col),
            markeredgecolor="k",
            markersize=14,
        )

        xy = X[class_member_mask & ~core_samples_mask]
        plt.plot(
            xy[:, 0],
            xy[:, 1],
            "o",
            markerfacecolor=tuple(col),
            markeredgecolor="k",
            markersize=6,
        )

    plt.title(f"Estimated number of clusters: {n_clusters_}")
    plt.show()



.. image-sg:: /auto_examples/cluster/images/sphx_glr_plot_dbscan_002.png
   :alt: Estimated number of clusters: 3
   :srcset: /auto_examples/cluster/images/sphx_glr_plot_dbscan_002.png
   :class: sphx-glr-single-img






.. rst-class:: sphx-glr-timing

   **Total running time of the script:** (0 minutes 0.142 seconds)


.. _sphx_glr_download_auto_examples_cluster_plot_dbscan.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: binder-badge

      .. image:: images/binder_badge_logo.svg
        :target: https://mybinder.org/v2/gh/scikit-learn/scikit-learn/1.8.X?urlpath=lab/tree/notebooks/auto_examples/cluster/plot_dbscan.ipynb
        :alt: Launch binder
        :width: 150 px

    .. container:: lite-badge

      .. image:: images/jupyterlite_badge_logo.svg
        :target: ../../lite/lab/index.html?path=auto_examples/cluster/plot_dbscan.ipynb
        :alt: Launch JupyterLite
        :width: 150 px

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: plot_dbscan.ipynb <plot_dbscan.ipynb>`

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: plot_dbscan.py <plot_dbscan.py>`

    .. container:: sphx-glr-download sphx-glr-download-zip

      :download:`Download zipped: plot_dbscan.zip <plot_dbscan.zip>`


.. include:: plot_dbscan.recommendations


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_
