
.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_examples/linear_model/plot_lasso_dense_vs_sparse_data.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end <sphx_glr_download_auto_examples_linear_model_plot_lasso_dense_vs_sparse_data.py>`
        to download the full example code or to run this example in your browser via JupyterLite or Binder.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_examples_linear_model_plot_lasso_dense_vs_sparse_data.py:


==============================
Lasso on dense and sparse data
==============================

We show that :class:`~sklearn.linear_model.Lasso` gives the same results
whether the data is stored in a dense or a sparse format, and that the sparse
format speeds up fitting when the data itself is sparse.

.. GENERATED FROM PYTHON SOURCE LINES 10-21

.. code-block:: Python


    # Authors: The scikit-learn developers
    # SPDX-License-Identifier: BSD-3-Clause

    from time import time

    from scipy import linalg, sparse

    from sklearn.datasets import make_regression
    from sklearn.linear_model import Lasso








.. GENERATED FROM PYTHON SOURCE LINES 22-32

Comparing the two Lasso implementations on dense data
-----------------------------------------------------

We create a linear regression problem that is suitable for the Lasso,
that is to say, one with more features than samples. We then store the data
matrix in both dense (the usual) and sparse formats, and train a Lasso on
each. We time both fits and check that they learned the same model by
computing the Euclidean norm of the difference between the two coefficient
vectors. Because the data is dense, we expect the dense format to give the
better runtime.

.. GENERATED FROM PYTHON SOURCE LINES 32-54

.. code-block:: Python


    X, y = make_regression(n_samples=200, n_features=5000, random_state=0)
    # create a copy of X in sparse format
    X_sp = sparse.coo_matrix(X)

    alpha = 1
    sparse_lasso = Lasso(alpha=alpha, fit_intercept=False, max_iter=1000)
    dense_lasso = Lasso(alpha=alpha, fit_intercept=False, max_iter=1000)

    t0 = time()
    sparse_lasso.fit(X_sp, y)
    print(f"Sparse Lasso done in {(time() - t0):.3f}s")

    t0 = time()
    dense_lasso.fit(X, y)
    print(f"Dense Lasso done in {(time() - t0):.3f}s")

    # compare the regression coefficients
    coeff_diff = linalg.norm(sparse_lasso.coef_ - dense_lasso.coef_)
    print(f"Distance between coefficients : {coeff_diff:.2e}")





.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Sparse Lasso done in 0.108s
    Dense Lasso done in 0.043s
    Distance between coefficients : 5.09e-14
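
One way to see why the sparse format brings no benefit here is to compare
storage costs. The following is a rough sketch rather than part of the
original example: it assumes the CSC layout (the one the sparse section of this
example converts to), and the names ``dense_bytes`` and ``sparse_bytes`` are
illustrative. When almost every entry is non-zero, a sparse matrix stores every
value plus its index arrays, so it carries more data than the dense array.

.. code-block:: Python

    from scipy import sparse

    from sklearn.datasets import make_regression

    # Same dense problem as above.
    X, y = make_regression(n_samples=200, n_features=5000, random_state=0)
    X_csc = sparse.csc_matrix(X)

    # A dense array stores one float64 per entry.
    dense_bytes = X.nbytes
    # CSC stores the non-zero values, their row indices and the column pointers.
    sparse_bytes = X_csc.data.nbytes + X_csc.indices.nbytes + X_csc.indptr.nbytes

    print(f"Dense storage : {dense_bytes / 1e6:.1f} MB")
    print(f"Sparse storage: {sparse_bytes / 1e6:.1f} MB")
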




.. GENERATED FROM PYTHON SOURCE LINES 55-61

Comparing the two Lasso implementations on sparse data
------------------------------------------------------

We make the previous problem sparse by replacing every value below 2.5
(which includes all negative values) with 0, and run the same comparisons as
above. Because the data is now sparse, we expect the implementation that uses
the sparse data format to be faster.
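
How sparse the thresholded matrix becomes can be estimated beforehand. The
sketch below assumes the entries of ``X`` are independent standard normal
draws, which is our understanding of the ``make_regression`` defaults and not
something this example states. Under that assumption, the expected fraction of
entries that survive a threshold of 2.5 is the upper tail probability of the
normal distribution, roughly 0.6%. The names ``expected_density`` and
``empirical_density`` are illustrative.

.. code-block:: Python

    import numpy as np
    from scipy.stats import norm

    threshold = 2.5

    # Probability that a standard normal draw is at least `threshold`.
    expected_density = norm.sf(threshold)

    # Empirical check on an independent sample of the same shape as X.
    rng = np.random.default_rng(0)
    sample = rng.standard_normal((200, 5000))
    sample[sample < threshold] = 0.0
    empirical_density = np.count_nonzero(sample) / sample.size

    print(f"Expected density : {expected_density * 100:.3f}%")
    print(f"Empirical density: {empirical_density * 100:.3f}%")
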

.. GENERATED FROM PYTHON SOURCE LINES 61-89

.. code-block:: Python


    # make a copy of the previous data
    Xs = X.copy()
    # make Xs sparse by replacing the values lower than 2.5 with 0s
    Xs[Xs < 2.5] = 0.0
    # create a copy of Xs in sparse format
    Xs_sp = sparse.coo_matrix(Xs)
    Xs_sp = Xs_sp.tocsc()

    # compute the proportion of non-zero entries in the data matrix
    print(f"Matrix density : {(Xs_sp.nnz / X.size * 100):.3f}%")

    alpha = 0.1
    sparse_lasso = Lasso(alpha=alpha, fit_intercept=False, max_iter=10000)
    dense_lasso = Lasso(alpha=alpha, fit_intercept=False, max_iter=10000)

    t0 = time()
    sparse_lasso.fit(Xs_sp, y)
    print(f"Sparse Lasso done in {(time() - t0):.3f}s")

    t0 = time()
    dense_lasso.fit(Xs, y)
    print(f"Dense Lasso done in  {(time() - t0):.3f}s")

    # compare the regression coefficients
    coeff_diff = linalg.norm(sparse_lasso.coef_ - dense_lasso.coef_)
    print(f"Distance between coefficients : {coeff_diff:.2e}")





.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Matrix density : 0.626%
    Sparse Lasso done in 0.146s
    Dense Lasso done in  0.717s
    Distance between coefficients : 4.22e-13
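
The timings above come from a single call to ``fit`` each, so they can vary
noticeably from run to run. A minimal sketch of a more robust measurement,
assuming a hypothetical helper ``best_fit_time`` that is not part of
scikit-learn, repeats each fit a few times and keeps the fastest run:

.. code-block:: Python

    from time import perf_counter

    from scipy import sparse

    from sklearn.datasets import make_regression
    from sklearn.linear_model import Lasso


    def best_fit_time(X, y, n_repeats=3, **lasso_params):
        """Return the fastest of ``n_repeats`` wall-clock fit times in seconds."""
        timings = []
        for _ in range(n_repeats):
            # Use a fresh estimator for every repetition.
            model = Lasso(**lasso_params)
            start = perf_counter()
            model.fit(X, y)
            timings.append(perf_counter() - start)
        return min(timings)


    # Rebuild the sparse problem from this section.
    X, y = make_regression(n_samples=200, n_features=5000, random_state=0)
    Xs = X.copy()
    Xs[Xs < 2.5] = 0.0
    Xs_sp = sparse.csc_matrix(Xs)

    params = dict(alpha=0.1, fit_intercept=False, max_iter=10000)
    t_sparse = best_fit_time(Xs_sp, y, **params)
    t_dense = best_fit_time(Xs, y, **params)

    print(f"Sparse Lasso, best of 3: {t_sparse:.3f}s")
    print(f"Dense Lasso, best of 3 : {t_dense:.3f}s")

Taking the minimum rather than the mean reduces the influence of unrelated
system activity on the comparison.
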





.. rst-class:: sphx-glr-timing

   **Total running time of the script:** (0 minutes 1.084 seconds)


.. _sphx_glr_download_auto_examples_linear_model_plot_lasso_dense_vs_sparse_data.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: binder-badge

      .. image:: images/binder_badge_logo.svg
        :target: https://mybinder.org/v2/gh/scikit-learn/scikit-learn/1.8.X?urlpath=lab/tree/notebooks/auto_examples/linear_model/plot_lasso_dense_vs_sparse_data.ipynb
        :alt: Launch binder
        :width: 150 px

    .. container:: lite-badge

      .. image:: images/jupyterlite_badge_logo.svg
        :target: ../../lite/lab/index.html?path=auto_examples/linear_model/plot_lasso_dense_vs_sparse_data.ipynb
        :alt: Launch JupyterLite
        :width: 150 px

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: plot_lasso_dense_vs_sparse_data.ipynb <plot_lasso_dense_vs_sparse_data.ipynb>`

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: plot_lasso_dense_vs_sparse_data.py <plot_lasso_dense_vs_sparse_data.py>`

    .. container:: sphx-glr-download sphx-glr-download-zip

      :download:`Download zipped: plot_lasso_dense_vs_sparse_data.zip <plot_lasso_dense_vs_sparse_data.zip>`


.. include:: plot_lasso_dense_vs_sparse_data.recommendations


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_
