

.. _sphx_glr_auto_examples_text:

.. _text_examples:

Working with text documents
----------------------------

Examples that use the :mod:`sklearn.feature_extraction.text` module to work with text documents.



.. raw:: html

    <div class="sphx-glr-thumbnails">

.. thumbnail-parent-div-open

.. raw:: html

    <div class="sphx-glr-thumbcontainer" tooltip="This example shows how scikit-learn can be used to classify documents by topic using a Bag of Words approach. It uses a Tf-idf-weighted document-term sparse matrix to encode the features and demonstrates various classifiers that can efficiently handle sparse matrices.">

.. only:: html

  .. image:: /auto_examples/text/images/thumb/sphx_glr_plot_document_classification_20newsgroups_thumb.png
    :alt:

  :doc:`/auto_examples/text/plot_document_classification_20newsgroups`

.. raw:: html

      <div class="sphx-glr-thumbnail-title">Classification of text documents using sparse features</div>
    </div>


.. raw:: html

    <div class="sphx-glr-thumbcontainer" tooltip="This example shows how the scikit-learn API can be used to cluster documents by topic using a Bag of Words approach.">

.. only:: html

  .. image:: /auto_examples/text/images/thumb/sphx_glr_plot_document_clustering_thumb.png
    :alt:

  :doc:`/auto_examples/text/plot_document_clustering`

.. raw:: html

      <div class="sphx-glr-thumbnail-title">Clustering text documents using k-means</div>
    </div>


.. raw:: html

    <div class="sphx-glr-thumbcontainer" tooltip="In this example we illustrate text vectorization, which is the process of representing non-numerical input data (such as dictionaries or text documents) as vectors of real numbers.">

.. only:: html

  .. image:: /auto_examples/text/images/thumb/sphx_glr_plot_hashing_vs_dict_vectorizer_thumb.png
    :alt:

  :doc:`/auto_examples/text/plot_hashing_vs_dict_vectorizer`

.. raw:: html

      <div class="sphx-glr-thumbnail-title">FeatureHasher and DictVectorizer Comparison</div>
    </div>


.. thumbnail-parent-div-close

.. raw:: html

    </div>


.. toctree::
   :hidden:

   /auto_examples/text/plot_document_classification_20newsgroups
   /auto_examples/text/plot_document_clustering
   /auto_examples/text/plot_hashing_vs_dict_vectorizer

