.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "gallery_examples\MNIST.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end <sphx_glr_download_gallery_examples_MNIST.py>`
        to download the full example code.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_gallery_examples_MNIST.py:


MNIST Examples
==========================

We illustrate some basic considerations when working with the :class:`codpy.kernel.Kernel` class, applying it to the `MNIST <https://en.wikipedia.org/wiki/MNIST_database>`_ classification problem.
The example uses the following interpolation / extrapolation method:

    $$f_{k,\theta}(\cdot) = K(\cdot, Y) \theta, \quad \theta = K(X, Y)^{-1} f(X),$$

where

    - $K(X, Y)$ is the Gram matrix, see :func:`codpy.kernel.Kernel.Knm`;
    - $K(X, Y)^{-1} = (K(Y, X)K(X, Y))^{-1}K(Y,X)$ is computed as a least-squares solution without regularization terms, see :func:`codpy.kernel.Kernel.get_knm_inv`.

This notebook illustrates various choices for the set $Y$.
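
The formula above can be sketched with plain NumPy. This is a minimal illustration only, not codpy's implementation: the Gaussian kernel, its ``bandwidth`` and the helper names ``gaussian_gram``, ``fit_theta`` and ``predict`` are assumptions made for the example, and the toy data stands in for MNIST.

.. code-block:: Python

    import numpy as np


    def gaussian_gram(a, b, bandwidth=0.5):
        # Gram matrix K(a, b), shape (len(a), len(b)). The Gaussian kernel is an
        # assumption of this sketch; codpy may use a different kernel.
        sq_dist = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
        return np.exp(-sq_dist / (2.0 * bandwidth**2))


    def fit_theta(x, y, fx, bandwidth=0.5):
        # theta = K(X, Y)^{-1} f(X), understood in the least-squares sense.
        theta, *_ = np.linalg.lstsq(gaussian_gram(x, y, bandwidth), fx, rcond=None)
        return theta


    def predict(z, y, theta, bandwidth=0.5):
        # f_{k,theta}(z) = K(z, Y) theta
        return gaussian_gram(z, y, bandwidth) @ theta


    rng = np.random.default_rng(0)
    x = rng.uniform(-1.0, 1.0, size=(200, 2))  # toy training features X
    fx = np.sin(3.0 * x[:, :1]) * x[:, 1:]     # toy training values f(X)
    y = x[:20]                                 # a small set Y taken from X
    theta = fit_theta(x, y, fx)
    residual = predict(x, y, theta) - fx       # error on the training set

Because $Y$ has fewer points than $X$, the fit is a least-squares projection and the residual on the training set is generally not zero.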

.. GENERATED FROM PYTHON SOURCE LINES 15-38

.. code-block:: Python


    import os
    import random

    import numpy as np
    import pandas as pd

    # Confusion matrix, used to score the classifiers.
    from sklearn.metrics import confusion_matrix

    import codpy.core as core
    from codpy.clustering import MiniBatchkmeans

    # We use a custom hot encoder for performance reasons.
    from codpy.data_processing import hot_encoder

    # Standard codpy kernel class.
    from codpy.kernel import KernelClassifier

    os.environ["OPENBLAS_NUM_THREADS"] = "32"
    os.environ["OMP_NUM_THREADS"] = "32"


.. GENERATED FROM PYTHON SOURCE LINES 39-48

We load the MNIST data using TensorFlow (tf). Install it (``pip install tensorflow``) before running this notebook.

Note:

           - The MNIST training set corresponds to features $X\in \mathbb{R}^{60000,784}$. Each image, represented as $28\times 28$ grayscale pixels, is a feature described as a vector in dimension $D=784$.

           - We one-hot encode the MNIST classes: $f(X) \in \mathbb{R}^{60000,10}$ is defined as

                 $$f(x) = (\delta_i(c(x)))_{i=1,\ldots,10},$$

             where $c(x)$ is the index of the class label of $x$, and $\delta_i(j) = 1$ if $i = j$, $0$ otherwise.
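
The reshaping and encoding described in the note can be sketched with NumPy alone. This example itself relies on ``codpy.data_processing.hot_encoder``; the ``np.eye`` indexing used here is a stand-in chosen for the sketch, and the small random batch and the ``labels`` array are made up for illustration.

.. code-block:: Python

    import numpy as np

    rng = np.random.default_rng(0)
    images = rng.integers(0, 256, size=(5, 28, 28))  # fake batch of 5 grayscale images
    labels = np.array([3, 0, 9, 3, 7])               # fake class labels in 0..9

    x = images.reshape(len(images), -1) / 255.0      # flatten to D = 784, scale to [0, 1]
    fx = np.eye(10)[labels]                          # one row per image, a single 1 per row

Taking ``fx.argmax(axis=1)`` recovers the labels, which is how predictions are decoded when scoring.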

.. GENERATED FROM PYTHON SOURCE LINES 48-71

.. code-block:: Python

    def get_MNIST_data(N=-1):
        import tensorflow as tf

        (x, fx), (z, fz) = tf.keras.datasets.mnist.load_data()
        x, z = x / 255.0, z / 255.0
        x, z, fx, fz = (
            x.reshape(len(x), -1),
            z.reshape(len(z), -1),
            fx.reshape(len(fx), -1),
            fz.reshape(len(fz), -1),
        )
        fx, fz = (
            hot_encoder(pd.DataFrame(data=fx), cat_cols_include=[0], sort_columns=True),
            hot_encoder(pd.DataFrame(data=fz), cat_cols_include=[0], sort_columns=True),
        )
        x, fx, z, fz = (x, fx.values, z, fz.values)
        if N != -1:
            indices = random.sample(range(x.shape[0]), N)
            x, fx = x[indices], fx[indices]

        return x, fx, z, fz


.. GENERATED FROM PYTHON SOURCE LINES 72-73

We evaluate the MNIST predictions with two basic diagnostics: the confusion matrix and the accuracy score.

.. GENERATED FROM PYTHON SOURCE LINES 73-86

.. code-block:: Python


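    def show_confusion_matrix(z, fz, predictor=None, cm=True):
        # Predict class scores, then decode predicted and true labels with argmax.
        f_z = predictor(z)
        fz, f_z = fz.argmax(axis=-1), f_z.argmax(axis=-1)
        out = confusion_matrix(fz, f_z)
        if cm:
            print("confusion matrix:")
            print(out)
        # Score: correctly classified samples (the diagonal) over all samples.
        print("score MNIST:", np.trace(out) / np.sum(out))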
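
To make the score explicit, the following sketch builds a confusion matrix by hand on a toy example and computes the same trace-over-sum ratio. The label arrays are invented for illustration; the convention (rows are true labels, columns are predictions) follows ``sklearn.metrics.confusion_matrix``.

.. code-block:: Python

    import numpy as np

    y_true = np.array([0, 1, 2, 2, 1])
    y_pred = np.array([0, 2, 2, 2, 1])

    n_classes = 3
    cm = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(cm, (y_true, y_pred), 1)  # cm[i, j] counts samples of class i predicted as j

    score = np.trace(cm) / cm.sum()     # fraction of correctly classified samples
    print(cm)
    print("score:", score)

Here four of the five samples are on the diagonal, so the score is 0.8.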


.. GENERATED FROM PYTHON SOURCE LINES 87-88

Disable codpy's verbose output (pass ``True`` to turn it back on).

.. GENERATED FROM PYTHON SOURCE LINES 88-90

.. code-block:: Python

    core.kernel_interface.set_verbose(False)


.. GENERATED FROM PYTHON SOURCE LINES 91-94

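Set variables and pick MNIST data for the test.
``N_MNIST_pics`` is the number of training images drawn at random from the original training set, and ``N_clusters`` is the size of the set $Y$.
The training set is ``x, fx``; the test set is ``z, fz`` and is kept in full.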

.. GENERATED FROM PYTHON SOURCE LINES 94-98

.. code-block:: Python

    N_clusters = 100
    N_MNIST_pics = 5000
    x, fx, z, fz = get_MNIST_data(N_MNIST_pics)


.. GENERATED FROM PYTHON SOURCE LINES 99-100

First, pick $Y$ at random among the training points. We report the score on the training set $X$, then the confusion matrix and the score on the test set $Z$.
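
One detail worth keeping in mind when drawing $Y$ at random: ``np.random.choice`` samples with replacement unless ``replace=False`` is passed, so a draw may contain repeated indices. The sketch below, with made-up sizes and a seeded generator (both assumptions of this illustration), shows a repeatable draw of distinct indices.

.. code-block:: Python

    import numpy as np

    rng = np.random.default_rng(0)  # seeded so that the draw is repeatable
    n_points, n_centers = 5000, 100

    # replace=False guarantees that the same training point is not selected twice.
    indices = rng.choice(n_points, size=n_centers, replace=False)
    n_distinct = len(np.unique(indices))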

.. GENERATED FROM PYTHON SOURCE LINES 100-109

.. code-block:: Python

    indices = np.random.choice(range(x.shape[0]), size=N_clusters)
    y, fy = x[indices], fx[indices]
    predictor = KernelClassifier(x=x, y=y, fx=fx)
    print("Output with the training set - reproducibility test:")
    show_confusion_matrix(x, fx, predictor, cm=False)
    print("Output with the test set :")
    show_confusion_matrix(z, fz, predictor)
    print("Discrepancy(x,y):", predictor.discrepancy(y))


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Output with the training set - reproducibility test:
    score MNIST: 0.8732
    Output with the test set :
    confusion matrix:
    [[ 958    0    0    0    0    5    8    1    8    0]
     [   0 1088    7    3    1    2    4    1   29    0]
     [  15   18  879   26   19    0   11   26   37    1]
     [  11    2   39  864    1   23    8   17   34   11]
     [   1    9    6    1  847    4   16    1   11   86]
     [  24    9    8   55   18  707   20   16   28    7]
     [  20    5    9    0   19   20  875    1    9    0]
     [   4   28   22    4   20    0    2  910    6   32]
     [   5    7   11   66   10   21   18    9  812   15]
     [  16    9    6   12   94    7    1   22   11  831]]
    score MNIST: 0.8771
    Discrepancy(x,y): 0.006751273339628827
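
The confusion matrix printed above also carries per-class information. As a small sketch, the snippet below copies that matrix and computes the overall score together with the fraction of each digit that is correctly classified (the recall); the variable names are illustrative.

.. code-block:: Python

    import numpy as np

    # Confusion matrix copied from the output above (rows: true digit, columns: prediction).
    cm = np.array([
        [958, 0, 0, 0, 0, 5, 8, 1, 8, 0],
        [0, 1088, 7, 3, 1, 2, 4, 1, 29, 0],
        [15, 18, 879, 26, 19, 0, 11, 26, 37, 1],
        [11, 2, 39, 864, 1, 23, 8, 17, 34, 11],
        [1, 9, 6, 1, 847, 4, 16, 1, 11, 86],
        [24, 9, 8, 55, 18, 707, 20, 16, 28, 7],
        [20, 5, 9, 0, 19, 20, 875, 1, 9, 0],
        [4, 28, 22, 4, 20, 0, 2, 910, 6, 32],
        [5, 7, 11, 66, 10, 21, 18, 9, 812, 15],
        [16, 9, 6, 12, 94, 7, 1, 22, 11, 831],
    ])

    score = np.trace(cm) / cm.sum()        # overall accuracy
    recall = np.diag(cm) / cm.sum(axis=1)  # per-digit fraction correctly classified
    worst_digit = int(recall.argmin())
    best_digit = int(recall.argmax())

With these numbers, digit 5 has the lowest recall and digit 0 the highest.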


.. GENERATED FROM PYTHON SOURCE LINES 110-111

Select $Y$ with a greedy algorithm that aims at a low discrepancy between $X$ and $Y$.

.. GENERATED FROM PYTHON SOURCE LINES 111-120

.. code-block:: Python

    predictor = KernelClassifier(x=x, fx=fx).greedy_select(
        N=N_clusters, all=True, start_indices={indices[0]}
    )
    print("Reproducibility test:")
    show_confusion_matrix(x, fx, predictor, cm=False)
    print("Performance test:")
    show_confusion_matrix(z, fz, predictor, cm=False)
    print("Discrepancy(x,y):", predictor.discrepancy(predictor.get_y()))


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Reproducibility test:
    score MNIST: 0.8782
    Performance test:
    score MNIST: 0.8827
    Discrepancy(x,y): 0.005728688575252161
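
As an illustration of the idea, and not of codpy's actual algorithm, the sketch below greedily grows $Y$ so as to lower a squared maximum mean discrepancy (MMD) between $X$ and $Y$. Assumptions: the discrepancy is taken to be an MMD-type quantity, the kernel is Gaussian, and ``gaussian_gram``, ``mmd2`` and ``greedy_mmd_select`` are hypothetical helper names.

.. code-block:: Python

    import numpy as np


    def gaussian_gram(a, b, bandwidth=1.0):
        sq_dist = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
        return np.exp(-sq_dist / (2.0 * bandwidth**2))


    def mmd2(x, y, bandwidth=1.0):
        # Squared MMD between the empirical distributions of x and y.
        return (
            gaussian_gram(x, x, bandwidth).mean()
            - 2.0 * gaussian_gram(x, y, bandwidth).mean()
            + gaussian_gram(y, y, bandwidth).mean()
        )


    def greedy_mmd_select(x, n, start=0, bandwidth=1.0):
        # Add, one at a time, the point of x that yields the lowest squared MMD.
        k = gaussian_gram(x, x, bandwidth)
        k_mean = k.mean(axis=1)  # mean of k(x_i, x_j) over j
        chosen = [start]
        while len(chosen) < n:
            best, best_val = None, np.inf
            for c in range(len(x)):
                if c in chosen:
                    continue
                idx = chosen + [c]
                # Squared MMD up to the constant term mean K(X, X).
                val = k[np.ix_(idx, idx)].mean() - 2.0 * k_mean[idx].mean()
                if val < best_val:
                    best, best_val = c, val
            chosen.append(best)
        return chosen


    rng = np.random.default_rng(0)
    x = rng.normal(size=(60, 2))
    chosen = greedy_mmd_select(x, n=5)
    d_greedy = mmd2(x, x[chosen])

Each step is only locally optimal: the final set is not guaranteed to minimize the discrepancy over all subsets of the same size.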


.. GENERATED FROM PYTHON SOURCE LINES 121-122

Select $Y$ adapted to $f(x)$ using a greedy algorithm.

.. GENERATED FROM PYTHON SOURCE LINES 122-130

.. code-block:: Python

    predictor = KernelClassifier(x=x, fx=fx).greedy_select(N=N_clusters, fx=fx, all=True)
    print("Reproducibility test:")
    show_confusion_matrix(x, fx, predictor, cm=False)
    print("Performance test:")
    show_confusion_matrix(z, fz, predictor, cm=False)
    print("Discrepancy(x,y):", predictor.discrepancy(predictor.get_y()))


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Reproducibility test:
    score MNIST: 0.8976
    Performance test:
    score MNIST: 0.891
    Discrepancy(x,y): 0.008942885551873503
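
One common way to adapt the centers to the values, shown here only as a sketch and not as codpy's method, is to repeatedly add the training point where the current least-squares fit has the largest error. The Gaussian kernel, the 1D toy function and the names ``gaussian_gram`` and ``residual_greedy_select`` are assumptions of this example.

.. code-block:: Python

    import numpy as np


    def gaussian_gram(a, b, bandwidth=0.2):
        sq_dist = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
        return np.exp(-sq_dist / (2.0 * bandwidth**2))


    def residual_greedy_select(x, fx, n, start=0):
        chosen, errors = [start], []
        while True:
            # Least-squares fit on the current centers, then pointwise error on x.
            k = gaussian_gram(x, x[chosen])
            theta, *_ = np.linalg.lstsq(k, fx, rcond=None)
            residual = np.linalg.norm(k @ theta - fx, axis=1)
            errors.append(np.linalg.norm(residual))
            if len(chosen) == n:
                return chosen, errors
            residual[chosen] = -1.0  # never pick the same point twice
            chosen.append(int(residual.argmax()))


    x = np.linspace(0.0, 1.0, 100)[:, None]
    fx = np.sin(2.0 * np.pi * x)
    chosen, errors = residual_greedy_select(x, fx, n=8)

The list ``errors`` records the training error after each added center; since the sets of centers are nested, it does not increase.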


.. GENERATED FROM PYTHON SOURCE LINES 131-132

Select $Y$ using a k-means algorithm.

.. GENERATED FROM PYTHON SOURCE LINES 132-139

.. code-block:: Python

    y = MiniBatchkmeans(x, N=N_clusters).cluster_centers_
    predictor = KernelClassifier(x=x, y=y, fx=fx)
    print("Reproducibility test:")
    show_confusion_matrix(x, fx, predictor, cm=False)
    print("Performance test:")
    show_confusion_matrix(z, fz, predictor, cm=False)
    print("Discrepancy(x,y):", predictor.discrepancy(predictor.get_y()))


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Reproducibility test:
    score MNIST: 0.9156
    Performance test:
    score MNIST: 0.9181
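
For readers unfamiliar with k-means, the sketch below runs a few Lloyd iterations in NumPy and uses the resulting cluster centers as a candidate set $Y$. It is a plain full-batch variant written for illustration, not the mini-batch algorithm behind ``MiniBatchkmeans``; the function names, the initialization from the first points and the toy data are assumptions.

.. code-block:: Python

    import numpy as np


    def inertia(x, centers):
        # Sum of squared distances from each point to its nearest center.
        sq_dist = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        return sq_dist.min(axis=1).sum()


    def lloyd_kmeans(x, n_clusters, n_iter=20):
        centers = x[:n_clusters].copy()      # naive initialization: the first points
        for _ in range(n_iter):
            sq_dist = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
            labels = sq_dist.argmin(axis=1)  # assign each point to its nearest center
            for j in range(n_clusters):
                members = x[labels == j]
                if len(members):             # keep the old center if a cluster is empty
                    centers[j] = members.mean(axis=0)
        return centers


    rng = np.random.default_rng(0)
    x = np.concatenate([rng.normal(loc=m, size=(100, 2)) for m in (-4.0, 0.0, 4.0)])
    x = rng.permutation(x)                   # shuffle rows so the first points are mixed
    y = lloyd_kmeans(x, n_clusters=3)        # cluster centers, usable as the set Y

Unlike the random and greedy selections, the centers are averages and generally do not coincide with any training point.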
    Discrepancy(x,y): 0.05833591511492753


.. rst-class:: sphx-glr-timing

   **Total running time of the script:** (0 minutes 28.764 seconds)


.. _sphx_glr_download_gallery_examples_MNIST.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: MNIST.ipynb <MNIST.ipynb>`

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: MNIST.py <MNIST.py>`

    .. container:: sphx-glr-download sphx-glr-download-zip

      :download:`Download zipped: MNIST.zip <MNIST.zip>`


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_