.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "gallery_examples\MNIST.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end <sphx_glr_download_gallery_examples_MNIST.py>`
        to download the full example code.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_gallery_examples_MNIST.py:


MNIST Examples
==========================
We illustrate some basic considerations when working with the :class:`codpy.kernel.Kernel` class, applying it to the MNIST problem.
The example uses the following interpolation / extrapolation method
$$f_{k,\theta}(\cdot) = K(\cdot, Y) \theta, \quad \theta = K(X, Y)^{-1} f(X),$$
where

- $K(X, Y)$ is the Gram matrix, see :func:`codpy.kernel.Kernel.Knm`
- $K(X, Y)^{-1} = (K(Y, X)K(X, Y))^{-1}K(Y,X)$ is computed by least squares, without regularization terms, see :func:`codpy.kernel.Kernel.get_knm_inv`.

This notebook illustrates various choices for the set $Y$.

.. GENERATED FROM PYTHON SOURCE LINES 15-38

.. code-block:: Python


    import os
    import random

    import numpy as np
    import pandas as pd

    # A multi scale kernel method.
    from sklearn.metrics import confusion_matrix

    import codpy.core as core
    from codpy.clustering import MiniBatchkmeans

    # We use a custom one-hot encoder for performance reasons.
    from codpy.data_processing import hot_encoder

    # Standard codpy kernel class.
    from codpy.kernel import KernelClassifier

    os.environ["OPENBLAS_NUM_THREADS"] = "32"
    os.environ["OMP_NUM_THREADS"] = "32"


.. GENERATED FROM PYTHON SOURCE LINES 39-48

We load the MNIST data using TensorFlow (tf).
Install TensorFlow (``pip install tensorflow``) before running this notebook.

Note:

- The MNIST training set corresponds to features $X\in \mathbb{R}^{60000,784}$. Each image, a $28\times 28$ grid of grayscale pixels, is a feature described as a vector of dimension $D=784$.
- We one-hot encode the MNIST classes: $f(X) \in \mathbb{R}^{60000,10}$ is defined as $$f(x) = (\delta_i(c(x)))_{i=1,\ldots,10},$$ where $c(x)$ is the index of the class label of $x$, and $\delta_i(j) = 1$ if $i = j$, and $0$ otherwise.

.. GENERATED FROM PYTHON SOURCE LINES 48-71

.. code-block:: Python



    def get_MNIST_data(N=-1):
        import tensorflow as tf

        # Training pair (x, fx) and test pair (z, fz); pixels are scaled to [0, 1].
        (x, fx), (z, fz) = tf.keras.datasets.mnist.load_data()
        x, z = x / 255.0, z / 255.0
        # Flatten each 28x28 image to a 784-vector and each label to a column.
        x, z, fx, fz = (
            x.reshape(len(x), -1),
            z.reshape(len(z), -1),
            fx.reshape(len(fx), -1),
            fz.reshape(len(fz), -1),
        )
        fx, fz = (
            hot_encoder(pd.DataFrame(data=fx), cat_cols_include=[0], sort_columns=True),
            hot_encoder(pd.DataFrame(data=fz), cat_cols_include=[0], sort_columns=True),
        )
        x, fx, z, fz = (x, fx.values, z, fz.values)
        # Optionally keep a random subset of N training images.
        if N != -1:
            indices = random.sample(range(x.shape[0]), N)
            x, fx = x[indices], fx[indices]

        return x, fx, z, fz



.. GENERATED FROM PYTHON SOURCE LINES 72-73

We run basic tests on the MNIST results: the confusion matrix and the score (fraction of correctly classified images).

.. GENERATED FROM PYTHON SOURCE LINES 73-86

.. code-block:: Python



    def show_confusion_matrix(z, fz, predictor=None, cm=True):
        f_z = predictor(z)
        # Convert one-hot rows back to class indices.
        fz, f_z = fz.argmax(axis=-1), f_z.argmax(axis=-1)
        out = confusion_matrix(fz, f_z)
        if cm:
            print("confusion matrix:")
            print(out)
        # Score: correctly classified samples (diagonal) over all samples.
        print("score MNIST:", np.trace(out) / np.sum(out))
        pass



.. GENERATED FROM PYTHON SOURCE LINES 87-88

Toggle codpy's verbose output; ``False`` runs it silently.

.. GENERATED FROM PYTHON SOURCE LINES 88-90

.. code-block:: Python

    core.kernel_interface.set_verbose(False)


.. GENERATED FROM PYTHON SOURCE LINES 91-94

Set variables and pick the MNIST data for the test.
``N_MNIST_pics`` is used to pick a training set smaller than the original one.
The training set is ``x, fx``, the test set is ``z, fz``.

.. GENERATED FROM PYTHON SOURCE LINES 94-98

.. code-block:: Python

    N_clusters = 100
    N_MNIST_pics = 5000
    x, fx, z, fz = get_MNIST_data(N_MNIST_pics)


.. GENERATED FROM PYTHON SOURCE LINES 99-100

First pick $Y$ at random. Output the results for two sets: the training set $X$ and the test set $Z$.

.. GENERATED FROM PYTHON SOURCE LINES 100-109

.. code-block:: Python

    indices = np.random.choice(range(x.shape[0]), size=N_clusters)
    y, fy = x[indices], fx[indices]
    predictor = KernelClassifier(x=x, y=y, fx=fx)
    print("Output with the training set - reproductibility test:")
    show_confusion_matrix(x, fx, predictor, cm=False)
    print("Output with the test set :")
    show_confusion_matrix(z, fz, predictor)
    print("Discrepancy(x,y):", predictor.discrepancy(y))


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Output with the training set - reproductibility test:
    score MNIST: 0.8732
    Output with the test set :
    confusion matrix:
    [[ 958    0    0    0    0    5    8    1    8    0]
     [   0 1088    7    3    1    2    4    1   29    0]
     [  15   18  879   26   19    0   11   26   37    1]
     [  11    2   39  864    1   23    8   17   34   11]
     [   1    9    6    1  847    4   16    1   11   86]
     [  24    9    8   55   18  707   20   16   28    7]
     [  20    5    9    0   19   20  875    1    9    0]
     [   4   28   22    4   20    0    2  910    6   32]
     [   5    7   11   66   10   21   18    9  812   15]
     [  16    9    6   12   94    7    1   22   11  831]]
    score MNIST: 0.8771
    Discrepancy(x,y): 0.006751273339628827


.. GENERATED FROM PYTHON SOURCE LINES 110-111

Select $Y$ with a greedy algorithm that aims at a low discrepancy.

.. GENERATED FROM PYTHON SOURCE LINES 111-120

.. code-block:: Python

    predictor = KernelClassifier(x=x, fx=fx).greedy_select(
        N=N_clusters, all=True, start_indices={indices[0]}
    )
    print("Reproductibility test:")
    show_confusion_matrix(x, fx, predictor, cm=False)
    print("Performance test:")
    show_confusion_matrix(z, fz, predictor, cm=False)
    print("Discrepancy(x,y):", predictor.discrepancy(predictor.get_y()))


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Reproductibility test:
    score MNIST: 0.8782
    Performance test:
    score MNIST: 0.8827
    Discrepancy(x,y): 0.005728688575252161


.. GENERATED FROM PYTHON SOURCE LINES 121-122

Select $Y$ adapted to $f(x)$ using a greedy algorithm.

.. GENERATED FROM PYTHON SOURCE LINES 122-130

.. code-block:: Python

    predictor = KernelClassifier(x=x, fx=fx).greedy_select(N=N_clusters, fx=fx, all=True)
    print("Reproductibility test:")
    show_confusion_matrix(x, fx, predictor, cm=False)
    print("Performance test:")
    show_confusion_matrix(z, fz, predictor, cm=False)
    print("Discrepancy(x,y):", predictor.discrepancy(predictor.get_y()))


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Reproductibility test:
    score MNIST: 0.8976
    Performance test:
    score MNIST: 0.891
    Discrepancy(x,y): 0.008942885551873503


.. GENERATED FROM PYTHON SOURCE LINES 131-132

Select $Y$ using a k-means algorithm.

.. GENERATED FROM PYTHON SOURCE LINES 132-139

.. code-block:: Python

    y = MiniBatchkmeans(x, N=N_clusters).cluster_centers_
    predictor = KernelClassifier(x=x, y=y, fx=fx)
    print("Reproductibility test:")
    show_confusion_matrix(x, fx, predictor, cm=False)
    print("Performance test:")
    show_confusion_matrix(z, fz, predictor, cm=False)
    print("Discrepancy(x,y):", predictor.discrepancy(predictor.get_y()))


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Reproductibility test:
    score MNIST: 0.9156
    Performance test:
    score MNIST: 0.9181
    Discrepancy(x,y): 0.05833591511492753


.. rst-class:: sphx-glr-timing

   **Total running time of the script:** (0 minutes 28.764 seconds)


.. _sphx_glr_download_gallery_examples_MNIST.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: MNIST.ipynb <MNIST.ipynb>`

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: MNIST.py <MNIST.py>`

    .. container:: sphx-glr-download sphx-glr-download-zip

      :download:`Download zipped: MNIST.zip <MNIST.zip>`


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_
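The extrapolation formula stated at the top of this example, $f_{k,\theta}(\cdot) = K(\cdot, Y) \theta$ with $\theta$ obtained by unregularized least squares, can be sketched in plain NumPy without codpy. This is only an illustration under assumptions: the Gaussian kernel, its bandwidth, the toy three-blob data set, and the helper names ``gaussian_kernel`` and ``make_blobs`` are hypothetical choices made here; they are not codpy's API or its default kernel, and the way $Y$ is picked is a simple stand-in for the selection strategies shown above.

```python
import numpy as np


def gaussian_kernel(a, b, bandwidth=2.0):
    """Gram matrix K(a, b) with entries exp(-|a_i - b_j|^2 / (2 * bandwidth^2)).

    Assumption: a Gaussian kernel is used for illustration only.
    """
    sq_dists = (
        np.sum(a**2, axis=1)[:, None]
        - 2.0 * a @ b.T
        + np.sum(b**2, axis=1)[None, :]
    )
    # Clip tiny negative values caused by floating-point cancellation.
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * bandwidth**2))


def make_blobs(rng, n_per_class):
    """Hypothetical toy data: three 2-D Gaussian blobs with one-hot labels."""
    centers = np.array([[0.0, 0.0], [5.0, 5.0], [-5.0, 5.0]])
    points = np.concatenate(
        [c + 0.5 * rng.standard_normal((n_per_class, 2)) for c in centers]
    )
    labels = np.repeat(np.arange(3), n_per_class)
    return points, np.eye(3)[labels]


rng = np.random.default_rng(0)
x_toy, fx_toy = make_blobs(rng, n_per_class=50)  # training set X, f(X)
z_toy, fz_toy = make_blobs(rng, n_per_class=20)  # test set Z, f(Z)

# Y: the first five training points of each blob (a stand-in selection rule).
y_toy = np.concatenate([x_toy[k * 50 : k * 50 + 5] for k in range(3)])

# theta = K(X, Y)^{-1} f(X), computed as an unregularized least-squares solution.
k_xy = gaussian_kernel(x_toy, y_toy)
theta, *_ = np.linalg.lstsq(k_xy, fx_toy, rcond=None)

# f_{k, theta}(Z) = K(Z, Y) theta, then argmax over the one-hot columns.
f_z_toy = gaussian_kernel(z_toy, y_toy) @ theta
accuracy = np.mean(f_z_toy.argmax(axis=1) == fz_toy.argmax(axis=1))
print("toy score:", accuracy)
```

The least-squares solve plays the role of $(K(Y, X)K(X, Y))^{-1}K(Y,X)$ without forming the normal equations explicitly, which is usually the numerically safer route when $K(X, Y)$ is close to rank deficient.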