Kernel class

class codpy.kernel.Kernel(x=None, y=None, fx=None, max_pool: int = 1000, max_nystrom: int = 1000, reg: float = 1e-09, order: int | None = None, dim: int = 1, set_kernel: callable | None = None, **kwargs: dict)[source]

Bases: object

A class to manipulate data for various kernel-based operations, such as interpolation or extrapolation of functions, or mapping between distributions.

Note:

This class is similar to libraries such as scikit-learn or XGBoost, in the sense that it uses a fit / predict pattern, with the following correspondences and differences.

Data is loaded into memory in the constructor __init__(), or via set().
For matching distributions, use map().
Prediction is performed directly through __call__().

It implements the following methods:

In the context of functions interpolation / extrapolation
\[f_{k,\theta}(\cdot) = K(\cdot, Y) \theta, \quad \theta = K(X, Y)^{-1} f(X),\]
- $K(X, Y)$ is the Gram matrix, see Knm()
- $K(X, Y)^{-1} = (K(Y, X)K(X, Y) + \epsilon R(Y,Y))^{-1}K(Y,X)$ is computed by a least-squares method with an optional regularization term, see get_knm_inv().
For matching distributions
\[f_{k,\theta}(\cdot) = K(\cdot, Y) K(X, Y)^{-1} f(X\circ \sigma),\]
where $\sigma$ is a permutation.
Fitting is done just-in-time (at first prediction), and means computing the parameters $\theta = K(X, Y)^{-1} f(X)$, together with $\sigma$ for distributions. The function get_theta() performs those computations and corresponds to fit in other frameworks.
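
The fit / predict formulas above can be sketched in plain NumPy. This is a minimal illustration and not codpy's implementation: the Gaussian kernel, the choice $R(Y,Y) = I$, and the helper names gaussian_kernel, fit_theta and predict are all assumptions made for the example.

```python
import numpy as np

def gaussian_kernel(a, b, bandwidth=0.3):
    """Gram matrix k(a^i, b^j); a Gaussian kernel is assumed for illustration."""
    sq_dist = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dist / (2.0 * bandwidth**2))

def fit_theta(x, y, fx, reg=1e-9):
    """theta = (K(Y,X) K(X,Y) + reg * I)^{-1} K(Y,X) f(X), assuming R(Y,Y) = I."""
    k_xy = gaussian_kernel(x, y)                     # shape (N, M)
    lhs = k_xy.T @ k_xy + reg * np.eye(y.shape[0])   # shape (M, M)
    return np.linalg.solve(lhs, k_xy.T @ fx)         # shape (M, D_f)

def predict(z, y, theta):
    """f_{k,theta}(z) = K(z, Y) theta."""
    return gaussian_kernel(z, y) @ theta

x = np.linspace(-1.0, 1.0, 40).reshape(-1, 1)   # training inputs X
fx = np.sin(3.0 * x)                            # training values f(X)
theta = fit_theta(x, x, fx)                     # here Y = X
z = np.linspace(-0.9, 0.9, 7).reshape(-1, 1)    # new inputs
fz = predict(z, x, theta)                       # predictions, shape (7, 1)
```

In this sketch, fit_theta plays the role that the text assigns to get_theta(), and predict the role of __call__().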

__init__(x=None, y=None, fx=None, max_pool: int = 1000, max_nystrom: int = 1000, reg: float = 1e-09, order: int | None = None, dim: int = 1, set_kernel: callable | None = None, **kwargs: dict) → None[source]

Initializes the Kernel class with default or user-defined parameters.

Parameters:

x – A bi-dimensional numpy array.
fx – A bi-dimensional numpy array. If x or fx is not None, then call set()
max_pool (int, optional) – Maximum pool size for the kernel operations. Defaults to 1000.
max_nystrom (int, optional) – Maximum number of Nystrom samples. Defaults to 1000.
reg (float, optional) – Regularization parameter for kernel operations. Defaults to 1e-9.
order (int, optional) – Polynomial order for polynomial kernel functions. Defaults to None.
dim (int, optional) – Dimensionality of the input data. Defaults to 1.
set_kernel (callable, optional) – A custom kernel function initializer. If not provided, a default kernel is used.
kwargs (dict) – Additional keyword arguments for further customization.

default_kernel_functor() → callable[source]

Initialize and return a default kernel function.

This method provides a default kernel initialization, using a simple but robust kernel functor:

>>> core.kernel_setter("maternnorm", "standardmean", 0, 1e-9)

defining the maternnorm kernel with the standardmean map. It sets a polynomial order of 0 and a regularization value of 1e-9.

Returns: The initialized default kernel function using core.kernel_setter().
Return type: callable

Example

>>> default_kernel = kernel.default_kernel_functor()

set_custom_kernel(kernel_name: str, map_name: str, poly_order: int = 0, reg: float | None = None, bandwidth: float = 1.0, **kwargs) → None[source]

Provide an interface to the internal codpy kernel with flexible parameters.

Parameters:

kernel_name (str) – Name of the kernel function to use (e.g., 'gaussian').
map_name (str) – Name of the mapping function (e.g., 'standardmin').
poly_order (int, optional) – The polynomial order if using a polynomial kernel. Defaults to 0.
reg (float, optional) – Regularization parameter. If not provided, uses the instance’s reg value.
bandwidth (float, optional) – Bandwidth for kernel functions that require it. Defaults to 1.0.

Returns:

None

get_order(**kwargs) → int[source]

Retrieve the polynomial order for the kernel.

Returns: The polynomial order if available, otherwise None.
Return type: int or None

Example

>>> order = kernel.get_order()

get_polynomial_values(**kwargs) → ndarray[source]

Retrieve the predicted polynomial values based on the current input data.

This method returns the values obtained from the polynomial regression model. If the polynomial values are not yet computed, it calls _set_polynomial_regressor() to set up the polynomial regressor using the current input data x and function values fx.

Parameters: kwargs – Additional keyword arguments for flexibility (not used directly).
Returns: The predicted polynomial values or None if the polynomial order is not set.
Return type: numpy.ndarray or None

Example

>>> poly_values = kernel.get_polynomial_values()

get_polynomial_regressor(z: ndarray, x: ndarray | None = None, fx: ndarray | None = None, **kwargs) → ndarray[source]

Set up the polynomial regressor based on the input data and the polynomial order.

Parameters:

z (numpy.ndarray) – New input data points for the regressor.
x (numpy.ndarray, optional) – Input data points.
fx (numpy.ndarray, optional) – Function values corresponding to the input data.

Returns:

The predicted polynomial values or None if unavailable.

Return type:

numpy.ndarray or None

Example

>>> z_data = np.random.rand(100, 10)
>>> pred = kernel.get_polynomial_regressor(z_data)

Knm(x: ndarray, y: ndarray, fy: ndarray = [], **kwargs) → ndarray[source]

Compute the kernel matrix $K(X, Y)=k(x^i, y^j)_{i,j}$, where the kernel function $k$ is defined at class initialization, see self.set_kernel.

Parameters:

x (numpy.ndarray) – Input data points $(N, D)$, where $N$ is the number of points and $D$ is the dimensionality.
y (numpy.ndarray) – Secondary data points $(M, D)$, where $M$ is the number of points and $D$ is the dimensionality.
fy (numpy.ndarray, optional) – Optional matrix values for optimization purposes. If not None, perform and return the multiplication $K(X, Y)f_y$.

Returns:

The computed kernel matrix $K$ of size $(N, M)$.

Return type:

numpy.ndarray

Example

>>> x_data = np.array([...])
>>> y_data = np.array([...])
>>> kernel_matrix = Kernel().Knm(x=x_data, y=y_data)
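
The shapes involved can be checked with a small NumPy sketch. It does not call codpy: the Gaussian kernel and the helper name gram_matrix are assumptions for illustration, and the optional fy argument mimics the fused product $K(X, Y)f_y$ described above.

```python
import numpy as np

def gram_matrix(x, y, fy=None, bandwidth=1.0):
    """Return K(X, Y), or the product K(X, Y) @ fy when fy is given."""
    # Pairwise squared distances between rows of x and rows of y, shape (N, M).
    sq_dist = ((x[:, None, :] - y[None, :, :]) ** 2).sum(axis=-1)
    k = np.exp(-sq_dist / (2.0 * bandwidth**2))   # assumed Gaussian kernel
    return k if fy is None else k @ fy

rng = np.random.default_rng(0)
x_data = rng.random((5, 3))    # N = 5 points in dimension D = 3
y_data = rng.random((4, 3))    # M = 4 points in dimension D = 3
fy_data = rng.random((4, 2))   # values attached to y

k_xy = gram_matrix(x_data, y_data)            # shape (N, M) = (5, 4)
k_fy = gram_matrix(x_data, y_data, fy_data)   # shape (N, 2) = (5, 2)
```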

get_knm_inv(epsilon: float | None = None, epsilon_delta: ndarray | None = None, **kwargs) → ndarray[source]

Retrieve the inverse of the kernel matrix $K^{-1}(x, y)$ using least squares computations.

Parameters:

epsilon (float, optional) – Regularization parameter for the inverse computation. Defaults to None.
epsilon_delta (numpy.ndarray, optional) – Delta values for adjusting regularization. Defaults to None.

Returns:

The inverse kernel matrix or the product with function values if provided.

Return type:

numpy.ndarray

Note

If the regularization parameter (reg) is empty:
- If fx is empty: Returns a NumPy array of size $(N, M)$, representing the least square inverse of $K(x, y)$.
- If fx is provided: Returns the product of $K^{-1}(x, y)$ and $f(x)$. This allows performance and memory optimizations.
If the regularization parameter (reg) is provided:
- If fx is empty: Returns a NumPy array of size $(N, M)$, computed as $(K(y, x) K(x, y) + \epsilon)^{-1} K(y, x)$
- If fx is provided: Returns the product of $K^{-1}(x, y)$ and $f(x)$.

Example

>>> x_data = np.random.rand(100, 10)
>>> y_data = np.random.rand(80, 10)
>>> fx_data = np.random.rand(100, 5)  # one row of function values per row of x_data
>>> inv_kernel = kernel.get_knm_inv(x=x_data, y=y_data, fx=fx_data)

get_knm(**kwargs) → ndarray[source]

Retrieve or compute the Gram matrix $K(x, y)$ for the kernel.

Returns: The Gram matrix $K(x,y)$.
Return type: numpy.ndarray

get_x(**kwargs) → ndarray[source]

Retrieve the input data x.

Returns: The input data or None if not set.
Return type: numpy.ndarray or None

set_x(x: ndarray, set_polynomial_regressor: bool = True, **kwargs) → None[source]

Set the input data x for the kernel and update related internal states.

This method sets the input data and optionally recalculates the polynomial regressor and kernel matrices.

Parameters:

x (numpy.ndarray) – Input data points to be set.
set_polynomial_regressor (bool, optional) – Whether to recalculate the polynomial regressor after setting the data. Defaults to True.

set_y(y: ndarray | None = None, **kwargs) → None[source]

Set the target data y for the kernel. If no target data is provided, y is set equal to x.

If interpolation/extrapolation is used, the following formula is applied:

\[ f_{\theta}(.) = K(., Y)\theta, \quad \theta = K(X, Y)^{-1} f(X). \]

Parameters: y (numpy.ndarray, optional) – Target data points. If None, y is set equal to x.

get_y(**kwargs) → ndarray[source]

Retrieve the target data y.

Returns: The target data or the input data x if y is not set.
Return type: numpy.ndarray

get_fx(**kwargs) → ndarray[source]

Retrieve the function values fx for the input data.

Returns: The function values or None if not set.
Return type: numpy.ndarray or None

set_fx(fx: ndarray, set_polynomial_regressor: bool = True, **kwargs) → None[source]

Set the function values fx for the input data.

Parameters:

fx (numpy.ndarray) – Function values corresponding to the input data.
set_polynomial_regressor (bool, optional) – Whether to recalculate the polynomial regressor after setting the function values. Defaults to True.

set_theta(theta: ndarray, **kwargs) → None[source]

Set the coefficient theta for the kernel regression.

The coefficient is computed by the formula:

\[\theta = K(X, Y)^{-1} f(X)\]

Parameters: theta (numpy.ndarray) – Coefficients for kernel regression.

get_theta(**kwargs) → ndarray[source]

Retrieve the coefficient theta for kernel regression.

If fx is not defined, the polynomial regressor is used to adjust the function values.

Returns: The regression coefficient theta.
Return type: numpy.ndarray

get_Delta() → ndarray[source]

Compute and retrieve the discrete Laplace-Beltrami operator Delta.

Returns: The Laplace-Beltrami operator.
Return type: numpy.ndarray

greedy_select(N, x=None, fx=None, all=False, norm_='frobenius', **kwargs)[source]

Select a subset of points using a greedy Nystrom approximation technique:

\[Y^{n+1} = Y^{n} \cup \arg \sup_{x \in X} d(Y^n,x),\]

to quickly approximate the clustering problem $Y = \arg \inf_{Y \subset X} d(Y,X),$ where we suppose the following structure $d(Y,X) = \sum_i d(Y,x^i)$.

The selection is typically based on norms such as the discrepancy errors for distributions, Frobenius or classifier type distances.

Parameters:

x (numpy.ndarray) – Input data points.
N (int) – The number of points to select.
fx (numpy.ndarray, optional) –
Function values corresponding to x.
- if fx is None,
  \[d(Y,X) = \frac{1}{N_X} \sum_{n=1}^{N_X} k(x^n,\cdot) - \frac{2}{N_Y} \sum_{m=1}^{N_Y} k(\cdot,y^m)\]
  This choice corresponds to minimizing the discrepancy error, see core.op.discrepancy_error().
- if fx is not None, $d(Y,X) = \|f(X)-f_{k,\theta}(X)\|$.
  In this case, the selection targets adaptive mesh or control variate techniques.
all (bool, optional) – If True, all points are selected. Defaults to False.
norm_ (str, optional) –
a string identifying the norm used for selection. Can be “frobenius” or “classifier”. Defaults to “frobenius”.
- if “frobenius”, $d(Y,X) = \|f(X)-f_{k,\theta}(X)\|_{\ell_2}^2$
- if “classifier”, $d(Y,X) = \|\text{softmax}(f(X))-\text{softmax}(f_{k,\theta}(X))\|_{\ell_2}^2$ to account for the probability representation.
- user-defined functions coming soon.
start_indices (list, optional) – a list of indices to set $Y^0$; otherwise the first point is chosen randomly.

Returns:

Indices of the selected points.

Return type:

numpy.ndarray
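
The greedy rule above, in the case where fx is given and the squared residual plays the role of $d(Y^n, x)$, can be sketched as follows. The Gaussian kernel, the direct solve on $K(Y^n, Y^n)$, and the name greedy_select_sketch are assumptions for the illustration; codpy's greedy_select() may differ in its details.

```python
import numpy as np

def gaussian_kernel(a, b, bandwidth=0.5):
    """Gram matrix k(a^i, b^j) for an assumed Gaussian kernel."""
    sq_dist = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dist / (2.0 * bandwidth**2))

def greedy_select_sketch(x, fx, n_select, start_index=0, reg=1e-9):
    """Grow Y^n by repeatedly adding the point of X with the largest residual."""
    selected = [start_index]
    while len(selected) < n_select:
        y = x[selected]
        # Fit the interpolant on the current subset Y^n.
        k_yy = gaussian_kernel(y, y) + reg * np.eye(len(selected))
        theta = np.linalg.solve(k_yy, fx[selected])
        # Squared residual |f(x) - f_{k,theta}(x)|^2 at every candidate x.
        residual = fx - gaussian_kernel(x, y) @ theta
        errors = (residual**2).sum(axis=1)
        errors[selected] = -np.inf   # never pick the same point twice
        selected.append(int(np.argmax(errors)))
    return np.array(selected)

x_data = np.linspace(-1.0, 1.0, 30).reshape(-1, 1)
fx_data = np.sin(3.0 * x_data)
indices = greedy_select_sketch(x_data, fx_data, n_select=5)
```

The returned indices play the same role as the array of selected indices documented above: x_data[indices] is the reduced set $Y$.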

set(x: ndarray | None = None, fx: ndarray | None = None, y: ndarray | None = None, **kwargs) → None[source]

Set the input data x, function values fx, and target data y for the kernel.

Parameters:

x (numpy.ndarray) – Input data points.
fx (numpy.ndarray, optional) – Function values corresponding to the input data x.
y (numpy.ndarray, optional) – Target data points. If None, y is set equal to x.

map(x: ndarray, y: ndarray, distance: str | None = None, sub: bool = False) → None[source]

Maps the input data points x to the target data points y using the kernel and optimal transport techniques.

Parameters:

x (numpy.ndarray) – Input data points ($N$, $D_{source}$).
y (numpy.ndarray) – Target data points ($M$, $D_{target}$).
distance (str, optional) – Distance metric to use in mapping. Defaults to None.
sub (bool, optional) – Whether to apply a sub-permutation. Defaults to False.

Returns:

None

Example

>>> x_data = np.array([...])  # Input data with shape (N, D_source)
>>> y_data = np.array([...])  # Target data with shape (M, D_target)
>>> kernel.map(x_data, y_data)

Note

This method computes a permutation that maps $x$ to $y$ using the Linear Sum Assignment Problem (LSAP) or a descent method.

If the dimensionalities of $x$ and $y$ are the same ($D_{source} = D_{target}$), the classical LSAP algorithm is used.

If the dimensionalities differ ($D_{source} \neq D_{target}$), a descent-based method is used to encode the data into a lower-dimensional latent space before finding the optimal permutation, following principles of discrete optimal transport.

This permutation can be used to transform the input data $x$ to approximate the target data $y$.
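
When source and target share the same dimension, the permutation can be illustrated with SciPy's linear_sum_assignment. The squared Euclidean cost used here is an assumption for the sketch, as is the helper name lsap_permutation; the cost that codpy builds from the kernel is not specified in this section.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def lsap_permutation(x, y):
    """Return sigma such that x[sigma[i]] is assigned to y[i] at minimal total cost."""
    # cost[i, j] = squared Euclidean distance between y^i and x^j (assumed cost).
    cost = ((y[:, None, :] - x[None, :, :]) ** 2).sum(axis=-1)
    _, sigma = linear_sum_assignment(cost)
    return sigma

rng = np.random.default_rng(0)
x_data = rng.random((6, 2))
true_perm = rng.permutation(6)
y_data = x_data[true_perm]             # y is a shuffled copy of x
sigma = lsap_permutation(x_data, y_data)
x_matched = x_data[sigma]              # x reordered to line up with y
```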

update_set(z: ndarray, fz: ndarray) → Tuple[ndarray, ndarray][source]

Update the training set by limiting the data to a maximum pool size.

This method trims the input data z and corresponding function values fz to the size defined by the max_pool parameter.

Parameters:

z (numpy.ndarray) – Input data points to update.
fz (numpy.ndarray) – Function values corresponding to the input data z.

Returns:

The trimmed input data points and corresponding function values, limited by max_pool.

Return type:

Tuple[numpy.ndarray, numpy.ndarray]

update(z: ndarray, fz: ndarray, eps: float | None = None, **kwargs) → None[source]

Fit the regressor to new data points (z, fz) while maintaining the existing kernel structure.

This method allows fitting a kernel-based regressor that is originally defined on the set x but is updated to match new input values z and their corresponding function values fz.

The regression is defined by the formula:

\[ f_{k, \theta}(z) \approx K(z, X)\theta = f(z) \]

Where the coefficient theta is computed as:

\[ \theta = K(z, X)^{-1}f(z) \]

Parameters:

z (numpy.ndarray) – New input data points to update the regressor.
fz (numpy.ndarray) – Function values corresponding to the new data points z.
eps (float, optional) – Regularization parameter used in the least squares solution. Defaults to self.reg if not provided.

Returns:

Updates the internal state of the regressor with new z and fz values.

Return type:

None

add(y: ndarray | None = None, fy: ndarray | None = None) → None[source]

Augments the training set by adding new data points and their corresponding function values.

This method optimizes the computation for training set augmentation by efficiently updating the kernel matrix and applying a block-inversion algorithm, which reduces the overall complexity compared to recalculating the full kernel matrix.

Parameters:

y (numpy.ndarray) – New data points to be added to the training set.
fy (numpy.ndarray) – Function values corresponding to the new data points y.

Returns:

This method updates the internal state of the class, modifying the training set with the new data points and their function values.

Return type:

None

Note

The kernel matrix $K([X,Y], [X,Y])$ belongs to $\mathbb{R}^{(N_X+N_Y) \times (N_X+N_Y)}$, and directly computing its inverse has a complexity of order $(N_X + N_Y)^3$.

By using the block-inversion method, the complexity can be reduced to the order of $N_X^3 + N_Y^3$, improving performance.

The function $f_{k,\theta}(.)$ is then computed as:

\[ f_{k,\theta}(.) = K(., [X,Y])\theta, \quad \theta = K([X,Y], [X,Y])^{-1} \begin{bmatrix} f(X) \; f(Y) \end{bmatrix} \]

Here, $[.]$ denotes standard matrix concatenation, where $f(X)$ and $f(Y)$ are the function values for the original and new data points, respectively.
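
A block inversion of this kind can be sketched with the Schur complement of $K(X,X)$. This is a generic NumPy illustration under stated assumptions (Gaussian kernel, a small diagonal regularization, hypothetical helper names), not the routine that add() actually calls.

```python
import numpy as np

def gaussian_kernel(a, b, bandwidth=0.3):
    """Gram matrix k(a^i, b^j) for an assumed Gaussian kernel."""
    sq_dist = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dist / (2.0 * bandwidth**2))

def block_inverse(a_inv, b, c):
    """Inverse of the symmetric matrix [[A, B], [B.T, C]], given A^{-1}."""
    a_inv_b = a_inv @ b                        # A^{-1} B
    schur_inv = np.linalg.inv(c - b.T @ a_inv_b)   # (C - B.T A^{-1} B)^{-1}
    top_left = a_inv + a_inv_b @ schur_inv @ a_inv_b.T
    top_right = -a_inv_b @ schur_inv
    return np.block([[top_left, top_right], [top_right.T, schur_inv]])

rng = np.random.default_rng(0)
x = rng.random((6, 2))   # existing points X
y = rng.random((3, 2))   # new points Y
reg = 1e-3               # diagonal regularization keeps the sketch well conditioned

k_xx = gaussian_kernel(x, x) + reg * np.eye(6)
k_xy = gaussian_kernel(x, y)
k_yy = gaussian_kernel(y, y) + reg * np.eye(3)

k_xx_inv = np.linalg.inv(k_xx)                      # assumed already available
k_full_inv = block_inverse(k_xx_inv, k_xy, k_yy)    # inverse of K([X,Y], [X,Y])
k_full = np.block([[k_xx, k_xy], [k_xy.T, k_yy]])
```

Only the $N_Y \times N_Y$ Schur complement is inverted from scratch; the existing inverse of $K(X,X)$ is reused.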

kernel_distance(z: ndarray) → ndarray[source]

Compute an MMD-like (Maximum Mean Discrepancy) distance matrix between the input data x and the new data z.

The distance is computed as:

\[ D(X,Z) = \Big(d_k(x^i,z^j) \Big)_{i,j},\quad d_k(x,z)= k(x,x) + k(z,z)-2k(x,z) \]

Parameters: z (numpy.ndarray) – New input data points.
Returns: The computed MMD-based distance matrix.
Return type: numpy.ndarray
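
The pairwise distance $d_k$ can be written in a few lines of NumPy. The Gaussian kernel and the name kernel_distance_sketch are assumptions for illustration only.

```python
import numpy as np

def gaussian_kernel(a, b, bandwidth=1.0):
    """Gram matrix k(a^i, b^j) for an assumed Gaussian kernel."""
    sq_dist = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dist / (2.0 * bandwidth**2))

def kernel_distance_sketch(x, z, kernel=gaussian_kernel):
    """D[i, j] = k(x^i, x^i) + k(z^j, z^j) - 2 k(x^i, z^j)."""
    k_xx = np.diag(kernel(x, x))[:, None]   # column of k(x^i, x^i), shape (N, 1)
    k_zz = np.diag(kernel(z, z))[None, :]   # row of k(z^j, z^j), shape (1, M)
    return k_xx + k_zz - 2.0 * kernel(x, z)

rng = np.random.default_rng(0)
x_data = rng.random((5, 3))
z_data = rng.random((4, 3))
dist = kernel_distance_sketch(x_data, z_data)   # shape (5, 4)
```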

discrepancy(z: ndarray) → float[source]

Compute the MMD (Maximum Mean Discrepancy) between the kernel features $x$ and $z$.

Parameters: z (numpy.ndarray) – New input data points.
Returns: The computed MMD value.
Return type: float
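
For reference, a standard (biased) estimator of the squared MMD between two samples is sketched below. The Gaussian kernel and the name mmd_squared are assumptions; whether discrepancy() returns the squared quantity or its square root is not stated in this section.

```python
import numpy as np

def gaussian_kernel(a, b, bandwidth=1.0):
    """Gram matrix k(a^i, b^j) for an assumed Gaussian kernel."""
    sq_dist = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dist / (2.0 * bandwidth**2))

def mmd_squared(x, z, kernel=gaussian_kernel):
    """Biased estimator: mean K(x,x) + mean K(z,z) - 2 mean K(x,z)."""
    return kernel(x, x).mean() + kernel(z, z).mean() - 2.0 * kernel(x, z).mean()

rng = np.random.default_rng(0)
x_data = rng.normal(size=(50, 2))
same = mmd_squared(x_data, x_data)          # identical samples
near = mmd_squared(x_data, x_data + 0.1)    # slightly shifted sample
far = mmd_squared(x_data, x_data + 3.0)     # strongly shifted sample
```

The value vanishes for identical samples and grows as the second sample moves away from the first.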

get_kernel() → callable[source]

Retrieve the current kernel function for the input data.

This method retrieves a positive semi-definite (PSD) kernel function, represented as: $k(S(x), S(y))$, where $S$ is a predefined mapping.

Returns: The kernel function used by the current model.
Return type: callable

set_kernel_ptr() → None[source]

Set the Codpy interface to use the current kernel function.

This method updates the Codpy kernel interface with the current kernel function, sets the polynomial order to zero, and applies the regularization parameter defined in the object.

rescale() → None[source]

Rescale the input data using the current mapping.

This method rescales the input data by applying the map function associated with the current kernel. It also retrieves and updates the internal kernel function based on the rescaled data.

If x is set, the rescaling is applied to x with a maximum number of points defined by max_nystrom.

codpy.kernel.clip_probs(probs, min=None, max=None)[source]

class codpy.kernel.KernelClassifier(x=None, y=None, fx=None, max_pool: int = 1000, max_nystrom: int = 1000, reg: float = 1e-09, order: int | None = None, dim: int = 1, set_kernel: callable | None = None, **kwargs: dict)[source]

Bases: Kernel

A simple overload of the Kernel class for probability handling.

Note: It overloads the prediction method as follows:

\[\text{softmax}(\log(f)_{k,\theta})(\cdot)\]
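
One way to read this formula is: interpolate the logarithm of the class probabilities with the kernel, then apply a softmax to the prediction. The sketch below follows that reading with plain NumPy; the Gaussian kernel, the clipping threshold, and the helper names are assumptions, not codpy's implementation.

```python
import numpy as np

def gaussian_kernel(a, b, bandwidth=0.3):
    """Gram matrix k(a^i, b^j) for an assumed Gaussian kernel."""
    sq_dist = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dist / (2.0 * bandwidth**2))

def softmax(a):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    e = np.exp(a - a.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def fit_log_probs(x, probs, reg=1e-9, clip=1e-9):
    """Fit theta on log-probabilities; probabilities are clipped to avoid log(0)."""
    log_p = np.log(np.clip(probs, clip, 1.0))
    k = gaussian_kernel(x, x)                  # symmetric, so K(Y,X) = K(X,Y) = k
    return np.linalg.solve(k @ k + reg * np.eye(len(x)), k @ log_p)

def predict_probs(z, x, theta):
    """softmax of the kernel interpolation of log f, evaluated at z."""
    return softmax(gaussian_kernel(z, x) @ theta)

rng = np.random.default_rng(0)
x_data = rng.random((20, 2))
labels = (x_data[:, 0] > 0.5).astype(int)
probs = np.eye(2)[labels]                      # one-hot probabilities f(X)
theta = fit_log_probs(x_data, probs)
z_data = rng.random((5, 2))
pred = predict_probs(z_data, x_data, theta)    # shape (5, 2), rows sum to 1
```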

set_fx(fx: ndarray, set_polynomial_regressor: bool = True, **kwargs) → None[source]

Set the function values fx for the input data.

Parameters:

fx (numpy.ndarray) – Function values corresponding to the input data.
set_polynomial_regressor (bool, optional) – Whether to recalculate the polynomial regressor after setting the function values. Defaults to True.

greedy_select(N, x=None, fx=None, all=False, norm_='classifier', **kwargs)[source]

Select a subset of points using a greedy Nystrom approximation technique:

\[Y^{n+1} = Y^{n} \cup \arg \sup_{x \in X} d(Y^n,x),\]

to quickly approximate the clustering problem $Y = \arg \inf_{Y \subset X} d(Y,X),$ where we suppose the following structure $d(Y,X) = \sum_i d(Y,x^i)$.

The selection is typically based on norms such as the discrepancy errors for distributions, Frobenius or classifier type distances.

Parameters:

x (numpy.ndarray) – Input data points.
N (int) – The number of points to select.
fx (numpy.ndarray, optional) –
Function values corresponding to x.
- if fx is None,
  \[d(Y,X) = \frac{1}{N_X} \sum_{n=1}^{N_X} k(x^n,\cdot) - \frac{2}{N_Y} \sum_{m=1}^{N_Y} k(\cdot,y^m)\]
  This choice corresponds to minimizing the discrepancy error, see core.op.discrepancy_error().
- if fx is not None, $d(Y,X) = \|f(X)-f_{k,\theta}(X)\|$.
  In this case, the selection targets adaptive mesh or control variate techniques.
all (bool, optional) – If True, all points are selected. Defaults to False.
norm_ (str, optional) –
a string identifying the norm used for selection. Can be “frobenius” or “classifier”. Defaults to “classifier”.
- if “frobenius”, $d(Y,X) = \|f(X)-f_{k,\theta}(X)\|_{\ell_2}^2$
- if “classifier”, $d(Y,X) = \|\text{softmax}(f(X))-\text{softmax}(f_{k,\theta}(X))\|_{\ell_2}^2$ to account for the probability representation.
- user-defined functions coming soon.
start_indices (list, optional) – a list of indices to set $Y^0$; otherwise the first point is chosen randomly.

Returns:

Indices of the selected points.

Return type:

numpy.ndarray

Kernel.__call__(z: ndarray) → ndarray[source]

Predict the output using the kernel for input data z.

Parameters: z (numpy.ndarray) – Input data points for prediction.
Returns: The predicted values based on the kernel and function values.
Return type: numpy.ndarray

Example

>>> z_data = np.array([...])
>>> prediction = kernel(z_data)

Note

This function is similar to predict in libraries like scikit-learn or XGBoost.

If fx is defined, the prediction is given by the formula $f_{k, \theta}(z)$.
If fx is not defined, the function returns the projection operator:

\[P_{k,\theta}(z) = K(z, X) K(X, X)^{-1}\]