skbel.algorithms

This package contains some algorithms for the SKBEL project.

skbel.algorithms.extmath

This module contains some functions for matrix operations.

skbel.algorithms.extmath.get_block(pm: array, i: int) -> array

Extracts block from a 2x2 partitioned matrix.

Parameters:
  • pm – Partitioned matrix

  • i – Block index (1 to 4), following the layout [[1,2], [3,4]]: 1 is top-left, 2 top-right, 3 bottom-left, 4 bottom-right

Returns:

Block
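The block numbering can be illustrated with a small NumPy sketch. The function get_block_sketch is a hypothetical stand-in, not the library code, and it assumes a square matrix split at its midpoint into four equal blocks.

```python
import numpy as np


def get_block_sketch(pm: np.ndarray, i: int) -> np.ndarray:
    """Hypothetical stand-in: return block i of a matrix split into 2x2 equal blocks."""
    half = pm.shape[0] // 2  # assumption: the partition is at the midpoint
    # Blocks 1 and 2 are the top row of blocks, 3 and 4 the bottom row.
    rows = slice(None, half) if i in (1, 2) else slice(half, None)
    # Blocks 1 and 3 are the left column of blocks, 2 and 4 the right column.
    cols = slice(None, half) if i in (1, 3) else slice(half, None)
    return pm[rows, cols]


pm = np.arange(16).reshape(4, 4)
top_right = get_block_sketch(pm, 2)    # rows 0-1, columns 2-3
bottom_left = get_block_sketch(pm, 3)  # rows 2-3, columns 0-1
```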

skbel.algorithms.extmath.matrix_paste(c_big: array, c_small: array) -> list

Pastes a small matrix into a big matrix.

Parameters:
  • c_big – Big matrix

  • c_small – Small matrix

skbel.algorithms.statistics

This module contains functions for statistical inference, such as Kernel Density Estimation (KDE) inference and Multivariate Normal (MVN) inference.

class skbel.algorithms.statistics.KDE(*, kernel_type: str = None, bandwidth: float = None, grid_search: bool = True, bandwidth_space: array = None, gridsize: int = 200, cut: float = 1, clip: list = None)

Bases: object

Uni/Bi-variate kernel density estimator.

This class is adapted from the class of the same name in Seaborn 0.11.1: https://seaborn.pydata.org/generated/seaborn.kdeplot.html

__call__(x1, x2=None)

Fit and evaluate on univariate or bivariate data.

__init__(*, kernel_type: str = None, bandwidth: float = None, grid_search: bool = True, bandwidth_space: array = None, gridsize: int = 200, cut: float = 1, clip: list = None)

Initialize the estimator with its parameters.

Parameters:
  • kernel_type – kernel type, one of ‘gaussian’, ‘tophat’, ‘epanechnikov’, ‘exponential’, ‘linear’, ‘cosine’

  • bandwidth – bandwidth

  • grid_search – perform a grid search for the bandwidth

  • bandwidth_space – array of bandwidths to try

  • gridsize – number of points on each dimension of the evaluation grid.

  • cut – Factor, multiplied by the smoothing bandwidth, that determines how far the evaluation grid extends past the extreme datapoints. When set to 0, truncate the curve at the data limits.

  • clip – A list of two elements, the lower and upper bounds for the support of the density. If None, the support is the range of the data.

_define_support_bivariate(x1: array, x2: array)

Create a 2D grid of evaluation points.

Parameters:
  • x1 – 1st dimension of the evaluation grid

  • x2 – 2nd dimension of the evaluation grid

Returns:

2D grid of evaluation points

static _define_support_grid(x: array, bandwidth: float, cut: float, clip: list, gridsize: int)

Create the grid of evaluation points for vector x.

Parameters:
  • x – vector of values

  • bandwidth – bandwidth

  • cut – factor, multiplied by the smoothing bandwidth, that determines how far the evaluation grid extends past the extreme datapoints. When set to 0, truncate the curve at the data limits.

  • clip – pair of numbers or None, or a pair of such pairs. Do not evaluate the density outside of these limits.

  • gridsize – number of points on each dimension of the evaluation grid.

Returns:

evaluation grid
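The interplay of bandwidth, cut, clip and gridsize can be sketched as follows. This is a minimal illustration modeled on the parameter descriptions above, not the library code; support_grid_sketch is a hypothetical name, and clip is assumed to be a (low, high) pair in which either bound may be None.

```python
import numpy as np


def support_grid_sketch(x, bandwidth, cut=1.0, clip=None, gridsize=200):
    """Hypothetical stand-in: evenly spaced grid extending cut * bandwidth past the data."""
    clip_lo, clip_hi = (None, None) if clip is None else clip
    lo = -np.inf if clip_lo is None else clip_lo
    hi = np.inf if clip_hi is None else clip_hi
    # Extend past the extreme data points, but never beyond the clip limits.
    grid_min = max(np.min(x) - bandwidth * cut, lo)
    grid_max = min(np.max(x) + bandwidth * cut, hi)
    return np.linspace(grid_min, grid_max, gridsize)


x = np.array([1.0, 2.0, 4.0])
grid = support_grid_sketch(x, bandwidth=0.5, cut=1, gridsize=5)
clipped = support_grid_sketch(x, bandwidth=0.5, cut=1, clip=(1.0, None), gridsize=5)
```

With cut=0 the grid would start and end exactly at the data limits, which matches the truncation behaviour described for cut.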

_define_support_univariate(x: array)

Create a 1D grid of evaluation points.

Parameters:

x – 1D array of data

Returns:

1D array of evaluation points

_eval_bivariate(x1: array, x2: array)

Fit and evaluate on bivariate data.

Parameters:
  • x1 – First data set.

  • x2 – Second data set.

Returns:

(density, support)

_eval_univariate(x: array)

Fit and evaluate on univariate data.

Parameters:

x – Data to evaluate.

Returns:

(density, support)

_fit(fit_data: array)

Fit the scikit-learn KDE.

Parameters:

fit_data – Data to fit the KDE to.

Returns:

fitted KDE object

define_support(x1: array, x2: array = None, cache: bool = True)

Create the evaluation grid for a given data set.

Parameters:
  • x1 – 1D array of data

  • x2 – second 1D array of data, for the bivariate case (optional)

  • cache – if True, cache the support grid

Returns:

grid of evaluation points

skbel.algorithms.statistics.it_sampling(pdf, num_samples: int = 1, lower_bd=-inf, upper_bd=inf, k: int = None, cdf_y: array = None, return_cdf: bool = False)

Sample from an arbitrary, un-normalized PDF.

Parameters:
  • pdf – function, float -> float. The probability density function (not necessarily normalized). Must take floats or ints as input, and return floats as an output.

  • num_samples – The number of samples to be generated.

  • lower_bd – Lower bound of the support of the pdf. This parameter allows one to manually establish cutoffs for the density.

  • upper_bd – Upper bound of the support of the pdf.

  • k – Number of steps between lower_bd and upper_bd

  • cdf_y – precomputed values of the CDF

  • return_cdf – Option to return the computed CDF values

Returns:

samples: An array of samples from the provided PDF, with support between lower_bd and upper_bd.
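The general idea of inverse transform sampling can be sketched in NumPy: tabulate the PDF on k points, integrate it into a CDF, normalize, and map uniform draws through the inverse CDF by interpolation. This is an illustration under assumptions, not the library code: it_sampling_sketch and its seed argument are hypothetical, and the bounds are assumed finite, unlike the infinite defaults in the signature above.

```python
import numpy as np


def it_sampling_sketch(pdf, num_samples=1, lower_bd=-5.0, upper_bd=5.0, k=1000, seed=None):
    """Hypothetical stand-in: draw samples from an un-normalized 1D PDF on finite bounds."""
    rng = np.random.default_rng(seed)
    x = np.linspace(lower_bd, upper_bd, k)
    y = np.array([pdf(float(v)) for v in x])
    # Cumulative trapezoid rule gives the un-normalized CDF on the grid.
    cdf = np.concatenate(([0.0], np.cumsum(0.5 * (y[1:] + y[:-1]) * np.diff(x))))
    cdf /= cdf[-1]  # normalize so that the CDF ends at 1
    # Invert the CDF by linear interpolation at uniform random levels.
    u = rng.uniform(0.0, 1.0, size=num_samples)
    return np.interp(u, cdf, x)


def unnormalized_gaussian(v):
    # Standard normal shape scaled by an arbitrary constant.
    return 3.0 * np.exp(-0.5 * v**2)


samples = it_sampling_sketch(unnormalized_gaussian, num_samples=2000, seed=0)
```

Because the CDF is rescaled to end at 1, the arbitrary constant in front of the PDF has no effect on the samples.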

skbel.algorithms.statistics.kde_params(x: array = None, y: array = None, bw: float = None, bandwidth_space=None, gridsize: int = 200, cut: float = 1, clip=None)

Computes the kernel density estimate (KDE) of one or two data sets.

Parameters:
  • x – The x-coordinates of the input data.

  • y – The y-coordinates of the input data.

  • gridsize – Number of discrete points in the evaluation grid.

  • bw – The bandwidth of the kernel.

  • bandwidth_space – The space to search for the bandwidth.

  • cut – Draw the estimate to cut * bw from the extreme data points.

  • clip – Lower and upper bounds for datapoints used to fit KDE. Can provide a pair of (low, high) bounds for bivariate plots.

Returns:

(density, support), where density is the estimated probability density function evaluated on the support, and support is the evaluation grid (the x-axis of the KDE).
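To make the (density, support) pair concrete, here is a minimal univariate Gaussian KDE in NumPy. It is a sketch with a fixed bandwidth and a hypothetical name, gaussian_kde_sketch; the library itself may rely on another estimator and on a bandwidth search.

```python
import numpy as np


def gaussian_kde_sketch(x, bw, gridsize=200, cut=1.0):
    """Hypothetical stand-in: Gaussian KDE of 1D data evaluated on a regular grid."""
    x = np.asarray(x, dtype=float)
    # Support: evenly spaced grid extending cut * bw past the extreme data points.
    support = np.linspace(x.min() - cut * bw, x.max() + cut * bw, gridsize)
    # Average of Gaussian kernels centered on each data point.
    z = (support[:, None] - x[None, :]) / bw
    density = np.exp(-0.5 * z**2).sum(axis=1) / (x.size * bw * np.sqrt(2.0 * np.pi))
    return density, support


rng = np.random.default_rng(1)
data = rng.normal(size=300)
density, support = gaussian_kde_sketch(data, bw=0.3, gridsize=200, cut=4.0)
# Trapezoid rule: the density should integrate to roughly 1 over the support.
area = float(np.sum(0.5 * (density[1:] + density[:-1]) * np.diff(support)))
```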

skbel.algorithms.statistics.mvn_inference(X: array, Y: array, X_obs: array, **kwargs) -> (array, array)

Estimates the posterior mean and covariance of the target.

Note that in this implementation, n_samples must be equal to 1.

Parameters:
  • X – Canonical Variate of the training data

  • Y – Canonical Variate of the training target, gaussian-distributed

  • X_obs – Canonical Variate of the observation (n_samples, n_features).

Returns:

y_posterior_mean, y_posterior_covariance
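For orientation, the textbook conditioning of a joint Gaussian gives a posterior mean and covariance of this kind. The sketch below estimates the joint moments from the training pairs and conditions on a single observation; it is a generic formula under the assumption that (X, Y) is jointly Gaussian, and the library's implementation may differ in its details. The name gaussian_conditional_sketch is hypothetical.

```python
import numpy as np


def gaussian_conditional_sketch(X, Y, x_obs):
    """Hypothetical stand-in: mean and covariance of Y given X = x_obs for jointly Gaussian (X, Y)."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    x_obs = np.asarray(x_obs, dtype=float).reshape(-1)  # single observation
    n_x = X.shape[1]
    mu_x, mu_y = X.mean(axis=0), Y.mean(axis=0)
    cov = np.cov(np.hstack([X, Y]), rowvar=False)  # joint covariance of (X, Y)
    s_xx, s_xy = cov[:n_x, :n_x], cov[:n_x, n_x:]
    s_yy = cov[n_x:, n_x:]
    # gain = S_yx S_xx^{-1}, computed with a linear solve instead of an explicit inverse.
    gain = np.linalg.solve(s_xx, s_xy).T
    post_mean = mu_y + gain @ (x_obs - mu_x)
    post_cov = s_yy - gain @ s_xy
    return post_mean, post_cov


rng = np.random.default_rng(0)
B = np.array([[1.0, -0.5], [0.3, 2.0]])
X = rng.normal(size=(5000, 2))
Y = X @ B + 0.1 * rng.normal(size=(5000, 2))  # linear-Gaussian relation with noise std 0.1
x_obs = np.array([[0.5, -1.0]])  # shape (n_samples=1, n_features)
post_mean, post_cov = gaussian_conditional_sketch(X, Y, x_obs)
```

In this synthetic setting the posterior mean should be close to x_obs @ B and the posterior covariance close to the noise covariance, 0.01 times the identity.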

skbel.algorithms.statistics.normalize(pdf)

Normalize a non-normalized PDF.

Parameters:

pdf – The probability density function (not necessarily normalized). Must take floats or ints as input, and return floats as an output.

Returns:

pdf_norm: Function with same signature as pdf, but normalized so that the integral between lower_bd and upper_bd is close to 1. Maps nicely over iterables.
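Normalizing a PDF amounts to dividing it by its integral over the support. A minimal sketch, assuming finite integration bounds and a trapezoid rule on k points; normalize_sketch and its lower_bd, upper_bd and k arguments are hypothetical and are not part of the signature above.

```python
import numpy as np


def normalize_sketch(pdf, lower_bd=-10.0, upper_bd=10.0, k=2001):
    """Hypothetical stand-in: return pdf divided by its numerical integral on [lower_bd, upper_bd]."""
    x = np.linspace(lower_bd, upper_bd, k)
    y = np.array([pdf(float(v)) for v in x])
    area = float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))  # trapezoid rule

    def pdf_norm(v):
        # Same call signature as pdf, rescaled so that it integrates to about 1.
        return pdf(v) / area

    return pdf_norm


def unnormalized_gaussian(v):
    return 4.0 * np.exp(-0.5 * v**2)


pdf_norm = normalize_sketch(unnormalized_gaussian)
peak = pdf_norm(0.0)  # close to 1 / sqrt(2 * pi) for a standard normal
```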

skbel.algorithms.statistics.posterior_conditional(X_obs: float = None, Y_obs: float = None, dens: array = None, support: array = None, k: int = None) -> (array, array)

Computes the posterior distribution p(y|x_obs) or p(x|y_obs) by taking a cross-section of the KDE of (X, Y).

Parameters:
  • X_obs – Observation (predictor, x-axis)

  • Y_obs – Observation (target, y-axis)

  • dens – The density values of the KDE of (X, Y).

  • support – The support grid of the KDE of (X, Y).

  • k – Used to set number of rows/columns

Returns:

The posterior distribution p(y|x_obs) or p(x|y_obs) and the support grid of the cross-section.
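The cross-section idea can be sketched on a gridded bivariate density: pick the grid column closest to the observation, then renormalize that slice so it integrates to 1. The sketch covers only the p(y|x_obs) case and assumes that dens[i, j] is the density at (x_grid[i], y_grid[j]) and that support is the pair (x_grid, y_grid); both are assumptions about the layout, and conditional_slice_sketch is a hypothetical name.

```python
import numpy as np


def conditional_slice_sketch(dens, support, x_obs):
    """Hypothetical stand-in: p(y | x_obs) from a gridded joint density."""
    x_grid, y_grid = support
    idx = int(np.argmin(np.abs(x_grid - x_obs)))  # nearest grid point to the observation
    section = dens[idx, :]  # cross-section of the joint density at x = x_grid[idx]
    area = np.sum(0.5 * (section[1:] + section[:-1]) * np.diff(y_grid))
    return section / area, y_grid


# Synthetic joint density in which y given x is Gaussian with mean x and std 0.5.
xg = np.linspace(-3.0, 3.0, 61)
yg = np.linspace(-4.0, 4.0, 81)
xx, yy = np.meshgrid(xg, yg, indexing="ij")
dens = np.exp(-0.5 * (xx**2 + (yy - xx) ** 2 / 0.25))
post, grid = conditional_slice_sketch(dens, (xg, yg), x_obs=1.0)
mode = float(grid[np.argmax(post)])  # location of the posterior peak
```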

skbel.algorithms.statistics.remove_outliers(data)

Removes outliers from the data.

Parameters:

data – array-like

Returns:

data without outliers