skbel.algorithms
This package contains some algorithms for the SKBEL project.
skbel.algorithms.extmath
This module contains some functions for matrix operations.
skbel.algorithms.statistics
This module contains some functions for statistics, such as the Kernel Density Estimation (KDE) inference and the Multivariate Normal inference (MVN).
- class skbel.algorithms.statistics.KDE(*, kernel_type: str = None, bandwidth: float = None, grid_search: bool = True, bandwidth_space: array = None, gridsize: int = 200, cut: float = 1, clip: list = None)[source]
Bases: object
Uni/Bi-variate kernel density estimator.
This class is adapted from the class of the same name in the package Seaborn 0.11.1 https://seaborn.pydata.org/generated/seaborn.kdeplot.html
- __init__(*, kernel_type: str = None, bandwidth: float = None, grid_search: bool = True, bandwidth_space: array = None, gridsize: int = 200, cut: float = 1, clip: list = None)[source]
Initialize the estimator with its parameters.
- Parameters:
kernel_type – kernel type, one of ‘gaussian’, ‘tophat’, ‘epanechnikov’, ‘exponential’, ‘linear’, ‘cosine’
bandwidth – kernel bandwidth (smoothing parameter)
grid_search – perform a grid search for the bandwidth
bandwidth_space – array of bandwidths to try
gridsize – number of points on each dimension of the evaluation grid.
cut – Factor, multiplied by the smoothing bandwidth, that determines how far the evaluation grid extends past the extreme datapoints. When set to 0, truncate the curve at the data limits.
clip – A list of two elements, the lower and upper bounds for the support of the density. If None, the support is the range of the data.
- _define_support_bivariate(x1: array, x2: array)[source]
Create a 2D grid of evaluation points.
- Parameters:
x1 – 1st dimension of the evaluation grid
x2 – 2nd dimension of the evaluation grid
- Returns:
2D grid of evaluation points
- static _define_support_grid(x: array, bandwidth: float, cut: float, clip: list, gridsize: int)[source]
Create the grid of evaluation points for vector x.
- Parameters:
x – vector of values
bandwidth – kernel bandwidth (smoothing parameter)
cut – factor, multiplied by the smoothing bandwidth, that determines how far the evaluation grid extends past the extreme datapoints. When set to 0, truncate the curve at the data limits.
clip – pair of numbers or None, or a pair of such pairs. Do not evaluate the density outside of these limits.
gridsize – number of points on each dimension of the evaluation grid.
- Returns:
evaluation grid
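The way cut, clip and gridsize combine can be sketched as follows. This is a minimal pure-Python illustration modelled on the Seaborn logic this class is adapted from. The function name `define_support_grid` and the list-based implementation are assumptions made for the example, not the library's code.

```python
import math


def define_support_grid(x, bandwidth, cut, clip, gridsize):
    """Return `gridsize` evenly spaced points spanning the data range,
    padded by cut * bandwidth and limited by the clip bounds.

    Hypothetical helper for illustration; requires gridsize >= 2.
    """
    if clip is None:
        clip = (None, None)
    # A missing bound means "no limit" on that side.
    clip_lo = -math.inf if clip[0] is None else clip[0]
    clip_hi = math.inf if clip[1] is None else clip[1]
    grid_min = max(min(x) - bandwidth * cut, clip_lo)
    grid_max = min(max(x) + bandwidth * cut, clip_hi)
    step = (grid_max - grid_min) / (gridsize - 1)
    return [grid_min + i * step for i in range(gridsize)]


# The lower end extends 0.5 below the smallest datapoint;
# the upper end would reach 4.5 but is clipped at 4.2.
grid = define_support_grid(
    [1.0, 2.0, 4.0], bandwidth=0.5, cut=1, clip=(None, 4.2), gridsize=5
)
```

With cut set to 0 the grid starts and ends exactly at the data limits, which matches the truncation behaviour described for the cut parameter.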
- _define_support_univariate(x: array)[source]
Create a 1D grid of evaluation points.
- Parameters:
x – 1D array of data
- Returns:
1D array of evaluation points
- _eval_bivariate(x1: array, x2: array)[source]
Fit and evaluate on bivariate data.
- Parameters:
x1 – First data set.
x2 – Second data set.
- Returns:
(density, support)
- _eval_univariate(x: array)[source]
Fit and evaluate on univariate data.
- Parameters:
x – Data to evaluate.
- Returns:
(density, support)
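To make the (density, support) return value concrete, here is a small sketch of a univariate Gaussian kernel density estimate evaluated on a support grid. It assumes a Gaussian kernel and a fixed bandwidth. The helper name `gaussian_kde_1d` is hypothetical and the code does not reproduce the library's fitting or bandwidth search.

```python
import math


def gaussian_kde_1d(data, support, bandwidth):
    """Evaluate a Gaussian KDE of `data` at every point of `support`."""
    norm = 1.0 / (len(data) * bandwidth * math.sqrt(2.0 * math.pi))
    density = []
    for s in support:
        # Sum of Gaussian kernels centred on each datapoint.
        total = sum(math.exp(-0.5 * ((s - d) / bandwidth) ** 2) for d in data)
        density.append(norm * total)
    return density


data = [-1.0, 0.0, 0.5, 2.0]
bandwidth = 0.5
# Support padded by 4 bandwidths on each side (an assumed value for the example).
lo = min(data) - 4 * bandwidth
hi = max(data) + 4 * bandwidth
support = [lo + i * (hi - lo) / 199 for i in range(200)]
density = gaussian_kde_1d(data, support, bandwidth)

# Trapezoidal integral of the density over the support; should be close to 1.
area = sum(
    0.5 * (density[i] + density[i + 1]) * (support[i + 1] - support[i])
    for i in range(len(support) - 1)
)
```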
- skbel.algorithms.statistics.it_sampling(pdf, num_samples: int = 1, lower_bd=-inf, upper_bd=inf, k: int = None, cdf_y: array = None, return_cdf: bool = False)[source]
Sample from an arbitrary, un-normalized PDF.
- Parameters:
pdf – function, float -> float. The probability density function (not necessarily normalized). Must take floats or ints as input, and return floats as an output.
num_samples – The number of samples to be generated.
lower_bd – Lower bound of the support of the pdf. This parameter allows one to manually establish cutoffs for the density.
upper_bd – Upper bound of the support of the pdf.
k – Number of steps between lower_bd and upper_bd
cdf_y – precomputed values of the CDF
return_cdf – Option to return the computed CDF values
- Returns:
samples: An array of samples from the provided PDF, with support between lower_bd and upper_bd.
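The general idea of inverse transform sampling from an un-normalized PDF can be sketched in pure Python as below. This is an illustration under stated assumptions, not the library's implementation: the function name `inverse_transform_sample` and the `rng` argument are hypothetical, the bounds must be finite, and the CDF is built with the trapezoidal rule on k steps and inverted by linear interpolation.

```python
import bisect
import math
import random


def inverse_transform_sample(pdf, num_samples, lower_bd, upper_bd, k=200, rng=None):
    """Draw samples from an un-normalized pdf on a finite interval."""
    rng = rng or random.Random()
    step = (upper_bd - lower_bd) / k
    xs = [lower_bd + i * step for i in range(k + 1)]
    ys = [pdf(x) for x in xs]

    # Cumulative trapezoidal integral, then normalization so the CDF ends at 1.
    cdf = [0.0]
    for i in range(k):
        cdf.append(cdf[-1] + 0.5 * (ys[i] + ys[i + 1]) * step)
    total = cdf[-1]
    cdf = [c / total for c in cdf]

    samples = []
    for _ in range(num_samples):
        u = rng.random()  # uniform draw in [0, 1)
        # First grid index whose CDF value exceeds u.
        j = min(max(bisect.bisect_right(cdf, u), 1), k)
        c0, c1 = cdf[j - 1], cdf[j]
        t = 0.0 if c1 == c0 else (u - c0) / (c1 - c0)
        # Linear interpolation of the inverse CDF inside the bracketing cell.
        samples.append(xs[j - 1] + t * (xs[j] - xs[j - 1]))
    return samples


# Un-normalized standard normal shape, truncated to [-5, 5].
samples = inverse_transform_sample(
    lambda x: math.exp(-0.5 * x * x), 2000, -5.0, 5.0, rng=random.Random(0)
)
sample_mean = sum(samples) / len(samples)
```

Because the CDF is normalized inside the function, the PDF passed in only needs to be correct up to a constant factor.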
- skbel.algorithms.statistics.kde_params(x: array = None, y: array = None, bw: float = None, bandwidth_space=None, gridsize: int = 200, cut: float = 1, clip=None)[source]
Computes the kernel density estimate (KDE) of one or two data sets.
- Parameters:
x – The x-coordinates of the input data.
y – The y-coordinates of the input data.
gridsize – Number of discrete points in the evaluation grid.
bw – The bandwidth of the kernel.
bandwidth_space – The space to search for the bandwidth.
cut – Draw the estimate to cut * bw from the extreme data points.
clip – Lower and upper bounds for datapoints used to fit KDE. Can provide a pair of (low, high) bounds for bivariate plots.
- Returns:
(density, support): density is the estimated probability density function evaluated at the support; support is the support of the density function, the x-axis of the KDE.
- skbel.algorithms.statistics.mvn_inference(X: array, Y: array, X_obs: array, **kwargs) -> (array, array)[source]
Estimates the posterior mean and covariance of the target.
Note that in this implementation, n_samples must equal 1.
- Parameters:
X – Canonical Variate of the training data
Y – Canonical Variate of the training target, gaussian-distributed
X_obs – Canonical Variate of the observation (n_samples, n_features).
- Returns:
y_posterior_mean, y_posterior_covariance
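As background for the posterior mean and covariance, the textbook conditioning formula for a jointly Gaussian pair can be sketched in the scalar case. This illustrates standard Gaussian conditioning only; mvn_inference may use a different formulation (for example with additional keyword arguments), and the helper name `gaussian_posterior_1d` is hypothetical.

```python
def gaussian_posterior_1d(x, y, x_obs):
    """Posterior mean and variance of y given x = x_obs, assuming (x, y)
    is jointly Gaussian with moments estimated from the training samples."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Unbiased sample variances and covariance.
    var_x = sum((a - mean_x) ** 2 for a in x) / (n - 1)
    var_y = sum((b - mean_y) ** 2 for b in y) / (n - 1)
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)
    # Conditional mean shifts along the regression line; variance shrinks.
    post_mean = mean_y + cov_xy / var_x * (x_obs - mean_x)
    post_var = var_y - cov_xy ** 2 / var_x
    return post_mean, post_var


x_train = [0.0, 1.0, 2.0, 3.0, 4.0]
y_train = [0.1, 1.9, 4.2, 5.8, 8.0]
post_mean, post_var = gaussian_posterior_1d(x_train, y_train, x_obs=2.5)
```

The posterior variance does not depend on the observed value, only on how strongly the two variables are correlated in the training data.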
- skbel.algorithms.statistics.normalize(pdf)[source]
Normalize a non-normalized PDF.
- Parameters:
pdf – The probability density function (not necessarily normalized). Must take floats or ints as input, and return floats as an output.
- Returns:
pdf_norm: Function with same signature as pdf, but normalized so that the integral between lower_bd and upper_bd is close to 1. Maps nicely over iterables.
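A normalization of this kind can be sketched with a trapezoidal integral over a finite interval. The sketch below is an assumption-laden illustration: the name `normalize_on_grid`, the explicit bounds and the step count k are introduced for the example and are not part of the signature documented above.

```python
def normalize_on_grid(pdf, lower_bd, upper_bd, k=1000):
    """Return a function with the same signature as `pdf`, rescaled so that
    its trapezoidal integral over [lower_bd, upper_bd] equals 1."""
    step = (upper_bd - lower_bd) / k
    ys = [pdf(lower_bd + i * step) for i in range(k + 1)]
    # Composite trapezoidal rule on k equal steps.
    area = step * (sum(ys) - 0.5 * (ys[0] + ys[-1]))

    def pdf_norm(x):
        return pdf(x) / area

    return pdf_norm


# f(x) = 3x integrates to 6 on [0, 2], so the normalized value at 1 is 3 / 6.
pdf_norm = normalize_on_grid(lambda x: 3.0 * x, 0.0, 2.0)
values = list(map(pdf_norm, [0.0, 1.0, 2.0]))
```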
- skbel.algorithms.statistics.posterior_conditional(X_obs: float = None, Y_obs: float = None, dens: array = None, support: array = None, k: int = None) -> (array, array)[source]
Computes the posterior distribution p(y|x_obs) or p(x|y_obs) by taking a cross-section of the KDE of (X, Y).
- Parameters:
X_obs – Observation (predictor, x-axis)
Y_obs – Observation (target, y-axis)
dens – The density values of the KDE of (X, Y).
support – The support grid of the KDE of (X, Y).
k – Used to set number of rows/columns
- Returns:
The posterior distribution p(y|x_obs) or p(x|y_obs) and the support grid of the cross-section.
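Taking a cross-section of a gridded joint density can be sketched as follows. The example assumes the density is stored as nested lists with dens[i][j] evaluated at (x_grid[i], y_grid[j]) and picks the grid line nearest to the observation. The helper name `conditional_from_grid` and this layout are assumptions for illustration, and the role of the k parameter is not reproduced.

```python
def conditional_from_grid(dens, x_grid, y_grid, x_obs):
    """Approximate p(y | x_obs) from a joint density sampled on a grid."""
    # Index of the x grid line closest to the observation.
    i = min(range(len(x_grid)), key=lambda idx: abs(x_grid[idx] - x_obs))
    section = dens[i]
    # Normalize the slice with the trapezoidal rule so it integrates to 1 over y.
    area = sum(
        0.5 * (section[j] + section[j + 1]) * (y_grid[j + 1] - y_grid[j])
        for j in range(len(y_grid) - 1)
    )
    return [v / area for v in section], y_grid


x_grid = [0.0, 1.0, 2.0]
y_grid = [0.0, 1.0, 2.0, 3.0]
dens = [
    [1.0, 2.0, 2.0, 1.0],
    [0.0, 4.0, 4.0, 0.0],
    [1.0, 1.0, 1.0, 1.0],
]
posterior, post_support = conditional_from_grid(dens, x_grid, y_grid, x_obs=0.9)
```

The same slicing applied along the other axis would give p(x|y_obs).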