.. skbel documentation master file, created by
   sphinx-quickstart on Sat May 29 18:15:24 2021.
   You can adapt this file completely to your liking, but it should at least
   contain the root `toctree` directive.

SKBEL - Bayesian Evidential Learning for Python
===============================================

.. currentmodule:: SKBEL

Installation
------------

SKBEL is available through `PyPI `_, and may be installed using ``pip``:

.. code:: sh

   $ pip install skbel

Contents
--------

.. toctree::
   :maxdepth: 2

   modules
   examples

Bayesian Evidential Learning
----------------------------

Introduction
............

This package implements the Bayesian Evidential Learning (BEL) framework. BEL is a method that combines machine
learning and Monte Carlo simulations to improve the estimation of prediction uncertainty
(Hermans et al., 2016, 2018; Michel et al., 2020; Thibaut et al., 2021). It uses a direct relationship between
predictor (data) and target (prediction), learned from a training set sampled from the prior distribution, to perform
Bayesian inference (typically within a low-dimensional latent space). Its effectiveness has been demonstrated through
extensive synthetic validation, through comparison with rejection sampling (Scheidt et al., 2015) and
Markov chain Monte Carlo (McMC) algorithms (Michel et al., 2020, 2022), and in applications to field data
(Hermans et al., 2019) and experimental design (Thibaut et al., 2021, 2022).

Previous studies have demonstrated that BEL can estimate the posterior distribution of parameters in a variety of
contexts, including geothermal systems (Athens & Caers, 2019; Hermans et al., 2018, 2019), contaminant transport
(Satija & Caers, 2015; Scheidt et al., 2015), and geophysical inversion (Hermans et al., 2016; Michel et al., 2020).
Additionally, the BEL framework has been successfully applied to a variety of subsurface field cases, including
groundwater, shallow and deep geothermal, and oil/gas predictions (J.
Park & Caers, 2020; Pradhan & Mukerji, 2020; Tadjer & Bratvold, 2021).

- The basic idea of BEL is to use machine learning to find a direct relationship between `d` (predictor) and `h`
  (target) in a reduced-dimensional space.
- Both `d` and `h` are generated by forward modeling from the same set of prior models `m` (Figure 1).
- Given a new measured predictor `d*`, this relationship is used to infer the posterior probability distribution of
  the target, without the need for a computationally expensive inversion.
- The posterior distribution of the target is then sampled and back-transformed from the reduced-dimensional space to
  the original space to predict posterior realizations of `h` given `d*`.

.. image:: /img/evidential.png
   :width: 800

Figure 1: The concept of BEL. d = predictor (observed data), h = target (parameter of interest), m = model.

Typical Workflow
................

Forward modeling
++++++++++++++++

- Examples of both `d` and `h` are generated through forward modeling from the same model `m`. Target and predictor
  are real, multi-dimensional random variables.

Pre-processing
++++++++++++++

- Specific pre-processing (such as scaling) is applied to the data if necessary.

Dimensionality reduction
++++++++++++++++++++++++

- Principal Component Analysis (PCA) is applied to both target and predictor to aggregate the correlated variables
  into a few independent Principal Components (PCs).

Learning
++++++++

- Canonical Correlation Analysis (CCA) transforms the two sets of PCs into pairs of Canonical Variates (CVs) that are
  maximally correlated within each pair and uncorrelated between pairs.

Post-processing
+++++++++++++++

- Specific post-processing (such as CV normalization) is applied to the CVs if necessary.

Posterior distribution inference
++++++++++++++++++++++++++++++++

- The mean `μ` and covariance `Σ` of the posterior distribution of an unknown target given an observed `d*` can be
  estimated directly from the distribution of the CVs.
- Alternatively, the posterior conditional distribution can be inferred through kernel density estimation (KDE) or
  transport maps.

Sampling and back-transformation to the original space
++++++++++++++++++++++++++++++++++++++++++++++++++++++

- The posterior distribution is sampled to obtain realizations of `h` in canonical space, which are then
  back-transformed to the original space.

.. image:: /img/flow-01.png
   :width: 800

Figure 2: Typical BEL workflow. Taken from Thibaut et al. (2021).

SKBEL implementation
++++++++++++++++++++

Here is an example blueprint of the BEL workflow implemented in SKBEL:

.. code:: python

   from skbel import BEL
   from sklearn.preprocessing import StandardScaler, PowerTransformer
   from sklearn.decomposition import PCA
   from sklearn.cross_decomposition import CCA
   from sklearn.pipeline import Pipeline


   def init_bel():
       """Set all BEL pipelines.

       This is the blueprint of the framework.
       """
       # Pipeline before CCA: keep the PCs explaining 99% of the variance, then scale them
       X_pre_processing = Pipeline(
           [
               ("pca", PCA(n_components=0.99)),
               ("scaler", StandardScaler()),
           ]
       )
       Y_pre_processing = Pipeline(
           [
               ("pca", PCA(n_components=0.99)),
               ("scaler", StandardScaler()),
           ]
       )

       # Canonical Correlation Analysis
       # Note: n_components cannot exceed the number of PCs retained above
       cca = CCA(n_components=30)

       # Pipeline after CCA: normalize the canonical variates
       X_post_processing = Pipeline(
           [("normalizer", PowerTransformer(method="yeo-johnson", standardize=True))]
       )
       Y_post_processing = Pipeline(
           [("normalizer", PowerTransformer(method="yeo-johnson", standardize=True))]
       )

       # Initiate BEL object
       bel_model = BEL(
           X_pre_processing=X_pre_processing,
           X_post_processing=X_post_processing,
           Y_pre_processing=Y_pre_processing,
           Y_post_processing=Y_post_processing,
           regression_model=cca,
       )

       return bel_model

The BEL object is then fitted to the training data:

.. code:: python

   bel_model = init_bel()
   bel_model.fit(X_train, Y_train)

The posterior distribution of the target can then be inferred from the predictor:

.. code:: python

   # Inference
   bel_model.predict(X_test)

The posterior distribution of the target can also be sampled:

.. code:: python

   # Sampling
   bel_model.random_sample(X_test, n_samples=100)

Note that ``X_train`` and ``Y_train`` are the predictor and target, respectively, generated from the same set of
prior models `m`. ``X_test`` is the predictor for which the posterior distribution of the target is inferred.

The ``predict`` method in SKBEL differs slightly from its scikit-learn counterpart: it determines the posterior
distribution of the target given the predictor. The ``random_sample`` method returns posterior realizations of the
target given the predictor.

Contributing
------------

Contributions and user feedback are welcome. Don't hesitate to submit an issue or a PR, or to request a new feature.

Other resources
---------------

Hadrien Michel (University of Liège) and colleagues have been working on their own implementation of BEL called
`pyBEL1D `_. pyBEL1D is a program for the stochastic 1D imaging of the subsurface based on geophysical data
`(Michel et al., 2020) `_.

About the authors
-----------------

The first and main author is MSc. Ir. `Robin Thibaut `_, who developed this package as part of his doctoral research
project: `A new framework for Experimental Design in Earth Sciences using Bayesian Evidential Learning (BEL4ED) `_
at Ghent University, Department of Geology, `Laboratory for Applied Geology and Hydrogeology (LTGH) `_.
His advisors are Prof. Dr. Ir. `Thomas Hermans `_ (Ghent University) and Dr. Ir. `Eric Laloy `_
(SCK CEN: Belgian Nuclear Research Centre).

The second author is Dr. `Maximilian Ramgraber `_ (Massachusetts Institute of Technology), who implemented the
transport maps algorithm in this package.

References
==========

The list of references below contains a collection of papers or books that use or describe the BEL framework
(not necessarily using SKBEL).

.. bibliography::
   :all:

Indices and tables
==================

* :ref:`genindex`
* :ref:`modindex`
* :ref:`search`