SKBEL - Bayesian Evidential Learning for Python

Installation

SKBEL is available through PyPI, and may be installed using pip:

$ pip install skbel

Bayesian Evidential Learning

Introduction

This package implements the Bayesian Evidential Learning (BEL) framework. BEL is a method that combines machine learning and Monte Carlo simulations to help improve the estimation of prediction uncertainty (Hermans et al., 2016, 2018; Michel et al., 2020; Thibaut et al., 2021).

It uses a direct relationship between predictor (data) and target (prediction), learned from a training set sampled from the prior distribution, to perform the Bayesian inference (typically within a low-dimensional latent space).

Its effectiveness has been shown through extensive synthetic validation, through comparison with rejection sampling (Scheidt et al., 2015) and Markov chain Monte Carlo (McMC) algorithms (Michel et al., 2020, 2022), and through applications to field data (Hermans et al., 2019) and experimental design (Thibaut et al., 2021, 2022).

Previous studies have demonstrated that BEL can estimate the posterior distribution of parameters in a variety of contexts, including geothermal systems (Athens & Caers, 2019; Hermans et al., 2018, 2019), contaminant transport (Satija & Caers, 2015; Scheidt et al., 2015), and geophysical inversion (Hermans et al., 2016; Michel et al., 2020). Additionally, the BEL framework has been successfully applied to a variety of subsurface field cases, including groundwater, shallow and deep geothermal, and oil/gas predictions (Park & Caers, 2020; Pradhan & Mukerji, 2020; Tadjer & Bratvold, 2021).

The basic idea of BEL is to find a direct relationship between d (predictor) and h (target) in a reduced dimensional space with machine learning.
Both d and h are generated by forward modelling from the same set of prior models m (Figure 1).
Given a new measured predictor d*, this relationship is used to infer the posterior probability distribution of the target, without the need for a computationally expensive inversion.
The posterior distribution of the target is then sampled and back-transformed from the reduced dimensional space to the original space to predict posterior realizations of h given d*.

Figure 1: The concept of BEL. d = predictor (observed data), h = target (parameter of interest), m = model.

Typical Workflow

Forward modeling

Examples of both d and h are generated through forward modeling from the same model m. Target and predictor are real, multi-dimensional random variables.
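As an illustration only, the sketch below generates a training set with a toy forward model. The exponential-decay model, the uniform priors, and every name in it (forward_model, prior_models, t_data, t_target) are hypothetical and not part of SKBEL; in practice the forward model is typically a numerical simulator.

```python
import numpy as np

rng = np.random.default_rng(seed=0)


def forward_model(m, times):
    """Hypothetical forward model: decay with amplitude m[0] and rate m[1]."""
    amplitude, rate = m
    return amplitude * np.exp(-rate * times)


n_models = 200
# Sample the prior models m: each row is (amplitude, decay rate).
prior_models = np.column_stack(
    [rng.uniform(1.0, 2.0, n_models), rng.uniform(0.1, 1.0, n_models)]
)

t_data = np.linspace(0.0, 2.0, 20)     # early times: the observable predictor d
t_target = np.linspace(2.0, 10.0, 50)  # late times: the target h to predict

# Both d and h are computed from the SAME prior model, row by row.
d = np.array([forward_model(m, t_data) for m in prior_models])
h = np.array([forward_model(m, t_target) for m in prior_models])

print(d.shape, h.shape)  # (200, 20) (200, 50)
```

Each row of d is paired with the row of h generated from the same model, which is what allows a direct d-to-h relationship to be learned.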

Pre-processing

Specific pre-processing is applied to the data if necessary (such as scaling).

Dimensionality reduction

Principal Component Analysis (PCA) is applied to both target and predictor to aggregate the correlated variables into a few linearly uncorrelated Principal Components (PCs).
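A minimal sketch of this step with scikit-learn, assuming a synthetic 50-dimensional variable driven by three hidden factors (the data and variable names are invented for illustration):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(seed=0)

# Hypothetical training set: 200 samples of a 50-dimensional variable
# driven by only 3 underlying factors, plus a little noise.
factors = rng.normal(size=(200, 3))
mixing = rng.normal(size=(3, 50))
h = factors @ mixing + 0.01 * rng.normal(size=(200, 50))

# A float n_components keeps the smallest number of PCs that explains
# at least that fraction of the total variance.
pca = PCA(n_components=0.99)
h_pc = pca.fit_transform(h)

explained = pca.explained_variance_ratio_.sum()

# Covariance of the PC scores: the off-diagonal terms vanish,
# i.e. the PCs are uncorrelated on the training set.
score_cov = np.atleast_2d(np.cov(h_pc, rowvar=False))
off_diagonal = score_cov - np.diag(np.diag(score_cov))

print(h_pc.shape, round(float(explained), 4))
```

The same float setting, n_components=.99, is the one used in the blueprint pipelines further down.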

Learning

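Canonical Correlation Analysis (CCA) transforms the two sets of PCs into pairs of Canonical Variates (CVs). The two variates within each pair are maximally correlated, while the different pairs are uncorrelated with each other.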

Post-processing

Specific post-processing is applied to the CVs if necessary (such as CV normalization).
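One possible normalization is the Yeo-Johnson power transform used in the blueprint further down. The sketch below applies it to a deliberately skewed, synthetic variable that stands in for a canonical variate; the skewness helper is defined here for the example only.

```python
import numpy as np
from sklearn.preprocessing import PowerTransformer

rng = np.random.default_rng(seed=0)


def skewness(x):
    """Sample skewness (third standardized moment)."""
    z = (x - x.mean()) / x.std()
    return float(np.mean(z**3))


# Hypothetical skewed canonical variate (one column), log-normally distributed.
cv = rng.lognormal(mean=0.0, sigma=1.0, size=(1000, 1))

normalizer = PowerTransformer(method="yeo-johnson", standardize=True)
cv_normal = normalizer.fit_transform(cv)

skew_before = skewness(cv[:, 0])
skew_after = skewness(cv_normal[:, 0])

# The transform is invertible, which is needed for the later back-transformation.
cv_back = normalizer.inverse_transform(cv_normal)

print(round(skew_before, 2), round(skew_after, 2))
```

Bringing the CVs closer to a Gaussian shape matters mostly when the posterior is estimated from a mean and a covariance.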

Posterior distribution inference

The mean μ and covariance Σ of the posterior distribution of an unknown target given an observed d* can be directly estimated from the distribution of the CVs.
Alternatively, the posterior conditional distribution can be inferred through kernel density estimation (KDE) or transport maps.
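To make the first option concrete, the sketch below conditions a joint Gaussian on an observed value using the standard formulas μ = μ_h + Σ_hd Σ_dd⁻¹ (d* − μ_d) and Σ = Σ_hh − Σ_hd Σ_dd⁻¹ Σ_dh. It assumes the CVs are jointly Gaussian and is not necessarily how SKBEL computes the posterior internally; the function gaussian_posterior and the synthetic data are hypothetical.

```python
import numpy as np


def gaussian_posterior(d_cv, h_cv, d_star):
    """Condition a jointly Gaussian (d, h) on d = d_star (illustrative helper)."""
    n_d = d_cv.shape[1]
    mu_d, mu_h = d_cv.mean(axis=0), h_cv.mean(axis=0)
    joint_cov = np.cov(np.hstack([d_cv, h_cv]), rowvar=False)
    cov_dd = joint_cov[:n_d, :n_d]
    cov_dh = joint_cov[:n_d, n_d:]
    cov_hh = joint_cov[n_d:, n_d:]
    # gain = cov_hd @ inv(cov_dd), obtained with a linear solve.
    gain = np.linalg.solve(cov_dd, cov_dh).T
    mean = mu_h + gain @ (d_star - mu_d)
    cov = cov_hh - gain @ cov_dh
    return mean, cov


rng = np.random.default_rng(seed=0)
d_cv = rng.normal(size=(1000, 1))
h_cv = 2.0 * d_cv + 0.5 * rng.normal(size=(1000, 1))  # h = 2 d + noise

post_mean, post_cov = gaussian_posterior(d_cv, h_cv, d_star=np.array([1.0]))
print(post_mean, post_cov)
```

In this synthetic case the posterior mean should lie close to 2 and the posterior variance close to the noise variance of 0.25.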

Sampling and back-transformation to the original space

The posterior distribution is sampled to obtain realizations of h in canonical space, which are then successively back-transformed to the original space.
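The sketch below illustrates the principle on a simplified chain: samples are drawn from a Gaussian in the reduced space and mapped back through a scikit-learn pipeline with inverse_transform. For brevity the CCA back-transformation is left out, and the posterior mean and covariance are stand-in values rather than an actual inference result.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(seed=0)

# Hypothetical target training set: 200 curves of 50 values each.
h_train = rng.normal(size=(200, 4)) @ rng.normal(size=(4, 50))

h_pipeline = Pipeline([("pca", PCA(n_components=3)), ("scaler", StandardScaler())])
h_reduced = h_pipeline.fit_transform(h_train)

# Stand-in posterior in the reduced space (illustrative values only).
post_mean = h_reduced[0]
post_cov = 0.05 * np.eye(h_reduced.shape[1])

# Sample in the reduced space, then back-transform step by step:
# the pipeline undoes the scaler first, then the PCA.
samples_reduced = rng.multivariate_normal(post_mean, post_cov, size=100)
h_posterior = h_pipeline.inverse_transform(samples_reduced)

print(samples_reduced.shape, h_posterior.shape)  # (100, 3) (100, 50)
```

Each row of h_posterior is one realization of the target expressed in its original 50-dimensional space.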

Figure 2: Typical BEL workflow. Taken from Thibaut et al. (2021).

SKBEL implementation

Here is an example blueprint of the BEL workflow implemented in SKBEL:

from skbel import BEL
from sklearn.preprocessing import StandardScaler, PowerTransformer
from sklearn.decomposition import PCA
from sklearn.cross_decomposition import CCA
from sklearn.pipeline import Pipeline

def init_bel():
    """Set all BEL pipelines.

    This is the blueprint of the framework.
    """
    # Pipeline before CCA
    X_pre_processing = Pipeline(
        [
            ("pca", PCA(n_components=.99)),
            ("scaler", StandardScaler()),
        ]
    )
    Y_pre_processing = Pipeline(
        [
            ("pca", PCA(n_components=.99)),
            ("scaler", StandardScaler()),
        ]
    )

    # Canonical Correlation Analysis
    # n_components must not exceed the number of PCs retained for X or Y
    cca = CCA(n_components=30)

    # Pipeline after CCA
    X_post_processing = Pipeline(
        [("normalizer", PowerTransformer(method="yeo-johnson", standardize=True))]
    )
    Y_post_processing = Pipeline(
        [("normalizer", PowerTransformer(method="yeo-johnson", standardize=True))]
    )

    # Initiate BEL object
    bel_model = BEL(
        X_pre_processing=X_pre_processing,
        X_post_processing=X_post_processing,
        Y_pre_processing=Y_pre_processing,
        Y_post_processing=Y_post_processing,
        regression_model=cca,
    )

    return bel_model

The BEL object is then fitted to the training data:

bel_model = init_bel()
bel_model.fit(X_train, Y_train)

The posterior distribution of the target can then be inferred from the predictor:

# Inference
bel_model.predict(X_test)

The posterior distribution of the target can also be sampled:

# Sampling
bel_model.random_sample(X_test, n_samples=100)

Note that X_train and Y_train are the predictor and target, respectively, generated from the same set of prior models m. X_test is the predictor for which the posterior distribution of the target is inferred.

The predict method in SKBEL differs slightly from its scikit-learn counterpart: it determines the posterior distribution of the target given the predictor. The random_sample method returns posterior realizations of the target given the predictor.

Contributing

Contributions and feedback from users are welcome. Don’t hesitate to submit an issue or a PR, or request a new feature.

Other resources

Hadrien Michel (University of Liège) and colleagues have been working on their own implementation of BEL called pyBEL1D. pyBEL1D is a program for the stochastic 1D imaging of the subsurface based on geophysical data (Michel et al., 2020).

About the authors

The first and main author is MSc. Ir. Robin Thibaut, who developed this package as part of his doctoral research project: A new framework for Experimental Design in Earth Sciences using Bayesian Evidential Learning (BEL4ED) at Ghent University, Department of Geology, Laboratory for Applied Geology and Hydrogeology (LTGH). His advisors are Prof. Dr. Ir. Thomas Hermans (Ghent University) and Dr. Ir. Eric Laloy (SCK CEN: Belgian Nuclear Research Centre).

The second author is Dr. Maximilian Ramgraber (Massachusetts Institute of Technology), who has implemented the transport maps algorithm in this package.

References

The list of references below contains a collection of papers or books that use or describe the BEL framework (not necessarily using SKBEL).

[1]

Robin Thibaut, Nicolas Compaire, Nolwenn Lesparre, Maximilian Ramgraber, Eric Laloy, and Thomas Hermans. Comparing well and geophysical data for temperature monitoring within a Bayesian experimental design framework. Water Resources Research, 11 2022. URL: https://onlinelibrary.wiley.com/doi/10.1029/2022WR033045, doi:10.1029/2022WR033045.

[2]

Hadrien Michel, Thomas Hermans, and Frédéric Nguyen. Iterative prior resampling and rejection sampling to improve 1-D geophysical imaging based on Bayesian evidential learning (BEL1D). Geophysical Journal International, 232:958–974, 10 2022. URL: https://academic.oup.com/gji/article/232/2/958/6717770, doi:10.1093/gji/ggac372.

[3]

Robin Thibaut, Eric Laloy, and Thomas Hermans. A new framework for experimental design using Bayesian evidential learning: the case of wellhead protection area. Journal of Hydrology, 603:126903, 12 2021. URL: https://linkinghub.elsevier.com/retrieve/pii/S0022169421009537, doi:10.1016/j.jhydrol.2021.126903.

[4]

Amine Tadjer and Reidar B. Bratvold. Managing uncertainty in geological CO2 storage using Bayesian evidential learning. Energies, 2021. doi:10.3390/en14061557.

[5]

Zhen Yin, Sebastien Strebelle, and Jef Caers. Automated Monte Carlo-based quantification and updating of geological uncertainty with borehole data (AutoBEL v1.0). Geoscientific Model Development, 13:651–672, 10 2020. URL: https://gmd.copernicus.org/articles/13/651/2020/, https://gmd.copernicus.org/articles/13/651/2020/gmd-13-651-2020.pdf, doi:10.5194/gmd-13-651-2020.

[6]

Hadrien Michel, Frédéric Nguyen, Thomas Kremer, Ann Elen, and Thomas Hermans. 1D geological imaging of the subsurface from geophysical data with Bayesian evidential learning. Computers and Geosciences, 138:104456, 5 2020. URL: https://linkinghub.elsevier.com/retrieve/pii/S0098300419306028, doi:10.1016/j.cageo.2020.104456.

[7]

Anshuman Pradhan and Tapan Mukerji. Seismic Bayesian evidential learning: estimation and uncertainty quantification of sub-resolution reservoir properties. Computational Geosciences, 24:1121–1140, 1 2020. URL: http://link.springer.com/10.1007/s10596-019-09929-1, https://arxiv.org/pdf/1905.05508, doi:10.1007/s10596-019-09929-1.

[8]

Thomas Kremer, Mike Müller-Petke, Hadrien Michel, Raphael Dlugosch, Trevor Irons, Thomas Hermans, and Frédéric Nguyen. Improving the accuracy of 1D surface nuclear magnetic resonance surveys using the multi-central-loop configuration. Journal of Applied Geophysics, 177:104042, 1 2020. URL: https://linkinghub.elsevier.com/retrieve/pii/S0926985119306639, doi:10.1016/j.jappgeo.2020.104042.

[9]

Jihoon Park and Jef Caers. Direct forecasting of global and spatial model parameters from dynamic data. Computers and Geosciences, 143:104567, 1 2020. URL: https://linkinghub.elsevier.com/retrieve/pii/S0098300420305562, doi:10.1016/j.cageo.2020.104567.

[10]

Noah D. Athens and Jef K. Caers. A Monte Carlo-based framework for assessing the value of information and development risk in geothermal exploration. Applied Energy, 256:113932, 12 2019. URL: https://linkinghub.elsevier.com/retrieve/pii/S0306261919316198, doi:10.1016/j.apenergy.2019.113932.

[11]

Thomas Hermans, Nolwenn Lesparre, Guillaume De Schepper, and Tanguy Robert. Bayesian evidential learning: a field validation using push-pull tests. Hydrogeology Journal, 27:1661–1672, 2019. doi:10.1007/s10040-019-01962-9.

[12]

Thomas Hermans, Frédéric Nguyen, Maria Klepikova, Alain Dassargues, and Jef Caers. Uncertainty quantification of medium-term heat storage from short-term geophysical experiments using Bayesian evidential learning. Water Resources Research, 54:2931–2948, 1 2018. URL: http://doi.wiley.com/10.1002/2017WR022135, https://biblio.ugent.be/publication/8557324/file/8562997.pdf, doi:10.1002/2017WR022135.

[13]

Thomas Hermans. Prediction-focused approaches: an opportunity for hydrology. Groundwater, 55:683–687, 1 2017. URL: http://doi.wiley.com/10.1111/gwat.12548, https://orbi.uliege.be/bitstream/2268/211530/1/Hermans_PFA_2017_preprint.pdf, doi:10.1111/gwat.12548.

[14]

Thomas Hermans, Erasmus Oware, and Jef Caers. Direct prediction of spatially and temporally varying physical properties from time-lapse electrical resistance data. Water Resources Research, 52:7262–7283, 9 2016. URL: http://doi.wiley.com/10.1002/2016WR019126, doi:10.1002/2016WR019126.

[15]

Aaditya Satija and Jef Caers. Direct forecasting of subsurface flow response from non-linear dynamic data by linear least-squares in canonical functional principal component space. Advances in Water Resources, 77:69–81, 2015. URL: http://dx.doi.org/10.1016/j.advwatres.2015.01.002, doi:10.1016/j.advwatres.2015.01.002.

[16]

Céline Scheidt, Philippe Renard, and Jef Caers. Prediction-focused subsurface modeling: investigating the need for accuracy in flow-based inverse modeling. Mathematical Geosciences, 47:173–191, 2 2015. URL: http://link.springer.com/10.1007/s11004-014-9521-6, doi:10.1007/s11004-014-9521-6.