{ "cells": [ { "cell_type": "markdown", "metadata": { "collapsed": false }, "source": [ "# Example 1 - Wellhead Protection Area (WHPA)\n", "- All the details about the example can be found in [1], and the code in `skbel/examples/demo.py`.\n", "- It concerns a hydrogeological experiment consisting of predicting the wellhead protection area (WHPA) around a pumping well from measured breakthrough curves at said pumping well.\n", "- Predictor and target are generated through forward modeling from a set of hydrogeological model with different hydraulic conductivity fields (not shown).\n", "- The predictor is the set of breakthrough curves coming from 6 different injection wells around the pumping well.\n", "- The target is the WHPA.\n", "\n", "For this example, the data is already pre-processed. We are working with 400 examples of both `d` and `h` and consider one extra pair to be predicted. See details in the reference.\n", "\n", "[1] [Thibaut, R., Laloy, E., Hermans, T., 2021. A new framework for experimental design using Bayesian Evidential Learning: the case of wellhead protection area. Journal of Hydrology](https://doi.org/10.1016/j.jhydrol.2021.126903)\n", "\n", "The pre-print is freely available on arXiv [arXiv:2105.05539](https://arxiv.org/abs/2105.05539)" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": true }, "outputs": [], "source": [ "import os\n", "from os.path import join as jp\n", "\n", "import demo_visualization as myvis\n", "import numpy as np\n", "from sklearn.cross_decomposition import CCA\n", "from sklearn.decomposition import PCA\n", "from sklearn.pipeline import Pipeline\n", "from sklearn.preprocessing import PowerTransformer, StandardScaler\n", "\n", "from skbel.learning.bel import BEL" ] }, { "cell_type": "markdown", "metadata": { "collapsed": false }, "source": [ "#### Load the dataset\n", "- The example dataset is saved as numpy arrays in `skbel/examples/dataset`.\n", "- An arbitrary choice has to be made on the number of PC to keep for the predictor and the target. In this case, they are set to 50 and 30, respectively.\n", "- The CCA operator `cca` is set to keep the maximum number of CV possible (30).\n", "- The variable `y_test` is the unknown target to predict. It is ignored during the training." ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": false, "pycharm": { "name": "#%% Load dataset\n" } }, "outputs": [], "source": [ "data_dir = jp(os.getcwd(), \"dataset\")\n", "# Directory in which to unload forecasts and save figures\n", "sub_dir = jp(os.getcwd(), \"results\")\n", "\n", "X_train = np.load(jp(data_dir, \"X_train.npy\"))\n", "X_test = np.load(jp(data_dir, \"X_test.npy\"))\n", "y_train = np.load(jp(data_dir, \"y_train.npy\"))\n", "y_test = np.load(jp(data_dir, \"y_test.npy\"))" ] }, { "cell_type": "markdown", "metadata": { "collapsed": false }, "source": [ "Let's have a look at the dataset." ] }, { "cell_type": "markdown", "metadata": { "collapsed": false }, "source": [ "##### Predictor\n", "The first figure is the whole training set and the test set in thick lines. The latter is highlighted on the second figure." ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "collapsed": false }, "outputs": [], "source": [ "myvis.plot_predictor(X=X_train, X_obs=X_test, base_dir=sub_dir, show=True)" ] }, { "cell_type": "markdown", "metadata": { "collapsed": false }, "source": [ "##### Target" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "collapsed": false }, "outputs": [], "source": [ "myvis.plot_target(Y=y_train, Y_obs=y_test, base_dir=sub_dir, show=True)" ] }, { "cell_type": "markdown", "metadata": { "collapsed": false }, "source": [ "#### Building the BEL model\n", "In this package, a BEL model consists of a succession of Pipelines (imported from scikit-learn).\n", "\n", "- The ```X_pre_processing``` and ```Y_pre_processing``` objects are pipelines which will first scale the data for predictor and target, then apply the dimension reduction through PCA.\n", "\n", "- The ```X_post_processing``` and ```Y_post_processing``` objects are pipelines which will normalize predictor and target CV's.\n", "\n", "- Finally, the BEL model is constructed by passing as arguments all these pipelines in the `BEL` object." ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# Pipeline before CCA\n", "X_pre_processing = Pipeline(\n", " [\n", " (\"scaler\", StandardScaler(with_mean=False)),\n", " (\"pca\", PCA(n_components=50)),\n", " ]\n", ")\n", "Y_pre_processing = Pipeline(\n", " [\n", " (\"scaler\", StandardScaler(with_mean=False)),\n", " (\"pca\", PCA(n_components=30)),\n", " ]\n", ")\n", "\n", "# Canonical Correlation Analysis\n", "cca = CCA(n_components=30)\n", "\n", "# Pipeline after CCA\n", "X_post_processing = Pipeline(\n", " [(\"normalizer\", PowerTransformer(method=\"yeo-johnson\", standardize=True))]\n", ")\n", "Y_post_processing = Pipeline(\n", " [(\"normalizer\", PowerTransformer(method=\"yeo-johnson\", standardize=True))]\n", ")\n", "\n", "# Initiate BEL object\n", "bel_model = BEL(\n", " X_pre_processing=X_pre_processing,\n", " X_post_processing=X_post_processing,\n", " Y_pre_processing=Y_pre_processing,\n", " Y_post_processing=Y_post_processing,\n", " regression_model=cca,\n", ")" ] }, { "cell_type": "markdown", "metadata": { "collapsed": false }, "source": [ "#### Set model parameters" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "collapsed": false, "pycharm": { "name": "#%% Set model parameters\n" } }, "outputs": [], "source": [ "bel_model.mode = \"mvn\" # How to compute the posterior conditional distribution\n", "# Save original dimensions of both predictor and target\n", "bel_model.X_shape = (6, 200) # Six curves with 200 time steps each\n", "bel_model.Y_shape = (100, 87) # 100 rows and 87 columns\n", "# Number of samples to be extracted from the posterior distribution\n", "bel_model.n_posts = 400" ] }, { "cell_type": "markdown", "metadata": { "collapsed": false }, "source": [ "#### Train the model" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "collapsed": false, "pycharm": { "name": "#%% Train the model\n" } }, "outputs": [], "source": [ "# Fit BEL model\n", "bel_model.fit(X=X_train, Y=y_train);" ] }, { "cell_type": "markdown", "metadata": { "collapsed": false }, "source": [ "#### Infer the posterior distribution" ] }, { "cell_type": "markdown", "metadata": { "collapsed": false }, "source": [ "Now that the BEL model is fitted, we can sample WHPA from the inferred posterior distribution using the chosen method (MVN, KDE or transport maps).\n", "The sampling occurs in canonical space, so we have to inverse transform the samples and reshape them." ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "collapsed": false, "pycharm": { "name": "#%% Sample for the observation\n" } }, "outputs": [], "source": [ "%%capture\n", "# The posterior distribution is computed within the method below.\n", "forecast_posterior = bel_model.predict(X_test.reshape(1, -1))\n", "forecast_posterior = forecast_posterior.reshape((-1, 100, 87)) # Reshape to original dimension" ] }, { "cell_type": "markdown", "metadata": { "collapsed": false }, "source": [ "Let's visualise the results:" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "collapsed": false }, "outputs": [], "source": [ "myvis.plot_posterior(\n", " forecast_posterior=forecast_posterior, Y=y_train, Y_obs=y_test, base_dir=sub_dir, show=True\n", ")" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "2.7.6" } }, "nbformat": 4, "nbformat_minor": 0 }