Example 1 - Wellhead Protection Area (WHPA)
All the details about the example can be found in [1], and the code in
skbel/examples/demo.py.It concerns a hydrogeological experiment consisting of predicting the wellhead protection area (WHPA) around a pumping well from measured breakthrough curves at said pumping well.
Predictor and target are generated through forward modeling from a set of hydrogeological model with different hydraulic conductivity fields (not shown).
The predictor is the set of breakthrough curves coming from 6 different injection wells around the pumping well.
The target is the WHPA.
For this example, the data is already pre-processed. We are working with 400 examples of both d and h and consider one extra pair to be predicted. See details in the reference.
The pre-print is freely available on arXiv arXiv:2105.05539
[1]:
import os
from os.path import join as jp
import demo_visualization as myvis
import numpy as np
from sklearn.cross_decomposition import CCA
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PowerTransformer, StandardScaler
from skbel.learning.bel import BEL
Load the dataset
The example dataset is saved as numpy arrays in
skbel/examples/dataset.An arbitrary choice has to be made on the number of PC to keep for the predictor and the target. In this case, they are set to 50 and 30, respectively.
The CCA operator
ccais set to keep the maximum number of CV possible (30).The variable
y_testis the unknown target to predict. It is ignored during the training.
[2]:
data_dir = jp(os.getcwd(), "dataset")
# Directory in which to unload forecasts and save figures
sub_dir = jp(os.getcwd(), "results")
X_train = np.load(jp(data_dir, "X_train.npy"))
X_test = np.load(jp(data_dir, "X_test.npy"))
y_train = np.load(jp(data_dir, "y_train.npy"))
y_test = np.load(jp(data_dir, "y_test.npy"))
Let’s have a look at the dataset.
Predictor
The first figure is the whole training set and the test set in thick lines. The latter is highlighted on the second figure.
[3]:
myvis.plot_predictor(X=X_train, X_obs=X_test, base_dir=sub_dir, show=True)
Target
[4]:
myvis.plot_target(Y=y_train, Y_obs=y_test, base_dir=sub_dir, show=True)
Building the BEL model
In this package, a BEL model consists of a succession of Pipelines (imported from scikit-learn).
The
X_pre_processingandY_pre_processingobjects are pipelines which will first scale the data for predictor and target, then apply the dimension reduction through PCA.The
X_post_processingandY_post_processingobjects are pipelines which will normalize predictor and target CV’s.Finally, the BEL model is constructed by passing as arguments all these pipelines in the
BELobject.
[5]:
# Pipeline before CCA
X_pre_processing = Pipeline(
[
("scaler", StandardScaler(with_mean=False)),
("pca", PCA(n_components=50)),
]
)
Y_pre_processing = Pipeline(
[
("scaler", StandardScaler(with_mean=False)),
("pca", PCA(n_components=30)),
]
)
# Canonical Correlation Analysis
cca = CCA(n_components=30)
# Pipeline after CCA
X_post_processing = Pipeline(
[("normalizer", PowerTransformer(method="yeo-johnson", standardize=True))]
)
Y_post_processing = Pipeline(
[("normalizer", PowerTransformer(method="yeo-johnson", standardize=True))]
)
# Initiate BEL object
bel_model = BEL(
X_pre_processing=X_pre_processing,
X_post_processing=X_post_processing,
Y_pre_processing=Y_pre_processing,
Y_post_processing=Y_post_processing,
regression_model=cca,
)
Set model parameters
[6]:
bel_model.mode = "mvn" # How to compute the posterior conditional distribution
# Save original dimensions of both predictor and target
bel_model.X_shape = (6, 200) # Six curves with 200 time steps each
bel_model.Y_shape = (100, 87) # 100 rows and 87 columns
# Number of samples to be extracted from the posterior distribution
bel_model.n_posts = 400
Train the model
[7]:
# Fit BEL model
bel_model.fit(X=X_train, Y=y_train);
Infer the posterior distribution
Now that the BEL model is fitted, we can sample WHPA from the inferred posterior distribution using the chosen method (MVN, KDE or transport maps). The sampling occurs in canonical space, so we have to inverse transform the samples and reshape them.
[8]:
%%capture
# The posterior distribution is computed within the method below.
forecast_posterior = bel_model.predict(X_test.reshape(1, -1))
forecast_posterior = forecast_posterior.reshape((-1, 100, 87)) # Reshape to original dimension
Let’s visualise the results:
[9]:
myvis.plot_posterior(
forecast_posterior=forecast_posterior, Y=y_train, Y_obs=y_test, base_dir=sub_dir, show=True
)