From 3ac1f3aee810ec7fda497faa286a4407d0950120 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Tom=C3=A1=C5=A1=20Fr=C3=BDda?= <tomas.fryda@h2o.ai>
Date: Fri, 15 Sep 2023 18:01:23 +0200
Subject: [PATCH] Add minimal documentation

---
 .../product/performance-and-prediction.rst    | 139 ++++++++++++++++++
 1 file changed, 139 insertions(+)

diff --git a/h2o-docs/src/product/performance-and-prediction.rst b/h2o-docs/src/product/performance-and-prediction.rst
index a06dbf4e8d43..234e35f104f6 100644
--- a/h2o-docs/src/product/performance-and-prediction.rst
+++ b/h2o-docs/src/product/performance-and-prediction.rst
@@ -2659,6 +2659,9 @@ Using the previous example, run the following to predict the leaf node assignmen
 Predict Contributions
 ~~~~~~~~~~~~~~~~~~~~~
 
+
+Path Dependent TreeSHAP
+'''''''''''''''''''''''
 In H2O-3, each returned H2OFrame has the specific shape of *#rows, #features + 1*. This includes a feature contribution column for each input feature. The last column of this table is the model bias (the column is the same for each row). The sum of the feature contributions and the bias term is equal to the raw prediction of the model. Raw prediction of a tree-based model is the sum of the predictions of the individual trees prior to applying the inverse link function to retrieve the actual prediction. For Gaussian distribution, the sum of the contributions is equal to the model prediction.
 
 We support TreeSHAP for DRF, GBM, and XGBoost. For these problems, ``predict_contributions`` returns a new H2OFrame with the predicted feature contributions as SHAP (SHapley Additive exPlanation) values on an H2O model.
@@ -2747,6 +2750,142 @@ Retrieving graphs via R is not yet supported. An `.ipynb demo showing this examp
         shap.summary_plot(shap_values, X, plot_type="bar")
 
 
+Interventional / Marginal SHAP
+''''''''''''''''''''''''''''''
+
+**Supported Algos:** DeepLearning, DRF, GBM, GLM, StackedEnsemble, XGBoost
+
+**NOTE:** DeepLearning and StackedEnsemble support only approximate SHAP values using DeepSHAP with multiple baselines, and G-DeepSHAP respectively. 
+
+This method of SHAP calculation requires a background dataset and is now often called marginal SHAP to avoid confusion with causal approaches to calculate Shapley values (this method assumes flat causal graph). 
+Marginal SHAP is calculated as average baseline SHAP using each row from the background dataset as a baseline.
+
+**Baseline SHAP**
+In the baseline SHAP, the contribution for feature i (:math:`\phi_i`) can be imagined to be calculated by taking the powerset :math:`Q` of all features that differ
+between the explained point and the baseline except the feature i and for each subset from that powerset creating an artificial data points where
+features from the subset take the value from the reference point and the rest of the features take the value from
+the baseline point and then taking a difference of model evaluation on this point with feature i value from :math:`x^{(e)}` and :math:`x^{(b)}`.
+
+Let :math:`x^{(e)}` be the point for which we are calculating the contributions, 
+:math:`x^{(b)}` be the baseline point,
+:math:`h(x)` be the predictive function (the model),
+:math:`Q` be the powerset of indexes of features that differ between :math:`x^{(e)}` and :math:`x^{(b)}`,
+:math:`d` be the number of features that differ between between :math:`x^{(e)}` and :math:`x^{(b)}`,
+and :math:`r_i(x^{(e)}, x^{(b)}, S) = \begin{cases} x^{(e)}_i \quad \text{if}\ i \in S\\ x^{(b)}_i \quad \text{otherwise}\end{cases}`.
+
+.. math::
+    \phi_i(h, x^{(e)}, x^{(b)}) = \sum_{S \in Q} \frac{|S|! (d-|S| - 1)!}{d!} (h(r(x^{(e)}, x^{(b)}, S \cup \{i\}))-h(r(x^{(e)}, x^{(b)}, S)))
+
+The sum of contributions equals to the difference between a reference point and a baseline.
+
+.. math::
+     h(x^{(e)}) - h(x^{(b)}) = \sum_{i \in \text{feature}} \phi_i(h, x^{(e)}, x^{(b)}))
+
+In the path dependent TreeSHAP, there is another column called `BiasTerm` that is added to the sum of the contributions to get the 
+prediction :math:`h(x) = \text{BiasTerm} + \sum_{i=1}^d \phi_i`, in the baseline SHAP this `BiasTerm` is equal to :math:`h(x^{(b)})`.
+
+
+To get baseline SHAP, specify ``output_per_reference=True`` in the ``predict_contributions`` method.
+The result from ``predict_contributions`` method is enriched with ``RowIdx`` and ``BackgroundRowIdx`` columns so the contributions can be easily mapped to
+the explained frame and the background frame.
+
+**NOTE:** Using baseline SHAP produces a frame with the number of rows equal to number of rows of the explained frame times number of rows of the background frame.
+
+.. tabs::
+   .. code-tab:: r R
+  
+          # Compute Baseline SHAP, this enables looking at the contributions against different baselines, e.g., older males in this case
+          h2o.predict_contributions(model, prostate_test, background_frame=prostate_train[prostate_train$AGE > 70, ], output_per_reference = TRUE)
+
+                 RACE      DPROS      DCAPS        PSA        VOL      GLEASON BiasTerm RowIdx BackgroundRowIdx
+          1 0.0000000  0.6040196  0.1489700 0.08277533  0.0000000 -0.007383927 67.90027      0                0
+          2 0.3102472  0.0000000  0.0000000 1.10074197 -0.4141512 -0.086468272 67.81828      0                1
+          3 0.0000000  0.6251030  0.1074789 0.02255024  0.0000000 -0.052744012 68.02626      0                2
+          4 0.0000000  0.0000000 -0.1381250 0.42088614  0.0000000 -0.090785108 68.53667      0                3
+          5 0.0000000  1.3762679 -0.1959183 0.34541795  0.0000000 -0.343634803 67.54651      0                4
+          6 0.0000000 -0.4059084  0.0000000 0.18515649  0.0000000 -0.021399001 68.97080      0                5
+
+          [37620 rows x 9 columns] 
+          
+
+   .. code-tab:: python
+
+          # Compute Baseline SHAP, this enables looking at the contributions against different baselines, e.g., older males in this case
+          model.predict_contributions(prostate_test, background_frame=prostate_train[prostate_train["AGE"] > 70, :], output_per_reference = True)
+
+                 AGE       RACE        DPROS      DCAPS         PSA         VOL     GLEASON    BiasTerm    RowIdx    BackgroundRowIdx
+          ----------  ---------  -----------  ---------  ----------  ----------  ----------  ----------  --------  ------------------
+           0          0           0           0          -0.0406704   0          -0.0159229   0.624277          0                   0
+          -0.0265084  0.115568    0.0921354   0.0521548  -0.132599    0.0959191   0.115393    0.255621          0                   1
+           0          0          -0.00649863  0          -0.302238    0           0.0977098   0.77871           0                   2
+           0          0          -0.216909    0          -0.244149    0          -0.0233896   1.05213           0                   3
+           0.0294289  0          -0.0168443   0.0320507   0.0251351   0           0.52408    -0.0261672         0                   4
+          -0.0017294  0.0546553   0.122834    0.0234999   0.0147228   0           0.469046   -0.115345          0                   5
+          -0.0209216  0           0           0.0562942  -0.0875752  -0.184241   -0.0225276   0.826654          0                   6
+          -0.0285146  0           0.364188    0.0636343  -0.0719393  -0.0384352   0.0726108   0.20614           0                   7
+          -0.0673656  0           0.184373    0.0412827  -0.325982   -0.0931228   0.132599    0.695899          0                   8
+           0.0108649  0           0.225565    0.0221987  -0.112281    0           0.44203    -0.020694          0                   9
+
+          [37620 rows x 9 columns] 
+
+**Marginal SHAP**
+
+In the marginal SHAP, the contribution is calculated as :math:`\phi_i(h, x^{(e)}) = \frac{1}{|D|} \sum_{x^{(b)} \in D} \phi(h, x^{(e)}, x^{(b)})`, where :math:`D` is the background dataset.
+
+Using marginal SHAP enables to look for contributions against different baselines, e.g., different subpopulations. 
+
+.. tabs::
+   .. code-tab:: r R
+  
+          # Compute Marginal SHAP, this enables looking at the contributions against different baselines, e.g., older males in this case
+          h2o.predict_contributions(model, prostate_test, background_frame=prostate_train[prostate_train$AGE > 70, ])
+
+                   RACE      DPROS       DCAPS         PSA         VOL      GLEASON BiasTerm
+          1  0.01221071  0.1442253 -0.02368839  0.28334598 -0.27244853 -0.050375786 68.63538
+          2  0.01554721 -0.4885786  0.06818504  0.07093928 -0.40644674  0.005242058 68.63538
+          3  0.01619808  0.6368542  0.10928919  0.12617361 -0.43928950 -0.176276680 68.63538
+          4 -0.37412145  0.1527038 -0.02231085 -0.62897581  0.02538746  0.030216405 68.63538
+          5  0.02351126  0.6573612 -0.06684027 -0.01427528  0.66239009 -0.141467870 68.63538
+          6  0.01748617 -0.5110773  0.06850011  0.08616166 -0.37861648  0.108426911 68.63538
+          
+          [380 rows x 7 columns]
+
+
+          # Plot SHAP summary plot:
+          h2o.shap_summary_plot(model, prostate_test, background_frame=prostate_train[prostate_train$AGE > 70, ])
+
+          # Plot SHAP contributions for one instance (e.g., row 5):
+          h2o.shap_explain_row_plot(model, prostate_test, row_index = 5, background_frame=prostate_train[prostate_train$AGE > 70, ])
+
+
+   .. code-tab:: python
+
+          # Compute Marginal SHAP, this enables looking at the contributions against different baselines, e.g., older males in this case
+          model.predict_contributions(prostate_test, background_frame=prostate_train[prostate_train["AGE"] > 70, :])
+
+                   AGE         RACE       DPROS        DCAPS          PSA          VOL     GLEASON    BiasTerm
+          ------------  -----------  ----------  -----------  -----------  -----------  ----------  ----------
+          -0.00390396    0.00461162   0.0812151   0.0354627   -0.0880568    0.00987528   0.16727      0.361209
+           0.12961      -0.0974719    0.238144    0.0301782    0.0750851    0.0786154    0.0820873    0.361209
+           0.000218997   0.00607607  -0.165538   -0.00164295   0.110438    -0.00369223  -0.162637     0.361209
+          -0.0211114     0.00352612   0.0746521  -0.00214469  -0.166035    -0.15557     -0.199869     0.361209
+          -0.0165421     0.00456433  -0.0524966  -0.00352489  -0.087945    -0.00809267   0.0317739    0.361209
+           0.00363843    0.00609676   0.230796   -0.00115063  -0.0965648    0.0831312    0.144009     0.361209
+          -0.00072942    0.00471497  -0.127639    0.030633     0.145419     0.0442545    0.310656     0.361209
+           0.0762746     0.0039464   -0.0143102  -0.00549353   0.0590267   -0.0355882    0.0992434    0.361209
+          -0.023101      0.0024461    0.225077   -0.0036679   -0.00969025   0.0832767    0.161513     0.361209
+           0.0215677     0.00496158   0.100158    0.0296085    0.120057    -0.173633     0.217662     0.361209
+          
+          [380 rows x 7 columns]
+
+          # Plot SHAP summary plot:
+          model.shap_summary_plot(prostate_test, background_frame=prostate_train[prostate_train["AGE"] > 70, :])
+
+          # Plot SHAP contributions for one instance (e.g., row 5):
+          model.shap_explain_row_plot(prostate_test, row_index=5, background_frame=prostate_train[prostate_train["AGE"] > 70, :])
+
+
+
 Predict Stage Probabilities
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~