I'm working on a script that is supposed to explain the results of an XGBoost ML model. The model contains, among others, one feature with a large number of categorical levels (approx. 2000). I noticed that when I sum the contributions generated by <model>.predict_contributions(), the result differs substantially from the actual predictions returned by <model>.predict().
According to your website, "the sum of the feature contributions and the bias term is equal to the raw prediction of the model. Raw prediction of tree-based model is the sum of the predictions of the individual trees before the inverse link function is applied to get the actual prediction. For Gaussian distribution, the sum of the contributions is equal to the model prediction." [ref] I did not specify the distribution of the model, and the default for regression is Gaussian. [ref] Thus, the sum of the feature contributions + bias term should be equal to the model prediction.
I've attached a small stand-alone example that mimics the behaviour I'm observing, though to a lesser extent in magnitude (i.e. the differences between predict() and predict_contributions().sum() are smaller).
Python version: 3.6.9
H2O version: 18.104.22.168/22.214.171.124