XGBoost: XGBoostModel.score(munged_bnpparibas_test_data) fails

Description

It seems that XGBoostModel.score() doesn't handle unlabeled test sets. I successfully built an XGBoost model on the autodl-munged BNPParibas data, and it crashed when I called model.predict() on the test set, which has no 'target' column:

INFO: POST /4/Predictions/models/XGBoost_grid_0_AutoML_20171011_152320_model_7/frames/test_munged1.hex
WARN: Test/Validation dataset is missing column 'target': substituting in a column of NaN
...
OSError: Job with key $0301ac1002c634d4ffffffff$_967c15560a64477424e09eadc12a42d4 failed with an exception: java.lang.IllegalArgumentException: Domain must have 2 class labels, but is [] for binomial metrics.
stacktrace:
java.lang.IllegalArgumentException: Domain must have 2 class labels, but is [] for binomial metrics.
at hex.ModelMetricsBinomial.make(ModelMetricsBinomial.java:92)
at hex.ModelMetricsBinomial.make(ModelMetricsBinomial.java:71)
at hex.tree.xgboost.XGBoostModel.makePreds(XGBoostModel.java:351)
at hex.tree.xgboost.XGBoostModel.makeMetrics(XGBoostModel.java:301)
at hex.tree.xgboost.XGBoostModel.score(XGBoostModel.java:462)
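
The trace suggests that score() always goes on to build binomial metrics, even when the response column was substituted with NaNs and therefore has an empty domain. The following is a toy Python sketch of that failure mode and of one possible guard. It is not the H2O code or the actual fix; the names make_binomial_metrics and score, and the accuracy metric, are hypothetical stand-ins chosen for illustration.

```python
import math


def make_binomial_metrics(actual_domain, actuals, predictions):
    """Toy stand-in for a binomial metrics builder (hypothetical, not H2O's).

    Like the check in the trace, it insists on exactly two class labels.
    """
    if len(actual_domain) != 2:
        raise ValueError(
            f"Domain must have 2 class labels, but is {actual_domain} "
            "for binomial metrics."
        )
    correct = sum(1 for a, p in zip(actuals, predictions) if a == p)
    return {"accuracy": correct / len(actuals)}


def _is_missing(value):
    """True for None or NaN, i.e. a label that was never supplied."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def score(predictions, actuals, actual_domain):
    """Return (predictions, metrics).

    Assumed guard: when the test set is unlabeled (all actuals missing, or
    the response domain is empty), skip metrics and return None for them
    instead of raising.
    """
    unlabeled = all(_is_missing(a) for a in actuals) or not actual_domain
    if unlabeled:
        return predictions, None
    return predictions, make_binomial_metrics(actual_domain, actuals, predictions)
```

With this guard, an all-NaN response column with an empty domain yields predictions and no metrics, while a labeled frame is still scored as before. Calling make_binomial_metrics directly with an empty domain reproduces the shape of the error above.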

See the repro scripts in the directory specified here:

https://0xdata.atlassian.net/browse/PUBDEV-4997

Run single_xgboost.py to build the model. If it's successful, run xval_leaderboard.py to load the test set and run model.predict().

Assignee

Michal Kurka

Fix versions

None

Reporter

Raymond Peck

Support ticket URL

None

Labels

None

Affected Spark version

None

Customer Request Type

None

Task progress

None

CustomerVisible

No

Components

Priority

Blocker