DRF/GBM can't handle columns with 60% missing values

Description

Repro:

1) Import hdfs://mr-0xd6/datasets/1Mx2.2k.NAs.csv

Some columns have 60% or 80% missing values. The columns with 60% missing values are not ignored by default.

2) Run DRF (same for GBM):

http://mr-0xd1:53322/2/DRF.query?destination_key=&source=X1Mx2_2k.NAs.hex&response=response&ignored_cols=63%2C238%2C296%2C818%2C1035%2C1441%2C1628%2C2124&classification=1&validation=&n_folds=0&holdout_fraction=0.0&keep_cross_validation_splits=0&ntrees=1&max_depth=5&min_rows=1&nbins=20&score_each_iteration=0&importance=1&balance_classes=0&max_after_balance_size=Infinity&checkpoint=&overwrite_checkpoint=1&mtries=-1&sample_rate=0.6666666865348816&seed=-1&build_tree_one_node=0
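For readability, the request above can be decoded into its parameters. This is a minimal sketch using only the Python standard library; the variable names are illustrative, and it only parses the URL from this report (it does not contact the cluster).

```python
from urllib.parse import urlsplit, parse_qs

# The DRF request from step 2, split across literals for readability.
url = (
    "http://mr-0xd1:53322/2/DRF.query?destination_key=&source=X1Mx2_2k.NAs.hex"
    "&response=response"
    "&ignored_cols=63%2C238%2C296%2C818%2C1035%2C1441%2C1628%2C2124"
    "&classification=1&validation=&n_folds=0&holdout_fraction=0.0"
    "&keep_cross_validation_splits=0&ntrees=1&max_depth=5&min_rows=1&nbins=20"
    "&score_each_iteration=0&importance=1&balance_classes=0"
    "&max_after_balance_size=Infinity&checkpoint=&overwrite_checkpoint=1"
    "&mtries=-1&sample_rate=0.6666666865348816&seed=-1&build_tree_one_node=0"
)

# keep_blank_values=True preserves empty parameters such as destination_key.
raw = parse_qs(urlsplit(url).query, keep_blank_values=True)
params = {name: values[0] for name, values in raw.items()}

# ignored_cols is a comma-separated list once percent-decoding is applied.
ignored = [int(col) for col in params["ignored_cols"].split(",")]

for name in ("source", "response", "ntrees", "max_depth", "nbins"):
    print(f"{name} = {params[name]}")
print("ignored_cols =", ignored)
```

The decoded list holds eight columns. Neither 395 nor 396 is among them, so the column named C396 in the error below appears not to have been excluded from this run, whichever indexing convention the list uses.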

00:33:55.954 FJ-0-127 ERRR WATER: Got exception 'class java.lang.AssertionError', with msg 'Caller ensures Infinity>Infinity, since if max==min== the column C396 is all constants'
+ java.lang.AssertionError: Caller ensures Infinity>Infinity, since if max==min== the column C396 is all constants
+ at hex.gbm.DHistogram.<init>(DHistogram.java:78)
+ at hex.gbm.DBinomHistogram.<init>(DBinomHistogram.java:20)
+ at hex.gbm.DHistogram.make(DHistogram.java:200)
+ at hex.gbm.DHistogram.initialHist(DHistogram.java:193)
+ at hex.drf.DRF.buildNextKTrees(DRF.java:442)
+ at hex.drf.DRF.buildModel(DRF.java:270)
+ at hex.drf.DRF.buildModel(DRF.java:33)
+ at hex.gbm.SharedTreeModelBuilder.buildModel(SharedTreeModelBuilder.java:276)
+ at hex.drf.DRF.execImpl(DRF.java:192)
+ at water.Func.exec(Func.java:42)
+ at water.Job$3.compute2(Job.java:334)
+ at water.H2O$H2OCountedCompleter.compute(H2O.java:653)
+ at jsr166y.CountedCompleter.exec(CountedCompleter.java:429)
+ at jsr166y.ForkJoinTask.doExec(ForkJoinTask.java:263)
+ at jsr166y.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:974)
+ at jsr166y.ForkJoinPool.runWorker(ForkJoinPool.java:1477)
+ at jsr166y.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:104)
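The assertion message reads "Infinity>Infinity", which suggests that both range bounds for column C396 were Infinity when the histogram was constructed. The sketch below is a Python stand-in for that kind of range check, not H2O's DHistogram code. `make_histogram_range` and `has_usable_range` are hypothetical names, and how the bounds became infinite is not established by this report.

```python
import math

def make_histogram_range(name, min_val, max_excl):
    """Hypothetical stand-in for a histogram range check (not H2O code)."""
    # Mirrors the shape of the failing precondition: the exclusive
    # upper bound must be strictly greater than the lower bound.
    if not max_excl > min_val:
        raise AssertionError(
            f"Caller ensures {max_excl}>{min_val}, since if max==min== "
            f"the column {name} is all constants"
        )
    return (min_val, max_excl)

def has_usable_range(min_val, max_excl):
    """Hypothetical guard a caller could apply before building a histogram."""
    return (
        math.isfinite(min_val)
        and math.isfinite(max_excl)
        and max_excl > min_val
    )

inf = float("inf")

# Infinity > Infinity is false, so the precondition fails.
try:
    make_histogram_range("C396", inf, inf)
    failed = False
except AssertionError:
    failed = True

print("precondition failed:", failed)
print("usable range:", has_usable_range(inf, inf))
```

A guard of this kind would let a caller skip a column whose range is degenerate or non-finite rather than trip the assertion. Whether that is the appropriate fix here is an assumption.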

Assignee

Cliff Click

Reporter

Arno Candel

Labels

None

CustomerVisible

No

Priority

Major