Dynamic strategy for setting nfolds in AutoML

Description

With smaller data, or when compute resources are large relative to the data size, H2O AutoML will typically produce a better Stacked Ensemble model using cross-validation. However, for larger datasets, especially in time-constrained scenarios (under one hour), we see better results when we reduce nfolds or skip cross-validation completely and instead use a blending frame to train the Stacked Ensemble.

We need a dynamic strategy, based on the data-to-compute ratio, for choosing the number of folds or deciding to use a blending frame instead.
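As a rough illustration of the kind of rule described above, the sketch below maps a data-to-compute ratio and a time budget to a cross-validation choice. Everything in it is hypothetical: the function name `choose_cv_strategy`, the use of data cells per CPU core as the ratio, the one-hour cut-off, and the threshold values are assumptions for illustration only, not H2O behaviour or benchmarked numbers.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CVStrategy:
    nfolds: int          # 0 means cross-validation is skipped
    use_blending: bool   # True: train the Stacked Ensemble on a held-out blending frame


# HYPOTHETICAL thresholds (data cells per CPU core); real values would need benchmarking.
SMALL_RATIO = 1e5
LARGE_RATIO = 1e6
# "Time-constrained" taken as a budget under one hour, per the description.
TIME_CONSTRAINED_SECS = 3600


def choose_cv_strategy(n_rows: int, n_cols: int, n_cores: int,
                       max_runtime_secs: Optional[float]) -> CVStrategy:
    """Hypothetical heuristic: fewer folds (or blending) as data outgrows compute."""
    if n_rows <= 0 or n_cols <= 0 or n_cores <= 0:
        raise ValueError("n_rows, n_cols and n_cores must be positive")

    # Assumed proxy for the data-to-compute ratio.
    cells_per_core = n_rows * n_cols / n_cores
    time_constrained = (max_runtime_secs is not None
                        and max_runtime_secs < TIME_CONSTRAINED_SECS)

    if cells_per_core <= SMALL_RATIO:
        # Small data relative to compute: full cross-validation.
        return CVStrategy(nfolds=5, use_blending=False)
    if cells_per_core <= LARGE_RATIO:
        # Medium: reduce folds only when the time budget is tight.
        return CVStrategy(nfolds=3 if time_constrained else 5, use_blending=False)
    # Large data relative to compute.
    if time_constrained:
        # Skip cross-validation and rely on a blending frame instead.
        return CVStrategy(nfolds=0, use_blending=True)
    return CVStrategy(nfolds=3, use_blending=False)
```

In the H2O Python client, the result could then feed the `nfolds` argument of `H2OAutoML`, with a held-out split passed as `blending_frame` to `train()` when `use_blending` is true. Tying the thresholds to measured leaderboard results, rather than fixed constants, would be the natural next step.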

Assignee

Sebastien Poirier

Fix versions

Reporter

Erin LeDell

Support ticket URL

None

Labels

None

Affected Spark version

None

Customer Request Type

None

Task progress

None

CustomerVisible

No

Components

Priority

Major