Disable auto-partitioning of validation data in AutoML

Description

AutoML should not create a validation frame automatically, because that frame is not fully utilized: it is only used when GLM is the metalearner, where it activates lambda search. The validation_frame argument should also be removed. When users supply their own validation data, the argument misleadingly suggests that the data is fully used.

The auto-partitioning was originally intended to provide a validation frame for early stopping, so that the CV metrics reported on the leaderboard would be "honest". However, H2O currently uses CV performance to evaluate the early stopping criteria for the models and grids, and there is no way to switch this to the validation frame via the API. Removing the automatic validation split should improve performance because it yields 10% more training data, which in most cases will outweigh the benefit of using a validation frame for lambda search in the GLM metalearner of the Stacked Ensemble model.
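The effect on training set size can be illustrated with a minimal sketch in plain Python. This is not H2O's partitioning code: the helper name partition_rows, the seed, and the 10,000-row frame are hypothetical, and the 10% holdout fraction is assumed from the "10% more training data" figure above.

```python
import random


def partition_rows(n_rows, validation_fraction=0.1, seed=42):
    """Split row indices into (train, validation) lists.

    Hypothetical helper for illustration only; the 10% default is an
    assumption taken from the ticket text, not from H2O source code.
    """
    if not 0.0 <= validation_fraction < 1.0:
        raise ValueError("validation_fraction must be in [0, 1)")
    indices = list(range(n_rows))
    # Shuffle with a local, seeded generator so the split is reproducible.
    random.Random(seed).shuffle(indices)
    n_valid = int(round(n_rows * validation_fraction))
    return indices[n_valid:], indices[:n_valid]


# Current behavior (assumed): part of the training frame is held out.
train_with_split, valid = partition_rows(10_000, validation_fraction=0.1)

# Proposed behavior: no automatic holdout, every row is used for training.
train_without_split, no_valid = partition_rows(10_000, validation_fraction=0.0)

print(len(train_with_split), len(valid))           # 9000 1000
print(len(train_without_split), len(no_valid))     # 10000 0
```

Under these assumptions, disabling the split returns 1,000 rows to training, which are otherwise only consulted for lambda search in the GLM metalearner.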

Status

Assignee

Erin LeDell

Fix versions

Reporter

Megan Kurka

Support ticket URL

None

Labels

None

Affected Spark version

None

Customer Request Type

None

Task progress

None

CustomerVisible

No

Components

Affects versions

Priority

Major