Public H2O 3 / PUBDEV-6079

Disable auto-partitioning of validation data in AutoML

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.22.0.1
    • Fix Version/s: 3.22.0.3
    • Component/s: AutoML
    • Labels:
      None
    • CustomerVisible:
      No
    • Sprint:

      Description

      AutoML should not automatically create a validation frame, because that frame is barely used: it only has an effect when GLM is the metalearner, where it activates lambda search. The validation_frame argument should also be removed, because accepting a user-supplied validation frame that is mostly ignored is misleading.

      The auto-partitioning was originally intended to supply a validation frame for early stopping, so that the CV metrics reported on the leaderboard would be "honest". In practice, H2O evaluates the early stopping criteria for the models and grids using CV performance, and there is currently no way to switch this to the validation frame via the API. Removing the automatic validation split should improve performance because it leaves 10% more data for training, a gain that in most cases will outweigh the benefit of using a validation frame for lambda search in the GLM metalearner of the Stacked Ensemble model.
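
      The training-data trade-off described above can be illustrated with a minimal sketch. It does not call H2O; `training_rows` and `validation_fraction` are hypothetical names introduced only for this illustration, and the 10% hold-out is an assumption taken from the "10% more training data" figure in this description.

```python
def training_rows(n_rows: int, validation_fraction: float = 0.0) -> int:
    """Return the rows left for training after holding out a validation partition.

    Hypothetical helper for illustration only; not part of the H2O API.
    """
    held_out = round(n_rows * validation_fraction)
    return n_rows - held_out


n = 10_000

# Behaviour before the fix (assumed): 10% of the rows are split off for validation.
with_auto_split = training_rows(n, validation_fraction=0.10)

# Behaviour requested by this issue: no automatic split, all rows go to training
# (cross-validation still provides the metrics used for early stopping).
without_auto_split = training_rows(n)

print(with_auto_split, without_auto_split)
```

      With the automatic split disabled, the rows that were previously held out for validation become available to every model AutoML trains.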

            People

            • Assignee:
              Erin LeDell (erin)
            • Reporter:
              Megan Kurka (megank)
