Public H2O 3
PUBDEV-2899

Add support for arbitrary internal CV folds in h2o.stack


    • Type: New Feature
    • Status: In Progress
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: h2oEnsemble, R
    • Labels:
    • CustomerVisible:
    • Sprint:


      Currently, `h2o.stack` only accepts models that were cross-validated using Modulo folds. Add support for arbitrary (user-defined) folds. The relevant check is here: https://github.com/h2oai/h2o-3/blob/master/h2o-r/ensemble/h2oEnsemble-package/R/stack.R#L62

      This block of code requires "Modulo" folds to have been used. To allow other fold schemes, the check should be extended as follows:

      • First, check whether Modulo was used in all models. If so, we are okay.
      • If Modulo was not used in all the models, check whether the folds were saved by setting `keep_cross_validation_fold_assignment = TRUE` in all models. If so, check that the folds are identical across models.
      • If there is no way to tell that the models were cross-validated using the same folds, still allow the user to train the model, but print the following warning message:
        "h2o.stack was not able to determine that the models were all cross-validated using the same exact folds.  User assumes responsibility for ensuring that folds are identical (if not, the stack will not perform well).  Please check the performance of the ensemble using the h2o.ensemble_performance function to ensure that your ensemble is performing adequately."
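The three-step check above could be sketched roughly as follows. This is a language-neutral illustration in Python, not the h2oEnsemble R implementation: each model is represented as a plain dict, and the keys `fold_assignment` and `saved_folds`, the function name `check_fold_consistency`, and the returned labels are all hypothetical. The ticket does not say what should happen when saved folds differ across models, so raising an error in that case is an assumption.

```python
import warnings

# Warning wording taken from the ticket text above.
UNVERIFIED_FOLDS_WARNING = (
    "h2o.stack was not able to determine that the models were all "
    "cross-validated using the same exact folds. User assumes "
    "responsibility for ensuring that folds are identical (if not, the "
    "stack will not perform well). Please check the performance of the "
    "ensemble using the h2o.ensemble_performance function to ensure that "
    "your ensemble is performing adequately."
)


def check_fold_consistency(models):
    """Classify how fold agreement across models was established.

    `models` is a list of dicts with two hypothetical keys:
      - "fold_assignment": the fold scheme name, e.g. "Modulo"
      - "saved_folds": per-row fold ids as a list, or None if not kept
    """
    if not models:
        raise ValueError("at least one model is required")

    # Step 1: Modulo in every model means the folds are known to match.
    if all(m.get("fold_assignment") == "Modulo" for m in models):
        return "modulo"

    # Step 2: if every model kept its fold assignment, compare them.
    saved = [m.get("saved_folds") for m in models]
    if all(folds is not None for folds in saved):
        if all(folds == saved[0] for folds in saved[1:]):
            return "identical_saved_folds"
        # Assumption: a verified mismatch is treated as an error.
        raise ValueError("saved fold assignments differ across models")

    # Step 3: nothing can be verified, so warn and let training proceed.
    warnings.warn(UNVERIFIED_FOLDS_WARNING)
    return "unverified"
```

In this sketch only the third outcome emits the warning; the caller would continue to train the metalearner in all non-error cases.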

      Related ticket: https://0xdata.atlassian.net/browse/PUBDEV-2901




            • Assignee:
              erin Erin LeDell


              • Created: