There is strong evidence that as datasets grow in size, it's better to train the metalearner in Stacked Ensembles on a blending (holdout) frame instead of with 5-fold CV, which is what we currently use by default on all datasets.
On a benchmark of the HIGGS dataset, we compare blending, 3-fold CV, and 5-fold CV with a 1-hour budget against 5-fold CV with a 4-hour budget. With 1M rows, running the default 5-fold CV for longer (4 hours) can beat a 1-hour blending-frame run, though blending is still clearly better per unit of compute time. At 10M rows, blending for 1 hour outperforms the default 5-fold CV even at 4 hours, which means there is really no reason to be doing CV at that scale. These results use a separate test set for leaderboard scoring (the AUCs shown on the plot).
We will have to do more benchmarking on this. If we switch to a 10% (or some other fraction) blending frame for datasets above a certain "size" (or "size relative to compute resources"), we no longer get CV metrics for the leaderboard. That means we would have to chop off another piece of the data just for leaderboard scoring, which could be acceptable when there is "enough" data, but we need to be careful about doing this properly.
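As a sketch of the bookkeeping this would involve, the data would need a three-way partition: a training frame for the base learners, a blending frame for the metalearner, and a holdout frame reserved for leaderboard scoring. The fractions and function name below are illustrative placeholders, not tuned recommendations:

```python
import random

def three_way_split(n_rows, blend_frac=0.10, leaderboard_frac=0.10, seed=42):
    """Partition row indices into disjoint train / blending / leaderboard sets.

    blend_frac and leaderboard_frac are hypothetical defaults for
    illustration; the right values would come out of benchmarking.
    """
    rng = random.Random(seed)
    indices = list(range(n_rows))
    rng.shuffle(indices)

    n_blend = int(n_rows * blend_frac)
    n_lb = int(n_rows * leaderboard_frac)

    blend = indices[:n_blend]                      # metalearner training rows
    leaderboard = indices[n_blend:n_blend + n_lb]  # held out for scoring only
    train = indices[n_blend + n_lb:]               # base-learner training rows
    return train, blend, leaderboard

train, blend, lb = three_way_split(1_000_000)
```

The key constraint is that the leaderboard rows never touch either training stage, so the reported AUCs stay honest even though we give up the CV metrics.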