Implement Blending for Stacked Ensembles

Description

Blending is the variant of stacking in which the metalearner is not trained on cross-validated predictions (cv-preds) from the base models. Instead, the base models are scored on a holdout set, and those predicted values are used as the metalearner's training data.
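To make the data flow concrete, here is a minimal pure-Python sketch of blending. It is not the Stacked Ensemble implementation: the function names (`fit_mean`, `fit_line`, `fit_metalearner`, `fit_blend`), the two toy base learners, and the no-intercept least-squares metalearner are all assumptions chosen for illustration.

```python
from statistics import mean


def fit_mean(xs, ys):
    """Toy base learner 1 (hypothetical): always predicts the training mean."""
    m = mean(ys)
    return lambda x: m


def fit_line(xs, ys):
    """Toy base learner 2 (hypothetical): one-variable least-squares line."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return lambda x: my + slope * (x - mx)


def fit_metalearner(p1, p2, ys):
    """Least-squares weights (no intercept) for y ~ w1*p1 + w2*p2."""
    a11 = sum(a * a for a in p1)
    a12 = sum(a * b for a, b in zip(p1, p2))
    a22 = sum(b * b for b in p2)
    b1 = sum(a * y for a, y in zip(p1, ys))
    b2 = sum(b * y for b, y in zip(p2, ys))
    # det is zero when the two prediction columns are collinear;
    # this sketch does not handle that case.
    det = a11 * a22 - a12 * a12
    return (b1 * a22 - b2 * a12) / det, (a11 * b2 - a12 * b1) / det


def fit_blend(train, holdout):
    """Train base models on `train`, then the metalearner on holdout predictions."""
    train_x, train_y = train
    hold_x, hold_y = holdout
    # Step 1: each base model is trained once, on the training set only.
    m1 = fit_mean(train_x, train_y)
    m2 = fit_line(train_x, train_y)
    # Step 2: score the base models on the holdout set (no cross-validation).
    p1 = [m1(x) for x in hold_x]
    p2 = [m2(x) for x in hold_x]
    # Step 3: the holdout predictions are the metalearner's training data.
    w1, w2 = fit_metalearner(p1, p2, hold_y)
    return lambda x: w1 * m1(x) + w2 * m2(x)


# Toy data: y = 2x + 1, first 7 rows for training, last 3 as holdout.
xs = list(range(10))
ys = [2 * x + 1 for x in xs]
ensemble = fit_blend((xs[:7], ys[:7]), (xs[7:], ys[7:]))
prediction = ensemble(10)
```

The only difference from cv-based stacking is step 2: the metalearner's inputs come from rows the base models never saw during training, rather than from out-of-fold predictions.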

I'm not sure yet whether this should go into the existing Stacked Ensemble class or into a new class created specifically for this case. The resulting model is the same either way, though, so it should probably use Stacked Ensemble (with relaxed restrictions on the input models).

There are two main motivations here:

  • Blending is faster than cross-validating the base learners, because each base learner is trained once rather than once per fold (though these ensembles may not perform as well as the Super Learner ensemble).

  • It adds the ability to train stacked ensembles on time-series data (where the holdout data is "future" data relative to the "past" data in the training set).
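For the time-series case, the holdout would need to be cut by time rather than sampled at random. A rough sketch of such a split follows; `time_ordered_split`, its `holdout_fraction` parameter, and the `(timestamp, value)` row layout are hypothetical and not part of any existing API.

```python
def time_ordered_split(rows, holdout_fraction=0.2):
    """Split (timestamp, value) rows so the holdout holds the latest rows."""
    if not 0 < holdout_fraction < 1:
        raise ValueError("holdout_fraction must be between 0 and 1")
    # Sort by timestamp so "past" rows come first and "future" rows last.
    ordered = sorted(rows, key=lambda r: r[0])
    # Keep at least one row in the holdout.
    n_holdout = max(1, round(len(ordered) * holdout_fraction))
    return ordered[:-n_holdout], ordered[-n_holdout:]


# Unordered toy rows; the split should not depend on input order.
rows = [(t, 10 * t) for t in (5, 1, 4, 2, 3, 9, 7, 8, 6, 10)]
train, holdout = time_ordered_split(rows, holdout_fraction=0.3)
```

The base models would then be trained on `train` and scored on `holdout`, so the metalearner only ever sees predictions made for rows later than the ones the base models learned from.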

Once this is in place, we can add support for it in AutoML as well.

Assignee

Sebastien Poirier

Reporter

Erin LeDell

CustomerVisible

No

Priority

Major