Add multiclass support to Stacking

Description

Support multiclass response columns in the Stacked Ensemble method. We should support several types of multiclass stacking: Stacking, StackingC and possibly sMM5.

StackingC differs from regular Stacking as follows: for each linear model associated with a specific class, only the partial class probability distribution that concerns that class is used during training and testing. Stacking feeds each linear model the probabilities for all classes from all component classifiers, whereas StackingC uses only the probabilities of the class that the linear model is meant to predict.
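A minimal pure-Python sketch of the difference, assuming cross-validated (out-of-fold) class probabilities from each base model are already available. All names (`stacking_features`, `stackingc_features`, `train`, `predict`, `fit_linear`) are hypothetical and not part of the H2O API. The per-class linear model is a simplification: plain least squares on the 0/1 indicator of each class with a small ridge term, without any constraints or output normalization the papers may use.

```python
from typing import Callable, List

# probs[l][i][k]: out-of-fold probability that row i belongs to class k,
# as predicted by base model l (L base models, N rows, K classes).
Probs = List[List[List[float]]]
FeatureFn = Callable[[Probs, int, int], List[float]]


def stacking_features(probs: Probs, i: int, k: int) -> List[float]:
    """Stacking: all K class probabilities from all L base models (L*K inputs).
    Every per-class model sees the same inputs, so k is ignored."""
    return [p for model in probs for p in model[i]]


def stackingc_features(probs: Probs, i: int, k: int) -> List[float]:
    """StackingC: only P_l(class k | row i) from each base model (L inputs)."""
    return [model[i][k] for model in probs]


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve the square system a x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [row[:] + [b[r]] for r, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_linear(rows: List[List[float]], targets: List[float],
               ridge: float = 1e-3) -> List[float]:
    """Ridge-regularised least squares without intercept (normal equations)."""
    d = len(rows[0])
    xtx = [[sum(r[a] * r[b] for r in rows) + (ridge if a == b else 0.0)
            for b in range(d)] for a in range(d)]
    xty = [sum(r[a] * t for r, t in zip(rows, targets)) for a in range(d)]
    return solve(xtx, xty)


def train(features: FeatureFn, probs: Probs, labels: List[int],
          n_classes: int) -> List[List[float]]:
    """Fit one linear model per class on the 0/1 indicator of that class."""
    weights = []
    for k in range(n_classes):
        rows = [features(probs, i, k) for i in range(len(labels))]
        targets = [1.0 if y == k else 0.0 for y in labels]
        weights.append(fit_linear(rows, targets))
    return weights


def predict(features: FeatureFn, weights: List[List[float]],
            probs: Probs, i: int) -> int:
    """Predict the class whose linear model gives the highest score."""
    scores = [sum(w * x for w, x in zip(weights[k], features(probs, i, k)))
              for k in range(len(weights))]
    return max(range(len(scores)), key=lambda k: scores[k])


def one_hot(y: int, n_classes: int = 3) -> List[float]:
    return [1.0 if c == y else 0.0 for c in range(n_classes)]


# Toy data: base model A is always right, base model B always says class 0.
labels = [0, 1, 2, 0, 1, 2]
probs = [[one_hot(y) for y in labels], [one_hot(0) for _ in labels]]

for fn in (stacking_features, stackingc_features):
    w = train(fn, probs, labels, n_classes=3)
    # Stacking uses 6 inputs per model, StackingC uses 2; both print [0, 1, 2, 0, 1, 2].
    print(fn.__name__, len(fn(probs, 0, 0)), [predict(fn, w, probs, i) for i in range(6)])
```

The ridge term is there because in regular Stacking each base model's K probabilities sum to 1, so with two or more base models the L*K inputs are linearly dependent and the plain normal equations are singular. StackingC's per-class models see only L inputs, which is part of why it is cheaper to fit.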

Džeroski and Ženko (2002) investigate Stacking in the extension proposed by Ting & Witten (1999). They introduce a new variant, sMM5, which they claim is in a league of its own. According to unpublished experiments by the attached paper's authors on twenty-six datasets, sMM5 is quite competitive with StackingC but much slower. However, combining both ideas does not improve performance.

More info in the attached paper.

Environment

None

Assignee

Navdeep

Fix versions

None

Reporter

Erin LeDell

Support ticket URL

None

Labels

None

Release Priority

None

Affected Spark version

None

Customer Request Type

None

Task progress

None

CustomerVisible

No

Priority

Major