Add direct support for text data in H2O AutoML using Word2Vec

Description

AutoML should accept a two-column H2OFrame as training data, where one column is text and the other is the label. From there, we can train a Word2Vec model on the text column, transform the text into vectors, and then proceed with the usual AutoML process.
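
To make the "transform the text into vectors" step concrete, here is a minimal sketch in plain Python. It assumes each document is turned into one fixed-length row by averaging the vectors of its words. The embedding table `TOY_VECTORS` and the helpers `tokenize` and `embed` are hypothetical stand-ins for a trained Word2Vec model, not H2O API.

```python
from typing import Dict, List

# Hypothetical toy embeddings standing in for a trained Word2Vec model.
TOY_VECTORS: Dict[str, List[float]] = {
    "good": [1.0, 0.0],
    "bad": [-1.0, 0.0],
    "movie": [0.0, 1.0],
}


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it on whitespace."""
    return text.lower().split()


def embed(text: str, vectors: Dict[str, List[float]], dim: int) -> List[float]:
    """Average the vectors of known tokens; return zeros if none are known."""
    known = [vectors[t] for t in tokenize(text) if t in vectors]
    if not known:
        return [0.0] * dim
    # zip(*known) walks the vectors dimension by dimension.
    return [sum(col) / len(known) for col in zip(*known)]
```

Each text row then becomes a numeric row of length `dim`, which is the shape of input the typical AutoML process already handles alongside the label column.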

I still have to think about how we would store the Word2Vec model in the AutoML object for future use, but there's probably a reasonable way to make this work.
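
One possible shape for this, sketched in plain Python as an assumption rather than a proposal for the actual AutoML internals: keep the fitted text encoder next to the model in one object, so that new text is transformed the same way before scoring. `TextPipeline`, `toy_encode`, and `toy_predict` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

Vector = List[float]


@dataclass
class TextPipeline:
    """Hypothetical wrapper that stores the text encoder with the model."""

    encode: Callable[[str], Vector]       # e.g. a trained Word2Vec transform
    predict_vec: Callable[[Vector], str]  # e.g. the best model found by AutoML

    def predict(self, texts: Sequence[str]) -> List[str]:
        # Re-apply the stored encoder so scoring sees the same features as training.
        return [self.predict_vec(self.encode(t)) for t in texts]


# Toy stand-ins, purely illustrative.
def toy_encode(text: str) -> Vector:
    words = text.lower().split()
    return [float(words.count("good") - words.count("bad"))]


def toy_predict(vec: Vector) -> str:
    return "pos" if vec[0] > 0 else "neg"


pipeline = TextPipeline(encode=toy_encode, predict_vec=toy_predict)
```

Calling `pipeline.predict([...])` with raw strings then works without the caller having to know that an embedding step exists.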

Assignee

Michal Kurka

Fix versions

Reporter

Erin LeDell

Support ticket URL

None

Labels

None

Affected Spark version

None

Customer Request Type

None

Task progress

None

ReleaseNotesHidden

None

CustomerVisible

No

Epic Link

Components

Priority

Major