Expose Decision Tree as a stand-alone algo in H2O

Description

We should decide whether to add a new REST endpoint for this (e.g. /3/ModelBuilders/decisiontree) or simply wrap the Random Forest or GBM function at the R/Py client level. If it's done in H2O core, it will also show up in Flow, which is a plus.

You can get a single decision tree by setting the following args in RF:

  • ntrees = 1

  • mtries = # of features (would be determined dynamically at runtime)

  • sample_rate = 1

Note: in RF, min_rows defaults to 1 and max_depth defaults to 20.
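For illustration, here is a minimal sketch of what a client-level wrapper around RF could look like in the Python client. The helper names single_tree_rf_params and train_single_tree_rf are hypothetical (nothing like them exists in H2O today). The sketch assumes the h2o package's H2ORandomForestEstimator and an H2OFrame as input, and it computes mtries from the predictor list at call time.

```python
def single_tree_rf_params(n_features):
    """Return RF settings that reduce the forest to one decision tree.

    Hypothetical helper: mtries is set to the number of predictors so
    that every feature is considered at every split.
    """
    if n_features < 1:
        raise ValueError("n_features must be at least 1")
    return {
        "ntrees": 1,            # a single tree
        "mtries": n_features,   # consider all features at each split
        "sample_rate": 1.0,     # use every row, no row sampling
    }


def train_single_tree_rf(frame, y, x=None):
    """Train a one-tree RF on an H2OFrame (assumes a running H2O cluster)."""
    # Imported lazily so the parameter helper works without h2o installed.
    from h2o.estimators import H2ORandomForestEstimator

    # Default to all columns except the response when x is not given.
    predictors = x if x is not None else [c for c in frame.columns if c != y]
    model = H2ORandomForestEstimator(**single_tree_rf_params(len(predictors)))
    model.train(x=predictors, y=y, training_frame=frame)
    return model
```

min_rows and max_depth are left at the RF defaults noted above.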

Alternatively, you can get a single decision tree by setting the following args in GBM (this may be the better option because it avoids the data-dependent mtries arg):

  • ntrees = 1

  • min_rows = 1

  • sample_rate = 1

  • col_sample_rate = 1

Note: in GBM, min_rows defaults to 10 and max_depth defaults to 5.
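The GBM variant can be sketched the same way. Because none of the settings depend on the data, they can live in a fixed dict. The names single_tree_gbm_params and train_single_tree_gbm are hypothetical; the sketch assumes the h2o package's H2OGradientBoostingEstimator and an H2OFrame as input.

```python
def single_tree_gbm_params():
    """Return GBM settings that produce one tree (hypothetical helper)."""
    return {
        "ntrees": 1,             # a single tree
        "min_rows": 1,           # override the GBM default of 10
        "sample_rate": 1.0,      # use every row
        "col_sample_rate": 1.0,  # use every column
    }


def train_single_tree_gbm(frame, y, x=None):
    """Train a one-tree GBM on an H2OFrame (assumes a running H2O cluster)."""
    # Imported lazily so the parameter helper works without h2o installed.
    from h2o.estimators import H2OGradientBoostingEstimator

    model = H2OGradientBoostingEstimator(**single_tree_gbm_params())
    # When x is None, H2O uses all columns except the response.
    model.train(x=x, y=y, training_frame=frame)
    return model
```

Unlike the RF version, no feature count is needed here, which is the advantage mentioned above.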

I'm also wondering whether we should set max_depth to something large so that we don't limit tree depth (i.e. apply regularization) by default...
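If we go that way, one option is for the wrapper to set max_depth explicitly while still letting the user override it. A small sketch follows; the helper name with_max_depth is hypothetical, and the default of 30 is an arbitrary placeholder, not a proposed value.

```python
def with_max_depth(params, max_depth=30):
    """Return a copy of params with max_depth set explicitly.

    Hypothetical helper; 30 is only a placeholder for "something large".
    """
    if max_depth < 1:
        raise ValueError("max_depth must be at least 1")
    merged = dict(params)  # copy so the caller's dict is not mutated
    merged["max_depth"] = max_depth
    return merged
```

The resulting dict can be passed as keyword arguments to either the RF or the GBM estimator, since both accept max_depth.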

Related: https://0xdata.atlassian.net/browse/PUBDEV-4007

Assignee

Navdeep

Fix versions

None

Reporter

Erin LeDell

Support ticket URL

None

Labels

None

Affected Spark version

None

Customer Request Type

None

Task progress

None

ReleaseNotesHidden

None

CustomerVisible

No

Priority

Major