AutoML: for XGBoost hyperparam search, tune learning rate starting from best XGB model

Description

Learning rate hyperparameter search is currently disabled (learn_rate is hardcoded to 0.05), as a lower learning rate doesn't improve performance but costs a lot of computation time.
We should however be able to tune learn_rate starting from the best XGB model found (hardcoded defaults + grid search); see the sketch after the discussion link below:

  • decrease learn_rate: [0.05], 0.01, 0.005, 0.001

  • increase stopping_rounds in proportion: [5], 25, 50, 100 (or more?)

see internal discussion: https://h2oai.slack.com/archives/C0EU04LD7/p1550615935104800?thread_ts=1550529008.068800&cid=C0EU04LD7
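
As a rough illustration only (not the implemented AutoML internals), the proposed search could look like the Python sketch below using the H2O XGBoost estimator. The learn_rate / stopping_rounds values come from the bullets above; best_xgb_params, the frame, and the column names are hypothetical placeholders, and pairing the two lists (instead of a full cartesian grid) reflects the "in proportion" idea:

# Hypothetical sketch: fine-tune learn_rate starting from the best XGB model found so far.
# best_xgb_params, "train.csv" and "response" are placeholders, not actual AutoML internals.
import h2o
from h2o.estimators.xgboost import H2OXGBoostEstimator

h2o.init()
train = h2o.import_file("train.csv")   # placeholder dataset
y = "response"                         # placeholder target column
x = [c for c in train.columns if c != y]

# pretend these come from the best XGB model found during exploration
best_xgb_params = {"ntrees": 10000, "max_depth": 8}

# learn_rate values from the ticket, with stopping_rounds increased in proportion
candidates = [(0.05, 5), (0.01, 25), (0.005, 50), (0.001, 100)]

tuned = []
for learn_rate, stopping_rounds in candidates:
    model = H2OXGBoostEstimator(
        learn_rate=learn_rate,
        stopping_rounds=stopping_rounds,
        **best_xgb_params,
    )
    model.train(x=x, y=y, training_frame=train)
    tuned.append(model)

# keep the best fine-tuned candidate (training logloss used here just for the sketch)
best = min(tuned, key=lambda m: m.logloss())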


Solution implemented

The AutoML training is now split into 3 phases:

  • exploration phase: equivalent to the legacy AutoML training prior to v3.30. AutoML trains some predefined default models and runs some hyperparameter searches (excluding learning rate) using random grid search.
    One notable difference from previous versions is that the time budget (if provided through the max_runtime_secs param, or using the 1h default if no max_models is provided either) is now distributed not only between grids, but also between default models, which should allow building more models for larger datasets (even if they won't all have converged when the time allocation is too small).

  • exploitation phase: this is the new experimental phase that currently consists of fine-tuning the best GBM and the best XGB models. It is disabled by default, but can easily be enabled from the clients by setting the AutoML exploitation_ratio parameter to a positive value.
    If using max_models without a time constraint, then the exploitation phase will always run after exploration with no time limitation.
    If using max_runtime_secs, then the exploitation phase will run and receive (approximately) a proportional time budget (for example, if max_runtime_secs=3600 and exploitation_ratio=0.2, then approximately 12 min are dedicated to exploitation).
    For each algo, if the fine-tuned model improves on the previous best model for this algo, then it replaces it in the leaderboard (i.e. the old one is removed).
    Users can start with the recommended exploitation_ratio=0.1 and increase it if they want to dedicate more time to learn_rate tuning (more time means lower learning rates are more likely to be explored). A client-side sketch is shown after this list.

  • stacked ensemble: same as in previous versions. The SE will stack all models from the leaderboard, including those trained during the exploitation phase if they were good enough to be included in the leaderboard.
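
For reference, a minimal Python client sketch for enabling the exploitation phase; the frame and target column are placeholders, while exploitation_ratio and max_runtime_secs are the parameters described above:

# Minimal sketch: enable the exploitation phase from the Python client.
# "train.csv" and "response" are placeholders for a real frame and target column.
import h2o
from h2o.automl import H2OAutoML

h2o.init()
train = h2o.import_file("train.csv")
y = "response"

# With max_runtime_secs=3600 and exploitation_ratio=0.2, roughly
# 0.2 * 3600 s = 720 s (~12 min) is budgeted for fine-tuning the best GBM/XGB.
aml = H2OAutoML(max_runtime_secs=3600, exploitation_ratio=0.2, seed=1)
aml.train(y=y, training_frame=train)

# fine-tuned models replace the previous best GBM/XGB in the leaderboard if they improve on them
print(aml.leaderboard.head())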

Assignee

Sebastien Poirier

Fix versions

Reporter

Sebastien Poirier

Support ticket URL

None

Labels

None

Affected Spark version

None

Customer Request Type

None

Task progress

None

CustomerVisible

No

Components

Priority

Critical