AutoML: offer the possibility to specify the order in which training steps will be executed


After discussing Epsilon's needs regarding case 94685, we decided for now to let AutoML users specify the order in which training steps are executed.
This can be done at a coarse-grained level (order of default algos and default grids):

  • XGB_defaults, GBM_defaults, … XGB_grid, …

Or at a more fine-grained level (order of each hardcoded model):

  • XGB_default_1, XGB_default_2, …, GBM_def_1, …


The suggested parameter name for this specification is modeling_plan.

Here is the suggested JSON representation to specify those steps in an ordered way:

Unfortunately, JSON doesn't guarantee that the order of object keys is preserved, so we can't use a JSON object for this and have to use arrays only.

The semantics of the example above are as follows:

  • start with the XGBoost algorithm, but train only the hardcoded models with ids def_1, def_2, def_3, in the given order.

  • then train all the GLM models (default models and/or grids), followed by all DRF models (using alias all in the latter case).

  • then train all the default GBM models (using alias defaults to avoid typing all the model ids explicitly).

  • then train all the XRT models

  • then train XGBoost step with id grid_1 (probably a grid…)

  • then train all the GBM grids (using alias grids to avoid listing them explicitly).

  • then train the StackedEnsemble models with ids best and all in this order.

  • DeepLearning algo hasn’t been mentioned in this example, so it will be skipped.
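
As a rough illustration of the steps listed above, here is a minimal Python sketch of such a plan. The entry layout is an assumption made for this sketch, not the confirmed schema: each entry is an array holding the algo name, optionally followed by either a list of step ids or one of the aliases all, defaults, grids. Only the algo names, step ids and aliases are taken from the description.

```python
import json

# Hypothetical layout: [algo_name] or [algo_name, steps], where steps is
# either a list of step ids or one of the aliases "all", "defaults", "grids".
modeling_plan = [
    ["XGBoost", ["def_1", "def_2", "def_3"]],  # only these hardcoded models
    ["GLM"],                                   # all GLM models
    ["DRF", "all"],                            # all DRF models, via alias
    ["GBM", "defaults"],                       # all default GBM models
    ["XRT"],                                   # all XRT models
    ["XGBoost", ["grid_1"]],                   # a single XGBoost step
    ["GBM", "grids"],                          # all GBM grids
    ["StackedEnsemble", ["best", "all"]],      # two step ids, in this order
]

# Only arrays carry the ordering, so it survives a JSON round trip.
encoded = json.dumps(modeling_plan)
decoded = json.loads(encoded)

# Order in which algos are visited; DeepLearning never appears, so it is skipped.
algo_order = [entry[0] for entry in decoded]
```

Note that XGBoost and GBM each appear twice: because the plan is a sequence rather than a mapping, the same algo can be scheduled at several positions.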

If an algo or a model id (e.g. def_3) is present in this order specification but no longer exists in the AutoML version in use, it is ignored with a warning message.
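
The "ignore with a warning" behaviour could be sketched as follows. This is not the AutoML implementation: KNOWN_STEPS and resolve_steps are hypothetical names, and the registry content is invented for the example.

```python
import warnings

# Hypothetical registry of the step ids known to the running AutoML version.
KNOWN_STEPS = {
    "XGBoost": ["def_1", "def_2", "grid_1"],
    "GBM": ["def_1", "def_2", "grid_1"],
}


def resolve_steps(plan, known=KNOWN_STEPS):
    """Return (algo, step_id) pairs in plan order, skipping unknown entries."""
    resolved = []
    for algo, step_ids in plan:
        if algo not in known:
            warnings.warn(f"Unknown algo {algo!r} ignored")
            continue
        for step_id in step_ids:
            if step_id in known[algo]:
                resolved.append((algo, step_id))
            else:
                warnings.warn(f"Unknown step {algo}.{step_id} ignored")
    return resolved


# A plan saved from an older version: def_3 and OldAlgo no longer exist.
saved_plan = [("XGBoost", ["def_1", "def_2", "def_3"]), ("OldAlgo", ["def_1"])]

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    steps = resolve_steps(saved_plan)
```

The remaining steps keep their relative order, so a saved plan degrades gracefully instead of failing on upgrade.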

The representation is also easily extensible: we can add new algos, new default models, new grids, new hyperparameter search methods…

If the user also specifies the exclude_algos parameter, it is applied on top of the order specification: this lets the user keep the specification in one variable without having to change it later. For example, exclude_algos=["XRT"] in combination with modeling_plan=the_example_above will execute the steps defined in the example except XRT. The same applies when using include_algos instead.
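
A minimal sketch of how such filtering could sit on top of the plan is shown below. apply_algo_filters is a hypothetical helper, and treating exclude_algos and include_algos as mutually exclusive is an assumption of this sketch.

```python
def apply_algo_filters(plan, exclude_algos=None, include_algos=None):
    """Return a filtered copy of plan; the original plan is left untouched."""
    if exclude_algos is not None and include_algos is not None:
        # Assumption: only one of the two filters is given at a time.
        raise ValueError("use either exclude_algos or include_algos, not both")
    if exclude_algos is not None:
        return [entry for entry in plan if entry[0] not in exclude_algos]
    if include_algos is not None:
        return [entry for entry in plan if entry[0] in include_algos]
    return list(plan)


# The plan stays in one variable and is reused with different filters.
plan = [("XGBoost", ["def_1"]), ("XRT", "all"), ("GBM", "grids")]
without_xrt = apply_algo_filters(plan, exclude_algos=["XRT"])
only_gbm = apply_algo_filters(plan, include_algos=["GBM"])
```

Filtering preserves the relative order of the remaining entries, so the plan itself never needs to be edited.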

After running AutoML, the detailed modeling_steps specification (with all step ids) will be available from the automl instance so that the user can save it for later use.


Python representation examples (lists or tuples can be used):

And an equivalent representation in R:



Sebastien Poirier
