Currently, H2O grid search only tunes the hyperparameters of a single H2O estimator. In practice, we often also need to tune the hyperparameters of the data preprocessing steps.
In scikit-learn, for example, we can build a pipeline from several transformers and an estimator, then use grid search to tune the transformers' hyperparameters along with the estimator's. It would be really useful if H2O could do the same thing.
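For reference, here is a minimal sketch of the scikit-learn pattern described above. A plain scikit-learn LogisticRegression stands in for an H2O estimator. The synthetic data, step names ("select", "clf"), and grid values are illustrative assumptions. Any step's hyperparameter is addressed as <step>__<param>, so a single grid covers both the transformer and the estimator:

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Synthetic data, for illustration only.
X, y = make_classification(n_samples=200, n_features=10, n_informative=4, random_state=0)

# A preprocessing (transformer) step followed by the estimator step.
# LogisticRegression is a stand-in here, not an H2O estimator.
pipe = Pipeline([
    ("select", SelectKBest(score_func=f_classif)),
    ("clf", LogisticRegression(max_iter=1000)),
])

# "<step>__<param>" keys let one grid span the preprocessing and the model.
param_grid = {
    "select__k": [2, 5, 10],
    "clf__C": [0.1, 1.0, 10.0],
}

# With only scikit-learn steps, n_jobs=-1 fits the candidate/fold combinations in parallel.
search = GridSearchCV(pipe, param_grid, cv=3, n_jobs=-1)
search.fit(X, y)

print(search.best_params_)
```

When every step is a scikit-learn object, raising n_jobs like this works. The request here is for the same joint search, in parallel, when the final step is an H2O estimator.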
I have done some work on this. H2O estimators and custom transformers can be added to a sklearn pipeline if the cross-validation and the scorer are rebuilt, but the pipeline then cannot be trained in parallel (n_jobs can only be 1), which makes training really slow.
I hope you have an idea of how to implement this.