Consider the following scenario: There is an autoencoder model (MOJO model) trained with 20 features. Now, I want to predict on a new input row which has only 19 features, i.e. one feature is unknown and its value is therefore missing. The resulting reconstruction MSE is then NaN.
However, when using H2O Flow, everything works as expected: the reconstruction MSE is not NaN in such a case.
When predicting with an autoencoder model, the result is the reconstruction error, which is basically just the MSE between the reconstructed features and the original features of the given row.
Currently, this computation of the MSE does not work for input rows with missing numerical input features when using the EasyPredictModelWrapper.
However, the reconstructed features are calculated correctly. The problem arises when the difference between the original input row and the reconstructed features is calculated. The "original" input row is not imputed for missing values, i.e. NaN values are kept, and since any arithmetic involving NaN yields NaN, the resulting MSE is NaN as well.
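To illustrate the symptom outside of H2O, here is a minimal plain-Java sketch. It is not the H2O implementation; the class name `NaNPropagationDemo`, the `mse` helper, and the sample values are all made up for illustration. It only shows that a single NaN in the original row is enough to turn the whole MSE into NaN, even when the reconstruction itself is fine.

```java
// Illustration only: NOT H2O code. All names and values are hypothetical.
final class NaNPropagationDemo {

    // Plain mean squared error between an input row and its reconstruction.
    static double mse(double[] original, double[] reconstructed) {
        double sum = 0.0;
        for (int i = 0; i < original.length; i++) {
            double diff = original[i] - reconstructed[i];
            sum += diff * diff; // NaN - x is NaN, and NaN poisons the sum
        }
        return sum / original.length;
    }

    public static void main(String[] args) {
        double[] reconstructed = {1.0, 2.0, 3.0};
        double[] complete = {1.5, 2.0, 2.5};
        double[] withMissing = {1.5, Double.NaN, 2.5}; // second feature is missing

        System.out.println(mse(complete, reconstructed));    // a finite value
        System.out.println(mse(withMissing, reconstructed)); // NaN
    }
}
```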
To fix this bug, there should be a mean imputation (or whatever missing value handling is defined in the mojo model) for missing values in the "original" input row. Then the reconstruction error can be calculated consistently.
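A rough sketch of the proposed behaviour, again in plain Java and not taken from the H2O code base: missing entries of the original row are replaced by a per-feature mean before the MSE is computed. The `featureMeans` array is an assumption standing in for whatever statistics (or other missing value handling) the MOJO model actually stores; the class and method names are hypothetical.

```java
// Illustration only: NOT H2O code. Names and the featureMeans array are assumptions.
final class ImputedMseSketch {

    // Return a copy of the row in which every NaN is replaced by the
    // corresponding (assumed) training mean of that feature.
    static double[] imputeWithMeans(double[] row, double[] featureMeans) {
        double[] imputed = row.clone();
        for (int i = 0; i < imputed.length; i++) {
            if (Double.isNaN(imputed[i])) {
                imputed[i] = featureMeans[i];
            }
        }
        return imputed;
    }

    // Plain mean squared error between two rows of equal length.
    static double mse(double[] original, double[] reconstructed) {
        double sum = 0.0;
        for (int i = 0; i < original.length; i++) {
            double diff = original[i] - reconstructed[i];
            sum += diff * diff;
        }
        return sum / original.length;
    }

    public static void main(String[] args) {
        double[] featureMeans = {1.0, 2.0, 3.0};  // hypothetical training means
        double[] row = {1.5, Double.NaN, 2.5};    // second feature is missing
        double[] reconstructed = {1.0, 2.5, 3.0}; // hypothetical model output

        double[] imputed = imputeWithMeans(row, featureMeans);
        System.out.println(mse(imputed, reconstructed)); // finite, no longer NaN
    }
}
```

With the imputed row, the comparison uses the same kind of filled-in value the model presumably saw for that feature, so the reconstruction error stays finite and comparable across rows.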