Target Encoder is not invariant to the order of encoded columns

Description

When the encoded columns are passed to the TargetEncoder constructor in a different order, the values in the encoded columns differ substantially: the difference already shows in the first decimal place.

Is this expected behavior?

TargetEncoder tec = new TargetEncoder(new String[]{"embarked", "home.dest"});

gives a different result than

TargetEncoder tec = new TargetEncoder(new String[]{"home.dest", "embarked"});

The encoding map is the same in both cases; the difference arises in applyTargetEncoding. This behavior was observed with the KFold leakage handling strategy enabled and blending disabled.
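
One way to quantify the discrepancy is to run the encoding with both column orders and compare each encoded column element by element. The following is a minimal sketch of such a check, not a reproduction against H2O itself: the class OrderInvarianceCheck, its methods, and the two stand-in encoders are hypothetical names introduced for illustration, and the Function argument is a placeholder for whatever code builds a TargetEncoder, calls applyTargetEncoding, and extracts the encoded columns as double arrays.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical helper for illustration only; it does not call the H2O API.
class OrderInvarianceCheck {

    /** Largest absolute element-wise difference between two equally long columns. */
    public static double maxAbsDiff(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("column lengths differ");
        }
        double max = 0.0;
        for (int i = 0; i < a.length; i++) {
            max = Math.max(max, Math.abs(a[i] - b[i]));
        }
        return max;
    }

    /**
     * Runs the encoder once with the columns as given and once with them reversed,
     * then reports the largest absolute difference per encoded column.
     * The encoder maps a column order to (column name -> encoded values).
     */
    public static Map<String, Double> compareOrders(
            String[] columns, Function<String[], Map<String, double[]>> encoder) {
        String[] reversed = new String[columns.length];
        for (int i = 0; i < columns.length; i++) {
            reversed[i] = columns[columns.length - 1 - i];
        }
        Map<String, double[]> forward = encoder.apply(columns);
        Map<String, double[]> backward = encoder.apply(reversed);

        Map<String, Double> diffs = new LinkedHashMap<>();
        for (String column : columns) {
            diffs.put(column, maxAbsDiff(forward.get(column), backward.get(column)));
        }
        return diffs;
    }

    /** Stand-in encoder whose output depends only on the column name. */
    public static Map<String, double[]> orderInvariantEncoder(String[] columns) {
        Map<String, double[]> out = new LinkedHashMap<>();
        for (String column : columns) {
            out.put(column, new double[]{column.length(), column.length() * 0.5});
        }
        return out;
    }

    /** Stand-in encoder whose output depends on the column's position in the array. */
    public static Map<String, double[]> positionDependentEncoder(String[] columns) {
        Map<String, double[]> out = new LinkedHashMap<>();
        for (int p = 0; p < columns.length; p++) {
            out.put(columns[p], new double[]{p + 1.0, (p + 1.0) * 0.5});
        }
        return out;
    }

    public static void main(String[] args) {
        String[] columns = {"embarked", "home.dest"};
        System.out.println("order-invariant stand-in:    "
                + compareOrders(columns, OrderInvarianceCheck::orderInvariantEncoder));
        System.out.println("position-dependent stand-in: "
                + compareOrders(columns, OrderInvarianceCheck::positionDependentEncoder));
    }
}
```

An order-invariant encoder should report a difference of zero (or within floating-point tolerance) for every column. If the fold assignment is randomized, the same seed would need to be fixed for both runs so that the comparison isolates the effect of column order.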

Environment: None
Status:
Assignee: New H2O Bugs
Fix versions: None
Reporter: Pavel Pscheidl
Support ticket URL: None
Labels: None
Release Priority: None
Affected Spark version: None
Customer Request Type: None
Task progress: None
CustomerVisible: No
Priority: Blocker