3.24.0.4 Word2Vec training on a multi-node cluster (size 8) produces ArrayIndexOutOfBoundsException while building the HuffmanBinaryWordTree

Description

I have a multi-node cluster with 8 nodes.
I tokenized some texts and exported the tokenized frame to HDFS using export_file.

Afterwards, I imported the file using import_file and started training an H2OWord2vecEstimator model.

This training frequently produces an ArrayIndexOutOfBoundsException.

The bug is not always reproducible, and I couldn't reproduce it on a standalone cluster.
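
For anyone trying to reproduce this, the workflow described above might look roughly like the following Python sketch. It is not the reporter's original code: the HDFS path, the whitespace split pattern, the `vec_size` and `epochs` values, and the function name `train_word2vec_from_export` are all assumptions made for illustration.

```python
"""Hypothetical reproduction sketch; paths and parameters are assumptions."""

# Assumed HDFS location for the exported token frame (not from the report).
TOKENS_PATH = "hdfs:///tmp/word2vec_tokens.csv"


def train_word2vec_from_export(texts_path, tokens_path=TOKENS_PATH):
    """Tokenize texts, export to HDFS, re-import, and train Word2Vec."""
    # Imported lazily so this module loads without an H2O installation.
    import h2o
    from h2o.estimators.word2vec import H2OWord2vecEstimator

    # Starts or attaches to a local H2O instance; for an existing
    # multi-node cluster, h2o.connect(url=...) would be used instead.
    h2o.init()

    # Load the raw texts as one string column and split on whitespace
    # (the split pattern is an assumption).
    texts = h2o.import_file(texts_path, header=-1, col_types=["string"])
    tokens = texts.tokenize(" ")

    # Round-trip the tokenized frame through HDFS, as in the report.
    h2o.export_file(tokens, tokens_path, force=True)
    words = h2o.import_file(tokens_path, col_types=["string"])

    # Assumed hyperparameters; the report does not state the real ones.
    model = H2OWord2vecEstimator(vec_size=100, epochs=5)
    model.train(training_frame=words)
    return model
```

Since the failure is reported as intermittent and specific to the multi-node setup, running such a script several times against the 8-node cluster may be needed before the exception appears.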

Assignee

Michal Kurka

Fix versions

Reporter

Gustavo Henrique Orair

Support ticket URL

None

Labels

Affected Spark version

None

Customer Request Type

None

Task progress

None

CustomerVisible

Yes

Components

Affects versions

Priority

Major