Intermittent failure in creating H2O cloud

Description

I do get a cloud eventually, but creation still fails now and then.
I'll make a habit of reporting the stack traces, so you know it still has rough edges.

The incantation code:

from pysparkling import *

conf = (H2OConf(sc)
        .use_auto_cluster_start()
        .set_yarn_queue("spark-analytics")
        .set_num_of_external_h2o_nodes(8)
        .set_mapper_xmx("10G")
        )

context = H2OContext.getOrCreate(sc, conf)

Many times this works, but today I got:

Py4JJavaError: An error occurred while calling z:org.apache.spark.h2o.JavaH2OContext.getOrCreate.
: java.io.FileNotFoundException: notify_sparkling-water-bteeuwen_155098482 (No such file or directory)
at java.io.FileInputStream.open(Native Method)
at java.io.FileInputStream.<init>(FileInputStream.java:146)
at scala.io.Source$.fromFile(Source.scala:91)
at scala.io.Source$.fromFile(Source.scala:76)
at scala.io.Source$.fromFile(Source.scala:54)
at org.apache.spark.h2o.backends.external.ExternalH2OBackend.launchH2OOnYarn(ExternalH2OBackend.scala:75)
at org.apache.spark.h2o.backends.external.ExternalH2OBackend.init(ExternalH2OBackend.scala:109)
at org.apache.spark.h2o.H2OContext.init(H2OContext.scala:102)
at org.apache.spark.h2o.H2OContext$.getOrCreate(H2OContext.scala:279)
at org.apache.spark.h2o.H2OContext.getOrCreate(H2OContext.scala)
at org.apache.spark.h2o.JavaH2OContext.getOrCreate(JavaH2OContext.java:195)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:237)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:280)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:214)
at java.lang.Thread.run(Thread.java:745)

I reran the code, and it worked.
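
Since a plain rerun succeeded, the manual retry can be scripted until the underlying race is fixed. The following is only a workaround sketch, not a fix for the missing notify file: `call_with_retry` and its parameters are hypothetical names, not part of pysparkling, and it assumes that calling `H2OContext.getOrCreate` again after a failure is safe in your setup. It does not clean up anything a failed attempt may have left behind on YARN.

```python
import time


def call_with_retry(create_fn, attempts=3, delay_seconds=5.0, retry_on=(Exception,)):
    """Call create_fn() until it succeeds or the attempts run out.

    Hypothetical helper for illustration; not part of pysparkling.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return create_fn()
        except retry_on as error:
            last_error = error
            # Report every failure so intermittent errors stay visible.
            print("attempt %d of %d failed: %s" % (attempt, attempts, error))
            if attempt < attempts:
                time.sleep(delay_seconds)
    # All attempts failed: re-raise the last error unchanged.
    raise last_error


# Intended use with the snippet above (assumes sc and conf already exist):
# context = call_with_retry(lambda: H2OContext.getOrCreate(sc, conf))
```

The failure surfaces in Python as a `Py4JJavaError`, which the default `retry_on=(Exception,)` catches; passing a narrower exception tuple avoids retrying on unrelated errors.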

Status

Assignee

Jakub Hava

Reporter

Avkash Chauhan

CustomerVisible

No

Support Assessment

Platform Issue

Customer Request Type

Support Incident

Priority

Major