Ensure that we handle cluster non-availability in a decent fashion

Description

When the external cluster is stopped, we get an exception on the Spark side:

Exception in thread "Thread-22" java.net.ConnectException: Connection refused (Connection refused)
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:589)
at java.net.Socket.connect(Socket.java:538)
at sun.net.NetworkClient.doConnect(NetworkClient.java:180)
at sun.net.www.http.HttpClient.openServer(HttpClient.java:463)
at sun.net.www.http.HttpClient.openServer(HttpClient.java:558)
at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:698)
at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1587)
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1492)
at java.net.URL.openStream(URL.java:1045)
at scala.io.Source$.fromURL(Source.scala:141)
at scala.io.Source$.fromURL(Source.scala:131)
at org.apache.spark.h2o.utils.H2OContextRestAPIUtils$class.getCloudInfoFromNode(H2OContextRestAPIUtils.scala:49)
at org.apache.spark.h2o.H2OContext$H2OContextRestAPIBased.getCloudInfoFromNode(H2OContext.scala:417)
at org.apache.spark.h2o.utils.H2OContextRestAPIUtils$class.getCloudInfo(H2OContextRestAPIUtils.scala:57)
at org.apache.spark.h2o.H2OContext$H2OContextRestAPIBased.getCloudInfo(H2OContext.scala:417)
at org.apache.spark.h2o.H2OContext$H2OContextRestAPIBased.getCloudInfo(H2OContext.scala:425)
at org.apache.spark.h2o.H2OContext$H2OContextRestAPIBased.getSparklingWaterHeartBeatEvent(H2OContext.scala:439)
at org.apache.spark.h2o.H2OContext$$anon$1.run(H2OContext.scala:71)
Exception in thread "Thread-20" java.net.ConnectException: Connection refused (Connection refused)

This exception is thrown by the background thread that periodically checks the status of the cluster; it can no longer reach the cluster because the cluster is dead. A sketch of a more graceful handling follows below.
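
A minimal sketch of how the heartbeat thread could handle an unreachable cluster instead of dying with an uncaught ConnectException. The names ClusterHeartbeat and getClusterStatus are hypothetical stand-ins for the real heartbeat thread and the REST call (getCloudInfo in the stack trace), not the actual Sparkling Water API:

import java.net.ConnectException

// Hypothetical heartbeat thread; getClusterStatus stands in for the REST call
// to the external H2O cluster and is an assumption, not the real API.
class ClusterHeartbeat(checkIntervalMillis: Long, getClusterStatus: () => Unit) extends Thread {
  setDaemon(true)

  override def run(): Unit = {
    var clusterReachable = true
    while (clusterReachable) {
      try {
        getClusterStatus() // REST call to the external H2O cluster
        Thread.sleep(checkIntervalMillis)
      } catch {
        case _: ConnectException =>
          // Instead of letting the uncaught exception kill the thread,
          // log a clear message and stop the heartbeat gracefully.
          System.err.println(
            "External H2O cluster is unreachable; stopping the Sparkling Water heartbeat. " +
              "Please verify that the external cluster is still running.")
          clusterReachable = false
      }
    }
  }
}

The key point is that the connection failure is caught inside the loop and reported as a clear, user-facing message rather than surfacing as a raw stack trace from a worker thread.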

Environment

None

Status

Assignee

Jakub Hava

Reporter

Jakub Hava

Labels

None

Release Priority

None

CustomerVisible

No

testcase 1

None

testcase 2

None

testcase 3

None

h2ostream link

None

Affected Spark version

None

AffectedContact

None

AffectedCustomers

None

AffectedPilots

None

AffectedOpenSource

None

Support Assessment

None

Customer Request Type

None

Support ticket URL

None

End date

None

Baseline start date

None

Baseline end date

None

Task progress

None

Task mode

None

Priority

Major