Fix intermittent NPE in PySparkling with rollups on external backend

Description

Remote rollups failed with an exception, wrapping and rethrowing: DistributedException from d0142492d774/192.168.254.5:54321: 'null', caused by java.lang.NullPointerException

 

java.lang.RuntimeException: DistributedException from 64c330808c10/192.168.254.4:54321: 'null', caused by java.lang.NullPointerException
    at water.Futures.blockForPending(Futures.java:88)
    at water.fvec.Frame.bulkRollups(Frame.java:513)
    at water.fvec.Frame.means(Frame.java:531)
    at hex.Model.makeBigScoreTask(Model.java:1575)
    at hex.Model.predictScoreImpl(Model.java:1598)
    at hex.deeplearning.DeepLearningModel.predictScoreImpl(DeepLearningModel.java:564)
    at hex.Model.score(Model.java:1454)
    at hex.Model.score(Model.java:1438)
    at hex.Model.score(Model.java:1394)
    at org.apache.spark.examples.h2o.HamOrSpamDemo$.isSpam(HamOrSpamDemo.scala:184)
    at org.apache.spark.examples.h2o.HamOrSpamDemo$$anonfun$main$2.apply(HamOrSpamDemo.scala:98)
    at org.apache.spark.examples.h2o.HamOrSpamDemo$$anonfun$main$2.apply(HamOrSpamDemo.scala:95)
    at scala.collection.immutable.List.foreach(List.scala:381)
    at org.apache.spark.examples.h2o.HamOrSpamDemo$.main(HamOrSpamDemo.scala:95)
    at water.sparkling.itest.local.HamOrSpamDemoTest$$anonfun$main$1.apply$mcV$sp(HamOrSPamDemoSuite.scala:26)
    at water.sparkling.itest.IntegTestStopper$class.exitOnException(IntegTestHelper.scala:140)
    at water.sparkling.itest.local.HamOrSpamDemoTest$.exitOnException(HamOrSPamDemoSuite.scala:24)
    at water.sparkling.itest.local.HamOrSpamDemoTest$.main(HamOrSPamDemoSuite.scala:25)
    at water.sparkling.itest.local.HamOrSpamDemoTest.main(HamOrSPamDemoSuite.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:744)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:187)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:212)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:126)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: DistributedException from 64c330808c10/192.168.254.4:54321: 'null', caused by java.lang.NullPointerException
    at water.RPC.result(RPC.java:241)
    at water.RPC.get(RPC.java:253)
    at water.RPC.get(RPC.java:47)
    at water.Futures.waitAndCheckForException(Futures.java:30)
    at water.Futures.cleanCompleted(Futures.java:62)
    at water.Futures.add(Futures.java:47)
    at water.fvec.RollupStats.start(RollupStats.java:312)
    at water.fvec.Vec.startRollupStats(Vec.java:915)
    at water.fvec.Vec.startRollupStats(Vec.java:903)
    at water.fvec.Frame.bulkRollups(Frame.java:512)
    ... 26 more
Caused by: java.lang.NullPointerException
    at water.fvec.Vec.chunkForChunkIdx(Vec.java:1099)
    at water.MRTask.compute2(MRTask.java:619)
    at water.H2O$H2OCountedCompleter.compute1(H2O.java:1420)
    at water.fvec.RollupStats$Roll$Icer.compute1(RollupStats$Roll$Icer.java)
    at water.H2O$H2OCountedCompleter.compute(H2O.java:1416)
    at jsr166y.CountedCompleter.exec(CountedCompleter.java:468)
    at jsr166y.ForkJoinTask.doExec(ForkJoinTask.java:263)
    at jsr166y.ForkJoinPool$WorkQueue.popAndExecAll(ForkJoinPool.java:904)
    at jsr166y.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:977)
    at jsr166y.ForkJoinPool.runWorker(ForkJoinPool.java:1477)
    at jsr166y.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:104)

19/09/16 16:55:24 INFO spark.SparkContext: Invoking stop() from shutdown hook

19/09/16 16:55:24 INFO server.ServerConnector: Stopped Spark@54709809{HTTP/1.1}

Status

Assignee

Jakub Hava

Reporter

Jakub Hava

CustomerVisible

No

Fix versions

Priority

Major