GBM ModelMetrics, airlines_all (8 machines): *** Attempting to block on task (class water.TaskGetKey) with equal or lower priority. Can lead to deadlock! 122 <= 122

Description

Thought I'd try some multi-machine testing.

I did a git clone on mr-0xd10 and built, so it's head of master

can run this from any machine as it copies the jars to the machines (mr-0xd2 thru mr-0xd10)

(One warning: since I use h2o.py, you have to uninstall any h2o Python package you have installed. I probably need to rename my h2o.py.)

using airlines_all from the usual /home/0xdiag/datasets on each machine

It seems to get past the training: the progress advances to 1.0 while polling.

I ran it twice; it failed both times.

The last h2o request is ModelMetrics (it finished training, then did Models.json, then Frames.json, then ModelMetrics.json)

2015-02-25 01:37:53.805546 – Start http://172.16.2.189:54321/3/ModelMetrics.json/models/GBMModelKey/frames/airlines_all.hex # None;

not sure if it does the same thing with fewer machines.
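
To try the same request against a smaller cluster, the ModelMetrics URL from the log line above can be rebuilt for any node. This is only a sketch: the helper name `model_metrics_url` and its parameters are made up for illustration and are not part of h2o.py, and the log line does not show which HTTP method is used, so the sketch only builds the URL.

```python
from urllib.parse import quote


def model_metrics_url(host, model_key, frame_key, port=54321, api_version=3):
    """Build the ModelMetrics URL in the shape seen in the log line above.

    Hypothetical helper, not part of h2o.py. The path layout is copied
    from the logged request; nothing else about the endpoint is assumed.
    """
    return "http://%s:%d/%d/ModelMetrics.json/models/%s/frames/%s" % (
        host,
        port,
        api_version,
        quote(model_key, safe=""),   # escape anything unsafe in a path segment
        quote(frame_key, safe=""),
    )


# Reproduces the URL of the last request before the failure.
url = model_metrics_url("172.16.2.189", "GBMModelKey", "airlines_all.hex")
print(url)
```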

cd h2o-dev/py2/testdir_single_jvm
python test_GBM_airlines.py -cj ../testdir_hosts/pytest_config-182-190.json

======================================================================
ERROR: test_GBM_airlines (__main__.Basic)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_GBM_airlines.py", line 8, in tearDown
    h2o.check_sandbox_for_errors()
  File "../h2o_test.py", line 254, in check_sandbox_for_errors
    python_test_name=python_test_name)
  File "../h2o_sandbox.py", line 289, in check_sandbox_for_errors
    raise Exception(errorMessage)
Exception: check_sandbox_for_errors: Errors in sandbox stdout or stderr (or R stdout/stderr).
Could have occurred at any prior time

water.DException$DistributedException: from /172.16.2.187:54321; by class water.KeySnapshot$GlobalUKeySetTask; class java.lang.AssertionError: *** Attempting to block on task (class water.TaskGetKey) with equal or lower priority. Can lead to deadlock! 122 <= 122
at water.RPC.get(RPC.java:252)
at water.TaskGetKey.get(TaskGetKey.java:28)
02-25 01:29:55.792 172.16.2.186:54321 27724 # Session WARN: Caught exception: water.DException$DistributedException: from /172.16.2.186:54321; by class water.KeySnapshot$GlobalUKeySetTask; class water.DException$DistributedException: from /172.16.2.187:54321; by class water.KeySnapshot$GlobalUKeySetTask; class java.lang.AssertionError: *** Attempting to block on task (class water.TaskGetKey) with equal or lower priority. Can lead to deadlock! 122 <= 122; Stacktrace: [water.MRTask.getResult(MRTask.java:265), water.MRTask.doAll(MRTask.java:295), water.MRTask.doAllNodes(MRTask.java:287), water.KeySnapshot.globalSnapshot(KeySnapshot.java:234), water.KeySnapshot.globalSnapshot(KeySnapshot.java:221), water.api.ModelMetricsHandler$ModelMetricsList.fetch(ModelMetricsHandler.java:22), water.api.ModelMetricsHandler.fetch(ModelMetricsHandler.java:142), water.api.ModelMetricsHandler.score(ModelMetricsHandler.java:155), sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method), sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57), sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43), java.lang.reflect.Method.invoke(Method.java:606), water.api.Handler.handle(Handler.java:57), water.api.RequestServer.handle(RequestServer.java:602), water.api.RequestServer.serve(RequestServer.java:560), water.NanoHTTPD$HTTPSession.run(NanoHTTPD.java:433), java.lang.Thread.run(Thread.java:745)]
at water.DKV.get(DKV.java:210)
at water.DKV.get(DKV.java:168)
at water.Key.get(Key.java:84)
at water.fvec.Frame.vecs_impl(Frame.java:246)
at water.fvec.Frame.vecs(Frame.java:232)
at water.fvec.Frame.anyVec(Frame.java:208)
at water.KeySnapshot$KeyInfo.<init>(KeySnapshot.java:52)
at water.KeySnapshot.localSnapshot(KeySnapshot.java:212)
at water.KeySnapshot$GlobalUKeySetTask.setupLocal(KeySnapshot.java:249)
at water.MRTask.setupLocal0(MRTask.java:339)
at water.MRTask.dinvoke(MRTask.java:282)
at water.RPC$RPCCall.compute2(RPC.java:333)
at water.H2O$H2OCountedCompleter.compute(H2O.java:582)
at jsr166y.CountedCompleter.exec(CountedCompleter.java:429)
at jsr166y.ForkJoinTask.doExec(ForkJoinTask.java:263)
at jsr166y.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:974)
at jsr166y.ForkJoinPool.runWorker(ForkJoinPool.java:1477)
at jsr166y.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:104)
java.lang.AssertionError
at water.AutoBuffer.<init>(AutoBuffer.java:132)
at water.RPC.response(RPC.java:572)
at water.UDPAck.call(UDPAck.java:17)
at water.FJPacket.compute2(FJPacket.java:21)
at water.H2O$H2OCountedCompleter.compute(H2O.java:582)
at jsr166y.CountedCompleter.exec(CountedCompleter.java:429)
at jsr166y.ForkJoinTask.doExec(ForkJoinTask.java:263)
at jsr166y.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:974)
at jsr166y.ForkJoinPool.runWorker(ForkJoinPool.java:1477)
at jsr166y.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:104)

----------------------------------------------------------------------
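
For anyone reading the trace cold: GlobalUKeySetTask.setupLocal ends up in Frame.anyVec, which does a DKV.get, which blocks on a TaskGetKey RPC, and the assertion fires because both priorities are 122. Below is a toy model of that kind of guard, not H2O's actual code. The function name `check_can_block`, the argument order, and which side of "122 <= 122" is the caller are all assumptions. The usual reason for such a rule is that if every worker at some priority blocks waiting on work at the same priority, nobody is left to run that work.

```python
def check_can_block(current_priority, task_priority, task_name):
    """Toy guard: only allow blocking on a task of strictly higher priority.

    Hypothetical illustration. The message text is copied from the report;
    the order of the two numbers in it is an assumption.
    """
    if task_priority <= current_priority:
        raise AssertionError(
            "*** Attempting to block on task (%s) with equal or lower priority. "
            "Can lead to deadlock! %d <= %d"
            % (task_name, task_priority, current_priority)
        )


# Same priorities as in the report: the guard trips.
try:
    check_can_block(122, 122, "class water.TaskGetKey")
    tripped = False
except AssertionError as e:
    tripped = True
    message = str(e)

# A caller at a lower priority is allowed to block on the same task.
check_can_block(121, 122, "class water.TaskGetKey")
```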

Assignee

New H2O Bugs

Reporter

Kevin Normoyle

CustomerVisible

No

Priority

Major