Long running H2O instances throwing HTTP 500 - Internal Server Error

Description

"I can't find much info in the documentation on how you are supposed to run H2O.
I downloaded the 2.8.0.1 zip file (I see now that there is a newer version) and started H2O on our server (Linux, CentOS 6.5) from the command line with -Xmx8g.

My colleagues are using H2O (creating a model, uploading datasets, ...) on a regular basis, but after a few days we get some strange errors.

First time it happened, we were unable to view the jobs list:
HTTP 500 - Internal Server Error
NullPointerException: null

At this point we also got "polling errors" from a python/R script.

I did a shutdown and then restarted H2O. It ran fine again for a few days, but now uploading a file fails, again with an internal server error.

What is causing this? (I will try to attach a log file if that helps.)

Are we not supposed to run it continuously like this?
What are the best practices for running H2O on a single server and keeping it available for remote users?"

"One more addition: if I go to Cluster Status in the web interface, under the table of nodes (one entry) there is the word "Locked".

The last job (parsing a dataset) was cancelled, but in the Water Meter I see high CPU on one bar and 50% on a second bar.
I have the impression that an infinite loop is running somewhere."
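For reference, a single-node launch like the one described (unpacked zip, 8 GB heap) typically looks like the sketch below; the jar location, cluster name, and port are illustrative assumptions, not details taken from the report:

```shell
# Launch a single-node H2O instance with an 8 GB Java heap.
# Assumes the distribution zip has been unpacked and h2o.jar is in
# the current directory; adjust the path, name, and port as needed.
java -Xmx8g -jar h2o.jar -name mycluster -port 54321
```

Running it this way in a foreground shell means the instance dies with the session; for a long-lived deployment it would normally be wrapped in nohup or a service manager.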

Assignee

New H2O Bugs

Reporter

SriSatish Ambati

Labels

None

CustomerVisible

No

testcase 1

None

testcase 2

None

testcase 3

None

h2ostream link

None

Affected Spark version

None

AffectedContact

None

AffectedCustomers

None

AffectedPilots

None

AffectedOpenSource

None

Support Assessment

None

Customer Request Type

None

Support ticket URL

None

End date

None

Baseline start date

None

Baseline end date

None

Task progress

None

Task mode

None

Priority

Major