Crash when running GLRM on a dataset

Description

I was using GLRM to fill in some missing data, and it was working great.
Then I got a new dataset, and now I get the following crash:

OSError Traceback (most recent call last)
<ipython-input-10-bc29255b24c7> in <module>
96 predictors = h2o_frame.columns[:-1]
97 print("Training GLRM model")
---> 98 glrm_model.train(x = predictors, training_frame = h2o_frame)
99 print("Creating predictor")
100 pred = glrm_model.predict(h2o_frame)

d:\opt\python\lib\site-packages\h2o\estimators\estimator_base.py in train(self, x, y, training_frame, offset_column, fold_column, weights_column, validation_frame, max_runtime_secs, ignored_columns, model_id, verbose)
113 validation_frame=validation_frame, max_runtime_secs=max_runtime_secs,
114 ignored_columns=ignored_columns, model_id=model_id, verbose=verbose)
--> 115 self._train(parms, verbose=verbose)
116
117 def train_segments(self, x=None, y=None, training_frame=None, offset_column=None, fold_column=None,

d:\opt\python\lib\site-packages\h2o\estimators\estimator_base.py in _train(self, parms, verbose)
200 return
201
--> 202 job.poll(poll_updates=self._print_model_scoring_history if verbose else None)
203 model_json = h2o.api("GET /%d/Models/%s" % (rest_ver, job.dest_key))["models"][0]
204 self._resolve_model(job.dest_key, model_json)

d:\opt\python\lib\site-packages\h2o\job.py in poll(self, poll_updates)
76 if (isinstance(self.job, dict)) and ("stacktrace" in list(self.job)):
77 raise EnvironmentError("Job with key {} failed with an exception: {}\nstacktrace: "
---> 78 "\n{}".format(self.job_key, self.exception, self.job["stacktrace"]))
79 else:
80 raise EnvironmentError("Job with key %s failed with an exception: %s" % (self.job_key, self.exception))

OSError: Job with key $03017f00000132d4ffffffff$_b17d8e259e3f4f08372a61296884714 failed with an exception: java.lang.NullPointerException
stacktrace:
java.lang.NullPointerException
at hex.svd.SVD$SVDDriver.computeImpl(SVD.java:813)
at hex.ModelBuilder$Driver.compute2(ModelBuilder.java:248)
at hex.ModelBuilder.trainModelNested(ModelBuilder.java:399)
at hex.glrm.GLRM$GLRMDriver.initialXY(GLRM.java:460)
at hex.glrm.GLRM$GLRMDriver.computeImpl(GLRM.java:782)
at hex.ModelBuilder$Driver.compute2(ModelBuilder.java:248)
at water.H2O$H2OCountedCompleter.compute(H2O.java:1557)
at jsr166y.CountedCompleter.exec(CountedCompleter.java:468)
at jsr166y.ForkJoinTask.doExec(ForkJoinTask.java:263)
at jsr166y.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:974)
at jsr166y.ForkJoinPool.runWorker(ForkJoinPool.java:1477)
at jsr166y.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:104)

I'm running this from a Jupyter Notebook, using Python. The relevant code is below:

print("Creating H2O dataframe")
h2o_frame = h2o.H2OFrame(toPredict[col])

print("Building GLRM model")
glrm_model = h2o.estimators.H2OGeneralizedLowRankEstimator(
    k=5,
    transform="NONE",
    init="SVD",
    loss="Quadratic",
    regularization_x="None",
    regularization_y="None",
    max_iterations=1000)

predictors = h2o_frame.columns[:-1]
print("Training GLRM model")
glrm_model.train(x=predictors, training_frame=h2o_frame)

toPredict is just a pandas DataFrame, and col is a list of the columns I want to process (the failure occurs regardless of which columns I select).
The crash happens on the training step.
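For context, the stack trace shows the NullPointerException coming from the SVD step used by init="SVD". As a sketch of the math that step is based on (this is a plain numpy illustration of low-rank approximation, not H2O's actual implementation, and the toy matrix is hypothetical):

```python
import numpy as np

# Toy data: a 6x4 matrix built to have exact rank 2 (hypothetical,
# just to illustrate the low-rank idea behind GLRM's SVD init).
rng = np.random.default_rng(0)
X = rng.standard_normal((6, 2)) @ rng.standard_normal((2, 4))

# Truncated SVD: keep only the top-k singular triplets.
k = 2
U, s, Vt = np.linalg.svd(X, full_matrices=False)
X_k = (U[:, :k] * s[:k]) @ Vt[:k, :]

# Because X is exactly rank 2, the rank-2 reconstruction recovers it.
print(np.allclose(X, X_k))  # True
```

GLRM starts from a factorization like this and then iterates with the chosen losses and regularizers, which is why a failure inside the SVD initialization aborts training before any GLRM iterations run.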

Further experiments led me to think this might be memory-related: if I cut the dataset down from the current 3 million rows to 50,000, it works.
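A rough back-of-envelope estimate of the frame size (the 3 million row count is from my dataset; the column count here is a hypothetical placeholder, substitute your own):

```python
# Dense working-set estimate for the raw training frame alone,
# ignoring GLRM's X/Y factor matrices and any SVD workspace.
n_rows = 3_000_000
n_cols = 20          # hypothetical placeholder column count
bytes_per_cell = 8   # one double

frame_bytes = n_rows * n_cols * bytes_per_cell
print(f"{frame_bytes / 2**30:.2f} GiB")  # ~0.45 GiB for the raw frame
```

The raw frame alone wouldn't obviously exceed a typical heap, but if any columns are high-cardinality categoricals, the SVD initialization works on an expanded representation, so the real footprint can be far larger than this estimate.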

If you want to reproduce with the original dataset, you can download it from https://www.dropbox.com/s/6svm09d0kdq9i7a/movie_list.tsv.zip?dl=0 (it's 84 MB; I'll keep it live as long as possible).

If this is memory-related, I'd recommend making the error message reflect that, since it's difficult to guess from the current error.

Thanks and keep up the good work!

Assignee

New H2O Bugs

Fix versions

None

Reporter

Diogo Andrade

Support ticket URL

None

Labels

Affected Spark version

None

Customer Request Type

None

Task progress

None

CustomerVisible

Yes

Affects versions

Priority

Major