Running rbind and unique operations multiple times causes the JVM to crash

Description

On H2O 3.26.0.6:

library(h2o)
h2o.init()

#Path to synthetic data
file = "/temp/synthetic_190M_rows_py_export.csv"

#Create synthetic data with 190M rows
synth <- h2o.createFrame(
  rows = 190000000,
  cols = 1,
  randomize = TRUE,
  value = 0,
  real_range = 100,
  categorical_fraction = 0,
  factors = 100,
  integer_fraction = 1,
  integer_range = 999999999999999,
  binary_fraction = 0,
  binary_ones_fraction = 0,
  time_fraction = 0,
  string_fraction = 0,
  missing_fraction = 0,
  response_factors = 2,
  has_response = FALSE,
  seed = 0
)

#Export synthetic data
h2o.exportFile(synth, path = file)

#Base dataframe
bindrow = h2o.importFile(path = file)

#pass 1
df1 = h2o.importFile(path = file)
bindrow = h2o.rbind(bindrow, df1)
h2o.nrow(bindrow)
bindrow = h2o.unique(bindrow[, 1])
h2o.nrow(bindrow)

#pass 2
df1 = h2o.importFile(path = file)
bindrow = h2o.rbind(bindrow, df1)
h2o.nrow(bindrow)
bindrow = h2o.unique(bindrow[, 1])
h2o.nrow(bindrow)

#pass 3
df1 = h2o.importFile(path = file)
bindrow = h2o.rbind(bindrow, df1)
h2o.nrow(bindrow)
bindrow = h2o.unique(bindrow[, 1])
h2o.nrow(bindrow)

#pass 4
df1 = h2o.importFile(path = file)
bindrow = h2o.rbind(bindrow, df1)
h2o.nrow(bindrow)
bindrow = h2o.unique(bindrow[, 1])
h2o.nrow(bindrow)

#pass 5
df1 = h2o.importFile(path = file)
bindrow = h2o.rbind(bindrow, df1)
h2o.nrow(bindrow)
bindrow = h2o.unique(bindrow[, 1])
h2o.nrow(bindrow)

#pass 6
df1 = h2o.importFile(path = file)
bindrow = h2o.rbind(bindrow, df1)
h2o.nrow(bindrow)
bindrow = h2o.unique(bindrow[, 1])
h2o.nrow(bindrow)

#pass 7
df1 = h2o.importFile(path = file)
bindrow = h2o.rbind(bindrow, df1)
h2o.nrow(bindrow)
bindrow = h2o.unique(bindrow[, 1])
h2o.nrow(bindrow)

#pass 8
df1 = h2o.importFile(path = file)
bindrow = h2o.rbind(bindrow, df1)
h2o.nrow(bindrow)
bindrow = h2o.unique(bindrow[, 1])
h2o.nrow(bindrow)

#pass 9
df1 = h2o.importFile(path = file)
bindrow = h2o.rbind(bindrow, df1)
h2o.nrow(bindrow)
bindrow = h2o.unique(bindrow[, 1])
h2o.nrow(bindrow)

#pass 10
df1 = h2o.importFile(path = file)
bindrow = h2o.rbind(bindrow, df1)
h2o.nrow(bindrow)
bindrow = h2o.unique(bindrow[, 1])
h2o.nrow(bindrow)

#pass 11
df1 = h2o.importFile(path = file)
bindrow = h2o.rbind(bindrow, df1)
h2o.nrow(bindrow)
bindrow = h2o.unique(bindrow[, 1])
h2o.nrow(bindrow)

Environment

None

Status

Assignee

Unassigned

Fix versions

None

Reporter

Nidhi Mehta

Support ticket URL

Labels

None

Release Priority

None

Affected Spark version

None

Customer Request Type

None

Task progress

None

CustomerVisible

No

Components

Priority

Major