Double-quoted integers are not parsed as enum

Description

The parser does not treat double quotes as a marker for categorical columns.

Let's say we have a categorical column with levels "0", "1", "2", "3", "4" and "other", created by h2o.interaction on an integer column that was converted with as.factor, keeping only the most frequent levels.

When we write that frame to disk and then load it back in, this column is parsed as integers 0 to 4, with a missing value in place of "other". It should instead have kept its enum type, since all values were wrapped in double quotes.
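To make the expected behavior concrete, here is a minimal Python sketch of a quote-aware type guess next to a model of the reported symptom. It is not H2O's parser code; the function names infer_column_type and parse_ignoring_quotes are hypothetical, and the rule "every value is double-quoted, so the column is enum" is an assumption taken from the description above.

```python
def is_int(token):
    """Return True if the token parses as an integer."""
    try:
        int(token)
        return True
    except ValueError:
        return False


def is_quoted(token):
    """Return True if the raw token is wrapped in double quotes."""
    return len(token) >= 2 and token.startswith('"') and token.endswith('"')


def infer_column_type(raw_tokens):
    """Hypothetical quote-aware type guess over raw, unparsed CSV tokens."""
    tokens = [t.strip() for t in raw_tokens if t.strip() != ""]
    if not tokens:
        return "unknown"
    # Assumed rule: a column whose values are all double-quoted stays categorical.
    if all(is_quoted(t) for t in tokens):
        return "enum"
    if all(is_int(t) for t in tokens):
        return "int"
    return "enum"


def parse_ignoring_quotes(raw_tokens):
    """Model of the reported symptom: quotes are dropped, non-numeric values go missing."""
    values = []
    for t in raw_tokens:
        stripped = t.strip().strip('"')
        values.append(int(stripped) if is_int(stripped) else None)
    return values


# Raw tokens as they would appear in the file written to disk.
column = ['"0"', '"1"', '"2"', '"3"', '"4"', '"other"']
print(infer_column_type(column))      # expected type for this column
print(parse_ignoring_quotes(column))  # what the reload effectively produces
```

With the quote-aware rule the column is classified as enum, while the quote-ignoring path yields the integers 0 to 4 and a missing value for "other", which matches the behavior reported here.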

Assignee

Tomas Nykodym

Reporter

Arno Candel

Labels

None

CustomerVisible

Yes

Components

Priority

Major