Consider Apache Arrow serialization for H2O frames

Description

This was originally brought up in a comment on SW-556:

https://0xdata.atlassian.net/browse/SW-556?focusedCommentId=51240&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-51240

Spark 2.3 supports Apache Arrow serialization for DataFrames: https://issues.apache.org/jira/browse/SPARK-13534
Integration with Spark might be easier if H2O used Apache Arrow for its frames, and in the future this might make it possible to support Spark Dynamic Allocation.
Arrow was created to address this need for zero-copy data exchange between different frameworks: https://arrow.apache.org/
From the Arrow site: "All systems utilize the same memory format" and "No overhead for cross-system communication".
I don't know the H2O architecture, so I am probably oversimplifying here.
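
To make the "same memory format, no copy" idea concrete, here is a minimal sketch using only the Python standard library. It is not Arrow and not H2O code: the `array` column merely stands in for an Arrow-style contiguous column buffer, and the names `column`, `producer_view` and `consumer_view` are made up for illustration.

```python
from array import array

# One contiguous, typed column of doubles.
# Assumption: this stands in for an Arrow-style column buffer.
column = array("d", [1.5, 2.5, 3.5, 4.5])

# The "producer" exposes the buffer through the buffer protocol.
producer_view = memoryview(column)

# The "consumer" takes a slice of that view. Slicing a memoryview
# shares the underlying memory instead of copying it.
consumer_view = producer_view[1:3]

# A write through the original column is visible through the consumer's
# view, which shows that both sides read the same memory.
column[1] = 20.0

print(consumer_view.tolist())       # [20.0, 3.5]
print(consumer_view.obj is column)  # True
```

Because both sides agree on the layout (contiguous 8-byte doubles), the consumer needs no serialization or conversion step. Arrow aims to provide that kind of agreement across processes and languages.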

Thanks.

Assignee

Unassigned

Reporter

Ruslan Dautkhanov

Labels

CustomerVisible

No

Priority

Major