Parse stays at 99% complete for a very long time

Description

A multi-file HDFS Parse sits at 99% complete, i.e. no new "data" keys are being created. There wasn't much actual memory pressure. The 90G dataset, spread over 8x40G nodes, produced a lot of small keys (sparse dataset, 15G post-parse, 15M rows).

Then, towards the end, the Cleaner thread (one thread per node) is pegged at 100%.
(Does the Cleaner thread at the end of Parse need to be asynchronous?)
Let's investigate whether that is what is holding up completion.
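
One way to confirm which thread is actually burning CPU is to read per-thread CPU time from the JVM. The following is a minimal sketch using the standard `java.lang.management` API, not H2O code: the class name `ThreadCpuSnapshot` is hypothetical, and it assumes it can be run inside the node's JVM (for an external process, a thread dump plus per-thread `top` output would serve the same purpose).

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of H2O: reports cumulative CPU time per thread name.
public final class ThreadCpuSnapshot {

    /** Returns thread name -> cumulative CPU time in milliseconds for live threads. */
    public static Map<String, Long> cpuMillisByThread() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        Map<String, Long> result = new LinkedHashMap<>();
        if (!mx.isThreadCpuTimeSupported()) {
            return result; // this JVM cannot measure per-thread CPU time
        }
        for (long id : mx.getAllThreadIds()) {
            ThreadInfo info = mx.getThreadInfo(id);
            long cpuNanos = mx.getThreadCpuTime(id);
            // info is null if the thread has died; -1 means timing is disabled or the thread is gone
            if (info == null || cpuNanos < 0) {
                continue;
            }
            // Threads sharing a name are summed together
            result.merge(info.getThreadName(), cpuNanos / 1_000_000L, Long::sum);
        }
        return result;
    }

    public static void main(String[] args) {
        // Print the ten threads with the most accumulated CPU time
        cpuMillisByThread().entrySet().stream()
            .sorted(Map.Entry.<String, Long>comparingByValue().reversed())
            .limit(10)
            .forEach(e -> System.out.printf("%-40s %d ms%n", e.getKey(), e.getValue()));
    }
}
```

Taking two snapshots a few seconds apart and comparing them would show whether the Cleaner thread's CPU time grows at roughly wall-clock rate while the Parse sits at 99%.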

Assignee

New H2O Bugs

Reporter

SriSatish Ambati

Labels

None

CustomerVisible

No

Priority

Major