Too many files when spilling keys; spilling also seems slow

Description

To create large amounts of data quickly, the test creates one file, then creates symlinks to it in the syn_datasets dir, then parses them using a pattern.
I don't know if the test works on Windows, because I use Linux commands for the symlink creation (from within Python).
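
The symlink step could also be done with Python's own os.symlink instead of shelling out to a Linux command, which might make the test less platform-specific (on Windows, creating symlinks can still require extra privileges). A minimal sketch, assuming Python 3; this is not the code in the test, and make_symlinks, the prefix, and the .csv suffix are hypothetical names chosen for illustration:

```python
import os

def make_symlinks(source_path, link_dir, count, prefix="syn_link_"):
    """Create `count` symlinks in link_dir that all point at source_path."""
    os.makedirs(link_dir, exist_ok=True)
    target = os.path.abspath(source_path)
    links = []
    for i in range(count):
        link_path = os.path.join(link_dir, "%s%d.csv" % (prefix, i))
        # Replace a stale link left over from a previous run.
        if os.path.lexists(link_path):
            os.remove(link_path)
        os.symlink(target, link_path)
        links.append(link_path)
    return links
```

A parse pattern such as syn_link_*.csv would then match every link while only one real file exists on disk.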

I also loop, reusing the source files to create more parsed keys (the loop is currently restricted to 2 iterations).

I also restrict the heap a bit.

All of this helps trigger spilling to ice. To run the test:
cd testdir_single_jvm
python test_parse_500_cols_spill.py

The observations below are from running this test and interrupting it once it got slow.

Using iotop, I saw only about 5 MB/sec of write I/O bandwidth while spilling.
I suspect that will hurt us eventually. (We should have a spilling benchmark test.)

I also updated the sandbox cleanup in h2o.py to print a message when it removes a prior sandbox, and to report how long the removal took.
This was after a partial test run (135 secs to remove the directory):
Removing sandbox (if slow, might be old ice dir spill files)
[2014-10-08 15:07:54.780783] Took 135.284460068 secs to remove sandbox
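
For reference, a timed cleanup of this kind can be sketched as follows. This is an illustration of the idea, not the actual change in h2o.py; the function name clean_sandbox is hypothetical, and the messages only mirror the output above:

```python
import datetime
import os
import shutil
import time

def clean_sandbox(sandbox_dir):
    """Remove sandbox_dir if it exists; return elapsed seconds (0.0 if absent)."""
    if not os.path.exists(sandbox_dir):
        return 0.0
    print("Removing sandbox (if slow, might be old ice dir spill files)")
    start = time.time()
    shutil.rmtree(sandbox_dir)
    elapsed = time.time() - start
    # Timestamped so slow removals are easy to spot in test logs.
    print("[%s] Took %s secs to remove sandbox" % (datetime.datetime.now(), elapsed))
    return elapsed
```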

kevin@Kevin-Ubuntu4:~/h2o/py/testdir_single_jvm/sandbox/ice.W5qa_K/ice54321$ time ls -R * > /dev/null

real 0m6.900s
user 0m6.260s
sys 0m0.628s
kevin@Kevin-Ubuntu4:~/h2o/py/testdir_single_jvm/sandbox/ice.W5qa_K/ice54321$ ls -R * | wc -l
651847

Eventually you can hit OS limits on the number of files in a directory.
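
To see how close a single directory is to such a limit, the files can be counted per directory rather than in total. A minimal sketch using os.walk; count_files is a hypothetical helper, not part of the test harness. Note that `ls -R * | wc -l` also counts directory header lines and blank lines, so its number will not exactly match a file count:

```python
import os

def count_files(root):
    """Return (total_files, largest_dir, files_in_largest_dir) under root."""
    total = 0
    largest_dir, largest_count = root, 0
    for dirpath, dirnames, filenames in os.walk(root):
        total += len(filenames)
        # Track the single directory holding the most files.
        if len(filenames) > largest_count:
            largest_dir, largest_count = dirpath, len(filenames)
    return total, largest_dir, largest_count
```

Running it on the ice directory would show whether the spill files are spread across subdirectories or concentrated in one.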

Assignee

New H2O Bugs

Reporter

Kevin Normoyle

Labels

None

CustomerVisible

No

Priority

Major