To create large amounts of data quickly, I create a file in the test, then create symlinks to it in the syn_datasets dir, then parse them all using a pattern.
I don't know if the test works on Windows, because I use Linux commands for the symlink creation (from within Python).
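A minimal sketch of the symlink step using os.symlink instead of shelling out to Linux commands, which may be more portable. This is not the code in the test. The names make_symlinks, matching_files and the "syn_link_" prefix are hypothetical, and it assumes Python 3. On Windows, os.symlink may still need extra privileges, so this is untested there.

```python
import glob
import os


def make_symlinks(source_path, link_dir, count, prefix="syn_link_"):
    """Create `count` symlinks in link_dir that all point at source_path."""
    os.makedirs(link_dir, exist_ok=True)
    source_path = os.path.abspath(source_path)
    links = []
    for i in range(count):
        link_path = os.path.join(link_dir, "%s%d.csv" % (prefix, i))
        # Replace a stale link left over from an earlier run.
        if os.path.lexists(link_path):
            os.remove(link_path)
        os.symlink(source_path, link_path)
        links.append(link_path)
    return links


def matching_files(link_dir, pattern):
    """Return the files in link_dir that a parse pattern would pick up."""
    return sorted(glob.glob(os.path.join(link_dir, pattern)))
```

Each link costs almost no disk space, so one source file can stand in for many inputs to a pattern parse such as "syn_link_*.csv".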
I also loop, reusing the source files to create more parsed keys (I restrict the loop to 2 iterations for now).
I also restrict the heap a bit.
All of this helps cause spilling to ice.
This is after trying (interrupting once it got slow) this test
Using iotop, I saw only about 5 MB/sec of write IO bandwidth when spilling.
I suspect that will hurt us eventually (we should have a spilling benchmark test).
I also updated the sandbox cleanup in h2o.py to print a message when it's removing a prior sandbox, and to print how long the removal took when it's done.
This was after a partial test run (135 secs to rm the directory):
Removing sandbox (if slow, might be old ice dir spill files)
[2014-10-08 15:07:54.780783] Took 135.284460068 secs to remove sandbox
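A rough sketch of what a timed sandbox removal like this could look like. It is not the actual h2o.py code: the function name clean_sandbox and its return value are assumptions, and only the two printed messages are taken from the output above.

```python
import datetime
import os
import shutil
import time


def clean_sandbox(sandbox_dir):
    """Remove a prior sandbox dir and report how long the removal took.

    Returns the elapsed seconds, or None if there was nothing to remove.
    """
    if not os.path.exists(sandbox_dir):
        return None
    print("Removing sandbox (if slow, might be old ice dir spill files)")
    start = time.time()
    shutil.rmtree(sandbox_dir, ignore_errors=True)
    elapsed = time.time() - start
    print("[%s] Took %s secs to remove sandbox" % (datetime.datetime.now(), elapsed))
    return elapsed
```

Printing the message before the removal starts is the useful part: a run that sits silently for two minutes looks hung otherwise.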
kevin@Kevin-Ubuntu4:~/h2o/py/testdir_single_jvm/sandbox/ice.W5qa_K/ice54321$ time ls -R * > /dev/null
kevin@Kevin-Ubuntu4:~/h2o/py/testdir_single_jvm/sandbox/ice.W5qa_K/ice54321$ ls -R * | wc -l
Eventually you can hit OS limits on the number of files in a directory.
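To keep an eye on that from within a test, something like the following could count the spill files under an ice dir, roughly what the `ls -R * | wc -l` check above is after. It counts only regular file entries, so the number will not match the ls output exactly. The name count_files is hypothetical.

```python
import os


def count_files(root):
    """Walk root and return (total files, fullest directory, files in it)."""
    total = 0
    largest_dir, largest_count = root, 0
    for dirpath, _dirnames, filenames in os.walk(root):
        total += len(filenames)
        # Track the single directory holding the most files, since
        # per-directory limits are what get hit first.
        if len(filenames) > largest_count:
            largest_dir, largest_count = dirpath, len(filenames)
    return total, largest_dir, largest_count
```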