When YARN containers have a global limit, running XGBoost with other algos and setting extramempercent sometimes slows the performance of other algos

Description


Setting "extramempercent" statically partitions the memory the YARN container is allowed to use: part of it is reserved for the JVM, and the rest is available to XGBoost and other native processes running inside the same container. It is not possible to resize the heap of a running JVM, which means we cannot reclaim the "extra memory" when XGBoost is not running and use it for, e.g., GLRM.

This problem doesn't have a simple solution. As a workaround, you could start a separate cluster just for XGBoost; this way you wouldn't penalize your GLRM. A solution on our end could be to either:

- start separate containers just for XGBoost (as part of the same cluster): instead of drawing the line horizontally and taking memory out of each container, we would partition vertically and combine XGBoost-only nodes and H2O-only nodes in the same cluster; or
- start a completely separate cluster just for XGBoost, on demand.

Both approaches have pros and cons. The second is better from an architecture point of view, but it makes it harder to manage the lifecycle of the application, and logging is less straightforward. Both are quite complex to implement.
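To make the static split concrete, here is a minimal sketch of the arithmetic, assuming hypothetical example values (a 10 GB JVM heap and extramempercent of 20; the actual values depend on your deployment):

```python
# Static partition implied by extramempercent (assumed example values).
# The YARN container holds the JVM heap plus a fixed native-memory reserve.
mapper_xmx_gb = 10        # JVM heap requested for the H2O node (assumption)
extramempercent = 20      # percent of the heap reserved for native processes

container_gb = mapper_xmx_gb * (1 + extramempercent / 100)  # total YARN allocation
native_gb = container_gb - mapper_xmx_gb                    # off-heap, native-only share

print(f"container={container_gb:.1f} GB, "
      f"JVM heap={mapper_xmx_gb} GB, "
      f"native reserve={native_gb:.1f} GB")
```

The point of the ticket is that the native reserve (2 GB here) stays unavailable to JVM-based algorithms such as GLRM even while XGBoost is idle, because the JVM heap cannot be grown after startup.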

Environment

None

Status

Assignee

New H2O Bugs

Fix versions

None

Reporter

Nidhi Mehta

Support ticket URL

None

Labels

None

Release Priority

None

Affected Spark version

None

Customer Request Type

None

Task progress

None

CustomerVisible

No

Priority

Major