Performance metrics when using balance_classes

Description

Motivation

When balance_classes is used in H2O modeling, the training metrics stored with the model will not match the metrics obtained by scoring the model on the original training frame.

An example is shown below:

# Build model with balanced classes
gbm_model <- h2o.gbm(x, y, training_frame, balance_classes = TRUE)
# Training metrics stored with the model
h2o.performance(gbm_model, train = TRUE)@metrics$AUC
[1] 0.7248632
# Metrics from scoring the original training frame
h2o.performance(gbm_model, newdata = training_frame)@metrics$AUC
[1] 0.7248632

Solution

When balance_classes is enabled in H2O modeling, the model is built on a balanced version of the training data frame. The performance metrics constructed during training are based on this balanced version of the training data frame. Therefore, when performance is calculated on the unbalanced training data frame, the metrics will be different.

To determine the performance metrics on the unbalanced training data frame, score the model against that frame directly:

h2o.performance(gbm_model, newdata = training_frame)
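
As a rough illustration of the comparison described in this section, the sketch below trains a GBM with balance_classes enabled and prints both AUC values side by side. The synthetic data frame, the column names x1, x2, and y, and the seed are assumptions made up for this example and are not part of the original report. It needs a running H2O cluster, and the exact numbers will vary with the data.

```r
library(h2o)
h2o.init()

# Hypothetical imbalanced data set (roughly 10% positives); not from the ticket
set.seed(42)
n <- 2000
df <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
df$y <- factor(ifelse(df$x1 + rnorm(n) > 1.8, "yes", "no"))
training_frame <- as.h2o(df)

# Build the model with balanced classes
gbm_model <- h2o.gbm(
  x = c("x1", "x2"),
  y = "y",
  training_frame = training_frame,
  balance_classes = TRUE,
  seed = 42
)

# AUC stored with the model during training
auc_training <- h2o.auc(h2o.performance(gbm_model, train = TRUE))

# AUC from scoring the original, unbalanced training frame
auc_original <- h2o.auc(h2o.performance(gbm_model, newdata = training_frame))

print(c(training = auc_training, original_frame = auc_original))
```

Reporting the second value is usually the safer choice when the goal is to describe performance on the data as it was actually collected.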

Assignee

Megan Kurka

Reporter

Megan Kurka

Priority

Major