GLM2 now dynamically recomputes thresholds, but the CMs are apparently not kept in sync. This leads to a mismatch between the thresholds and the CMs (when this happens, the displayed CM won't be right?)

Description

http://mr-0xd4:8080/job/testdir_single_jvm_3_of_5/1776/testReport/test_GLM2_params_rand2/Basic/test_GLM2_params_rand2/
This is a more direct error check (an assertion) on another case: thresholds (200) and cm (101) should be lists of the same size.
The failure below should reproduce using the seed:

Using random seed: 7055986311003396848

i.e.
cd testdir_single_jvm
python test_GLM2_params_rand2.py -s 7055986311003396848

On 11/04/2014 12:15 PM, Tomas Nykodym wrote:
> Thresholds are no longer a hardcoded list of 100 values as before, but are computed based on the actual predictions.
> I probably did not update the model with the latest thresholds.
> I’ll take a look but probably not before next week (I am on the way to Berlin) so if you can make a jira for that that would be great.
> thx!
>> On Nov 4, 2014, at 3:25 AM, Kevin <kevin@0xdata.com> wrote:
>> It seems like I'm getting a thresholds list that's a different length than the cms list
>> maybe thresholds is too long (191?). I'm using lambda_search=1 and higher_accuracy=1. I wasn't using those params before.
>>
>> I get this miscompare: (I just added a more direct compare of the lengths of cm and thresholds)
>>
>> assert best_index<len(cms), "%s %s" % (best_index, len(cms))
>> AssertionError: 191 101
>>
>>
>> I figure out best_index by comparing best_threshold to the values in the thresholds list
>> since best_threshold may not be in the list, I have to compare for >=
>>
>> best_threshold = validations['best_threshold']
>> thresholds = validations['thresholds']
>>
>> # FIX! best_threshold isn't necessarily in the list. jump out if >=
>> for i,t in enumerate(thresholds):
>>     if t >= best_threshold: # ends up using next one if not present
>>         best_index = i
>>         break
>>
>>
>> # cm = glm['glm_model']['submodels'][0]['validation']['_cms'][-1]
>> submodels = glm['glm_model']['submodels']
>> cms = submodels[0]['validation']['_cms']
>> assert best_index<len(cms), "%s %s" % (best_index, len(cms))
>>
>>
>> this was the glm
>>
>> [2014-11-04 01:29:25.366014] 2/GLM2 parameters: {'family': 'binomial', 'max_predictors': None, 'cols': None, 'n_folds': 1, 'use_all_factor_levels': 1, 'variable_importances': None, 'higher_accuracy': 1, 'ignored_cols_by_name': None, 'response': 54, 'source': 'B.hex', 'has_intercept': None, 'lambda_search': 1, 'destination_key': None, 'standardize': None, 'max_iter': None, 'lambda_min_ratio': None, 'alpha': 0.1, 'non_negative': None, 'beta_epsilon': None, 'nlambdas': None, 'tweedie_variance_power': None, 'ignored_cols': 'C1', 'prior': None, 'link': None, 'strong_rules_enabled': None, 'lambda': 0}
>>
>>
>> [2014-11-04 01:29:59.687584] FAIL
>> poll 0.81 http://172.16.2.187:54480/2/GLMProgress.json?job_key=GLM2Job__8baf8196cb2e170a8fa178c2b68346ec&destination_key=GLMModel__ae0dec60e3211464f1378e84d4eb44bd
>> [2014-11-04 01:30:01.712024] poll 0.84999996 http://172.16.2.187:54480/2/GLMProgress.json?job_key=GLM2Job__8baf8196cb2e170a8fa178c2b68346ec&destination_key=GLMModel__ae0dec60e3211464f1378e84d4eb44bd
>> [2014-11-04 01:30:03.719097] poll 0.89 http://172.16.2.187:54480/2/GLMProgress.json?job_key=GLM2Job__8baf8196cb2e170a8fa178c2b68346ec&destination_key=GLMModel__ae0dec60e3211464f1378e84d4eb44bd
>> [2014-11-04 01:30:05.728499] redirect 0.0 http://172.16.2.187:54480/2/GLMModelView.json?_modelKey=GLMModel__ae0dec60e3211464f1378e84d4eb44bd
>> [2014-11-04 01:30:07.738229] best_lambda_idx: 52
>> [2014-11-04 01:30:13.217396] lambda_max: 0.861145409408
>> [2014-11-04 01:30:13.217433] GLMModel/iterations: 110
>> [2014-11-04 01:30:13.217471] GLMModel/validations
>> [2014-11-04 01:30:13.217491] null_deviance: 16705.8235017
>> [2014-11-04 01:30:13.217540] residual_deviance: 11938.446495
>> [2014-11-04 01:30:13.217557] auc: 0.836865918485
>> [2014-11-04 01:30:13.217575] best_threshold: 0.21321565
>> [2014-11-04 01:30:13.217595] Now printing the right 'best_threshold' 0.21321565 from '_cms
>> [2014-11-04 01:30:13.217637]
>> ======================================================================
>> FAIL: test_GLM2_params_rand2 (test_GLM2_params_rand2.Basic)
>> ----------------------------------------------------------------------
>> Traceback (most recent call last):
>> File "/home4/jenkins/slave_dir/workspace/testdir_single_jvm_3_of_5/py/testdir_single_jvm/test_GLM2_params_rand2.py", line 82, in test_GLM2_params_rand2
>> h2o_glm.simpleCheckGLM(self, glm, None, **kwargs)
>> File "../h2o_glm.py", line 219, in simpleCheckGLM
>> assert best_index<len(cms), "%s %s" % (best_index, len(cms))
>> AssertionError: 191 101
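
A minimal sketch of the more direct check described above, for reference. It assumes that `thresholds`, `_cms` and `best_threshold` are read from the same validation dict, that there is meant to be one CM per threshold, and that thresholds are in ascending order (as the `>=` scan in the test implies). The helper names `find_best_index` and `check_thresholds_and_cms` are hypothetical and are not part of h2o_glm.py.

```python
def find_best_index(thresholds, best_threshold):
    """Return the index of the first threshold >= best_threshold, or None.

    best_threshold is not necessarily in the list, so the next larger
    value is used when there is no exact match.
    """
    for i, t in enumerate(thresholds):
        if t >= best_threshold:
            return i
    return None


def check_thresholds_and_cms(validation):
    """Assert that thresholds and _cms line up, then return the best index."""
    thresholds = validation['thresholds']
    cms = validation['_cms']

    # Direct length check (assumption: one confusion matrix per threshold).
    assert len(thresholds) == len(cms), \
        "thresholds %s cms %s" % (len(thresholds), len(cms))

    best_threshold = validation['best_threshold']
    best_index = find_best_index(thresholds, best_threshold)
    # Without this, a best_threshold above every list entry would go unnoticed.
    assert best_index is not None, \
        "best_threshold %s is above every threshold" % best_threshold
    return best_index
```

Comparing the two lengths first reports the mismatch itself (e.g. 191 vs 101) rather than an out-of-range best_index, and returning None from the scan avoids an undefined best_index when no threshold qualifies.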

Assignee

New H2O Bugs

Reporter

Kevin Normoyle

Labels

None

CustomerVisible

No

Priority

Major