K-means: the final output is assigned training metrics that do not match the last iteration

Description

The lines linked below replace the training metrics from the last iteration with other metrics of unknown origin loaded from the DKV.
The output should contain the metrics from the last K-means iteration.
With these lines enabled, the resulting metrics do not match the metrics calculated in any iteration.
This matters especially for Constrained K-means, which then returns a result that does not meet the stated constraints.

https://github.com/h2oai/h2o-3/blob/master/h2o-algos/src/main/java/hex/kmeans/KMeans.java#L360-L361

The validation set probably has the same problem.
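
To illustrate the kind of mismatch described here, the following is a minimal sketch in plain Python. It is not H2O code: every function and variable name is hypothetical, and it assumes that the replacement metrics come from re-scoring the training data with the final centroids (the report only says they come from the DKV). Under that assumption, stopping before convergence and then re-scoring gives metrics that differ from those of the last iteration.

```python
# Hypothetical sketch, not H2O code: all names are illustrative.

def assign(points, centroids):
    """Assign each point to the index of its nearest centroid (squared Euclidean)."""
    def sqdist(p, c):
        return sum((a - b) ** 2 for a, b in zip(p, c))
    return [min(range(len(centroids)), key=lambda k: sqdist(p, centroids[k]))
            for p in points]

def cluster_metrics(points, centroids, labels):
    """Return a (size, within-cluster sum of squares) pair per centroid."""
    sizes = [0] * len(centroids)
    withinss = [0.0] * len(centroids)
    for p, k in zip(points, labels):
        sizes[k] += 1
        withinss[k] += sum((a - b) ** 2 for a, b in zip(p, centroids[k]))
    return list(zip(sizes, withinss))

def update(points, centroids, labels):
    """Move each centroid to the mean of its points; empty clusters stay in place."""
    new_centroids = []
    for k, c in enumerate(centroids):
        members = [p for p, lab in zip(points, labels) if lab == k]
        if members:
            new_centroids.append(tuple(sum(col) / len(members) for col in zip(*members)))
        else:
            new_centroids.append(c)
    return new_centroids

def lloyd(points, centroids, max_iterations):
    """Run Lloyd iterations and record the metrics computed in each one."""
    history = []
    for _ in range(max_iterations):
        labels = assign(points, centroids)
        history.append(cluster_metrics(points, centroids, labels))
        centroids = update(points, centroids, labels)
    return centroids, history

# Toy 1-D data, stopped after one iteration (before convergence).
points = [(0,), (1,), (2,), (5,), (6,), (10,)]
final_centroids, history = lloyd(points, [(0,), (1,)], max_iterations=1)

# Metrics of the last iteration: what the output is expected to contain.
last_iteration = history[-1]
# Assumed bug path: re-assign with the updated centroids and recompute.
rescored = cluster_metrics(points, final_centroids, assign(points, final_centroids))

print([size for size, _ in last_iteration])  # [1, 5]
print([size for size, _ in rescored])        # [3, 3]
```

In this sketch the re-scoring step uses plain nearest-centroid assignment, which knows nothing about cluster-size constraints. If the real code path behaves similarly, that would be consistent with the Constrained K-means symptom, but this is an assumption rather than a confirmed diagnosis.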

For example, on the iris dataset I printed the centroid statistics for every Lloyd iteration:

11-21 16:25:51.714 10.30.0.22:54321 15364 FJ-1-15 INFO: Centroid Size Within Cluster Sum of Squares
11-21 16:25:51.714 10.30.0.22:54321 15364 FJ-1-15 INFO: 1 362 6904.80395
11-21 16:25:51.714 10.30.0.22:54321 15364 FJ-1-15 INFO: 2 10 208.57395
11-21 16:25:51.714 10.30.0.22:54321 15364 FJ-1-15 INFO: 3 8 114.59766
11-21 16:25:51.726 10.30.0.22:54321 15364 FJ-1-15 INFO: Centroid Statistics:
11-21 16:25:51.726 10.30.0.22:54321 15364 FJ-1-15 INFO: Centroid Size Within Cluster Sum of Squares
11-21 16:25:51.726 10.30.0.22:54321 15364 FJ-1-15 INFO: 1 323 2225.26394
11-21 16:25:51.726 10.30.0.22:54321 15364 FJ-1-15 INFO: 2 42 235.42798
11-21 16:25:51.726 10.30.0.22:54321 15364 FJ-1-15 INFO: 3 15 154.36347
11-21 16:25:51.729 10.30.0.22:54321 15364 FJ-1-15 INFO: Centroid Statistics:
11-21 16:25:51.729 10.30.0.22:54321 15364 FJ-1-15 INFO: Centroid Size Within Cluster Sum of Squares
11-21 16:25:51.729 10.30.0.22:54321 15364 FJ-1-15 INFO: 1 264 1750.57349
11-21 16:25:51.729 10.30.0.22:54321 15364 FJ-1-15 INFO: 2 91 406.90851
11-21 16:25:51.729 10.30.0.22:54321 15364 FJ-1-15 INFO: 3 25 299.63215
11-21 16:25:51.732 10.30.0.22:54321 15364 FJ-1-15 INFO: Centroid Statistics:
11-21 16:25:51.732 10.30.0.22:54321 15364 FJ-1-15 INFO: Centroid Size Within Cluster Sum of Squares
11-21 16:25:51.732 10.30.0.22:54321 15364 FJ-1-15 INFO: 1 209 1297.35668
11-21 16:25:51.732 10.30.0.22:54321 15364 FJ-1-15 INFO: 2 135 622.72886
11-21 16:25:51.732 10.30.0.22:54321 15364 FJ-1-15 INFO: 3 36 419.02747

The centroid statistics in the final result on the training data are completely different:

11-21 16:25:51.749 10.30.0.22:54321 15364 FJ-1-15 INFO: Centroid Statistics:
11-21 16:25:51.749 10.30.0.22:54321 15364 FJ-1-15 INFO: Centroid Size Within Cluster Sum of Squares
11-21 16:25:51.749 10.30.0.22:54321 15364 FJ-1-15 INFO: 1 166 953.09324
11-21 16:25:51.749 10.30.0.22:54321 15364 FJ-1-15 INFO: 2 169 794.90234
11-21 16:25:51.749 10.30.0.22:54321 15364 FJ-1-15 INFO: 3 45 486.23239
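
As a quick sanity check, the logged tables can be compared programmatically. The sketch below copies the sizes and within-cluster sums of squares by hand from the log lines above (the variable names are illustrative) and confirms that the final table matches none of the logged iterations, even though every table covers the same number of rows.

```python
# (size, within-cluster SS) per centroid, copied from the logged iterations.
iterations = [
    [(362, 6904.80395), (10, 208.57395), (8, 114.59766)],
    [(323, 2225.26394), (42, 235.42798), (15, 154.36347)],
    [(264, 1750.57349), (91, 406.90851), (25, 299.63215)],
    [(209, 1297.35668), (135, 622.72886), (36, 419.02747)],
]
# Centroid statistics reported in the final result.
final = [(166, 953.09324), (169, 794.90234), (45, 486.23239)]

# Every table accounts for the same number of rows.
row_counts = {sum(size for size, _ in table) for table in iterations + [final]}
print(row_counts)           # {380}

# The final table is not one of the logged iterations.
print(final in iterations)  # False

# Total within-cluster SS per table, in log order with the final result last.
totals = [sum(w for _, w in table) for table in iterations + [final]]
print(all(a > b for a, b in zip(totals, totals[1:])))  # True
```

The totals decrease from table to table, and the final result continues that trend. That pattern would fit the final metrics being computed from one further assignment step after the last logged iteration, although the logs alone do not prove it.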

Assignee

Veronika Maurerová

Reporter

Veronika Maurerová

CustomerVisible

No

Priority

Major