categorical_column | target |
---|---|
A | True |
B | True |
C | False |
D | False |
E | True |
F | True |
G | False |
H | False |

parameter | value |
---|---|
ntrees | 1 |
max_depth | 1 |
min_rows | 1 |
nbins_cats | 4 (reproduces the bug) or 8 (no bug) |

column | type |
---|---|
categorical_column | enum |
target | enum |

With nbins_cats = 8 (i.e. nbins_cats greater than or equal to the number of unique values of the categorical column), the bug does not occur: the training **AUC is 1**, as expected, and the resulting tree is the expected one, shown below:

With nbins_cats = 4, however, the bug appears: the categorical column gets a bad split, as if it were a numerical column. The training **AUC is 0.75**, which is confirmed by the bad tree shown below:

Normally, in this example, even with nbins_cats = 4 we should get the same optimal split as with nbins_cats = 8 (A, B, E, F in one leaf and C, D, G, H in the other), and thus the AUC should be 1.

Instead, the AUC is only 0.75.
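The two AUC values can be checked by hand without H2O. The sketch below is only an arithmetic illustration, not H2O's split-finding code: it assumes a depth-1 tree scores each row with the fraction of True targets in its leaf, and that a "numerical" split means a threshold over the categories in alphabetical order. The helper `auc_for_split` is a hypothetical name introduced here.

```python
from itertools import product

# Toy data from the report: category -> target
DATA = {"A": True, "B": True, "C": False, "D": False,
        "E": True, "F": True, "G": False, "H": False}


def auc_for_split(left):
    """AUC of a one-split tree sending `left` to one leaf and the rest to the other.

    Assumption: each row is scored with the fraction of True targets in its leaf.
    """
    left = set(left)
    right = set(DATA) - left

    def leaf_rate(leaf):
        return sum(DATA[c] for c in leaf) / len(leaf)

    score = {c: leaf_rate(left if c in left else right) for c in DATA}
    pos = [score[c] for c in DATA if DATA[c]]
    neg = [score[c] for c in DATA if not DATA[c]]
    # AUC = share of (positive, negative) pairs ranked correctly, ties count 0.5
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p, n in product(pos, neg))
    return wins / (len(pos) * len(neg))


# True categorical (set-based) split: {A, B, E, F} vs {C, D, G, H}
set_split_auc = auc_for_split({"A", "B", "E", "F"})

# "Numerical" splits: every threshold over the alphabetical order A < B < ... < H
ORDER = sorted(DATA)
ordinal_aucs = {i: auc_for_split(ORDER[:i]) for i in range(1, len(ORDER))}
best_ordinal_auc = max(ordinal_aucs.values())

print("set split AUC:", set_split_auc)
print("best ordinal split AUC:", best_ordinal_auc)
```

Under these assumptions the set-based split separates the classes perfectly, while no single threshold over the ordered categories does better than 0.75, which matches the AUC reported with nbins_cats = 4.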