Public H2O 3 / PUBDEV-4593

Levenshtein Distance Normalization Error

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.18.0.1
    • Component/s: Rapids
    • Labels:
      None
    • CustomerVisible:
      No
    • Sprint:

      Description

      The normalization in the Duke comparators seems off:

      import pandas as pd
      import h2o

      h2o.init()  # start or connect to a local H2O cluster

      words = ['A', 'AA', 'AAA', 'AAAA', 'AAAAA', 'AAAAAA']
      compare = ['a', 'aa', 'aaa', 'aaaa', 'aaaaa', 'aaaaaa']
      frame = pd.DataFrame({'words':words, 'compare':compare})
      words = h2o.H2OFrame(words)
      compare = h2o.H2OFrame(compare)
      words.strdistance(compare, 'lv')  # 'lv' selects the Levenshtein measure
      

      This does not give all 0s as expected. The same result can be seen when using the Duke comparators directly (screenshot attached).
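
      To make the expectation concrete, here is a minimal pure-Python sketch of one common way to normalize Levenshtein distance into a similarity score: 1 - distance / length of the longer string. This is an assumption made for illustration only; it is not taken from the Duke or H2O source, and the helper names `levenshtein` and `normalized_similarity` are hypothetical. Under this convention, every pair from the example above differs in all positions (the comparison is case-sensitive), so each score comes out as 0.

```python
def levenshtein(a: str, b: str) -> int:
    """Case-sensitive edit distance via the classic two-row dynamic program."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]  # deleting i characters turns a[:i] into ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def normalized_similarity(a: str, b: str) -> float:
    """Assumed normalization: 1 - distance / max(len(a), len(b))."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0  # two empty strings are treated as identical
    return 1.0 - levenshtein(a, b) / longest


words = ['A', 'AA', 'AAA', 'AAAA', 'AAAAA', 'AAAAAA']
compare = ['a', 'aa', 'aaa', 'aaaa', 'aaaaa', 'aaaaaa']
scores = [normalized_similarity(w, c) for w, c in zip(words, compare)]
print(scores)  # [0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
```

      Other normalizations exist (for example, dividing by the length of the shorter string), so the values returned by a given library may legitimately differ for strings of unequal length. For equal-length pairs with no characters in common, as here, both conventions yield 0.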

        Attachments

          Activity

            People

            • Assignee:
              pavel Pavel Pscheidl
            • Reporter:
              chrism315 Chris Mascioli
            • Votes:
              0
            • Watchers:
              2

              Dates

              • Created:
                Updated:
                Resolved: