Public H2O 3 / PUBDEV-3965

Importing data in python returns error - TypeError: expected string or bytes-like object

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.10.3.4
    • Component/s: None
    • Labels:
      None
    • CustomerVisible:
      No
    • AffectedCustomers:
    • Support Assessment:
      Platform Issue
    • Customer Request Type:
      Support Incident

      Description

      Hi,

      Note: The customer sent their data to reproduce this problem. Please contact JeffG to learn how to pull the PayPal data from the internal SFTP site.

      I notice the following parse error during the data ingestion phase using the Python API (stack trace from Jupyter notebook is attached):

      <local_path>/.local/lib/python3.5/site-packages/h2o/job.py:68: UserWarning: ParseError at file <data_file_path> at byte offset 4900582108; error = 'Unmatched quote char "'
      warnings.warn(w)

      ……

      TypeError: expected string or bytes-like object

      After removing all double quotes from the training data, the parse error no longer appears, but the h2o.h2o.import_file call hangs in the Jupyter notebook. I can still view the data frame and its column, chunk compression, and frame distribution summaries through the H2O Flow GUI, and can process it further (chunk and frame summary screenshots attached). After performing all the required type checks, I pass the field names and types to the import_file call as Python lists. Lastly, one feature contains non-ASCII, non-extended-ASCII data and is forced to "enum". I am using H2O version 3.10.2.2 (rel-tutte).
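The "Unmatched quote char" warning above reports a byte offset, so a quick stand-alone scan of the raw file can help locate suspect rows before importing. The following is a minimal sketch, not part of H2O: `find_odd_quote_lines` is a hypothetical helper that flags lines containing an odd number of double quotes and records the byte offset at which each one starts. It is only a heuristic, since a quoted field may legitimately span several lines.

```python
def find_odd_quote_lines(path, quote=b'"', limit=10):
    """Return up to `limit` (line_number, byte_offset) pairs for lines
    that contain an odd number of quote characters.

    The file is read in binary mode so that offsets are true byte offsets
    and non-ASCII content cannot cause decoding errors.
    """
    hits = []
    offset = 0  # byte offset of the start of the current line
    with open(path, "rb") as fh:
        for lineno, line in enumerate(fh, start=1):
            if line.count(quote) % 2 == 1:
                hits.append((lineno, offset))
                if len(hits) >= limit:
                    break
            offset += len(line)
    return hits
```

Comparing the reported offsets with the one in the parse warning can show whether the parser is stopping at the first unbalanced line or somewhere later.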

      ---------------------------------------------------------------------------
      TypeError Traceback (most recent call last)
      <ipython-input-8-33cd45112131> in <module>()
      6 col_names=data_colnames,
      7 col_types=data_coltypes,
      ----> 8 na_strings=[''])

      <local_path>/lib/python3.5/site-packages/h2o/h2o.py in import_file(path, destination_frame, parse, header, sep, col_names, col_types, na_strings)
      375 return lazy_import(path)
      376 else:
      --> 377 return H2OFrame()._import_parse(path, destination_frame, header, sep, col_names, col_types, na_strings)
      378
      379

      <local_path>/lib/python3.5/site-packages/h2o/frame.py in _import_parse(self, path, destination_frame, header, separator, column_names, column_types, na_strings)
      316 path = os.path.abspath(path)
      317 rawkey = h2o.lazy_import(path)
      --> 318 self._parse(rawkey, destination_frame, header, separator, column_names, column_types, na_strings)
      319 return self
      320

      <local_path>/lib/python3.5/site-packages/h2o/frame.py in _parse(self, rawkey, destination_frame, header, separator, column_names, column_types, na_strings)
      329 na_strings=None):
      330 setup = h2o.parse_setup(rawkey, destination_frame, header, separator, column_names, column_types, na_strings)
      --> 331 return self._parse_raw(setup)
      332
      333 def _parse_raw(self, setup):

      <local_path>/lib/python3.5/site-packages/h2o/frame.py in _parse_raw(self, setup)
      353 p['source_frames'] = [_quoted(src['name']) for src in setup['source_frames']]
      354
      --> 355 H2OJob(h2o.api("POST /3/Parse", data=p), "Parse").poll()
      356 # Need to return a Frame here for nearly all callers
      357 # ... but job stats returns only a dest_key, requiring another REST call to get nrow/ncol

      <local_path>/lib/python3.5/site-packages/h2o/job.py in poll(self)
      66 if self.warnings:
      67 for w in self.warnings:
      ---> 68 warnings.warn(w)
      69
      70 # check if failed... and politely print relevant message

      TypeError: expected string or bytes-like object
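The final frame shows the TypeError being raised from inside warnings.warn(w) rather than from the parser itself. One plausible explanation, which this ticket does not confirm, is that w was not a str when it reached warnings.warn: the standard library matches the warning text against registered message filters using a regular expression, and a non-string value can fail there with this message. The sketch below is a hypothetical defensive wrapper (`emit_job_warnings` is an illustrative name, not part of h2o) that coerces each warning to text before emitting it. It is not the fix shipped in 3.10.3.4.

```python
import warnings


def emit_job_warnings(job_warnings):
    """Emit each job warning as a UserWarning, coercing it to str first.

    Assumption: `job_warnings` is an iterable of values returned by the
    server, which may include None, bytes, or other non-string objects.
    """
    for w in job_warnings or []:
        if w is None:
            continue  # nothing useful to report
        if isinstance(w, bytes):
            # Decode leniently so malformed bytes cannot raise here.
            w = w.decode("utf-8", errors="replace")
        warnings.warn(str(w), UserWarning, stacklevel=2)
```

Coercing at the boundary keeps the warning visible to the user while ensuring that a malformed warning payload cannot mask the underlying parse error.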

            People

            • Assignee:
              Pasha Stetsenko (pasha)
            • Reporter:
              Avkash Chauhan (avkash)