Question from a prospect trying to install Sparkling Water on Databricks (Databricks Runtime 6.2):
Connected using databricks-connect from the PyCharm IDE.
1. Downloaded the file from https://s3.amazonaws.com/h2o-release/sparkling-water/spark-2.4/184.108.40.206-1-2.4/sparkling-water-220.127.116.11-1-2.4.zip
2. Ran pip install h2o_pysparkling_2.4
Ran the following command from a terminal:
bin/sparkling-shell --conf "spark.executor.memory=1g"
This runs successfully and prints the following:
Spark context Web UI available at http://10.0.0.24:4041
Spark context available as 'sc' (master = local[*], app id = local-1587011456735).
Spark session available as 'spark'.
However, running the following in the PyCharm IDE:
from pysparkling import *
fails with this error:
Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "/Applications/PyCharm.app/Contents/plugins/python/helpers/pydev/_pydev_bundle/pydev_import_hook.py", line 21, in do_import
    module = self._system_import(name, *args, **kwargs)
  File "/Users/aloomba/opt/anaconda3/envs/sandbox/lib/python3.7/site-packages/pysparkling/__init__.py", line 31, in <module>
    __version__ = Initializer.getVersion()
  File "/Users/aloomba/opt/anaconda3/envs/sandbox/lib/python3.7/site-packages/ai/h2o/sparkling/Initializer.py", line 215, in getVersion
You are using PySparkling for Spark 2.4, but your PySpark is of version 6.2.
Please make sure Spark and PySparkling versions are compatible.
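The message suggests that the check compares the major.minor version reported by the importable pyspark module against the Spark line PySparkling was built for (2.4). With databricks-connect, the pyspark on the path appears to report the Databricks Runtime version (6.2) instead of the Spark version, so the comparison fails. The sketch below is a hypothetical diagnostic that mirrors the error text; it is not PySparkling's actual Initializer code. The helper names major_minor, is_compatible and installed_pyspark_version are made up for illustration. It only assumes that pyspark exposes pyspark.__version__.

```python
import re
from typing import Optional, Tuple


def major_minor(version: str) -> Tuple[int, int]:
    """Hypothetical helper: extract (major, minor) from e.g. '2.4.5' or '6.2'."""
    match = re.match(r"^(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"Unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def is_compatible(pyspark_version: str, target_spark: str = "2.4") -> bool:
    """Hypothetical helper: True if PySpark's major.minor equals the target Spark line."""
    return major_minor(pyspark_version) == major_minor(target_spark)


def installed_pyspark_version() -> Optional[str]:
    """Return the version string of the pyspark this interpreter imports, or None."""
    try:
        import pyspark
    except ImportError:
        return None
    return pyspark.__version__


if __name__ == "__main__":
    found = installed_pyspark_version()
    if found is None:
        print("pyspark is not importable in this interpreter")
    else:
        # Under databricks-connect this is assumed to report the runtime version, e.g. 6.2.x
        print(f"pyspark {found} compatible with PySparkling for Spark 2.4: {is_compatible(found)}")
```

Running this from the same PyCharm interpreter shows which pyspark PySparkling sees. A "6.2" result would confirm that the version check is reading databricks-connect's runtime version rather than a Spark 2.4 version.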