SW-152: ClassNotFound with spark-submit

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 1.6.8, 2.0.2
    • Fix Version/s: 1.6.NEXT, 2.0.3
    • Component/s: None
    • Labels: None
    • CustomerVisible: No

      Description

      With default settings, an app that contains multiple Scala/Java files and is launched with spark-submit ends up with a ClassNotFoundException.

      To reproduce:

      1) Start a Spark cluster (a local one started with $SPARK_HOME/sbin/start-all.sh is fine)

      2) Create a project with 2 files:

      CNFDemo.scala:

      import org.apache.spark._
      import org.apache.spark.h2o._
      
      object CNFDemo {
        def main(args: Array[String]) {
          val sparkConf = new SparkConf().setAppName("CNFDemo")
      
          val sc = new SparkContext(sparkConf)
          val hc = H2OContext.getOrCreate(sc)
      
          sc.parallelize(1 to 10000,12).map(IntWrapper).collect().foreach(w => println(w.num))
      
          hc.stop(stopSparkContext = true)
        }
      }
      

      IntWrapper.scala:

      case class IntWrapper(num: Int)
      

      3) Package it (for sbt projects, sbt package will do)
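
      A hypothetical build.sbt for such a project; the artifact names and versions come from the submit command in step 4, while the exact Scala patch version is an assumption:

      name := "cnfdemo"

      version := "1.0"

      scalaVersion := "2.10.6"

      libraryDependencies ++= Seq(
        "org.apache.spark" %% "spark-core"           % "1.6.1" % "provided",
        "ai.h2o"           %% "sparkling-water-core" % "1.6.3" % "provided"
      )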

      4) Submit the app:

      $SPARK_HOME/bin/spark-submit --packages ai.h2o:sparkling-water-core_2.10:1.6.3 --class CNFDemo --master spark://<MASTER_URL>:7077 --conf spark.driver.extraJavaOptions="-XX:MaxPermSize=384m" <PROJECT_DIR>/target/scala-2.10/cnfdemo_2.10-1.0.jar
      

      Current workaround:

      Simply set "spark.ext.h2o.repl.enabled" to "false" in the SparkConf.
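
      For example, in the reproduction app above (the same property can also be passed on the command line via --conf "spark.ext.h2o.repl.enabled=false"):

      import org.apache.spark.SparkConf

      // disable the REPL-based classloading that triggers the ClassNotFoundException
      val sparkConf = new SparkConf()
        .setAppName("CNFDemo")
        .set("spark.ext.h2o.repl.enabled", "false")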

      Solutions:

      1) Simply document the workaround (do we even need repl.enabled set to true when using spark-submit?)
      2) Detect the appropriate default for "spark.ext.h2o.repl.enabled" automatically (see the sketch below this list)
      3) Fix the classpath issue
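
      A minimal sketch of option 2. It assumes spark-submit can be recognised via the "SPARK_SUBMIT" system property; that assumption, the helper name and the chosen default are illustrative, not taken from the Sparkling Water code:

      import org.apache.spark.SparkConf

      def defaultReplEnabled(conf: SparkConf): Boolean = {
        // assumed marker set by Spark's launcher when an app goes through spark-submit
        val viaSparkSubmit = sys.props.get("SPARK_SUBMIT") == Some("true")
        // keep an explicit user setting, otherwise disable the REPL for spark-submit apps
        conf.getBoolean("spark.ext.h2o.repl.enabled", !viaSparkSubmit)
      }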

      To check:

      Tried only with SW 1.6.3 and Spark 1.6.1; other versions still need to be checked.

      Initial investigation:

      When running H2OIMain#initialize, this branch is taken:

      else {
        // non local mode, application not started using SparkSubmit
        interpreterClassloader = new InterpreterClassLoader()
      }
      

      The comment suggests this branch should not be the one taken here, given that the application was started using spark-submit.

      When trying to load the IntWrapper class via InterpreterClassLoader#loadClass, this branch is taken and throws ClassNotFoundException:

      } else {
        super.loadClass(name)
      }
      

      Changing it to Class.forName(name) works, but I'm not sure that's the right solution.
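
      For illustration, a sketch of that change written as a fallback rather than a straight replacement; the class shape shown here (and the omitted if-condition) is an assumption, not the actual Sparkling Water source:

      class InterpreterClassLoader extends ClassLoader {
        override def loadClass(name: String): Class[_] =
          try {
            // current behaviour: delegate up the parent chain
            super.loadClass(name)
          } catch {
            case _: ClassNotFoundException =>
              // fall back to the loader that defined this class, which with
              // spark-submit can see the application jar (and hence IntWrapper)
              Class.forName(name)
          }
      }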


            People

            • Assignee: Mateusz Dymczyk
            • Reporter: Mateusz Dymczyk
            • Votes: 0
            • Watchers: 1
