pysparkling: adding a column to a data frame does not work when the original frame was parsed in Spark

Description

#90702
From Kuba: it looks like the issue is that the frame is not re-evaluated after the column is added.

Code to reproduce:

```python
# Assumes the PySpark shell, where `sc` (SparkContext) and `sqlContext` are predefined.
from pysparkling import *
import h2o
from h2o.estimators.glm import H2OGeneralizedLinearEstimator

# Import the CSV file into a Spark DataFrame
spark_df = (sqlContext.read.format('com.databricks.spark.csv')
            .options(header='true', inferschema='true')
            .load('BostonHousing.csv'))

# Create the H2O context and convert the Spark DataFrame to an H2OFrame
hc = H2OContext.getOrCreate(sc)
boston = hc.as_h2o_frame(spark_df)

# Train a GLM on every column except the last, predicting "medv"
predictors = boston.columns[:-1]
response = "medv"
boston_glm2 = H2OGeneralizedLinearEstimator(nfolds=2, Lambda=.01)
boston_glm2.train(x=predictors, y=response, training_frame=boston)

# Add the predictions as a new column of the original frame
pred = boston_glm2.predict(boston)
boston["predict"] = pred['predict']

# Convert back to Spark; per this report, the added "predict" column is not handled correctly here
sp_boston = hc.as_spark_frame(boston)
sp_boston
```
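To make the suspected failure mode concrete, here is a minimal, hypothetical Python model of a lazily evaluated frame. It assumes (as Kuba's diagnosis suggests, but which is not confirmed here) that a column assignment such as `boston["predict"] = ...` is only recorded and not yet evaluated, and that the conversion back to Spark reads the backend state without forcing that evaluation. `LazyFrame`, `to_other_frame_stale` and `to_other_frame_fixed` are illustrative names only; they are not part of H2O or Sparkling Water.

```python
class LazyFrame:
    """Hypothetical toy model of a lazily evaluated frame (not the H2O API)."""

    def __init__(self, columns):
        self._materialized = dict(columns)  # state the "backend" has already computed
        self._pending = []                  # column assignments recorded but not evaluated

    def __setitem__(self, name, values):
        # Assignment is only recorded; nothing is computed yet.
        self._pending.append((name, list(values)))

    def evaluate(self):
        # Apply all pending assignments to the materialized state.
        for name, values in self._pending:
            self._materialized[name] = values
        self._pending.clear()
        return self

    def snapshot(self):
        # Return a copy of what the backend currently holds.
        return dict(self._materialized)


def to_other_frame_stale(frame):
    # Hypothetical buggy converter: reads backend state without evaluating pending ops.
    return frame.snapshot()


def to_other_frame_fixed(frame):
    # Hypothetical converter that forces evaluation before reading.
    return frame.evaluate().snapshot()


boston = LazyFrame({"medv": [24.0, 21.6]})
boston["predict"] = [23.5, 22.0]
print(sorted(to_other_frame_stale(boston)))  # ['medv']
print(sorted(to_other_frame_fixed(boston)))  # ['medv', 'predict']
```

If this model matches what happens in pysparkling, the stale converter corresponds to the reported behaviour, and forcing evaluation of `boston` before calling `hc.as_spark_frame` would be the direction to investigate; that workaround is untested here.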

Environment

None

Status

Assignee

Michal Malohlava

Reporter

Nidhi Mehta

Labels

Release Priority

None

CustomerVisible

No

Fix versions

Priority

Major