Building the model takes a long time on a sparse dataset (89x5000) when the data is read in from Parquet format on a 5-executor SW cluster.