Context:
This happens when calling spark.createDataFrame() in Databricks.
Possible cause:
The data you're passing into createDataFrame() seems to be too large to
serialize and send to the executors, exceeding spark.rpc.message.maxSize.
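A minimal sketch of the pattern that tends to trigger it (the data and schema below are made up for illustration; spark.rpc.message.maxSize defaults to 128 MB):

  # A large local collection is serialized on the driver and shipped to
  # the executors; if a serialized message exceeds
  # spark.rpc.message.maxSize, the job fails with an RPC error.
  rows = [(i, "x" * 1000) for i in range(1_000_000)]  # large in-memory data
  df = spark.createDataFrame(rows, schema="id INT, payload STRING")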
Solutions:
One solution is to edit the spark.rpc.message.maxSize setting in the cluster's Spark configuration and restart the cluster:
- Cluster => Configuration => Spark =>
spark.rpc.message.maxSize 512
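Once the cluster is back up, you can check that the new value took effect from a notebook (SparkConf.get reads the configuration the cluster was started with; the "128" here is just the default used as a fallback):

  # Read the effective setting; falls back to the 128 MB default if unset.
  current = spark.sparkContext.getConf().get("spark.rpc.message.maxSize", "128")
  print(current)  # expect "512" after the change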
Another solution is to split the data we are creating the dataframe from into smaller batches and call spark.createDataFrame() on each one, as in the sketch below.
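A minimal sketch of that approach, assuming the source data is a local list named rows (create_dataframe_in_batches and batch_size are illustrative names, not part of the Spark API):

  from functools import reduce
  from pyspark.sql import DataFrame

  def create_dataframe_in_batches(spark, rows, schema, batch_size=100_000):
      # Build one small DataFrame per slice so no single createDataFrame()
      # call has to serialize the whole dataset in one RPC message.
      parts = [
          spark.createDataFrame(rows[i:i + batch_size], schema=schema)
          for i in range(0, len(rows), batch_size)
      ]
      # Combine the per-batch DataFrames back into one.
      return reduce(DataFrame.unionByName, parts)

  df = create_dataframe_in_batches(spark, rows, "id INT, payload STRING")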
If you ran into a similar problem and resolved it in a different manner, leave a comment :)