org.apache.spark.SparkException: Job aborted due to stage failure: Serialized task xxx: was xxx bytes, which exceeds max allowed: spark.rpc.message.maxSize (xxx bytes)

Context:

This happens when calling spark.createDataFrame() in Databricks.


Possible cause:

The problem seems to be that the data you're passing into createDataFrame() is too large to serialize and send to the executors, exceeding spark.rpc.message.maxSize.
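For illustration, a minimal (hypothetical) sketch of the pattern that can trigger it: building a Spark DataFrame from a large local collection on the driver, which then has to be serialized into tasks and shipped over RPC. The data size and column names here are made up.

# Hypothetical sketch: a large local dataset passed to createDataFrame().
# The driver serializes this data into tasks; if a serialized task exceeds
# spark.rpc.message.maxSize, the job aborts with the error shown above.
rows = [(i, "x" * 1_000) for i in range(5_000_000)]          # assumed size, for illustration only
df = spark.createDataFrame(rows, schema=["id", "payload"])   # may hit the maxSize limit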


Solutions:

One solution would be editing the spark.rpc.message.maxSize setting in the cluster configuration and restarting the cluster:

  • Cluster => configuration => Spark => spark.rpc.message.maxSize 512
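If you control session creation yourself (for example outside Databricks), the same key can be passed when building the SparkSession. A minimal sketch, assuming the value is in MiB and the session is created fresh; on Databricks the key/value goes into the cluster's Spark config and the cluster must be restarted, since the RPC layer is initialized at startup.

from pyspark.sql import SparkSession

# Minimal sketch: raise spark.rpc.message.maxSize from the 128 MiB default to 512 MiB.
spark = (
    SparkSession.builder
    .appName("maxsize-example")                   # assumed app name
    .config("spark.rpc.message.maxSize", "512")   # value is in MiB
    .getOrCreate()
)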

Another solution might be splitting the data we are passing to spark.createDataFrame() into smaller batches and creating the DataFrame in chunks, as sketched below.
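A minimal sketch of that batching approach, assuming the source data is a local pandas DataFrame and a chunk size that keeps each serialized batch well below spark.rpc.message.maxSize (both are assumptions to adapt to your data):

from functools import reduce
import pandas as pd
from pyspark.sql import DataFrame

def create_dataframe_in_batches(spark, pdf: pd.DataFrame, chunk_size: int = 100_000) -> DataFrame:
    # Slice the local pandas DataFrame into row chunks, convert each chunk
    # separately, and union the resulting Spark DataFrames.
    chunks = [pdf.iloc[start:start + chunk_size] for start in range(0, len(pdf), chunk_size)]
    spark_chunks = [spark.createDataFrame(chunk) for chunk in chunks]
    return reduce(DataFrame.unionByName, spark_chunks)

# Usage (chunk_size is an assumed value, tune it to your row width):
# df = create_dataframe_in_batches(spark, large_pandas_df, chunk_size=100_000)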


In case you had a similar problem and resolved it in a different manner, leave a comment :)
