PySpark Driver Integration errors out with py4j.Py4JException: Method attemptId([]) does not exist
Environment
How do you use Sentry? Self-hosted - 9.1.2
Which SDK and version? sentry-sdk[pyspark] == 0.20.3
Steps to Reproduce
I set up the PySpark integration as described in the official docs. Currently I have added only the driver integration. Since I have not added the worker integration, I am also not adding the daemon configuration to the spark-submit script.
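For reference, my driver setup looks roughly like this (the DSN and app name are placeholders for my self-hosted instance):

```python
import sentry_sdk
from sentry_sdk.integrations.spark import SparkIntegration

from pyspark.sql import SparkSession

# Initialize Sentry before the SparkContext is created, as the docs instruct.
sentry_sdk.init(
    dsn="https://<key>@my-sentry.example.com/<project>",  # placeholder DSN
    integrations=[SparkIntegration()],
)

spark = (
    SparkSession.builder
    .appName("my-app")  # placeholder app name
    .getOrCreate()
)
```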
Expected Result
Sentry correctly captures and reports the errors.
Actual Result
The log is filled with errors. The crux of the error seems to be py4j.Py4JException: Method attemptId([]) does not exist. I have attached two logs here: https://gist.github.com/amCap1712/6000892a940b7c004dad28060ddfd90d . One is from running on Spark 2.4.5 and the other on Spark 3.1.1. Sentry also captures and reports this error itself; it seems to occur while the integration is hooking into Spark.
I'll be happy to assist however I can in debugging and solving this issue.
Related to #1102
@amCap1712 / @pvanderlinden Is there a workaround for this issue apart from filtering out this exception?
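By filtering I mean something along these lines (an untested sketch; the DSN is a placeholder, and note that the Py4JException in the log is the Java-side class, which py4j surfaces to Python as Py4JError):

```python
import sentry_sdk
from py4j.protocol import Py4JError

def before_send(event, hint):
    # Drop only the py4j "Method attemptId([]) does not exist" noise;
    # pass every other event through unchanged.
    if "exc_info" in hint:
        _exc_type, exc_value, _tb = hint["exc_info"]
        if isinstance(exc_value, Py4JError) and "attemptId" in str(exc_value):
            return None
    return event

sentry_sdk.init(
    dsn="https://<key>@my-sentry.example.com/<project>",  # placeholder DSN
    before_send=before_send,
)
```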
@dinesh-712 I ended up not using the PySpark-specific integration, only the normal Python integration.
@pvanderlinden Thanks for the reply. Does using the normal Python integration guarantee capturing errors on all the worker nodes (slaves) created by the SparkContext?
It will only capture exceptions that reach the driver script. But I don't think the Spark integration is functional at the moment.
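Concretely, a hedged sketch of what that means (hypothetical job, placeholder DSN, plain SDK only): a failure inside worker-side code only becomes visible once an action re-raises it on the driver, where you can capture it.

```python
import sentry_sdk
from pyspark.sql import SparkSession

sentry_sdk.init(dsn="https://<key>@my-sentry.example.com/<project>")  # placeholder

spark = SparkSession.builder.appName("example").getOrCreate()

def flaky(x):
    # Runs on the workers, where Sentry is not initialized.
    if x == 3:
        raise ValueError("boom on a worker")
    return x * 2

try:
    # The worker-side ValueError only reaches Python here, wrapped in a
    # Py4JJavaError raised by the collect() action on the driver.
    spark.sparkContext.parallelize(range(5)).map(flaky).collect()
except Exception:
    sentry_sdk.capture_exception()
    raise
```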
Currently not a priority. I will close this. If there is demand for this, please reopen.