Luca Canali
I must say I am a bit puzzled by the error found in test_pandas_array_struct, as I cannot reproduce it on my test system. When I run `python/run-tests --modules pyspark-sql --testnames pyspark.sql.tests.test_pandas_udf_scalar`...
This should be good to go now, @HyukjinKwon ?
This is fixed in sparkMeasure v0.21, which introduced executor metrics collection and the following reports:
```
(scala)> stageMetrics.printMemoryReport
(python)> stagemetrics.print_memory_report()
```
@HyukjinKwon thanks for the review. Indeed I agree this needs to be checked on the "SQL side" too. I have just pushed a small extension to address the case of...
@HyukjinKwon I see that a particular query in SQLQueryTestSuite.udf/postgreSQL/udf-aggregates_part3.sql seems to have a problem with this PR. I am struggling to understand why. It looks to be related to the...
I understand from @HyukjinKwon's comment on January 18 that more people with expertise in Spark's use of Python and SQL should review this. @cloud-fan, @maryannxue, @viirya @ueshin @BryanCutler...
The issue with SQLQueryTestSuite.udf/postgreSQL/udf-aggregates_part3.sql should be fixed now. I have also extended the instrumentation to applyInPandasWithState, which was recently introduced in SPARK-40434.
Thank you @cloud-fan !
I confirm that this is an annoying issue: somehow the pom file did not get to the Maven repos for version 0.21. There does not seem to be a fundamental reason...
This is now fixed in sparkMeasure v0.22. See: https://repo1.maven.org/maven2/ch/cern/sparkmeasure/spark-measure_2.12/0.22/spark-measure_2.12-0.22.pom
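For anyone who wants to check whether the pom for a given release is published, here is a minimal sketch, assuming the standard Maven repository layout (dots in the group ID become path separators). The helper names `maven_pom_url` and `pom_exists` are hypothetical and are not part of sparkMeasure.

```python
import urllib.error
import urllib.request


def maven_pom_url(group_id, artifact_id, version,
                  base="https://repo1.maven.org/maven2"):
    """Build the URL of a pom file, assuming the standard Maven layout."""
    group_path = group_id.replace(".", "/")
    return f"{base}/{group_path}/{artifact_id}/{version}/{artifact_id}-{version}.pom"


def pom_exists(url, timeout=10):
    """Return True if a HEAD request for the URL succeeds (needs network access)."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status == 200
    except urllib.error.URLError:
        # Covers HTTP errors such as 404 as well as connection failures.
        return False


# Coordinates taken from the URL quoted above.
url = maven_pom_url("ch.cern.sparkmeasure", "spark-measure_2.12", "0.22")
print(url)
```

Calling `pom_exists(url)` performs the actual lookup. It is left out of the snippet so that the example runs without network access.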