bzhao

Results: 64 comments of bzhao

Looks like there are other test cases that need to be fixed. This is what I get testing on master without any changes:

```
spark@DESKTOP-U0I7MO9:~/spark$ python/run-tests --testnames 'pyspark.sql.tests.test_dataframe'
Running PySpark tests. Output is in /home/spark/spark/python/unit-tests.log
...
```

> I also test with Spark 3.3.0 with Python 3.9.12, and it's fine.
>
> Could you help figure out whether this **repr** issue only exists in Spark 3.2.x or...

> Is it dependent on pandas version being used? See also https://github.com/apache/spark/blob/master/dev/infra/Dockerfile

Hi, I tested with pandas 1.3.x and 1.4.x. It's true that everything is OK there and no error is raised...

https://github.com/pandas-dev/pandas/commit/67e8c4c3761ab1da4b0a341a472c0fe2ea393e8b This is the associated commit from pandas upstream. Analyzing the history...

I have opened an [issue](https://github.com/pandas-dev/pandas/issues/47844) in the pandas community. Let's wait.

From the pandas community, it seems the new behavior is intentional and expected. They only support a limited set of target dtypes when using `astype` with a `DatetimeArray`. We should apply the same on the PySpark side...
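To illustrate the kind of restriction involved, here is a minimal sketch (not taken from the Spark or pandas code; the exact set of supported targets, and whether an unsupported one warns or raises, depends on the pandas version):

```python
import pandas as pd

ser = pd.Series(pd.to_datetime(["2022-01-01", "2022-01-02"]))

# Conversions such as datetime -> string/object stay supported across versions.
print(ser.astype(str))

# A target outside the supported set (e.g. datetime -> timedelta) is rejected by
# newer pandas with a TypeError (older releases may only emit a FutureWarning),
# so callers like pandas API on Spark need to validate the target dtype or catch
# the error instead of assuming the cast succeeds.
try:
    ser.astype("timedelta64[ns]")
except TypeError as exc:
    print(f"astype rejected the conversion: {exc}")
```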

> I'm not sure about @Yikun 's `type_checker` annotation, but it seems simpler than manually adding `isinstance` checking for each parameter.

The code looks like it checks the definition of the annotation, as...
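For context, a decorator in this spirit can be sketched as follows. This is a hypothetical illustration only, not @Yikun's actual `type_checker` or anything in Spark: it reads the function's type annotations and performs the `isinstance` checks automatically instead of hand-writing one check per parameter.

```python
import inspect
from functools import wraps
from typing import get_type_hints


def type_checker(func):
    hints = get_type_hints(func)
    sig = inspect.signature(func)

    @wraps(func)
    def wrapper(*args, **kwargs):
        bound = sig.bind(*args, **kwargs)
        for name, value in bound.arguments.items():
            expected = hints.get(name)
            # Only plain classes are checked here; typing constructs such as
            # Optional[...] or Union[...] would need extra handling.
            if isinstance(expected, type) and not isinstance(value, expected):
                raise TypeError(
                    f"{name} should be {expected.__name__}, got {type(value).__name__}"
                )
        return func(*args, **kwargs)

    return wrapper


@type_checker
def repeat(word: str, times: int) -> str:
    return word * times


repeat("ab", 3)      # OK
# repeat("ab", "3")  # would raise TypeError: times should be int, got str
```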

> If it's a common mistake, we might want to add this fix, but for this patch, I personally think this example seems a little too extreme.

The user could...

> How about we include Series and DataFrame in this PR as well since they all rely on `infer_pd_series_spark_type`?

Do you mean including the aforementioned UTs, or making this one more...
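For readers unfamiliar with it, `infer_pd_series_spark_type` is the helper in pandas API on Spark that maps a pandas Series to a Spark SQL data type. The snippet below is only a rough sketch of that idea; the mapping table, the function name `infer_spark_type_sketch`, and the error handling are illustrative, not the real implementation, which handles many more dtypes (extension dtypes, categoricals, nested data, and so on).

```python
import pandas as pd
from pyspark.sql.types import (
    BooleanType, DataType, DoubleType, LongType, StringType, TimestampType,
)

# Simplified dtype -> Spark type table for illustration only.
_DTYPE_TO_SPARK = {
    "int64": LongType(),
    "float64": DoubleType(),
    "bool": BooleanType(),
    "datetime64[ns]": TimestampType(),
    "object": StringType(),  # simplification: assume object columns hold strings
}


def infer_spark_type_sketch(pser: pd.Series) -> DataType:
    try:
        return _DTYPE_TO_SPARK[str(pser.dtype)]
    except KeyError:
        raise TypeError(f"Unsupported pandas dtype: {pser.dtype}")


print(infer_spark_type_sketch(pd.Series([1, 2, 3])))   # maps to LongType
print(infer_spark_type_sketch(pd.Series(["a", "b"])))  # maps to StringType
```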

> Thanks for working on error improvement of pandas API on Spark!
>
> We have https://issues.apache.org/jira/browse/SPARK-39581 as an umbrella to track all relevant tickets.
>
> Would you like...