[SPARK-40012][PYTHON][DOCS][WIP] Make pyspark.sql.dataframe examples self-contained
What changes were proposed in this pull request?
This PR proposes to improve the examples in pyspark.sql.dataframe by making each one self-contained and more realistic.
Why are the changes needed?
To make the documentation more readable, and to let users copy and paste the examples directly into the PySpark shell.
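As a rough illustration of what "self-contained" means here, the sketch below uses a plain-Python stand-in rather than a real PySpark API. The example in the docstring builds its own input, so it runs in a fresh shell with no earlier setup. `top_n` and `run_docstring_examples` are hypothetical names made up for this sketch; only the standard-library `doctest` module is assumed.

```python
import doctest


def top_n(rows, n):
    """Return the ``n`` rows with the highest age, oldest first.

    Examples
    --------
    The example creates its own data instead of relying on a variable
    defined somewhere else in the module.

    >>> rows = [("Alice", 2), ("Bob", 5), ("Tom", 3)]
    >>> top_n(rows, 2)
    [('Bob', 5), ('Tom', 3)]
    """
    # Sort by the second field (age) in descending order, then truncate.
    return sorted(rows, key=lambda r: r[1], reverse=True)[:n]


def run_docstring_examples(func):
    """Run the examples in ``func``'s docstring; return (failed, attempted)."""
    parser = doctest.DocTestParser()
    # The only name the examples can see is the function itself, which
    # mimics pasting the example into a fresh interpreter session.
    test = parser.get_doctest(
        func.__doc__, {func.__name__: func}, func.__name__, None, 0
    )
    runner = doctest.DocTestRunner(verbose=False)
    results = runner.run(test)
    return results.failed, results.attempted
```

Calling `run_docstring_examples(top_n)` runs both docstring statements with nothing but `top_n` in scope. An example that depended on a pre-existing variable would fail under the same check.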
Does this PR introduce any user-facing change?
Yes, documentation changes only.
How was this patch tested?
Built the documentation locally.
I set the tag [WIP] because dataframe.py needs a lot of updates. I will add some additional changes to this PR in the upcoming days.
Please review and provide some feedback. Thanks.
Can one of the admins verify this patch?
@Transurgeon is this still WIP? If there are too many to fix, feel free to split into multiple PRs.
@HyukjinKwon
No, it's not WIP anymore. I wanted to get some feedback to see whether I was making good changes before continuing to work on it.
Should I remove the WIP tag?
@HyukjinKwon I made some additional changes. I think we can start by merging this PR, then I will make another one for the rest of the changes.
I have a list of all the functions I changed in this PR. Should I add it to the JIRA ticket to avoid duplicate changes?
I think you can reuse the same JIRA, and make a followup.
We should make the tests pass before merging it in (https://github.com/Transurgeon/spark/runs/7981691501).
cc @dcoliversun @khalidmammadov FYI if you guys find some time to review, and to work on the rest of the API.
@Transurgeon Hi. It looks like CI is disabled in your fork repo.
Maybe the doc can help you :)
I'm gonna take this over if the PR author stays inactive for a few more days - this is the last task left for the umbrella task.
Hi Hyukjin and Oliver, thanks all for your feedback.
I have created a commit with all your suggestions and enabled all jobs to run in GitHub Actions for my fork.
I will make one last commit for further minor changes.
Hey, let's co-author this change. I will create another PR on top of this PR to speed this up.