[SPARK-40012][PYTHON][DOCS][WIP] Make pyspark.sql.dataframe examples self-contained

Open Transurgeon opened this issue 3 years ago • 2 comments

What changes were proposed in this pull request?

This PR proposes to improve the examples in pyspark.sql.dataframe by making each one self-contained and more realistic.

Why are the changes needed?

To make the documentation more readable and to allow the examples to be copied and pasted directly into the PySpark shell.
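As a rough illustration of what "self-contained" could mean here, the sketch below shows a docstring example that builds its own session and DataFrame instead of relying on a `df` defined elsewhere. The function name `count_rows_example` and the sample rows are hypothetical and not taken from this PR. The parsing step uses the standard-library `doctest` module only to list the statements a reader would paste, so it does not need Spark installed.

```python
import doctest


def count_rows_example():
    """Hypothetical holder for a self-contained docstring example.

    Examples
    --------
    >>> from pyspark.sql import SparkSession
    >>> spark = SparkSession.builder.getOrCreate()
    >>> df = spark.createDataFrame(
    ...     [(14, "Tom"), (23, "Alice"), (16, "Bob")], ["age", "name"])
    >>> df.count()
    3
    """


# Parse (but do not execute) the docstring, so no Spark installation is needed.
examples = doctest.DocTestParser().get_examples(count_rows_example.__doc__)

# Print each statement exactly as it would be pasted into the PySpark shell.
for example in examples:
    print(example.source, end="")
```

Every name the example uses is defined inside the example itself, which is what allows it to be pasted into a fresh shell.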

Does this PR introduce any user-facing change?

Yes, documentation changes only.

How was this patch tested?

Built the documentation locally.

Transurgeon avatar Aug 09 '22 04:08 Transurgeon

I set the tag [WIP] because dataframe.py needs a lot of updates. I will add some additional changes to this PR in the upcoming days.

Please review and provide some feedback. Thanks

Transurgeon avatar Aug 09 '22 04:08 Transurgeon

Can one of the admins verify this patch?

AmplabJenkins avatar Aug 10 '22 00:08 AmplabJenkins

@Transurgeon is this still WIP? If there are too many to fix, feel free to split into multiple PRs.

HyukjinKwon avatar Aug 23 '22 00:08 HyukjinKwon

@HyukjinKwon No, it is not WIP anymore. I wanted to get some feedback to see whether I was making good changes before continuing to work on it.

Should I remove the WIP tag?

Transurgeon avatar Aug 23 '22 15:08 Transurgeon

@HyukjinKwon I made some additional changes. I think we can start by merging this PR, then I will make another one for the rest of the changes.

I have a list of all the functions I changed in this PR. Should I add it to the JIRA ticket to avoid duplicate changes?

Transurgeon avatar Aug 23 '22 19:08 Transurgeon

I think you can reuse the same JIRA, and make a followup.

HyukjinKwon avatar Aug 23 '22 23:08 HyukjinKwon

We should make the tests pass before merging it in (https://github.com/Transurgeon/spark/runs/7981691501).

cc @dcoliversun @khalidmammadov FYI if you guys find some time to review, and work on the rest of the API.

HyukjinKwon avatar Aug 23 '22 23:08 HyukjinKwon

@Transurgeon Hi. It looks like CI is disabled in your fork repo. Maybe the doc can help you :)

dcoliversun avatar Aug 24 '22 08:08 dcoliversun

I'm gonna take this over if the PR author stays inactive for a few more days. This is the last task left for the umbrella task.

HyukjinKwon avatar Aug 25 '22 10:08 HyukjinKwon

Hi Hyukjin and Oliver, thanks all for your feedback.

I have created a commit with all your suggestions and enabled all jobs to run in GitHub Actions for my fork.

I will make one last commit for further minor changes.

Transurgeon avatar Aug 25 '22 22:08 Transurgeon

Hey, let's co-author this change. I will create another PR on top of this one to speed this up.

HyukjinKwon avatar Aug 29 '22 06:08 HyukjinKwon