Append & Merge PdPipeline Stages for Joining Pandas DataFrames
Thanks for this library, it made my pandas workflow clean & easy to maintain.
Feature Request:
pd.merge is a common operation when working with two or more dataframes, and a pipeline stage for it is missing.
I am currently implementing a custom pipeline stage, pdp.Merge, as a subclass of PdPipelineStage, and would like to contribute it.
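For context, here is a minimal sketch of the plain pandas operation such a stage would wrap. The frames and column names are made up for illustration and are not taken from the proposed implementation.

```python
import pandas as pd

# Hypothetical example data; the column names are illustrative only.
orders = pd.DataFrame({"customer_id": [1, 2, 3], "amount": [10.0, 25.5, 7.0]})
customers = pd.DataFrame({"customer_id": [1, 2, 4], "name": ["Ann", "Bob", "Dee"]})

# A left join keeps every order and attaches a name where one exists.
# customer_id 3 has no match, so its name is missing (NaN) in the result.
merged = pd.merge(orders, customers, on="customer_id", how="left")
print(merged)
```

The operation needs two frames, while a pipeline stage is normally applied to one, so a stage has to decide where the second frame comes from.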
Sure, I'd appreciate any contribution! :) Let me know if you need any help.
And thank you for the kind words. :)
Hi, can we close this issue? I already raised a pull request with this feature: #48.
Let me know if you need any help migrating to GitHub Actions or releasing the changes.
Hey @Asrst ,
Sorry for the belated response! :) I've migrated successfully to GitHub Actions.
I'd love it if you can rebase over the current head of the master branch, and then open a new pull request.
Tests should then work properly, including linting and coverage reports by codecov.
Cheers, Shay
By the way, you can see here that your code reduces test coverage below 100% (which we cannot have): https://github.com/pdpipe/pdpipe/pull/53
Please enrich tests to cover all cases you accounted for in your code.
I don't want to use that PR (#53, mine), as it has two merge commits. I'd rather you rebase over master and open a PR with a single code commit, which I can then fast-forward to, avoiding a merge commit completely. :)
hi @shaypal5,
I improved the code coverage, rebased over master, and raised a new pull request: #55.
Hey @Asrst ,
I'm so sorry. I just got to reading the PR, and unfortunately the current implementation does not make a lot of sense. It seems like you wrote two stages that get a single dataframe when creating the pipeline, and then on application append or merge it to the input dataframe.
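To make the pattern concrete, here is a rough sketch of what is described above. It is not the code from the PR: the class `MergeFixedFrame`, its `apply` method, and the sample frames are hypothetical, and the class does not inherit from PdPipelineStage.

```python
import pandas as pd


class MergeFixedFrame:
    """Hypothetical stand-in for a stage; not pdpipe's actual API."""

    def __init__(self, fixed_df, **merge_kwargs):
        # The second frame is bound once, when the pipeline is built.
        self._fixed_df = fixed_df
        self._merge_kwargs = merge_kwargs

    def apply(self, df):
        # Only the left-hand frame varies between applications.
        return df.merge(self._fixed_df, **self._merge_kwargs)


lookup = pd.DataFrame({"code": ["a", "b"], "label": ["alpha", "beta"]})
stage = MergeFixedFrame(lookup, on="code", how="left")

# Every input is merged against the same lookup table.
out_1 = stage.apply(pd.DataFrame({"code": ["a", "b", "a"]}))
out_2 = stage.apply(pd.DataFrame({"code": ["b", "c"]}))
```

Here the right-hand frame is part of the pipeline definition, so the only thing that changes between runs is the input frame.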
Can you explain the use case and share how you use it?
I think stages using the pandas.DataFrame.append and pandas.DataFrame.merge operations must mainly cater to use cases where both dataframes are supplied at application time, rather than one of them being statically set when the pipeline is created.
If your use case is very specific, I would prefer you stick with an AdHocStage that implements the same functionality. And even if you show this to be a generalizable use case, the stage names will have to be more specific, as this is a very specific use of the methods. Maybe AppendFixedFrame and MergedFixedFrame or something.
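As a sketch of the ad hoc route, the fixed frame can be captured in a plain function that takes one dataframe and returns one. The helper `make_append_fixed` and the sample data are hypothetical. Such a function could presumably be handed to an AdHocStage as its transform, but that wiring is an assumption, is not shown here, and should be checked against the pdpipe documentation.

```python
import pandas as pd


def make_append_fixed(fixed_df):
    """Return a transform that appends `fixed_df` to any frame it receives."""

    def transform(df):
        # pd.concat is used because DataFrame.append was removed in pandas 2.0.
        return pd.concat([df, fixed_df], ignore_index=True)

    return transform


# Hypothetical fixed rows to append to every input frame.
footer = pd.DataFrame({"x": [0], "y": [0]})
append_footer = make_append_fixed(footer)

result = append_footer(pd.DataFrame({"x": [1, 2], "y": [3, 4]}))
```

Keeping the fixed frame in a closure leaves the library API untouched while still covering the narrow use case.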