kedro-plugins feat(datasets): Add option to async load and save in PartitionedDatasets

Description

This PR provides the user to load and save PartitionedDataset asynchronously for partitions provided.
PartitionedDatasets already provide a way to do lazy loading, which solves for memory complexity. With this PR the time complexity is also reduced if the user wants to save/load these partitions in parallel with the help of use_async argument.

Development notes

Additional use_async argument to PartitionedDataset constructor is used to control the async load/save.
Based on this argument, _save and _load methods call different private functions.
Leveraged existing tests for PartitionedDataset by parameterizing value for use_async using @pytest.mark.parametrize("use_async", [True, False])

Checklist

[x] Opened this PR as a 'Draft Pull Request' if it is work-in-progress
[x] Updated the documentation to reflect the code changes
[x] Added a description of this change in the relevant RELEASE.md file
[x] Added tests to cover my changes

May 23 '24 17:05 puneeter

Hi @puneeter, can you please provide a description and any relevant development notes on the PR? This will make it easier for the team to review.

May 24 '24 11:05 merelcht

Hi @puneeter, can you please provide a description and any relevant development notes on the PR? This will make it easier for the team to review.

I updated the description. Please let me know if it needs any refactoring.

May 24 '24 11:05 puneeter

Would need team's help to point to the right documentation to be changed because of this change. Maybe: docs/source/data/partitioned_and_incremental_datasets.md?

Oct 18 '24 11:10 puneeter

Hey @puneeter, sorry for the long delay. Indeed, partitioned_and_incremental_datasets.md corresponds to https://docs.kedro.org/en/0.19.10/data/partitioned_and_incremental_datasets.html

In the end, is the usage similar to what I wrote here https://github.com/kedro-org/kedro-plugins/pull/696#discussion_r1616675036 or is it different?

Aside from that, I'll leave one more comment

Jan 14 '25 15:01 astrojuanlu

@puneeter I see all the tests have been modified to take the use_async argument, but is there a way to also check that the async functionality is working?

Jan 16 '25 09:01 merelcht

My comments on asyncio.run above were premonitory, because we ended up finding a actual example of such breakage in a separate PR https://github.com/kedro-org/kedro/issues/4611

I think this had good intentions but it might actually be difficult to do. @puneeter do you mind if we turn this PR into an issue and we tackle it at some other time?

May 13 '25 16:05 astrojuanlu

I'm going to close this PR as there's not been any response from the author for a while. If anyone is interested in trying something like this in the future, please open an issue.

May 26 '25 11:05 merelcht