ENH: add some more common Nodes

Open • NickCrews opened this issue 2 years ago • 1 comment

Writing pandas dataframes to disk seems really common. I am going to have to write my own DFNode class for this. Would you like me to generalize it enough and then contribute it here?

I would imagine something like

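from __future__ import annotations  # the elided "..." annotations are never evaluated

from typing import Any, Literal

import pandas as pd


class DFNode:

    def __init__(
        self,
        path: str | ...,
        kind: Literal["csv", "parquet", ...],
        load_kwargs: dict[str, Any] | None = None,
        save_kwargs: dict[str, Any] | None = None,
    ):
        self.path = path
        self.kind = kind
        # Fall back to fresh dicts instead of sharing a mutable default.
        self.load_kwargs = load_kwargs or {}
        self.save_kwargs = save_kwargs or {}

    def save(self, value: Any) -> None:
        # Dispatch to value.to_csv, value.to_parquet, ...
        saver = getattr(value, f"to_{self.kind}")
        saver(self.path, **self.save_kwargs)

    def load(self, is_product: bool) -> Any:
        # For a product, return the node itself so the task can call save().
        if is_product:
            return self
        loader = getattr(pd, f"read_{self.kind}")
        return loader(self.path, **self.load_kwargs)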

NickCrews, Nov 26 '23 22:11

Yes, please. I intended to build a collection of nodes that people regularly use with the help of users. If it grows too big, it should probably be split into a separate package, but for now, we can implement them here in a separate module like nodes_extension.py and import necessary packages with import_optional_dependency.

tobiasraabe, Nov 27 '23 11:11
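
The reply above suggests guarding third-party imports with import_optional_dependency. As a rough sketch of that pattern, and not the actual pytask helper, the snippet below uses a hypothetical function named optional_import built only on the standard library. Its name, signature, and error message are assumptions made for illustration; the real helper may look different.

```python
from __future__ import annotations

import importlib
from types import ModuleType


def optional_import(name: str, hint: str = "") -> ModuleType:
    """Import ``name`` or raise an ImportError that explains what is missing.

    Hypothetical stand-in for the ``import_optional_dependency`` helper
    mentioned in the thread; the real helper's signature may differ.
    """
    try:
        return importlib.import_module(name)
    except ImportError as exc:
        # Re-raise with a message that tells the user which extra is needed.
        suffix = f" {hint}" if hint else ""
        msg = f"Missing optional dependency {name!r}.{suffix}"
        raise ImportError(msg) from exc


def get_reader(kind: str):
    """Look up ``pandas.read_<kind>`` lazily, so pandas stays optional."""
    pd = optional_import("pandas", hint="Install pandas to use dataframe nodes.")
    return getattr(pd, f"read_{kind}")
```

With this layout, a module such as nodes_extension.py can be imported without pandas installed, and the error only surfaces when a dataframe node is actually used.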