
Subclassing: Can expand the graph but not "remove" steps

Open dotKokott opened this issue 1 year ago • 0 comments

Very excited for Flow subclassing to arrive! I noticed a little issue:

The original example from the PR works (after putting BaseFlow into its own file): https://github.com/Netflix/metaflow/pull/2086

from metaflow import FlowSpec, step


class BaseFlow(FlowSpec):
    @step
    def start(self):
        print("this is the start")
        self.next(self.step1)

    @step
    def step1(self):
        print("base step 1")
        self.next(self.end)

    @step
    def end(self):
        print("base step end.")


class SubFlow(BaseFlow):
    @step
    def step1(self):
        print("sub step 1")
        self.next(self.step2)

    @step
    def step2(self):
        print("sub step 2")
        self.next(self.end)


if __name__ == "__main__":
    SubFlow()

Here the SubFlow is actually modifying the graph by expanding it (adding an extra step).

However, I have a use case where I would like to rename a step:

from metaflow import FlowSpec, step


class BaseFlow(FlowSpec):
    @step
    def start(self):
        print("this is the start")
        self.next(self.process)

    @step
    def process(self):
        print("base process")
        self.next(self.end)

    @step
    def end(self):
        print("base step end.")


class ForEachFlow(BaseFlow):
    @step
    def start(self):
        print("this is the start")
        self.items = [1,2,3]
        self.next(self.process_chunk, foreach="items")

    @step
    def process_chunk(self):
        print("processing chunk")
        self.next(self.end)


if __name__ == "__main__":
    ForEachFlow()

In which case the graph validator says that process is unreachable.

    Step process is unreachable from the start step. Add self.next(process) in another step or remove process.

So it seems like I can add steps to the graph but I cannot remove steps from the BaseFlow. Is this intentional?

I can work around this, of course, either by keeping the name process for my fan-out step or by fanning out after process, so as not to skip a step that is defined in the base flow.
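
To make the error and the two workarounds concrete, here is a toy model in plain Python. It is only a sketch and not Metaflow's actual graph validator: it assumes that inherited steps stay in the graph unless the subclass overrides them by name, which is what the error message suggests. The helper names merge_steps and unreachable_steps are made up for illustration, and foreach/join handling is left out.

```python
# Toy model only: a flow is a dict mapping each step name to the steps it
# passes to self.next(). This is NOT Metaflow's actual graph validator.


def merge_steps(base, sub):
    """Subclass steps override base steps of the same name; the rest are inherited."""
    return {**base, **sub}


def unreachable_steps(graph, start="start"):
    """Return the steps that cannot be reached by following transitions from start."""
    seen = set()
    stack = [start]
    while stack:
        name = stack.pop()
        if name in seen:
            continue
        seen.add(name)
        stack.extend(graph.get(name, []))
    return sorted(set(graph) - seen)


base_flow = {"start": ["process"], "process": ["end"], "end": []}

# The override from the issue: start now skips process, which is still inherited.
renamed = merge_steps(
    base_flow, {"start": ["process_chunk"], "process_chunk": ["end"]}
)

# Workaround 1: keep the name process for the fan-out step.
same_name = merge_steps(base_flow, {"start": ["process"], "process": ["end"]})

# Workaround 2: keep process in the path and fan out after it.
fan_after = merge_steps(
    base_flow, {"process": ["process_chunk"], "process_chunk": ["end"]}
)

print(unreachable_steps(renamed))    # ['process']
print(unreachable_steps(same_name))  # []
print(unreachable_steps(fan_after))  # []
```

Under this assumption, a subclass can only replace or add entries, so the only way to drop an inherited step is to override it under the same name.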

My use case here is to provide a set of BaseFlows for both training and data processing that hide a lot of the complexity around chunking, logging, etc.

dotKokott, Feb 09 '25 09:02