
[BUG] Imagespec envd build skipped when _not_ using pyflyte register

Open zeryx opened this issue 2 years ago • 4 comments

Describe the bug

ImageSpec dependencies, such as those added to a workflows/example.py file, are built automatically when one calls pyflyte register.

https://gist.github.com/zeryx/a3074a2aade6a7263c3d796d08891741

However, when you register and execute with pyflyte run, pyflyte run --copy-all, or flytekit.remote.register_workflow, the ImageSpec build step is skipped, and the container_image argument falls back to the base image.

Expected behavior

Regardless of the mechanism used to register the workflow onto a Flyte cluster, if any task requires an ImageSpec image, that image should be built via envd and pushed to the provided registry.

If the build fails for any reason, the registration itself should fail and return a useful error indicating that the ImageSpec build failed, rather than registering the task with the default Flyte base image.

If one is using flytekit as a library and calling register_workflow() or register_task() with an ImageSpec passed as the container_image argument, flytekit should attempt to build that image, or validate that it already exists.
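The fail-fast behavior described above could look roughly like the sketch below. This is not flytekit code: ImageBuildError, register_with_images, and the build and register callables are all hypothetical stand-ins, used only to illustrate "build first, refuse to register if any build fails".

```python
from typing import Callable, Iterable, List


class ImageBuildError(RuntimeError):
    """Hypothetical error raised when an image build fails before registration."""


def register_with_images(
    image_names: Iterable[str],
    build: Callable[[str], None],
    register: Callable[[], None],
) -> None:
    """Build every required image first; register only if all builds succeed."""
    for name in image_names:
        try:
            build(name)
        except Exception as exc:
            # Name the failing image instead of silently using the base image.
            raise ImageBuildError(
                f"ImageSpec build failed for {name!r}: {exc}"
            ) from exc
    register()


calls: List[str] = []


def failing_build(name: str) -> None:
    # Stand-in for an envd build that cannot reach the registry.
    raise OSError("registry unreachable")


try:
    register_with_images(
        ["flytekit"], failing_build, lambda: calls.append("registered")
    )
except ImageBuildError as err:
    print(err)
```

With the failing build, register is never called, so nothing ends up on the cluster pointing at the wrong image.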

Additional context to reproduce

  • run pyflyte init, then replace your workflows/example.py with https://gist.github.com/zeryx/a3074a2aade6a7263c3d796d08891741
  • call pyflyte run --copy-all workflows/example.py
  • verify that envd is never invoked and that no images are written to the registry

Screenshots


pyflyte --config ~/.uctl/config.yaml run --remote --copy-all workflows/example.py wf
2023-06-12 11:23:46,062144 WARNING  {"asctime": "2023-06-12 11:23:46,062", "name": "flytekit.cli", "levelname": "WARNING", ignore.py:51
                                    "message": "Could not determine ignored files due to:\nb'fatal: not a git repository               
                                    (or any of the parent directories): .git\\n'\nNot applying any filters"}                           
Opening in existing browser session.
libva error: /usr/lib/x86_64-linux-gnu/dri/iHD_drv_video.so init failed
libva error: /usr/lib/x86_64-linux-gnu/dri/i965_drv_video.so init failed
127.0.0.1 - - [12/Jun/2023 11:24:13] "GET /callback?code=WnyxL2kywRTHw3tSSMLVO3T-38OLUwpQbOr5_8i_snY&state=feEPffKlUICJ63Y8bCuEtXBtgcU3kY2Oi1sQNrme5amT17xMnhgNOA HTTP/1.1" 200 -
Go to https://demo.hosted.unionai.cloud/console/projects/flytesnacks/domains/development/executions/f8a710d7a26b74e77a9e to see execution in the console.

Are you sure this issue hasn't been raised already?

  • [X] Yes

Have you read the Code of Conduct?

  • [X] Yes

zeryx avatar Jun 12 '23 15:06 zeryx

This tinier example correctly triggers a build:

"""A simple Flyte example."""

from flytekit import task, workflow, ImageSpec

default = ImageSpec(
    name="flytekit",
    base_image="ghcr.io/flyteorg/flytekit:py3.8-1.6.0",
    registry="ghcr.io/zeryx",
    packages=["flytekit>=1.6.0", "bayesian-optimization==1.4.3"],
    python_version="3.8",
)


@task(container_image=default)
def trigger_job() -> str:
    return "hello, world!"


@workflow
def wf() -> str:
    return trigger_job()

jeevb avatar Jun 12 '23 15:06 jeevb

This one doesn't trigger a build:

"""A simple Flyte example."""

from flytekit import dynamic, task, workflow, ImageSpec

default = ImageSpec(
    name="flytekit",
    base_image="ghcr.io/flyteorg/flytekit:py3.8-1.6.0",
    registry="localhost:30000",
    packages=["flytekit>=1.6.0", "bayesian-optimization==1.4.3"],
    python_version="3.8",
)


@task(container_image=default)
def internal_job() -> str:
    from bayes_opt import BayesianOptimization, UtilityFunction

    return f"hello {BayesianOptimization.__name__}!"


@dynamic
def trigger_job() -> str:
    return internal_job()


@workflow
def wf() -> str:
    return trigger_job()

This suggests that ImageSpec is only handled for tasks that are discovered during serialization of the entity being run. In this example, internal_job is not discovered during serialization of wf, and as such, does not have its image spec built when invoked with pyflyte run. See: https://github.com/flyteorg/flytekit/blob/3370a96fe1df3b484b21f17a5cb919600333ac8c/flytekit/remote/remote.py#LL845C25-L845C25

Should remote.register_script register all entities in that script as opposed to just the ones discoverable during serialization of the entity being invoked?
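To make the discovery gap concrete, here is a toy model of it. It is not flytekit's serialization code: Entity, static_children, runtime_children, and discover_image_specs are hypothetical names, and the image spec is reduced to a plain string. It assumes only what is described above, namely that the children of a dynamic task are not visible until the task runs.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Entity:
    """Hypothetical stand-in for a Flyte task, dynamic task, or workflow."""

    name: str
    image_spec: Optional[str] = None  # stand-in for an ImageSpec
    static_children: List["Entity"] = field(default_factory=list)
    # Children that a @dynamic body would only reveal when it actually runs.
    runtime_children: List["Entity"] = field(default_factory=list)


def discover_image_specs(entry: Entity) -> List[str]:
    """Collect image specs reachable without executing any entity."""
    found: List[str] = []
    stack, seen = [entry], set()
    while stack:
        node = stack.pop()
        if node.name in seen:
            continue
        seen.add(node.name)
        if node.image_spec is not None:
            found.append(node.image_spec)
        # Only statically known children are walked; runtime_children are
        # invisible at this point.
        stack.extend(node.static_children)
    return found


internal_job = Entity("internal_job", image_spec="flytekit-with-bayes-opt")
trigger_job = Entity("trigger_job", runtime_children=[internal_job])
wf = Entity("wf", static_children=[trigger_job])

# Same task called directly from the workflow, as in the first example.
direct_wf = Entity("direct_wf", static_children=[internal_job])

print(discover_image_specs(wf))
print(discover_image_specs(direct_wf))
```

In this model the walk from wf finds no image spec at all, while the walk from direct_wf finds the one attached to internal_job, which mirrors the two examples above.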

jeevb avatar Jun 12 '23 15:06 jeevb

Dynamic tasks are an awkward corner case for pyflyte run, since we can't know what's "behind" a dynamic task (that is, which Flyte entities it is composed of) without running it.

The current implementation of pyflyte run starts by figuring out which Flyte entity is pointed to at invocation time (i.e. either the task or workflow argument passed to pyflyte run). We could modify that and load all Flyte entities present in the file, which would solve the problem described in this issue.
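A minimal sketch of the "load all entities in the file" idea, under loud assumptions: FakeImageSpec and FakeTask are hypothetical stand-ins for flytekit's classes, and collect_image_specs is not an existing flytekit function. It only shows that scanning a module's top-level names finds image specs regardless of which entity was passed on the command line.

```python
import types
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class FakeImageSpec:
    """Hypothetical stand-in for flytekit's ImageSpec."""

    name: str


@dataclass
class FakeTask:
    """Hypothetical stand-in for any Flyte entity defined in the script."""

    name: str
    container_image: Optional[FakeImageSpec] = None


def collect_image_specs(module: types.ModuleType) -> List[FakeImageSpec]:
    """Return each distinct image spec attached to a top-level object."""
    specs: List[FakeImageSpec] = []
    for value in vars(module).values():
        image = getattr(value, "container_image", None)
        if isinstance(image, FakeImageSpec) and image not in specs:
            specs.append(image)
    return specs


# Build a throwaway module that mirrors the layout of the failing example.
example = types.ModuleType("example")
example.default = FakeImageSpec("flytekit")
example.internal_job = FakeTask("internal_job", container_image=example.default)
example.trigger_job = FakeTask("trigger_job")  # plays the role of the @dynamic task
example.wf = FakeTask("wf")

print(collect_image_specs(example))
```

The trade-off is that a module scan would also build images for entities the invoked workflow never uses, so it trades some extra build work for not missing tasks hidden behind a dynamic task.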

eapolinario avatar Jul 18 '23 19:07 eapolinario

Hello 👋, this issue has been inactive for over 9 months. To help maintain a clean and focused backlog, we'll be marking this issue as stale and will engage on it to decide if it is still applicable. Thank you for your contribution and understanding! 🙏

github-actions[bot] avatar Apr 14 '24 00:04 github-actions[bot]