[BUG] ImageSpec envd build skipped when _not_ using pyflyte register
Describe the bug
ImageSpec images, such as those used by tasks in a workflows/example.py file, are built automatically when one calls pyflyte register.
https://gist.github.com/zeryx/a3074a2aade6a7263c3d796d08891741
However, when the workflow is registered and executed with pyflyte run, pyflyte run --copy-all, or flytekit.remote.register_workflow, the ImageSpec build step is skipped and the container_image argument falls back to the base image.
Expected behavior
Regardless of the mechanism used to register the workflow on a Flyte cluster, if any task requires an ImageSpec image, that image should be built via envd and pushed to the provided registry.
If the build fails for any reason, registration itself should fail and return a useful error saying that the ImageSpec build failed, rather than registering the task with the default Flyte base image.
When flytekit is used as a library and register_workflow() or register_task() is called for a task whose container_image is an ImageSpec, flytekit should either build that image or verify that it already exists in the registry.
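A minimal sketch of that expected control flow, in plain Python: reuse the image if the registry already has it, otherwise build it, and raise instead of silently falling back to the base image. `FakeImageSpec`, `FakeRegistry`, `ImageBuildError` and `resolve_container_image` are hypothetical stand-ins written for illustration, not flytekit APIs.

```python
from dataclasses import dataclass, field
from typing import Callable, Set


class ImageBuildError(RuntimeError):
    """Hypothetical error for a failed ImageSpec build."""


@dataclass(frozen=True)
class FakeImageSpec:
    # Hypothetical stand-in for flytekit.ImageSpec with just enough fields.
    name: str
    registry: str
    tag: str

    @property
    def image_name(self) -> str:
        return f"{self.registry}/{self.name}:{self.tag}"


@dataclass
class FakeRegistry:
    # Hypothetical in-memory registry: names of images already pushed.
    images: Set[str] = field(default_factory=set)

    def exists(self, image: str) -> bool:
        return image in self.images


def resolve_container_image(
    spec: FakeImageSpec,
    registry: FakeRegistry,
    build: Callable[[FakeImageSpec], None],
) -> str:
    """Return the image to register, building it first if it is missing.

    Any failure raises ImageBuildError, so registration stops instead of
    falling back to the default base image.
    """
    image = spec.image_name
    if registry.exists(image):
        return image  # already built and pushed: verify and reuse
    try:
        build(spec)  # stand-in for the envd build and push
    except Exception as exc:
        raise ImageBuildError(f"ImageSpec build failed for {image}: {exc}") from exc
    if not registry.exists(image):
        raise ImageBuildError(f"build finished but {image} is not in the registry")
    return image
```

Passing a `build` callable that raises makes the caller stop with an `ImageBuildError` naming the image, which is the "useful error" the issue asks for.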
Additional context to reproduce
- Run pyflyte init, then replace workflows/example.py with https://gist.github.com/zeryx/a3074a2aade6a7263c3d796d08891741
- Run pyflyte run --remote --copy-all workflows/example.py wf
- Verify that envd is never invoked and that no images are pushed to the registry
Screenshots
```
pyflyte --config ~/.uctl/config.yaml run --remote --copy-all workflows/example.py wf
2023-06-12 11:23:46,062144 WARNING {"asctime": "2023-06-12 11:23:46,062", "name": "flytekit.cli", "levelname": "WARNING", "message": "Could not determine ignored files due to:\nb'fatal: not a git repository (or any of the parent directories): .git\\n'\nNot applying any filters"}    ignore.py:51
Opening in existing browser session.
libva error: /usr/lib/x86_64-linux-gnu/dri/iHD_drv_video.so init failed
libva error: /usr/lib/x86_64-linux-gnu/dri/i965_drv_video.so init failed
127.0.0.1 - - [12/Jun/2023 11:24:13] "GET /callback?code=WnyxL2kywRTHw3tSSMLVO3T-38OLUwpQbOr5_8i_snY&state=feEPffKlUICJ63Y8bCuEtXBtgcU3kY2Oi1sQNrme5amT17xMnhgNOA HTTP/1.1" 200 -
Go to https://demo.hosted.unionai.cloud/console/projects/flytesnacks/domains/development/executions/f8a710d7a26b74e77a9e to see execution in the console.
```
Are you sure this issue hasn't been raised already?
- [X] Yes
Have you read the Code of Conduct?
- [X] Yes
This smaller example correctly triggers a build:

```python
"""A simple Flyte example."""
from flytekit import task, workflow, ImageSpec

default = ImageSpec(
    name="flytekit",
    base_image="ghcr.io/flyteorg/flytekit:py3.8-1.6.0",
    registry="ghcr.io/zeryx",
    packages=["flytekit>=1.6.0", "bayesian-optimization==1.4.3"],
    python_version="3.8",
)


@task(container_image=default)
def trigger_job() -> str:
    return "hello, world!"


@workflow
def wf() -> str:
    return trigger_job()
```
This one doesn't trigger a build:

```python
"""A simple Flyte example."""
from flytekit import dynamic, task, workflow, ImageSpec

default = ImageSpec(
    name="flytekit",
    base_image="ghcr.io/flyteorg/flytekit:py3.8-1.6.0",
    registry="localhost:30000",
    packages=["flytekit>=1.6.0", "bayesian-optimization==1.4.3"],
    python_version="3.8",
)


@task(container_image=default)
def internal_job() -> str:
    # bayes_opt comes from the ImageSpec's packages list, so this import needs the built image.
    from bayes_opt import BayesianOptimization

    return f"hello {BayesianOptimization.__name__}!"


@dynamic
def trigger_job() -> str:
    return internal_job()


@workflow
def wf() -> str:
    return trigger_job()
```
This suggests that ImageSpec images are only built for tasks that are discovered during serialization of the entity being run. In this example, internal_job is called only from inside the dynamic task trigger_job, whose body is not executed at serialization time. As a result, internal_job is never discovered while serializing wf, and its ImageSpec is not built when the workflow is invoked with pyflyte run. See: https://github.com/flyteorg/flytekit/blob/3370a96fe1df3b484b21f17a5cb919600333ac8c/flytekit/remote/remote.py#LL845C25-L845C25
Should remote.register_script register all entities in the script, rather than only those discoverable during serialization of the entity being invoked?
Dynamic tasks are an awkward corner case for pyflyte run, since we can't know what is "behind" a dynamic task (that is, which Flyte entities it is composed of) without running it.
The current implementation of pyflyte run starts by figuring out which Flyte entity is pointed to at invocation time (i.e. either the task or workflow argument passed to pyflyte run). We could modify that and load all Flyte entities present in the file, which would solve the problem described in this issue.
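A rough sketch of the "load all Flyte entities present in the file" idea: walk a module's top-level objects and collect every distinct ImageSpec that any task uses, so internal_job's image is found even though only wf is invoked. `FakeTask`, `FakeImageSpec` and `image_specs_in_module` are hypothetical stand-ins; real flytekit task objects would need their own type check.

```python
import types
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass(frozen=True)
class FakeImageSpec:
    # Hypothetical stand-in for flytekit.ImageSpec.
    name: str
    registry: str


@dataclass
class FakeTask:
    # Hypothetical stand-in for a decorated Flyte task.
    name: str
    container_image: Optional[Any] = None


def image_specs_in_module(module: types.ModuleType) -> List[FakeImageSpec]:
    """Collect every distinct ImageSpec used by a task defined in module.

    Scanning all module-level objects (instead of only what the invoked
    entity reaches) also picks up tasks called only from dynamic tasks.
    """
    seen: Dict[FakeImageSpec, None] = {}  # dict keeps order and dedupes
    for obj in vars(module).values():
        if isinstance(obj, FakeTask) and isinstance(obj.container_image, FakeImageSpec):
            seen.setdefault(obj.container_image, None)
    return list(seen)


# Usage: model the second example as a module object.
example = types.ModuleType("example")
spec = FakeImageSpec("flytekit", "localhost:30000")
example.internal_job = FakeTask("internal_job", spec)
example.trigger_job = FakeTask("trigger_job")  # no ImageSpec of its own here
print(image_specs_in_module(example))
```

This only sees objects bound at module level; a task that a dynamic task creates or imports at run time would still be missed.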
Hello 👋, this issue has been inactive for over 9 months. To help maintain a clean and focused backlog, we'll be marking this issue as stale and will engage on it to decide if it is still applicable. Thank you for your contribution and understanding! 🙏