[BUG] promptflow is trying to upload my entire working directory while running with evaluate API
Describe the bug
I noticed that while running the promptflow evaluate API, it tries to upload my entire working directory to the remote run. In the case below it failed because the directory contains a special file (a soft link) that cannot be uploaded, even though this file is completely irrelevant to my evaluation run; it just happens to be in my working directory.
This happens when I provide the target as a separate Python class.
When the target instead just invokes a promptflow flow (dag YAML), the working directory does not seem to be uploaded.
Questions: I can certainly remove this special file to make the run work, but I would like to understand why the entire working directory needs to be uploaded, and how to prevent this.
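For reference, here is a minimal sketch of how such a run is set up (the class, function, file, and column names are hypothetical stand-ins, not my actual code): the target is provided as a plain Python class defined in the working directory rather than as a flow.dag.yaml-based flow.

from promptflow.evals.evaluate import evaluate


class MyTarget:
    """Hypothetical target: wraps the application being evaluated."""

    def __call__(self, *, question: str) -> dict:
        # Call the application under test here; the echo is just a stand-in.
        return {"answer": f"echo: {question}"}


def answer_length(*, answer: str, **kwargs) -> dict:
    """Hypothetical evaluator defined in the same script: scores each row by answer length."""
    return {"answer_length": len(answer)}


results = evaluate(
    target=MyTarget(),                     # target passed as a Python class instance
    data="data.jsonl",                     # hypothetical input file in the working directory
    evaluators={"length": answer_length},
)

With a callable target like this, the whole working directory (including the unrelated soft link) appears to end up in the code snapshot, whereas with a dag-YAML target the snapshot seems limited to the flow directory. The run then fails with the traceback below: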
results = evaluate(
File "/anaconda/envs/azureml_py38/lib/python3.9/site-packages/promptflow/evals/evaluate/_telemetry/__init__.py", line 111, in wrapper
result = func(*args, **kwargs)
File "/anaconda/envs/azureml_py38/lib/python3.9/site-packages/promptflow/evals/evaluate/_evaluate.py", line 365, in evaluate
raise e
File "/anaconda/envs/azureml_py38/lib/python3.9/site-packages/promptflow/evals/evaluate/_evaluate.py", line 340, in evaluate
return _evaluate(
File "/anaconda/envs/azureml_py38/lib/python3.9/site-packages/promptflow/evals/evaluate/_evaluate.py", line 401, in _evaluate
input_data_df, target_generated_columns, target_run = _apply_target_to_data(
File "/anaconda/envs/azureml_py38/lib/python3.9/site-packages/promptflow/evals/evaluate/_evaluate.py", line 183, in _apply_target_to_data
run = pf_client.run(
File "/anaconda/envs/azureml_py38/lib/python3.9/site-packages/promptflow/_sdk/_pf_client.py", line 301, in run
return self._run(
File "/anaconda/envs/azureml_py38/lib/python3.9/site-packages/promptflow/_sdk/_pf_client.py", line 226, in _run
return self.runs.create_or_update(run=run, **kwargs)
File "/anaconda/envs/azureml_py38/lib/python3.9/site-packages/promptflow/_sdk/_telemetry/activity.py", line 265, in wrapper
return f(self, *args, **kwargs)
File "/anaconda/envs/azureml_py38/lib/python3.9/site-packages/promptflow/_sdk/operations/_run_operations.py", line 135, in create_or_update
created_run = RunSubmitter(client=self._client).submit(run=run, **kwargs)
File "/anaconda/envs/azureml_py38/lib/python3.9/site-packages/promptflow/_sdk/_orchestrator/run_submitter.py", line 52, in submit
task_results = [task.result() for task in tasks]
File "/anaconda/envs/azureml_py38/lib/python3.9/site-packages/promptflow/_sdk/_orchestrator/run_submitter.py", line 52, in <listcomp>
task_results = [task.result() for task in tasks]
File "/anaconda/envs/azureml_py38/lib/python3.9/concurrent/futures/_base.py", line 439, in result
return self.__get_result()
File "/anaconda/envs/azureml_py38/lib/python3.9/concurrent/futures/_base.py", line 391, in __get_result
raise self._exception
File "/anaconda/envs/azureml_py38/lib/python3.9/concurrent/futures/thread.py", line 58, in run
result = self.fn(*self.args, **self.kwargs)
File "/anaconda/envs/azureml_py38/lib/python3.9/site-packages/promptflow/_sdk/_orchestrator/run_submitter.py", line 134, in _run_bulk
self._submit_bulk_run(flow=flow, run=run, local_storage=local_storage)
File "/anaconda/envs/azureml_py38/lib/python3.9/site-packages/promptflow/_sdk/_orchestrator/run_submitter.py", line 226, in _submit_bulk_run
local_storage.dump_snapshot(flow)
File "/anaconda/envs/azureml_py38/lib/python3.9/site-packages/promptflow/_sdk/operations/_local_storage_operations.py", line 256, in dump_snapshot
shutil.copytree(
File "/anaconda/envs/azureml_py38/lib/python3.9/shutil.py", line 568, in copytree
return _copytree(entries=entries, src=src, dst=dst, symlinks=symlinks,
File "/anaconda/envs/azureml_py38/lib/python3.9/shutil.py", line 522, in _copytree
raise Error(errors)
shutil.Error: [('<A soft link file>', '<A soft link file>', "[Errno 21] Is a directory: '<A soft link file>'")]
Running Information:
- Promptflow Package Version using pf -v:
  { "promptflow": "1.13.0", "promptflow-azure": "1.13.0", "promptflow-core": "1.13.0", "promptflow-devkit": "1.13.0", "promptflow-evals": "0.3.1", "promptflow-tracing": "1.13.0" }
- Operating System: Linux
- Python Version using python --version: Python 3.9.19 (packaged by conda-forge, GCC 12.3.0)
- Executable: /anaconda/envs/azureml_py38/bin/python
Promptflow uploads your code snapshot to the remote run; for a flow.dag.yaml flow, the snapshot root is usually the directory containing the flow.dag.yaml. In your case, could you share more on how you specify the target flow? I suspect it is somehow treating the current working directory as the code snapshot root.
To add to @wangchao1230's comment: the reason we need to upload the whole working directory when a flow runs remotely is to make sure all of the current flow's dependencies can be found on the remote side. Uploading the working directory ensures the flow runs in the same environment locally and in the cloud. For example, if you used a customized evaluator, we need to upload the whole working directory so that the customized evaluator can be loaded successfully in the remote run. But if you only used built-in evaluators, I think some improvements can be made to skip uploading the whole working directory and upload only the flow YAML file. +@luigiw @singankit for comment here.
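To illustrate (the module, file, and function names below are hypothetical): a customized evaluator typically lives in a local module next to the driver script, and the remote run can only import it if that source is included in the snapshot.

# my_evaluators.py -- a local module in the working directory (hypothetical)
def fluency_keywords(*, answer: str, **kwargs) -> dict:
    """Custom evaluator: flags whether the answer contains filler words."""
    fillers = ("um", "uh", "like")
    return {"has_filler": any(w in answer.lower().split() for w in fillers)}


# run_eval.py -- the driver script (hypothetical)
from promptflow.evals.evaluate import evaluate
from my_evaluators import fluency_keywords

results = evaluate(
    data="data.jsonl",                         # hypothetical input file
    evaluators={"fluency": fluency_keywords},  # custom evaluator imported from a local module
)

If only built-in evaluators are used, there is no such local import to resolve, which is why skipping the full working-directory upload could be feasible in that case.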