openfl
Federated Evaluation in Workflow API is not working as expected.
Describe the bug
Federated Evaluation does not work in the Workflow API with FederatedRuntime.
- If the notebook is run with torch 2.7.0, the envoys fail with the error below:
EXCEPTION : <_MultiThreadedRendezvous of RPC that terminated with:
status = StatusCode.UNKNOWN
details = "Unexpected <class 'FileNotFoundError'>: [Errno 2] No such file or directory: '/var/github/workspace/openfl/payalcha_openfl/openfl-tutorials/experimental/workflow/FederatedEvaluation/director/db3919cc-9932-469b-8f10-4af39a420042'"
debug_error_string = "UNKNOWN:Error received from peer {created_time:"2025-05-20T00:56:15.785507319-07:00", grpc_status:2, grpc_message:"Unexpected <class \'FileNotFoundError\'>: [Errno 2] No such file or directory: \'/var/github/workspace/openfl/payalcha_openfl/openfl-tutorials/experimental/workflow/FederatedEvaluation/director/db3919cc-9932-469b-8f10-4af39a420042\'"}"
>
Traceback (most recent call last):
File "/var/github/workspace/openfl/venv310/bin/fx", line 8, in <module>
sys.exit(entry())
File "/var/github/workspace/openfl/venv310/lib/python3.10/site-packages/openfl/interface/cli.py", line 310, in entry
error_handler(e)
File "/var/github/workspace/openfl/venv310/lib/python3.10/site-packages/openfl/interface/cli.py", line 229, in error_handler
raise error
File "/var/github/workspace/openfl/venv310/lib/python3.10/site-packages/openfl/interface/cli.py", line 308, in entry
cli(max_content_width=120)
File "/var/github/workspace/openfl/venv310/lib/python3.10/site-packages/click/core.py", line 1161, in __call__
return self.main(*args, **kwargs)
File "/var/github/workspace/openfl/venv310/lib/python3.10/site-packages/click/core.py", line 1082, in main
rv = self.invoke(ctx)
File "/var/github/workspace/openfl/venv310/lib/python3.10/site-packages/openfl/interface/cli.py", line 131, in invoke
super().invoke(ctx)
File "/var/github/workspace/openfl/venv310/lib/python3.10/site-packages/click/core.py", line 1697, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/var/github/workspace/openfl/venv310/lib/python3.10/site-packages/click/core.py", line 1697, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/var/github/workspace/openfl/venv310/lib/python3.10/site-packages/click/core.py", line 1443, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/var/github/workspace/openfl/venv310/lib/python3.10/site-packages/click/core.py", line 788, in invoke
return __callback(*args, **kwargs)
File "/var/github/workspace/openfl/venv310/lib/python3.10/site-packages/openfl/experimental/workflow/interface/cli/envoy.py", line 158, in start_
envoy.start()
File "/var/github/workspace/openfl/venv310/lib/python3.10/site-packages/openfl/experimental/workflow/component/envoy/envoy.py", line 222, in start
self._run()
File "/var/github/workspace/openfl/venv310/lib/python3.10/site-packages/openfl/experimental/workflow/component/envoy/envoy.py", line 145, in _run
data_file_path = self._save_data_stream_to_file(data_stream)
File "/var/github/workspace/openfl/venv310/lib/python3.10/site-packages/openfl/experimental/workflow/component/envoy/envoy.py", line 172, in _save_data_stream_to_file
for response in data_stream:
File "/var/github/workspace/openfl/venv310/lib/python3.10/site-packages/grpc/_channel.py", line 543, in __next__
return self._next()
File "/var/github/workspace/openfl/venv310/lib/python3.10/site-packages/grpc/_channel.py", line 969, in _next
raise self
grpc._channel._MultiThreadedRendezvous: <_MultiThreadedRendezvous of RPC that terminated with:
status = StatusCode.UNKNOWN
details = "Unexpected <class 'FileNotFoundError'>: [Errno 2] No such file or directory: '/var/github/workspace/openfl/payalcha_openfl/openfl-tutorials/experimental/workflow/FederatedEvaluation/director/db3919cc-9932-469b-8f10-4af39a420042'"
debug_error_string = "UNKNOWN:Error received from peer {created_time:"2025-05-20T00:56:15.785507319-07:00", grpc_status:2, grpc_message:"Unexpected <class \'FileNotFoundError\'>: [Errno 2] No such file or directory: \'/var/github/workspace/openfl/payalcha_openfl/openfl-tutorials/experimental/workflow/FederatedEvaluation/director/db3919cc-9932-469b-8f10-4af39a420042\'"}"
- The requirements.txt file is not generated properly. It must contain all the requirements mentioned in the notebook, except the basic openfl and workflow interface requirements. My understanding is that this happens because the # | export keyword is not present in the notebook cell.
- Even after changing torch from 2.7.0 to 2.3.1, the notebook runs successfully, but the model does not seem to be loaded properly with FederatedRuntime: the aggregated values from LocalRuntime are not even close to those from FederatedRuntime.
FederatedRuntime aggregated value - Average aggregated model accuracy values = 0.11860000342130661 Bengaluru value of 0.11860000342130661 Portland value of 0.11860000342130661
LocalRuntime aggregated value - Average aggregated model accuracy values = 0.9070000052452087 Bengaluru value of 0.9064000248908997 Portland value of 0.9075999855995178
In my understanding, the huge difference arises because the trained model is not loaded properly in FederatedRuntime.
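To help test the hypothesis about the missing export keyword, the sketch below scans a notebook file and lists the code cells that run pip install without an export marker. It is only a diagnostic aid, not OpenFL's export logic. It assumes the marker is written as "#| export" or "# | export" on its own line, and that requirements are collected only from exported cells, which is the hypothesis stated above and not confirmed behavior. The function names are hypothetical.

```python
import json
import re

# Assumption: the export directive is "#| export" (spaces tolerated, so
# "# | export" also matches) on its own line inside a code cell.
EXPORT_MARKER = re.compile(r"^\s*#\s*\|\s*export\b", re.MULTILINE)
# Lines that install packages from inside a notebook cell (!pip / %pip).
PIP_LINE = re.compile(r"^\s*[!%]\s*pip\s+install\b", re.MULTILINE)


def cell_source(cell):
    """Return a cell's source as one string (the notebook JSON allows str or list)."""
    src = cell.get("source", "")
    return "".join(src) if isinstance(src, list) else src


def unexported_pip_cells(notebook):
    """Indices of code cells that run pip install but lack the export marker."""
    missing = []
    for index, cell in enumerate(notebook.get("cells", [])):
        if cell.get("cell_type") != "code":
            continue
        text = cell_source(cell)
        if PIP_LINE.search(text) and not EXPORT_MARKER.search(text):
            missing.append(index)
    return missing


def check_notebook(path):
    """Load an .ipynb file (plain JSON) and report the suspicious cell indices."""
    with open(path, encoding="utf-8") as handle:
        return unexported_pip_cells(json.load(handle))
```

Calling check_notebook on the tutorial notebook returns the indices of cells worth inspecting. An empty list would suggest the missing marker is not the cause.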
To Reproduce
Steps to reproduce the behavior:
- Clone openfl
- pip install openfl and openfl-tutorials/experimental/workflow/workflow_interface_requirements.txt
- Run fx experimental activate
- Start the director and the envoys in openfl-tutorials/experimental/workflow/FederatedEvaluation
- Start the notebook through jupyter lab or the papermill command
Expected behavior
- generated_workspace must contain a proper requirements.txt that lists all requirements.
- In the generated experiment.py, the runtime must be federated_runtime, not local_runtime, because that is where fflow is initiated in the generated experiment.
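Both expectations can be checked on the text of the generated files. The sketch below is illustrative only: it assumes experiment.py sets the runtime through an assignment or keyword argument named runtime, and the list of expected packages is supplied by the caller. The helper names are hypothetical and are not part of OpenFL.

```python
import re

# Matches "x.runtime = federated_runtime" or "runtime=local_runtime" and
# captures the right-hand name. The \b keeps "federated_runtime = ..." itself
# from matching, because "_" counts as a word character.
RUNTIME_ASSIGNMENT = re.compile(r"\bruntime\s*=\s*(\w+)")


def assigned_runtimes(experiment_source):
    """Names assigned to `runtime` in the script text, ignoring comments."""
    names = []
    for line in experiment_source.splitlines():
        code = line.split("#", 1)[0]  # naive: does not handle '#' inside strings
        names.extend(RUNTIME_ASSIGNMENT.findall(code))
    return names


def missing_requirements(requirements_text, expected_packages):
    """Expected package names that are absent from the requirements.txt text."""
    present = set()
    for line in requirements_text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        # Keep only the distribution name; drop version specifiers and extras.
        name = re.split(r"[<>=!~\[;\s]", line, maxsplit=1)[0]
        present.add(name.lower())
    return [pkg for pkg in expected_packages if pkg.lower() not in present]
```

For a correctly generated workspace, assigned_runtimes(...) on experiment.py would contain federated_runtime and not local_runtime, and missing_requirements(...) would return an empty list for the packages the notebook installs.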