app_evaluate result not visible due to client shutdown
Describe the bug
When I run the TensorFlow quickstart example from https://flower.dev/docs/quickstart_tensorflow.html, I cannot see the app_evaluate results. The reporting stops after "metrics_centralized {}". I also noticed this while running it in a distributed environment with more verbose debug settings: after metrics_centralized, the server appears to try to connect to the clients again, but it seems it cannot because the clients have already shut down:
DEBUG flower 2022-01-09 10:33:12,464 | server.py:251 | fit_round: strategy sampled 2 clients (out of 2)
DEBUG flower 2022-01-09 10:34:47,614 | server.py:260 | fit_round received 2 results and 0 failures
DEBUG flower 2022-01-09 10:34:47,716 | server.py:201 | evaluate_round: strategy sampled 2 clients (out of 2)
DEBUG flower 2022-01-09 10:34:54,029 | server.py:210 | evaluate_round received 2 results and 0 failures
DEBUG flower 2022-01-09 10:34:54,029 | server.py:251 | fit_round: strategy sampled 2 clients (out of 2)
DEBUG flower 2022-01-09 10:36:23,763 | server.py:260 | fit_round received 2 results and 0 failures
DEBUG flower 2022-01-09 10:36:23,852 | server.py:201 | evaluate_round: strategy sampled 2 clients (out of 2)
DEBUG flower 2022-01-09 10:36:29,379 | server.py:210 | evaluate_round received 2 results and 0 failures
DEBUG flower 2022-01-09 10:36:29,380 | server.py:251 | fit_round: strategy sampled 2 clients (out of 2)
DEBUG flower 2022-01-09 10:37:56,419 | server.py:260 | fit_round received 2 results and 0 failures
DEBUG flower 2022-01-09 10:37:56,512 | server.py:201 | evaluate_round: strategy sampled 2 clients (out of 2)
DEBUG flower 2022-01-09 10:38:02,388 | server.py:210 | evaluate_round received 2 results and 0 failures
INFO flower 2022-01-09 10:38:02,388 | server.py:172 | FL finished in 289.92379931100004
INFO flower 2022-01-09 10:38:02,388 | app.py:119 | app_fit: losses_distributed [(1, 2.3099342584609985), (2, 2.340725302696228), (3, 2.3426902294158936)]
INFO flower 2022-01-09 10:38:02,388 | app.py:120 | app_fit: metrics_distributed {}
INFO flower 2022-01-09 10:38:02,388 | app.py:121 | app_fit: losses_centralized []
INFO flower 2022-01-09 10:38:02,388 | app.py:122 | app_fit: metrics_centralized {}
D0109 10:38:02.390271102 8 chttp2_transport.cc:1748] ipv4:10.12.0.1:34958: Sending goaway err={"created":"@1641724682.390241242","description":"Server shutdown","file":"src/core/lib/surface/server.cc","file_line":480,"grpc_status":0}
D0109 10:38:02.390317727 8 chttp2_transport.cc:1748] ipv4:10.12.0.2:51046: Sending goaway err={"created":"@1641724682.390242701","description":"Server shutdown","file":"src/core/lib/surface/server.cc","file_line":480,"grpc_status":0}
D0109 10:38:02.405275611 8 init.cc:219] grpc_shutdown starts clean-up now
Steps/Code to Reproduce
cd examples/quickstart_tensorflow/ && python3 server.py & python3 client.py & python3 client.py
Expected Results
INFO flower 2021-02-25 14:15:46,741 | app.py:76 | Flower server running (insecure, 3 rounds)
INFO flower 2021-02-25 14:15:46,742 | server.py:72 | Getting initial parameters
INFO flower 2021-02-25 14:16:01,770 | server.py:74 | Evaluating initial parameters
INFO flower 2021-02-25 14:16:01,770 | server.py:87 | [TIME] FL starting
DEBUG flower 2021-02-25 14:16:12,341 | server.py:165 | fit_round: strategy sampled 2 clients (out of 2)
DEBUG flower 2021-02-25 14:21:17,235 | server.py:177 | fit_round received 2 results and 0 failures
DEBUG flower 2021-02-25 14:21:17,512 | server.py:139 | evaluate: strategy sampled 2 clients
DEBUG flower 2021-02-25 14:21:29,628 | server.py:149 | evaluate received 2 results and 0 failures
DEBUG flower 2021-02-25 14:21:29,696 | server.py:165 | fit_round: strategy sampled 2 clients (out of 2)
DEBUG flower 2021-02-25 14:25:59,917 | server.py:177 | fit_round received 2 results and 0 failures
DEBUG flower 2021-02-25 14:26:00,227 | server.py:139 | evaluate: strategy sampled 2 clients
DEBUG flower 2021-02-25 14:26:11,457 | server.py:149 | evaluate received 2 results and 0 failures
DEBUG flower 2021-02-25 14:26:11,530 | server.py:165 | fit_round: strategy sampled 2 clients (out of 2)
DEBUG flower 2021-02-25 14:30:43,389 | server.py:177 | fit_round received 2 results and 0 failures
DEBUG flower 2021-02-25 14:30:43,630 | server.py:139 | evaluate: strategy sampled 2 clients
DEBUG flower 2021-02-25 14:30:53,384 | server.py:149 | evaluate received 2 results and 0 failures
INFO flower 2021-02-25 14:30:53,384 | server.py:122 | [TIME] FL finished in 891.6143046000007
INFO flower 2021-02-25 14:30:53,385 | app.py:109 | app_fit: losses_distributed [(1, 2.3196680545806885), (2, 2.3202896118164062), (3, 2.1818180084228516)]
INFO flower 2021-02-25 14:30:53,385 | app.py:110 | app_fit: accuracies_distributed []
INFO flower 2021-02-25 14:30:53,385 | app.py:111 | app_fit: losses_centralized []
INFO flower 2021-02-25 14:30:53,385 | app.py:112 | app_fit: accuracies_centralized []
DEBUG flower 2021-02-25 14:30:53,442 | server.py:139 | evaluate: strategy sampled 2 clients
DEBUG flower 2021-02-25 14:31:02,848 | server.py:149 | evaluate received 2 results and 0 failures
INFO flower 2021-02-25 14:31:02,848 | app.py:121 | app_evaluate: federated loss: 2.1818180084228516
INFO flower 2021-02-25 14:31:02,848 | app.py:125 | app_evaluate: results [('ipv4:127.0.0.1:57158', EvaluateRes(loss=2.1818180084228516, num_examples=10000, accuracy=0.0, metrics={'accuracy': 0.21610000729560852})), ('ipv4:127.0.0.1:57160', EvaluateRes(loss=2.1818180084228516, num_examples=10000, accuracy=0.0, metrics={'accuracy': 0.21610000729560852}))]
INFO flower 2021-02-25 14:31:02,848 | app.py:127 | app_evaluate: failures []
Actual Results
INFO flower 2022-01-09 14:28:35,623 | app.py:77 | Flower server running (insecure, 3 rounds)
INFO flower 2022-01-09 14:28:35,623 | server.py:118 | Initializing global parameters
INFO flower 2022-01-09 14:28:35,623 | server.py:304 | Requesting initial parameters from one random client
INFO flower 2022-01-09 14:28:40,044 | server.py:307 | Received initial parameters from one random client
INFO flower 2022-01-09 14:28:40,044 | server.py:120 | Evaluating initial parameters
INFO flower 2022-01-09 14:28:40,044 | server.py:133 | FL starting
DEBUG flower 2022-01-09 14:28:40,045 | server.py:251 | fit_round: strategy sampled 2 clients (out of 2)
DEBUG flower 2022-01-09 14:31:20,651 | server.py:260 | fit_round received 2 results and 0 failures
DEBUG flower 2022-01-09 14:31:20,768 | server.py:201 | evaluate_round: strategy sampled 2 clients (out of 2)
DEBUG flower 2022-01-09 14:31:26,910 | server.py:210 | evaluate_round received 2 results and 0 failures
DEBUG flower 2022-01-09 14:31:26,910 | server.py:251 | fit_round: strategy sampled 2 clients (out of 2)
DEBUG flower 2022-01-09 14:34:05,966 | server.py:260 | fit_round received 2 results and 0 failures
DEBUG flower 2022-01-09 14:34:06,073 | server.py:201 | evaluate_round: strategy sampled 2 clients (out of 2)
DEBUG flower 2022-01-09 14:34:11,664 | server.py:210 | evaluate_round received 2 results and 0 failures
DEBUG flower 2022-01-09 14:34:11,664 | server.py:251 | fit_round: strategy sampled 2 clients (out of 2)
DEBUG flower 2022-01-09 14:36:50,001 | server.py:260 | fit_round received 2 results and 0 failures
DEBUG flower 2022-01-09 14:36:50,113 | server.py:201 | evaluate_round: strategy sampled 2 clients (out of 2)
DEBUG flower 2022-01-09 14:36:55,675 | server.py:210 | evaluate_round received 2 results and 0 failures
INFO flower 2022-01-09 14:36:55,675 | server.py:172 | FL finished in 495.630141119
INFO flower 2022-01-09 14:36:55,675 | app.py:119 | app_fit: losses_distributed [(1, 2.3150360584259033), (2, 2.325573205947876), (3, 2.1993141174316406)]
INFO flower 2022-01-09 14:36:55,675 | app.py:120 | app_fit: metrics_distributed {}
INFO flower 2022-01-09 14:36:55,675 | app.py:121 | app_fit: losses_centralized []
INFO flower 2022-01-09 14:36:55,675 | app.py:122 | app_fit: metrics_centralized {}
I encountered the same issue.
Looking at the source code of app.py, I realized that we can pass force_final_distributed_eval = True when we call fl.server.start_server().
Not sure whether this is intended, but it solves my problem.
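For anyone else hitting this, here is a minimal sketch of the workaround call, assuming the start_server signature from this era of Flower; the server address and round count are placeholder values from the quickstart, not part of the fix itself:

```python
import flwr as fl

# Sketch: ask the server to run one final distributed evaluation
# after app_fit finishes, so the app_evaluate results are reported.
# Address and num_rounds are placeholders, not specific to the fix.
fl.server.start_server(
    server_address="[::]:8080",
    config={"num_rounds": 3},
    force_final_distributed_eval=True,  # workaround discussed in this thread
)
```

This keeps the clients connected for one more evaluate round before the gRPC server sends its shutdown goaway.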
Hi @Hans0124SG, thanks a lot! That solved my problem. I am also unsure whether this is intended, since in app.py (line 155) the condition on this parameter is commented with # Temporary workaround to force distributed evaluation.
Hello everyone, I have the same issue. I set force_final_distributed_eval = True, but the problem is still not solved.
The problem looks like this:

(jupyter_env) root@nsl:/home/nsl/Research/Jupyter/HAR/FL_Thermal/FL_Flower/FL_Thermal_7_Classes/2_Agents# python3 server.py
INFO flower 2022-03-11 16:18:40,064 | app.py:109 | Flower server running (3 rounds) SSL is disabled
INFO flower 2022-03-11 16:18:40,064 | server.py:128 | Initializing global parameters
INFO flower 2022-03-11 16:18:40,064 | server.py:327 | Requesting initial parameters from one random client
INFO flower 2022-03-11 16:18:58,815 | server.py:330 | Received initial parameters from one random client
INFO flower 2022-03-11 16:18:58,815 | server.py:130 | Evaluating initial parameters
INFO flower 2022-03-11 16:18:58,815 | server.py:143 | FL starting
DEBUG flower 2022-03-11 16:19:05,703 | server.py:265 | fit_round: strategy sampled 2 clients (out of 2)
DEBUG flower 2022-03-11 16:39:13,503 | server.py:277 | fit_round received 2 results and 0 failures
DEBUG flower 2022-03-11 16:39:13,512 | server.py:211 | evaluate_round: strategy sampled 2 clients (out of 2)
DEBUG flower 2022-03-11 16:39:16,333 | server.py:223 | evaluate_round received 2 results and 0 failures
DEBUG flower 2022-03-11 16:39:16,333 | server.py:265 | fit_round: strategy sampled 2 clients (out of 2)
DEBUG flower 2022-03-11 17:01:15,667 | server.py:277 | fit_round received 2 results and 0 failures
DEBUG flower 2022-03-11 17:01:15,674 | server.py:211 | evaluate_round: strategy sampled 2 clients (out of 2)
DEBUG flower 2022-03-11 17:01:18,991 | server.py:223 | evaluate_round received 2 results and 0 failures
DEBUG flower 2022-03-11 17:01:18,991 | server.py:265 | fit_round: strategy sampled 2 clients (out of 2)
DEBUG flower 2022-03-11 17:22:09,597 | server.py:277 | fit_round received 2 results and 0 failures
DEBUG flower 2022-03-11 17:22:09,605 | server.py:211 | evaluate_round: strategy sampled 2 clients (out of 2)
DEBUG flower 2022-03-11 17:22:12,351 | server.py:223 | evaluate_round received 2 results and 0 failures
INFO flower 2022-03-11 17:22:12,351 | server.py:182 | FL finished in 3793.5352155329965
INFO flower 2022-03-11 17:22:12,351 | app.py:149 | app_fit: losses_distributed [(1, 0.6875768154859543), (2, 0.433855876326561), (3, 0.37323958426713943)]
INFO flower 2022-03-11 17:22:12,351 | app.py:150 | app_fit: metrics_distributed {}
INFO flower 2022-03-11 17:22:12,351 | app.py:151 | app_fit: losses_centralized []
INFO flower 2022-03-11 17:22:12,351 | app.py:152 | app_fit: metrics_centralized {}
(jupyter_env) root@nsl:/home/nsl/Research/Jupyter/HAR/FL_Thermal/FL_Flower/FL_Thermal_7_Classes/2_Agents#
There are no app_evaluate results. The reporting stops after "metrics_centralized {}".
I have set force_final_distributed_eval = True in the app.py file (/usr/local/lib/python3.8/dist-packages/flwr/server/app.py) at line 38, like this:

def start_server(  # pylint: disable=too-many-arguments
    server_address: str = DEFAULT_SERVER_ADDRESS,
    server: Optional[Server] = None,
    config: Optional[Dict[str, int]] = None,
    strategy: Optional[Strategy] = None,
    grpc_max_message_length: int = GRPC_MAX_MESSAGE_LENGTH,
    force_final_distributed_eval: bool = True,
    certificates: Optional[Tuple[bytes, bytes, bytes]] = None,
Can you help me? Are there any suggestions? Thank you.
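A side note on the empty metrics_distributed {} in the logs above: during normal rounds, client metrics are dropped unless the strategy aggregates them. In later Flower versions (1.x API, which may not match the version used in this thread), FedAvg accepts an evaluate_metrics_aggregation_fn for this; a hedged sketch, where the aggregation function itself is plain Python:

```python
from typing import Dict, List, Tuple

def weighted_average(metrics: List[Tuple[int, Dict[str, float]]]) -> Dict[str, float]:
    # Flower passes a list of (num_examples, metrics_dict) pairs, one per client;
    # weight each client's accuracy by its number of evaluation examples.
    total_examples = sum(num for num, _ in metrics)
    return {
        "accuracy": sum(num * m["accuracy"] for num, m in metrics) / total_examples
    }

# Hypothetical wiring against the Flower 1.x API (not the version in this thread):
# import flwr as fl
# strategy = fl.server.strategy.FedAvg(evaluate_metrics_aggregation_fn=weighted_average)
# fl.server.start_server(server_address="0.0.0.0:8080", strategy=strategy)
```

With an aggregation function in place, the per-round metrics_distributed line is populated, so the final distributed evaluation is no longer the only way to see client accuracy.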