Data upload fails when running FATE in standalone mode with Docker.
System information
- Have I written custom code (yes/no): no
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04): CentOS 7
- FATE Flow version (use command: python fate_flow_server.py --version):
{'FATE': '1.11.2', 'FATEFlow': '1.11.1', 'FATEBoard': '1.11.1', 'EGGROLL': '2.5.1', 'CENTOS': '7.2', 'UBUNTU': '16.04', 'PYTHON': '3.8', 'MAVEN': '3.6.3', 'JDK': '8', 'SPARK': '3.4.0'}
- Python version (use command: python --version): Python 3.8.13
Describe the current behavior
The container start command is as follows.
$ docker run -d -it \
--name single_fate \
--restart=always \
-p 8080:8080 \
-p 9380:9380 \
federatedai/standalone_fate:1.11.2
$ docker ps |grep single_fate
cc2b7babbc26 federatedai/standalone_fate:1.11.2 "./bin/docker-entryp…" 22 minutes ago Up 22 minutes 0.0.0.0:9380->9380/tcp, 0.0.0.0:9090->8080/tcp single_fate
I followed the Pipeline upload tutorial, but the data upload failed on this line:
pipeline_upload.upload(drop=1)
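For context, the setup before that call roughly followed the tutorial notebook; a minimal sketch is below (the party ID, data file, table name, namespace, and partition count are tutorial-style placeholders, not necessarily my exact values):
from pipeline.backend.pipeline import PipeLine
# sketch based on the Pipeline upload tutorial; party id, file, table name
# and namespace below are placeholders, not necessarily the values I used
pipeline_upload = PipeLine().set_initiator(role="guest", party_id=9999).set_roles(guest=9999)
pipeline_upload.add_upload_data(file="examples/data/breast_hetero_guest.csv",
                                table_name="breast_hetero_guest",
                                namespace="experiment",
                                head=1, partition=4)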
The error info is:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
File ~/miniconda3/envs/fate/lib/python3.8/site-packages/pipeline/utils/invoker/job_submitter.py:61, in JobInvoker.upload_data(self, submit_conf, drop)
60 if 'retcode' not in result or result["retcode"] != 0:
---> 61 raise ValueError
63 if "jobId" not in result:
ValueError:
During handling of the above exception, another exception occurred:
ValueError Traceback (most recent call last)
Cell In[17], line 1
----> 1 pipeline_upload.upload(drop=1)
File ~/miniconda3/envs/fate/lib/python3.8/site-packages/loguru/_logger.py:1251, in Logger.catch.<locals>.Catcher.__call__.<locals>.catch_wrapper(*args, **kwargs)
1249 def catch_wrapper(*args, **kwargs):
1250 with catcher:
-> 1251 return function(*args, **kwargs)
1252 return default
File ~/miniconda3/envs/fate/lib/python3.8/site-packages/pipeline/backend/pipeline.py:664, in PipeLine.upload(self, drop)
662 upload_conf = self._construct_upload_conf(data_conf)
663 LOGGER.debug(f"upload_conf is {json.dumps(upload_conf)}")
--> 664 self._train_job_id, detail_info = self._job_invoker.upload_data(upload_conf, int(drop))
665 self._train_board_url = detail_info["board_url"]
666 self._job_invoker.monitor_job_status(self._train_job_id,
667 "local",
668 0)
File ~/miniconda3/envs/fate/lib/python3.8/site-packages/pipeline/utils/invoker/job_submitter.py:69, in JobInvoker.upload_data(self, submit_conf, drop)
67 data = result["data"]
68 except BaseException:
---> 69 raise ValueError("job submit failed, err msg: {}".format(result))
70 return job_id, data
Describe the expected behavior
The data upload succeeds.
Contributing
- Do you want to contribute a PR? (yes/no): no
- Briefly describe your candidate solution (if contributing):
When I run the curl command inside the container:
curl http://127.0.0.1:9380/
{"retcode":100,"retmsg":"<NotFound '404: Not Found'>"}
When I run the curl command outside the container:
$ curl http://127.0.0.1:9380/
curl: (56) Recv failure: Connection reset by peer
So I suspect the problem is the default bind address (host) of the FATE Flow server.
After I changed the fateflow host in service_conf.yaml to 0.0.0.0, the data uploaded successfully.
The modified configuration is as follows:
fateflow:
  # you must set real ip address, 127.0.0.1 and 0.0.0.0 is not supported
  host: 0.0.0.0
  http_port: 9380
  grpc_port: 9360
  # when you have multiple fateflow server on one party,
  # we suggest using nginx for load balancing.
  nginx:
    host:
    http_port:
    grpc_port:
  # use random instance_id instead of {host}:{http_port}
  random_instance_id: false
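For reference, this is roughly how I applied the change inside the running container. This is only a sketch: the config path and restart script below are assumptions about the standalone image layout, so locate service_conf.yaml in your own image if it differs.
$ docker exec -it single_fate bash
# inside the container: edit the fateflow host (path is an assumption for the standalone image)
$ vi /data/projects/fate/conf/service_conf.yaml
# restart FATE Flow so the new bind address takes effect
$ bash /data/projects/fate/fateflow/bin/service.sh restart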
Thank you very much for your feedback. You are correct, and we will work on improving this in a future release.