
Data upload fails when running in standalone mode with Docker.

Open danerlt opened this issue 2 years ago • 2 comments

System information

  • Have I written custom code (yes/no): no
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): centos7
  • FATE Flow version (use command: python fate_flow_server.py --version): {'FATE': '1.11.2', 'FATEFlow': '1.11.1', 'FATEBoard': '1.11.1', 'EGGROLL': '2.5.1', 'CENTOS': '7.2', 'UBUNTU': '16.04', 'PYTHON': '3.8', 'MAVEN': '3.6.3', 'JDK': '8', 'SPARK': '3.4.0'}
  • Python version (use command: python --version): Python 3.8.13

Describe the current behavior

The container start command is as follows:

$ docker run -d -it \
    --name single_fate \
    --restart=always \
    -p 8080:8080 \
    -p 9380:9380 \
    federatedai/standalone_fate:1.11.2
$ docker ps |grep single_fate
cc2b7babbc26        federatedai/standalone_fate:1.11.2              "./bin/docker-entryp…"   22 minutes ago      Up 22 minutes          0.0.0.0:9380->9380/tcp, 0.0.0.0:9090->8080/tcp                                                       single_fate

I followed the Pipeline tutorial for uploading data, but the upload failed on this line:

pipeline_upload.upload(drop=1)

The error info is:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
File ~/miniconda3/envs/fate/lib/python3.8/site-packages/pipeline/utils/invoker/job_submitter.py:61, in JobInvoker.upload_data(self, submit_conf, drop)
     60 if 'retcode' not in result or result["retcode"] != 0:
---> 61     raise ValueError
     63 if "jobId" not in result:

ValueError: 

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
Cell In[17], line 1
----> 1 pipeline_upload.upload(drop=1)

File ~/miniconda3/envs/fate/lib/python3.8/site-packages/loguru/_logger.py:1251, in Logger.catch.<locals>.Catcher.__call__.<locals>.catch_wrapper(*args, **kwargs)
   1249 def catch_wrapper(*args, **kwargs):
   1250     with catcher:
-> 1251         return function(*args, **kwargs)
   1252     return default

File ~/miniconda3/envs/fate/lib/python3.8/site-packages/pipeline/backend/pipeline.py:664, in PipeLine.upload(self, drop)
    662 upload_conf = self._construct_upload_conf(data_conf)
    663 LOGGER.debug(f"upload_conf is {json.dumps(upload_conf)}")
--> 664 self._train_job_id, detail_info = self._job_invoker.upload_data(upload_conf, int(drop))
    665 self._train_board_url = detail_info["board_url"]
    666 self._job_invoker.monitor_job_status(self._train_job_id,
    667                                      "local",
    668                                      0)

File ~/miniconda3/envs/fate/lib/python3.8/site-packages/pipeline/utils/invoker/job_submitter.py:69, in JobInvoker.upload_data(self, submit_conf, drop)
     67     data = result["data"]
     68 except BaseException:
---> 69     raise ValueError("job submit failed, err msg: {}".format(result))
     70 return job_id, data
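
For context, the check that raised the first `ValueError` can be sketched as follows. This is a simplified paraphrase of the `JobInvoker.upload_data` logic visible in the traceback above, not the actual FATE source:

```python
# Simplified paraphrase of the retcode check seen in the traceback
# (pipeline/utils/invoker/job_submitter.py); not the actual FATE source.
def check_submit_result(result: dict):
    # The client rejects any server reply whose retcode is missing or non-zero.
    if "retcode" not in result or result["retcode"] != 0:
        raise ValueError("job submit failed, err msg: {}".format(result))
    return result["jobId"], result.get("data")

# A reply like the 404 observed with curl later in this issue fails the check:
try:
    check_submit_result({"retcode": 100, "retmsg": "<NotFound '404: Not Found'>"})
except ValueError as e:
    print(e)
```

So the bare `ValueError:` in the first traceback and the "job submit failed" message in the second are two sides of the same failed retcode check.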

Describe the expected behavior

The data upload succeeds.

Contributing

  • Do you want to contribute a PR? (yes/no): no
  • Briefly describe your candidate solution(if contributing):

When I run curl inside the container:

curl http://127.0.0.1:9380/
{"retcode":100,"retmsg":"<NotFound '404: Not Found'>"}

When I run curl outside the container:

$ curl http://127.0.0.1:9380/
curl: (56) Recv failure: Connection reset by peer

So I suspect the problem is the default host address that FATE Flow binds to.
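
The symptom is consistent with that guess: a listener bound to `127.0.0.1` only sees connections arriving via the loopback interface, while Docker's published-port forwarding delivers outside connections on the container's eth0 address, which such a listener never sees (hence the connection reset). A minimal, FATE-independent illustration of the effect of the bind address:

```python
import socket

# Minimal illustration (not FATE code): a listener bound to 127.0.0.1
# accepts only connections arriving via loopback; 0.0.0.0 listens on
# every interface, which is what Docker's published-port forwarding needs.
srv = socket.socket()
srv.bind(("127.0.0.1", 0))      # loopback only, like the default config
srv.listen()
addr, port = srv.getsockname()
print("listening on", addr)     # 127.0.0.1 -- invisible to eth0 traffic

cli = socket.create_connection(("127.0.0.1", port), timeout=1)  # loopback works
print("loopback connect ok")
cli.close()
srv.close()
```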

danerlt avatar Aug 21 '23 13:08 danerlt

After I changed the fateflow host in service_conf.yaml to 0.0.0.0, the data uploaded successfully.

The modified configuration is as follows:

fateflow:
  # you must set real ip address, 127.0.0.1 and 0.0.0.0 is not supported
  host: 0.0.0.0
  http_port: 9380
  grpc_port: 9360
  # when you have multiple fateflow server on one party,
  # we suggest using nginx for load balancing.
  nginx:
    host:
    http_port:
    grpc_port:
  # use random instance_id instead of {host}:{http_port}
  random_instance_id: false
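
For reference, the same edit could also be scripted with sed. The snippet below is a sketch run on a throwaway copy of the config fragment; inside the container you would target the real service_conf.yaml, whose exact path depends on the image and is not shown here:

```shell
# Demo of scripting the host change with sed, on a throwaway copy of the
# fragment (adjust the path to the real service_conf.yaml in your image).
cat > /tmp/service_conf_demo.yaml <<'EOF'
fateflow:
  host: 127.0.0.1
  http_port: 9380
  grpc_port: 9360
EOF
sed -i 's/^\(  host: \)127\.0\.0\.1$/\10.0.0.0/' /tmp/service_conf_demo.yaml
grep 'host:' /tmp/service_conf_demo.yaml
```

After changing the real file, the fate_flow service needs a restart for the new bind address to take effect.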

danerlt avatar Aug 21 '23 13:08 danerlt

Thank you very much for your feedback. You are correct, and we will work on optimizing this issue in the future.

zhihuiwan avatar Aug 23 '23 03:08 zhihuiwan