ConnectionResetError: [Errno 104] Connection reset by peer

Open rl-2 opened this issue 4 years ago • 10 comments

Hello,

I'm trying to train a PPO agent with Stable Baselines, following the instructions in Sec. 5.2.2. After running ./TrainAndTestOpenAIStableBaselines.sh within_template, I got the following error:

Traceback (most recent call last):
  File "OpenAI_StableBaseline_Train.py", line 231, in <module>
    range(c.num_worker)])
  File "/home/ubuntu/anaconda3/envs/pytorch_p37/lib/python3.7/site-packages/stable_baselines3/common/vec_env/subproc_vec_env.py", line 111, in __init__
    observation_space, action_space = self.remotes[0].recv()
  File "/home/ubuntu/anaconda3/envs/pytorch_p37/lib/python3.7/multiprocessing/connection.py", line 250, in recv
    buf = self._recv_bytes()
  File "/home/ubuntu/anaconda3/envs/pytorch_p37/lib/python3.7/multiprocessing/connection.py", line 407, in _recv_bytes
    buf = self._recv(4)
  File "/home/ubuntu/anaconda3/envs/pytorch_p37/lib/python3.7/multiprocessing/connection.py", line 379, in _recv
    chunk = read(handle, remaining)
ConnectionResetError: [Errno 104] Connection reset by peer

Did I miss a step to activate the Science Birds application? Please let me know.

Thank you!

rl-2 avatar Nov 22 '21 19:11 rl-2

Hi Rodger, please try the new version and let me know if the issue persists. Thanks.

Cheng-Xue avatar Nov 28 '21 23:11 Cheng-Xue

Hi Cheng, it seems the issue is still there. Here is a full log:

Error in client-server communication: [Errno 111] Connection refused
Process ForkServerProcess-20:
Traceback (most recent call last):
  File "/home/ubuntu/anaconda3/envs/pytorch_p37/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
    self.run()
  File "/home/ubuntu/anaconda3/envs/pytorch_p37/lib/python3.7/multiprocessing/process.py", line 99, in run
    self._target(*self._args, **self._kwargs)
  File "/home/ubuntu/anaconda3/envs/pytorch_p37/lib/python3.7/site-packages/stable_baselines3/common/vec_env/subproc_vec_env.py", line 24, in _worker
    env = env_fn_wrapper.var()
  File "/home/ubuntu/RL-AngryBirds/sciencebirdsagents/Utils/utils.py", line 64, in _init
    max_attempts_per_level=max_attempts_per_level)
  File "/home/ubuntu/RL-AngryBirds/sciencebirdsagents/SBEnvironment/SBEnvironmentWrapperOpenAI.py", line 78, in __init__
    self.connect_agent_to_server()
  File "/home/ubuntu/RL-AngryBirds/sciencebirdsagents/SBEnvironment/SBEnvironmentWrapperOpenAI.py", line 88, in connect_agent_to_server
    self.ar.configure(self.env_id)
  File "/home/ubuntu/RL-AngryBirds/sciencebirdsagents/Client/agent_client.py", line 171, in configure
    self.playing_mode.value
  File "/home/ubuntu/RL-AngryBirds/sciencebirdsagents/Client/agent_client.py", line 131, in _send_command
    self.server_socket.sendall(msg)
BrokenPipeError: [Errno 32] Broken pipe
Traceback (most recent call last):
  File "OpenAI_StableBaseline_Train.py", line 231, in <module>
    range(c.num_worker)])
  File "/home/ubuntu/anaconda3/envs/pytorch_p37/lib/python3.7/site-packages/stable_baselines3/common/vec_env/subproc_vec_env.py", line 111, in __init__
    observation_space, action_space = self.remotes[0].recv()
  File "/home/ubuntu/anaconda3/envs/pytorch_p37/lib/python3.7/multiprocessing/connection.py", line 250, in recv
    buf = self._recv_bytes()
  File "/home/ubuntu/anaconda3/envs/pytorch_p37/lib/python3.7/multiprocessing/connection.py", line 407, in _recv_bytes
    buf = self._recv(4)
  File "/home/ubuntu/anaconda3/envs/pytorch_p37/lib/python3.7/multiprocessing/connection.py", line 379, in _recv
    chunk = read(handle, remaining)
ConnectionResetError: [Errno 104] Connection reset by peer

rl-2 avatar Nov 29 '21 19:11 rl-2

To follow up on this issue: I started the game server before running the script and got a similar error:

2021-11-30 00:57:35,012 - OpenAI stable baselines Training and Testing - INFO - training step: 0
Server started...
Error in client-server communication: [Errno 111] Connection refused

On the server side, it seems it has been killed automatically:

The Science Birds Server is waiting for the first agent to connect
Waiting for agent  
Killed

rl-2 avatar Nov 30 '21 01:11 rl-2

Hi Rodger, the problem should still be that the game server is not successfully initialised. Can you provide the exact environment you are using so that we can replicate the issue? Thanks.
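
One way to confirm whether the server is reachable before the vectorized environments are created is a small TCP probe. The sketch below is a diagnostic only, under stated assumptions: `wait_for_server` is a hypothetical helper that is not part of the repository, and the host and port must be whatever agent_client.py actually connects to. The server may also treat a probe connection as an agent connecting, so this is best tried against a throwaway server instance.

```python
import socket
import time


def wait_for_server(host: str, port: int,
                    timeout_s: float = 30.0, interval_s: float = 0.5) -> bool:
    """Return True once a TCP connection to (host, port) succeeds.

    Hypothetical helper for diagnosis; returns False if the server
    is still unreachable after roughly timeout_s seconds.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            # Open and immediately close a connection as a reachability check.
            with socket.create_connection((host, port), timeout=interval_s):
                return True
        except OSError:
            # ConnectionRefusedError ([Errno 111]) is a subclass of OSError.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval_s)
```

If this returns False, the `[Errno 111] Connection refused` seen by the workers is expected, because nothing is listening on that port.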

Cheng-Xue avatar Nov 30 '21 21:11 Cheng-Xue

Thanks, Cheng. Below is the environment info:

  • Ubuntu: 18.04.6 LTS
  • Python: 3.7.10
  • Numpy: 1.18.5
  • Torch: 1.10.0
  • Torchvision: 0.8.2
  • lxml: 4.6.3
  • tensorboard: 2.7.0
  • Java: 13.0.4
  • stable-baselines: 1.3.0

And the steps I've taken are:

  1. Run java -jar ./game_playing_interface.jar and the terminal shows:
The Science Birds Server is waiting for the first agent to connect
Waiting for agent 
  2. Run ./TrainAndTestOpenAIStableBaselines.sh within_template. Then I got the errors shown in this thread.

rl-2 avatar Nov 30 '21 22:11 rl-2

Hi Luo, I have pushed an updated version. The new version opens a new terminal window to run the server. Please let me know if the problem still exists. Cheers.

Cheng-Xue avatar Dec 06 '21 01:12 Cheng-Xue

Hi Cheng,

Thanks a ton for the update! I saw this error when I ran the code:

sh: 1: gnome-terminal: not found

Note that I'm running the code on an AWS instance. Could that be what prevents launching a new terminal window?
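
A quick way to check this on the instance, as a minimal sketch: `can_open_terminal_window` is a hypothetical helper, and it assumes a terminal window needs both the `gnome-terminal` binary on PATH and a display session (`DISPLAY` or `WAYLAND_DISPLAY` set).

```python
import os
import shutil


def can_open_terminal_window(terminal: str = "gnome-terminal",
                             which=shutil.which, environ=None) -> bool:
    """Return True if the terminal binary exists and a display is available.

    `which` and `environ` are injectable so the check can be exercised
    without depending on the machine it runs on.
    """
    env = os.environ if environ is None else environ
    has_binary = which(terminal) is not None
    has_display = bool(env.get("DISPLAY") or env.get("WAYLAND_DISPLAY"))
    return has_binary and has_display


if __name__ == "__main__":
    print("terminal window available:", can_open_terminal_window())
```

On a headless cloud instance this would typically print False, which matches the `gnome-terminal: not found` message above.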

rl-2 avatar Dec 06 '21 23:12 rl-2

Hi Rodger, it is a bit tricky to run on AWS. Although we ran our tests on AWS as well, it only supports 'symbolic' mode at the moment. You can activate the initial version by setting self.headless_server = True at line 10 in Server.py.

Can you please verify whether the following command successfully starts the server?

bash -c "cd ../sciencebirdsgames/Linux && nohup java -jar ./game_playing_interface.jar --headless --dev > out 2>&1 &"
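
The same check can be sketched in Python, which also reports whether the process survived its first few seconds. This is an illustration under assumptions, not the repository's launcher: `start_and_check` is a hypothetical helper, and the grace period is arbitrary.

```python
import subprocess
import sys
import time


def start_and_check(cmd, cwd, log_path, grace_s=3.0):
    """Start cmd in the background and report whether it is still running.

    Output (stdout and stderr) goes to log_path, similar to `> out 2>&1`.
    Returns (process, alive) where alive is False if the process exited
    within grace_s seconds.
    """
    with open(log_path, "wb") as log_file:
        proc = subprocess.Popen(
            cmd,
            cwd=cwd,
            stdout=log_file,
            stderr=subprocess.STDOUT,
            start_new_session=True,  # detach from the launching terminal's session (POSIX)
        )
    time.sleep(grace_s)
    alive = proc.poll() is None
    return proc, alive


# Example call mirroring the shell command above (paths taken from this thread):
# proc, alive = start_and_check(
#     ["java", "-jar", "./game_playing_interface.jar", "--headless", "--dev"],
#     cwd="../sciencebirdsgames/Linux",
#     log_path="../sciencebirdsgames/Linux/out",
# )
# if not alive:
#     print("server exited early, return code:", proc.returncode, file=sys.stderr)
```

On POSIX, a negative return code means the process was terminated by that signal number, so -9 would correspond to the "Killed" message reported earlier.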

Cheng-Xue avatar Dec 07 '21 03:12 Cheng-Xue

I also have a question regarding server.py.
You use three conditions: self.if_head, self.headless_server, and self.state_repr_type.

  1. The --dev > out 2>&1 option is added in lines 22, 33, 43, and 52 (when self.headless_server==True).
    Doesn't this option correspond to self.state_repr_type?

  2. The --headless option is added in lines 22, 27, 43, and 47 (when self.if_head==False and self.state_repr_type=='symbolic', or when self.if_head=='headless').
    This looks wrong, since the later branches (i.e. the elif and else) don't check self.state_repr_type.
    Also, I don't understand why there are two similarly functioning conditions, self.if_head and self.headless_server.
    Can you explain this?

hawe66 avatar Feb 02 '24 06:02 hawe66

Hi Hawe,

Apologies for the delay in getting back to you.

Regarding your questions:

The addition of --dev > out 2>&1 corresponds to the use of symbolic states. But when the image representation is used, the agent will not read from the symbolic states, so adding --dev will not alter the result.

When self.state_repr_type == "symbolic", the agent requests symbolic state representation from the server. The presence of --dev ensures accurate information retrieval. Conversely, when self.state_repr_type != "symbolic", the agent doesn't engage with symbolic representation and requests only the images.

Regarding the presence of both self.headless_server and self.if_head, this is a leftover from our code refactoring. We are planning to integrate the Java server directly into Unity for improved usability without additional configuration. We're committed to addressing these concerns and improving code readability in our next release.
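
To illustrate how the flag selection could be simplified, here is a hypothetical sketch. It is not the current Server.py: `build_server_flags` and its single `headless` parameter are assumptions, standing in for the overlapping self.if_head and self.headless_server.

```python
from typing import List


def build_server_flags(headless: bool, state_repr_type: str) -> List[str]:
    """Return the command-line flags for game_playing_interface.jar.

    Hypothetical helper: one boolean decides --headless, and --dev is
    added only when the agent reads symbolic states.
    """
    flags = []
    if headless:
        flags.append("--headless")
    if state_repr_type == "symbolic":
        # --dev matters for symbolic state retrieval; it has no effect
        # on the result when only images are requested.
        flags.append("--dev")
    return flags


if __name__ == "__main__":
    cmd = ["java", "-jar", "./game_playing_interface.jar"]
    print(cmd + build_server_flags(headless=True, state_repr_type="symbolic"))
```

Keeping the two decisions independent means each flag depends on exactly one setting.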

Please let me know if you have further questions or would like more clarification.

Cheers, Cheng

Cheng-Xue avatar Feb 20 '24 02:02 Cheng-Xue