Train model on Debian server

Open jacktang opened this issue 9 years ago • 3 comments

Hi

I cloned the code base on my Debian server and installed the dependencies according to the README guide. make test ran fine, but when I tried make train-new, it produced the following output:

$ make train-new
python -m agent train-new
20161007|17:50:01|INFO: Session id 201610071749: training
20161007|17:50:01|INFO: Session 201610071749 at episode 0: q network ref<agent.trainer.q_network.QNetwork object at 0x7f90900b4550>; replay memories len 0; final iteration: 0
SDL Initialization Failed: No available video device
Makefile:26: recipe for target 'train-new' failed
make: *** [train-new] Segmentation fault

How can I fix the error? Thanks!

jacktang avatar Oct 07 '16 09:10 jacktang

Hi there,

Is the server headless, or does it have a window server running? Also, do you have the appropriate video drivers installed on the server?

Also, you can deploy agent-trainer to a remote machine using agent-trainer-deployer. It was tested on CentOS 7, but I think it should be fairly easy to port to other distros, since the biggest difference probably lies in the package manager used to install dependencies.

Xvfb (X virtual framebuffer) is used in agent-trainer-deployer to handle headless servers.
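As a rough illustration of the idea, here is a minimal Python sketch that starts Xvfb and runs a command against it. It assumes Xvfb is installed and on PATH; the display number :99, the screen geometry, and the helper names (xvfb_command, headless_env, run_headless) are hypothetical and are not taken from agent-trainer-deployer.

```python
import os
import shutil
import subprocess
import time


def xvfb_command(display=":99", screen="1024x768x24"):
    """Build the argument list that starts Xvfb on the given display."""
    return ["Xvfb", display, "-screen", "0", screen]


def headless_env(display=":99", base_env=None):
    """Return a copy of the environment with DISPLAY pointing at Xvfb."""
    env = dict(os.environ if base_env is None else base_env)
    env["DISPLAY"] = display
    return env


def run_headless(command, display=":99"):
    """Start Xvfb, run `command` against it, then stop Xvfb.

    The display number is an assumption; pick one that is free on your host.
    """
    if shutil.which("Xvfb") is None:
        raise RuntimeError("Xvfb is not installed or not on PATH")
    xvfb = subprocess.Popen(xvfb_command(display))
    try:
        time.sleep(1)  # crude wait for the virtual display to come up
        return subprocess.call(command, env=headless_env(display))
    finally:
        xvfb.terminate()
        xvfb.wait()
```

Under those assumptions, calling run_headless(["make", "train-new"]) from the project directory would run the training target with DISPLAY set to the virtual framebuffer.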

lopespm avatar Oct 10 '16 09:10 lopespm

Thanks for the response :) I switched to agent-trainer-docker and xvfb was installed. When I ran make train-new, it printed:

root@3c44b3f8aeb0:/home/agent-trainer# make train-new
python -m agent train-new
20161012|05:29:30|INFO: Session id 201610120529: training
20161012|05:29:30|INFO: Session 201610120529 at episode 0: q network ref<agent.trainer.q_network.QNetwork object at 0x2abf9bf6d810>; replay memories len 0; final iteration: 0
SDL Initialization Failed: Failed to connect to the Mir Server
make: *** [train-new] Segmentation fault (core dumped)

Did you run the application in that Docker container? I will look through agent-trainer-deployer later tonight.

jacktang avatar Oct 12 '16 05:10 jacktang

I retried the process on a clean (headless) CentOS 7 machine and ran agent-trainer successfully inside the container created from the agent-trainer-docker cpu image. If you run ps -aux inside the container, is Xvfb running?
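The ps check can also be scripted. The sketch below is only an illustration: the function names are hypothetical, and it assumes a ps that accepts the BSD-style "aux" arguments, as procps on most Linux distributions does.

```python
import os
import subprocess


def xvfb_in_ps_output(ps_output):
    """Return True if any process line contains an Xvfb executable token."""
    for line in ps_output.splitlines():
        for token in line.split():
            # Match both a bare "Xvfb" and a full path such as /usr/bin/Xvfb.
            if os.path.basename(token) == "Xvfb":
                return True
    return False


def xvfb_is_running():
    """List processes with `ps aux` and report whether Xvfb is among them."""
    listing = subprocess.check_output(["ps", "aux"], universal_newlines=True)
    return xvfb_in_ps_output(listing)
```

If xvfb_is_running() returns False inside the container, the virtual display was probably never started, which would be consistent with SDL failing to find a video device.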

lopespm avatar Dec 02 '16 02:12 lopespm