EventMachine-LE icon indicating copy to clipboard operation
EventMachine-LE copied to clipboard

Set socket CLOEXEC

Open ibc opened this issue 14 years ago • 30 comments

https://github.com/eventmachine/eventmachine/pull/298

The description is really elaborated, and just involves setting FD_CLOEXEC on a file descriptor, which solves the described issue.

Is it ok to add this pull request into EventMachine-LE?

ibc avatar Mar 01 '12 15:03 ibc

On Mar 1, 2012, at 16:08, Iñaki Baz Castillo wrote:

https://github.com/eventmachine/eventmachine/pull/298

The description is really elaborated, and just involves setting FD_CLOEXEC on a file descriptor, which solves the described issue.

Is it ok to add this pull request into EventMachine-LE?

When I looked at it, it seemed this essentially was working on Linux only. If this is needed, why not make it work on OSX as well? But I haven't looked closer.

Grüße, Carsten

cabo avatar Mar 01 '12 15:03 cabo

I didn't know that. Of course making it to work on Mac/BSD is the way to go.

ibc avatar Mar 01 '12 15:03 ibc

On Mar 1, 2012, at 16:16, Iñaki Baz Castillo wrote:

I didn't know that. Of course making it to work on Mac/BSD is the way to go.

The comment in the pull request is a bit over the top :-)

Hmm, this pull request only has a single change which indeed should work on both Linux and Unix. But I saw a larger one, which covers more of the problem,...

Ah, that was 283. I think this is more complete, but also has some other elements. I think we should pick this up, but it will require some work to separate it into a keepalive patch and a cloexec patch and to smooth out the latter one.

Grüße, Carsten

cabo avatar Mar 01 '12 15:03 cabo

But the keepalive is not needed since EM already supports setting socket options, right? In fact I use it :)

Ok, I will check pull 283 and create a branch for including just the code about cloexec stuff.

ibc avatar Mar 01 '12 15:03 ibc

Is that pull not just for Linux?:

https://github.com/eventmachine/eventmachine/pull/283/files#diff-5

 when /linux/
   add_define 'HAVE_EPOLL' if have_func('epoll_create', 'sys/epoll.h')
+  add_define 'HAVE_ACCEPT4'
+  add_define 'HAVE_SOCK_CLOEXEC'

So HAVE_SOCK_CLOEXEC is not set for bsd/mac.

ibc avatar Mar 01 '12 15:03 ibc

On Mar 1, 2012, at 16:31, Iñaki Baz Castillo wrote:

But the keepalive is not needed since EM already supports setting socket options, right? In fact I use it :)

Ok, I will check pull 283 and create a branch for including just the code about cloexec stuff.


Reply to this email directly or view it on GitHub: https://github.com/ibc/EventMachine-LE/issues/5#issuecomment-4260126

Look at the fun gymnastics in

http://lists.manyfish.co.uk/pipermail/neon/2009-June/001021.html

This is hard to get right.

Grüße, Carsten

cabo avatar Mar 01 '12 15:03 cabo

Mmm, so maybe we should also check the Linux kernel version in extconf.rb before defining HAVE_SOCK_CLOEXEC? is that possible?

BTW I've already commited a new branch "cloexec" which includes CLOEXEC stuff from pull 283. It compiles and passes all the test in my ubuntu (kernel 3.0.0 64 bits).

ibc avatar Mar 01 '12 15:03 ibc

On Mar 1, 2012, at 16:47, Iñaki Baz Castillo wrote:

So HAVE_SOCK_CLOEXEC is not set for bsd/mac.

Right. So what it should do, is:

  1. for Unix, use fcntl/FD_CLOEXEC, and rely on the fact that this is inherited through accept().
  2. for Linux, try to use SOCK_CLOEXEC, but react to EINVAL by falling back to FD_CLOEXEC. Don't rely on inheritance.

This is portability :-/

Grüße, Carsten

cabo avatar Mar 01 '12 15:03 cabo

On Mar 1, 2012, at 16:54, Iñaki Baz Castillo wrote:

BTW I've already commited a new branch "cloexec" which includes CLOEXEC stuff from pull 283. It compiles and passes all the test in my ubuntu (kernel 3.0.0 64 bits).

Great. But, first, we need to write tests for the cloexec behavior we are trying to achieve. All kinds of sockets coming into existence need to be tested for the cloexec flag.

Grüße, Carsten

cabo avatar Mar 01 '12 16:03 cabo

I will ask the author of https://github.com/eventmachine/eventmachine/pull/298/ to write a test unit.

ibc avatar Mar 01 '12 16:03 ibc

Hi, about your suggestion for portability, I must recognize I have no enough knowledge about socket programming so I'm a bit afraid if I have to try that stuf. Could you please give the required love to the branch? (when you can, of course).

ibc avatar Mar 01 '12 16:03 ibc

Good news. I've made a code that demostrates the bug described in https://github.com/eventmachine/eventmachine/pull/298, and indeed applying CLOEXEC fixes it. I'm creating a test_unit right now to be included in the "cloexec" branch. More news later.

ibc avatar Mar 06 '12 13:03 ibc

Done! and it indeed works!

ibc avatar Mar 06 '12 13:03 ibc

One more question about "cloexec" branch:

The original pull request contains in extconf.rb

when /linux/
  [....]
  add_define 'HAVE_ACCEPT4'

I've read that accept4 is available on Linux starting from kernel 2.6.28, and support in glibc is available starting with version 2.10. Problem (copy&pasted from this nginx thread):

What happens when host has new glibc but old kernel? I assume this test would succeed, but accept4() call will fail with ENOSYS.

So, how could we ensure that accept4 is indeed available? Both the kernel and glibc versions should be checked before, right? how to do that in extconf.rb?

ibc avatar Mar 06 '12 15:03 ibc

On Mar 6, 2012, at 16:33, Iñaki Baz Castillo wrote:

One more question about "cloexec" branch:

The original pull request contains in extconf.rb

when /linux/
 [....]
 add_define 'HAVE_ACCEPT4'

I've read that accept4 is available on Linux starting from kernel 2.6.28, and support in glibc is available starting with version 2.10. Problem (copy&pasted from this nginx thread):

What happens when host has new glibc but old kernel? I assume this test would succeed, but accept4() call will fail with ENOSYS.

That is about 1/2 of the portability problems I talked about.

So, how could we ensure that accept4 is indeed available? Both the kernel and glibc versions should be checked before, right? how to do that in extconf.rb?

You can't, because the machine where the gem is built may not be the same where it is run. So, at runtime, you try it, and switch off the capability (i.e., switch to fcntl(2)) if it doesn't work. Don't you love Linux portability.

I wouldn't do this in 1.1.0. Leave something for 1.1.1...

Grüße, Carsten

cabo avatar Mar 06 '12 15:03 cabo

What do you mean with "the machine where the gem is built may not be the same where it is run"? Correct me if I'm wrong, but when running "gem install xxxxx" in a host, the extconf.rb is executed so there, in extconf.rb, there could be some way to detect Kernel and glibc version before compiling the Ruby C/C++ extension, am I wrong?

PS: Ok, I will create a new milestone 1.1.1 for this ticket.

PS2: So, can we release 1.1.0? :)

ibc avatar Mar 06 '12 15:03 ibc

On Mar 6, 2012, at 16:40, Iñaki Baz Castillo wrote:

What do you mean with "the machine where the gem is built may not be the same where it is run"?

Somebody might install a new kernel or revert to an old one. The gem directory might be rsynced to many machines, not all of which are completely identical. etc.

Grüße, Carsten

cabo avatar Mar 06 '12 16:03 cabo

Ok, so let this ticket for a future version.

BTW: The difference between accept() and accept4() is that accept4() saves a single system call. Do we really need it in a Ruby server? (probably creating a String consumes more resources).

ibc avatar Mar 06 '12 16:03 ibc

BTW: for comparing version numbers without depending on external Gems like "versionomy" (reply got in ruby-talk):

class String
  def to_v
    scan(/\d+/).map(&:to_i).extend Comparable
  end
end

"2.6.28".to_v >= "2.6.3".to_v
=> true

ibc avatar Mar 06 '12 16:03 ibc

On Mar 6, 2012, at 17:08, Iñaki Baz Castillo wrote:

Ok, so let this ticket for a future version.

BTW: The difference between accept() and accept4() is that accept4() saves a single system call. Do we really need it in a Ruby server? (probably creating a String consumes more resources).

Once you have accepted the connection, there is somebody on the other end that may be interested to know when it goes away. So there may be a race condition if the thread/process doing the sequence of accept() and fcntl() is interrupted exactly between the two and decides to exec right then. Well, maybe a bit theoretical of us, but there are good reasons in a more general server..

Grüße, Carsten

cabo avatar Mar 06 '12 16:03 cabo

Kgio (Eric Wong) uses a good solution:

have_func('accept4', %w(sys/socket.h))

If true, it defines HAVE_ACCEPT4.

http://bogomips.org/kgio.git/tree/ext/kgio/extconf.rb http://bogomips.org/kgio.git/tree/ext/kgio/missing_accept4.h

Still some stuff about detecting CLOEXEC support remains, but IMHO we just need the following:

  • extconf.rb:

    have_func('accept4', %w(sys/socket.h))  # If true it sets HAVE_ACCEPT4
    
  • project.h:

    #ifdef SOCK_CLOEXEC   # This is set when loading some bits/socket.h
    #define EM_CLOEXEC SOCK_CLOEXEC
    #else
    #define EM_CLOEXEC 0
    #endif
    

#ifdef HAVE_ACCEPT4 #define EM_ACCEPT accept4 #else #define EM_ACCEPT accept #endif

I will push it into "cloexec" branch right now (just testing).

ibc avatar Mar 07 '12 10:03 ibc

Hope it looks ok: https://github.com/ibc/EventMachine-LE/commit/4d21cf39794d740f307b3efad59fe1e5f1edcce9

ibc avatar Mar 07 '12 11:03 ibc

I would define EM_ACCEPT so that it supplies the additional parameter by itself (instead of in every place it is called).

The parameter should be a variable sock_cloexec, initialized to SOCK_CLOEXEC. If we get an EINVAL, that should be reset to 0 and the call retried with the new value.

Something like this. Can't look at this in earnest until April...

Grüße, Carsten

cabo avatar Mar 07 '12 11:03 cabo

I agree, however:

According to man accept4, if a system implements accept4 then it does also implement CLOEXEC flag (since an optional flag for accept4() is SOCK_CLOEXEC. Am I right?

Unfortunatelly in the "cloexec" branch there are calls like int sd = socket (family, SOCK_DGRAM|EM_CLOEXEC, 0);. Why is EM_CLOEXEC passed there??? According to socket() man, the second parameter is socket_type which can be SOCK_STREAM, SOCK_DGRAM... why to add EM_CLOEXEC (so SOCK_CLOEXEC) there???

Second question:

accept4 also accepts SOCK_NONBLOCK as flag (it saves extra calls to fcntl) so we could bypass the call to SetSocketNonblocking in ed.cpp. Is it ok?

ibc avatar Mar 07 '12 12:03 ibc

Ok, newer versions of socket() does include SOCK_CLOEXEC, my fault: http://www.kernel.org/doc/man-pages/online/pages/man2/socket.2.html

Please check the new commit https://github.com/ibc/EventMachine-LE/commit/771b629819c049c048f6051e1af41c58bd5a7e53

I've tested it in a Linux new and old kernels (with accept4 and without it).

ibc avatar Mar 07 '12 12:03 ibc

The code has been merged into "master" branch (not in the most ellegant way...).

ibc avatar Mar 07 '12 17:03 ibc

And now the CLOEXEC stuff has been removed from master until 1.1.1 :)

ibc avatar Mar 07 '12 19:03 ibc

Hi, should we continue with this issue for 1.1.5?

ibc avatar Oct 09 '12 22:10 ibc

Just curious, what was the eventual resolution on this issue?

artob avatar May 04 '13 20:05 artob

Please let me some days to check it again. I don't remember right now the status of this issue and the associated branch.

ibc avatar May 06 '13 14:05 ibc