Logging messages can block
This gem can block execution, effectively allowing a website to be DoS'd, if too many messages are sent at the same time. The TCPSocket is synchronous, so the only buffering of messages happens in the kernel. But writes to the socket use write instead of write_nonblock, so once the kernel buffer fills up, the write blocks and the executing Ruby thread has to wait until data is drained from the kernel buffer.
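To see the blocking behavior in isolation, here is a minimal standalone sketch (not fluent-logger code): connect a TCPSocket to a peer that accepts the connection but never reads from it, and write eventually stops returning once the kernel send buffer fills.

require "socket"

# A peer that accepts the connection but never reads from it.
server = TCPServer.new("127.0.0.1", 0)
client = TCPSocket.new("127.0.0.1", server.addr[1])
peer = server.accept # keep a reference so the socket stays open

chunk = "x" * 65_536
total = 0
loop do
  client.write(chunk) # blocks forever once the kernel buffers fill up
  total += chunk.bytesize
  puts "wrote #{total} bytes so far"
end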
Compounding the issue, there is only one TCPSocket connection for all threads of a process. If a web app uses Puma or another threaded server, then as soon as the kernel buffer fills up, every thread will have to wait until buffer room is freed.
To partially solve this issue, I think the write needs to be turned into a write_nonblock. The gem could then detect overflows and discard messages when they happen. It could also buffer overflowing messages, but that would just be a nicety.
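A rough sketch of what that could look like (this send_data and its drop-on-overflow policy are hypothetical, not the gem's actual code):

def send_data(data)
  con = @con # the existing shared connection
  written = con.write_nonblock(data)
  # A partial write means the kernel buffer filled mid-message; a real
  # implementation would have to buffer or discard the remainder.
  written == data.bytesize
rescue IO::WaitWritable
  # Kernel buffer is full: drop the message instead of blocking the thread.
  false
end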
To solve the issue further, there should probably be a separate connection per thread. That would let socket buffering scale better between a single-threaded process and a process with many threads writing messages. But this is also just a nicety on top.
> To solve the issue further, there probably should be a separate connection per thread.
Does that mean a socket should be created on each send, or a socket stored in TLS? Could you show me an implementation example?
@repeatedly: a socket stored in TLS (thread-local storage; at first I thought you meant TLSv1.0, like SSL :).
Something like this:
def connect!
  con = TCPSocket.new(@host, @port)
  # More setup here
  con
end

def send_data(data)
  con = (Thread.current[:fluent_logger_connection] ||= connect!)
  con.write data
end
But again, this is extra niceness. The real issue is the blocking writes. That should be addressed first.
Any update? This could be a major issue: if you're sending data from a web server and there's a problem with td-agent, this logger can block the whole web server, which can then easily crash (an issue we're already having now).
I will investigate this issue next week.
I want to reproduce this issue, but I cannot reproduce it so far. I wrote a very small script for this:
require "fluent-logger"
Fluent::Logger::FluentLogger.open(nil, :host=>'localhost', :port=>24224)
a = []
start = Time.now
100.times do |i|
a << Thread.start(i) do
puts "start#{i}"
1000.times do
Fluent::Logger.post("myapp.access", ENV.to_hash)
end
puts "end#{i}"
end
end
a.each(&:join)
p :done
p Time.now - start
And the Fluentd configuration on the receiving side:

<source>
  @type forward
  port 24224
</source>

<match *.**>
  @type stdout
</match>
@txase @breath103 Please try #39.
@okkez thanks! Sorry for the late reply, I'll let you know after checking it out.
Try setting up a logger that sends to "localhohoho.com", and wait, wait, wait... both at logger creation time and every time the connection is retried.
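For example (a minimal sketch; the unreachable hostname is made up on purpose):

require "fluent-logger"

# Resolving/connecting to an unreachable host stalls right here...
log = Fluent::Logger::FluentLogger.new(nil, :host => 'localhohoho.com', :port => 24224)

# ...and again on every post while the connection keeps being retried.
log.post("myapp.access", {"agent" => "foo"})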
Any updates on this? It's been a year since the last update.
Here is the patch: https://github.com/fluent/fluent-logger-ruby/pull/39 We need feedback from several environments...