When the disk is full, fluentd throws away all logs it receives
It doesn't crash, and it doesn't attempt to forward the messages without writing them to an on-disk buffer. It just logs 'unexpected error while checking flushed chunks. ignored. error_class=Errno::ENOSPC error="No space left on device...' and throws away the data. This is obviously not a good way to handle that error.
Yeah. You are right. We should handle ENOSPC carefully. We will fix it.
I dealt with this problem several times. But in my case buffer chunk files got corrupted and fluentd (v0.12 in my case) was forwarding garbage.
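I guess we could update Fluent::FileBufferChunk#<< to do something like this:
# lib/fluent/plugin/buf_file.rb
def <<(data)
  # Remember where this record starts so a partial write can be undone.
  file_pos = @file.pos
  if @file.write(data) == data.bytesize
    @size += data.bytesize
  else
    # Short write: drop the partial record so the chunk stays valid.
    @file.truncate(file_pos)
  end
end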
POSIX has the following note about the write() function (http://pubs.opengroup.org/onlinepubs/9699919799/functions/write.html):
If a write() requests that more bytes be written than there is room for (for example, the file size limit of the process or the physical end of a medium), only as many bytes as there is room for shall be written. For example, suppose there is space for 20 bytes more in a file before reaching a limit. A write of 512 bytes will return 20. The next write of a non-zero number of bytes would give a failure return (except as noted below).
So we could roll back partial write operations in order to avoid leaving buffer chunk files in a corrupted state.
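To make the rollback idea concrete, here is a minimal standalone sketch. It is not Fluentd code: `append_with_rollback` is a hypothetical helper name, and it assumes the file has no pending buffered writes. It uses `IO#syswrite`, which issues a single low-level write and returns the number of bytes written, so it lines up with the POSIX behaviour quoted above. It also treats `Errno::ENOSPC` as "nothing usable was written".

```ruby
# Hypothetical helper (not part of Fluentd): append one record to an open
# file, and undo the append if the record could not be written in full.
def append_with_rollback(file, data)
  # Offset where this record starts; we return here on failure.
  start_pos = file.pos

  written = begin
    # syswrite performs one low-level write and may return a short count.
    file.syswrite(data)
  rescue Errno::ENOSPC
    # The device is full and the write was refused outright.
    0
  end

  return true if written == data.bytesize

  # Partial or failed write: cut the file back to the last complete record
  # and move the offset there so the next append starts on a clean boundary.
  file.truncate(start_pos)
  file.seek(start_pos)
  false
end
```

A caller would still need to decide what to do when this returns false (reject the input, block, or raise), because rolling back only protects the chunk file; it does not save the record that failed.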
Has this been resolved in version 1?
@bootjp nope.
I just ran into this issue again on a system I haven't been monitoring very carefully. The buffer hit some limit on the number of messages after about 60GB of logs and started throwing the error "buffer space has too many data". Fluentd was still accepting logs as if nothing was wrong, so none of the upstream systems noticed. I guess this started about a month ago, so now I've lost several hundred GB worth of log messages.
Is there a solution for this error?
Also running into this. Any updates?
My solution was to stop using fluentd. I came to the realization that if logs are important enough to keep around, then what I want isn't really a logging system, it's a database.