
when disk is full, fluentd throws away all logs it receives

Open · notslang opened this issue 8 years ago • 7 comments

It doesn't crash, and it doesn't attempt to forward the messages without writing them to an on-disk buffer; it just logs unexpected error while checking flushed chunks. ignored. error_class=Errno::ENOSPC error="No space left on device... and throws away the data. This is obviously not a good way to handle that error.
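
For illustration, a hedged sketch of the failure mode described above (hypothetical code, not fluentd's actual implementation):

    # Hypothetical writer that swallows ENOSPC the way the report describes:
    # the exception is logged and ignored, and the record is simply lost.
    def append_to_buffer(file, record)
      file.write(record)
    rescue Errno::ENOSPC => e
      warn "unexpected error while checking flushed chunks. ignored. " \
           "error_class=#{e.class} error=#{e.message}"
      # Nothing re-queues or retries the record here.
    end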

notslang avatar Sep 22 '17 16:09 notslang

Yeah. You are right. We should handle ENOSPC carefully. We will fix it.

repeatedly avatar Sep 29 '17 03:09 repeatedly

I have dealt with this problem several times, but in my case the buffer chunk files got corrupted and fluentd (v0.12) started forwarding garbage.

I guess we could update Fluent::FileBufferChunk#<< to do something like this:

# lib/fluent/plugin/buf_file.rb
    def <<(data)
      # Remember the committed end of the chunk before appending.
      file_pos = @file.pos

      if @file.write(data) == data.bytesize
        @size += data.bytesize
      else
        # Partial write (e.g. the disk filled up mid-write): roll the file
        # back to the committed end and restore the position, so the chunk
        # is never left with a half-written record. truncate does not move
        # the file position, hence the explicit seek.
        @file.truncate(file_pos)
        @file.seek(file_pos)
      end
    end

POSIX has the following note about the write() function (http://pubs.opengroup.org/onlinepubs/9699919799/functions/write.html):

If a write() requests that more bytes be written than there is room for (for example, the file size limit of the process or the physical end of a medium), only as many bytes as there is room for shall be written. For example, suppose there is space for 20 bytes more in a file before reaching a limit. A write of 512 bytes will return 20. The next write of a non-zero number of bytes would give a failure return (except as noted below).

So we could roll back partial write operations to avoid leaving buffer chunk files in a corrupted state.
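
As a sanity check of that rollback idea, here is a minimal, self-contained Ruby sketch. A Tempfile stands in for the chunk file, and the partial write is simulated rather than produced by a genuinely full disk:

    require "tempfile"

    file = Tempfile.new("chunk")
    file.write("record1\n")      # already-committed data

    file_pos = file.pos          # remember the committed end
    file.write("reco")           # simulate a partial append (short write)

    # Roll back: truncate to the committed end and restore the position,
    # so the chunk never contains a half-written record.
    file.truncate(file_pos)
    file.seek(file_pos)

    file.rewind
    puts file.read.inspect       # => "record1\n"

In Ruby a full disk often surfaces as a raised Errno::ENOSPC rather than a short return value, so a real implementation would presumably rescue that and perform the same truncate/seek rollback before re-raising.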

soylent avatar Oct 06 '17 10:10 soylent

Has this been resolved in version 1?

bootjp avatar May 16 '19 07:05 bootjp

@bootjp nope.

I just ran into this issue again on a system I haven't been monitoring very carefully. The buffer hit its size limit after about 60GB of logs and started throwing the error "buffer space has too many data". Fluentd was still accepting logs as if nothing was wrong, so none of the upstream systems noticed. I guess this started about a month ago, so now I've lost several hundred GB worth of log messages.
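
For anyone hitting the same wall: "buffer space has too many data" is raised when the buffer reaches its configured total size limit. A hedged configuration sketch (fluentd v1; the match pattern, path, host, and sizes below are illustrative, not from this thread) that makes fluentd push back on inputs instead of silently accepting data it will drop:

    <match app.**>
      @type forward
      <buffer>
        @type file
        path /var/log/fluentd/buffer
        total_limit_size 64GB    # cap on the on-disk buffer
        overflow_action block    # apply backpressure at the limit
      </buffer>
      <server>
        host log-aggregator.example.com
        port 24224
      </server>
    </match>

Note that overflow_action only governs behavior at the configured buffer limit; it does not fix the underlying ENOSPC handling this issue is about.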

notslang avatar Jul 09 '19 06:07 notslang

Is there any solution for this error?

linbingdouzhe avatar Oct 29 '20 01:10 linbingdouzhe

Also running into this. Any updates?

govindrai avatar Mar 25 '21 16:03 govindrai

My solution was to stop using fluentd. I came to the realization that if logs are important enough to keep around, then what I want isn't really a logging system, it's a database.

notslang avatar Mar 25 '21 16:03 notslang