FLUME-3149: Reduce CPU cost for the taildir file source while still maintaining reliability by using a position file in memory channel
File channel tracks transferred events and uses a transactional mechanism to make transfers recoverable. However, this increases CPU cost because of frequent system calls such as write and read, and the CPU cost can be very high when the transfer rate is high. Memory channel, in contrast, has no such overhead and requires only about 10% of the CPU cost in the same environment, but its events are not recovered if the system goes down unexpectedly. For sources like taildir, I propose writing the position file in memory channel to achieve reliability while reducing CPU cost. In testing on my own production environment, CPU usage dropped from 13% to 3% and reliability was still maintained. (Transfer rate: 1Mb/s, Kafka sink, file channel -> memory channel with pos file)
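For context, here is a minimal sketch of the kind of agent configuration this change targets: a taildir source feeding a memory channel that drains to a Kafka sink. It uses only stock Flume property names. The agent name (a1), component names, paths, capacities, broker address, and topic are placeholders rather than values from my environment, and nothing here is specific to the patch itself.

```properties
# Illustrative only: a1, r1, c1, k1, all paths, the broker address and the topic are placeholders.
a1.sources = r1
a1.channels = c1
a1.sinks = k1

# Taildir source: positionFile records how far each tailed file has been read.
a1.sources.r1.type = TAILDIR
a1.sources.r1.positionFile = /var/lib/flume/taildir_position.json
a1.sources.r1.filegroups = f1
a1.sources.r1.filegroups.f1 = /var/log/app/.*log.*
a1.sources.r1.channels = c1

# Memory channel: cheap on CPU, but buffered events are lost if the agent dies.
a1.channels.c1.type = memory
a1.channels.c1.capacity = 10000
a1.channels.c1.transactionCapacity = 1000

# Kafka sink draining the memory channel.
a1.sinks.k1.type = org.apache.flume.sink.kafka.KafkaSink
a1.sinks.k1.kafka.bootstrap.servers = localhost:9092
a1.sinks.k1.kafka.topic = example-topic
a1.sinks.k1.channel = c1
```

With this layout, the position file is the only durable record of progress, which is why tying it to what the memory channel has actually delivered matters for recovery.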
I think this change essentially brings reliability to MemoryChannel when it is used with taildirSource, so the performance difference is essentially the difference between MemoryChannel and FileChannel. In my tests it saves more than 90% of CPU, but there is no easy way to compare the two with a simple command. Perhaps I should add some tests to verify reliability when using taildirSource with MemoryChannel?
Can one of the admins verify this patch?