Failed to save data for plugin storage under antivirus solution
Hello, i'm running fluentd (1.10.2) using td-agent (3.7.1) on Windows 2016
My FluentD config looks like:
<match>
flush_interval 1s
host xxx
port 12345
protocol udp
@type gelf
</match>
<filter>
@type record_transformer
<record>
message ${record[description]}
</record>
</filter>
<source>
channels application,system,security
@id windows_eventlog
<storage>
persistent true
@type local
</storage>
tag windows_eventlog
@type windows_eventlog
</source>
<system>
root_dir C:/opt/td-agent
</system>
And i'm seeing the following errors coming through on the logging stream:
[windows_eventlog] failed to save data for plugin storage to file path="C:/opt/td-agent/worker0/windows_eventlog/storage.json" tmp="C:/opt/td-agent/worker0/windows_eventlog/storage.json.tmp" error_class=Errno::EACCES error="Permission denied @ rb_file_s_rename - (C:/opt/td-agent/worker0/windows_eventlog/storage.json.tmp, C:/opt/td-agent/worker0/windows_eventlog/storage.json)"
I checked and can see that this directory does exist and the file C:/opt/td-agent/worker0/windows_eventlog/storage.json is created and has the following contents:
{"application":[541177,41519],"system":[648,54300],"security":[8973443,1]}
Any ideas why i might be getting a permission denied error? It appears that the service is running as System:

Not sure what else to check, any thoughts?
Are you running the td-agent as admin?
It's required if your are getting logs from channels like "Security"
@flurreN I haven't changed the Log On settings of the service. Does the default SYSTEM user not have access to the Security channel? Should i change the Log On setting to a local user with Administator rights?
@flurreN I have changed the service to login as local admin and no change. Still seeing the errors above and now that you mention the Security channel i'm not seeing logs from that either. Any thoughts on next-steps for debugging this?
I manage to replicate your issue by denying write permission for SYSTEM on the opt folder. Which makes sense. The error is actually saying it dont have permission to write to C:\opt\td-agent\worker0\windows_eventlog\storage.json
Can you double check that you are not denying any SYSTEM permission on the storage.json file?
@flurreN I confirmed that all folders and files in the path including C:\opt\td-agent\worker0\windows_eventlog\storage.json are all writable by SYSTEM.

I also confirmed that the ruby processes are indeed running as SYSTEM in Task Manager:

What is interesting in this is that i see the timestamp of the storage.json file being updated and the storage.json.tmp files coming and going.
I also managed to replicate the issue by having it run as a service and from the td-agent command promt at the same time. I guess they are both trying to access it by the same time, do you think you running anything in the background?
@flurreN I don't think i'm running anything else in the background. Here are the current Ruby / FluentD processes running:

Do any of those look like duplicates or different from your test system?
Its also running 3x Ruby interpeter for me so that should be fine
Try storing with a path
<storage>
@type local
persistent true
path "c:/opt/td-agent/storage.json"
</storage>
@flurreN tried that config and am still seeing the error:
[windows_eventlog] failed to save data for plugin storage to file path="c:/opt/td-agent/storage.json" tmp="c:/opt/td-agent/storage.json.tmp" error_class=Errno::EACCES error="Permission denied @ rb_file_s_rename - (c:/opt/td-agent/storage.json.tmp, c:/opt/td-agent/storage.json)"
@nmaludy Hmm not sure, I think the devs would know more about this.
Also you should use in_windows_eventlog2 it has more option and will replace @in_windows_eventlog, it will probably be what you are looking for when it comes to eventlogs in the end anyway. I'm not sure if it will fix your issue but who knows, maybe.
@flurreN i tried your suggestion of in_windows_eventlog2 but am seeing the worker fail to start with the following error in the td-agent.log:
2020-06-08 08:31:59 -0400 [error]: #0 unexpected error error_class=TypeError error="no implicit conversion of Array into String"
2020-06-08 08:31:59 -0400 [error]: #0 C:/opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/nokogiri-1.10.9-x64-mingw32/lib/nokogiri/xml/sax/parser.rb:108:in `memory'
2020-06-08 08:31:59 -0400 [error]: #0 C:/opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/nokogiri-1.10.9-x64-mingw32/lib/nokogiri/xml/sax/parser.rb:108:in `parse_memory'
2020-06-08 08:31:59 -0400 [error]: #0 C:/opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/nokogiri-1.10.9-x64-mingw32/lib/nokogiri/xml/sax/parser.rb:83:in `parse'
2020-06-08 08:31:59 -0400 [error]: #0 C:/opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluent-plugin-windows-eventlog-0.5.4/lib/fluent/plugin/in_windows_eventlog2.rb:163:in `bookmark_validator'
2020-06-08 08:31:59 -0400 [error]: #0 C:/opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluent-plugin-windows-eventlog-0.5.4/lib/fluent/plugin/in_windows_eventlog2.rb:138:in `subscribe_channel'
2020-06-08 08:31:59 -0400 [error]: #0 C:/opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluent-plugin-windows-eventlog-0.5.4/lib/fluent/plugin/in_windows_eventlog2.rb:131:in `block in start'
2020-06-08 08:31:59 -0400 [error]: #0 C:/opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluent-plugin-windows-eventlog-0.5.4/lib/fluent/plugin/in_windows_eventlog2.rb:130:in `each'
2020-06-08 08:31:59 -0400 [error]: #0 C:/opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluent-plugin-windows-eventlog-0.5.4/lib/fluent/plugin/in_windows_eventlog2.rb:130:in `start'
2020-06-08 08:31:59 -0400 [error]: #0 C:/opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluentd-1.10.2/lib/fluent/root_agent.rb:200:in `block in start'
2020-06-08 08:31:59 -0400 [error]: #0 C:/opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluentd-1.10.2/lib/fluent/root_agent.rb:189:in `block (2 levels) in lifecycle'
2020-06-08 08:31:59 -0400 [error]: #0 C:/opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluentd-1.10.2/lib/fluent/root_agent.rb:188:in `each'
2020-06-08 08:31:59 -0400 [error]: #0 C:/opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluentd-1.10.2/lib/fluent/root_agent.rb:188:in `block in lifecycle'
2020-06-08 08:31:59 -0400 [error]: #0 C:/opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluentd-1.10.2/lib/fluent/root_agent.rb:175:in `each'
2020-06-08 08:31:59 -0400 [error]: #0 C:/opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluentd-1.10.2/lib/fluent/root_agent.rb:175:in `lifecycle'
2020-06-08 08:31:59 -0400 [error]: #0 C:/opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluentd-1.10.2/lib/fluent/root_agent.rb:199:in `start'
2020-06-08 08:31:59 -0400 [error]: #0 C:/opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluentd-1.10.2/lib/fluent/engine.rb:248:in `start'
2020-06-08 08:31:59 -0400 [error]: #0 C:/opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluentd-1.10.2/lib/fluent/engine.rb:147:in `run'
2020-06-08 08:31:59 -0400 [error]: #0 C:/opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluentd-1.10.2/lib/fluent/supervisor.rb:600:in `block in run_worker'
2020-06-08 08:31:59 -0400 [error]: #0 C:/opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluentd-1.10.2/lib/fluent/supervisor.rb:837:in `main_process'
2020-06-08 08:31:59 -0400 [error]: #0 C:/opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluentd-1.10.2/lib/fluent/supervisor.rb:594:in `run_worker'
2020-06-08 08:31:59 -0400 [error]: #0 C:/opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluentd-1.10.2/lib/fluent/command/fluentd.rb:350:in `<top (required)>'
2020-06-08 08:31:59 -0400 [error]: #0 C:/opt/td-agent/embedded/lib/ruby/site_ruby/2.4.0/rubygems/core_ext/kernel_require.rb:55:in `require'
2020-06-08 08:31:59 -0400 [error]: #0 C:/opt/td-agent/embedded/lib/ruby/site_ruby/2.4.0/rubygems/core_ext/kernel_require.rb:55:in `require'
2020-06-08 08:31:59 -0400 [error]: #0 C:/opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluentd-1.10.2/bin/fluentd:8:in `<top (required)>'
2020-06-08 08:31:59 -0400 [error]: #0 C:/opt/td-agent/embedded/bin/fluentd:23:in `load'
2020-06-08 08:31:59 -0400 [error]: #0 C:/opt/td-agent/embedded/bin/fluentd:23:in `<main>'
2020-06-08 08:31:59 -0400 [error]: #0 unexpected error error_class=TypeError error="no implicit conversion of Array into String"
2020-06-08 08:31:59 -0400 [error]: #0 suppressed same stacktrace
2020-06-08 08:31:59 -0400 [info]: Worker 0 finished unexpectedly with status 1
I copy/pasted the config snippet from the link you sent me.
Fixed that error by deleting the old storage.json file i had in the same place. Looks like the new plugin has a different storage format. The plugin is running now, but i'm still seeing errors related to renaming the storage file.
Did you try to run the config with the td-agent command promt as admin (not as a service)? Not sure if it would give another result but might be worth a try. If nothing works I would also try to reinstall the whole td-agent. @nmaludy
@flurreN No change running it directly from the td-agent command prompt as Administrator. Also with this setup it still isn't able to read the Security channel.
Also with the windows_eventlog2 setup, i was never able to get it to read from any channel. No logs seemed to have ever been sent to the central logging server. Debugging this further now.
On the windows_eventlog2 setup, i figured out the problem. The Keywords field was being sent as a String and our old logger was sending it as an Integer, therefore our ElasticSearch index had type Integer and we were getting index failures when trying to ingest the logs from windows_eventlog2. To fix this i setup the following filter to remove the Keywords field (i don't care about it):
<filter winevt.**>
@type record_transformer
remove_keys Keywords
</filter>
@flurreN So, switching to windows_eventlog2 and applying the filter above to fix my indexing problem has solved the issue related to receiving the Security stream. Now, even the Service instance of td-agent / fluentd is reading the Security messages when running as the default SYSTEM user.
However, back to my original issue, still seeing errors in the logs related to renaming the temporary storage file:
2020-06-08 10:28:51 -0400 [error]: #0 [windows_eventlog2] failed to save data for plugin storage to file path="C:/opt/td-agent/storage.json" tmp="C:/opt/td-agent/storage.json.tmp" error_class=Errno::EACCES error="Permission denied @ rb_file_s_rename - (C:/opt/td-agent/st
orage.json.tmp, C:/opt/td-agent/storage.json)"
2020-06-08 10:29:11 -0400 [error]: #0 [windows_eventlog2] failed to save data for plugin storage to file path="C:/opt/td-agent/storage.json" tmp="C:/opt/td-agent/storage.json.tmp" error_class=Errno::EACCES error="Permission denied @ rb_file_s_rename - (C:/opt/td-agent/st
orage.json.tmp, C:/opt/td-agent/storage.json)"
For anyone else out there watching here is my working config:
<filter windows_eventlog**>
@type record_transformer
enable_ruby
<record>
full_message ${record["Description"]}
short_message ${record["Description"].lines.first}
</record>
remove_keys Keywords,Description
</filter>
<match windows_eventlog**>
flush_interval 1s
host graylog.domain.tld
port 12201
protocol udp
@type gelf
</match>
<source>
@type windows_eventlog2
@id windows_eventlog2
channels ["application", "system", "security"]
read_existing_events false
read_interval 2
tag windows_eventlog
render_as_xml true
<storage>
@type local
persistent true
path C:/opt/td-agent/storage.json
</storage>
<parse>
@type winevt_xml
preserve_qualifiers true
</parse>
</source>
<system>
root_dir C:/opt/td-agent
</system>
Just an update on this ticket, it looks like it was our Anti-Virus agent that was causing the issue. I worked with our security team to disable the AV agent on this host temporarily and the problem went away.
Looks like the AV agent was "grabbing" the the storage.json and storage.json.tmp files when they were written and modified. This causes an issue in Windows because these file "locks" prevent things like renames from going through since another process has the file open.
I haven't figured out why the rename DOES eventually succeed, but turning off AV resolved this.
Maybe there is another way we can operate on Windows to prevent this scenario in the future?
@nmaludy Interesting and it makes sense actually. Im curious, which AV are you using?
I agree with you. The storage code is in this repo https://github.com/fluent/fluentd/ and I saw something similar in this issue https://github.com/fluent/fluentd/issues/2446
Looks like the AV agent was "grabbing" the the
storage.jsonandstorage.json.tmpfiles when they were written and modified. This causes an issue in Windows because these file "locks" prevent things like renames from going through since another process has the file open.
When Windows grubs files, they cannot be renamed eventually an administrator. This is because Windows requests greater than SYSTEM permission (It is strange and impossible permission!) to handle other processes handling files.
Yeah, disabling AV agent should be reasonable but a bit of dangerous. IMO, AV agent should be able to specify excluding scanning folder. in_windows_eventlog2 needs to rename storage data written files for managing bookmarks.
Hello. Is there are any updates on this?
@cosmo0920 @ashie is there any update on this issue?
@kashok7474 Please share your configuration and circumstances. Is the same for the comment? If not, could you post your configuration? Thanks.
I've run into this same issue and created a ticket under Fluentd (should have been here for the plugin which I now know). Details here: https://github.com/fluent/fluent-plugin-windows-eventlog/issues/103