fluent-plugin-windows-eventlog icon indicating copy to clipboard operation
fluent-plugin-windows-eventlog copied to clipboard

Failed to save data for plugin storage under antivirus solution

Open nmaludy opened this issue 5 years ago • 25 comments

Hello, i'm running fluentd (1.10.2) using td-agent (3.7.1) on Windows 2016

My FluentD config looks like:

<match>
  flush_interval 1s
  host xxx
  port 12345
  protocol udp
  @type gelf
</match>

<filter>
  @type record_transformer
  <record>
    message ${record[description]}
  </record>
</filter>

<source>
  channels application,system,security
  @id windows_eventlog
  <storage>
    persistent true
    @type local
  </storage>
  tag windows_eventlog
  @type windows_eventlog
</source>

<system>
  root_dir C:/opt/td-agent
</system>

And i'm seeing the following errors coming through on the logging stream:

[windows_eventlog] failed to save data for plugin storage to file path="C:/opt/td-agent/worker0/windows_eventlog/storage.json" tmp="C:/opt/td-agent/worker0/windows_eventlog/storage.json.tmp" error_class=Errno::EACCES error="Permission denied @ rb_file_s_rename - (C:/opt/td-agent/worker0/windows_eventlog/storage.json.tmp, C:/opt/td-agent/worker0/windows_eventlog/storage.json)"

I checked and can see that this directory does exist and the file C:/opt/td-agent/worker0/windows_eventlog/storage.json is created and has the following contents:

{"application":[541177,41519],"system":[648,54300],"security":[8973443,1]}

Any ideas why i might be getting a permission denied error? It appears that the service is running as System: image

Not sure what else to check, any thoughts?

nmaludy avatar Jun 05 '20 18:06 nmaludy

Are you running the td-agent as admin?

It's required if your are getting logs from channels like "Security"

philipsabri avatar Jun 05 '20 18:06 philipsabri

@flurreN I haven't changed the Log On settings of the service. Does the default SYSTEM user not have access to the Security channel? Should i change the Log On setting to a local user with Administator rights?

nmaludy avatar Jun 05 '20 19:06 nmaludy

@flurreN I have changed the service to login as local admin and no change. Still seeing the errors above and now that you mention the Security channel i'm not seeing logs from that either. Any thoughts on next-steps for debugging this?

nmaludy avatar Jun 05 '20 19:06 nmaludy

I manage to replicate your issue by denying write permission for SYSTEM on the opt folder. Which makes sense. The error is actually saying it dont have permission to write to C:\opt\td-agent\worker0\windows_eventlog\storage.json

Can you double check that you are not denying any SYSTEM permission on the storage.json file?

philipsabri avatar Jun 05 '20 19:06 philipsabri

@flurreN I confirmed that all folders and files in the path including C:\opt\td-agent\worker0\windows_eventlog\storage.json are all writable by SYSTEM.

image

I also confirmed that the ruby processes are indeed running as SYSTEM in Task Manager:

image

nmaludy avatar Jun 05 '20 19:06 nmaludy

What is interesting in this is that i see the timestamp of the storage.json file being updated and the storage.json.tmp files coming and going.

nmaludy avatar Jun 05 '20 19:06 nmaludy

I also managed to replicate the issue by having it run as a service and from the td-agent command promt at the same time. I guess they are both trying to access it by the same time, do you think you running anything in the background?

philipsabri avatar Jun 05 '20 19:06 philipsabri

@flurreN I don't think i'm running anything else in the background. Here are the current Ruby / FluentD processes running:

image

Do any of those look like duplicates or different from your test system?

nmaludy avatar Jun 05 '20 19:06 nmaludy

Its also running 3x Ruby interpeter for me so that should be fine

Try storing with a path

<storage>
    @type local
    persistent true
    path "c:/opt/td-agent/storage.json"
</storage>

philipsabri avatar Jun 05 '20 20:06 philipsabri

@flurreN tried that config and am still seeing the error:

[windows_eventlog] failed to save data for plugin storage to file path="c:/opt/td-agent/storage.json" tmp="c:/opt/td-agent/storage.json.tmp" error_class=Errno::EACCES error="Permission denied @ rb_file_s_rename - (c:/opt/td-agent/storage.json.tmp, c:/opt/td-agent/storage.json)"

nmaludy avatar Jun 05 '20 23:06 nmaludy

@nmaludy Hmm not sure, I think the devs would know more about this.

Also you should use in_windows_eventlog2 it has more option and will replace @in_windows_eventlog, it will probably be what you are looking for when it comes to eventlogs in the end anyway. I'm not sure if it will fix your issue but who knows, maybe.

philipsabri avatar Jun 06 '20 10:06 philipsabri

@flurreN i tried your suggestion of in_windows_eventlog2 but am seeing the worker fail to start with the following error in the td-agent.log:

2020-06-08 08:31:59 -0400 [error]: #0 unexpected error error_class=TypeError error="no implicit conversion of Array into String"
  2020-06-08 08:31:59 -0400 [error]: #0 C:/opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/nokogiri-1.10.9-x64-mingw32/lib/nokogiri/xml/sax/parser.rb:108:in `memory'
  2020-06-08 08:31:59 -0400 [error]: #0 C:/opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/nokogiri-1.10.9-x64-mingw32/lib/nokogiri/xml/sax/parser.rb:108:in `parse_memory'
  2020-06-08 08:31:59 -0400 [error]: #0 C:/opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/nokogiri-1.10.9-x64-mingw32/lib/nokogiri/xml/sax/parser.rb:83:in `parse'
  2020-06-08 08:31:59 -0400 [error]: #0 C:/opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluent-plugin-windows-eventlog-0.5.4/lib/fluent/plugin/in_windows_eventlog2.rb:163:in `bookmark_validator'
  2020-06-08 08:31:59 -0400 [error]: #0 C:/opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluent-plugin-windows-eventlog-0.5.4/lib/fluent/plugin/in_windows_eventlog2.rb:138:in `subscribe_channel'
  2020-06-08 08:31:59 -0400 [error]: #0 C:/opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluent-plugin-windows-eventlog-0.5.4/lib/fluent/plugin/in_windows_eventlog2.rb:131:in `block in start'
  2020-06-08 08:31:59 -0400 [error]: #0 C:/opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluent-plugin-windows-eventlog-0.5.4/lib/fluent/plugin/in_windows_eventlog2.rb:130:in `each'
  2020-06-08 08:31:59 -0400 [error]: #0 C:/opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluent-plugin-windows-eventlog-0.5.4/lib/fluent/plugin/in_windows_eventlog2.rb:130:in `start'
  2020-06-08 08:31:59 -0400 [error]: #0 C:/opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluentd-1.10.2/lib/fluent/root_agent.rb:200:in `block in start'
  2020-06-08 08:31:59 -0400 [error]: #0 C:/opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluentd-1.10.2/lib/fluent/root_agent.rb:189:in `block (2 levels) in lifecycle'
  2020-06-08 08:31:59 -0400 [error]: #0 C:/opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluentd-1.10.2/lib/fluent/root_agent.rb:188:in `each'
  2020-06-08 08:31:59 -0400 [error]: #0 C:/opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluentd-1.10.2/lib/fluent/root_agent.rb:188:in `block in lifecycle'
  2020-06-08 08:31:59 -0400 [error]: #0 C:/opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluentd-1.10.2/lib/fluent/root_agent.rb:175:in `each'
  2020-06-08 08:31:59 -0400 [error]: #0 C:/opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluentd-1.10.2/lib/fluent/root_agent.rb:175:in `lifecycle'
  2020-06-08 08:31:59 -0400 [error]: #0 C:/opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluentd-1.10.2/lib/fluent/root_agent.rb:199:in `start'
  2020-06-08 08:31:59 -0400 [error]: #0 C:/opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluentd-1.10.2/lib/fluent/engine.rb:248:in `start'
  2020-06-08 08:31:59 -0400 [error]: #0 C:/opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluentd-1.10.2/lib/fluent/engine.rb:147:in `run'
  2020-06-08 08:31:59 -0400 [error]: #0 C:/opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluentd-1.10.2/lib/fluent/supervisor.rb:600:in `block in run_worker'
  2020-06-08 08:31:59 -0400 [error]: #0 C:/opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluentd-1.10.2/lib/fluent/supervisor.rb:837:in `main_process'
  2020-06-08 08:31:59 -0400 [error]: #0 C:/opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluentd-1.10.2/lib/fluent/supervisor.rb:594:in `run_worker'
  2020-06-08 08:31:59 -0400 [error]: #0 C:/opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluentd-1.10.2/lib/fluent/command/fluentd.rb:350:in `<top (required)>'
  2020-06-08 08:31:59 -0400 [error]: #0 C:/opt/td-agent/embedded/lib/ruby/site_ruby/2.4.0/rubygems/core_ext/kernel_require.rb:55:in `require'
  2020-06-08 08:31:59 -0400 [error]: #0 C:/opt/td-agent/embedded/lib/ruby/site_ruby/2.4.0/rubygems/core_ext/kernel_require.rb:55:in `require'
  2020-06-08 08:31:59 -0400 [error]: #0 C:/opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluentd-1.10.2/bin/fluentd:8:in `<top (required)>'
  2020-06-08 08:31:59 -0400 [error]: #0 C:/opt/td-agent/embedded/bin/fluentd:23:in `load'
  2020-06-08 08:31:59 -0400 [error]: #0 C:/opt/td-agent/embedded/bin/fluentd:23:in `<main>'
2020-06-08 08:31:59 -0400 [error]: #0 unexpected error error_class=TypeError error="no implicit conversion of Array into String"
  2020-06-08 08:31:59 -0400 [error]: #0 suppressed same stacktrace
2020-06-08 08:31:59 -0400 [info]: Worker 0 finished unexpectedly with status 1

I copy/pasted the config snippet from the link you sent me.

nmaludy avatar Jun 08 '20 12:06 nmaludy

Fixed that error by deleting the old storage.json file i had in the same place. Looks like the new plugin has a different storage format. The plugin is running now, but i'm still seeing errors related to renaming the storage file.

nmaludy avatar Jun 08 '20 12:06 nmaludy

Did you try to run the config with the td-agent command promt as admin (not as a service)? Not sure if it would give another result but might be worth a try. If nothing works I would also try to reinstall the whole td-agent. @nmaludy

philipsabri avatar Jun 08 '20 13:06 philipsabri

@flurreN No change running it directly from the td-agent command prompt as Administrator. Also with this setup it still isn't able to read the Security channel.

Also with the windows_eventlog2 setup, i was never able to get it to read from any channel. No logs seemed to have ever been sent to the central logging server. Debugging this further now.

nmaludy avatar Jun 08 '20 13:06 nmaludy

On the windows_eventlog2 setup, i figured out the problem. The Keywords field was being sent as a String and our old logger was sending it as an Integer, therefore our ElasticSearch index had type Integer and we were getting index failures when trying to ingest the logs from windows_eventlog2. To fix this i setup the following filter to remove the Keywords field (i don't care about it):

<filter winevt.**>
  @type record_transformer
  remove_keys Keywords
</filter>

nmaludy avatar Jun 08 '20 14:06 nmaludy

@flurreN So, switching to windows_eventlog2 and applying the filter above to fix my indexing problem has solved the issue related to receiving the Security stream. Now, even the Service instance of td-agent / fluentd is reading the Security messages when running as the default SYSTEM user.

However, back to my original issue, still seeing errors in the logs related to renaming the temporary storage file:

2020-06-08 10:28:51 -0400 [error]: #0 [windows_eventlog2] failed to save data for plugin storage to file path="C:/opt/td-agent/storage.json" tmp="C:/opt/td-agent/storage.json.tmp" error_class=Errno::EACCES error="Permission denied @ rb_file_s_rename - (C:/opt/td-agent/st
orage.json.tmp, C:/opt/td-agent/storage.json)"
2020-06-08 10:29:11 -0400 [error]: #0 [windows_eventlog2] failed to save data for plugin storage to file path="C:/opt/td-agent/storage.json" tmp="C:/opt/td-agent/storage.json.tmp" error_class=Errno::EACCES error="Permission denied @ rb_file_s_rename - (C:/opt/td-agent/st
orage.json.tmp, C:/opt/td-agent/storage.json)"

nmaludy avatar Jun 08 '20 14:06 nmaludy

For anyone else out there watching here is my working config:


<filter windows_eventlog**>
  @type record_transformer
  enable_ruby
  <record>
    full_message ${record["Description"]}
    short_message ${record["Description"].lines.first}
  </record>
  remove_keys Keywords,Description
</filter>

<match windows_eventlog**>
  flush_interval 1s
  host graylog.domain.tld
  port 12201
  protocol udp
  @type gelf
</match>

<source>
  @type windows_eventlog2
  @id windows_eventlog2
  channels ["application", "system", "security"]
  read_existing_events false
  read_interval 2
  tag windows_eventlog
  render_as_xml true
  <storage>
    @type local
    persistent true
    path C:/opt/td-agent/storage.json
  </storage>
  <parse>
    @type winevt_xml
    preserve_qualifiers true
  </parse>
</source>

<system>
  root_dir C:/opt/td-agent
</system>

nmaludy avatar Jun 08 '20 14:06 nmaludy

Just an update on this ticket, it looks like it was our Anti-Virus agent that was causing the issue. I worked with our security team to disable the AV agent on this host temporarily and the problem went away.

Looks like the AV agent was "grabbing" the the storage.json and storage.json.tmp files when they were written and modified. This causes an issue in Windows because these file "locks" prevent things like renames from going through since another process has the file open.

I haven't figured out why the rename DOES eventually succeed, but turning off AV resolved this.

Maybe there is another way we can operate on Windows to prevent this scenario in the future?

nmaludy avatar Jun 08 '20 19:06 nmaludy

@nmaludy Interesting and it makes sense actually. Im curious, which AV are you using?

I agree with you. The storage code is in this repo https://github.com/fluent/fluentd/ and I saw something similar in this issue https://github.com/fluent/fluentd/issues/2446

philipsabri avatar Jun 09 '20 08:06 philipsabri

Looks like the AV agent was "grabbing" the the storage.json and storage.json.tmp files when they were written and modified. This causes an issue in Windows because these file "locks" prevent things like renames from going through since another process has the file open.

When Windows grubs files, they cannot be renamed eventually an administrator. This is because Windows requests greater than SYSTEM permission (It is strange and impossible permission!) to handle other processes handling files.

Yeah, disabling AV agent should be reasonable but a bit of dangerous. IMO, AV agent should be able to specify excluding scanning folder. in_windows_eventlog2 needs to rename storage data written files for managing bookmarks.

cosmo0920 avatar Jun 11 '20 03:06 cosmo0920

Hello. Is there are any updates on this?

gordash2013 avatar Feb 09 '23 13:02 gordash2013

@cosmo0920 @ashie is there any update on this issue?

kashok7474 avatar Sep 11 '23 08:09 kashok7474

@kashok7474 Please share your configuration and circumstances. Is the same for the comment? If not, could you post your configuration? Thanks.

cosmo0920 avatar Sep 12 '23 03:09 cosmo0920

I've run into this same issue and created a ticket under Fluentd (should have been here for the plugin which I now know). Details here: https://github.com/fluent/fluent-plugin-windows-eventlog/issues/103

sutyak avatar Nov 10 '23 15:11 sutyak