
NFS: Stale file handle - after log cycling occurs.

darkl0rd opened this issue on Apr 22, 2021 · 21 comments

I'm running mtail as a separate Docker container, mounting an NFS (EFS) volume to which another application writes its log files. I noticed, however, that at some point the mtail containers went to 100% CPU; closer investigation shows that the mtail logs are rapidly filling up with:

I0422 19:46:12.453819      13 filestream.go:116] read /logs/app.log: stale NFS file handle
I0422 19:46:12.453841      13 filestream.go:116] read /logs/app.log: stale NFS file handle
I0422 19:46:12.453872      13 filestream.go:116] read /logs/app.log: stale NFS file handle
...
...

That explains the 100% CPU at least, I guess. However, I found out that I can trigger this behaviour by simply redeploying the application that is writing out the logs (app.log will be renamed to app.log.1 on deployment and a fresh app.log is written out).

My understanding is that mtail has heuristics to follow files after rotation; is there a particular reason why this fails when the files are served from NFS?

darkl0rd avatar Apr 22 '21 19:04 darkl0rd

This is weird. Can you share a simple reproduction, like the docker command line and any Dockerfile if necessary? Or at least examples, so I can try to set up a volume mount and a fake log rotation to reproduce it.

jaqx0r avatar Apr 23 '21 23:04 jaqx0r

Contrary to my original post, which I wrote when I had just found out about the problem, it actually happens at random, not on container restarts / log rotation. I'm currently guessing it has more to do with EFS temporarily becoming unavailable; the reason I assume this is that it happens for "groups" of applications at once. It's very strange, though, because the application writing the logs (and many other applications I run that use EFS) doesn't experience the "Stale Handle" problem (or maybe it does, but it recovers nicely and doesn't log a thing).

Anyway - it's hard to share a fully working reproduction case, since I can't actually reproduce it at will. To give an idea: I have about 30 pairs like this running, and this happens consistently at least once a day for each of them.

The gist of it:

AWS EFS, mounted with "defaults":

my-efs:/ on /some/path type nfs4 (rw,relatime,vers=4.1,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=x.x.x.x,local_lock=none,addr=x.x.x.x,_netdev)

Containers

version: "3.8"

services:
  my-app:
    image: "some-app:some-tag"
    volumes: 
      - type: "bind"
        source: "/some/path/application-instance"
        target: "/my/application/logs"
    networks:
      - prometheus
  mtail:
    image: "my-mtail-image:my-tag"
    volumes:
      - type: "bind"
        source: "/some/path/application-instance"
        target: "/logs"
    networks:
      - prometheus
    command: mtail -progs /etc/mtail/progs -logtostderr -logs /logs/my-app.log

I'll be happy to provide any additional debugging information or otherwise.

darkl0rd avatar Apr 24 '21 11:04 darkl0rd

I think mtail should handle the stale file handle error and attempt to reopen the file. Thanks for the docker-compose, that should help!
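
To illustrate what that could look like (a minimal sketch only, not mtail's actual internals; tailFile, readUntilStale and the path are hypothetical names for this example): treat ESTALE from a read as a signal to close the file and reopen it by path, which forces the NFS client to look up a fresh file handle.

package main

import (
	"errors"
	"io"
	"log"
	"os"
	"syscall"
	"time"
)

// tailFile keeps reading path forever. When the kernel reports ESTALE, the
// only recovery is to reopen the file by name so the NFS client obtains a
// fresh file handle; the stale descriptor itself will never work again.
func tailFile(path string) {
	for {
		if err := readUntilStale(path); err != nil {
			log.Fatal(err)
		}
		log.Printf("stale handle on %s, reopening", path)
	}
}

func readUntilStale(path string) error {
	f, err := os.Open(path)
	if err != nil {
		return err
	}
	defer f.Close()

	buf := make([]byte, 4096)
	for {
		n, err := f.Read(buf)
		if n > 0 {
			os.Stdout.Write(buf[:n]) // stand-in for "process the log lines"
		}
		switch {
		case errors.Is(err, syscall.ESTALE):
			return nil // tell the caller to reopen by path
		case err == io.EOF:
			time.Sleep(250 * time.Millisecond) // wait for more data
		case err != nil:
			return err
		}
	}
}

func main() {
	tailFile("/logs/app.log")
}

The key point is that recovery has to go back through the path name; retrying the read on the same descriptor just returns ESTALE again, which is what produces the tight 100% CPU loop seen above.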

Is your my-mtail-image image built from this repo's Dockerfile?

jaqx0r avatar Apr 28 '21 07:04 jaqx0r

No, it uses my own base image (ubuntu:18.04), adds mtail to /usr/local/bin, copies my progs and starts mtail; nothing fancy, no weird options passed to mtail or anything.

I think so too, since a kill -HUP of the mtail process resolves the problem - I think it should be a matter of handling the ESTALE error. I'd still be curious to find out why the file handle supposedly goes "stale" though.

darkl0rd avatar Apr 28 '21 12:04 darkl0rd

I was reading this, and it seems the common solution is to remount the NFS volume in the client. https://www.cyberciti.biz/tips/nfs-stale-file-handle-error-and-solution.html

Thus, I'm not sure mtail can do anything other than close the file handle so that the volume is unused, and then some other process remounts the volume.

But, I wonder what happens with the HUP? That signal should only cause the programmes to be reloaded, not the log files! Can you increase the logging in the tailer with --vmodule=filestream=2,tail=2 flags and redo the HUP and send me some logs?

Thanks!

jaqx0r avatar Apr 28 '21 21:04 jaqx0r

I read that too - but in my case the client is the 'host', as they are bind mounts. There are 2 other containers (minio, nginx) also reading from the same directory and accessing the same file (at times) that do not suffer from the problem. This might well be because those containers open the file only when it's required (effectively a new file handle each time) rather than keeping it open. That being said, the application writing to the file keeps it open continuously and never suffers from the problem.

Also worth noting - I'm currently pinpointing the exact timings, but -all- my mtail containers suffer from this problem at around the same time. It mostly seems to start at 21:00/22:00 and at 10:00. I haven't been able to pin it on any cron jobs or the like.

I'll bump the logging and wait for the next occurrence - it happened at 08:00 CEST this morning, and sadly sending a HUP signal turned out to have no effect after all. I have verified on the host that the mount is not stale and that the other two containers can still access the file. No log rotation has taken place. Unfortunately the affected container does not have its logging bumped, so we need to give it a few more hours to run into the same problem again.

darkl0rd avatar Apr 29 '21 07:04 darkl0rd

Over the past few days I have found out that it always correlates with the following event in syslog:

2021-05-04T17:50:38.131712+00:00 xxx systemd-networkd[5250]: eth0: Configured
2021-05-04T17:50:38.146945+00:00 xxx systemd[1]: Started ntp-systemd-netif.service.

The other containers running on the same node continue to access the file just fine. For reference, I am also running grafana/promtail next to it, tailing the same log; it's also unaffected.

darkl0rd avatar May 04 '21 18:05 darkl0rd

Thanks for that. I think I need more info, though.

If I'm understanding correctly,

mtail is running in its own Docker container, and promtail is in another container beside it. Both are reading the same log file from the host machine?

Do they both mount the log file into the container in the same way?

Does systemd restart networking for all containers when that message occurs? Is there more we can find out about what happens when "eth0: Configured" occurs, from the docker daemon log or elsewhere?

You've also said that minio and nginx are involved, can you describe what IO they're doing on the log file and more details how they're containerised?

jaqx0r avatar May 05 '21 02:05 jaqx0r

All containers have a bind mount from the host (which has the EFS filesystem mounted), which looks like:

  ... 
  volumes:
    - /host/logs:/logs:ro
  ...

I added promtail 2 days back, for testing purposes, because of this issue I'm experiencing with mtail. This way I want to determine whether it is mtail or the EFS mount. At this point, I have not experienced issues with promtail yet.

Minio: serves the files on access, so this is not a permanently open file handle
NGiNX: likewise

mtail: continuous tail of the file
promtail: continuous tail of the file (added this to have a more similar & comparable scenario than nginx/minio)

I am still trying to find out where the systemd messages are coming from. Manually restarting "systemd-networkd" does not result in the same behavior.

darkl0rd avatar May 05 '21 04:05 darkl0rd

Thanks, so the host has mounted an efs volume (from AWS?) and then that's re-exported into the containers with the bind mount?

My new guess is that the network events might be a network restart from a DHCP refresh, and then all the other binaries handle the ESTALE error and reopen, but mtail does not.

jaqx0r avatar May 05 '21 04:05 jaqx0r

Not re-exported, the EFS mount from the host is mapped into the container (a bind mount). (re-)exporting (NFS terminology) would imply that the host exports the filesystem again - it doesn't.

I'm thinking along the same lines; at this point I think we can safely say that it is related to how mtail deals (or doesn't deal) with stale handles.

darkl0rd avatar May 05 '21 05:05 darkl0rd

Sorry, that's what I meant, but thanks for clarifying.

Ideally we can confirm that the EFS volume is being remounted and that's causing the bind mount to go stale, if that's possible!

jaqx0r avatar May 06 '21 00:05 jaqx0r

There's no indication that it's actually being remounted. An explicit remount (mount -o remount ..) also does not trigger the behavior. The only pointer at this point is the systemd-networkd log message, but forcefully restarting systemd-networkd does not trigger the behavior either.

darkl0rd avatar May 06 '21 07:05 darkl0rd

OK, so it's difficult to construct a test reproduction if we don't really know what's happening.

Can you do a showmount -a inside and outside of the containers before and after this happens?

jaqx0r avatar May 06 '21 08:05 jaqx0r

I suppose it would also be good to see the inode number of the file inside the bind mount and on the host OS, before and after.

I'm certain that the name 'NFS' in the error message is a red herring -- the error string for ESTALE just says that because NFS is the reason the error first came to exist. But I think we need to isolate whether it's Docker's bind mount or EFS that's presenting the stale handle. I've seen evidence on the net that it could be either -- the Docker overlay2 system has been known to cause this error if one container is using a file and some other process removes it, and EFS looks to me like it's just a regular NFS server.
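
As a concrete way to capture that (a rough Go sketch only; the path is just an example), something like this prints the same device and inode numbers that stat reports. Running it on the host and inside the container, before and after the event, should show whether the bind mount and the EFS mount still agree on the underlying file:

package main

import (
	"fmt"
	"log"
	"os"
	"syscall"
)

func main() {
	path := "/logs/app.log" // adjust for the host vs. container view
	fi, err := os.Stat(path)
	if err != nil {
		log.Fatal(err)
	}
	st, ok := fi.Sys().(*syscall.Stat_t)
	if !ok {
		log.Fatalf("no raw stat data for %s", path)
	}
	// If dev or ino differs between the host and the container, or changes
	// across the event, the open descriptor points at a file that no longer
	// exists on the server, which is exactly what ESTALE reports.
	fmt.Printf("%s dev=%d ino=%d\n", path, st.Dev, st.Ino)
}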

jaqx0r avatar May 06 '21 08:05 jaqx0r

I'm also not certain that promtail deliberately handles this, and perhaps it's a happy accident that it didn't report anything to you yet: https://github.com/grafana/loki/issues/3651

jaqx0r avatar May 06 '21 08:05 jaqx0r

Please build from HEAD and try out the new change. I don't think we're fixed yet as I haven't tested this code. I would really like to be able to reproduce this before I call it done, though.

If it works, you should get a log message indicating that we saw ESTALE and attempted to reopen the log file.

jaqx0r avatar May 07 '21 02:05 jaqx0r

Just to let you know that this issue didn't go stale ;-), I have just built and deployed HEAD. I will report back in 24h.

As for the log message: do I need to bump the log level for it, or is it always printed?

darkl0rd avatar May 08 '21 12:05 darkl0rd

It is a regular info level message at the moment. Thanks!

jaqx0r avatar May 08 '21 22:05 jaqx0r

Sadly, no additional logging; the output is the same as before:

...
I0509 19:11:19.868818      13 filestream.go:131] read /logs/app.log: stale NFS file handle
I0509 19:11:19.869217      13 filestream.go:131] read /logs/app.log: stale NFS file handle
I0509 19:11:19.869311      13 filestream.go:131] read /logs/app.log: stale NFS file handle
...
# mtail -version
mtail version v3.0.0-rc45-57-g8f0b94fa-dirty git revision 8f0b94fad7f6c8359e31a01d7ee72e8312d915a3 go version go1.16.4 go arch amd64 go os linux

darkl0rd avatar May 09 '21 19:05 darkl0rd

That's disappointing. I may have incorrectly tested for the error code. I really want a way to reproduce this.
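
For context, and purely as a sketch rather than the actual mtail change: read errors from an *os.File come back wrapped in an *fs.PathError (that wrapper is what prints "read /logs/app.log: stale NFS file handle"), so a direct comparison against syscall.ESTALE misses it, while errors.Is unwraps the chain:

package main

import (
	"errors"
	"fmt"
	"io/fs"
	"syscall"
)

// isStale reports whether err is, or wraps, ESTALE.
func isStale(err error) bool {
	return errors.Is(err, syscall.ESTALE)
}

func main() {
	// *os.File.Read returns its errors wrapped like this.
	var err error = &fs.PathError{Op: "read", Path: "/logs/app.log", Err: syscall.ESTALE}
	fmt.Println(err == error(syscall.ESTALE)) // false: the wrapper hides the errno
	fmt.Println(isStale(err))                 // true: errors.Is unwraps to ESTALE
}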

jaqx0r avatar May 09 '21 23:05 jaqx0r