performance:nl-cache critical bug
Hi,
I wanna report a critical bug we've been experienced in production, which lead to the cluster not being able to write files to disk anymore. Apparently, it was related to "nl-cache".
We run Gluster over 3 bricks, under the following configuration:
# cat /proc/version
Linux version 5.10.0-23-cloud-amd64 ([email protected]) (gcc-10 (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2) #1 SMP Debian 5.10.179-1 (2023-05-12)
# lsb_release -a
No LSB modules are available.
Distributor ID: Debian
Description: Debian GNU/Linux 11 (bullseye)
Release: 11
Codename: bullseye
# gluster --version
glusterfs 11.0
# gluster volume info
Volume Name: moodledata
Type: Distributed-Replicate
Volume ID: bd58e9ea-dc05-41dd-9fe9-eaea4799ad09
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: j1-jes.internal:/home/gfsdata/moodledata/r--
Brick2: e2-jes.internal:/home/gfsdata/moodledata/r--
Brick3: s3-jes.internal:/home/gfsdata/moodledata/r--
Options Reconfigured:
server.event-threads: 8
client.event-threads: 8
performance.cache-size: 128MB
performance.nl-cache-positive-entry: on
cluster.read-hash-mode: 5
network.inode-lru-limit: 65536
performance.cache-max-file-size: 32MB
performance.nl-cache: off
performance.io-thread-count: 64
performance.read-ahead: on
performance.readdir-ahead: on
cluster.heal-timeout: 180
cluster.self-heal-daemon: on
cluster.granular-entry-heal: on
storage.fips-mode-rchecksum: on
transport.address-family: inet
performance.client-io-threads: off
When we've experienced the bug, we had performance.nl-cache set to on.
After some heavy files updates, our logs were flooded with the lines below and the webserver/php-fpm was no longer able to write data to certain files (but not all of them):
[2023-05-31 00:15:17.773955 +0000] E [nl-cache-helper.c:405:nlc_set_dir_state] (-->/usr/lib/x86_64-linux-gnu/glusterfs/11.0/xlator/features/utime.so(+0x4087) [0x7f532d2fe087] -->/usr/lib/x86_64-linux-gnu/glusterfs/11.0/xlator/performance/nl-cache.so(+0x8bb3) [0x7f532d29bbb3] -->/usr/lib/x86_64-linux-gnu/glusterfs/11.0/xlator/performance/nl-cache.so(+0x7e13) [0x7f532d29ae13] ) 0-moodledata-nl-cache: inode is not of type dir [Invalid argument]
[2023-05-31 00:15:17.801455 +0000] E [nl-cache-helper.c:405:nlc_set_dir_state] (-->/usr/lib/x86_64-linux-gnu/glusterfs/11.0/xlator/features/utime.so(+0x4087) [0x7f532d2fe087] -->/usr/lib/x86_64-linux-gnu/glusterfs/11.0/xlator/performance/nl-cache.so(+0x8bb3) [0x7f532d29bbb3] -->/usr/lib/x86_64-linux-gnu/glusterfs/11.0/xlator/performance/nl-cache.so(+0x7e13) [0x7f532d29ae13] ) 0-moodledata-nl-cache: inode is not of type dir [Invalid argument]
[2023-05-31 00:15:35.962825 +0000] E [nl-cache-helper.c:405:nlc_set_dir_state] (-->/usr/lib/x86_64-linux-gnu/glusterfs/11.0/xlator/features/utime.so(+0x4087) [0x7f532d2fe087] -->/usr/lib/x86_64-linux-gnu/glusterfs/11.0/xlator/performance/nl-cache.so(+0x8bb3) [0x7f532d29bbb3] -->/usr/lib/x86_64-linux-gnu/glusterfs/11.0/xlator/performance/nl-cache.so(+0x7e13) [0x7f532d29ae13] ) 0-moodledata-nl-cache: inode is not of type dir [Invalid argument]
The issue was solved after turning performance.nl-cache off.
Unfortunately we don't know what triggered it, but it occurred after a heavy-data transfer (about 1000 files updated consequently).
I am willing to provide any further necessary info to debug this matter, unfortunately, it only happened in production so we can't really reproduce it (as we can't turn nl-cache on again since it'll cause service disruption)
Thank you, Adelin
Seeing this issue also, nearly identical config
[2023-09-14 14:29:59.844675 +0000] E [nl-cache-helper.c:405:nlc_set_dir_state] (-->/usr/lib/x86_64-linux-gnu/glusterfs/10.3/xlator/performance/nl-cache.so(+0x532f) [0x7febebb9a32f] -->/usr/lib/x86_64-linux-gnu/glusterfs/10.3/xlator/performance/nl-cache.so(+0x3e81) [0x7febebb98e81] -->/usr/lib/x86_64-linux-gnu/glusterfs/10.3/xlator/performance/nl-cache.so(+0x96f3) [0x7febebb9e6f3] ) 0-glustervol3-nl-cache: inode is not of type dir [Invalid argument]
it was already earlier reported here: https://github.com/gluster/glusterfs/issues/2521
I have this also found now using "glusterfs-10.5-1.el9s"
It happens here often, is this really a critical error message, for me it looks more like a failsafe check or more things going wrong in general in this caching unit.
Message is triggered here https://github.com/gluster/glusterfs/blob/devel/xlators/performance/nl-cache/src/nl-cache-helper.c in line 405:
void
nlc_set_dir_state(xlator_t *this, inode_t *inode, uint64_t state)
{
nlc_ctx_t *nlc_ctx = NULL;
if (inode->ia_type != IA_IFDIR) {
gf_msg_callingfn(this->name, GF_LOG_ERROR, EINVAL, NLC_MSG_EINVAL,
"inode is not of type dir");
goto out;
}
The patch https://github.com/gluster/glusterfs/pull/4372 should resolve the same.
Still seeing lots of these messages when ingesting lots of data, glusterfs 11.1, is this fix in the upstream yet?
[2025-02-06 08:17:42.876835 +0000] E [nl-cache-helper.c:405:nlc_set_dir_state] (-->/usr/lib/x86_64-linux-gnu/glusterfs/11.1/xlator/features/utime.so(+0x42ea) [0x75ab83a262ea] -->/usr/lib/x86_64-linux-gnu/glusterfs/11.1/xlator/performance/nl-cache.so(+0x917b) [0x75ab8097817b] -->/usr/lib/x86_64-linux-gnu/glusterfs/11.1/xlator/performance/nl-cache.so(+0x835b) [0x75ab8097735b] ) 7-b3-nl-cache: inode is not of type dir [Invalid argument]
[2025-02-06 08:17:43.250456 +0000] E [nl-cache-helper.c:405:nlc_set_dir_state] (-->/usr/lib/x86_64-linux-gnu/glusterfs/11.1/xlator/features/utime.so(+0x42ea) [0x75ab83a262ea] -->/usr/lib/x86_64-linux-gnu/glusterfs/11.1/xlator/performance/nl-cache.so(+0x917b) [0x75ab8097817b] -->/usr/lib/x86_64-linux-gnu/glusterfs/11.1/xlator/performance/nl-cache.so(+0x835b) [0x75ab8097735b] ) 7-b3-nl-cache: inode is not of type dir [Invalid argument]
[2025-02-06 08:17:43.547061 +0000] E [nl-cache-helper.c:405:nlc_set_dir_state] (-->/usr/lib/x86_64-linux-gnu/glusterfs/11.1/xlator/features/utime.so(+0x42ea) [0x75ab83a262ea] -->/usr/lib/x86_64-linux-gnu/glusterfs/11.1/xlator/performance/nl-cache.so(+0x917b) [0x75ab8097817b] -->/usr/lib/x86_64-linux-gnu/glusterfs/11.1/xlator/performance/nl-cache.so(+0x835b) [0x75ab8097735b] ) 7-b3-nl-cache: inode is not of type dir [Invalid argument]
Volume options:
Option Value
------ -----
cluster.lookup-unhashed on (DEFAULT)
cluster.lookup-optimize on (DEFAULT)
cluster.rmdir-optimize on (DEFAULT)
cluster.min-free-disk 10% (DEFAULT)
cluster.min-free-inodes 5% (DEFAULT)
cluster.rebalance-stats off (DEFAULT)
cluster.subvols-per-directory (null) (DEFAULT)
cluster.readdir-optimize off (DEFAULT)
cluster.rsync-hash-regex (null) (DEFAULT)
cluster.extra-hash-regex (null) (DEFAULT)
cluster.dht-xattr-name trusted.glusterfs.dht (DEFAULT)
cluster.randomize-hash-range-by-gfid off (DEFAULT)
cluster.rebal-throttle normal (DEFAULT)
cluster.lock-migration off
cluster.force-migration off
cluster.local-volume-name (null) (DEFAULT)
cluster.weighted-rebalance on (DEFAULT)
cluster.switch-pattern (null) (DEFAULT)
cluster.entry-change-log on (DEFAULT)
cluster.read-subvolume (null) (DEFAULT)
cluster.read-subvolume-index -1 (DEFAULT)
cluster.read-hash-mode 1 (DEFAULT)
cluster.background-self-heal-count 8 (DEFAULT)
cluster.metadata-self-heal off (DEFAULT)
cluster.data-self-heal off (DEFAULT)
cluster.entry-self-heal off (DEFAULT)
cluster.self-heal-daemon on (DEFAULT)
cluster.heal-timeout 600 (DEFAULT)
cluster.self-heal-window-size 8 (DEFAULT)
cluster.data-change-log on (DEFAULT)
cluster.metadata-change-log on (DEFAULT)
cluster.data-self-heal-algorithm (null) (DEFAULT)
cluster.eager-lock on (DEFAULT)
disperse.eager-lock on (DEFAULT)
disperse.other-eager-lock on (DEFAULT)
disperse.eager-lock-timeout 1 (DEFAULT)
disperse.other-eager-lock-timeout 1 (DEFAULT)
cluster.quorum-type none (DEFAULT)
cluster.quorum-count (null) (DEFAULT)
cluster.choose-local true (DEFAULT)
cluster.self-heal-readdir-size 1KB (DEFAULT)
cluster.post-op-delay-secs 1 (DEFAULT)
cluster.ensure-durability on (DEFAULT)
cluster.consistent-metadata no (DEFAULT)
cluster.heal-wait-queue-length 128 (DEFAULT)
cluster.favorite-child-policy none (DEFAULT)
cluster.full-lock yes (DEFAULT)
cluster.optimistic-change-log on (DEFAULT)
diagnostics.latency-measurement off
diagnostics.dump-fd-stats off (DEFAULT)
diagnostics.count-fop-hits off
diagnostics.brick-log-level INFO
diagnostics.client-log-level INFO
diagnostics.brick-sys-log-level CRITICAL (DEFAULT)
diagnostics.client-sys-log-level CRITICAL (DEFAULT)
diagnostics.brick-logger (null) (DEFAULT)
diagnostics.client-logger (null) (DEFAULT)
diagnostics.brick-log-format (null) (DEFAULT)
diagnostics.client-log-format (null) (DEFAULT)
diagnostics.brick-log-buf-size 5 (DEFAULT)
diagnostics.client-log-buf-size 5 (DEFAULT)
diagnostics.brick-log-flush-timeout 120 (DEFAULT)
diagnostics.client-log-flush-timeout 120 (DEFAULT)
diagnostics.stats-dump-interval 0 (DEFAULT)
diagnostics.fop-sample-interval 0 (DEFAULT)
diagnostics.stats-dump-format json (DEFAULT)
diagnostics.fop-sample-buf-size 65535 (DEFAULT)
diagnostics.stats-dnscache-ttl-sec 86400 (DEFAULT)
performance.cache-max-file-size 128kb
performance.cache-min-file-size 0 (DEFAULT)
performance.cache-refresh-timeout 1 (DEFAULT)
performance.cache-priority (DEFAULT)
performance.io-cache-size 32MB (DEFAULT)
performance.cache-size 32MB (DEFAULT)
performance.io-thread-count 16 (DEFAULT)
performance.high-prio-threads 16 (DEFAULT)
performance.normal-prio-threads 16 (DEFAULT)
performance.low-prio-threads 16 (DEFAULT)
performance.least-prio-threads 1 (DEFAULT)
performance.enable-least-priority on (DEFAULT)
performance.iot-watchdog-secs (null) (DEFAULT)
performance.iot-cleanup-disconnected-reqs off (DEFAULT)
performance.iot-pass-through false (DEFAULT)
performance.io-cache-pass-through false (DEFAULT)
performance.quick-read-cache-size 128MB (DEFAULT)
performance.cache-size 128MB (DEFAULT)
performance.quick-read-cache-timeout 1 (DEFAULT)
performance.qr-cache-timeout 1 (DEFAULT)
performance.quick-read-cache-invalidation false (DEFAULT)
performance.ctime-invalidation false (DEFAULT)
performance.flush-behind on (DEFAULT)
performance.nfs.flush-behind on (DEFAULT)
performance.write-behind-window-size 1MB (DEFAULT)
performance.resync-failed-syncs-after-fsync off (DEFAULT)
performance.nfs.write-behind-window-size 1MB (DEFAULT)
performance.strict-o-direct off (DEFAULT)
performance.nfs.strict-o-direct off (DEFAULT)
performance.strict-write-ordering off (DEFAULT)
performance.nfs.strict-write-ordering off (DEFAULT)
performance.write-behind-trickling-writes on (DEFAULT)
performance.aggregate-size 128KB (DEFAULT)
performance.nfs.write-behind-trickling-writes on (DEFAULT)
performance.lazy-open yes (DEFAULT)
performance.read-after-open yes (DEFAULT)
performance.open-behind-pass-through false (DEFAULT)
performance.read-ahead-page-count 4 (DEFAULT)
performance.read-ahead-pass-through false (DEFAULT)
performance.readdir-ahead-pass-through false (DEFAULT)
performance.md-cache-pass-through false (DEFAULT)
performance.write-behind-pass-through false (DEFAULT)
performance.md-cache-timeout 600
performance.cache-swift-metadata false (DEFAULT)
performance.cache-samba-metadata false (DEFAULT)
performance.cache-capability-xattrs true (DEFAULT)
performance.cache-ima-xattrs true (DEFAULT)
performance.md-cache-statfs off (DEFAULT)
performance.xattr-cache-list (DEFAULT)
performance.nl-cache-pass-through false (DEFAULT)
network.frame-timeout 1800 (DEFAULT)
network.ping-timeout 42 (DEFAULT)
network.tcp-window-size (null) (DEFAULT)
client.ssl off
network.remote-dio disable (DEFAULT)
client.event-threads 2 (DEFAULT)
client.tcp-user-timeout 0
client.keepalive-time 20
client.keepalive-interval 2
client.keepalive-count 9
client.strict-locks off
network.tcp-window-size (null) (DEFAULT)
network.inode-lru-limit 200000
auth.allow *
auth.reject (null) (DEFAULT)
transport.keepalive 1
server.allow-insecure on (DEFAULT)
server.root-squash off (DEFAULT)
server.all-squash off (DEFAULT)
server.anonuid 65534 (DEFAULT)
server.anongid 65534 (DEFAULT)
server.statedump-path /var/run/gluster (DEFAULT)
server.outstanding-rpc-limit 64 (DEFAULT)
server.ssl off
auth.ssl-allow *
server.manage-gids off (DEFAULT)
server.dynamic-auth on (DEFAULT)
client.send-gids on (DEFAULT)
server.gid-timeout 300 (DEFAULT)
server.own-thread (null) (DEFAULT)
server.event-threads 2 (DEFAULT)
server.tcp-user-timeout 42 (DEFAULT)
server.keepalive-time 20
server.keepalive-interval 2
server.keepalive-count 9
transport.listen-backlog 1024
ssl.own-cert (null) (DEFAULT)
ssl.private-key (null) (DEFAULT)
ssl.ca-list (null) (DEFAULT)
ssl.crl-path (null) (DEFAULT)
ssl.certificate-depth (null) (DEFAULT)
ssl.cipher-list (null) (DEFAULT)
ssl.dh-param (null) (DEFAULT)
ssl.ec-curve (null) (DEFAULT)
transport.address-family inet
performance.write-behind on
performance.read-ahead off
performance.readdir-ahead on
performance.io-cache off
performance.open-behind on
performance.quick-read on
performance.nl-cache on
performance.stat-prefetch on
performance.client-io-threads on
performance.nfs.write-behind on
performance.nfs.read-ahead off
performance.nfs.io-cache off
performance.nfs.quick-read off
performance.nfs.stat-prefetch off
performance.nfs.io-threads off
performance.force-readdirp true (DEFAULT)
performance.cache-invalidation on
performance.global-cache-invalidation true (DEFAULT)
features.uss off
features.snapshot-directory .snaps
features.show-snapshot-directory off
features.tag-namespaces off
network.compression off
network.compression.window-size -15 (DEFAULT)
network.compression.mem-level 8 (DEFAULT)
network.compression.min-size 1024 (DEFAULT)
network.compression.compression-level 1 (DEFAULT)
network.compression.debug false (DEFAULT)
features.default-soft-limit 80% (DEFAULT)
features.soft-timeout 60 (DEFAULT)
features.hard-timeout 5 (DEFAULT)
features.alert-time 86400 (DEFAULT)
features.quota-deem-statfs off
geo-replication.indexing off
geo-replication.indexing off
geo-replication.ignore-pid-check off
geo-replication.ignore-pid-check off
features.quota off
features.inode-quota off
features.bitrot disable
debug.trace off
debug.log-history no (DEFAULT)
debug.log-file no (DEFAULT)
debug.exclude-ops (null) (DEFAULT)
debug.include-ops (null) (DEFAULT)
debug.error-gen off
debug.error-failure (null) (DEFAULT)
debug.error-number (null) (DEFAULT)
debug.random-failure off (DEFAULT)
debug.error-fops (null) (DEFAULT)
features.read-only off (DEFAULT)
features.worm off
features.worm-file-level off
features.worm-files-deletable on
features.default-retention-period 120 (DEFAULT)
features.retention-mode relax (DEFAULT)
features.auto-commit-period 180 (DEFAULT)
storage.linux-aio off (DEFAULT)
storage.linux-io_uring off (DEFAULT)
storage.batch-fsync-mode reverse-fsync (DEFAULT)
storage.batch-fsync-delay-usec 0 (DEFAULT)
storage.owner-uid -1 (DEFAULT)
storage.owner-gid -1 (DEFAULT)
storage.node-uuid-pathinfo off (DEFAULT)
storage.health-check-interval 30 (DEFAULT)
storage.build-pgfid off (DEFAULT)
storage.gfid2path on (DEFAULT)
storage.gfid2path-separator : (DEFAULT)
storage.reserve 1 (DEFAULT)
storage.health-check-timeout 20 (DEFAULT)
storage.fips-mode-rchecksum on
storage.force-create-mode 0000 (DEFAULT)
storage.force-directory-mode 0000 (DEFAULT)
storage.create-mask 0777 (DEFAULT)
storage.create-directory-mask 0777 (DEFAULT)
storage.max-hardlinks 100 (DEFAULT)
features.ctime on (DEFAULT)
config.gfproxyd off
cluster.server-quorum-type off
cluster.server-quorum-ratio 51
changelog.changelog off (DEFAULT)
changelog.changelog-dir {{ brick.path }}/.glusterfs/changelogs (DEFAULT)
changelog.encoding ascii (DEFAULT)
changelog.rollover-time 15 (DEFAULT)
changelog.fsync-interval 5 (DEFAULT)
changelog.changelog-barrier-timeout 120
changelog.capture-del-path off (DEFAULT)
features.barrier disable
features.barrier-timeout 120
features.trash off (DEFAULT)
features.trash-dir .trashcan (DEFAULT)
features.trash-eliminate-path (null) (DEFAULT)
features.trash-max-filesize 5MB (DEFAULT)
features.trash-internal-op off (DEFAULT)
cluster.enable-shared-storage disable
locks.trace off (DEFAULT)
locks.mandatory-locking off (DEFAULT)
cluster.disperse-self-heal-daemon enable (DEFAULT)
cluster.quorum-reads no (DEFAULT)
client.bind-insecure (null) (DEFAULT)
features.timeout 45 (DEFAULT)
features.failover-hosts (null) (DEFAULT)
features.shard off
features.shard-block-size 64MB (DEFAULT)
features.shard-lru-limit 16384 (DEFAULT)
features.shard-deletion-rate 100 (DEFAULT)
features.scrub-throttle lazy
features.scrub-freq biweekly
features.scrub false (DEFAULT)
features.expiry-time 120
features.signer-threads 4
features.cache-invalidation on
features.cache-invalidation-timeout 600
ganesha.enable off
features.leases off
features.lease-lock-recall-timeout 60 (DEFAULT)
disperse.background-heals 8 (DEFAULT)
disperse.heal-wait-qlength 128 (DEFAULT)
cluster.heal-timeout 600 (DEFAULT)
dht.force-readdirp on (DEFAULT)
disperse.read-policy gfid-hash (DEFAULT)
cluster.shd-max-threads 1 (DEFAULT)
cluster.shd-wait-qlength 1024 (DEFAULT)
cluster.locking-scheme full (DEFAULT)
cluster.granular-entry-heal no (DEFAULT)
features.locks-revocation-secs 0 (DEFAULT)
features.locks-revocation-clear-all false (DEFAULT)
features.locks-revocation-max-blocked 0 (DEFAULT)
features.locks-monkey-unlocking false (DEFAULT)
features.locks-notify-contention yes (DEFAULT)
features.locks-notify-contention-delay 5 (DEFAULT)
disperse.shd-max-threads 1 (DEFAULT)
disperse.shd-wait-qlength 1024 (DEFAULT)
disperse.cpu-extensions auto (DEFAULT)
disperse.self-heal-window-size 32 (DEFAULT)
cluster.use-compound-fops off
performance.parallel-readdir on
performance.rda-request-size 131072
performance.rda-low-wmark 4096 (DEFAULT)
performance.rda-high-wmark 128KB (DEFAULT)
performance.rda-cache-limit 10MB
performance.nl-cache-positive-entry on
performance.nl-cache-limit 10MB
performance.nl-cache-timeout 600
cluster.brick-multiplex disable
cluster.brick-graceful-cleanup disable
glusterd.vol_count_per_thread 100
cluster.max-bricks-per-process 250
disperse.optimistic-change-log on (DEFAULT)
disperse.stripe-cache 4 (DEFAULT)
cluster.halo-enabled False (DEFAULT)
cluster.halo-shd-max-latency 99999 (DEFAULT)
cluster.halo-nfsd-max-latency 5 (DEFAULT)
cluster.halo-max-latency 5 (DEFAULT)
cluster.halo-max-replicas 99999 (DEFAULT)
cluster.halo-min-replicas 2 (DEFAULT)
features.selinux on
cluster.daemon-log-level INFO
debug.delay-gen off
delay-gen.delay-percentage 10% (DEFAULT)
delay-gen.delay-duration 100000 (DEFAULT)
delay-gen.enable (DEFAULT)
disperse.parallel-writes on (DEFAULT)
disperse.quorum-count 0 (DEFAULT)
features.sdfs off
features.cloudsync off
features.ctime on
ctime.noatime on
features.cloudsync-storetype (null) (DEFAULT)
features.enforce-mandatory-lock off
config.global-threading off
config.client-threads 16
config.brick-threads 16
features.cloudsync-remote-read off
features.cloudsync-store-id (null) (DEFAULT)
features.cloudsync-product-id (null) (DEFAULT)
features.acl enable
feature.simple-quota-pass-through true
feature.simple-quota.use-backend false
cluster.use-anonymous-inode yes
rebalance.ensure-durability on (DEFAULT)