bignum too big to convert into `long long'
Describe the bug
Flushing the buffer fails with RangeError - "bignum too big to convert into `long long'".
To Reproduce
I don't have an exact reproduction, but my configuration is below. I'm not sure which value is too big; I assume some data comes across from time to time with an integer larger than a long long. It apparently passes through the JSON parser fine, but I'm not sure why it then fails in the mongo buffer.
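My guess at a minimal reproduction (an assumption on my side, not verified against fluent-plugin-mongo) is that the bson gem refuses integers outside the signed 64-bit range:
# Assumption: the mongo output ultimately serializes records with the bson gem,
# and Integer values outside the signed 64-bit range cannot be encoded.
require "bson"

{"v" => 1}.to_bson       # fine, encoded as a BSON integer
{"v" => 2**63}.to_bson   # raises RangeError; with the native extension the
                         # message appears to be the same "bignum too big to
                         # convert into `long long'" seen in the fluentd log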
Expected behavior
Almost anything else would be better behavior for me: setting -1 instead of the real value, setting the largest long long value, substituting some replacement, or dropping that record entirely, anything except the whole chunk failing constantly. This seems to lead to fluentd getting stuck after some time.
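If the plugin itself can't do that, a workaround I could try (just a sketch under the assumption above that the offending values are integers outside the signed 64-bit range; the field name is a placeholder, since I don't know which field it actually is) would be to clamp them with record_transformer before the mongo output:
# Hypothetical mitigation, untested: replace integers that don't fit into a
# signed 64-bit value (what BSON can store) with -1 before the mongo output.
# "bigValue" is a placeholder key; the real offending field is unknown.
<filter kubernetes.**>
  @type record_transformer
  enable_ruby
  auto_typecast true
  <record>
    bigValue ${v = record["bigValue"]; (v.is_a?(Integer) && !v.between?(-(2**63), 2**63 - 1)) ? -1 : v}
  </record>
</filter>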
Your Environment
- Fluentd version: 1.15-1
- TD Agent version: /
- Operating system: alpine linux, v3.16.0
- Kernel version: 5.10.0-0.bpo.15-amd64
Your Configuration
# Inputs from container logs
<source>
@type tail
@id in_tail_container_logs
path /var/log/containers/*.log
exclude_path ["/var/log/containers/cilium*"]
pos_file /var/log/fluentd.log.pos
read_from_head
tag kubernetes.*
<parse>
@type cri
</parse>
</source>
# Merge logs split into multiple lines
<filter kubernetes.**>
@type concat
key message
use_partial_cri_logtag true
partial_cri_logtag_key logtag
partial_cri_stream_key stream
separator ""
</filter>
# Enriches records with Kubernetes metadata
<filter kubernetes.**>
@type kubernetes_metadata
</filter>
# Prettify kubernetes metadata
<filter kubernetes.**>
@type record_transformer
enable_ruby
<record>
nodeName ${record.dig("kubernetes", "host")}
namespaceName ${record.dig("kubernetes", "namespace_name")}
podName ${record.dig("kubernetes", "pod_name")}
containerName ${record.dig("kubernetes", "container_name")}
containerImage ${record.dig("kubernetes", "container_image")}
</record>
remove_keys docker,kubernetes
</filter>
# Expands inner json
<filter kubernetes.**>
@type parser
format json
key_name message
reserve_data true
remove_key_name_field true
emit_invalid_record_to_error false
time_format %Y-%m-%dT%H:%M:%S.%NZ
time_key time
keep_time_key
</filter>
# Mongodb keys should not have dollar or a dot inside
<filter kubernetes.**>
@type rename_key
replace_rule1 \$ [dollar]
</filter>
# Mongodb keys should not have dollar or a dot inside
<filter kubernetes.**>
@type rename_key
replace_rule1 \. [dot]
</filter>
# Outputs to log db
<match kubernetes.**>
@type mongo
connection_string "#{ENV['MONGO_ANALYTICS_DB_HOST']}"
collection logs
<buffer>
@type file
path /var/log/file-buffer
flush_thread_count 8
flush_interval 3s
chunk_limit_size 32M
flush_mode interval
retry_max_interval 60
retry_forever true
</buffer>
</match>
Your Error Log
2023-07-28 07:44:22 +0000 [warn]: #0 retry succeeded. chunk_id="6018738263cb959ce87d310203d5692c"
2023-07-28 07:44:23 +0000 [warn]: #0 failed to flush the buffer. retry_times=0 next_retry_time=2023-07-28 07:44:24 +0000 chunk="5f44a1f448d6b3feab006a95a5405527" error_class=RangeError error="bignum too big to convert into `long long'"
2023-07-28 07:44:23 +0000 [warn]: #0 suppressed same stacktrace
2023-07-28 07:44:24 +0000 [warn]: #0 failed to flush the buffer. retry_times=1 next_retry_time=2023-07-28 07:44:27 +0000 chunk="5f44a1f448d6b3feab006a95a5405527" error_class=RangeError error="bignum too big to convert into `long long'"
2023-07-28 07:44:24 +0000 [warn]: #0 suppressed same stacktrace
2023-07-28 07:44:26 +0000 [warn]: #0 failed to flush the buffer. retry_times=2 next_retry_time=2023-07-28 07:44:31 +0000 chunk="5f44a1f448d6b3feab006a95a5405527" error_class=RangeError error="bignum too big to convert into `long long'"
2023-07-28 07:44:26 +0000 [warn]: #0 suppressed same stacktrace
2023-07-28 07:44:26 +0000 [warn]: #0 failed to flush the buffer. retry_times=2 next_retry_time=2023-07-28 07:44:31 +0000 chunk="6017a8f8ca2765e2e8eeb789257a6e08" error_class=RangeError error="bignum too big to convert into `long long'"
2023-07-28 07:44:26 +0000 [warn]: #0 suppressed same stacktrace
2023-07-28 07:44:30 +0000 [warn]: #0 failed to flush the buffer. retry_times=3 next_retry_time=2023-07-28 07:44:39 +0000 chunk="6017a8f8ca2765e2e8eeb789257a6e08" error_class=RangeError error="bignum too big to convert into `long long'"
2023-07-28 07:44:30 +0000 [warn]: #0 suppressed same stacktrace
2023-07-28 07:44:38 +0000 [warn]: #0 failed to flush the buffer. retry_times=4 next_retry_time=2023-07-28 07:44:54 +0000 chunk="6017a8f8ca2765e2e8eeb789257a6e08" error_class=RangeError error="bignum too big to convert into `long long'"
2023-07-28 07:44:38 +0000 [warn]: #0 suppressed same stacktrace
2023-07-28 07:44:38 +0000 [warn]: #0 failed to flush the buffer. retry_times=4 next_retry_time=2023-07-28 07:44:56 +0000 chunk="5f44a1f448d6b3feab006a95a5405527" error_class=RangeError error="bignum too big to convert into `long long'"
2023-07-28 07:44:38 +0000 [warn]: #0 suppressed same stacktrace
2023-07-28 07:44:41 +0000 [info]: #0 stats - namespace_cache_size: 4, pod_cache_size: 18, namespace_cache_api_updates: 5, pod_cache_api_updates: 5, id_cache_miss: 5, namespace_cache_host_updates: 4, pod_cache_host_updates: 18
2023-07-28 07:44:56 +0000 [warn]: #0 failed to flush the buffer. retry_times=5 next_retry_time=2023-07-28 07:45:29 +0000 chunk="5f44a1f448d6b3feab006a95a5405527" error_class=RangeError error="bignum too big to convert into `long long'"
2023-07-28 07:44:56 +0000 [warn]: #0 suppressed same stacktrace
2023-07-28 07:44:56 +0000 [warn]: #0 failed to flush the buffer. retry_times=5 next_retry_time=2023-07-28 07:45:30 +0000 chunk="6017a8f8ca2765e2e8eeb789257a6e08" error_class=RangeError error="bignum too big to convert into `long long'"
2023-07-28 07:44:56 +0000 [warn]: #0 suppressed same stacktrace
2023-07-28 07:45:11 +0000 [info]: #0 stats - namespace_cache_size: 4, pod_cache_size: 18, namespace_cache_api_updates: 5, pod_cache_api_updates: 5, id_cache_miss: 5, namespace_cache_host_updates: 4, pod_cache_host_updates: 18
2023-07-28 07:45:30 +0000 [warn]: #0 failed to flush the buffer. retry_times=6 next_retry_time=2023-07-28 07:46:32 +0000 chunk="5f0bc3088db314e9e88e1a6920da4a11" error_class=RangeError error="bignum too big to convert into `long long'"
2023-07-28 07:45:30 +0000 [warn]: #0 suppressed same stacktrace
2023-07-28 07:45:30 +0000 [warn]: #0 failed to flush the buffer. retry_times=6 next_retry_time=2023-07-28 07:46:33 +0000 chunk="6017a8f8ca2765e2e8eeb789257a6e08" error_class=RangeError error="bignum too big to convert into `long long'"
2023-07-28 07:45:30 +0000 [warn]: #0 suppressed same stacktrace
2023-07-28 07:45:30 +0000 [warn]: #0 failed to flush the buffer. retry_times=6 next_retry_time=2023-07-28 07:46:25 +0000 chunk="5f44a1f448d6b3feab006a95a5405527" error_class=RangeError error="bignum too big to convert into `long long'"
2023-07-28 07:45:30 +0000 [warn]: #0 suppressed same stacktrace
2023-07-28 07:45:41 +0000 [info]: #0 stats - namespace_cache_size: 4, pod_cache_size: 18, namespace_cache_api_updates: 5, pod_cache_api_updates: 5, id_cache_miss: 5, namespace_cache_host_updates: 4, pod_cache_host_updates: 18
2023-07-28 07:46:11 +0000 [info]: #0 stats - namespace_cache_size: 4, pod_cache_size: 18, namespace_cache_api_updates: 5, pod_cache_api_updates: 5, id_cache_miss: 5, namespace_cache_host_updates: 4, pod_cache_host_updates: 18
2023-07-28 07:46:25 +0000 [warn]: #0 failed to flush the buffer. retry_times=7 next_retry_time=2023-07-28 07:47:27 +0000 chunk="6017a8f8ca2765e2e8eeb789257a6e08" error_class=RangeError error="bignum too big to convert into `long long'"
2023-07-28 07:46:25 +0000 [warn]: #0 suppressed same stacktrace
2023-07-28 07:46:25 +0000 [warn]: #0 failed to flush the buffer. retry_times=7 next_retry_time=2023-07-28 07:47:25 +0000 chunk="5f44a1f448d6b3feab006a95a5405527" error_class=RangeError error="bignum too big to convert into `long long'"
2023-07-28 07:46:25 +0000 [warn]: #0 suppressed same stacktrace
2023-07-28 07:46:41 +0000 [info]: #0 stats - namespace_cache_size: 4, pod_cache_size: 18, namespace_cache_api_updates: 5, pod_cache_api_updates: 5, id_cache_miss: 5, namespace_cache_host_updates: 4, pod_cache_host_updates: 18
Additional context
It is running in DigitalOcean Kubernetes, as a DaemonSet.
On top of that, I cannot make sense of all the buffer settings. If I remove retry_forever and set retry_timeout to 24h (86400), the problematic chunks still never get deleted; these settings don't seem to apply per chunk. I have chunks queued for months that keep failing, some even from last year, and they can't get deleted, because whenever some other chunk flushes successfully, retry_times, next_retry_time, everything is reset to the initial state for all chunks, even the problematic ones.
It just happened again: fluentd got completely stuck, this time in minikube. Again it was reporting 'bignum too big' for two chunks, and when I removed the chunks and restarted fluentd, it got unstuck (a simple restart didn't do it; I had to remove those chunks before restarting). The chunks in question are attached; please advise, as I don't know what to do to mitigate this issue. logs.zip
This keeps happening. Any update?
Sorry for the late response.
Do you get these errors with specific chunks (i.e., do the chunks that cause the error keep causing the same error repeatedly)?
On top of that, I cannot make sense of all the buffer settings. If I remove retry_forever and set retry_timeout to 24h (86400), the problematic chunks still never get deleted; these settings don't seem to apply per chunk. I have chunks queued for months that keep failing, some even from last year, and they can't get deleted, because whenever some other chunk flushes successfully, retry_times, next_retry_time, everything is reset to the initial state for all chunks, even the problematic ones.
You can set retry_max_times to limit the retry count.
- https://docs.fluentd.org/configuration/buffer-section#retries-parameters
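For example, something like this on top of your buffer section (a rough sketch; note that retry_forever true makes retry_max_times ignored, so it has to be removed or set to false):
<match kubernetes.**>
  @type mongo
  connection_string "#{ENV['MONGO_ANALYTICS_DB_HOST']}"
  collection logs
  <buffer>
    @type file
    path /var/log/file-buffer
    flush_thread_count 8
    flush_interval 3s
    chunk_limit_size 32M
    flush_mode interval
    retry_max_interval 60
    # Stop retrying after 10 attempts instead of retrying forever.
    retry_max_times 10
    retry_forever false
  </buffer>
  # Optional: dump chunks that exhaust their retries to local files so the
  # data is not silently lost.
  <secondary>
    @type secondary_file
    directory /var/log/fluentd-failed-chunks
  </secondary>
</match>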
Looks like the stacktrace was omitted.
2023-07-28 07:44:23 +0000 [warn]: #0 suppressed same stacktrace
Could you please share the stacktrace? I'd like to know which code causes this error.
When we had a big issue about a year ago, I had all the data, but it got lost in the meantime. When the issue happened the last time, 3 weeks ago, I just removed the bad chunks.
Chunks that cause errors cause the same error forever. The only solution was to remove the chunk file and restart fluentd, since the settings mentioned above are not at the chunk level. Is retry_max_times a chunk-level setting? All the other settings were resetting as soon as one chunk had been uploaded successfully.