We run fluent-bit with a forward input and filesystem buffering. Periodically a chunk lands on the filesystem that crashes fluent-bit when it is read. As a result, fluent-bit crashes on every restart as it tries to replay this chunk from the backlog. We haven't been able to reproduce the issue reliably (apart from loading the faulty chunk, in which case it crashes). Unfortunately I can't share the chunk because it contains customer data.
Here's the stack trace we see when it crashes:
[2024/06/21 10:16:25] [engine] caught signal (SIGSEGV)
#0 0x55ae116e4ad3 in msgpack2json() at src/flb_pack.c:731
#1 0x55ae116e4ad3 in msgpack2json() at src/flb_pack.c:731
#2 0x55ae116e4ad3 in msgpack2json() at src/flb_pack.c:731
#3 0x55ae116e4ad3 in msgpack2json() at src/flb_pack.c:731
#4 0x55ae116e533a in flb_msgpack_to_json() at src/flb_pack.c:768
#5 0x55ae116e5457 in flb_msgpack_raw_to_json_sds() at src/flb_pack.c:808
#6 0x55ae117e5bc3 in splunk_format() at plugins/out_splunk/splunk.c:500
#7 0x55ae117e6424 in cb_splunk_flush() at plugins/out_splunk/splunk.c:658
#8 0x55ae11c03ae6 in co_init() at lib/monkey/deps/flb_libco/amd64.c:117
#9 0xffffffffffffffff in ???() at ???:0
Your Environment
Version used: 3.0.6
Configuration:
[SERVICE]
    HTTP_Server On
    Health_Check On
    Storage.max_chunks_up 512
    Storage.backlog.mem_limit 100M
    Storage.path /var/log/flb-storage/
    Storage.sync normal
    Storage.metrics On
[INPUT]
    Name Forward
    Storage.type filesystem
[OUTPUT]
    Name Splunk
    Match *
    Host <our host>
    Port 443
    Splunk_Token ${SPLUNK_TOKEN}
    TLS On
    TLS.Verify On
    Event_index <index>
    Event_sourcetype fluentd
    Retry_Limit False
    Storage.total_limit_size 10GB
Environment name and version (e.g. Kubernetes? What version?): Kubernetes on EKS, v1.28. We can also reproduce the crash locally if we copy the same chunk off the cluster.
Server type and version:
Operating System and version:
Filters and plugins:
Additional context
We tried editing the chunk to determine which lines in it cause the crash, but we couldn't modify the chunk in a way that left it readable.
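One way to narrow this down without sharing the data might be to walk the chunk's msgpack payload record by record and report the byte offset where the structure first fails to parse. The sketch below is a rough, stdlib-only Python illustration, not a verified tool. It assumes the chunk file starts with a 24-byte header (magic bytes 0xC1 0x00, with a 2-byte big-endian metadata length at offset 22) followed by the metadata and then the msgpack records; that layout is our reading of chunkio and should be checked against the version in use. The names `chunk_payload`, `scan_records` and `skip_object` are hypothetical helpers, not part of fluent-bit.

```python
class MsgpackError(ValueError):
    """Raised when the byte stream is not structurally valid msgpack."""


# Type byte -> number of payload bytes that follow it (fixed-size types).
_FIXED = {
    0xc0: 0, 0xc2: 0, 0xc3: 0,                   # nil, false, true
    0xca: 4, 0xcb: 8,                            # float32, float64
    0xcc: 1, 0xcd: 2, 0xce: 4, 0xcf: 8,          # uint8..uint64
    0xd0: 1, 0xd1: 2, 0xd2: 4, 0xd3: 8,          # int8..int64
    0xd4: 2, 0xd5: 3, 0xd6: 5, 0xd7: 9, 0xd8: 17,  # fixext: type byte + data
}
# Type byte -> (size of the length field, extra bytes after it).
_LEN_PREFIXED = {
    0xc4: (1, 0), 0xc5: (2, 0), 0xc6: (4, 0),    # bin8/16/32
    0xc7: (1, 1), 0xc8: (2, 1), 0xc9: (4, 1),    # ext8/16/32 (+1 type byte)
    0xd9: (1, 0), 0xda: (2, 0), 0xdb: (4, 0),    # str8/16/32
}
# Type byte -> (size of the count field, objects per element).
_CONTAINERS = {0xdc: (2, 1), 0xdd: (4, 1), 0xde: (2, 2), 0xdf: (4, 2)}


def _read_uint(buf, pos, size):
    if pos + size > len(buf):
        raise MsgpackError(f"truncated length field at offset {pos}")
    return int.from_bytes(buf[pos:pos + size], "big"), pos + size


def skip_object(buf, pos):
    """Skip one msgpack object starting at pos; return (end_pos, max_depth)."""
    pending = [1]      # objects still to read at each nesting level
    max_depth = 0
    while pending:
        if pending[-1] == 0:
            pending.pop()
            continue
        pending[-1] -= 1
        if pos >= len(buf):
            raise MsgpackError(f"truncated object at offset {pos}")
        start = pos
        b = buf[pos]
        pos += 1
        children = 0
        if b <= 0x7f or b >= 0xe0:       # positive / negative fixint
            pass
        elif b <= 0x8f:                  # fixmap: key + value per entry
            children = 2 * (b & 0x0f)
        elif b <= 0x9f:                  # fixarray
            children = b & 0x0f
        elif b <= 0xbf:                  # fixstr
            pos += b & 0x1f
        elif b in _FIXED:
            pos += _FIXED[b]
        elif b in _LEN_PREFIXED:
            size, extra = _LEN_PREFIXED[b]
            n, pos = _read_uint(buf, pos, size)
            pos += n + extra
        elif b in _CONTAINERS:
            size, per_element = _CONTAINERS[b]
            n, pos = _read_uint(buf, pos, size)
            children = n * per_element
        else:                            # 0xc1 is never used by msgpack
            raise MsgpackError(f"invalid type byte 0x{b:02x} at offset {start}")
        if pos > len(buf):
            raise MsgpackError(f"object at offset {start} runs past end of data")
        if children:
            pending.append(children)
            max_depth = max(max_depth, len(pending) - 1)
    return pos, max_depth


def chunk_payload(raw):
    """Strip header and metadata. The layout used here is an ASSUMPTION."""
    if len(raw) < 24 or raw[0] != 0xC1 or raw[1] != 0x00:
        raise MsgpackError("unexpected magic bytes, not the assumed chunk layout")
    meta_len = int.from_bytes(raw[22:24], "big")
    return raw[24 + meta_len:]


def scan_records(payload):
    """Yield (offset, length, max_depth) per top-level record until an error."""
    pos = 0
    while pos < len(payload):
        end, depth = skip_object(payload, pos)
        yield pos, end - pos, depth
        pos = end
```

To use it, read the chunk file in binary mode and iterate over `scan_records(chunk_payload(data))`; a `MsgpackError` names the offset of the first structurally broken record. This only checks structure, so a chunk that passes may still trigger the crash. In that case the reported offsets and nesting depths could still be used to cut the payload at record boundaries and bisect it.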
Please advise what we could try to help track this down.
We strongly suspect that this started after we upgraded to v3. The only somewhat relevant code change we found was here: https://github.com/fluent/fluent-bit/pull/8589/files