A note for the community

Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
If you are interested in working on this issue or have submitted a pull request, please leave a comment
Problem
We currently run vector in the recommended configuration, with the daemon sending logs to the aggregator, which then forwards them on to Loki.
Recently, we experienced an issue where a log was malformed in a way that made its timestamp unparseable by chrono. This log was stored in the buffer for sending to Loki, so on every attempt to send data to Loki, vector would run into the corrupted log and panic. Specifically, this happens in vector/src/sinks/loki/sink.rs, lines 253 to 258 at 3de6f0b:
```rust
let timestamp = match event.as_log().get_timestamp() {
    Some(Value::Timestamp(ts)) => ts.timestamp_nanos_opt().expect("Timestamp out of range"),
    _ => chrono::Utc::now()
        .timestamp_nanos_opt()
        .expect("Timestamp out of range"),
};
```
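For background (this is not Vector code): chrono's `timestamp_nanos_opt` returns `None` whenever the instant cannot be represented as an `i64` count of nanoseconds since the Unix epoch, which covers roughly the years 1677 through 2262, so `.expect(...)` panics for any timestamp outside that window. The snippet below is a hypothetical std-only stand-in that models the failure mode with a checked multiplication:

```rust
// Hypothetical stand-in (std only, no chrono) for why the conversion can
// fail: nanoseconds since the epoch must fit in an i64, which covers
// roughly 1677-09-21 through 2262-04-11.
fn timestamp_nanos_opt(secs: i64) -> Option<i64> {
    // checked_mul returns None on i64 overflow instead of panicking.
    secs.checked_mul(1_000_000_000)
}

fn main() {
    // An ordinary 2024 timestamp fits comfortably in i64 nanoseconds.
    assert!(timestamp_nanos_opt(1_719_840_221).is_some());
    // A far-future timestamp (year 2262 and beyond) overflows and yields
    // None, which `.expect("Timestamp out of range")` turns into a panic.
    assert!(timestamp_nanos_opt(10_000_000_000).is_none());
    println!("ok");
}
```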
This is evident in the following logs:
```text
2024-07-01T13:23:41.529426Z INFO vector::app: Log level is enabled. level="info"
2024-07-01T13:23:41.530214Z INFO vector::app: Loading configs. paths=["/etc/vector"]
2024-07-01T13:23:41.532893Z WARN vector::config: Source has acknowledgements enabled by a sink, but acknowledgements are not supported by this source. Silent data loss could occur. source="vector_internal_logs" sink="loki"
2024-07-01T13:23:41.957917Z INFO vector::topology::running: Running healthchecks.
2024-07-01T13:23:41.958031Z INFO vector: Vector has started. debug="false" version="0.38.0" arch="x86_64" revision="ea0ec6f 2024-05-07 14:34:39.794027186"
2024-07-01T13:23:41.958146Z INFO source{component_kind="source" component_id=vector_agents component_type=vector}: vector::sources::util::grpc: Building gRPC server. address=0.0.0.0:6000
2024-07-01T13:23:41.958214Z INFO vector::topology::builder: Healthcheck passed.
2024-07-01T13:23:41.958337Z INFO vector::sinks::prometheus::exporter: Building HTTP server. address=0.0.0.0:9090
thread 'vector-worker' panicked at src/sinks/loki/sink.rs:256:68:
Timestamp out of range
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
2024-07-01T13:23:41.959052Z INFO vector::internal_events::api: API server running. address=127.0.0.1:8686 playground=off graphql=http://127.0.0.1:8686/graphql
2024-07-01T13:23:41.959100Z ERROR sink{component_kind="sink" component_id=loki component_type=loki}: vector::topology: An error occurred that Vector couldn't handle: the task panicked and was aborted.
2024-07-01T13:23:41.959141Z INFO vector: Vector has stopped.
```
The expected behavior here should be to log that the timestamp is unparseable (not panic), and then return `None` so that the log is filtered out by the `filter_map` and doesn't get sent on to Loki. This would be preferable to panicking.
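The proposed behavior can be sketched as below. This is a minimal, self-contained sketch, not Vector's real code: `Event`, the `Option<i64>` timestamp field, and the `eprintln!` warning are stand-ins for Vector's event type, chrono conversion, and internal logging. It only shows the shape of the fix: warn and return `None` from the closure so `filter_map` drops the event instead of panicking.

```rust
// Stand-in event type: `timestamp_nanos` of None models a timestamp that
// chrono could not represent as i64 nanoseconds.
struct Event {
    timestamp_nanos: Option<i64>,
    message: &'static str,
}

// Encode events for the sink, dropping (rather than panicking on) any
// event whose timestamp is out of range.
fn encode(events: Vec<Event>) -> Vec<(i64, &'static str)> {
    events
        .into_iter()
        .filter_map(|event| match event.timestamp_nanos {
            Some(ts) => Some((ts, event.message)),
            None => {
                // Warn and skip: returning None makes filter_map drop it.
                eprintln!("warning: timestamp out of range; dropping event");
                None
            }
        })
        .collect()
}

fn main() {
    let events = vec![
        Event { timestamp_nanos: Some(1_719_840_221_000_000_000), message: "good" },
        Event { timestamp_nanos: None, message: "corrupt" },
    ];
    let encoded = encode(events);
    // Only the well-formed event survives; the corrupt one is dropped.
    assert_eq!(encoded.len(), 1);
    assert_eq!(encoded[0].1, "good");
    println!("encoded {} event(s)", encoded.len());
}
```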
As an aside, I couldn't find any information about manually inspecting the buffer database. I am curious about what log was generated that is actually unparseable -- it would be good to fix this within our own systems so that vector doesn't have to panic or log a warning/error.
Mistakenly posted this before on the wrong user :|
Configuration
I can provide the config if needed, but I don't think it is relevant here.
Version
vector 0.38.0 (x86_64-unknown-linux-gnu ea0ec6f 2024-05-07 14:34:39.794027186)
Debug Output
No response
Example Data
No response
Additional Context
This is vector running in Kubernetes -- happy to grab additional info or debug logs, it will just take me a little bit.
Also happy to put up a PR, as the changes should be relatively straightforward in just returning `None` (after a log) from the function instead of panicking.
References
No response