Logzio appender stops sending log events #45

Open
martijnblankestijn opened this issue Jul 1, 2019 · 25 comments

@martijnblankestijn

Everything goes well with shipping the log events to logz.io, until after about three days the appender stops sending events to logz.io.
This has happened on multiple occasions.

We enabled debug logging of the logzio appender and sender. A snippet of the output in the log file is captured below.

Looking at the log file, the appender tries to drain the queue with one logging event in it, but never sends it. After that it does not attempt to drain the queue again.
The log line that should have been sent is identical to other log lines it already sent earlier in the run of the application, so we have no reason to suspect that the content of that log line is the cause.

The version we use:
compile "io.logz.logback:logzio-logback-appender:1.0.22"

20:31:12,222 |-INFO in io.logz.logback.LogzioLogbackAppender[LogzioLogbackAppender] - DEBUG: Attempting to drain queue
20:31:12,547 |-INFO in io.logz.logback.LogzioLogbackAppender[LogzioLogbackAppender] - Successfully sent bulk to logz.io, size: 1658
20:31:17,548 |-INFO in io.logz.logback.LogzioLogbackAppender[LogzioLogbackAppender] - DEBUG: Attempting to drain queue
20:31:22,548 |-INFO in io.logz.logback.LogzioLogbackAppender[LogzioLogbackAppender] - DEBUG: Attempting to drain queue
20:31:27,548 |-INFO in io.logz.logback.LogzioLogbackAppender[LogzioLogbackAppender] - DEBUG: Attempting to drain queue
20:31:32,548 |-INFO in io.logz.logback.LogzioLogbackAppender[LogzioLogbackAppender] - DEBUG: Attempting to drain queue
20:31:37,548 |-INFO in io.logz.logback.LogzioLogbackAppender[LogzioLogbackAppender] - DEBUG: Attempting to drain queue
2019-06-29 20:31:40.158 DEBUG 8 --- [    scheduler-3] [APP LOG LINE]
20:31:42,548 |-INFO in io.logz.logback.LogzioLogbackAppender[LogzioLogbackAppender] - DEBUG: Attempting to drain queue
20:31:42,950 |-INFO in io.logz.logback.LogzioLogbackAppender[LogzioLogbackAppender] - Successfully sent bulk to logz.io, size: 1658
20:31:47,950 |-INFO in io.logz.logback.LogzioLogbackAppender[LogzioLogbackAppender] - DEBUG: Attempting to drain queue
20:31:52,951 |-INFO in io.logz.logback.LogzioLogbackAppender[LogzioLogbackAppender] - DEBUG: Attempting to drain queue
2019-06-29 20:31:53.562 DEBUG 8 --- [   scheduler-17] [APP LOG LINE]
20:31:57,951 |-INFO in io.logz.logback.LogzioLogbackAppender[LogzioLogbackAppender] - DEBUG: Attempting to drain queue
.. more log lines
@idohalevi
Contributor

@martijnblankestijn Thank you for letting us know.
This appender is heavily used by other customers and we have never had an issue with it.
Does it stop running, or does it just not send anything? Can you also check the queue and tell me whether it's empty or not?
One last thing: can you please give a bit more detail about your environment?

@martijnblankestijn
Author

The application is running on
Kubernetes: v1.12.9+IKS
Docker image: openjdk:11.0.2-jre
Spring Boot: 2.1.6.RELEASE
Logback: 1.2.3

How do I check the queue? I have seen that it is a binary file in the /tmp file system, and I can see that it is changing (looking at the timestamp of the file).

And would it be sensible to switch to the in-memory queue, as we do not really need the file-based mechanism?

@idohalevi
Contributor

@martijnblankestijn
You can see the queue path here. The disk queue is for persistence. In case something bad happens we can recover the data.
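For completeness, opting into the in-memory queue instead of the disk queue is a configuration switch on the appender. A minimal, hypothetical sketch in logback.groovy form, using only parameters that appear elsewhere in this thread; the token, URL, and capacity are placeholders:

import io.logz.logback.LogzioLogbackAppender

appender("LogzioLogbackAppender", LogzioLogbackAppender) {
    token = '<your-logzio-token>'
    logzioUrl = 'https://listener.logz.io:8071'
    inMemoryQueue = true                            // keep queued events in memory instead of /tmp page files
    inMemoryQueueCapacityBytes = 1024 * 1024 * 100  // upper bound (~100 MB) on memory used for queued events
}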

@martijnblankestijn
Author

Just happened again; the last log message sent to logz.io was from 'July 6th 2019, 05:36:53.238'.
The logging through the file appender was still working.

Looking at the /tmp directory, I found the following files:
-rw-r--r-- 1 134217728 Jul 5 06:01 page-0.dat
-rw-r--r-- 1 134217728 Jul 6 08:42 page-1.dat

Time of looking was 08:42, so the queue file page-1.dat is probably still being changed.
I took one of the most recent timestamps in the regular log file and searched for it in the queue files like this:

grep -l '2019-07-06T08:23:39.750' *

This got me a hit on page-1.dat, so yes the appender is still writing to disk.

@idohalevi
Contributor

@martijnblankestijn
Can you provide a way to reproduce it?
What happens if you restart the logger? or do you have any debug logs?

@martijnblankestijn
Author

Reproducing it is just a matter of running our application for a couple of days, and then it happens.
I have no other way to force or reproduce it.
It does not happen after a specific amount of time, but approximately after 2 to 4 days.

I activated the debug logging; you can find typical logging in my first post, except that at some point it stops sending and draining. If there is more debug logging I can turn on, just let me know.

What do you mean by restarting the logger? How can this be done?
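One possible reading of "restarting the logger", sketched here only as an illustration and not confirmed by the maintainers, is to stop and start the appender through the logback API at runtime. The snippet is in Groovy to match the configuration examples in this thread and assumes the appender is attached to the root logger under the name "LogzioLogbackAppender" seen in the status output above:

import ch.qos.logback.classic.LoggerContext
import org.slf4j.Logger
import org.slf4j.LoggerFactory

// Look up the running logback context and the root logger.
LoggerContext context = (LoggerContext) LoggerFactory.getILoggerFactory()
def rootLogger = context.getLogger(Logger.ROOT_LOGGER_NAME)

// Fetch the appender by its configured name (assumed name, adjust to your config).
def logzioAppender = rootLogger.getAppender("LogzioLogbackAppender")
if (logzioAppender != null) {
    logzioAppender.stop()   // stop the appender (and its background sender)
    logzioAppender.start()  // start it again
}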

@idohalevi
Contributor

@martijnblankestijn I will try to reproduce it on our end. It's not that easy, so it might take me some time. If you're running in a Kubernetes set-up, you can for now use our k8s daemonset or Docker collector instead.

@idohalevi
Contributor

@martijnblankestijn Sorry for the late response. I can't figure out how to reproduce it on our end.
I will update when I succeed.

@martijnblankestijn
Author

martijnblankestijn commented Aug 8, 2019 via email

@idohalevi
Contributor

@martijnblankestijn Hi, checking in to see whether you managed to resolve this or still need our help.

@pachecopaulo

@idohalevi The issue hasn't been fixed.

@idohalevi
Contributor

@Doron-Bargo @yyyogev can you assist?

@pajaroblanco

pajaroblanco commented May 11, 2020

We're seeing the same issue with a similar stack using Micronaut. I've tried both the in-memory and the disk queue. In both cases, all replicas running in Kubernetes work fine for days and then all of a sudden stop shipping logs.

Kubernetes: v1.16.6
Docker image: adoptopenjdk/openjdk11-openj9:jdk-11.0.1.13-alpine-slim
Micronaut: 1.1.3
Logback: 1.2.3
logzio-logback-appender: 1.0.24

logback.groovy

appender("LOGZIO", LogzioLogbackAppender) {
    token = 'mytoken'
    logzioUrl = 'https://listener-wa.logz.io:8071'
    logzioType = 'my-app'
    additionalFields = "env=$environment;appStack=$stack;appOwner=$appOwner;appId=$appId"
    addHostname = true
    format = 'text'
    line = true
    inMemoryQueue = true
    inMemoryQueueCapacityBytes = 1024 * 1024 * 10 //10MB max memory buffer
}

For what it's worth, in our case a container seems to ship logs successfully for about 6 days and then stops. If I restart the container, it starts logging again.

@pajaroblanco

@idohalevi @Doron-Bargo @yyyogev Any progress or suggestion on how to proceed from here?

@yyyogev
Contributor

yyyogev commented May 26, 2020

Does it happen consistently after 6 days?

@pajaroblanco

@yyyogev Yes, so far. Could be less about time and more about how long it takes before the queue fills up. The log rate is pretty consistent.

@pachecopaulo

Any update on this?

@yyyogev
Contributor

yyyogev commented Jun 21, 2020

Not yet. We are working on it, but it's very hard to reproduce; we have had a machine running a few instances for more than 2 weeks now and it's fine.
We need more context. Did it happen to anyone without using the appender, e.g. when using the java-sender directly or from another source?
@pachecopaulo were you using it on a k8s pod as well?
What type of queue were you using: in-memory or on disk?

@pachecopaulo

pachecopaulo commented Jun 22, 2020

We're running on K8s as well, using the on-disk queue. We tried switching to in-memory, but we got the same issue.
There is an application in our cluster that uses Filebeat instead of the appender, and that one has been working nicely.

@pajaroblanco

@pachecopaulo did you use a sidecar container approach with filebeat in the same pod?

@pachecopaulo

@pajaroblanco I didn't. We just have another app running with Filebeat.

@pajaroblanco

For us, moving from the alpine-slim variant to the standard adoptopenjdk image fixed the issue.

Old: "adoptopenjdk/openjdk11-openj9:jdk-11.0.1.13-alpine-slim"
New: "adoptopenjdk/openjdk11"

After making that change logs have been shipping consistently from the logback appender.

@yyyogev
Contributor

yyyogev commented Nov 17, 2020

Can anyone else who had this problem try the above fix and let us know if it works?
@martijnblankestijn @pachecopaulo

@martijnblankestijn
Author

martijnblankestijn commented Nov 17, 2020 via email

@armaniehs

This one was painful, but I think I got to the bottom of it after 4 hours of debugging.

In my case it stopped because fileSystemFullPercentThreshold is pre-set to 98%.

My laptop has 35 GB of disk space left out of 2 TB, so the disk is more than 98% full, and that's why it stopped logging.

If you set it to -1, it will not check for disk space, which is OK for a laptop; however, I don't recommend doing that when running inside a container.

I hope it helps.
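For anyone who wants to tune this threshold rather than guess, it is exposed as an appender parameter. A minimal sketch in logback.groovy form; -1 is shown only because the comment above describes it, and the other values are placeholders:

import io.logz.logback.LogzioLogbackAppender

appender("LogzioLogbackAppender", LogzioLogbackAppender) {
    token = '<your-logzio-token>'
    logzioUrl = 'https://listener.logz.io:8071'
    fileSystemFullPercentThreshold = -1  // disable the free-disk-space check (default is 98; risky inside a container)
}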
