Logzio appender stops sending log events #45

Open
martijnblankestijn opened this issue Jul 1, 2019 · 25 comments

@martijnblankestijn

Everything goes well with shipping the log events to logz.io, until after about three days the appender stops sending events to logz.io.
This has happened on multiple occasions.

We enabled debug logging of the logzio appender and sender. A snippet of the output in the log file is captured below.

Looking at the log file, the appender tries to drain the queue with one logging event in it, but never sends it. After that it does not attempt to drain the queue again.
The log line that should have been sent is identical to other log lines it already sent earlier in the run of the application, so we have no reason to suspect that the content of that log line is the cause.

The version we use:
compile "io.logz.logback:logzio-logback-appender:1.0.22"

20:31:12,222 |-INFO in io.logz.logback.LogzioLogbackAppender[LogzioLogbackAppender] - DEBUG: Attempting to drain queue
20:31:12,547 |-INFO in io.logz.logback.LogzioLogbackAppender[LogzioLogbackAppender] - Successfully sent bulk to logz.io, size: 1658
20:31:17,548 |-INFO in io.logz.logback.LogzioLogbackAppender[LogzioLogbackAppender] - DEBUG: Attempting to drain queue
20:31:22,548 |-INFO in io.logz.logback.LogzioLogbackAppender[LogzioLogbackAppender] - DEBUG: Attempting to drain queue
20:31:27,548 |-INFO in io.logz.logback.LogzioLogbackAppender[LogzioLogbackAppender] - DEBUG: Attempting to drain queue
20:31:32,548 |-INFO in io.logz.logback.LogzioLogbackAppender[LogzioLogbackAppender] - DEBUG: Attempting to drain queue
20:31:37,548 |-INFO in io.logz.logback.LogzioLogbackAppender[LogzioLogbackAppender] - DEBUG: Attempting to drain queue
2019-06-29 20:31:40.158 DEBUG 8 --- [    scheduler-3] [APP LOG LINE]
20:31:42,548 |-INFO in io.logz.logback.LogzioLogbackAppender[LogzioLogbackAppender] - DEBUG: Attempting to drain queue
20:31:42,950 |-INFO in io.logz.logback.LogzioLogbackAppender[LogzioLogbackAppender] - Successfully sent bulk to logz.io, size: 1658
20:31:47,950 |-INFO in io.logz.logback.LogzioLogbackAppender[LogzioLogbackAppender] - DEBUG: Attempting to drain queue
20:31:52,951 |-INFO in io.logz.logback.LogzioLogbackAppender[LogzioLogbackAppender] - DEBUG: Attempting to drain queue
2019-06-29 20:31:53.562 DEBUG 8 --- [   scheduler-17] [APP LOG LINE]
20:31:57,951 |-INFO in io.logz.logback.LogzioLogbackAppender[LogzioLogbackAppender] - DEBUG: Attempting to drain queue
.. more log lines
@idohalevi
Contributor

@martijnblankestijn Thank you for letting us know.
This appender is heavily used by other customers and we have never had an issue with it.
Does it stop running, or does it just not send anything? Can you also check the queue and tell me whether it's empty or not?
One last thing: can you please give a bit more detail about your environment?

@martijnblankestijn
Author

The application is running on
Kubernetes: v1.12.9+IKS
Docker image: openjdk:11.0.2-jre
Spring Boot: 2.1.6.RELEASE
Logback: 1.2.3

How do I check the queue? I have seen that it is a binary file in the /tmp file system, and I can see that it is changing (looking at the timestamp of the file).

And would it be sensible to switch to the in-memory queue, as we do not really need the file-based mechanism?

@idohalevi
Contributor

@martijnblankestijn
You can see the queue path here. The disk queue is for persistence. In case something bad happens we can recover the data.
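For completeness, opting into the in-memory queue instead of the disk queue is a configuration switch on the appender. A minimal, hypothetical sketch in logback.groovy form, using only parameters that appear elsewhere in this thread; the token, URL, and capacity are placeholders:

import io.logz.logback.LogzioLogbackAppender

appender("LogzioLogbackAppender", LogzioLogbackAppender) {
    token = '<your-logzio-token>'
    logzioUrl = 'https://listener.logz.io:8071'
    inMemoryQueue = true                            // keep queued events in memory instead of /tmp page files
    inMemoryQueueCapacityBytes = 1024 * 1024 * 100  // upper bound (~100 MB) on memory used for queued events
}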

@martijnblankestijn
Author

Just happened again; the last log message sent to logz.io was from 'July 6th 2019, 05:36:53.238'.
The logging through the file appender was still working.

Looking at the /tmp directory, I found the following files:
-rw-r--r-- 1 134217728 Jul 5 06:01 page-0.dat
-rw-r--r-- 1 134217728 Jul 6 08:42 page-1.dat

Time of looking was 08:42, so the queue file page-1.dat is probably still being changed.
I took one of the most recent timestamps in the regular log file and searched for it in the queue files like this:

grep -l '2019-07-06T08:23:39.750' *

This got me a hit on page-1.dat, so yes the appender is still writing to disk.

@idohalevi
Contributor

@martijnblankestijn
Can you provide a way to reproduce it?
What happens if you restart the logger? or do you have any debug logs?

@martijnblankestijn
Author

Reproducing it is just a matter of running our application for a couple of days, and then it happens.
I have no other way to force or reproduce it.
It does not happen after a specific amount of time, but approximately after 2 to 4 days.

I activated the debug logging; you can find typical logging in my first post, except that at some point it stops sending and draining. If there is more debug logging I can turn on, just let me know.

What do you mean by restarting the logger? How can this be done?
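One possible reading of "restarting the logger", sketched here only as an illustration and not confirmed by the maintainers, is to stop and start the appender through the logback API at runtime. The snippet is in Groovy to match the configuration examples in this thread and assumes the appender is attached to the root logger under the name "LogzioLogbackAppender" seen in the status output above:

import ch.qos.logback.classic.LoggerContext
import org.slf4j.Logger
import org.slf4j.LoggerFactory

// Look up the running logback context and the root logger.
LoggerContext context = (LoggerContext) LoggerFactory.getILoggerFactory()
def rootLogger = context.getLogger(Logger.ROOT_LOGGER_NAME)

// Fetch the appender by its configured name (assumed name, adjust to your config).
def logzioAppender = rootLogger.getAppender("LogzioLogbackAppender")
if (logzioAppender != null) {
    logzioAppender.stop()   // stop the appender (and its background sender)
    logzioAppender.start()  // start it again
}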

@idohalevi
Contributor

@martijnblankestijn I will try to reproduce it on our end. It's not that easy, so it might take me some time. If you're running in a Kubernetes set-up, you can for now use our k8s daemonset or Docker collector instead.

@idohalevi
Contributor

@martijnblankestijn Sorry for the late response. I can't figure out how to reproduce it on our end.
I will update when I succeed.

@martijnblankestijn
Author

martijnblankestijn commented Aug 8, 2019 via email

@idohalevi
Contributor

@martijnblankestijn Hi, checking in to see whether you managed to resolve this or still need our help.

@pachecopaulo

@idohalevi The issue hasn't been fixed.

@idohalevi
Contributor

@Doron-Bargo @yyyogev can you assist?

@pajaroblanco

pajaroblanco commented May 11, 2020

We're seeing the same issue with a similar stack using Micronaut. I've tried both the in-memory and the disk queue. In both cases, all replicas running in Kubernetes work fine for days and then all of a sudden stop shipping logs.

Kubernetes: v1.16.6
Docker image: adoptopenjdk/openjdk11-openj9:jdk-11.0.1.13-alpine-slim
Micronaut: 1.1.3
Logback: 1.2.3
logzio-logback-appender: 1.0.24

logback.groovy

appender("LOGZIO", LogzioLogbackAppender) {
    token = 'mytoken'
    logzioUrl = 'https://listener-wa.logz.io:8071'
    logzioType = 'my-app'
    additionalFields = "env=$environment;appStack=$stack;appOwner=$appOwner;appId=$appId"
    addHostname = true
    format = 'text'
    line = true
    inMemoryQueue = true
    inMemoryQueueCapacityBytes = 1024 * 1024 * 10 //10MB max memory buffer
}

For what it's worth, in our case a container seems to ship logs successfully for about 6 days and then stops. If I restart the container, it starts logging again.

@pajaroblanco

@idohalevi @Doron-Bargo @yyyogev Any progress or suggestion on how to proceed from here?

@yyyogev
Contributor

yyyogev commented May 26, 2020

Does it happen consistently after 6 days?

@pajaroblanco

@yyyogev Yes, so far. Could be less about time and more about how long it takes before the queue fills up. The log rate is pretty consistent.

@pachecopaulo

Any update on this?

@yyyogev
Contributor

yyyogev commented Jun 21, 2020

Not yet. We are working on it, but it's very hard to reproduce; we have had a machine running a few instances for more than 2 weeks now and it's fine.
We need more context. Did it happen to anyone without using the appender, e.g. when using the java-sender directly or from another source?
@pachecopaulo were you using it on a k8s pod as well?
What type of queue were you using: in-memory or on disk?

@pachecopaulo

pachecopaulo commented Jun 22, 2020

We're running on K8s as well, using the on-disk queue. We tried switching to in-memory, but we got the same issue.
There is an application in our cluster that uses Filebeat instead of the appender, and that one has been working nicely.

@pajaroblanco

@pachecopaulo did you use a sidecar container approach with filebeat in the same pod?

@pachecopaulo

@pajaroblanco I didn't. We just have another app running with Filebeat.

@pajaroblanco

For us, moving from the alpine-slim variant to the standard adoptopenjdk image fixed the issue.

Old: "adoptopenjdk/openjdk11-openj9:jdk-11.0.1.13-alpine-slim"
New: "adoptopenjdk/openjdk11"

After making that change logs have been shipping consistently from the logback appender.

@yyyogev
Contributor

yyyogev commented Nov 17, 2020

Can anyone else who had this problem try the above fix and let us know if it works?
@martijnblankestijn @pachecopaulo

@martijnblankestijn
Author

martijnblankestijn commented Nov 17, 2020 via email

@armaniehs

This one was painful, but I think I got to the bottom of it after 4 hours of debugging.

In my case it stopped because fileSystemFullPercentThreshold is pre-set to 98%.

My laptop has 35 GB of disk space left out of 2 TB, so the disk is more than 98% full, and that's why it stopped logging.

If you set it to -1, it will not check for disk space, which is OK for a laptop; however, I don't recommend doing that when running inside a container.

I hope it helps.
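For anyone who wants to tune this threshold rather than guess, it is exposed as an appender parameter. A minimal sketch in logback.groovy form; -1 is shown only because the comment above describes it, and the other values are placeholders:

import io.logz.logback.LogzioLogbackAppender

appender("LogzioLogbackAppender", LogzioLogbackAppender) {
    token = '<your-logzio-token>'
    logzioUrl = 'https://listener.logz.io:8071'
    fileSystemFullPercentThreshold = -1  // disable the free-disk-space check (default is 98; risky inside a container)
}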
