Support structured logging #341

pixxon · 2024-02-02T12:55:31Z

Is your feature request related to a problem? Please describe.
Currently every information is dumped as a single line of text, using fmt.Sprintf to embed information and variable. This is hard to parse by third party application, such as Grafana/Loki.

Describe the solution you'd like
Utilize a standard logfmt that is easy to read and parse. This is already supported by slog. Even better if it can be configured to use json in case somebody wants that. ( slog has that option too. )

Describe alternatives you've considered
slog seems to be a straight choice, it's already used and supports this feature. Only thing needed to be changed is the approach of the logs.

Additional context
Would be nice to provide some additional context if possible. Such as when using the foreground feature, it could add the id of the cron job to the context by default.

The text was updated successfully, but these errors were encountered:

m90 · 2024-02-02T15:22:48Z

This sounds a bit as if you are trying to recreate something that is already there. During each backup run, statistics are collected and stored in a Stats object:

docker-volume-backup/cmd/backup/stats.go

Lines 43 to 55 in 6029225

    
           // Stats global stats regarding script execution 
        
           type Stats struct { 
        
           	sync.Mutex 
        
           	StartTime  time.Time 
        
           	EndTime    time.Time 
        
           	TookTime   time.Duration 
        
           	LockedTime time.Duration 
        
           	LogOutput  *bytes.Buffer 
        
           	Containers ContainersStats 
        
           	Services   ServicesStats 
        
           	BackupFile BackupFileStats 
        
           	Storages   map[string]StorageStats 
        
           }

After being populated, this is currently only made available to notification templates: https://offen.github.io/docker-volume-backup/how-tos/set-up-notifications.html#notification-templates-reference

I'm not sure if it makes sense to start emitting metrics in a different manner now and also keep supporting both. I also have to admit that I like log lines that are written for humans mostly, e.g. as it helps with getting users up to speed when the log message tells you what's going wrong. If you're looking to get such data into Loki or other log aggregation systems, I would propose to add an option that will dump a serialized form (JSON or logfmt) of the Stats object to stdout after the script has run. This could then be ingested by everything that reads logs.

Also related is the idea of making this available for Prometheus: #64

pixxon · 2024-02-02T19:32:03Z

I think it's a bit easier if I detail the setup that I have running. For the ease of monitoring my home stack and especially finding some bugs, I have Grafana and Loki running.

In the stack there are 4 components:

The service being monitored. These are all running in docker containers, outputting their logs to the stdout. Which is then exposed from the docker daemon.
Promtail, which is a monitoring agent that will discover the containers and collect the logs from them. ( This is done via docker labels. )
Loki, who will collect all the logs coming from different services and expose them to Grafana for visualization.
Grafana, the dashboard that can be used to display this information. For the ease of use this can be done via labels. {container="grafana"} |= `` | logfmt | level = debug

The output of the above filter would be something like:
logger=ngalert.sender.router t=2024-02-02T19:13:51.691945136Z level=debug msg="Finish of admin configuration sync"

In this setup, it is a much easier to have the logs just being fed into Loki automatically without any need for configuration. I would still keep the logs in the same human readable format, just add some labels and annotations to it. ( The current logs are actually already in this logfmt format, but everything is inside the "msg" tag.

m90 · 2024-02-02T19:52:30Z

The thing I am worried about in this approach is: each time the logger is called, there will be a decision to be made which information to include in a structured way and which one not. Obviously, this is also something that happens when writing logs that are meant for humans, but it's a bit more lenient. When people start consuming logs like this, I would expect requests like "oh I would like to know about X at position Y, can you add it?" to surface, and that's not really sustainable. With having the Stats object, this is well defined and adding / changing properties has to be done only once instead of having to revisit every log call.

I have zero idea about how complicated it would be to make your setup pick up a single line instead all of them.

My question would therefore be: would it be possible / helpful for you to reuse the Stats object here and append its current state to every log call if requested (i.e. enabled using a setting)? Would that give you what you need?

m90 · 2024-02-02T20:15:04Z

Also, when you start emitting logs in a structured way, you start creating an interface for other consumers, which means this will need to stay stable. I currently do not consider log output a stable interface and actually do change it from time to time. Stats already has stability guarantees so that users can safely render their templates. This would also be a good reason to reuse what's already there.

pixxon · 2024-02-02T20:20:31Z

I don't think such verbose information is really needed for the logging. I don't want to build metrics out of it, just create potential alerts with some details.

Currently the only thing that I would like to be changed is to output the error in its own tag instead of being embedded inside the msg. Additionally could be useful to assign a unique identifier to individual backup runs, in debug mode provide filename/linenumber if it makes sense.

I personally think there could be a leeway in the logs interface, I have not seen it being very strict in other applications either. ( They server as a secondary line after the metrics themselves. )

m90 · 2024-02-03T07:47:58Z

Ok, I guess adding the error when logging at ERROR level makes sense. From what I remember this is only happening in a single place anyways which is here:

docker-volume-backup/cmd/backup/main.go

Lines 23 to 49 in 6029225

    
           defer func() { 
        
           	if pArg := recover(); pArg != nil { 
        
           		if err, ok := pArg.(error); ok { 
        
           			s.logger.Error( 
        
           				fmt.Sprintf("Executing the script encountered a panic: %v", err), 
        
           			) 
        
           			if hookErr := s.runHooks(err); hookErr != nil { 
        
           				s.logger.Error( 
        
           					fmt.Sprintf("An error occurred calling the registered hooks: %s", hookErr), 
        
           				) 
        
           			} 
        
           			os.Exit(1) 
        
           		} 
        
           		panic(pArg) 
        
           	} 
        
           	if err := s.runHooks(nil); err != nil { 
        
           		s.logger.Error( 
        
           			fmt.Sprintf( 
        
           				"Backup procedure ran successfully, but an error ocurred calling the registered hooks: %v", 
        
           				err, 
        
           			), 
        
           		) 
        
           		os.Exit(1) 
        
           	} 
        
           	s.logger.Info("Finished running backup tasks.") 
        
           }()

I'd just really like to avoid duplicating data that is already available in Stats in the logs.

pixxon · 2024-02-03T10:25:54Z

There are a few more instances wheree it applies, but I do not want to introduce statistics or any complex data in the logging.

Yesterday while running the test suite I also found out that some data appears in the logs already. The storage context is present in the storage label. So I would really just add these metadatas that can be detected automatically into the logs. What I think will be useful for the new scheduler is the id of the job that printed the given log.

m90 · 2024-02-16T19:32:12Z

Closing this as I also improved error wrapping and logging in #360

Every logger.Error call will now also emit the underlying error while presenting something a bit more digestable to the user.

pixxon mentioned this issue Feb 2, 2024

Move cron scheduling inside application #338

Merged

m90 added the enhancement New feature or request label Feb 2, 2024

m90 closed this as completed Feb 16, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Support structured logging #341

Support structured logging #341

pixxon commented Feb 2, 2024

m90 commented Feb 2, 2024

pixxon commented Feb 2, 2024 •

edited

Loading

m90 commented Feb 2, 2024 •

edited

Loading

m90 commented Feb 2, 2024 •

edited

Loading

pixxon commented Feb 2, 2024

m90 commented Feb 3, 2024

pixxon commented Feb 3, 2024

m90 commented Feb 16, 2024 •

edited

Loading

Support structured logging #341

Support structured logging #341

Comments

pixxon commented Feb 2, 2024

m90 commented Feb 2, 2024

pixxon commented Feb 2, 2024 • edited Loading

m90 commented Feb 2, 2024 • edited Loading

m90 commented Feb 2, 2024 • edited Loading

pixxon commented Feb 2, 2024

m90 commented Feb 3, 2024

pixxon commented Feb 3, 2024

m90 commented Feb 16, 2024 • edited Loading

pixxon commented Feb 2, 2024 •

edited

Loading

m90 commented Feb 2, 2024 •

edited

Loading

m90 commented Feb 2, 2024 •

edited

Loading

m90 commented Feb 16, 2024 •

edited

Loading