I'm trying to read files from a GCS bucket in gzip format #867

Open
JWThorne opened this issue May 21, 2024 · 1 comment
Labels
more-info-needed Waiting for an answer from the issue reporter

Comments

@JWThorne

This may be an unsupported way to use mtail, but here goes.

I have a GCS bucket mounted in a container as a file system. Files from an external system show up there as gzipped JSON files.

Filenames are like logs_20240521_20240521T003517Z_20240521T003619Z_d0afe812.json.gz

I'm looking for a way to get mtail to read each file in its entirety as it shows up.

The problems are:
1> mtail doesn't read gzip-compressed files directly.
2> When a new file shows up, mtail seeks to the end of it instead of reading it in its entirety.
It ALWAYS seeks to the end instead of reading new files from the beginning.
This behaviour is counterintuitive to me, since several log rotation systems use daily or hourly filenames.

I have a perfectly working mtail program, and the statistics work fine. I just need to figure out what to do about the compression and about reading files from the beginning.

The options I see are either
gzcat filename | /proc/pidof mtail/fd/1, with a shell script running in a loop to monitor the filesystem, or
gzcat filename > constant_file_that_mtail_actually_monitors

Perhaps with inotify to find new files.
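
The second option could be sketched roughly as below. This is only an illustration under some assumptions: it polls the directory instead of using inotify (I'm not sure inotify events fire on a FUSE-mounted bucket), it appends to the constant file rather than overwriting it, and the names `append_new_archives` and `watch` are made up for this example.

```python
import gzip
import shutil
import time
from pathlib import Path


def append_new_archives(src_dir, target, seen):
    """Decompress every unseen *.json.gz in src_dir onto the end of target.

    Returns the names of the archives processed in this pass.
    """
    processed = []
    # Sorted order keeps the timestamped filenames in chronological order.
    for path in sorted(Path(src_dir).glob("*.json.gz")):
        if path.name in seen:
            continue
        # Append mode, so the target looks like an ordinary growing log.
        with gzip.open(path, "rb") as src, open(target, "ab") as dst:
            shutil.copyfileobj(src, dst)
        seen.add(path.name)
        processed.append(path.name)
    return processed


def watch(src_dir, target, interval=10.0):
    """Poll src_dir forever, feeding new archives into target."""
    seen = set()  # in-memory only: a restart would re-append old archives
    while True:
        append_new_archives(src_dir, target, seen)
        time.sleep(interval)
```

mtail would then be pointed at the single target file only. The sketch assumes each archive is complete by the time it becomes visible in the mount; a half-written archive would raise an error partway through and leave partial output in the target.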

@jaqx0r
Contributor

jaqx0r commented May 22, 2024

Yep, mtail doesn't support reading any sort of compression. The assumption there is that compression happens after log rotation.

mtail also goes to the end of the file because of log rotation: when it starts, it assumes it is reading an append-only log. It will read from the start of the file when it detects that a log rotation has occurred, e.g. by a rename or a truncation.
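
As a rough illustration of that behaviour (a generic sketch only, not mtail's actual implementation, and `TailState` is a hypothetical name), a tailer can start at the end of the file and fall back to offset 0 when the inode changes or the file shrinks:

```python
import os


class TailState:
    """Tracks the read position in one log file (illustrative helper)."""

    def __init__(self, path):
        self.path = path
        st = os.stat(path)
        self.inode = st.st_ino
        # Start at end of file: assume an append-only log already in progress.
        self.offset = st.st_size

    def read_new(self):
        """Return bytes added since the last call, restarting after rotation."""
        st = os.stat(self.path)
        if st.st_ino != self.inode or st.st_size < self.offset:
            # New inode (rename and recreate) or a smaller file (truncation):
            # treat it as a rotation and read the file from the beginning.
            self.inode = st.st_ino
            self.offset = 0
        with open(self.path, "rb") as f:
            f.seek(self.offset)
            data = f.read()
            self.offset = f.tell()
        return data
```

Under this model a brand-new file that has never been rotated is treated like an existing log, which is why its current contents are skipped.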

It looks like you want mtail to read logs not directly from the source but after an archival stage: these logs are already compressed, timestamped, and uploaded to the GCS bucket, right?

@jaqx0r jaqx0r added the more-info-needed Waiting for an answer from the issue reporter label May 26, 2024