mpv fixed bug does not get closed #11958

Closed
kasper93 opened this issue May 15, 2024 · 8 comments

@kasper93
Contributor

kasper93 commented May 15, 2024

Hi,

Initially I thought it was due to excessive timeouts, but those have been fixed now. Some of the testcases are stuck; all I see is a pending status and a progression task that starts but never ends.

oss-fuzz-linux-zone8-host-scn6-11: Progression task started.

Sure enough, after searching similar issues, I found #11490, which was related to disk space issues on the runners. And this time it is my fault, because we were leaking files in /tmp... oops, sorry, I thought it would be one per process, not that much data. It has now been fixed and rewritten to use memfd_create: mpv-player/mpv@6ede789
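
(For context, a minimal sketch of the memfd_create() approach; the actual change is the commit linked above, and the helper name `open_scratch_fd` and the fd name string are made up for this example.)

```c
/* Sketch only: back temporary data with an anonymous in-memory fd instead
 * of a real file under /tmp, so nothing can leak onto the runner's disk. */
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

static int open_scratch_fd(const void *data, size_t size)
{
    // memfd_create() returns an fd backed by anonymous memory; it vanishes
    // when the fd is closed or the process exits, leaving no file behind.
    int fd = memfd_create("mpv-fuzz-scratch", MFD_CLOEXEC);
    if (fd < 0)
        return -1;
    if (write(fd, data, size) != (ssize_t)size) {
        close(fd);
        return -1;
    }
    lseek(fd, 0, SEEK_SET); // rewind so the consumer reads from the start
    return fd;
}
```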

I'm creating this issue because there is not much visibility into the runners. Currently I don't see many of the fuzz binaries running, stats and logs are missing, and the coverage build is failing. So I presume /tmp is persistent and that is what is failing?

Could you take a look and see if a rebuild of the runners is needed, similar to #11490?

EDIT:

One more general question: what are the limits on concurrent jobs? The FAQ says

> Fuzzing machines only have a single core and fuzz targets should not use more than 2.5GB of RAM.

Say we have N fuzzing targets multiplied by sanitizers and fuzzing engines: is each target allowed its own fuzz runner, or are they queued, and what is the limit?

EDIT2: I think I found the root cause: #11965 (I will close this issue if it helps after the merge)

EDIT3: Nothing has changed; there is still no progression.

EDIT4: Example of completely stuck testcase https://oss-fuzz.com/testcase-detail/4875501058457600

Thanks,
Kacper

oliverchang pushed a commit that referenced this issue May 17, 2024
Should fix arbitrary DNS resolutions.

I think this is the root cause of #11958, so let's fix it, although I'm
only guessing. Everything is stuck, even a sanitizer that cannot trigger
DNS doesn't run, so there might be more to it.

It wasn't clear that this error causes so much trouble. There is
https://oss-fuzz.com/testcase-detail/6494370936193024, but the crash
statistics say
```
Time to crash: 5916.00s
Total count: 1 
```
but if we dig into the statistics table on the actual testcase-detail
page, I can see a lot of crashes, which makes sense of course.

What is a little bit puzzling is that in the one log that is there, I can
see it got all the way to
```
INFO: fuzzed for 5916 seconds, wrapping up soon
```
and apparently reported the error after doing the whole 6000 seconds.
There is no detail and no more logs saved. My current understanding is
that we got stuck in this case.

Signed-off-by: Kacper Michajłow <[email protected]>
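
(Illustration only, not the change from this commit: one generic way a harness can avoid triggering real DNS lookups is to reject inputs that look like network URLs before handing them to the library. The helper name and prefix list below are assumptions made up for this sketch.)

```c
/* Hypothetical guard, not the actual fix: skip inputs that would make the
 * library resolve a hostname during fuzzing. */
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static int looks_like_network_url(const uint8_t *data, size_t size)
{
    static const char *const prefixes[] = {"http://", "https://", "rtsp://", "ftp://"};
    for (size_t i = 0; i < sizeof(prefixes) / sizeof(prefixes[0]); i++) {
        size_t len = strlen(prefixes[i]);
        if (size >= len && memcmp(data, prefixes[i], len) == 0)
            return 1;
    }
    return 0;
}

int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size)
{
    if (looks_like_network_url(data, size))
        return 0; // don't let the input reach anything that does DNS
    /* ... pass the input to the real target here ... */
    return 0;
}
```
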
@kasper93
Contributor Author

kasper93 commented May 21, 2024

Sorry to bother you again. Is there anything I can do to help resolve this situation? Currently there seem to be no jobs running at all. So far the only clue I have is that the disk quota is exceeded and this somehow makes the runners stuck. Is the /tmp storage persistent? In libFuzzer fork mode (which seems to be used) we would indeed have leaked some files there previously, but I have no way to validate that this is the problem. I don't think the fuzzers themselves are big enough to cause it.

Everything is working fine locally and with the CIFuzz workflow; only ClusterFuzz (oss-fuzz) seems to be completely stuck.
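
(A small diagnostic sketch one could drop into a harness to confirm or rule out the disk-quota theory; `log_tmp_free_space` is a made-up helper, not something mpv ships.)

```c
/* Hypothetical diagnostic: report how much space is left on /tmp from
 * inside the harness, to check whether the runner's disk is filling up. */
#include <stdio.h>
#include <sys/statvfs.h>

static void log_tmp_free_space(void)
{
    struct statvfs vfs;
    if (statvfs("/tmp", &vfs) == 0) {
        unsigned long long free_bytes =
            (unsigned long long)vfs.f_bavail * vfs.f_frsize;
        fprintf(stderr, "/tmp free space: %llu MiB\n", free_bytes >> 20);
    }
}
```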

@oliverchang
Collaborator

Sorry for the delay. It doesn't appear to be a disk space issue, and I'm not sure why they're stuck. I'll kick off a restart of all the machines to see if that resolves it.

@kasper93
Contributor Author

kasper93 commented May 24, 2024

Thank you. Unfortunately nothing moved. On the fuzzer statistics page I get `Got error with status: 404`, and on the testcase(s) I see `[2024-05-24 13:08:05 UTC] oss-fuzz-linux-zone8-host-lt79-0: Progression task started.` and a Pending status.

In fairness, it never fully worked. Since the initial integration we got some crash reports, and some of them were detected as fixed. So far so good, but we never got a corpus saved, and the coverage build has been failing from the beginning with

> Step #5: Failed to unpack the corpus for fuzzer_load_config_file. This usually means that corpus backup for a particular fuzz target does not exist. If a fuzz target was added in the last 24 hours, please wait one more day. Otherwise, something is wrong with the fuzz target or the infrastructure, and corpus pruning task does not finish successfully.

I thought it needed time to stabilize, but now it doesn't give any sign of life: no logs, no reports.

I've tested the full infra/helper.py pipeline locally and I can generate a coverage report without issue, so the build and the fuzzers seem to be OK. I'd appreciate any help on this matter. I had plans to improve things and add an initial corpus, but first we need to stabilize things. There is no rush, but if you need anything changed/updated on my side, let me know.

kasper93 changed the title from "mpv: Runners seems to be stuck" to "mpv fixed bug does not get closed" on Jun 5, 2024
@kasper93
Contributor Author

kasper93 commented Jun 7, 2024

Friendly ping. Any pointers on how we can resolve this? It works on CIFuzz and locally. Thanks!

@kasper93
Contributor Author

kasper93 commented Jun 17, 2024

> Sorry for the delay. It doesn't appear to be a disk space issue, and I'm not sure why they're stuck. I'll kick off a restart of all the machines to see if that resolves it.

@oliverchang: Sorry for the direct ping. Are you sure about that? I disabled half of our fuzzing targets and things seem to be unblocking. I get logs and the corpus saved now.

I've based my assumptions on the documentation:

> Our builders have a disk size of 250GB.
>
> In addition, please keep the size of the build (everything copied to $OUT) small (<10GB uncompressed).

which should fit our case. Our statically linked binaries are not that small (~200 MB each), but that still leaves room for 50 of them in $OUT, which is well above what we have, and yet we hit the limits, recently also during the build; that's why I disabled some targets.

I still see some stubborn cases not closing; I will monitor, but things seem to be rolling now, at least I see the logs from the fuzzers being saved.

Keeping it open, because I would like to understand what the limit is and whether we can enable more targets. There are a few protocols and demuxers, and it is better to test them separately.

@maflcko
Contributor

maflcko commented Jun 17, 2024

cross-ref to #11993 (comment)

@kasper93
Contributor Author

kasper93 commented Jun 25, 2024

> I still see some stubborn cases not closing; I will monitor, but things seem to be rolling now, at least I see the logs from the fuzzers being saved.

It has been over a week and things seem to work for new issues, now that the build itself is smaller. Old ones are still stuck, though.

Specifically these:
https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=68817
https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=68832
https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=68837
https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=68843
https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=68844
https://oss-fuzz.com/testcase-detail/6265069141819392
https://oss-fuzz.com/testcase-detail/5128934898335744
https://oss-fuzz.com/testcase-detail/6637317872222208

I suspect it tries to use an old build that somehow exceeds the disk quota, and things are still stuck there.

EDIT:

Another one:

> OSError: [Errno 28] No space left on device

https://oss-fuzz-build-logs.storage.googleapis.com/log-cf36ebe3-1630-48d5-83b7-147c765a50cd.txt

@kasper93
Contributor Author

kasper93 commented Jul 16, 2024

selective_unpack (#12212) worked its magic and unblocked the old reports. Things are getting closed today. Some are still pending, but the future looks bright.

Closing, from my point of view the issue is resolved.

P.S. It would still be nice to have some feedback for failures like this, but now I know it is a disk space issue most of the time.
