[torchao] Dashboard numbers are missing even when the workflow is succeeded #2415

xuzhao9 · 2024-08-12T20:25:59Z

The TorchAO workflow now includes a bug fix (bf0e5a9) and should produce reasonable numbers. The workflow also succeeded: https://github.com/pytorch/benchmark/actions/workflows/torchao.yml

However, the numbers are not being updated on the TorchAO dashboard: https://hud.pytorch.org/benchmark/torchao

Only the Hugging Face numbers are available, and only the autoquant numbers are correct.

kit1980 · 2024-08-12T23:46:13Z

Is this because the PR to benchmark is not merged? #2394

kit1980 · 2024-08-13T00:04:54Z

Looks like something is wrong with ShipIt; trying to fix it.

kit1980 · 2024-08-13T00:33:30Z

Actually, the commit from #2394 is already in main, so the PR can probably just be closed. This should be unrelated to the actual issue.

xuzhao9 · 2024-08-13T15:21:58Z

I think we can close #2405, but this is an independent problem

kit1980 · 2024-08-16T23:23:58Z

I looked at ossci-metrics/torchbench-csv/torchao/ on AWS and it doesn't have any recent runs, only a couple of runs from Jun 7 and Jun 8.
Investigating.

kit1980 · 2024-08-16T23:49:13Z

If you look at https://github.com/pytorch/benchmark/actions/runs/10408276046/job/28825370645, WORKFLOW_RUN_ID and WORKFLOW_RUN_ATTEMPT are empty.

xuzhao9 · 2024-08-18T22:56:55Z

@kit1980 I am curious where the data reported by the dashboard comes from: https://hud.pytorch.org/benchmark/torchao For example, I can see the latest commit hash is b523f9fe1 (2024/08/18).

kit1980 · 2024-08-19T23:27:44Z

I've realized the data is actually in S3, but because WORKFLOW_RUN_ID and WORKFLOW_RUN_ATTEMPT are not populated, the path to the data includes two literal / characters, so I missed it.
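For illustration, here is a minimal Python sketch of how empty variables can produce such a path. The key layout (`<base>/<run_id>/<attempt>`) and the function names are assumptions made for this example, not the actual upload code in pytorch/benchmark.

```python
import os


def build_prefix(base, env=None):
    """Join base with the run id and attempt (hypothetical layout)."""
    env = os.environ if env is None else env
    # Unset variables fall back to empty strings, which is the failure mode here.
    run_id = env.get("WORKFLOW_RUN_ID", "")
    attempt = env.get("WORKFLOW_RUN_ATTEMPT", "")
    return f"{base}/{run_id}/{attempt}"


def build_prefix_strict(base, env=None):
    """Same layout, but refuse to build a prefix from empty components."""
    env = os.environ if env is None else env
    for name in ("WORKFLOW_RUN_ID", "WORKFLOW_RUN_ATTEMPT"):
        if not env.get(name):
            raise ValueError(f"{name} is empty or unset")
    return build_prefix(base, env)


if __name__ == "__main__":
    # With both variables missing, the prefix ends in two literal slashes.
    print(build_prefix("torchbench-csv/torchao", env={}))
```

A strict variant like `build_prefix_strict` would have made the workflow fail loudly instead of uploading to an unexpected location.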

kit1980 · 2024-08-21T17:38:31Z

#2426 fixed the WORKFLOW_RUN_ID problem.
Now https://hud.pytorch.org/benchmark/torchao shows the data.
The torchbench part of the run is still in progress; we will need to check later whether the data is correct.

msaroufim · 2024-08-22T05:55:15Z

Yeah data doesn't look correct quite yet - cc @HDCharles

kit1980 · 2024-08-22T05:58:46Z

> Yeah data doesn't look correct quite yet

There was a conflict between my manual run and a nightly run.
It looked different at some point, with TorchBench, HF, and TIMM data populated.

Let's wait for the results of today's nightly run.

kit1980 · 2024-08-23T17:13:55Z

Hm, now we have TorchBench and TIMM, but no HF

Investigating further.

kit1980 · 2024-08-23T23:15:16Z

OK, I debugged the reason for missing HF.

HF is the fastest part (7 hours), vs. 17-20 hours for TorchBench and TIMM.
Because of that, the HF part computed the data for the older commit 9860194881, while TorchBench and TIMM used the newer commit 0ed30902e3.

So if you select 9860194881 on the right, it will show only HF.
Selecting 0ed30902e3 will show TorchBench and TIMM.
This accurately reflects the data, and it should happen less often if we can make the benchmarks shorter.
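To make the effect concrete, here is a small Python sketch that groups results by commit and shows which suites each commit has. The record format and function names are made up for illustration; only the two commit hashes and the suite names come from this thread.

```python
from collections import defaultdict


def suites_by_commit(records):
    """Map each commit hash to the set of suites that reported data for it."""
    grouped = defaultdict(set)
    for commit, suite in records:
        grouped[commit].add(suite)
    return dict(grouped)


def commits_with_all_suites(records, expected):
    """Return the commits for which every expected suite has data."""
    grouped = suites_by_commit(records)
    return [c for c, suites in grouped.items() if set(expected) <= suites]


if __name__ == "__main__":
    # Hypothetical (commit, suite) records mirroring the situation described above.
    records = [
        ("9860194881", "HF"),
        ("0ed30902e3", "TorchBench"),
        ("0ed30902e3", "TIMM"),
    ]
    print(suites_by_commit(records))
    print(commits_with_all_suites(records, ["HF", "TorchBench", "TIMM"]))
```

With these records, no commit has all three suites, which matches what the dashboard shows when a single commit is selected.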

kit1980 · 2024-08-23T23:18:01Z

Direct link to an older date that shows the data for all three:

https://hud.pytorch.org/benchmark/torchao?dashboard=torchao&startTime=Fri%2C%2016%20Aug%202024%2023%3A16%3A34%20GMT&stopTime=Fri%2C%2023%20Aug%202024%2023%3A16%3A34%20GMT&granularity=hour&suite=torchbench&mode=inference&dtype=bfloat16&deviceName=cuda%20(a100)&lBranch=main&lCommit=227d4bfece6bb00f0365642ad1d7afd509079ba4&rBranch=main&rCommit=b523f9f9e15b6fb80d10f585d9cf45e0c5e4d10e

xuzhao9 assigned kit1980 Aug 12, 2024

huydhn added this to PyTorch OSS Dev Infra Oct 1, 2024

ZainRizvi moved this to In Progress in PyTorch OSS Dev Infra Oct 1, 2024
