[torchao] Dashboard numbers are missing even when the workflow is succeeded #2415

Open
xuzhao9 opened this issue Aug 12, 2024 · 14 comments

@xuzhao9
Contributor

xuzhao9 commented Aug 12, 2024

A bug in the TorchAO workflow has been fixed (bf0e5a9), so it should now produce reasonable numbers. The workflow also succeeded: https://github.com/pytorch/benchmark/actions/workflows/torchao.yml

However, the numbers on the TorchAO dashboard have not been updated: https://hud.pytorch.org/benchmark/torchao

Only the huggingface numbers are available and only the autoquant numbers are correct:

[Image: screenshot of the TorchAO dashboard]
@kit1980
Member

kit1980 commented Aug 12, 2024

Is this because the PR to benchmark is not merged? #2394

@kit1980
Member

kit1980 commented Aug 13, 2024

Looks like something is wrong with ShipIt; trying to fix it.

@kit1980
Member

kit1980 commented Aug 13, 2024

Actually, the commit from #2394 is already in main, so the PR can probably just be closed. This should be unrelated to the actual issue.

@xuzhao9
Contributor Author

xuzhao9 commented Aug 13, 2024

I think we can close #2405, but this is an independent problem

@kit1980
Member

kit1980 commented Aug 16, 2024

I looked at ossci-metrics/torchbench-csv/torchao/ on AWS and it doesn't have any recent runs, only a couple of runs from Jun 7 and Jun 8.
Investigating.

@kit1980
Member

kit1980 commented Aug 16, 2024

If you look at https://github.com/pytorch/benchmark/actions/runs/10408276046/job/28825370645, WORKFLOW_RUN_ID and WORKFLOW_RUN_ATTEMPT are empty.
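
One way to catch this class of problem early is to fail the upload step when the variables are empty, rather than continuing silently. The following is only a minimal sketch, assuming the step reads WORKFLOW_RUN_ID and WORKFLOW_RUN_ATTEMPT from the environment; the helper name `require_env` is hypothetical and is not part of the benchmark repo.

```python
import os


def require_env(names, environ=None):
    """Return {name: value} for each name; raise if any is unset or empty.

    `environ` defaults to os.environ and can be replaced with a plain dict
    for testing. This helper is hypothetical, not existing workflow code.
    """
    environ = os.environ if environ is None else environ
    # Treat unset, empty, and whitespace-only values the same way.
    missing = [n for n in names if not environ.get(n, "").strip()]
    if missing:
        raise RuntimeError(
            "empty or unset environment variables: " + ", ".join(missing)
        )
    return {n: environ[n] for n in names}
```

Calling `require_env(["WORKFLOW_RUN_ID", "WORKFLOW_RUN_ATTEMPT"])` at the start of the upload script would turn an empty value into a visible job failure instead of a successful run with misplaced data.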

@xuzhao9
Contributor Author

xuzhao9 commented Aug 18, 2024

@kit1980 I am curious: where does the data reported by the dashboard (https://hud.pytorch.org/benchmark/torchao) come from? For example, I can see the latest commit hash is b523f9fe1 (2024/08/18).

@kit1980
Member

kit1980 commented Aug 19, 2024

I've realized the data is actually in S3, but because WORKFLOW_RUN_ID and WORKFLOW_RUN_ATTEMPT are not populated, the path to the data includes two literal / characters in a row, so I missed it.
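
To illustrate how empty variables end up as consecutive slashes, here is a small sketch. The key layout (`prefix/run_id/run_attempt/filename`), the function names, and the file name `results.csv` are all assumptions made for illustration; the real upload path used by the workflow may be laid out differently, so the exact number of slashes may differ too.

```python
def build_key(prefix, run_id, run_attempt, filename):
    """Join components naively, the way a simple f-string or shell
    interpolation would. Empty components leave consecutive slashes."""
    return f"{prefix}/{run_id}/{run_attempt}/{filename}"


def has_empty_component(key):
    """True if the key contains an empty path segment ('//')."""
    return "//" in key


# With both variables populated, the key looks as intended.
good = build_key("torchbench-csv/torchao", "10408276046", "1", "results.csv")

# With both variables empty, the data still uploads, but under a key
# whose run-specific segments are empty.
bad = build_key("torchbench-csv/torchao", "", "", "results.csv")

print(good)  # torchbench-csv/torchao/10408276046/1/results.csv
print(bad)   # torchbench-csv/torchao///results.csv
```

Because S3 treats `/` as an ordinary character in object keys, such an upload succeeds, and the objects are easy to overlook when browsing by prefix.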

@kit1980
Member

kit1980 commented Aug 21, 2024

#2426 fixed the WORKFLOW_RUN_ID problem.
Now https://hud.pytorch.org/benchmark/torchao shows the data.
The TorchBench part of the run is still in progress; we will need to check later whether the data is correct.

@msaroufim
Member

Yeah data doesn't look correct quite yet - cc @HDCharles

@kit1980
Member

kit1980 commented Aug 22, 2024

> Yeah data doesn't look correct quite yet

There was a conflict between my manual run and a nightly run.
It looked different at some point, with TorchBench, HF, and TIMM data populated.

Let's wait for the results of today's nightly run.

@kit1980
Member

kit1980 commented Aug 23, 2024

Hm, now we have TorchBench and TIMM, but no HF
[Image: screenshot of the TorchAO dashboard]
Investigating further.

@kit1980
Member

kit1980 commented Aug 23, 2024

OK, I debugged the reason for the missing HF data.

HF is the fastest part (7 hours), versus 17-20 hours for TorchBench and TIMM.
Because of that, the HF part computed its data for the older commit 9860194881, while TorchBench and TIMM used the newer commit 0ed30902e3.

So if you select 9860194881 on the right, it will show only HF.
Selecting 0ed30902e3 will show TorchBench and TIMM.
The dashboard is accurately reflecting the data, and this mismatch should become less frequent if we can make the benchmarks shorter.
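
The effect described above can be sketched by grouping results by commit and checking which suites are present for each one. This is only an illustration: the `(commit, suite)` record format and the function names are hypothetical and do not reflect how the dashboard actually queries its data. The commit hashes and suite names are the ones mentioned in this comment.

```python
from collections import defaultdict


def suites_by_commit(records):
    """Map each commit to the sorted list of suites that reported data for it."""
    grouped = defaultdict(set)
    for commit, suite in records:
        grouped[commit].add(suite)
    return {commit: sorted(suites) for commit, suites in grouped.items()}


def missing_suites(records, expected):
    """Map each commit to the expected suites it has no data for.

    Commits that have data for every expected suite are omitted.
    """
    expected = set(expected)
    result = {}
    for commit, suites in suites_by_commit(records).items():
        missing = expected - set(suites)
        if missing:
            result[commit] = sorted(missing)
    return result


# Hypothetical records mirroring the situation in this comment.
records = [
    ("9860194881", "HF"),
    ("0ed30902e3", "TorchBench"),
    ("0ed30902e3", "TIMM"),
]

print(missing_suites(records, ["HF", "TorchBench", "TIMM"]))
# {'9860194881': ['TIMM', 'TorchBench'], '0ed30902e3': ['HF']}
```

A check like this would make it clear that no single commit has all three suites, which is why no single selection on the dashboard shows complete data.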

Status: In Progress

3 participants