
Torch timer calibration fails with large min_duration values. #526

Closed
EchoStone1101 opened this issue Dec 1, 2024 · 1 comment

@EchoStone1101
Setting a large min_duration value (say, 100us) together with log_torch=True triggers a "Torch timer calibration failed" error:

Python 3.10.12 (main, Nov 20 2023, 15:14:05) [GCC 11.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from viztracer import VizTracer
>>> t = VizTracer(min_duration=100, log_torch=True)
STAGE:2024-12-01 10:41:10 307114:307114 ActivityProfilerController.cpp:314] Completed Stage: Warm Up
STAGE:2024-12-01 10:41:10 307114:307114 ActivityProfilerController.cpp:320] Completed Stage: Collection
STAGE:2024-12-01 10:41:10 307114:307114 ActivityProfilerController.cpp:324] Completed Stage: Post Processing
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/python_venv/vllm/lib/python3.10/site-packages/viztracer/viztracer.py", line 96, in __init__
    self.calibrate_torch_timer()
  File "/home/python_venv/vllm/lib/python3.10/site-packages/viztracer/viztracer.py", line 227, in calibrate_torch_timer
    raise RuntimeError("Torch timer calibration failed")  # pragma: no cover
RuntimeError: Torch timer calibration failed

I'm quite sure this happens because the calibration mechanism uses lightweight torch operations, which get filtered out when a non-default min_duration is set:

def calibrate_torch_timer(self, force=False):
    if self.enable:
        raise RuntimeError("You can't calibrate torch timer while tracer is running")
    if self.torch_offset and not force:
        return
    import json
    import tempfile
    import torch  # type: ignore
    from torch.profiler import profile, supported_activities  # type: ignore
    verbose = self.verbose
    # Silent the tracer during calibration
    self.verbose = 0
    with profile(activities=supported_activities()) as prof:
        _VizTracer.start(self)
        for _ in range(20):
            torch.empty(100) # <================ This is filtered
        _VizTracer.stop(self)

A simple fix would be to temporarily set min_duration to zero during calibration.
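To illustrate the suggested fix, here is a minimal sketch of how min_duration could be temporarily zeroed during calibration and restored afterwards. This is not the actual viztracer implementation; FakeTracer and zero_min_duration are hypothetical stand-ins for the VizTracer instance and the proposed workaround:

```python
from contextlib import contextmanager


@contextmanager
def zero_min_duration(tracer):
    """Temporarily set min_duration to 0 so the lightweight calibration
    ops are not filtered out, then restore the user's original value."""
    saved = tracer.min_duration
    tracer.min_duration = 0
    try:
        yield tracer
    finally:
        tracer.min_duration = saved


class FakeTracer:
    """Hypothetical stand-in for a VizTracer instance."""
    def __init__(self, min_duration):
        self.min_duration = min_duration


tracer = FakeTracer(min_duration=100)  # 100us, large enough to filter calibration ops
with zero_min_duration(tracer):
    # calibrate_torch_timer() would run here with filtering disabled
    assert tracer.min_duration == 0
assert tracer.min_duration == 100  # original setting restored even on error
```

The try/finally ensures the user's min_duration survives even if calibration raises.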

@gaogaotiantian
Owner

Upgrade to v1.0.0 :)

@gaogaotiantian closed this as not planned on Dec 19, 2024