text diffs on huge files are slow #12406

Open
d-b-w opened this issue Jun 2, 2024 · 2 comments
Labels
topic: rewrite (related to the assertion rewrite mechanism)
type: performance (performance or memory problem/improvement)

Comments

@d-b-w

d-b-w commented Jun 2, 2024

Ok, of course they are... But usually, when a dev commits a test, the test is passing, so the dev may not notice how preposterous a diff they are inadvertently asking for.

In my case, a 5-year-old test happened to be comparing text files about 2 million lines long as strings. Functionally:

    assert fh1.read() == fh2.read()

This was fine, until the order of some fields changed and the test started hanging in CI for hours. The right thing to do is to fix this annoying test, but I thought that it might also make sense to push a fix up to pytest.
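One way the test itself could be fixed, as a rough sketch: stream both files and stop at the first mismatching line, so a failure never asks for a diff of two multi-million-line strings. The helper name `first_difference` and its return shape are made up for illustration; nothing here comes from pytest.

```python
from itertools import zip_longest


def first_difference(path1, path2):
    """Hypothetical helper: locate the first line where two text files differ.

    Returns (line_number, left_line, right_line), or None if the files match.
    A missing line (one file is shorter) is reported as None on that side.
    """
    with open(path1) as fh1, open(path2) as fh2:
        # zip_longest pads the shorter file with None, so a length
        # mismatch is also reported as a difference.
        pairs = zip_longest(fh1, fh2)
        for lineno, (left, right) in enumerate(pairs, start=1):
            if left != right:
                return lineno, left, right
    return None
```

In a test this would be used as `assert first_difference(path1, path2) is None`, so a failure only has to render one small tuple instead of two huge strings.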

tl;dr: _diff_text() already knows the verbosity level. Would it make sense to truncate the diff it computes in "non verbose" mode? By default, the displayed diff is truncated to the first 10 lines, so _diff_text() is doing extra computation whose result the caller will never see or use.
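To make the idea concrete, here is a minimal sketch of bounding the diff work. It is not pytest's implementation: the function `bounded_text_diff` and the `max_lines` parameter are hypothetical, and it assumes the standard library's `difflib.ndiff` as the diff engine. The point is that the expensive matching step only ever sees a small window of lines starting at the first mismatch.

```python
import difflib
from itertools import islice


def bounded_text_diff(left: str, right: str, max_lines: int = 10) -> list[str]:
    """Hypothetical helper: diff only a small window after the first mismatch."""
    left_lines = left.splitlines()
    right_lines = right.splitlines()

    # Skip the common prefix with a cheap linear scan.
    start = 0
    limit = min(len(left_lines), len(right_lines))
    while start < limit and left_lines[start] == right_lines[start]:
        start += 1

    # Hand difflib at most max_lines lines from each side, so the
    # cost of the diff no longer depends on the total file size.
    window = slice(start, start + max_lines)
    diff = difflib.ndiff(left_lines[window], right_lines[window])

    # The caller only displays max_lines lines, so stop there.
    return list(islice(diff, max_lines))
```

The trade-off is that the output is only a diff of the window, not a prefix of the full diff, which seems acceptable when the output is going to be truncated anyway.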

@nicoddemus
Member

@d-b-w thanks for the report!

I believe so, yes. A PR in that direction would be appreciated! 👍

@d-b-w
Author

d-b-w commented Jun 3, 2024

OK!

@Zac-HD added the "type: performance" and "topic: rewrite" labels on Jun 24, 2024