[Bug]: Skipped tests are reported in CSV export, and are marked as ok: true in JSON report #34174

Closed
whimboo opened this issue Dec 30, 2024 · 4 comments · Fixed by #34176
whimboo (Contributor) commented Dec 30, 2024

Version

1.50.0-next

Steps to reproduce

When running the WebDriver BiDi tests for Firefox, we currently have a good number of tests skipped because they time out in CI. We actually want to know about those failing (skipped) tests.

However, it's not clear to me how we could differentiate between tests that are skipped due to a timeout and tests that are skipped because they aren't needed.

@yury-s do you have an idea how we could accomplish that?

Expected behavior

Skipped tests should be listed in the CSV export and should not be marked as ok (?) in the JSON report.

Actual behavior

The CSV export doesn't contain those skipped tests. Additionally, the JSON report lists the test status as ok: true, which is probably not correct and should be false instead?
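For illustration, the relevant part of a spec entry in the JSON report might look roughly like this (a trimmed, hypothetical fragment based on the description above, not output from an actual run):

```json
{
  "title": "some skipped BiDi test",
  "ok": true,
  "tests": [
    { "results": [{ "status": "skipped" }] }
  ]
}
```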

Additional context

No response

Environment

System:
    OS: macOS 14.7.1
    CPU: (10) arm64 Apple M1 Pro
    Memory: 3.49 GB / 32.00 GB
  Binaries:
    Node: 22.8.0 - /opt/homebrew/bin/node
    Yarn: 1.22.22 - /opt/homebrew/bin/yarn
    npm: 10.8.2 - /opt/homebrew/bin/npm
  IDEs:
    VSCode: 1.96.0 - /opt/homebrew/bin/code
  Languages:
    Bash: 3.2.57 - /bin/bash
yury-s (Member) commented Dec 30, 2024

We should use test.fixme() to mark tests that are fixable but currently failing, and test.skip() for those which are never expected to pass in BiDi mode. We can then look at the annotations: if they contain fixme, the test didn't run because it is known to time out.
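As a minimal sketch of that convention (the test titles and reasons below are made up for illustration):

```js
const { test } = require('@playwright/test');

// Fixable, but currently times out in BiDi mode: mark it as fixme so the
// annotation records why the test didn't run.
test('emits navigation events over BiDi', async ({ page }) => {
  test.fixme(true, 'times out in BiDi mode');
  await page.goto('https://example.com');
});

// Never expected to pass in BiDi mode: skip it outright.
test('relies on a CDP-only feature', async ({ page }) => {
  test.skip(true, 'not applicable to BiDi');
});
```

Downstream, e.g. in a reporter, the two cases can then be told apart by checking whether test.annotations contains an entry whose type is 'fixme'.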

whimboo (Contributor, Author) commented Jan 2, 2025

This is great. Thanks for getting the CSV reporter updated that quickly!

Now we have closer numbers for the list of failures, but there is still a discrepancy of 73 between the results as reported in CI, which are 3676 (all) - 2718 (passing) = 958, and the 885 reported entries in the CSV. Do you have an idea what these could be?

yury-s (Member) commented Jan 2, 2025

As far as I can see, the stats reported in the terminal for this run are:

3676 tests using 2 workers

731 failed
227 skipped
2718 passed

These add up just fine: 731 + 227 + 2718 = 3676.

Then in the .csv report there are 907 lines with failure results, which include both the tests that were actually executed and failed (731) and the tests that we marked as fixme based on the expectations file. Maybe skipping tests based on the expectations files does more of a disservice by confusing people than it saves resources; we can remove them altogether if you like. Alternatively, we can export all test results to report.csv rather than just failures and fixme (here is the PR: #34199). That way the totals in the terminal and in the .csv report will always match. WDYT?
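For reference, a custom reporter along these lines could record one line per test result; this is only a sketch of the approach, not the actual implementation from #34199, and the file name and columns are assumptions:

```js
const fs = require('fs');

// Minimal CSV reporter sketch: writes every test result, so the totals in
// the terminal and in the CSV always match. The quoting is deliberately naive.
class CsvReporter {
  constructor() {
    this.lines = ['title,status,fixme'];
  }

  onTestEnd(test, result) {
    const fixme = test.annotations.some((a) => a.type === 'fixme');
    this.lines.push(`"${test.titlePath().join(' > ')}",${result.status},${fixme}`);
  }

  onEnd() {
    fs.writeFileSync('report.csv', this.lines.join('\n') + '\n');
  }
}

module.exports = CsvReporter;
```

It would be wired up through the reporter option in the Playwright config, e.g. reporter: [['./csv-reporter.js']].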

whimboo (Contributor, Author) commented Jan 3, 2025

> Then in the .csv report there are 907 lines with failure results, which include both the tests that were actually executed and failed (731) and the tests that we marked as fixme based on the expectations file. Maybe skipping tests based on the expectations files does more of a disservice by confusing people than it saves resources; we can remove them altogether if you like.

Ah, so that's where the gap is coming from! For now I'm fine with this, given that we understand where it comes from. I'm not sure what you mean by removing them altogether, though. Are you saying those tests should still be run even when they time out and make the run take longer? I would still like to see all the tests running; maybe we should first try to understand why tests are timing out and fix the ones that are easy to fix (like the page.url() issue that you fixed). Once we have the easy ones fixed, we could enable all tests?

> Alternatively, we can export all test results to report.csv rather than just failures and fixme (here is the PR: #34199). That way the totals in the terminal and in the .csv report will always match. WDYT?

The problem here is that we would need special filtering in the Google spreadsheet so that we ignore or immediately remove all the passing tests when importing the data. I assume that we would have to write an Apps Script to do that instead of just using the =IMPORTDATA function. I won't have the time for that right now.
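For completeness, a hypothetical Apps Script sketch of that filtering (the report URL and the status column name are assumptions):

```js
function importFailures() {
  // Fetch the exported CSV and parse it into rows.
  const csv = UrlFetchApp.fetch('https://example.com/report.csv').getContentText();
  const rows = Utilities.parseCsv(csv);
  const header = rows[0];
  const statusCol = header.indexOf('status'); // assumed column name

  // Keep the header plus every non-passing row.
  const kept = [header].concat(
    rows.slice(1).filter((row) => row[statusCol] !== 'passed')
  );

  // Overwrite the active sheet with the filtered rows.
  const sheet = SpreadsheetApp.getActiveSpreadsheet().getActiveSheet();
  sheet.clearContents();
  sheet.getRange(1, 1, kept.length, header.length).setValues(kept);
}
```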
