
Skip unsupported regression tests #615

Merged
merged 42 commits into dev from wip/dk/skip-unsupported2 on Sep 9, 2024
Conversation

@schuessf (Contributor) commented Feb 28, 2023

Currently many regression tests still fail (see #611). Therefore the Jenkins status ("unstable") is not a good indicator. However, some of the tests are expected to fail (and always have), because they only pass for some settings/toolchains (e.g. due to overapproximation).

This PR adds the possibility to mark those tests as skipped after running them. To do so, we store all such tests (consisting of file, settings, toolchain) along with a verdict in a separate file. If a test fails with exactly this verdict, it is marked as skipped; otherwise it is marked as failed.

There are still some open points to discuss:

  • Currently the files containing the verdicts need to be in the same folder as the settings or toolchains (and not the input file itself!), so that the skipped-file can be parsed once beforehand.
  • In the current version the verdicts are the results as strings (e.g. TIMEOUT, UNKNOWN, EXCEPTION_OR_ERROR, ...). This has two possible issues: First, we need to convert these strings to enums, which might fail. Second, these categories might not be as precise as wanted. Therefore we could instead use the description (e.g. Unable to prove ... Reason: overapproximation of ...) as the verdict in this skipped-file.
  • And of course, whether it is reasonable to mark the given tests as skipped at all.
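The space-separated format described above (one entry per line: input file, settings file, toolchain file, verdict, with `//` comments) could be parsed roughly as sketched below. The class and method names here are illustrative only, not part of this PR:

```java
import java.util.Optional;

// Hypothetical sketch (names are illustrative, not part of this PR) of parsing
// one line of a .skip file. Format assumed from the examples in this thread:
//   <input file> <settings file> <toolchain file> <verdict>
// where lines starting with "//" are comments and blank lines are ignored.
final class SkipEntry {
    final String inputFile;
    final String settingsFile;
    final String toolchainFile;
    final String verdict;

    private SkipEntry(final String inputFile, final String settingsFile,
            final String toolchainFile, final String verdict) {
        this.inputFile = inputFile;
        this.settingsFile = settingsFile;
        this.toolchainFile = toolchainFile;
        this.verdict = verdict;
    }

    // Returns Optional.empty() for comments and blank lines.
    static Optional<SkipEntry> parse(final String line) {
        final String trimmed = line.trim();
        if (trimmed.isEmpty() || trimmed.startsWith("//")) {
            return Optional.empty();
        }
        final String[] parts = trimmed.split("\\s+");
        if (parts.length != 4) {
            throw new IllegalArgumentException("Malformed .skip line: " + line);
        }
        return Optional.of(new SkipEntry(parts[0], parts[1], parts[2], parts[3]));
    }
}
```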

@danieldietsch (Member)

When you run a test suite with skipped tests, can you infer from the test suite result alone which tests were skipped? For example, by looking into the TEST-*.xml?

Review threads on the following files (outdated, resolved):
trunk/examples/programs/FloatingPoint/regression/c/.skip
trunk/examples/programs/regression/c/.skip
BitwiseOperations02.c KojakC-Reach-32Bit-Default.epf KojakC.xml UNKNOWN

// Timeout
BitwiseOperations01.c BlockEncodingV2AutomizerC-FP-MaxSaneBE.epf BlockEncodingV2AutomizerC.xml TIMEOUT
Member

These worked with MaxSaneBE at some point. Does increasing the timeout help here?

@@ -183,7 +184,12 @@ public void test() {
}
if (th != null) {
message += " (Ultimate threw an Exception: " + th.getMessage() + ")";
if (result == TestResult.IGNORE) {
skipTest(message, th);
Member
Are you sure it's a good idea to throw a new exception and not keep the old one? At least as a wrapped exception?

Contributor Author

@maul-esel wrote this 🙂

Contributor

Ok, so this was a long time ago, but: from my reading of the code, it seems like we (now) do keep the old exception th as a wrapped exception, right?

@danieldietsch (Member)

@schuessf Is there a reason why this isn't merged?

@schuessf schuessf force-pushed the wip/dk/skip-unsupported2 branch 2 times, most recently from 46204d4 to d43de7f Compare August 20, 2024 09:02
@schuessf (Contributor Author) commented Aug 20, 2024

Okay, I revisited this PR after a long time 🙃 and updated a few files. These are the downsides and open questions:

  • What should be the name of the files that specify the tests to be skipped (currently: .skip)? Should it be a fixed name, or do we want to allow some pattern?
  • Currently the files have to be in the same folder as the settings or toolchains, but not the input file itself. This allows parsing the file only once beforehand, and not for every test itself.
  • The verdict in the .skip files can currently only be the result (e.g. UNKNOWN, TIMEOUT, ...), but not a substring of the more detailed error message. Matching on the message would allow a more precise check of whether the test should indeed be marked as skipped (or whether an unexpected error occurred).

If we can agree on these questions, this should be ready to be merged -- and finally all our tests might pass 😉

@danieldietsch (Member)

I would say:

  • Keep .skip, do not allow different names. Easier to find with find and the like.
  • Keep it in the same folder as settings or toolchains. Just one concept, would be more complex if we had different ones. And parsing once is of course also important :)
  • It would be nice to allow regexes here; then we could also exclude all results with .*.

@schuessf (Contributor Author)

I updated the code to use regular expressions that also match on the message and not only the result (7b67a7e), without changing the .skip files yet.
If we also want to use regular expressions in the .skip files, I think the current space-separated format is not optimal (since there might also be spaces inside the regexes). Maybe we could also change the format to be more "category-based", to avoid duplicating the same regexes.
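Regex-based verdict matching as described here could look roughly like the sketch below, assuming the verdict is tried against both the result category (full match) and the detailed message (substring match). The class and method names are illustrative, not Ultimate's actual API:

```java
import java.util.regex.Pattern;

// Hedged sketch of regex-based verdict matching: a verdict string from a
// .skip file is treated as a regex and tried against both the result
// category and the detailed message. SkipMatcher and matchesVerdict are
// hypothetical names for illustration, not Ultimate's actual code.
final class SkipMatcher {
    static boolean matchesVerdict(final String verdictRegex, final String result,
            final String message) {
        final Pattern pattern = Pattern.compile(verdictRegex);
        // Full match against the result category (e.g. "TIMEOUT"),
        // substring match against the longer human-readable message.
        return pattern.matcher(result).matches() || pattern.matcher(message).find();
    }
}
```

Under this scheme, a verdict of `.*` would skip on any outcome, as suggested above.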

@danieldietsch (Member)

Yeah, using them in the .skip files would be needed. Category-based would also be nice. Perhaps use YAML or something for which we already have a parser?
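Purely as an illustration of the category-based idea (this is not what the PR implements, and all keys and values below are invented), a YAML layout could group tests under a shared verdict regex, reusing the entries shown earlier in this thread:

```yaml
# Hypothetical category-based .skip format; keys and layout are assumptions.
overapproximation:
  verdict: "Reason: overapproximation of .*"
  tests:
    - file: BitwiseOperations02.c
      settings: KojakC-Reach-32Bit-Default.epf
      toolchain: KojakC.xml
timeout:
  verdict: "TIMEOUT"
  tests:
    - file: BitwiseOperations01.c
      settings: BlockEncodingV2AutomizerC-FP-MaxSaneBE.epf
      toolchain: BlockEncodingV2AutomizerC.xml
```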

@maul-esel (Contributor) left a comment

Thanks for reviving this! 🎉

See below for a few comments and questions.

@danieldietsch (Member)

Just to bump, not that it lays dormant for too long :)

@schuessf (Contributor Author)

> Just to bump, not that it lays dormant for too long :)

There are some conflicts now. I will try to resolve them next week and finalize this PR 😉

@schuessf (Contributor Author) commented Sep 4, 2024

I just discovered some bugs and restarted the nightly. I think this PR should be ready now.

Just one thing I recently discovered: All tests that are marked as skipped also occur as "success" in Jenkins (and maybe even in JUnit?), e.g. see here and here. @maul-esel do you know why this is happening?

@maul-esel (Contributor)

I do not know. When running tests in Eclipse, it looks fine. But I get the same phenomenon when running tests via the Maven command line:

Tests run: 3, Failures: 0, Errors: 0, Skipped: 1, Time elapsed: 27.804 s - in de.uni_freiburg.informatik.ultimate.regressiontest.generic.RegressionTestSuite
I_my-test_regression_sum_invariant.bpl S_my-test_regression_ReachSafety-32bit-Automizer.epf T_my-test_regression_ReachSafety.xml(de.uni_freiburg.informatik.ultimate.regressiontest.generic.RegressionTestSuite) skipped
I_my-test_regression_sum_invariant.bpl S_my-test_regression_ReachSafety-32bit-Automizer.epf T_my-test_regression_ReachSafety.xml(de.uni_freiburg.informatik.ultimate.regressiontest.generic.RegressionTestSuite)  Time elapsed: 26.044 s
I_my-test_regression_sum_invariant.bpl S_my-test_regression_ReachSafety-32bit-GemCutter.epf T_my-test_regression_ReachSafety.xml(de.uni_freiburg.informatik.ultimate.regressiontest.generic.RegressionTestSuite)  Time elapsed: 1.751 s

(the third line should not be there)

@schuessf schuessf merged commit a9b967e into dev Sep 9, 2024
3 of 4 checks passed
@schuessf schuessf deleted the wip/dk/skip-unsupported2 branch September 9, 2024 20:35