-
Notifications
You must be signed in to change notification settings - Fork 1.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Failing test(s): TestAccKMSAutokeyConfig_kmsAutokeyConfigAllExample #18935
Comments
Note from triage: This looks like a permadiff: https://googlecloudplatform.github.io/magic-modules/develop/permadiff/ Weird that it only happens sometimes though. |
The fact that this isn't failing all the time is weird, I wonder if it's just because I've seen the AutokeyConfig take a while to be updated. There is a short, dependency-caused propagation delay for AutokeyConfigs, and I've personally seen failures in the past where setting the AutokeyConfig did not complete fast enough for the KeyHandle creation request to succeed right after. Re-applying worked fine because at that point the config had been set. IIUC, this test is |
that seems plausible! a time_sleep resource would let one insert a wait |
Indeed, I've added it to the relevant test as part of GoogleCloudPlatform/magic-modules#11413. Fingers crossed! |
@melinath @SarahFrench is the test reliably passing now that the change has been merged? (also curious if I have a way to check myself?) |
The test is still failing for the same reason I'm afraid. This screenshot shows a chart where each bar is a day the test ran, red means test failure, and the height is the duration of the test. The last 6 bars show the new sleep's effect on test duration. I'm not sure on how Googlers access nightly test data, so I defer to @melinath |
I'll reach out internally. |
Hmm, weird, I'm pretty confused then.. the google_kms_key_handle basic example test hasn't been failing in a similar way, right? We have a similar wait in there after the AutokeyConfig resource (with an even shorter time of 10s): https://github.com/GoogleCloudPlatform/magic-modules/blob/82cf1c53c32b0a5c7b5d29d4742a8b7bbb471eb0/mmv1/templates/terraform/examples/kms_key_handle_basic.tf.erb#L83 Are there any differences caused by the fact that the AutokeyConfig resource is the last one being created in this test, and that the time_sleep is at the very end of the test? It's definitely not being ignored, because of the test duration increase highlighted above. But if it was a true permadiff, it would show up every time, no? |
It is currently showing up as a permadiff every time in nightly tests since 8/4/2024. Previously it was failing much less frequently. This kind of pattern often indicates a feature that is rolling out to an increasing percentage of servers over time before finally going to 100%. Maybe something went live around 8/4/24 that could be related? The basic example is also failing at 100% (though with a different error message) and was previously flakey with this same permadiff. That failure is tracked in #18934. |
@melinath thanks for letting me know about that issue, I believe it got misrouted internally so I wasn't aware of it! I've opened a draft PR to start addressing the folder name / making resources sweepable: GoogleCloudPlatform/magic-modules#11693 I'd like to use that PR to also fix the permadiff about AutokeyConfig, but I haven't been able to figure out a root cause for it yet :( I'm not aware of anything rolling out and going live around that time, but I've just asked the team. |
I couldn't figure it out the root cause of it, but I made some tests and I strongly believe that this is being caused by an infrastructure issue. Feels like the CI is hitting a different version of the KMS API (maybe an alpha non released version) instead of v1... I'm not sure, it's just a guess and I can't confirm by myself. Here are the tests I made to help on this investigation:
Some other interesting points: With
When we have the key_project attribute on autokey config tf resource, this is the behavior: So with these APIs requests and responses, with all the fields being returned, I don't see how TF could be reporting a permadiff (and I could evidence that with the 1.1K requests test - check re-apply logs here. That's why I suspect that the API that CI is hitting might be having a different behavior. Maybe a good approach to continue this investigation is: someone with proper access into CI infrastructure set TG_LOG env var there and check which is the behavior of the KMS API for autokey config requests. Probably will be different from the one posted in this comment. |
Impacted tests
Affected Resource(s)
Failure rates
Message(s)
Nightly build test history
b/357622504
The text was updated successfully, but these errors were encountered: