Once a manifest is in this state (an empty IndexReport stored for it), any further request to rescan it does nothing to change the IndexReport. The logs from the call to scan the previously scanned manifest show:
info index request start
debug locking attempt
debug locking OK
info starting scan
info index request done
The logs from any later attempt are almost identical, except that they include the "manifest already scanned" line:
info index request start
debug locking attempt
debug locking OK
info starting scan
info manifest already scanned
info index request done
Suspected Code Path
Looking at the code, I think this happens when an attempt to get the IndexReport in checkManifest times out. The flow, I believe, is:
1. The Controller enters the run method and calls CheckManifest as the first state. At this point the IndexReport on the controller is empty.
2. checkManifest discovers that the manifest is already scanned.
3. checkManifest attempts to load the IndexReport, but the call to s.Store.IndexReport times out (5s timeout) and an error is returned. checkManifest does not log anything in this case, and the IndexReport on the controller is not set because it was not loaded.
4. Back in the calling controller, it goes through the checks for whether an error occurred. As this is a timeout, errors.Is(err, context.DeadlineExceeded) is true and it goes into that case.
5. That case does not exit the for loop early; it just sets the retry variable to true. The Controller then proceeds to store the IndexReport into the database, which stores an empty IndexReport.
6. The Controller then continues into the retry logic, which pauses for a bit and then sets err to nil.
7. When checkManifest returned, it returned the next state as Terminal, so the Controller now sees this and exits.
I think this is consistent with what we see in the logs, where we get no logging at all indicating that an error has occurred: nowhere in the code path above is a log emitted with the error from checkManifest. Future calls to scan this manifest follow a very similar path, except that the call to IndexReport does not time out; all it loads is an empty IndexReport. It then emits the "manifest already scanned" log line and does not return an error, but we are left with the empty IndexReport, which checkManifest thinks is OK.
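To make the suspected path easier to follow, here is a minimal, self-contained sketch of it. Everything in it (memStore, controller, scanTwice, the reduced IndexReport) is a simplified stand-in written for illustration, not the real types or the real run method, and the pause in the retry path is left out. It only aims to show how a timed-out load could end with an empty report persisted and nothing logged.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// State is a simplified stand-in for the controller's state enum.
type State int

const (
	CheckManifest State = iota
	Terminal
)

// IndexReport is reduced to the one field this sketch needs.
type IndexReport struct {
	State string
}

// memStore is a hypothetical in-memory store. timeoutNext makes the next
// IndexReport lookup fail the way the 5s deadline would.
type memStore struct {
	report      IndexReport
	timeoutNext bool
}

func (m *memStore) IndexReport(ctx context.Context) (IndexReport, error) {
	if m.timeoutNext {
		m.timeoutNext = false
		return IndexReport{}, context.DeadlineExceeded
	}
	return m.report, nil
}

// controller mirrors the shape described above; it is not the real type.
type controller struct {
	store   *memStore
	report  IndexReport
	retries int
	logs    []string
}

// checkManifest assumes the manifest is already scanned and tries to load
// the stored report. On error it returns Terminal without logging.
func checkManifest(ctx context.Context, c *controller) (State, error) {
	sr, err := c.store.IndexReport(ctx)
	if err != nil {
		return Terminal, err
	}
	c.logs = append(c.logs, "manifest already scanned")
	c.report = sr
	return Terminal, nil
}

// run reproduces the suspected loop: a deadline error only sets retry,
// the report is persisted anyway, and the Terminal state ends the loop.
func (c *controller) run(ctx context.Context) {
	current := CheckManifest
	for current != Terminal {
		next, err := checkManifest(ctx, c)
		retry := false
		switch {
		case err == nil:
		case errors.Is(err, context.DeadlineExceeded):
			retry = true // no log line, no early exit
		default:
			c.logs = append(c.logs, "error: "+err.Error())
			return
		}
		c.store.report = c.report // persisted even though nothing was loaded
		if retry {
			c.retries++ // stands in for the pause before err is cleared
		}
		current = next // Terminal, so the loop never actually retries
	}
}

// scanTwice runs a timed-out scan followed by a normal one against a store
// that starts with a finished report. It returns the stored state and the
// log lines from each run.
func scanTwice() (stored string, first, second []string) {
	st := &memStore{report: IndexReport{State: "IndexFinished"}, timeoutNext: true}
	c1 := &controller{store: st}
	c1.run(context.Background())
	c2 := &controller{store: st}
	c2.run(context.Background())
	return st.report.State, c1.logs, c2.logs
}

func main() {
	stored, first, second := scanTwice()
	fmt.Printf("stored state: %q\n", stored)
	fmt.Println("first scan logs:", first)
	fmt.Println("second scan logs:", second)
}
```

In this sketch the first scan logs nothing and overwrites the stored report with an empty one, and the second scan only logs "manifest already scanned", which matches the two log excerpts above.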
The solution
I think there are a couple of things to be addressed here. I'm happy to work on a PR but would like advice first as I think some change to behaviour is required.
Retry logic in controller
I do not think that the retry is doing anything. Every state I can see that returns an error also sets the next state to Terminal. This means that if any of these errors is a timeout, it goes through the retry logic path, pauses briefly, then sees that the next state is Terminal and exits.
Personally I would delete the errors.Is(err, context.DeadlineExceeded) case and the retry variables rather than attempting to fix them. (The fix would be to change that case to contain what is currently inside the if retry block and put a continue on the end, so that it goes back around the loop without changing the next state and without persisting a potentially invalid IndexReport.)
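For comparison, the "fix it" alternative could look roughly like the sketch below. This is only an illustration under assumptions: runWithRetry, flakyState, demo, the persist callback and the maxRetries cap are all hypothetical names and choices of mine (the cap in particular is not something the current code has, as far as I can tell). The point is just that continue re-runs the same state and skips both the persist step and the state transition.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// State is a simplified stand-in for the controller's state enum.
type State int

const (
	CheckManifest State = iota
	Terminal
)

// stateFunc is a simplified stand-in for a controller state function.
type stateFunc func(ctx context.Context) (State, error)

// runWithRetry re-runs the current state on a deadline error instead of
// moving on. persist is only called after a state succeeds.
func runWithRetry(ctx context.Context, fn stateFunc, maxRetries int, pause time.Duration, persist func()) error {
	current := CheckManifest
	retries := 0
	for current != Terminal {
		next, err := fn(ctx)
		switch {
		case err == nil:
		case errors.Is(err, context.DeadlineExceeded):
			if retries >= maxRetries {
				return fmt.Errorf("giving up after %d retries: %w", retries, err)
			}
			retries++
			select {
			case <-time.After(pause):
			case <-ctx.Done():
				return ctx.Err()
			}
			continue // same state again; nothing persisted, next ignored
		default:
			return err
		}
		persist()
		current = next
	}
	return nil
}

// flakyState returns a stateFunc that times out `failures` times and then
// succeeds. Like the real states, it returns Terminal alongside the error.
func flakyState(failures int) stateFunc {
	calls := 0
	return func(ctx context.Context) (State, error) {
		calls++
		if calls <= failures {
			return Terminal, context.DeadlineExceeded
		}
		return Terminal, nil
	}
}

// demo reports how many times persist ran and whether the run succeeded.
func demo(failures, maxRetries int) (int, bool) {
	persisted := 0
	err := runWithRetry(context.Background(), flakyState(failures), maxRetries,
		time.Millisecond, func() { persisted++ })
	return persisted, err == nil
}

func main() {
	fmt.Println(demo(2, 3)) // two timeouts, then success
	fmt.Println(demo(5, 3)) // more timeouts than retries allowed
}
```

With two timeouts and three allowed retries the state eventually succeeds and persist runs once; with five timeouts the loop gives up with an error and persist never runs, so no invalid report would be stored.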
Checking the validity of manifests in checkManifest
Fixing the retry logic may be enough to fix this, but care would need to be taken not to persist a bad IndexReport when the call to s.Store.IndexReport fails. Even if the Controller's standard error handling code were used for one of these errors, it would result in an IndexReport being persisted with state IndexError, which again isn't what is wanted. I wonder if a check should be added to checkManifest to actually make sure the loaded IndexReport is valid, something like:
if sr.State != IndexFinished {
	zlog.Info(ctx).Msg("IndexReport not valid so manifest to be scanned")
	// Potentially do the filtering that happens in the case where a Manifest has not been scanned,
	// but in this case there will be no scanners left as we know if we get this far
	// `ManifestScanned` returned `ok`.
	s.Vscnrs = indexer.VersionedScanners{}
	return FetchLayers, nil
}
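The same decision can be written as a small standalone function, which makes it easy to check in isolation. This is a sketch under assumptions: nextStateAfterLoad, the string constant indexFinished and the reduced IndexReport are hypothetical, and treating a missing report (found == false or a nil pointer) as invalid is my own addition on top of the check above.

```go
package main

import "fmt"

// State is a simplified stand-in for the controller's state enum.
type State int

const (
	Terminal State = iota
	FetchLayers
)

// indexFinished is a hypothetical string form of the finished state.
const indexFinished = "IndexFinished"

// IndexReport is reduced to the one field this sketch needs.
type IndexReport struct {
	State string
}

// nextStateAfterLoad decides what to do once the manifest is known to be
// scanned and the stored report has been looked up. Anything other than a
// finished report sends the manifest back through FetchLayers.
func nextStateAfterLoad(sr *IndexReport, found bool) State {
	if !found || sr == nil || sr.State != indexFinished {
		return FetchLayers
	}
	return Terminal
}

func main() {
	empty := &IndexReport{}
	done := &IndexReport{State: indexFinished}
	failed := &IndexReport{State: "IndexError"}

	fmt.Println("empty report rescanned:", nextStateAfterLoad(empty, true) == FetchLayers)
	fmt.Println("finished report kept:", nextStateAfterLoad(done, true) == Terminal)
	fmt.Println("errored report rescanned:", nextStateAfterLoad(failed, true) == FetchLayers)
}
```

With a check like this, an empty or IndexError report left behind by an earlier failure would cause a rescan instead of being treated as a finished result.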
The Problem
We are finding that sometimes, when we scan a manifest that has already been scanned, we end up with an empty IndexReport stored for that manifest.