Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

KMP算法 #12

Open
zhuzhuaicoding opened this issue Sep 30, 2018 · 0 comments
Open

KMP算法 #12

zhuzhuaicoding opened this issue Sep 30, 2018 · 0 comments

Comments

@zhuzhuaicoding
Copy link
Owner

zhuzhuaicoding commented Sep 30, 2018

部分匹配表

难点:理解T[4], T[5]

We consider the example of W = "ABCDABD" first. We will see that it follows much the same pattern as the main search, and is efficient for similar reasons. We set T[0] = -1. To find T[1], we must discover a proper suffix of "A" which is also a prefix of pattern W. But there are no proper suffixes of "A", so we set T[1] = 0. To find T[2], we see that the substring W[0] - W[1] ("AB") has a proper suffix "B". However "B" is not a prefix of the pattern W. Therefore, we set T[2] = 0.

Continuing to T[3], we first check the proper suffix of length 1, and as in the previous case it fails. Should we also check longer suffixes? No, we now note that there is a shortcut to checking all suffixes: let us say that we discovered a proper suffix which is a proper prefix (A proper prefix of a string is not equal to the string itself) and ending at W[2] with length 2 (the maximum possible); then its first character is also a proper prefix of W, hence a proper prefix itself, and it ends at W[1], which we already determined did not occur as T[2] = 0 and not T[2] = 1. Hence at each stage, the shortcut rule is that one needs to consider checking suffixes of a given size m+1 only if a valid suffix of size m was found at the previous stage (i.e. T[x] = m) and should not bother to check m+2, m+3, etc.

Therefore, we need not even concern ourselves with substrings having length 2, and as in the previous case the sole one with length 1 fails, so T[3] = 0.

We pass to the subsequent W[4], 'A'. The same logic shows that the longest substring we need consider has length 1, and as in the previous case it fails since "D" is not a prefix of W. But instead of setting T[4] = 0, we can do better by noting that W[4] = W[0], and also that a look-up of T[4] implies the corresponding S character, S[m+4], was a mismatch and therefore S[m+4] ≠ 'A'. Thus there is no point in restarting the search at S[m+4]; we should begin 1 ahead. This means that we may shift pattern W by match length plus one character, so T[4] = -1.

Considering now the next character, W[5], which is 'B': though by inspection the longest substring would appear to be 'A', we still set T[5] = 0. The reasoning is similar to why T[4] = -1. W[5] itself extends the prefix match begun with W[4], and we can assume that the corresponding character in S, S[m+5] ≠ 'B'. So backtracking before W[5] is pointless, but S[m+5] may be 'A', hence T[5] = 0.

Finally, we see that the next character in the ongoing segment starting at W[4] = 'A' would be 'B', and indeed this is also W[5]. Furthermore, the same argument as above shows that we need not look before W[4] to find a segment for W[6], so that this is it, and we take T[6] = 2.

i 0 1 2 3 4 5 6
A B C D A B D
-1 0 0 0 -1 0 2
i 0 1 2 3 4 5 6 7 8
A B A C A B A B C
-1 0 -1 1 -1 0 -1 3 2

4fa4a7b3-d0a9-4093-8277-0a8ae18e56b5

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant