KMP algorithm #12

zhuzhuaicoding · 2018-09-30T03:33:19Z

Partial match table

Difficulty: understanding T[4] and T[5]

We consider the example of W = "ABCDABD" first. We will see that building the table follows much the same pattern as the main search, and is efficient for similar reasons. We set T[0] = -1. To find T[1], we must discover a proper suffix of "A" which is also a prefix of the pattern W. But there are no proper suffixes of "A", so we set T[1] = 0. To find T[2], we see that the substring from W[0] to W[1] ("AB") has a proper suffix "B". However, "B" is not a prefix of the pattern W. Therefore, we set T[2] = 0.

Continuing to T[3], we first check the proper suffix of length 1 ("C"), and as in the previous case it fails. Should we also check longer suffixes? No, there is a shortcut that avoids checking all suffixes. Suppose we had found a proper suffix of length 2 (the maximum possible) ending at W[2] that is also a proper prefix of W (a proper prefix of a string is a prefix that is not equal to the string itself). Then its first character would be a suffix of length 1 ending at W[1] that is also a prefix of W, and we already determined that no such suffix exists, since T[2] = 0 and not T[2] = 1. Hence the shortcut rule at each stage: a suffix of size m+1 needs to be checked only if a valid suffix of size m was found at the previous stage, and sizes m+2, m+3, etc. need not be checked at all.

Therefore, we need not even concern ourselves with substrings having length 2, and as in the previous case the sole one with length 1 fails, so T[3] = 0.
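The suffix checks described so far can be reproduced by brute force. The following is only a minimal sketch, assuming a hypothetical helper name `longest_suffix_prefix` that does not appear in the original text; it tries every proper suffix from longest to shortest and ignores the shortcut rule, then confirms that the rule holds for this pattern.

```python
def longest_suffix_prefix(W, i):
    """Length of the longest proper suffix of W[:i] that is also a prefix of W.

    Hypothetical helper for illustration only: it checks every candidate
    length and does not use the shortcut rule from the text.
    """
    for length in range(i - 1, 0, -1):  # proper: strictly shorter than W[:i]
        if W[i - length:i] == W[:length]:
            return length
    return 0


W = "ABCDABD"
# Values for the substrings W[:1], W[:2], ..., W[:6].
lengths = [longest_suffix_prefix(W, i) for i in range(1, len(W))]

# The first three values are 0, matching T[1], T[2] and T[3] in the text.
print(lengths[:3])

# Shortcut rule: the length can grow by at most 1 from one stage to the next.
shortcut_holds = all(b <= a + 1 for a, b in zip(lengths, lengths[1:]))
print(shortcut_holds)
```

Note that these are raw suffix lengths; the entries of T for later positions can be smaller because of the additional mismatch reasoning applied to them.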

We pass to the next character, W[4], which is 'A'. The same logic shows that the longest suffix we need to consider has length 1, and as in the previous case it fails since "D" is not a prefix of W. But instead of setting T[4] = 0, we can do better by noting that W[4] = W[0], and also that a look-up of T[4] implies the corresponding character of the text S, S[m+4] (where m is the position in S at which the current match attempt begins), was a mismatch and therefore S[m+4] ≠ 'A'. Thus there is no point in restarting the search at S[m+4]; we should begin one position ahead. This means that we may shift the pattern W by the match length plus one character, so T[4] = -1.

Considering now the next character, W[5], which is 'B': though by inspection the longest substring would appear to be 'A', we still set T[5] = 0. The reasoning is similar to why T[4] = -1. W[5] itself extends the prefix match begun with W[4], and we can assume that the corresponding character in S, S[m+5] ≠ 'B'. So backtracking before W[5] is pointless, but S[m+5] may be 'A', hence T[5] = 0.

Finally, we see that the next character in the ongoing segment starting at W[4] = 'A' would be 'B', and indeed this is also W[5]. Furthermore, the same argument as above shows that we need not look before W[4] to find a segment for W[6], so that this is it, and we take T[6] = 2.
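The construction walked through above can be sketched as follows. This is an illustrative version under stated assumptions, not necessarily the exact code the author had in mind: the function name `kmp_table` and the variable names `pos` and `cnd` are choices made here, and the returned list carries one extra trailing entry (index `len(W)`) that is only needed when searching for all occurrences.

```python
def kmp_table(W):
    """Build the partial match table T for a non-empty pattern W.

    T[i] is the pattern index to fall back to after a mismatch at W[i];
    -1 means "advance in the text and restart the pattern at index 0".
    """
    T = [0] * (len(W) + 1)
    T[0] = -1
    pos = 1  # position in T currently being computed
    cnd = 0  # index in W of the next character of the candidate prefix
    while pos < len(W):
        if W[pos] == W[cnd]:
            # The same character would mismatch again after falling back
            # to cnd, so reuse the fallback of cnd (this yields T[4] = -1
            # and T[5] = 0 for "ABCDABD").
            T[pos] = T[cnd]
        else:
            T[pos] = cnd
            # Shrink the candidate prefix until it can be extended by W[pos].
            while cnd >= 0 and W[pos] != W[cnd]:
                cnd = T[cnd]
        pos += 1
        cnd += 1
    T[pos] = cnd  # extra entry, used only after a full match
    return T


table = kmp_table("ABCDABD")
print(table[:7])  # the entries T[0]..T[6] discussed in the text
```

The branch taken when `W[pos] == W[cnd]` is what distinguishes this table from a plain "longest suffix that is also a prefix" table, and it is the step that the reasoning about T[4] and T[5] justifies.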

i	0	1	2	3	4	5	6
W[i]	A	B	C	D	A	B	D
T[i]	-1	0	0	0	-1	0	2

i	0	1	2	3	4	5	6	7	8
W[i]	A	B	A	C	A	B	A	B	C
T[i]	-1	0	-1	1	-1	0	-1	3	2
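To see how these table entries are consumed, here is a hedged sketch of a search loop that takes the table as given. The function name `kmp_find` is hypothetical, the pattern is assumed to be non-empty, and only the first occurrence is returned so that the table needs exactly one entry per pattern character, as in the tables above. The sample text is an arbitrary choice for illustration.

```python
def kmp_find(S, W, T):
    """Return the index of the first occurrence of W in S, or -1.

    T is a partial match table with one entry per character of W.
    Assumes W is non-empty.
    """
    j = 0  # index into the text S
    k = 0  # index into the pattern W
    while j < len(S):
        if S[j] == W[k]:
            j += 1
            k += 1
            if k == len(W):
                return j - k  # start of the match in S
        else:
            k = T[k]  # fall back inside the pattern
            if k < 0:
                # T[k] was -1: S[j] cannot start or continue any match,
                # so skip it and restart the pattern from its beginning.
                j += 1
                k = 0
    return -1


T1 = [-1, 0, 0, 0, -1, 0, 2]          # table for "ABCDABD"
T2 = [-1, 0, -1, 1, -1, 0, -1, 3, 2]  # table for "ABACABABC"

text = "ABC ABCDAB ABCDABCDABDE"
position = kmp_find(text, "ABCDABD", T1)
print(position == text.find("ABCDABD"))
```

When a mismatch occurs at W[4], the entry -1 makes the loop advance past the mismatching text character immediately, which is the "begin one position ahead" behaviour described for T[4].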

zhuzhuaicoding added the algorithm label Sep 30, 2018
