Subclassing AhoCorasickDoubleArrayTrie to ignore specific characters during matching #51

Open
XhstormR opened this issue Aug 14, 2021 · 2 comments


@XhstormR

Can a subclass of AhoCorasickDoubleArrayTrie override some of its methods so that certain characters, such as punctuation, are ignored during matching?

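```kotlin
// CustomizeAhoCorasickDoubleArrayTrie: the desired subclass (not written yet)
val acdat = CustomizeAhoCorasickDoubleArrayTrie<String>()
acdat.build(mapOf("hello" to "hello"))

val wordList = acdat.parseText("hel,l.o") // ',' and '.' should be skipped, so "hello" matches
```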
@hankcs
Owner

hankcs commented Aug 14, 2021

Override this method:

```java
protected int transitionWithRoot(int nodePos, char c)
```

and have it simply return `nodePos` (for the characters you want to ignore).
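The thread does not show the overriding code itself. As a self-contained illustration, the sketch below is a hypothetical toy Aho-Corasick automaton, not the library's double-array trie: `IgnoringAutomaton`, `step` and the `ignored` string are illustrative names only. It applies the same idea, so the transition for an ignored character returns the current state unchanged. Like the library as described in the next comment, it computes `begin` as `end` minus the keyword length.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;

// Hypothetical toy Aho-Corasick automaton; NOT the library's double-array trie.
class IgnoringAutomaton {
    private static final class Node {
        final Map<Character, Integer> next = new HashMap<>(); // goto edges
        int fail = 0;                                          // failure link (root = 0)
        final List<String> outputs = new ArrayList<>();        // keywords ending here
    }

    private final List<Node> nodes = new ArrayList<>();
    private final String ignored; // characters to skip, e.g. ",."

    IgnoringAutomaton(List<String> keywords, String ignored) {
        this.ignored = ignored;
        nodes.add(new Node()); // root
        for (String k : keywords) { // build the goto trie
            int cur = 0;
            for (char c : k.toCharArray()) {
                Integer nxt = nodes.get(cur).next.get(c);
                if (nxt == null) {
                    nodes.add(new Node());
                    nxt = nodes.size() - 1;
                    nodes.get(cur).next.put(c, nxt);
                }
                cur = nxt;
            }
            nodes.get(cur).outputs.add(k);
        }
        // Breadth-first pass: fill failure links and inherited outputs.
        Queue<Integer> queue = new ArrayDeque<>(nodes.get(0).next.values());
        while (!queue.isEmpty()) {
            int u = queue.poll();
            for (Map.Entry<Character, Integer> e : nodes.get(u).next.entrySet()) {
                char c = e.getKey();
                int v = e.getValue();
                int f = nodes.get(u).fail;
                while (f != 0 && !nodes.get(f).next.containsKey(c)) f = nodes.get(f).fail;
                nodes.get(v).fail = nodes.get(f).next.getOrDefault(c, 0);
                nodes.get(v).outputs.addAll(nodes.get(nodes.get(v).fail).outputs);
                queue.add(v);
            }
        }
    }

    // The owner's idea: an ignored character leaves the state unchanged.
    private int step(int state, char c) {
        if (ignored.indexOf(c) >= 0) return state;
        while (state != 0 && !nodes.get(state).next.containsKey(c)) state = nodes.get(state).fail;
        return nodes.get(state).next.getOrDefault(c, 0);
    }

    // Reports hits as "[begin:end]=keyword" with begin = end - keyword length,
    // so the skipped characters are not counted in the span.
    List<String> parseText(String text) {
        List<String> hits = new ArrayList<>();
        int state = 0;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            state = step(state, c);
            if (ignored.indexOf(c) >= 0) continue; // avoid re-reporting the unchanged state
            int end = i + 1;
            for (String k : nodes.get(state).outputs) {
                hits.add("[" + (end - k.length()) + ":" + end + "]=" + k);
            }
        }
        return hits;
    }
}
```

For example, `new IgnoringAutomaton(List.of("World"), ".").parseText("W.or.ld")` returns `[2:7]=World`: the match succeeds, but `begin` is off by the two skipped dots. This sketch emits hits only after non-ignored characters; without that check, a trailing ignored character would report the same hit twice.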

@XhstormR
Author

XhstormR commented Aug 14, 2021

I tried returning `nodePos` directly. The text now matches, but the `begin` in the `Hit` result is wrong, because the matched string is longer than the keyword.

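Looking at the code, the only information kept after a successful match is the current `position` and the keyword index. The keyword length is looked up in `int[] l` by that index and subtracted from `position` to get `begin`. Nothing else is recorded, so there is no way to know where the match really starts.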
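Put as arithmetic, with $k$ for the keyword index and $m$ for the number of ignored characters the automaton stepped over inside the match ($k$ and $m$ are notation introduced here for illustration):

$$
\text{begin}_{\text{reported}} = \text{position} - l[k],
\qquad
\text{begin}_{\text{true}} = \text{position} - l[k] - m
$$

For the `W.or.ld` example, $\text{position} = 7$, $l[k] = 5$ and $m = 2$, so the reported begin is $7 - 5 = 2$ while the match really starts at $7 - 5 - 2 = 0$. Since $m$ is never stored, it cannot be recovered from the hit.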

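Example, matching `World` with `.` ignored (input text, the `Hit` returned, and the text that `Hit` covers):

```
W.or.ld
[2:7]=World
or.ld
```

The hit covers only `or.ld`, while the matched text is the whole `W.or.ld` ([0:7]).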
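One workaround that avoids the offset problem (a different technique from the transition override, not something proposed in this thread) is to strip the ignored characters before matching and keep a table from each filtered index back to its original index. The sketch below is self-contained: `FilteredMatcher.parse` is a hypothetical name, and a naive `String.indexOf` scan stands in for `parseText`. With the library, the filtered string would be passed to `parseText` instead and each `Hit`'s `begin` and `end` remapped the same way.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical workaround sketch: filter, match, then map offsets back.
class FilteredMatcher {
    static List<String> parse(String text, List<String> keywords, String ignored) {
        // 1. Drop ignored characters, remembering where each kept one came from.
        StringBuilder filtered = new StringBuilder();
        List<Integer> origIndex = new ArrayList<>();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (ignored.indexOf(c) < 0) {
                filtered.append(c);
                origIndex.add(i);
            }
        }
        // 2. Match on the filtered text. A naive indexOf scan stands in for the
        //    automaton here; any matcher that reports [begin, end) would do.
        String f = filtered.toString();
        List<String> hits = new ArrayList<>();
        for (String k : keywords) {
            if (k.isEmpty()) continue;
            for (int b = f.indexOf(k); b >= 0; b = f.indexOf(k, b + 1)) {
                int e = b + k.length();
                // 3. Map back: original index of the first kept char,
                //    and one past the original index of the last kept char.
                int origBegin = origIndex.get(b);
                int origEnd = origIndex.get(e - 1) + 1;
                hits.add("[" + origBegin + ":" + origEnd + "]=" + k);
            }
        }
        return hits;
    }
}
```

Here `FilteredMatcher.parse("W.or.ld", List.of("World"), ".")` returns `[0:7]=World`, so `text.substring(begin, end)` yields the full `W.or.ld`. Because the matcher never sees the ignored characters, no hit is emitted twice; the cost is one extra pass over the text plus one index entry per kept character.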
