Comments (5)

1 calee commented

Thank you for a great summary on tokenization. Could I use Custom Tokenizer in Appendix B commercially?

2 CraigTrim commented

Yes - help yourself - thanks!

3 E5UP_Srikant_Jakilinki commented

Thank you so much for the article and the data and code. CustomTokenizer refers to helpers like addToken() and CodeUtilities(). Where can we get those?

4 CraigTrim commented

@calee - thanks for noticing that. For addToken, use this:
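// Adds the text to the token list unless it is empty.
protected void addToken(List<String> tokens, String text) {
    if (!StringUtilities.isEmpty(text)) {
        tokens.add(text);
    }
}

// Adds the buffer's trimmed contents to the token list if the buffer holds any characters.
protected void addToken(List<String> tokens, StringBuilder buffer) {
    if (null != buffer && 0 != buffer.length()) {
        addToken(tokens, buffer.toString().trim());
    }
}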
 
For the CodeUtilities.java calls, I recommend replacing these references with the native java.lang.Character class. Method calls such as Character.isLetter(...), Character.isDigit(...), etc., should be sufficient.
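
As a rough illustration of that suggestion, here is a minimal, self-contained sketch that combines the two addToken helpers with java.lang.Character checks. The class name CharacterTokenizerSketch, the tokenize method, and the local isEmpty stand-in for StringUtilities.isEmpty are all assumptions made for this example. They are not the Appendix B CustomTokenizer, whose actual rules may differ, and the helpers are made static here only so the sketch runs on its own.

```java
import java.util.ArrayList;
import java.util.List;

public class CharacterTokenizerSketch {

    // Assumed stand-in for StringUtilities.isEmpty(...), which is not shown in this thread.
    public static boolean isEmpty(String text) {
        return text == null || text.isEmpty();
    }

    // Adds the text to the token list unless it is null or empty.
    public static void addToken(List<String> tokens, String text) {
        if (!isEmpty(text)) {
            tokens.add(text);
        }
    }

    // Adds the buffer's trimmed contents to the token list if the buffer holds any characters.
    public static void addToken(List<String> tokens, StringBuilder buffer) {
        if (buffer != null && buffer.length() != 0) {
            addToken(tokens, buffer.toString().trim());
        }
    }

    // Hypothetical tokenization rule: letters and digits accumulate into a word,
    // whitespace ends the current word, and any other character becomes its own token.
    public static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        StringBuilder buffer = new StringBuilder();
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (Character.isLetter(c) || Character.isDigit(c)) {
                buffer.append(c);
            } else {
                addToken(tokens, buffer);   // flush the word collected so far
                buffer.setLength(0);
                if (!Character.isWhitespace(c)) {
                    addToken(tokens, String.valueOf(c));
                }
            }
        }
        addToken(tokens, buffer);           // flush the trailing word, if any
        return tokens;
    }

    public static void main(String[] args) {
        // prints [Hello, ,, world, 42, !]
        System.out.println(tokenize("Hello, world 42!"));
    }
}
```

Note that the char-based Character methods do not handle supplementary Unicode characters; the int code point overloads would be needed for those.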

5 nurdo ha hecho un comentario el Enlace permanente

How about CodeUtilities.isSpecial()? What's the corresponding Character... ?

 
Thanks!
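
java.lang.Character has no isSpecial method, and the original CodeUtilities.isSpecial() is not shown in this thread, so its exact behavior is unknown. One possible stand-in, assuming "special" means any character that is not a letter, a digit, or whitespace, could look like the sketch below. The class name SpecialCharSketch and that definition are assumptions, not the author's implementation.

```java
public class SpecialCharSketch {

    // Assumed definition: a character is "special" when it is neither a letter,
    // a digit, nor whitespace. Under this definition, non-whitespace control
    // characters also count as special.
    public static boolean isSpecial(char c) {
        return !Character.isLetterOrDigit(c) && !Character.isWhitespace(c);
    }

    public static void main(String[] args) {
        System.out.println(isSpecial('!')); // prints true
        System.out.println(isSpecial('a')); // prints false
        System.out.println(isSpecial('7')); // prints false
        System.out.println(isSpecial(' ')); // prints false
    }
}
```

If the tokenizer needs a narrower notion, such as punctuation only, Character.getType(c) can be compared against category constants like Character.OTHER_PUNCTUATION instead.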