Revision 955d4f53b528ed8836147f33f54cbb049e73126e authored by David Roberts on 14 May 2021, 06:04:55 UTC, committed by GitHub on 14 May 2021, 06:04:55 UTC
The ml_classic tokenizer creates two (or more) tokens for email addresses, at minimum splitting on the @ symbol. This change makes the new ml_standard tokenizer preserve email addresses as a single token. Tokens that contain an @ symbol but are otherwise purely numeric are ignored, as though they were just numbers. Additionally, @ symbols are ignored at the beginning and end of tokens.
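
The three rules in the message above can be illustrated with a small Python sketch. This is not the tokenizer from this revision, only an approximation under stated assumptions: the function name `tokenize`, the character class used to find candidate tokens, and the reading of "purely numeric" as "contains no letters" are all hypothetical choices made for the illustration.

```python
import re

# Assumption for this sketch: a candidate token is a run of letters, digits,
# and a few punctuation characters. The real tokenizer's rules may differ.
_CANDIDATE = re.compile(r"[A-Za-z0-9_.@-]+")


def tokenize(message):
    """Split a log message into tokens, keeping email addresses whole."""
    tokens = []
    for match in _CANDIDATE.finditer(message):
        # Rule 3: ignore @ symbols at the beginning and end of a token.
        candidate = match.group().strip("@")
        # Rule 2: a token with no letters (for example "123@456") is
        # treated like a plain number and dropped. This also drops
        # candidates that were empty after stripping.
        if not any(ch.isalpha() for ch in candidate):
            continue
        # Rule 1: an interior @ does not split the token, so
        # "bob@example.com" survives as a single token.
        tokens.append(candidate)
    return tokens


if __name__ == "__main__":
    print(tokenize("user bob@example.com logged in"))
    print(tokenize("id 123@456 ok"))
    print(tokenize("@mention trailing@"))
```

Because @ is part of the candidate character class, an address is never split in the first place; the numeric and edge-of-token cases are then handled as filters on the whole candidate, which keeps the three rules independent of each other.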
1 parent d6e9d18
File | Mode | Size |
---|---|---|
cli | ||
core | ||
dissect | ||
geo | ||
grok | ||
nio | ||
plugin-classloader | ||
secure-sm | ||
ssl-config | ||
x-content | ||
build.gradle | -rw-r--r-- | 1.3 KB |