Revision 955d4f53b528ed8836147f33f54cbb049e73126e authored by David Roberts on 14 May 2021, 06:04:55 UTC, committed by GitHub on 14 May 2021, 06:04:55 UTC
The ml_classic tokenizer creates two (or more) tokens for email addresses, at minimum splitting on the @ symbol. This change makes the new ml_standard tokenizer preserve email addresses as a single token. Tokens that contain an @ symbol but are otherwise purely numeric are ignored, as though they were just numbers. Additionally, @ symbols are ignored at the beginning and end of tokens.
1 parent d6e9d18
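The three rules in the commit message can be illustrated with a small sketch. This is not the ml_standard implementation: the `tokenize` function, the set of characters treated as part of a token, and the definition of "purely numeric" as ASCII digits only are all assumptions made for illustration.

```python
import re

# Assumption: a candidate token is a run of letters, digits, underscores,
# dots, hyphens and @ symbols. The real tokenizer's character rules may differ.
_CANDIDATE = re.compile(r"[A-Za-z0-9_.@-]+")


def tokenize(text):
    """Hypothetical sketch of the @ handling described in the commit message."""
    tokens = []
    for match in _CANDIDATE.finditer(text):
        # Rule 3: @ symbols at the beginning and end of a token are ignored.
        token = match.group().strip("@")
        if not token:
            continue
        # Rule 2: a token that is purely numeric once its @ symbols are
        # removed is dropped, just like a plain number.
        # Assumption: "numeric" means ASCII digits only in this sketch.
        if token.replace("@", "").isdigit():
            continue
        # Rule 1: anything else, including an email address with an
        # interior @, is kept as a single token.
        tokens.append(token)
    return tokens


if __name__ == "__main__":
    print(tokenize("user bob@example.com logged in from 10 with id 123@456 @admin@"))
```

In this sketch `bob@example.com` stays whole, `10` and `123@456` are dropped, and `@admin@` becomes `admin`. A tokenizer that splits on `@`, as the commit message describes for ml_classic, would instead produce at least two tokens for the email address.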
File | Mode | Size |
---|---|---|
benchmark | | |
client-benchmark-noop-api-plugin | | |
rest | | |
rest-high-level | | |
sniffer | | |
test | | |