Revision 955d4f53b528ed8836147f33f54cbb049e73126e authored by David Roberts on 14 May 2021, 06:04:55 UTC, committed by GitHub on 14 May 2021, 06:04:55 UTC
The ml_classic tokenizer creates two (or more) tokens for email
addresses, at minimum splitting on the @ symbol.

This change makes the new ml_standard tokenizer preserve email
addresses as a single token.

Tokens that contain an @ symbol but are otherwise purely numeric
are ignored, as though they were just numbers. Additionally, @
symbols at the beginning and end of tokens are ignored.
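The rules above can be illustrated with a minimal Java sketch. It is not the ml_standard tokenizer's actual code. The class and method names (`EmailTokenSketch`, `normalize`, `tokenize`) are hypothetical. Candidate tokens are assumed to come from a simple whitespace split. Purely numeric tokens are assumed to be dropped, since the message says @-containing numeric tokens are treated "as though they were just numbers".

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of the email-token rules; not the real ml_standard tokenizer.
class EmailTokenSketch {

    // Returns the token to emit, or null if the token should be ignored.
    static String normalize(String raw) {
        int start = 0;
        int end = raw.length();
        // Ignore @ symbols at the beginning of the token.
        while (start < end && raw.charAt(start) == '@') {
            start++;
        }
        // Ignore @ symbols at the end of the token.
        while (end > start && raw.charAt(end - 1) == '@') {
            end--;
        }
        String token = raw.substring(start, end);
        if (token.isEmpty()) {
            return null;
        }
        // If every remaining character is an ASCII digit or @, treat the
        // token like a number and ignore it (assumption: numbers are dropped).
        boolean otherwiseNumeric = true;
        for (int i = 0; i < token.length(); i++) {
            char c = token.charAt(i);
            if (c != '@' && (c < '0' || c > '9')) {
                otherwiseNumeric = false;
                break;
            }
        }
        // Anything else, including an email address, is kept as one token.
        return otherwiseNumeric ? null : token;
    }

    // Assumption: candidate tokens are separated by whitespace.
    static List<String> tokenize(String text) {
        List<String> out = new ArrayList<>();
        for (String raw : text.split("\\s+")) {
            String token = normalize(raw);
            if (token != null) {
                out.add(token);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("mail from @alice@example.com@ to 123@456 failed"));
    }
}
```

Running `main` prints `[mail, from, alice@example.com, to, failed]`. The address stays whole with its surrounding @ symbols trimmed, while `123@456` is discarded like a number.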
1 parent d6e9d18
File Mode Size
.ci
.github
.idea
benchmarks
buildSrc
client
dev-tools
distribution
docs
gradle
libs
licenses
modules
plugins
qa
rest-api-spec
server
test
x-pack
.dir-locals.el -rw-r--r-- 3.3 KB
.editorconfig -rw-r--r-- 419 bytes
.gitattributes -rw-r--r-- 32 bytes
.gitignore -rw-r--r-- 1.2 KB
CONTRIBUTING.md -rw-r--r-- 36.8 KB
LICENSE.txt -rw-r--r-- 546 bytes
NOTICE.txt -rw-r--r-- 228 bytes
README.asciidoc -rw-r--r-- 2.6 KB
TESTING.asciidoc -rw-r--r-- 32.5 KB
Vagrantfile -rw-r--r-- 14.9 KB
build.gradle -rw-r--r-- 21.2 KB
gradle.properties -rw-r--r-- 876 bytes
gradlew -rwxr-xr-x 5.6 KB
gradlew.bat -rw-r--r-- 2.7 KB
settings.gradle -rw-r--r-- 4.4 KB
