Revision 955d4f53b528ed8836147f33f54cbb049e73126e authored by David Roberts on 14 May 2021, 06:04:55 UTC, committed by GitHub on 14 May 2021, 06:04:55 UTC
The ml_classic tokenizer creates two (or more) tokens for email addresses, at minimum splitting on the @ symbol. This change makes the new ml_standard tokenizer preserve email addresses as a single token. Tokens that contain an @ symbol but are otherwise purely numeric are ignored, as though they were just numbers. Additionally, @ symbols at the beginning and end of tokens are ignored.
1 parent d6e9d18
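
The three rules in the commit message above can be illustrated with a small sketch. This is not the ml_standard implementation; it is a simplified Python approximation, and the function name `tokenize_sketch`, the candidate character class, and the "contains at least one letter" test for non-numeric tokens are all assumptions made for illustration.

```python
import re

# Assumed character class for candidate tokens: ASCII letters, digits,
# '_', '.', '-' and '@'. The real tokenizer's rules are likely richer.
_CANDIDATE = re.compile(r"[A-Za-z0-9_.@-]+")


def tokenize_sketch(text: str) -> list:
    """Illustrative tokenizer that keeps email addresses as one token."""
    tokens = []
    for match in _CANDIDATE.finditer(text):
        # Rule 3: '@' symbols at the start and end of a token are ignored.
        token = match.group().strip("@")
        if not token:
            continue
        # Rule 2: a token with no letters (for example "123@456") is
        # treated like a plain number and dropped. Assumption: "purely
        # numeric" is approximated as "contains no alphabetic character".
        if not any(ch.isalpha() for ch in token):
            continue
        # Rule 1: an interior '@' does not split the token, so an email
        # address such as "bob@example.com" survives whole.
        tokens.append(token)
    return tokens


print(tokenize_sketch("user bob@example.com logged in from 10.0.0.1"))
# ['user', 'bob@example.com', 'logged', 'in', 'from']
print(tokenize_sketch("@admin@ sent 123@456"))
# ['admin', 'sent']
```

By contrast, a tokenizer that splits on `@`, as the commit message says ml_classic does, would yield `bob` and `example.com` (or smaller pieces) for the first input. The sketch does not trim trailing punctuation such as a sentence-final period.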
File | Mode | Size |
---|---|---|
.ci | ||
.github | ||
.idea | ||
benchmarks | ||
buildSrc | ||
client | ||
dev-tools | ||
distribution | ||
docs | ||
gradle | ||
libs | ||
licenses | ||
modules | ||
plugins | ||
qa | ||
rest-api-spec | ||
server | ||
test | ||
x-pack | ||
.dir-locals.el | -rw-r--r-- | 3.3 KB |
.editorconfig | -rw-r--r-- | 419 bytes |
.gitattributes | -rw-r--r-- | 32 bytes |
.gitignore | -rw-r--r-- | 1.2 KB |
CONTRIBUTING.md | -rw-r--r-- | 36.8 KB |
LICENSE.txt | -rw-r--r-- | 546 bytes |
NOTICE.txt | -rw-r--r-- | 228 bytes |
README.asciidoc | -rw-r--r-- | 2.6 KB |
TESTING.asciidoc | -rw-r--r-- | 32.5 KB |
Vagrantfile | -rw-r--r-- | 14.9 KB |
build.gradle | -rw-r--r-- | 21.2 KB |
gradle.properties | -rw-r--r-- | 876 bytes |
gradlew | -rwxr-xr-x | 5.6 KB |
gradlew.bat | -rw-r--r-- | 2.7 KB |
settings.gradle | -rw-r--r-- | 4.4 KB |