Token Filters
Token filters for Japanese text analysis provided by the kuromoji plugin.
Installation
```bash
npm install @dynamosearch/plugin-analysis-kuromoji
```

Available Token Filters
KuromojiBaseFormFilter
Converts tokens to their base (dictionary) form using morphological analysis.
Best for: improving recall by matching different inflections of the same word, and normalizing Japanese verbs and adjectives.
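Kuromoji's morphological analysis attaches a dictionary form to each token, so base-form filtering amounts to substituting that form for the surface text. The following is a minimal sketch of the idea, not the plugin's implementation. The `MorphToken` shape is a hypothetical stand-in modeled on the fields kuromoji.js attaches to tokens, and the tokens for 食べました ("ate") are written by hand rather than produced by an analyzer.

```typescript
// Hypothetical token shape, modeled on kuromoji.js token fields (assumption).
interface MorphToken {
  surface_form: string; // text as it appears in the input
  pos: string;          // top-level part-of-speech tag, e.g. "動詞" (verb)
  basic_form: string;   // dictionary form, or "*" when none is known
}

// Replace each token's text with its dictionary form when one is available,
// falling back to the surface form for unknown words.
function baseFormFilter(tokens: MorphToken[]): string[] {
  return tokens.map((t) =>
    t.basic_form && t.basic_form !== "*" ? t.basic_form : t.surface_form,
  );
}

// 食べました ("ate"), split by hand into 食べ + まし + た.
const tokens: MorphToken[] = [
  { surface_form: "食べ", pos: "動詞", basic_form: "食べる" },
  { surface_form: "まし", pos: "助動詞", basic_form: "ます" },
  { surface_form: "た", pos: "助動詞", basic_form: "た" },
];

console.log(baseFormFilter(tokens)); // [ '食べる', 'ます', 'た' ]
```

Because 食べる is indexed instead of 食べ, a query for 食べる can match 食べました, 食べない, and other inflections.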
KuromojiPartOfSpeechStopFilter
Removes tokens based on their part-of-speech tags.
Best for: removing grammatical particles and other function words so that matching focuses on content words, which improves precision.
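Part-of-speech stop filtering drops a token when its tag appears in a configured stop-tag set. The sketch below is an illustration under stated assumptions, not the plugin's API. `PosToken`, `DEFAULT_STOP_TAGS`, and `partOfSpeechStopFilter` are hypothetical names, and the tags on 東京に行きました ("went to Tokyo") are assigned by hand.

```typescript
// Hypothetical token shape carrying only what the filter needs.
interface PosToken {
  surface_form: string;
  pos: string; // top-level part-of-speech tag
}

// Hypothetical default stop tags: particles (助詞) and auxiliary verbs (助動詞).
const DEFAULT_STOP_TAGS: ReadonlySet<string> = new Set(["助詞", "助動詞"]);

// Keep only tokens whose part-of-speech tag is not in the stop-tag set.
function partOfSpeechStopFilter(
  tokens: PosToken[],
  stopTags: ReadonlySet<string> = DEFAULT_STOP_TAGS,
): PosToken[] {
  return tokens.filter((t) => !stopTags.has(t.pos));
}

// 東京に行きました ("went to Tokyo"), tagged by hand.
const sentence: PosToken[] = [
  { surface_form: "東京", pos: "名詞" },
  { surface_form: "に", pos: "助詞" },
  { surface_form: "行き", pos: "動詞" },
  { surface_form: "まし", pos: "助動詞" },
  { surface_form: "た", pos: "助動詞" },
];

console.log(partOfSpeechStopFilter(sentence).map((t) => t.surface_form)); // [ '東京', '行き' ]
```

Kuromoji-based analyzers such as Lucene's typically match hierarchical tags (for example 助詞-格助詞-一般); the sketch keeps only the top-level tag for brevity.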
JapaneseStopFilter
Removes common Japanese stop words (similar to an English stop word filter).
Best for: removing very common words to reduce index size and focus on meaningful content.
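Unlike the part-of-speech filter, a stop word filter matches the token text itself, regardless of its grammatical role. A minimal sketch, assuming a small hand-picked stop list; the plugin's actual word list is not documented here and may differ, and `japaneseStopFilter` is a hypothetical name.

```typescript
// Illustrative stop word set only; not the plugin's built-in list.
const STOP_WORDS: ReadonlySet<string> = new Set([
  "の", "に", "は", "を", "た", "が", "で", "て", "と", "です", "ます",
]);

// Drop every term that appears verbatim in the stop word set.
function japaneseStopFilter(
  terms: string[],
  stopWords: ReadonlySet<string> = STOP_WORDS,
): string[] {
  return terms.filter((term) => !stopWords.has(term));
}

// 私は寿司が好きです ("I like sushi"), already tokenized.
const terms = ["私", "は", "寿司", "が", "好き", "です"];
console.log(japaneseStopFilter(terms)); // [ '私', '寿司', '好き' ]
```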
KuromojiKatakanaStemFilter
Removes trailing prolonged sound marks (ー) from katakana words.
Best for: normalizing variant spellings of katakana loanwords, such as コンピューター and コンピュータ.
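The sketch below illustrates katakana stemming under assumptions borrowed from Lucene's JapaneseKatakanaStemFilter, not from this plugin. It strips a single trailing ー, and only from terms written entirely in katakana that are at least four characters long, so short words such as キー ("key") are left alone. `katakanaStemFilter`, `isKatakana`, and the `minimumLength` parameter are hypothetical names.

```typescript
const PROLONGED_SOUND_MARK = "\u30FC"; // ー

// True if every code unit lies in the Katakana block (U+30A0 to U+30FF), which includes ー.
function isKatakana(term: string): boolean {
  for (let i = 0; i < term.length; i++) {
    const c = term.charCodeAt(i);
    if (c < 0x30a0 || c > 0x30ff) return false;
  }
  return term.length > 0;
}

// Strip one trailing ー from all-katakana terms of at least minimumLength characters.
// The default of 4 mirrors Lucene's default (assumption for this plugin).
function katakanaStemFilter(terms: string[], minimumLength = 4): string[] {
  return terms.map((term) =>
    term.length >= minimumLength && isKatakana(term) && term.endsWith(PROLONGED_SOUND_MARK)
      ? term.slice(0, -1)
      : term,
  );
}

console.log(katakanaStemFilter(["コンピューター", "サーバー", "キー", "東京"]));
// [ 'コンピュータ', 'サーバ', 'キー', '東京' ]
```

The length threshold keeps short words with a meaningful final ー from being truncated.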