Token Filters

Token filters for Japanese text analysis provided by the kuromoji plugin.

Installation

```bash
npm install @dynamosearch/plugin-analysis-kuromoji
```

Converts tokens to their base (dictionary) form using morphological analysis.

Best for: Improving recall by matching different forms of the same word, normalizing Japanese verbs and adjectives
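The idea can be sketched without the plugin. This is a minimal illustration, not the plugin's implementation: the `BaseFormToken` shape and the `toBaseForm` helper are hypothetical names, and the analysis of 食べました is written by hand rather than produced by a tokenizer.

```typescript
// Hypothetical token shape; the plugin's real token type may differ.
interface BaseFormToken {
  text: string;      // surface form as it appears in the input
  baseForm?: string; // dictionary form from morphological analysis, if known
}

// Replace each token's text with its base form when one is available.
function toBaseForm(tokens: BaseFormToken[]): BaseFormToken[] {
  return tokens.map((t) => (t.baseForm ? { ...t, text: t.baseForm } : t));
}

// Hand-written analysis of 食べました ("ate"): 食べ + まし + た
const analyzed: BaseFormToken[] = [
  { text: "食べ", baseForm: "食べる" },
  { text: "まし", baseForm: "ます" },
  { text: "た", baseForm: "た" },
];

const normalized = toBaseForm(analyzed).map((t) => t.text);
console.log(normalized); // [ '食べる', 'ます', 'た' ]
```

With this normalization, a query for 食べる can match documents that contain 食べました or 食べた, which is where the recall gain comes from.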

Removes tokens based on their part-of-speech tags.

Best for: Removing grammatical particles and function words, focusing on content words, improving precision
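As a rough sketch of the technique, the filter below drops tokens whose tag is in a stop set. The `TaggedToken` shape and `removeByPartOfSpeech` are hypothetical, and the tags are simplified to top-level categories (助詞 for particles, 助動詞 for auxiliary verbs); real kuromoji dictionaries use finer-grained, hierarchical tags, so the plugin's configuration may look different.

```typescript
// Hypothetical token shape with a simplified, top-level part-of-speech tag.
interface TaggedToken {
  text: string;
  pos: string;
}

// Keep only tokens whose part-of-speech tag is NOT in the stop set.
function removeByPartOfSpeech(
  tokens: TaggedToken[],
  stopTags: Set<string>
): TaggedToken[] {
  return tokens.filter((t) => !stopTags.has(t.pos));
}

// Hand-tagged tokens for 私は本を読む ("I read a book").
const tagged: TaggedToken[] = [
  { text: "私", pos: "名詞" },
  { text: "は", pos: "助詞" },
  { text: "本", pos: "名詞" },
  { text: "を", pos: "助詞" },
  { text: "読む", pos: "動詞" },
];

const contentWords = removeByPartOfSpeech(
  tagged,
  new Set(["助詞", "助動詞"])
).map((t) => t.text);
console.log(contentWords); // [ '私', '本', '読む' ]
```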

Removes common Japanese stop words (similar to an English stop-word filter).

Best for: Removing very common words, reducing index size, focusing on meaningful content
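A stop-word filter reduces to a set lookup on the token text. The sketch below assumes a small illustrative stop list and a hypothetical `removeStopWords` helper; neither reflects the plugin's default list or API.

```typescript
// Illustrative stop list only; NOT the plugin's default list.
const stopWords = new Set(["の", "に", "は", "を", "が", "です", "ます"]);

// Drop every term that appears in the stop set.
function removeStopWords(terms: string[], stops: Set<string>): string[] {
  return terms.filter((term) => !stops.has(term));
}

// Terms for 東京の地図を見る ("look at a map of Tokyo"), already tokenized.
const kept = removeStopWords(["東京", "の", "地図", "を", "見る"], stopWords);
console.log(kept); // [ '東京', '地図', '見る' ]
```

Unlike a part-of-speech filter, this approach matches on the exact term, so it needs no tagging information but only removes the words that are explicitly listed.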

Removes trailing prolonged sound marks (ー) from katakana words.

Best for: Normalizing katakana loanword variations, handling inconsistent katakana spelling
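The normalization can be sketched as a small string function. Everything here is an assumption made for illustration: `stemKatakana` is a hypothetical helper, it strips a single trailing mark, and the four-character minimum is modeled on the default of Lucene's katakana stem filter rather than taken from this plugin.

```typescript
const PROLONGED_SOUND_MARK = "ー"; // U+30FC
const KATAKANA_ONLY = /^[\u30A0-\u30FF]+$/; // Katakana Unicode block

// Strip one trailing prolonged sound mark from katakana-only terms.
// minLength is an assumed threshold that leaves short words such as コピー alone.
function stemKatakana(term: string, minLength: number = 4): string {
  const isCandidate =
    term.length >= minLength &&
    KATAKANA_ONLY.test(term) &&
    term.slice(-1) === PROLONGED_SOUND_MARK;
  return isCandidate ? term.slice(0, -1) : term;
}

console.log(stemKatakana("サーバー")); // サーバ
console.log(stemKatakana("コンピューター")); // コンピュータ
console.log(stemKatakana("コピー")); // コピー (shorter than minLength, unchanged)
```

After stemming, サーバー and サーバ index to the same term, so either spelling in a query finds both.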