KuromojiAnalyzer
Japanese text analyzer using Kuromoji morphological analysis.
Import
typescript
import KuromojiAnalyzer from '@dynamosearch/plugin-analysis-kuromoji/analyzers/KuromojiAnalyzer';
Installation
bash
npm install @dynamosearch/plugin-analysis-kuromoji
Constructor
typescript
new KuromojiAnalyzer()
No parameters required.
Pipeline
- Tokenizer: KuromojiTokenizer
- Filters: LowerCaseFilter, CJKWidthFilter
Example
typescript
import KuromojiAnalyzer from '@dynamosearch/plugin-analysis-kuromoji/analyzers/KuromojiAnalyzer';

const analyzer = new KuromojiAnalyzer();
const tokens = await analyzer.analyze('東京タワーに行きました');
// [
// { token: '東京', startOffset: 0, endOffset: 2, position: 0 },
// { token: 'タワー', startOffset: 2, endOffset: 5, position: 1 },
// { token: 'に', startOffset: 5, endOffset: 6, position: 2 },
// { token: '行き', startOffset: 6, endOffset: 8, position: 3 },
// { token: 'まし', startOffset: 8, endOffset: 10, position: 4 },
// { token: 'た', startOffset: 10, endOffset: 11, position: 5 }
// ]
How It Works
- KuromojiTokenizer: Performs Japanese morphological analysis to segment text into words
- LowerCaseFilter: Converts alphabetic characters to lowercase
- CJKWidthFilter: Normalizes full-width/half-width characters
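The effect of the two filter stages can be approximated without the plugin. The sketch below is an illustration only, not the plugin's implementation: the `Token` interface and the `lowerCase` and `normalizeWidth` helpers are hypothetical names, and Unicode NFKC normalization stands in for CJKWidthFilter. NFKC folds full-width ASCII to half-width and half-width katakana to full-width, but it also rewrites other compatibility characters, so the real filter may behave differently at the edges.

```typescript
// Hypothetical token shape, mirroring the fields shown in the Example output.
interface Token {
  token: string;
  startOffset: number;
  endOffset: number;
  position: number;
}

// Stand-in for LowerCaseFilter: lowercases the token text, leaves offsets untouched.
function lowerCase(tokens: Token[]): Token[] {
  return tokens.map((t) => ({ ...t, token: t.token.toLowerCase() }));
}

// Stand-in for CJKWidthFilter (assumption: NFKC is a close approximation).
// Full-width ASCII becomes half-width; half-width katakana becomes full-width.
function normalizeWidth(tokens: Token[]): Token[] {
  return tokens.map((t) => ({ ...t, token: t.token.normalize('NFKC') }));
}

// Hand-written input tokens: full-width Latin letters and half-width katakana.
const input: Token[] = [
  { token: 'ＴＯＫＹＯ', startOffset: 0, endOffset: 5, position: 0 },
  { token: 'ﾀﾜｰ', startOffset: 5, endOffset: 8, position: 1 },
];

// Apply the filters in the same order as the pipeline: lowercase, then width.
const filtered = normalizeWidth(lowerCase(input));
console.log(filtered.map((t) => t.token)); // [ 'tokyo', 'タワー' ]
```

Because both stages only rewrite the token text, offsets and positions still refer to the original input string.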
Best For
- Japanese text
- Mixed Japanese/English content
- Japanese search applications
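In a search application, the `startOffset` and `endOffset` values from the Example output can drive hit highlighting. The following is a minimal sketch, assuming the offsets are string indices into the original input (which is consistent with the example above). The `Token` interface and the `highlight` helper are hypothetical and not part of the plugin, and the token data is copied from the example output rather than produced by calling the analyzer.

```typescript
// Hypothetical token shape, mirroring the fields shown in the Example output.
interface Token {
  token: string;
  startOffset: number;
  endOffset: number;
  position: number;
}

const text = '東京タワーに行きました';

// Copied from the documented example output.
const tokens: Token[] = [
  { token: '東京', startOffset: 0, endOffset: 2, position: 0 },
  { token: 'タワー', startOffset: 2, endOffset: 5, position: 1 },
  { token: 'に', startOffset: 5, endOffset: 6, position: 2 },
  { token: '行き', startOffset: 6, endOffset: 8, position: 3 },
  { token: 'まし', startOffset: 8, endOffset: 10, position: 4 },
  { token: 'た', startOffset: 10, endOffset: 11, position: 5 },
];

// Wrap every token whose text equals `term` in square brackets,
// using the offsets to slice the original string.
function highlight(source: string, toks: Token[], term: string): string {
  let result = '';
  let cursor = 0;
  for (const t of toks) {
    if (t.token !== term) continue;
    result += source.slice(cursor, t.startOffset);
    result += '[' + source.slice(t.startOffset, t.endOffset) + ']';
    cursor = t.endOffset;
  }
  return result + source.slice(cursor);
}

console.log(highlight(text, tokens, 'タワー')); // 東京[タワー]に行きました
```

Slicing by offset rather than searching for the token text keeps the highlight aligned with the original input even when a filter has changed the token's casing or character width.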