
Analyzers

Analyzers convert text into searchable tokens through a pipeline of character filters, a tokenizer, and token filters.
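
The sketch below approximates that pipeline in plain Python; the stage names and the sample filters are illustrative only, not part of the library's API.

```python
import re

# Illustrative three-stage analysis pipeline (not the library's actual API).
def analyze(text):
    # 1. Character filters: clean up the raw text before tokenization.
    text = text.replace("&", " and ")
    # 2. Tokenizer: split the filtered text into raw tokens.
    tokens = re.findall(r"\w+", text)
    # 3. Token filters: transform or drop individual tokens.
    return [t.lower() for t in tokens]

print(analyze("Fish & Chips"))  # ['fish', 'and', 'chips']
```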

Available Analyzers

Analyzer

Base class for creating custom analyzers from a specified set of character filters, a tokenizer, and token filters.
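
A minimal sketch of how such a composition could look; the class, constructor arguments, and method names here are assumptions for illustration, not the actual signature.

```python
import re

# Hypothetical composition of a custom analyzer from its three kinds of
# components; argument and method names are assumptions, not the real API.
class CustomAnalyzer:
    def __init__(self, char_filters, tokenizer, token_filters):
        self.char_filters = char_filters
        self.tokenizer = tokenizer
        self.token_filters = token_filters

    def analyze(self, text):
        for char_filter in self.char_filters:
            text = char_filter(text)
        tokens = self.tokenizer(text)
        for token_filter in self.token_filters:
            tokens = token_filter(tokens)
        return tokens

analyzer = CustomAnalyzer(
    char_filters=[str.strip],
    tokenizer=lambda text: re.findall(r"\w+", text),
    token_filters=[lambda tokens: [t.lower() for t in tokens]],
)
print(analyzer.analyze("  Hello World  "))  # ['hello', 'world']
```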

StandardAnalyzer

English text analyzer with word tokenization, lowercase normalization, and optional stop word filtering.

Best for: English text, Western languages, general text search
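
A rough approximation of the resulting tokens in plain Python; the helper and the one-word stop list are illustrative, not the analyzer's real implementation.

```python
import re

# Approximates StandardAnalyzer output: word tokens, lowercased,
# with optional stop word removal (toy stop list for the example).
def standard_like(text, stop_words=None):
    tokens = [t.lower() for t in re.findall(r"\w+", text)]
    if stop_words:
        tokens = [t for t in tokens if t not in stop_words]
    return tokens

print(standard_like("The Quick Brown Fox!"))           # ['the', 'quick', 'brown', 'fox']
print(standard_like("The Quick Brown Fox!", {"the"}))  # ['quick', 'brown', 'fox']
```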

SimpleAnalyzer

Letter-based tokenization with automatic lowercasing.

Best for: Simple text tokenization, case-insensitive search without stop words
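
Approximate behavior, sketched in plain Python: only runs of letters survive, and everything else (digits, punctuation) acts as a separator.

```python
import re

# SimpleAnalyzer-style behavior: keep letter runs only, lowercased.
def simple_like(text):
    return [t.lower() for t in re.findall(r"[^\W\d_]+", text)]

print(simple_like("User42 logged-in at 9AM"))
# ['user', 'logged', 'in', 'at', 'am']
```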

WhitespaceAnalyzer

Splits text on whitespace characters.

Best for: Preserving punctuation and special characters, pre-tokenized input
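
Its behavior is essentially a whitespace split, as the sketch below shows; punctuation and case are left untouched.

```python
# WhitespaceAnalyzer-style behavior: split only on whitespace,
# keeping punctuation, case, and special characters intact.
def whitespace_like(text):
    return text.split()

print(whitespace_like("GET /api/v1/users?id=42 HTTP/1.1"))
# ['GET', '/api/v1/users?id=42', 'HTTP/1.1']
```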

KeywordAnalyzer

Treats the entire input as a single token for exact matching.

Best for: IDs and identifiers, categories and tags, exact string matching
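
The sketch below shows the effect: the whole field value becomes a single token, so only an exact full-string match will hit.

```python
# KeywordAnalyzer-style behavior: the entire input is one token.
def keyword_like(text):
    return [text]

print(keyword_like("SKU-2024-0001"))  # ['SKU-2024-0001']
```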

StopAnalyzer

Letter-based tokenization with lowercasing and stop word filtering.

Best for: English text with stop word removal, reducing index size
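
Approximate behavior in plain Python; the stop list here is a small assumed sample, not the analyzer's actual default list.

```python
import re

# StopAnalyzer-style behavior: letter-only tokens, lowercased,
# with stop words removed (assumed sample stop list).
STOP_WORDS = {"a", "an", "the", "of", "to", "and"}

def stop_like(text):
    tokens = [t.lower() for t in re.findall(r"[^\W\d_]+", text)]
    return [t for t in tokens if t not in STOP_WORDS]

print(stop_like("The history of the empire"))  # ['history', 'empire']
```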

PatternAnalyzer

Regex-based tokenization with optional lowercasing and stop word filtering.

Best for: Custom tokenization patterns, domain-specific text formats
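
A sketch of the idea, assuming the pattern matches the tokens themselves (the library may instead treat the pattern as a separator; check the actual API).

```python
import re

# PatternAnalyzer-style behavior: a user-supplied regex defines the tokens.
def pattern_like(text, pattern, lowercase=True):
    tokens = re.findall(pattern, text)
    return [t.lower() for t in tokens] if lowercase else tokens

# Tokenize comma-separated tags without splitting inside a tag.
print(pattern_like("machine learning,NLP,search engines", r"[^,]+"))
# ['machine learning', 'nlp', 'search engines']
```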

EnglishAnalyzer

Optimized analyzer for English text with stemming and stop word filtering.

Best for: English text search with stemming, handling English word variations
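
The toy rule below strips only a couple of suffixes, but it shows why stemming helps: inflected variants reduce to a shared stem, so they match each other at query time. The real analyzer uses a proper English stemming algorithm.

```python
# Toy suffix stripping, for illustration only; not the real stemmer.
def toy_stem(token):
    for suffix in ("ing", "es", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token

print([toy_stem(t) for t in ["search", "searching", "searches"]])
# ['search', 'search', 'search']
```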

FrenchAnalyzer

Optimized analyzer for French text with elision and stemming support.

Best for: French text search, handling French elisions
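
The sketch below illustrates elision handling only: the elided article or pronoun before the apostrophe is stripped so that, for example, "l'avion" matches "avion". The prefix list is an assumed sample; the real analyzer also lowercases and stems.

```python
# Illustration of French elision stripping (assumed prefix list).
ELISIONS = ("l'", "d'", "j'", "n'", "s'", "c'", "m'", "t'", "qu'")

def strip_elision(token):
    token = token.lower()
    for prefix in ELISIONS:
        if token.startswith(prefix):
            return token[len(prefix):]
    return token

print([strip_elision(t) for t in ["L'avion", "d'été", "arrive"]])
# ['avion', 'été', 'arrive']
```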

SpanishAnalyzer

Optimized analyzer for Spanish text with stemming support.

Best for: Spanish text search, Spanish word stemming
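
As with the English analyzer, the point of stemming is that inflected forms reduce to a shared stem. The toy rule below strips only plural endings and is purely illustrative; the real analyzer uses a full Spanish stemming algorithm.

```python
# Toy Spanish plural stripping, for illustration only; not the real stemmer.
def toy_stem_es(token):
    token = token.lower()
    for suffix in ("es", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token

print([toy_stem_es(t) for t in ["gato", "gatos", "ciudad", "ciudades"]])
# ['gato', 'gato', 'ciudad', 'ciudad']
```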
