Issue
We would like Elasticsearch to stop analyzing words that form the company's proper name.
For example, if our company is called "Smiths", we want to prevent Elasticsearch from stemming that word and dropping the trailing "s".
Environment
- Any Liferay DXP version
Resolution
How to customize the Elasticsearch analyzers configuration in Liferay
For an example of how to redefine an analyzer, see the article "How to create a custom Elasticsearch analyzer that is insensitive to accents".
How to handle some words in Elasticsearch to avoid being processed by the regular parsers
To keep certain words from being processed by the regular parsers, you can use the "stemmer override" token filter or the "keyword marker" token filter, both of which are described in the Elasticsearch documentation.
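As a rough illustration of the "keyword marker" option, the following Python sketch builds analysis settings in which the token "smiths" is marked as a keyword so that a later stemmer leaves it untouched. The names "company_name_marker", "english_stemmer" and "company_english" are hypothetical, and the choice of the standard tokenizer with an English stemmer is an assumption; adapt them to the analyzer you are actually redefining.

```python
import json

# Hypothetical filter and analyzer names, chosen only for this example.
keyword_marker_settings = {
    "analysis": {
        "filter": {
            # Tokens listed here are flagged as keywords and skipped by stemmers.
            "company_name_marker": {
                "type": "keyword_marker",
                "keywords": ["smiths"],
            },
            "english_stemmer": {"type": "stemmer", "language": "english"},
        },
        "analyzer": {
            "company_english": {
                "type": "custom",
                "tokenizer": "standard",
                # lowercase runs first so "Smiths" matches the keyword "smiths";
                # the marker must come before the stemmer to have any effect.
                "filter": ["lowercase", "company_name_marker", "english_stemmer"],
            }
        },
    }
}

print(json.dumps(keyword_marker_settings, indent=2))
```

With this option the word is indexed in its lowercased form ("smiths") instead of being reduced to a stem.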
Alternatively, there is the "mapping" character filter, which rewrites text before it reaches the tokenizer. The Elasticsearch article "Mapping character filter" shows an example that replaces ":(" with "_sad_". Keep in mind that the rewritten text is still processed by the tokenizer and token filters that follow.
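A minimal sketch of such a character filter, written as a Python dictionary that is printed as JSON. The ":(" mapping comes from the Elasticsearch example mentioned above; the names "emoticons" and "emoticon_text" are hypothetical, and the tokenizer and filter chain are assumptions made only to show where the character filter sits.

```python
import json

# Hypothetical char filter and analyzer names, for illustration only.
char_filter_settings = {
    "analysis": {
        "char_filter": {
            # Each mapping has the form "text to find => replacement text".
            "emoticons": {
                "type": "mapping",
                "mappings": [":( => _sad_"],
            }
        },
        "analyzer": {
            "emoticon_text": {
                "type": "custom",
                # Character filters run on the raw text, before the tokenizer.
                "char_filter": ["emoticons"],
                "tokenizer": "standard",
                "filter": ["lowercase"],
            }
        },
    }
}

print(json.dumps(char_filter_settings, indent=2))
```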
Given these options, if you want the word "Smiths" to be treated differently, you can set up a stemmer override so that "Smiths" is stored as a special keyword in Elasticsearch.
For example, you can apply a stemmer override with the rule "Smiths, smiths => __smiths__". The two "_" characters before and after the word keep the resulting token from being confused with ordinary words of the language.
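The rule above could be expressed in analysis settings roughly as follows. This is a sketch under assumptions: the names "company_name_override", "english_stemmer" and "company_english" are hypothetical, and the tokenizer and stemmer should match the analyzer you are redefining. The one firm constraint is that the override filter has to appear before any stemming filter in the chain.

```python
import json

# Hypothetical filter and analyzer names, chosen only for this example.
stemmer_override_settings = {
    "analysis": {
        "filter": {
            "company_name_override": {
                "type": "stemmer_override",
                # Rule taken from the article: both spellings map to one token.
                "rules": ["Smiths, smiths => __smiths__"],
            },
            "english_stemmer": {"type": "stemmer", "language": "english"},
        },
        "analyzer": {
            "company_english": {
                "type": "custom",
                "tokenizer": "standard",
                # The override must come before the stemmer so the stemmer
                # never sees (and never shortens) the company name.
                "filter": ["lowercase", "company_name_override", "english_stemmer"],
            }
        },
    }
}

print(json.dumps(stemmer_override_settings, indent=2))
```

Because the same analyzer is normally applied at search time, a query for "Smiths" is rewritten to the same "__smiths__" token and matches the indexed documents. Existing content has to be reindexed before the new analysis takes effect.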
Additional Information
- Liferay documentation:
- Elasticsearch documentation: