How to change the Similarity function of the Lucene Search Engine

Alexandros Stougiannis
Information Processing Laboratory
Athens University of Economics and Business

Apache Lucene is a high-performance, full-featured text search engine library written entirely in Java. It is suitable for nearly any application that requires full-text search, especially cross-platform applications. Apache Lucene is an open source project available for free download. In the following pages we explain how to change the similarity function of the Lucene search engine.

Building on the implementation by Joaquin Perez-Iglesias, which integrates the probabilistic model BM25/BM25F into Lucene, we have implemented three new similarity functions: the BM25 similarity function of the Okapi system; a version that uses a normalized term frequency formula and applies pivoted document normalization; and what we call the axiomatic similarity function.
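
To make the first of these functions concrete, the following is a minimal, self-contained sketch of a per-term BM25 weight in plain Java. It does not use the Lucene API and is not the implementation described in this document. The class name Bm25Sketch, the method names, and the parameter values k1 = 1.2 and b = 0.75 are assumptions chosen for illustration. The formula is one common textbook form of BM25, and the query term frequency component is omitted.

```java
public final class Bm25Sketch {
    // Assumed parameter values, commonly used in the literature;
    // they are not taken from this document.
    public static final double K1 = 1.2;
    public static final double B = 0.75;

    private Bm25Sketch() {}

    /** Inverse document frequency: log((N - df + 0.5) / (df + 0.5)). */
    public static double idf(long numDocs, long docFreq) {
        return Math.log((numDocs - docFreq + 0.5) / (docFreq + 0.5));
    }

    /**
     * BM25 weight of one query term for one document.
     *
     * @param tf        occurrences of the term in the document
     * @param docLen    length of the document (in terms)
     * @param avgDocLen average document length in the collection
     * @param numDocs   number of documents in the collection
     * @param docFreq   number of documents containing the term
     */
    public static double termScore(double tf, double docLen, double avgDocLen,
                                   long numDocs, long docFreq) {
        // Length normalization: longer-than-average documents are penalized.
        double norm = K1 * (1.0 - B + B * docLen / avgDocLen);
        // Saturating term frequency component multiplied by the IDF.
        return idf(numDocs, docFreq) * (tf * (K1 + 1.0)) / (tf + norm);
    }

    public static void main(String[] args) {
        // Hypothetical collection statistics, for illustration only.
        long numDocs = 1000;
        long docFreq = 10;
        double avgDocLen = 100.0;
        System.out.println(termScore(1, 100.0, avgDocLen, numDocs, docFreq));
        System.out.println(termScore(1, 200.0, avgDocLen, numDocs, docFreq));
    }
}
```

In this sketch, a document of exactly average length that contains the term once receives a weight equal to the IDF, and the weight decreases as the document grows longer than average. A document's score for a multi-term query would be the sum of the per-term weights.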