The Axiomatic similarity function

The Axiomatic similarity function [1,2] is defined in terms of the following quantities:

  1. tf(t,d) is the term frequency, defined as the number of times the term t appears in the document d.

  2. |d| is the length of the document d in words (terms).
    In our implementation, |d| is defined as |d| = 1/(norm*norm), where norm is the score factor used by Lucene's default similarity function.

  3. avgdl is the average document length in the collection.

  4. tf(t,q) is the query term frequency, defined as the total number of times term t appears in the query q.

  5. df(t) is the document frequency of the query term t, defined as the number of documents in which the term t appears.
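
The quantities listed above can be computed for a toy collection as in the following minimal sketch. It is illustrative only and is not the Lucene implementation: the whitespace tokenizer, the helper names (tokenize, collection_stats, length_from_norm) and the sample documents are all hypothetical. The sketch also assumes that norm equals 1/sqrt(number of terms) with no boost applied; Lucene stores norms in a compressed form, so a length recovered from a stored norm is only approximate.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer; a stand-in for a real analyzer."""
    return text.lower().split()


def collection_stats(docs):
    """Return per-document tf(t,d) counters, df(t), and avgdl.

    `docs` is a list of token lists, one per document.
    """
    tf = [Counter(tokens) for tokens in docs]
    df = Counter()
    for counts in tf:
        # A document contributes at most 1 to df(t), however often t occurs in it.
        df.update(counts.keys())
    avgdl = sum(len(tokens) for tokens in docs) / len(docs)
    return tf, df, avgdl


def length_from_norm(norm):
    """Recover |d| = 1/(norm*norm), following the definition of |d| above."""
    return 1.0 / (norm * norm)


# Hypothetical three-document collection and query.
docs = [tokenize(t) for t in [
    "lucene scores documents",
    "axiomatic retrieval functions score documents by axioms",
    "lucene lucene index",
]]
query = tokenize("lucene documents lucene")

tf_d, df, avgdl = collection_stats(docs)
tf_q = Counter(query)  # tf(t,q): occurrences of t in the query

# Assumption: norm = 1/sqrt(number of terms), without boosts or lossy encoding.
assumed_norm = 1.0 / math.sqrt(len(docs[1]))
recovered_length = length_from_norm(assumed_norm)

print("tf(lucene, d3) =", tf_d[2]["lucene"])
print("df(lucene)     =", df["lucene"])
print("tf(lucene, q)  =", tf_q["lucene"])
print("avgdl          =", avgdl)
print("|d2| from norm =", recovered_length)
```

Under the stated assumption about norm, 1/(norm*norm) gives back the number of terms in the document, which is why it can serve as |d|.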

For implementation details, follow the link: How to run the Axiomatic similarity function using Lucene


  1. H. Fang and C. Zhai. An exploration of axiomatic approaches to information retrieval. In Proceedings of the 2005 ACM SIGIR Conference on Research and Development in Information Retrieval, 2005.
  2. H. Fang and C. Zhai. Evaluation of the default similarity function in Lucene. July 15, 2007.