The Axiomatic similarity function


The Axiomatic similarity function [1,2] is defined by


where
  1. tf(t,d) is the term frequency, defined as the number of times the term t appears in the document d.

  2. |d| is the length of the document d in words (terms).
    In our implementation, |d| is defined as |d| = 1/(norm*norm), where norm is the length-normalization score factor used by Lucene's default similarity function.

  3. avgdl is the average document length in the collection.

  4. tf(t,q) is the query term frequency, defined as the total number of times term t appears in the query q.

  5. df(t) is the document frequency of the query term t, defined as the number of documents in which the term t appears.
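
The quantities listed above can be illustrated on a toy collection. The following is a minimal Python sketch, not the Lucene implementation: the function names (doc_length_from_norm, collection_stats), the whitespace tokenization, and the sample documents are all hypothetical. It also assumes that norm equals 1/sqrt(number of terms), which is what Lucene's classic default similarity computes when no boosts are applied; in Lucene itself the norm is stored with reduced precision, so the recovered |d| is only approximate there. The scoring formula itself is not computed here.

```python
import math
from collections import Counter


def doc_length_from_norm(norm):
    """Recover |d| from the norm, using |d| = 1 / (norm * norm) (item 2)."""
    return 1.0 / (norm * norm)


def collection_stats(docs, query):
    """Compute tf(t,d), |d|, avgdl, tf(t,q) and df(t) for tokenized input.

    docs  -- list of documents, each a list of terms (hypothetical toy data)
    query -- list of query terms
    """
    doc_tf = [Counter(d) for d in docs]              # tf(t,d) per document
    lengths = [len(d) for d in docs]                 # |d| in terms
    avgdl = sum(lengths) / len(lengths)              # average document length
    query_tf = Counter(query)                        # tf(t,q)
    # df(t): number of documents in which query term t appears
    df = {t: sum(1 for tf in doc_tf if t in tf) for t in query_tf}
    return doc_tf, lengths, avgdl, query_tf, df


# Toy collection, whitespace-tokenized (an assumption for illustration only).
docs = [
    "the quick brown fox".split(),
    "the lazy dog sleeps near the fox den".split(),
    "a dog barks".split(),
]
query = "the fox the".split()

doc_tf, lengths, avgdl, query_tf, df = collection_stats(docs, query)

# Assumed norm: 1 / sqrt(number of terms), without boosts or lossy encoding.
norms = [1.0 / math.sqrt(n) for n in lengths]
recovered = [doc_length_from_norm(n) for n in norms]
```

With this assumed norm, recovered matches lengths up to floating-point error, which is why 1/(norm*norm) can stand in for the document length.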


For implementation details follow the link: How to run the Axiomatic similarity function using Lucene

References

  1. H. Fang and C. Zhai. An exploration of axiomatic approaches to information retrieval. In Proceedings of the 2005 ACM SIGIR Conference on Research and Development in Information Retrieval, 2005.
  2. H. Fang and C. Zhai. Evaluation of the default similarity function in Lucene. July 15, 2007.

