The Axiomatic similarity function
The Axiomatic similarity function [1,2] is defined by
where
- tf(t,d) is the term frequency, defined as the number of times the term t appears in the document d.
- |d| is the length of the document d in words (terms).
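In our implementation, |d| is defined as |d| = 1/(norm*norm), where norm is the score factor used by Lucene's default similarity function. With Lucene's default length normalization of 1/sqrt(number of terms), this recovers the document length in terms, apart from any boosts and the precision lost when Lucene stores the norm.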
- avgdl is the average document length in the collection.
- tf(t,q) is the query term frequency, defined as the total number of times term t appears in the query q.
- df(t) is the document frequency of the query term t, defined as the number of documents in which t appears.
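To show how the quantities above fit together, here is a minimal Python sketch. It assumes the F2-EXP variant from [1]; that choice is an assumption, and the page's actual formula may differ. The function names `doc_length_from_norm` and `f2exp_score`, the collection size `num_docs`, and the parameter values `s` and `k` are hypothetical and used only for illustration. The code is not taken from the Lucene implementation.

```python
def doc_length_from_norm(norm):
    """Recover |d| from a Lucene-style norm, using |d| = 1/(norm*norm)."""
    return 1.0 / (norm * norm)


def f2exp_score(query_tf, doc_tf, doc_len, avgdl, df, num_docs, s=0.5, k=0.35):
    """Score one document against one query (assumed F2-EXP form).

    query_tf : dict term -> tf(t,q)
    doc_tf   : dict term -> tf(t,d)
    doc_len  : |d|
    avgdl    : average document length in the collection
    df       : dict term -> df(t)
    num_docs : number of documents in the collection (assumption: not
               defined in the text above, but needed by F2-EXP)
    s, k     : illustrative parameter values, not prescribed by the text
    """
    score = 0.0
    for term, qtf in query_tf.items():
        tf = doc_tf.get(term, 0)
        term_df = df.get(term, 0)
        if tf == 0 or term_df == 0:
            continue  # only terms shared by query and document contribute
        idf_part = (num_docs / term_df) ** k
        tf_part = tf / (tf + s + s * doc_len / avgdl)
        score += qtf * idf_part * tf_part
    return score


# Usage: a one-term query against a four-term document.
length = doc_length_from_norm(0.5)
score = f2exp_score({"lucene": 1}, {"lucene": 2}, length, 4.0, {"lucene": 1}, 1)
```

The sketch keeps the length computation separate from scoring so that a different length definition can be substituted without touching the scoring loop.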
For implementation details, see: How to run the Axiomatic similarity function using Lucene
References
- H. Fang and C. Zhai. An exploration of axiomatic approaches to information retrieval. In Proceedings of the 2005 ACM SIGIR Conference on Research and Development in Information Retrieval, 2005.
- H. Fang and C. Zhai. Evaluation of the Default Similarity Function in Lucene. July 15, 2007.