How to run the Axiomatic similarity function using Lucene

Step 1: Download Lucene version 2.4 from here.

Step 2: Download the jar file from here. You can also download the Javadoc for this package from here.

Step 3: Import the package src/org/ninit/models/bm25 into the Lucene code.

Step 4: Open the class BM25TermScorer.java in the package src/org/ninit/models/bm25 and replace the method public float score() with the following version.

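public float score() throws IOException {
   // With the default similarity and no boosts, the norm is 1/sqrt(numTerms),
   // so inverting its square recovers an approximate document length.
   float norm = Similarity.decodeNorm(this.norm[this.doc()]);
   float length = 1 / (norm * norm);

   // Denominator: tf + 0.5 + 0.5 * (document length / average length).
   float den = (0.5f * length) / this.av_length;
   den = den + this.termDocs.freq() + 0.5f;

   // Numerator: tf * (N / df)^0.35. The cast forces floating-point division;
   // int / int would truncate the ratio before the power is taken.
   float num = (float) this.reader.numDocs()
         / this.reader.docFreq(this.term.getTerm());
   num = (float) Math.pow(num, 0.35f);
   num = num * this.termDocs.freq();

   return num / den;
}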
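The arithmetic in score() can be tried outside Lucene. The following is a minimal sketch, not part of the BM25 package: the class name AxiomaticScoreSketch, the helper lengthFromNorm, and the sample numbers are assumptions made for illustration, while the constants 0.5 and 0.35 and the formula itself are taken from the method above.

```java
// Hypothetical stand-alone sketch of the arithmetic in score(); no Lucene needed.
final class AxiomaticScoreSketch {

    // Constants copied from the score() method above.
    static final float S = 0.5f;
    static final float K = 0.35f;

    /** Recovers an approximate field length from a norm of 1/sqrt(numTerms). */
    static float lengthFromNorm(float norm) {
        return 1 / (norm * norm);
    }

    /**
     * score = tf * (N / df)^K / (tf + S + S * docLength / avgLength)
     * The ratio N / df is computed in floating point so it is not truncated.
     */
    static float score(int tf, float docLength, float avgLength,
                       int numDocs, int docFreq) {
        float den = tf + S + (S * docLength) / avgLength;
        float idf = (float) Math.pow((float) numDocs / docFreq, K);
        return (idf * tf) / den;
    }

    public static void main(String[] args) {
        // Sample values chosen only for illustration.
        float length = lengthFromNorm(0.1f); // about 100 terms
        System.out.println("length = " + length);
        System.out.println("score  = " + score(3, length, 100f, 1000, 10));
    }
}
```

With a term frequency of 3, a document of average length, 1000 documents and a document frequency of 10, the denominator is 4 and the score comes out at roughly 3.76.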


Step 5: The following method computes the average document length (avgdl). The first parameter is an IndexReader opened on the index of the collection, and the second is the name of the field to be searched. Documents that have no term vector for that field are skipped when summing, so the field should be indexed with term vectors. Step 6 calls this method to set the average document length automatically.

public static float getAvgLength(IndexReader reader, String field) throws IOException {
   int sum = 0;
   for (int i = 0; i < reader.numDocs(); i++) {
      TermFreqVector tfv = reader.getTermFreqVector(i, field);
      if (tfv != null) {
         int[] tfs = tfv.getTermFrequencies();
         for (int j = 0; j < tfv.size(); j++) {
            sum = sum + tfs[j];
         }
      }
   }
   float avg = (float) sum / reader.numDocs();
   System.out.println("average length = " + avg);
   return avg;
}
//end of method
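The averaging logic of getAvgLength can be checked on plain arrays. This is a minimal sketch under stated assumptions: the class AverageLengthSketch and its input format (one int[] of term frequencies per document, null where a document has no term vector) are hypothetical stand-ins for what the IndexReader returns.

```java
// Hypothetical stand-alone sketch of the averaging done in getAvgLength().
final class AverageLengthSketch {

    /**
     * Sums all term frequencies and divides by the number of documents.
     * A null entry stands for a document without a term vector: it adds
     * nothing to the sum but still counts in the divisor, as in getAvgLength().
     */
    static float averageLength(int[][] termFreqsPerDoc) {
        if (termFreqsPerDoc.length == 0) {
            return 0f; // avoid dividing by zero on an empty collection
        }
        long sum = 0; // long, so large collections do not overflow
        for (int[] tfs : termFreqsPerDoc) {
            if (tfs == null) {
                continue;
            }
            for (int tf : tfs) {
                sum += tf;
            }
        }
        return (float) sum / termFreqsPerDoc.length;
    }

    public static void main(String[] args) {
        // Three documents: lengths 4 and 5, plus one without a term vector.
        int[][] docs = { {2, 1, 1}, null, {3, 2} };
        System.out.println("average length = " + averageLength(docs));
    }
}
```

Because documents without a term vector still count in the divisor, a collection in which many documents lack the field yields a lower average than the mean length of the documents that do have it.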


Step 6: Now you are ready to search using the Axiomatic score function.

First, set the average document length automatically with the following line:

BM25Parameters.setAverageLength(field, getAvgLength(reader, field));
// the variable field is a string holding the name of the searched field.


You can try the following example by creating a new Java file in your Lucene code, for instance lucene/demo/Axiomatic_Search.java.
The example searches for the query "Cystic hydroma" and prints the top 10 results ranked by the Axiomatic score function.

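import java.io.IOException;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.TermFreqVector;
import org.apache.lucene.queryParser.ParseException;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.Searcher;
import org.apache.lucene.search.TopDocs;
// BM25 classes from the package imported in Step 3
import org.ninit.models.bm25.BM25BooleanQuery;
import org.ninit.models.bm25.BM25Parameters;

public class Axiomatic_Search {

public static float getAvgLength(IndexReader reader, String field) throws IOException {
   int sum = 0;
   for (int i = 0; i < reader.numDocs(); i++) {
      TermFreqVector tfv = reader.getTermFreqVector(i, field);
      if (tfv != null) {
         int[] tfs = tfv.getTermFrequencies();
         for (int j = 0; j < tfv.size(); j++) {
            sum = sum + tfs[j];
         }
      }
   }
   float avg = (float) sum / reader.numDocs();
   System.out.println("average length = " + avg);
   return avg;
}
//end of method

public static void main(String[] args) throws IOException, ParseException {
   String index = "index";
   String field = "Search_Field";

   // main declares IOException, so a missing or corrupt index stops the
   // program here instead of leaving reader set to null.
   IndexReader reader = IndexReader.open(index);
   Searcher searcher = new IndexSearcher(index);
   Analyzer analyzer = new StandardAnalyzer();

   // the second parameter calls getAvgLength and computes the average length automatically
   BM25Parameters.setAverageLength(field, getAvgLength(reader, field));
   BM25BooleanQuery query = new BM25BooleanQuery("Cystic hydroma", field, analyzer);

   System.out.println("Searching for: " + query.toString(field));
   TopDocs top = searcher.search(query, 10);
   ScoreDoc[] docs = top.scoreDocs;
   // docs may hold fewer than 10 hits, so iterate over its actual length
   for (int i = 0; i < docs.length; i++) {
      System.out.println("the document with id= " + docs[i].doc + " has score =" + docs[i].score);
   }

   searcher.close();
   reader.close();
}
//end of main

}
//end of class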




