Text Categorization

Download the Reuters collection. It contains a training set covering 10 categories and a test set. Using the Lucene search engine, implement the k-NN algorithm to categorize the documents in the test set. For the categorization, use the following relations:
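A minimal Python sketch of k-NN categorization follows. It may differ from the relations the assignment requires. It assumes one common decision rule, similarity-weighted voting: each of the k training documents most similar to a test document (cosine similarity over TF-IDF vectors) adds its similarity to its own category's score, and the highest-scoring category is chosen. It also assumes single-label documents and a naive regex tokenizer. In the assignment, Lucene would retrieve the k nearest training documents, for instance by querying an index of the training set with the test document's text (Lucene's `MoreLikeThis` class is one option). Here that step is replaced by an in-memory scan. The function names (`tokenize`, `build_idf`, `tfidf`, `cosine`, `knn_classify`) and the toy documents are hypothetical.

```python
import heapq
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    # Assumption: lowercase alphabetic tokens only; Lucene would use an Analyzer.
    return re.findall(r"[a-z]+", text.lower())


def build_idf(token_lists):
    # Smoothed IDF: log(N / df) + 1, so a term found in every document keeps weight 1.
    n = len(token_lists)
    df = Counter()
    for tokens in token_lists:
        df.update(set(tokens))
    return {term: math.log(n / count) + 1.0 for term, count in df.items()}


def tfidf(tokens, idf):
    # Raw term frequency times IDF, L2-normalized; terms unseen in training are dropped.
    tf = Counter(tokens)
    vec = {t: c * idf[t] for t, c in tf.items() if t in idf}
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}


def cosine(u, v):
    # Vectors are already unit length, so the dot product is the cosine similarity.
    if len(u) > len(v):
        u, v = v, u
    return sum(w * v.get(t, 0.0) for t, w in u.items())


def knn_classify(text, train_vecs, train_labels, idf, k=3):
    # Stand-in for the Lucene retrieval step: score every training document.
    q = tfidf(tokenize(text), idf)
    scored = ((cosine(q, vec), label) for vec, label in zip(train_vecs, train_labels))
    neighbours = heapq.nlargest(k, scored, key=lambda pair: pair[0])
    # Assumed decision rule: each neighbour votes for its category with its similarity.
    votes = defaultdict(float)
    for sim, label in neighbours:
        votes[label] += sim
    return max(votes, key=votes.get) if votes else None


# Hypothetical toy data in place of the Reuters training set.
train = [
    ("crude oil prices rose as opec cut output", "crude"),
    ("opec ministers discuss oil production quotas", "crude"),
    ("wheat harvest exports to china increased", "grain"),
    ("corn and wheat grain shipments delayed", "grain"),
    ("company reports quarterly earnings profit rise", "earn"),
    ("net profit and earnings per share up", "earn"),
]
train_tokens = [tokenize(text) for text, _ in train]
idf = build_idf(train_tokens)
train_vecs = [tfidf(tokens, idf) for tokens in train_tokens]
train_labels = [label for _, label in train]

print(knn_classify("opec oil output", train_vecs, train_labels, idf))  # prints: crude
```

In a full run, `train` would hold the Reuters training documents and each test document would be passed to `knn_classify`. The value of k is a free parameter: a very small k is sensitive to single noisy neighbours, while a large k tends to favour the most frequent categories.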

Evaluate your results by calculating the accuracy, defined as the ratio of the number of correctly classified documents to the total number of documents in the test set.
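The accuracy measure can be computed with a short helper. This is a minimal Python sketch, assuming exactly one predicted and one gold category per test document (multi-label Reuters documents would need a different matching rule). The function name `accuracy` and the example labels are hypothetical.

```python
def accuracy(predicted, gold):
    # Accuracy = correctly classified test documents / all test documents.
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    if not gold:
        raise ValueError("the test set is empty")
    correct = sum(1 for p, g in zip(predicted, gold) if p == g)
    return correct / len(gold)


# Hypothetical labels: three of the four predictions match the gold categories.
print(accuracy(["crude", "grain", "earn", "grain"],
               ["crude", "grain", "earn", "earn"]))  # prints: 0.75
```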