Download the Reuters collection. It contains a training set with 10 categories and a test set.
Use the Lucene search engine to implement the k-NN algorithm and categorize the documents in the test set.
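
The voting step of k-NN can be sketched as follows. This is only an illustration under stated assumptions: the k nearest training documents are taken to be already retrieved (for example by running the test document as a query against a Lucene index of the training set), each carrying a single category label and a similarity score, and the vote is assumed to be a plain sum of scores per category. The names KnnVote, Neighbor and classify are hypothetical, and the weighting may differ from the relations this assignment prescribes.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Score-weighted k-NN vote over already retrieved neighbours (illustrative sketch). */
final class KnnVote {

    /** One retrieved training document: its category label and similarity score. */
    static final class Neighbor {
        final String category;
        final double score;

        Neighbor(String category, double score) {
            this.category = category;
            this.score = score;
        }
    }

    /**
     * Returns the category with the largest summed score among the first k
     * neighbours, or null if the list is empty. The list is assumed to be
     * sorted by descending score, as a search engine would return it.
     */
    static String classify(List<Neighbor> neighbors, int k) {
        Map<String, Double> totals = new HashMap<>();
        int limit = Math.min(k, neighbors.size());
        for (int i = 0; i < limit; i++) {
            Neighbor n = neighbors.get(i);
            // Assumption: each neighbour votes with its raw similarity score.
            totals.merge(n.category, n.score, Double::sum);
        }
        String best = null;
        double bestTotal = Double.NEGATIVE_INFINITY;
        for (Map.Entry<String, Double> e : totals.entrySet()) {
            if (e.getValue() > bestTotal) {
                bestTotal = e.getValue();
                best = e.getKey();
            }
        }
        return best;
    }
}
```

Ties between categories are broken arbitrarily in this sketch; a real implementation would need an explicit tie-breaking rule.
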
For the categorization use the relations:
Evaluate your results by calculating the accuracy, defined as the ratio of the number of correctly classified documents to the total number of documents
in the test set.
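
The accuracy defined above can be computed with a short helper such as the one below. It is a minimal sketch that assumes each test document has exactly one true category and one predicted category, stored in two parallel lists; the names Accuracy and accuracy are hypothetical.

```java
import java.util.List;

/** Accuracy = correctly classified documents / all documents in the test set. */
final class Accuracy {

    /**
     * Compares predicted and true labels position by position.
     * Both lists must have the same non-zero length.
     */
    static double accuracy(List<String> predicted, List<String> actual) {
        if (predicted.size() != actual.size()) {
            throw new IllegalArgumentException("label lists differ in length");
        }
        if (actual.isEmpty()) {
            throw new IllegalArgumentException("test set is empty");
        }
        int correct = 0;
        for (int i = 0; i < actual.size(); i++) {
            // A null prediction (no neighbours found) counts as a miss.
            if (actual.get(i).equals(predicted.get(i))) {
                correct++;
            }
        }
        return (double) correct / actual.size();
    }
}
```

For instance, three correct predictions out of four test documents give an accuracy of 0.75.
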