Find Polarized Words In Social Networks

Customer reviews of products have a positive or negative orientation depending on the customer's experience. Determining the polarity of a review depends on the orientation of the words in its text, where some words are positively oriented and some are negatively oriented. In this work, we present an approach to build polarized lexicon tables, and then apply these tables to the review polarity classification task.

Random Walks with Restart

Our approach is based on the "random walks with restart" algorithm [1,2]. We split each document of a training collection into sentences and assume that there is a relation between consecutive words inside a sentence. For example, the sentence "Smoking causes cancer and other chronic diseases" is represented by the graph: {Smoking --> causes --> cancer --> chronic --> diseases}, where the stop words "and" and "other" have been dropped. In this way a directed graph is constructed, with nodes labeled by the words in the training examples and links connecting adjacent words inside the same sentence.

Example: Suppose that our training collection contains a document with three sentences:

"Smoking causes chronic heart diseases. Smoking reduces oxygen from heart. Cancer and other chronic diseases are caused by smoking."

The graph resulting from these sentences is presented in the following figure. Note that we have removed stop words from the sentences and applied stemming to the remaining terms.

[Figure: word graph built from the three example sentences]
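The graph construction described above can be sketched in a few lines of Java. This is only an illustration under stated assumptions, not the project's actual code: the class name `WordGraphSketch`, the tiny stop word set, and the regular-expression tokenizer are all hypothetical, and stemming is omitted, so "causes" and "caused" remain separate nodes here.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

class WordGraphSketch {
    // Hypothetical stop word list, just large enough for the example sentences.
    static final Set<String> STOP_WORDS = Set.of("and", "other", "are", "by", "from");

    /** Returns link weights: graph.get(a).get(b) = number of times b directly follows a. */
    static Map<String, Map<String, Integer>> build(List<String> sentences) {
        Map<String, Map<String, Integer>> graph = new LinkedHashMap<>();
        for (String sentence : sentences) {
            String previous = null; // links never cross a sentence boundary
            for (String token : sentence.toLowerCase().split("[^a-z]+")) {
                if (token.isEmpty() || STOP_WORDS.contains(token)) {
                    continue;
                }
                graph.computeIfAbsent(token, k -> new LinkedHashMap<>());
                if (previous != null) {
                    // Count one more co-occurrence of the ordered pair (previous, token).
                    graph.get(previous).merge(token, 1, Integer::sum);
                }
                previous = token;
            }
        }
        return graph;
    }

    public static void main(String[] args) {
        Map<String, Map<String, Integer>> graph = build(List.of(
                "Smoking causes chronic heart diseases.",
                "Smoking reduces oxygen from heart.",
                "Cancer and other chronic diseases are caused by smoking."));
        graph.forEach((word, links) -> System.out.println(word + " --> " + links));
    }
}
```

Because stop words are skipped before linking, the words on either side of a removed stop word become adjacent, which matches the "cancer --> chronic" link in the example above.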

A link from node i to node j is weighted by the number of times the ordered pair of terms (i, j) occurs in the training set. We define a small set of polarized words and then construct the bias vector p by assigning a high probability mass to these words. The intuition is that over the iterations of the algorithm the initial polarized words "pull" nearby words of the same polarity, until convergence, when every word has a stable polarity score.
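As a rough illustration of the iteration, the following Java sketch runs a random walk with restart on a small weight matrix. It is a minimal sketch, not the implementation used in this project: the class name, the restart probability, the tolerance, and the treatment of nodes without outgoing links (their mass is returned to the bias vector) are all assumptions made for the example.

```java
class RandomWalkWithRestartSketch {
    /**
     * weights[i][j] is the weight of the link i --> j.
     * bias is the restart distribution p (non-negative, sums to 1).
     * restart is the probability of jumping back to p at each step (assumed parameter).
     */
    static double[] run(double[][] weights, double[] bias, double restart,
                        int maxIterations, double tolerance) {
        int n = bias.length;
        double[] outSum = new double[n];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                outSum[i] += weights[i][j];
            }
        }
        double[] score = bias.clone();
        for (int iter = 0; iter < maxIterations; iter++) {
            double[] next = new double[n];
            double dangling = 0.0; // mass sitting on nodes with no outgoing links
            for (int i = 0; i < n; i++) {
                if (outSum[i] == 0.0) {
                    dangling += score[i];
                    continue;
                }
                for (int j = 0; j < n; j++) {
                    // Follow link i --> j with probability proportional to its weight.
                    next[j] += (1 - restart) * score[i] * weights[i][j] / outSum[i];
                }
            }
            double delta = 0.0;
            for (int j = 0; j < n; j++) {
                // Restart mass plus dangling mass goes back to the bias vector.
                next[j] += (restart + (1 - restart) * dangling) * bias[j];
                delta += Math.abs(next[j] - score[j]);
            }
            score = next;
            if (delta < tolerance) {
                break; // scores have stopped changing
            }
        }
        return score;
    }
}
```

A call such as `run(weights, bias, 0.15, 1000, 1e-12)` returns one score per node; nodes that are closer to the seed words in the graph tend to receive a larger share of the probability mass.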

For more details read the theoretical background of the algorithm.

Software used and Implementation Details

The software used in this project includes:

- End of Sentence Detector: A heuristic sentence boundary detector was used from LingPipe.

- StringTokenizer class of Java.

- A stop word list.
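The tokenization and stop word filtering steps could look roughly like the sketch below, which uses only the standard `java.util.StringTokenizer` class. The class name `TokenizerSketch`, the delimiter set, and the stop words passed in are assumptions for illustration; sentence detection with LingPipe and stemming are not shown.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.StringTokenizer;

class TokenizerSketch {
    /** Splits one sentence into lowercase terms and drops the stop words. */
    static List<String> tokenize(String sentence, Set<String> stopWords) {
        List<String> terms = new ArrayList<>();
        // Delimiters: whitespace and common punctuation (an assumed set).
        StringTokenizer tokenizer = new StringTokenizer(sentence, " \t\n\r.,;:!?\"()");
        while (tokenizer.hasMoreTokens()) {
            String term = tokenizer.nextToken().toLowerCase();
            if (!stopWords.contains(term)) {
                terms.add(term);
            }
        }
        return terms;
    }
}
```

For example, `tokenize("Smoking reduces oxygen from heart.", Set.of("from"))` keeps the four content words in their original order, which is the order the graph construction relies on.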

For more details read the implementation details.

 

Experimental results

To test our algorithm we used the movie review collection. This dataset was introduced in 2004 by Pang and Lee [3] and contains 1000 positive and 1000 negative movie reviews.

The initial set of positive words contained 39 words. This set was constructed from the words {good, like, magnificent, adore}, enriched with some of their synonyms obtained from the Merriam-Webster dictionary.

How to use this application

To use this application you need the following:

- The jar file of the program.

- A file for the positive words and another for the negative words, related to a topic.

- A set of training files.

- The LingPipe 4.1.0 library jar.

For more details read the instructions section.