A Sentiment Analysis approach by identifying the subject object relationship

Meta

Goals

Result

Year: 2017

Author: Vakkalanka Sri Harika, Tetali Sai Krishna Chaitanya, Prabadevi.B

Analyzing a given sentence/paragraph which has multiple subjects, objects, verbs, adjectives and adverbs

Information

Abstract

This relation is useful for identifying the sentiment and the elements responsible for exhibiting the respective emotion. Adjective(s) are used to add more attributes to the respective subject/object. Based on this, the polarity for a given statement is analyzed

A relationship is established between subject(s) with the respective object(s) based on the verb(s) and their adverb(s).

A novel model is put forward for analyzing a given sentence/paragraph which has multiple subjects, objects, verbs, adjectives and adverbs

The opinion of a text is determined by the semantics and the contextual information

We humans can analyze a sentence and determine its sentiment or polarity at a glance, but for a system to perform the same job, a clear understanding of the types of sentences and their structure is required.

The structure of a sentence plays a huge role in determining the context, as a few changes in the position of the words can change the entire meaning of the sentence. The structure tells us who (subject) does what action (verb) on whom (object), and hence the opinion or action of the subject on the object can be identified.

This gives a clear picture of the relationship between the subject and object, and it is also very useful in calculating the sentence polarity.

parts-of-speech

Adjective

Verb

Describes the action done by the subject on the object, or the feeling subject has on the object

Adverbs

Adverbs can also be added to the verbs to intensify or lessen the emotion of the verb

Generally, in simple sentences there is a single pair of subject and object. In compound sentences, the conjunctions used are coordinating conjunctions (for, and, nor, but, or, yet, so). Such sentences contain multiple subjects and objects.

With a clear knowledge of the structure of sentences, we have to consider the parsing, analysis, tagging, tokenization and sentiment analysis of the given input sentence/paragraph.

Tokenization splits the given sentence into a list of words, so that each word can be analyzed based on its position and its part of speech. To find the parts-of-speech we use tagging, where each word is tagged with its respective part of speech.

NLTK

Natural Language Toolkit

Human language data can be processed with NLTK using its interfaces, lexical resources (WordNet) and other text processing libraries.

NLTK features

There are multiple features available in the NLTK toolkit, and we use its word_tokenize(), pos_tag() and senti_synset() features. word_tokenize() does tokenization. The pos_tag() function eases the job of tagging the words with their respective parts-of-speech.

word_tokenize(): Does tokenization

pos_tag(): Eases the job of tagging the words with their respective parts-of-speech

senti_synset(): Uses the data from SentiWordNet, a lexical resource for opinion mining, to evaluate the positive, negative or neutral scores for the verbs, adverbs and adjectives in the given sentence

These scores determine the positivity and negativity that a given word carries. It only deals with verbs, adjectives and adverbs, as these alone can determine the polarity of the sentence; the remaining parts-of-speech do not carry polarity.

The cumulative weighted value of the individual scores gives the complete sentiment of the given statement.

Objective

To determine the subject and object relationship and the sentiment analysis value using the aggregate feeling of the verbs and adverbs.

To determine the polarity of the sentence using the sentiment analysis value and adjective's aggregate feeling value.

Proposed System

Implementation

A simple sentence is constructed with a single subject or a set of subjects and its object or set of objects. In compound sentences, by contrast, there can be multiple subjects and their multiple objects involved in multiple phrases.

Hence, the number of subjects and the number of objects each subject has are entirely variable. The model is built to determine the subject with its set of objects and the action it does on them.

  2. Analyze the Parts-of-speech

Analyze the parts-of-speech of individual words in the given sentence with the help of NLTK’s POS-tagger

The data is sent to the POS tagger (Parts-Of-Speech tagger), which gives the part of speech of every word. A word's part of speech depends upon its position and the structure of the sentence.

For example, a single word can act as adjective for the subject or as the adjective of the object depending upon its position

Nouns and Pronouns act as subjects and objects depending upon the verb’s position. The adjectives are useful in adding more information to the subject or the object.

Words with the same part of speech can be grouped into a collection using conjunctions, and conjunctions are also useful for detecting compound statements.

Hence, we consider only Noun/Pronoun, Verb, Adverb, Adjective and Conjunction and ignore other parts of speech as they do not contribute to our analysis.
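The tagging and filtering described above can be sketched as follows. This is a minimal illustration, not the authors' code: `tag_sentence`, `coarse_category` and `keep_relevant` are hypothetical helper names, and the mapping assumes the Penn Treebank tags that NLTK's default tagger returns. `tag_sentence` needs NLTK and its tokenizer and tagger data to be installed, so NLTK is imported lazily and the filtering helper can be tried on hand-tagged input.

```python
def tag_sentence(sentence):
    """Tokenize and POS-tag one sentence with NLTK (needs NLTK data installed)."""
    import nltk  # imported lazily so the helpers below work without NLTK
    return nltk.pos_tag(nltk.word_tokenize(sentence))


def coarse_category(tag):
    """Map a Penn Treebank tag to one of the categories the model keeps."""
    if tag.startswith("NN"):
        return "noun"
    if tag == "PRP":
        return "pronoun"
    if tag.startswith("VB"):
        return "verb"
    if tag.startswith("RB"):
        return "adverb"
    if tag.startswith("JJ"):
        return "adjective"
    if tag == "CC":
        return "conjunction"
    return None  # every other part of speech is ignored


def keep_relevant(tagged):
    """Drop words whose part of speech does not contribute to the analysis."""
    kept = []
    for word, tag in tagged:
        category = coarse_category(tag)
        if category is not None:
            kept.append((word, category))
    return kept


# Hand-tagged input (an assumption; real tagger output may differ).
sample = [("I", "PRP"), ("really", "RB"), ("hate", "VBP"),
          ("the", "DT"), ("sour", "JJ"), ("apples", "NNS")]
print(keep_relevant(sample))
```

With NLTK available, `keep_relevant(tag_sentence("I really hate the sour apples"))` would run the same filter on real tagger output.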

  3. Identify the subject-object relationship

The verbs are useful in identifying whether a given noun is subject or object. If the noun occurs before the verb, then it is classified as a subject and if it occurs after the verb, then it is classified as object.

If there are multiple nouns before the verb occurs, they are considered the set of subjects; multiple nouns after the verb are considered the set of objects.

These sets of subjects and objects are maintained in two respective lists. Conjunctions contribute here by appending multiple subjects/objects to make the respective sets.

For every subject in the subject list, there can be multiple objects associated with it, and the verb specifies the action or opinion the subject has on the object. Using this, the relationship between the subject and the object can be identified (polarity-wise) as positive, negative or neutral[9], depending upon the verb scores.

If an adjective is met, it is added as an attribute to the last inserted subject/object in the respective list of the same sentence; if no subject/object has been inserted previously, it is added to the next noun met (subject/object).

In the lists of subjects and objects, each element can have its own list of adjectives. An adverb is added as an attribute to the last inserted verb in the verb list; otherwise it is added to the next verb met. If the verb list has more than one verb in the same phrase, a different list of adverbs can be associated with each verb.
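The positional rules above can be sketched for a single phrase as follows. This is an illustrative simplification rather than the authors' implementation: `analyze_clause` is a hypothetical name, the input is assumed to be (word, Penn Treebank tag) pairs, and modifiers are assumed to wait for the next noun or verb, falling back to the last inserted one only when nothing follows.

```python
def analyze_clause(tagged):
    """Group one phrase's (word, tag) pairs into subjects, verbs and objects."""
    subjects, verbs, objects = [], [], []  # entries are (word, [modifiers])
    waiting_adjectives, waiting_adverbs = [], []
    for word, tag in tagged:
        if tag.startswith("JJ"):
            waiting_adjectives.append(word)
        elif tag.startswith("RB"):
            waiting_adverbs.append(word)
        elif tag.startswith("VB"):
            verbs.append((word, waiting_adverbs))
            waiting_adverbs = []
        elif tag.startswith("NN") or tag == "PRP":
            # Before any verb the noun is a subject, afterwards an object.
            target = objects if verbs else subjects
            target.append((word, waiting_adjectives))
            waiting_adjectives = []
    # Leftover modifiers fall back to the last inserted verb or noun.
    if waiting_adverbs and verbs:
        verbs[-1][1].extend(waiting_adverbs)
    last_nouns = objects or subjects
    if waiting_adjectives and last_nouns:
        last_nouns[-1][1].extend(waiting_adjectives)
    return {"subjects": subjects, "verbs": verbs, "objects": objects}


# Hand-tagged phrase (an assumption; a real tagger may label words differently).
phrase = [("Skillful", "JJ"), ("Suresh", "NNP"), (",", ","),
          ("glorious", "JJ"), ("and", "CC"), ("mighty", "JJ"), ("Mahesh", "NNP"),
          (",", ","), ("peaceful", "JJ"), ("Ramesh", "NNP"),
          ("really", "RB"), ("completely", "RB"), ("like", "VBP"),
          ("apples", "NNS")]
result = analyze_clause(phrase)
print(result["subjects"])
print(result["verbs"])
print(result["objects"])
```

Keeping each word together with its own modifier list mirrors the idea that every subject, object and verb carries its own attributes.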

  4. Sentiment analysis

Obtain the scores of adjectives using SentiWordNet and calculate the aggregate adjective’s value by adding the scores of the adjectives.

The aggregate feeling of adverbs and verbs is calculated using the scores obtained from SentiWordNet, to determine the sentiment of the sentence and to identify the relationship between subject and object.

Each word is labelled as positive, negative or neutral. If the positive value is greater than the negative value, it is a positive word. If the negative value is greater, it is a negative word. If both values are the same or 0, we consider it a neutral word.

Using SentiWordNet, calculate the positive and negative scores associated with each verb, adverb and adjective.
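A small sketch of the score lookup and word labelling described above. The helper names are hypothetical, and taking only the first SentiWordNet sense of a word is an assumption made here for brevity; the source does not say how senses are chosen. `sentiwordnet_scores` needs NLTK with the WordNet and SentiWordNet corpora installed, so it is imported lazily and `label_word` can be used on its own.

```python
def sentiwordnet_scores(word, pos):
    """Return (positive, negative) scores for a word.

    pos is a WordNet code: 'v' (verb), 'r' (adverb) or 'a' (adjective).
    Assumption: only the first listed sense is used.
    """
    from nltk.corpus import sentiwordnet as swn  # lazy: needs NLTK corpora
    senses = list(swn.senti_synsets(word, pos))
    if not senses:
        return 0.0, 0.0
    return senses[0].pos_score(), senses[0].neg_score()


def label_word(pos_score, neg_score):
    """Label a word by comparing its positive and negative scores."""
    if pos_score > neg_score:
        return "positive"
    if neg_score > pos_score:
        return "negative"
    return "neutral"  # equal scores, including both being 0


# Illustrative scores, not actual SentiWordNet values.
print(label_word(0.5, 0.125))
print(label_word(0.0, 0.75))
print(label_word(0.0, 0.0))
```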

So, for the complete polarity of the given statement, we add the aggregate sentiment value to the adjective's aggregate value.

To obtain the above value, we start by adding the attributes to the respective words. So, we add the adjective(s) to the respective subject(s), object(s) and adverb(s) to the respective verb(s).

The aggregate sentiment value is calculated using the score of verb list and adverb list associated with the respective subjects and objects. The adjectives don’t determine the relationship between the subject and object, but they add more polarity to the given statement.

For continuously occurring adverbs, a given adverb defines the feeling of the following adverb. Here, the positive and negative polarities of the next adverb are increased or decreased by the respective positive and negative polarities of the present adverb.

If the adverb is of the same type (positive/negative/neutral) as the following adverb, then the positive and negative feelings of the following adverb are increased by the respective positive and negative feelings of the present adverb. If the adverb is not of the same type as the following adverb, then the converse is applied.

If the adverbs occur separately while defining the same verb, then the adverbs have no relationship among themselves. Here, the summation of the respective polarities yields the adverbs' aggregate feeling.

If a verb has both continuous and separated adverbs, then the summation of the individual scenarios mentioned above gives the aggregate adverb value.

The positive and negative scores of the aggregate adverbs are added to the respective verb's positive and negative scores to get the complete polarity of the given verb. Add the complete polarities obtained for each verb in the list to get the complete polarity of the sentence.
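One possible reading of the aggregation rules above, sketched with (positive, negative) score pairs. Several points are assumptions because the text leaves them open: a run of continuous adverbs is reduced to the adjusted score of its last adverb, the adjusted score is carried forward along the run, decreased scores are clamped at 0, and all function names and scores are illustrative.

```python
def label(score):
    """Classify a (positive, negative) pair as positive, negative or neutral."""
    pos, neg = score
    if pos > neg:
        return "positive"
    if neg > pos:
        return "negative"
    return "neutral"


def chain_value(chain):
    """Aggregate adverbs that occur one after another before the same verb."""
    current = chain[0]
    for following in chain[1:]:
        if label(current) == label(following):
            # Same type: the following adverb is strengthened.
            current = (following[0] + current[0], following[1] + current[1])
        else:
            # Different type: the converse, clamped at 0 (assumption).
            current = (max(0.0, following[0] - current[0]),
                       max(0.0, following[1] - current[1]))
    return current


def verb_polarity(verb, adverb_groups):
    """Add each adverb group's value to the verb's own scores.

    A separately occurring adverb is passed as a one-element group.
    """
    pos, neg = verb
    for group in adverb_groups:
        group_pos, group_neg = chain_value(group)
        pos, neg = pos + group_pos, neg + group_neg
    return pos, neg


def sentence_polarity(verb_totals, adjective_scores):
    """Sum the verb polarities and add the adjectives' aggregate value."""
    scores = list(verb_totals) + list(adjective_scores)
    return sum(s[0] for s in scores), sum(s[1] for s in scores)


# Illustrative scores, not SentiWordNet values.
really, completely, like = (0.5, 0.0), (0.25, 0.0), (0.5, 0.125)
like_total = verb_polarity(like, [[really, completely]])
print(like_total)
print(sentence_polarity([like_total], [(0.5, 0.0), (0.25, 0.25)]))
```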

  5. Compound Statements

In this module, we do not consider the parts-of-speech recognized in the preceding phrase. The verbs, subjects, objects, adjectives and adverbs are identified for this phrase using the above methods, and the resulting lists are appended to the previous phrase's lists so that a nested list is formed in which each inner list corresponds to one phrase in the sentence.

In this scenario, we consider the later phrase of the sentence to be a completely new sentence and perform the respective subject and object analysis in the same way as for the first part.

In compound sentences, a comma (,) occurs, followed by a coordinating conjunction.

A compound sentence is a sentence that has at least two independent clauses joined by a comma, semicolon or conjunction.

According to the position of the phrase in the sentence, the lists of parts-of-speech are identified, and their relationship with each other is also established in these nested lists.

These are all maintained in the same lists because, at the end, when calculating the sentiment and polarity of the sentence, we need to consider all these elements.
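The phrase splitting and nested lists described above can be sketched as follows. The regular expression is an assumption about how "a comma followed by a coordinating conjunction" might be detected, and `split_phrases` and `nest` are hypothetical helper names.

```python
import re

COORDINATING = ("for", "and", "nor", "but", "or", "yet", "so")

# A comma, optional spaces, then a coordinating conjunction as a whole word.
PHRASE_BREAK = re.compile(
    r",\s*(?:" + "|".join(COORDINATING) + r")\b\s*", re.IGNORECASE
)


def split_phrases(sentence):
    """Split a compound sentence into its independent phrases."""
    return [part.strip() for part in PHRASE_BREAK.split(sentence) if part.strip()]


def nest(per_phrase):
    """Collect per-phrase results so that each inner list is one phrase."""
    nested = {"subjects": [], "verbs": [], "objects": []}
    for phrase in per_phrase:
        for key in nested:
            nested[key].append(phrase.get(key, []))
    return nested


sentence = ("Skillful Suresh, glorious and mighty Mahesh, peaceful Ramesh "
            "really completely like apples, but I hate them")
print(split_phrases(sentence))
print(nest([
    {"subjects": ["Suresh", "Mahesh", "Ramesh"], "verbs": ["like"], "objects": ["apples"]},
    {"subjects": ["I"], "verbs": ["hate"], "objects": ["them"]},
]))
```

A comma that merely separates list items (as after Suresh) is left alone, because no conjunction follows it; only ", but" starts a new phrase here.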

Algorithm

Initially, declare the lists for the parts of speech: verb_list for verbs, subject_list for subjects, object_list for objects, adverb_list for adverbs (attributes of a verb in verb_list) and adjective_list for adjectives (attributes of a subject/object in its respective list).

"Skillful Suresh, glorious and mighty Mahesh, peaceful Ramesh really completely like apples, but I hate them"

Once parts-of-speech tagging is done for the given sentence, we start with skillful. Since it is an adjective and no subject or object has been added yet, it waits until a noun is met.

The next word we read is Suresh, a noun. Since verb_list is empty, it is added to subject_list, and the adjective skillful that was met before is appended to the adjective list of this subject.

Glorious and mighty are the next words we come across, separated by a conjunction. Since these are adjectives, we wait for a noun to be added.

The next word we meet is Mahesh, a noun. Since verb_list is still empty, we add Mahesh to subject_list. As a noun is met, we add the adjectives glorious and mighty to the adjective_list of Mahesh.

Similarly, peaceful is added to the subject Ramesh, and Ramesh is added to subject_list.

After Ramesh, we come across the adverbs really and completely. These are continuous adverbs.

The next word met is like, a verb. So, we append like to verb_list and the adverbs really and completely to the adverb_list of like. apples is the next noun we meet; as verb_list is not empty, apples is added to object_list.

Now the compound sentence condition is met (comma + conjunction). So the nested lists are created in such a way that the present phrase's attributes are all added to them, and the later part of the sentence is treated as a new sentence.

I is read next, and since the verb_list in nested_verb_list is empty for this phrase, I is added to the subject_list of nested_subject_list.

The next word is hate, a verb, so we add hate to the verb_list in nested_verb_list for this phrase.

The last word we meet is them, a pronoun. Since the verb_list in nested_verb_list is not empty, it is added to the object_list in nested_object_list.

Finally, we check whether any attributes have not yet been appended. If any remain, we add them to their respective words. Here no words remain to be added, so the execution is done.

This subject and object can be associated with their respective adjectives, which add more information to the respective subject and object.

The given set of adverbs might occur continuously, defining a single verb, or occur separately while defining the same verb.

  1. Split the Paragraph

Initially, the structured and pre-processed data is extracted from the documents. It is then split into sentences using the delimiter . (dot). These sentences are stored in a list.
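The Split the Paragraph step (splitting the input on the dot delimiter and storing the sentences in a list) can be sketched as below. `split_paragraph` is a hypothetical name, and the plain split assumes pre-processed text in which a dot only ever ends a sentence; abbreviations or decimal numbers would need a real sentence tokenizer.

```python
def split_paragraph(paragraph):
    """Split pre-processed text into sentences on the dot delimiter."""
    sentences = []
    for piece in paragraph.split("."):
        piece = piece.strip()
        if piece:  # skip the empty piece after the final dot
            sentences.append(piece)
    return sentences


print(split_paragraph("I like apples. She hates rain."))
```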