A Sentiment Analysis Approach by Identifying the Subject-Object Relationship
Meta
Authors: Vakkalanka Sri Harika, Tetali Sai Krishna Chaitanya, Prabadevi.B
Year: 2017
Goals: Analyzing a given sentence/paragraph that has multiple subjects, objects, verbs, adjectives and adverbs
Information
Abstract
The opinion of a text is determined by its semantics and contextual information. A novel model is put forward for analyzing a given sentence/paragraph that has multiple subjects, objects, verbs, adjectives and adverbs. A relationship is established between the subject(s) and the respective object(s) based on the verb(s) and their adverb(s). This relation is useful for identifying the sentiment and the elements responsible for exhibiting the respective emotion. Adjectives are used to add more attributes to the respective subject/object. Based on this, the polarity of a given statement is analyzed.
We humans can determine the sentiment or polarity of a sentence at a glance, but for a system to perform the same job, a clear understanding of the types of sentences and their structure is required.
The structure of a sentence plays a huge role in determining the context, as a few changes in the position of the words can change the entire meaning of the sentence. The structure tells us who (subject) does what action (verb) to whom (object), and hence the opinion or action of the subject toward the object can be identified.
This gives a clear picture of the relationship between the subject and object, and it is also very useful in calculating the sentence polarity.
Parts-of-speech
Adjective: Adds attributes to the respective subject or object.
Verb: Describes the action done by the subject on the object, or the feeling the subject has toward the object.
Adverbs: Can be added to verbs to intensify or lessen the emotion of the verb.
Generally, a simple sentence has a single pair of subject and object. Compound sentences are joined by coordinating conjunctions (for, and, nor, but, or, yet, so) and contain multiple subjects and objects.
With a clear knowledge of sentence structure, we have to consider the parsing, analysis, tagging, tokenization and sentiment analysis of the given input sentence/paragraph.
Tokenization is useful for splitting the given sentence into a list of words, so that each word can be analyzed based on its position and its part-of-speech. To know the parts-of-speech, we use tagging, where each word is tagged with its respective part-of-speech.
NLTK
Natural Language Toolkit
Human language data can be processed with NLTK using its interfaces, lexical resources (WordNet) and other text processing libraries.
NLTK features
Of the many features available in the NLTK toolkit, we use word_tokenize(), pos_tag() and senti_synset().
word_tokenize(): Does tokenization, splitting the text into a list of words.
pos_tag(): Eases the job of tagging the words with their respective parts-of-speech.
senti_synset(): Uses the data from SentiWordNet, a lexical resource for opinion mining, to evaluate the positive, negative or neutral scores of the verbs, adverbs and adjectives in the given sentence.
These scores determine how positive or negative the given word is. Only verbs, adjectives and adverbs are scored, because these alone can determine the polarity of the sentence; the remaining parts-of-speech do not carry polarity.
The cumulative weighted value of the individual scores gives the complete sentiment of the given statement.
Objective
To determine the subject-object relationship and the sentiment analysis value using the aggregate feeling of the verbs and adverbs.
To determine the polarity of the sentence using the sentiment analysis value and the aggregate feeling value of the adjectives.
Proposed System
Implementation
A simple sentence is constructed with a single subject or a set of subjects and its object or set of objects. In compound sentences, by contrast, multiple subjects and their objects can be involved across multiple phrases.
Hence, the number of subjects, and the number of objects each subject has, are entirely variable. The model is built to determine each subject with its set of objects and the action it performs on them.
- Analyze the Parts-of-speech
Analyze the parts-of-speech of individual words in the given sentence with the help of NLTK’s POS-tagger
The data is sent to the POS tagger (Parts-of-Speech tagger), which outputs the part-of-speech of every word. A word's part-of-speech depends on its position and the structure of the sentence.
For example, the same adjective can describe either the subject or the object, depending on its position.
Nouns and pronouns act as subjects or objects depending on the verb's position. The adjectives are useful in adding more information to the subject or the object.
Words with the same part-of-speech can be grouped into a collection using conjunctions, and these conjunctions are also useful for identifying compound statements.
Hence, we consider only Noun/Pronoun, Verb, Adverb, Adjective and Conjunction and ignore other parts of speech as they do not contribute to our analysis.
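As a minimal sketch of this filtering step, the Python snippet below maps Penn Treebank tags (the tag set NLTK's pos_tag() produces by default) to the five categories the model keeps and drops everything else. The category names and the tag-to-category table are illustrative assumptions, not taken from the paper, and the input tags were assigned by hand rather than produced by a tagger.

```python
from typing import List, Optional, Tuple

# Illustrative mapping from Penn Treebank tags to the coarse categories kept
# by the model (assumption: the paper does not list the exact tags it uses).
TAG_GROUPS = {
    "NOUN": {"NN", "NNS", "NNP", "NNPS", "PRP"},  # nouns and personal pronouns
    "VERB": {"VB", "VBD", "VBG", "VBN", "VBP", "VBZ"},
    "ADV": {"RB", "RBR", "RBS"},
    "ADJ": {"JJ", "JJR", "JJS"},
    "CONJ": {"CC"},  # coordinating conjunctions
}


def coarse_tag(penn_tag: str) -> Optional[str]:
    """Return the coarse category for a Penn tag, or None if it is ignored."""
    for category, tags in TAG_GROUPS.items():
        if penn_tag in tags:
            return category
    return None


def keep_relevant(tagged: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Drop words whose part-of-speech does not contribute to the analysis."""
    kept = []
    for word, tag in tagged:
        category = coarse_tag(tag)
        if category is not None:
            kept.append((word, category))
    return kept


# Hand-tagged fragment of the example sentence from the Algorithm section.
tagged = [("Skillful", "JJ"), ("Suresh", "NNP"), (",", ","),
          ("glorious", "JJ"), ("and", "CC"), ("mighty", "JJ"), ("Mahesh", "NNP")]
print(keep_relevant(tagged))
```

The comma is dropped here because it carries no polarity; detecting compound sentences therefore has to happen on the tagged words before this filter is applied.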
3. Identify the subject-object relationship
The verbs are useful in identifying whether a given noun is subject or object. If the noun occurs before the verb, then it is classified as a subject and if it occurs after the verb, then it is classified as object.
If there are multiple nouns before the verb, they are considered the set of subjects; multiple nouns after the verb are considered the set of objects.
These sets of subjects and objects are maintained in two respective lists. Conjunctions contribute here by appending multiple subjects/objects to the respective sets.
For every subject in the subject list, there can be multiple associated objects, and the verb specifies the action or opinion the subject has toward the object. Using this, the relationship between the subject and the object can be identified (polarity-wise) as positive, negative or neutral [9], depending on the verb scores.
If an adjective is met, it is added as an attribute to the last inserted subject/object of the respective list in the same sentence; if no subject/object has been inserted yet, it is added to the next noun (subject/object) met.
Each element in the subject or object list can have its own list of adjectives. An adverb is added as an attribute to the last inserted verb in the verb list; otherwise, it is added to the next verb met. If the verb list has more than one verb in the same phrase, each verb can have its own list of adverbs.
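The assignment rules above can be sketched in Python for a single phrase whose words have already been reduced to coarse categories (NOUN, VERB, ADV, ADJ, CONJ; these labels and the dictionary layout are assumptions of this sketch). One deliberate choice: adjectives are held until the next noun, as in the worked example in the Algorithm section, and only adjectives left over at the end of the phrase fall back to the last inserted subject/object. Adverbs follow the rule as stated: they attach to the last inserted verb, or wait for the next verb if none exists yet.

```python
from typing import Dict, List, Optional, Tuple


def parse_phrase(tagged: List[Tuple[str, str]]) -> Dict[str, List[dict]]:
    """Build subject, object and verb lists (with attributes) for one phrase."""
    subjects: List[dict] = []
    objects: List[dict] = []
    verbs: List[dict] = []
    pending_adjs: List[str] = []  # adjectives waiting for the next noun
    pending_advs: List[str] = []  # adverbs waiting for the next verb
    last_nominal: Optional[dict] = None  # last inserted subject/object

    for word, tag in tagged:
        if tag == "NOUN":
            # Before any verb a noun/pronoun is a subject, after it an object.
            entry = {"word": word, "adjectives": pending_adjs}
            pending_adjs = []
            (objects if verbs else subjects).append(entry)
            last_nominal = entry
        elif tag == "ADJ":
            # Assumption taken from the worked example: wait for the next noun.
            pending_adjs.append(word)
        elif tag == "VERB":
            verbs.append({"word": word, "adverbs": pending_advs})
            pending_advs = []
        elif tag == "ADV":
            # Attach to the last inserted verb, else wait for the next verb.
            if verbs:
                verbs[-1]["adverbs"].append(word)
            else:
                pending_advs.append(word)
        # CONJ only groups words of the same kind, so nothing is stored for it.

    # Leftover adjectives go to the last inserted subject/object, if any.
    if pending_adjs and last_nominal is not None:
        last_nominal["adjectives"].extend(pending_adjs)

    return {"subjects": subjects, "objects": objects, "verbs": verbs}


# First clause of the example sentence, with hand-assigned coarse tags.
first_clause = [
    ("Skillful", "ADJ"), ("Suresh", "NOUN"), ("glorious", "ADJ"), ("and", "CONJ"),
    ("mighty", "ADJ"), ("Mahesh", "NOUN"), ("peaceful", "ADJ"), ("Ramesh", "NOUN"),
    ("really", "ADV"), ("completely", "ADV"), ("like", "VERB"), ("apples", "NOUN"),
]
result = parse_phrase(first_clause)
print(result["verbs"])  # [{'word': 'like', 'adverbs': ['really', 'completely']}]
```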
- Sentiment analysis
Using SentiWordNet, calculate the positive and negative scores associated with each verb, adverb and adjective.
If the positive score is greater than the negative score, the word is labeled positive; if the negative score is greater, it is labeled negative; if both scores are equal (including both being 0), it is labeled neutral.
Calculate the aggregate adjective value by adding the scores of the adjectives.
The aggregate feeling of the adverbs and verbs is calculated from the same SentiWordNet scores to determine the sentiment of the sentence and to identify the relationship between the subject and object.
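A minimal Python sketch of the labeling rule and the adjective aggregate follows. The score table is made up for illustration; with NLTK the real values would come from SentiWordNet, for example through the pos_score() and neg_score() methods of the objects returned by senti_synset(). Choosing which word sense to look up is not covered, and treating unknown words as (0, 0) is an assumption of this sketch.

```python
from typing import Dict, List, Tuple

# Made-up (positive, negative) scores standing in for SentiWordNet values.
HYPOTHETICAL_SCORES: Dict[str, Tuple[float, float]] = {
    "glorious": (0.75, 0.0),
    "mighty": (0.25, 0.0),
    "hate": (0.0, 0.75),
    "like": (0.5, 0.0),
}


def label(pos: float, neg: float) -> str:
    """Label a word from its positive and negative scores."""
    if pos > neg:
        return "positive"
    if neg > pos:
        return "negative"
    return "neutral"  # equal scores, including both being 0


def adjective_aggregate(adjectives: List[str],
                        scores: Dict[str, Tuple[float, float]]) -> Tuple[float, float]:
    """Sum the positive and negative scores of the given adjectives.

    Assumption: words missing from the score table count as (0.0, 0.0).
    """
    pos = sum(scores.get(a, (0.0, 0.0))[0] for a in adjectives)
    neg = sum(scores.get(a, (0.0, 0.0))[1] for a in adjectives)
    return pos, neg


print(label(*HYPOTHETICAL_SCORES["hate"]))  # negative
print(adjective_aggregate(["glorious", "mighty"], HYPOTHETICAL_SCORES))  # (1.0, 0.0)
```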
For the complete polarity of the given statement, we add the aggregate sentiment value to the aggregate adjective value.
To obtain this value, we start by attaching the attributes to their respective words: the adjective(s) to the respective subject(s) and object(s), and the adverb(s) to the respective verb(s).
The aggregate sentiment value is calculated from the scores of the verb list and the adverb lists associated with the respective subjects and objects. The adjectives do not determine the relationship between the subject and object, but they add more polarity to the given statement.
For continuously occurring adverbs, each adverb defines the feeling of the adverb that follows it: the positive and negative polarities of the next adverb are increased or decreased by the respective positive and negative polarities of the present adverb.
If the present adverb is of the same type (positive/negative/neutral) as the following adverb, the positive and negative feelings of the following adverb are increased by the respective positive and negative feelings of the present adverb. If they are not of the same type, the converse is applied and the following adverb's feelings are decreased.
If the adverbs occur separately while defining the same verb, they have no relationship among themselves, and the summation of their respective polarities yields the aggregate adverb feeling.
If a verb has both continuous and separate adverbs, the aggregate adverb value is the sum of the values from the individual scenarios above.
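The adverb rules can be sketched in Python as below. Each verb's adverbs are given as a list of runs: a run of two or more holds continuous adverbs, and a run of one is a separate adverb. Two points are assumptions, since the text leaves them open: the running (already adjusted) score is what modifies the next adverb in a run, and a run's value is the adjusted score of its last adverb. Scores are not clamped, so a decrease can go below zero. All numbers are made up.

```python
from typing import List, Tuple

Score = Tuple[float, float]  # (positive, negative)


def label(score: Score) -> str:
    """positive / negative / neutral, by comparing the two scores."""
    pos, neg = score
    if pos > neg:
        return "positive"
    if neg > pos:
        return "negative"
    return "neutral"


def chain_run(run: List[Score]) -> Score:
    """Value of a run of continuous adverbs (each one defines the next)."""
    current = run[0]
    for following in run[1:]:
        if label(current) == label(following):
            # Same type: the following adverb's feelings are increased.
            current = (following[0] + current[0], following[1] + current[1])
        else:
            # Different type: the converse, so its feelings are decreased.
            current = (following[0] - current[0], following[1] - current[1])
    return current


def aggregate_adverbs(runs: List[List[Score]]) -> Score:
    """Sum the values of separate adverbs and continuous runs for one verb."""
    values = [chain_run(run) for run in runs]
    return (sum(v[0] for v in values), sum(v[1] for v in values))


# "really completely like": one continuous run (hypothetical scores).
print(aggregate_adverbs([[(0.25, 0.0), (0.125, 0.0)]]))  # (0.375, 0.0)
```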
The positive and negative scores of the aggregate adverbs are added to the respective verb’s positive and negative scores to get the complete polarity of the given verb. Add the complete polarities obtained for each verb in the list to get the complete polarity of the sentence.
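Combined with the adjective aggregate, the complete polarity of a statement can be written as below, with the positive and negative parts summed separately. The notation is introduced here for illustration: V is the verb list (across all phrases), J is the list of adjectives, s⁺ and s⁻ are SentiWordNet positive and negative scores, and A⁺ᵥ, A⁻ᵥ are the aggregate adverb scores of verb v. Deciding the final label by comparing P⁺ with P⁻, as for single words, is an assumption.

```latex
P^{+} = \sum_{v \in V} \left( s^{+}_{v} + A^{+}_{v} \right) + \sum_{j \in J} s^{+}_{j},
\qquad
P^{-} = \sum_{v \in V} \left( s^{-}_{v} + A^{-}_{v} \right) + \sum_{j \in J} s^{-}_{j}
```

For example, with made-up scores, one verb with s⁺ = 0.5 and A⁺ = 0.375 plus adjectives whose positive scores sum to 1.0 gives P⁺ = 0.5 + 0.375 + 1.0 = 1.875.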
- Compound Statements
A compound sentence is a sentence that has at least two independent clauses joined by a comma, semicolon or conjunction. In compound sentences, a , (comma) occurs followed by a coordinating conjunction.
In this scenario, we consider the later phrase of the sentence to be a completely new sentence and perform the same subject and object analysis as for the first part.
In this module, we do not carry over the recognized parts-of-speech of the preceding phrase. The verbs, subjects, objects, adjectives and adverbs are identified for the new phrase using the above methods, and the resulting lists are appended to the previous phrase's lists so that a nested list is formed, where each inner list corresponds to one phrase in the sentence.
According to the position of the phrase in the sentence, the parts-of-speech lists are identified and their relationships with each other are established in these nested lists.
All of these are maintained in the same lists because, when the sentiment and polarity of the sentence are calculated at the end, all these elements need to be considered.
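A minimal Python sketch of the phrase split follows. It works on (word, Penn tag) pairs before irrelevant parts-of-speech are filtered out, since the comma is needed to detect the boundary, and starts a new inner list whenever a comma is immediately followed by a coordinating conjunction. Semicolons and conjunctions without a preceding comma are not handled; that simplification, like the hand-assigned tags in the usage example, is an assumption of the sketch.

```python
from typing import List, Tuple

COORDINATING = {"for", "and", "nor", "but", "or", "yet", "so"}


def split_compound(tagged: List[Tuple[str, str]]) -> List[List[Tuple[str, str]]]:
    """Split a tagged sentence into phrases at ', <coordinating conjunction>'."""
    phrases: List[List[Tuple[str, str]]] = [[]]
    i = 0
    while i < len(tagged):
        word = tagged[i][0]
        next_word = tagged[i + 1][0].lower() if i + 1 < len(tagged) else ""
        if word == "," and next_word in COORDINATING:
            phrases.append([])  # the later phrase is treated as a new sentence
            i += 2              # skip the comma and the conjunction
            continue
        phrases[-1].append(tagged[i])
        i += 1
    return phrases


words = ("Skillful Suresh , glorious and mighty Mahesh , peaceful Ramesh "
         "really completely like apples , but I hate them").split()
tags = ["JJ", "NNP", ",", "JJ", "CC", "JJ", "NNP", ",", "JJ", "NNP",
        "RB", "RB", "VBP", "NNS", ",", "CC", "PRP", "VBP", "PRP"]
phrases = split_compound(list(zip(words, tags)))
print(len(phrases))  # 2
print(phrases[1])    # [('I', 'PRP'), ('hate', 'VBP'), ('them', 'PRP')]
```

The commas after Suresh and Mahesh do not split the sentence because they are followed by adjectives, which matches the compound-sentence condition used in the worked example.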
Algorithm
Initially, declare a list for each part-of-speech: verb_list for verbs, subject_list for subjects, object_list for objects, adverb_list for adverbs (an attribute of a verb in verb_list) and adjective_list for adjectives (an attribute of a subject/object in its respective list).
Consider the sentence:
"Skillful Suresh, glorious and mighty Mahesh, peaceful Ramesh really completely like apples, but I hate them"
When parts-of-speech tagging is done for the given sentence, we start with skillful. Since it is an adjective, no subject or object is added; it waits until a noun is met.
The next word we read is Suresh, a noun. Since verb_list is empty, it is added to subject_list, and the adjective skillful that was held before is appended to the adjective_list of this subject. In this way, each subject and object is associated with its respective adjectives, which add more information to it.
Glorious and mighty are the next words we come across, separated by a conjunction. Since these are adjectives, we wait for a noun to be added.
The next word we meet is Mahesh, a noun. Since verb_list is still empty, we add Mahesh to subject_list. As a noun is met, we add the adjectives glorious and mighty to the adjective_list of Mahesh.
Similarly, peaceful is added to the subject Ramesh, and Ramesh is added to subject_list.
After Ramesh, we come across the adverbs really and completely. These are continuous adverbs: a set of adverbs may occur continuously, defining a single verb, or separately, defining the same verb.
The next word we meet is like, a verb. So we append like to verb_list and add the adverbs really and completely to the adverb_list of like.
Apples is the next noun we meet. Since verb_list is not empty, apples is added to object_list.
Now the compound sentence condition (a comma followed by a conjunction) is met. So the nested lists are created such that the present sentence's attributes are all added to the lists, and the later part of the sentence is considered as a new sentence.
I is read, and since the verb_list in nested_verb_list is empty for this sentence, I is added to the subject_list of nested_subject_list.
The next word is hate, a verb, so we add hate to the verb_list in the nested_verb_list of this sentence.
The last word we meet is them, a pronoun. Since the verb_list in nested_verb_list is not empty, it is added to the object_list in the nested_object_list of the given sentence.
Finally, we check whether any attributes remain unattached. Since no words remain to be added, the execution is complete; otherwise, the remaining attributes would be added to their respective words.
- Split the Paragraph
Initially, the structured and pre-processed data is extracted from the documents. It is then split into sentences using the delimiter . (dot), and these sentences are stored in a list.
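The paragraph split can be sketched in Python as below. Splitting on every '.' is what the Split the Paragraph step describes, but it also breaks abbreviations and decimal numbers (for example "Dr." or "2.5"); NLTK's sent_tokenize() is one alternative if that matters. Discarding empty pieces and trimming whitespace are choices made in this sketch.

```python
from typing import List


def split_paragraph(text: str) -> List[str]:
    """Split pre-processed text into sentences on the '.' delimiter."""
    sentences = []
    for piece in text.split("."):
        piece = piece.strip()
        if piece:  # skip empty pieces, e.g. after a final '.'
            sentences.append(piece)
    return sentences


print(split_paragraph("I like apples. I hate them."))  # ['I like apples', 'I hate them']
```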