Pre-trained Python model for Sentiment Analysis
This article will teach you how to use VADER for Sentiment Analysis tasks.
What is Sentiment Analysis?
Sentiment Analysis is the process of extracting information from a message to determine its tone (positive, negative, neutral, etc.) or intensity (super happy, somewhat happy, just happy, etc.).
Training your own Sentiment Analyzer is typically very expensive. You would need a large data set labeled by humans and would have to carry out a long process of feature extraction, creation, and selection.
Luckily, there are pre-trained Sentiment Analyzers that can save you a ton of work. In particular, VADER is a popular rule-based model that can break down a given text into positive, neutral and negative sentiments.
How to use VADER
VADER stands for Valence Aware Dictionary and sEntiment Reasoner. It is an open-source, lexicon- and rule-based tool that is specifically attuned to sentiment expressed on social media. You can check out the source code here.
Step 1. Install VADER
First, install the library. Open a new terminal and run:
% pip install vaderSentiment
Step 2. Use the model
Open a Python file, import the library and start classifying text!
# Import sentiment analyzer
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

# Initialize model
analyzer = SentimentIntensityAnalyzer()

# Declare some text
angry_review = 'The food was disgusting. I am never coming back here again!!'

# Analyze the text with polarity_scores
analyzer.polarity_scores(angry_review)
> {'compound': -0.6103, 'neg': 0.285, 'neu': 0.715, 'pos': 0.0}
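The model receives a string and returns a dictionary with four scores. The neg, neu and pos scores are the proportions of the text that fall into each category. The most important output is the compound score. It is a number between -1 (most negative) and 1 (most positive), obtained by summing the valence scores of the words in the text, adjusting them according to VADER's rules and normalizing the result, so it summarizes the overall tone of the message.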
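As a rough illustration of how the compound score can be read, the sketch below maps a score dictionary to a three-way label. The cutoffs of 0.05 and -0.05 are the thresholds commonly suggested in VADER's documentation, not a fixed rule, and the function name label_from_scores is hypothetical (it is not part of the library).

```python
def label_from_scores(scores, cutoff=0.05):
    """Map a VADER-style score dictionary to a three-way label.

    `cutoff` is an assumed default; tune it for your own data.
    """
    compound = scores['compound']
    if compound >= cutoff:
        return 'positive'
    if compound <= -cutoff:
        return 'negative'
    return 'neutral'


# Scores returned for the angry review above
angry_scores = {'compound': -0.6103, 'neg': 0.285, 'neu': 0.715, 'pos': 0.0}
print(label_from_scores(angry_scores))  # negative
```

The function only needs the dictionary, so it can be applied directly to the output of analyzer.polarity_scores.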
Step 3. Turn the model into a Classifier
You can turn the score into a binary classifier by defining a cutoff point such that compound scores at or above this threshold are classified as positive (1), while compound scores below it are classified as non-positive (0).
def classify_positive(text, threshold=0):
    # Score text
    score = analyzer.polarity_scores(text)

    # Get compound score from dictionary
    score = score.get('compound')

    # Classify text according to threshold
    if score >= threshold:
        pred_class = 1
    else:
        pred_class = 0

    # Return prediction
    return pred_class
Let’s try out the function on the angry review:
# Test function on an angry review
print('Predicted class:', classify_positive(text=angry_review, threshold=0))
> Predicted class: 0
Further steps
If you want to take this one step further, you can manually label a few texts as positive (1) or non-positive (0), get their compound scores and choose the cutoff point that maximizes the accuracy (or some other metric).
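One way to pick the cutoff could look like the sketch below. It assumes you have already collected the compound scores of your hand-labeled texts; the helper name best_threshold is hypothetical, and the scores and labels in the example are made-up placeholder values, not real VADER output. Only the observed scores are tried as candidate cutoffs, and ties go to the lowest one.

```python
def best_threshold(compound_scores, labels):
    """Return the (cutoff, accuracy) pair with the highest accuracy.

    A text is predicted positive (1) when its compound score >= cutoff,
    which mirrors the rule used in classify_positive.
    """
    if not labels or len(compound_scores) != len(labels):
        raise ValueError('scores and labels must be non-empty and equal in length')

    best_cutoff, best_accuracy = None, -1.0
    # Try every distinct observed score as a candidate cutoff
    for cutoff in sorted(set(compound_scores)):
        predictions = [1 if s >= cutoff else 0 for s in compound_scores]
        correct = sum(p == y for p, y in zip(predictions, labels))
        accuracy = correct / len(labels)
        # Strict comparison keeps the lowest cutoff when accuracies tie
        if accuracy > best_accuracy:
            best_cutoff, best_accuracy = cutoff, accuracy
    return best_cutoff, best_accuracy


# Placeholder data for illustration only
example_scores = [-0.61, -0.20, 0.10, 0.45, 0.80]
example_labels = [0, 0, 0, 1, 1]
print(best_threshold(example_scores, example_labels))  # (0.45, 1.0)
```

With only a handful of labeled texts the chosen cutoff can easily overfit, so it is worth checking it on texts that were not used to select it.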
Additionally, you can adapt VADER to new vocabulary by adding words to the model’s lexicon file. Because VADER is rule-based, this extends its dictionary rather than re-training it. You can learn how to do so by clicking here.
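Instead of editing the lexicon file, new words can also be added in memory at run time. The sketch below is a minimal version of that idea, not the method described at the link. It assumes the analyzer exposes its word list as a dictionary attribute named lexicon that is looked up with lowercase keys, and that valence ratings lie between -4 and 4; both hold for the vaderSentiment releases this was checked against, but verify them for your version. The helper add_words and the example slang ratings are hypothetical.

```python
from types import SimpleNamespace


def add_words(analyzer, new_words):
    """Add or overwrite lexicon entries on an analyzer (in memory only).

    `new_words` maps each word to a valence rating, assumed to lie
    between -4 (most negative) and 4 (most positive).
    """
    for word, valence in new_words.items():
        if not -4.0 <= valence <= 4.0:
            raise ValueError(f'valence for {word!r} must be between -4 and 4')
        # Assumption: the lexicon is keyed by lowercase words
        analyzer.lexicon[word.lower()] = float(valence)
    return analyzer


# Stand-in object so the sketch runs without VADER installed;
# in practice, pass the SentimentIntensityAnalyzer created in Step 2.
demo = SimpleNamespace(lexicon={})
add_words(demo, {'Bussin': 3.0, 'mid': -1.5})
print(demo.lexicon)  # {'bussin': 3.0, 'mid': -1.5}
```

Changes made this way last only as long as the analyzer object does, so they need to be reapplied each time a new analyzer is created.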
Closing Remarks
VADER is a flexible, rule-based model built for Sentiment Analysis. It can be easily turned into a classifier, and its lexicon can be extended with new words to adapt to new kinds of slang.