L23: Sentiment Analysis

Bogdan G. Popescu

John Cabot University

Intro

One of the most important and practical subfields of text analysis is sentiment analysis.

Sentiment analysis allows us to determine the emotional tone of a text.

It can be applied to:

  • brand monitoring
  • customer feedback analysis
  • political party manifestos
  • social media sentiment analysis, etc.

Models of Sentiment Analysis

Various approaches have been developed to perform sentiment analysis. These range from simple rule-based methods to cutting-edge deep learning techniques.

Some of the most common sentiment analysis models include:

  • Lexicon-based analysis
  • Machine learning
  • Pre-trained transformer-based deep learning

Models of Sentiment Analysis

Lexicon-based analysis

This type of analysis entails using predefined rules to determine the sentiment of text.

For example, the presence of positive or negative words determines the sentiment of the text.

Lexicon-based analysis is easy to implement but may be less accurate (e.g. it struggles with context sensitivity and sarcasm detection).
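
To make this concrete, here is a minimal sketch of a rule-based scorer; the word lists and the scoring rule below are invented for illustration and are not taken from any particular lexicon.

# Minimal lexicon-based sketch; the word lists are invented for illustration.
POSITIVE = {"good", "great", "love", "loved", "excellent", "happy"}
NEGATIVE = {"bad", "terrible", "hate", "hated", "awful", "sad"}

def lexicon_sentiment(text):
    """Score = (# positive words - # negative words) / total # of words."""
    words = text.lower().split()
    if not words:
        return 0.0
    hits = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return hits / len(words)

print(lexicon_sentiment("I loved this great movie"))      # > 0: positive
print(lexicon_sentiment("this was a terrible sad film"))  # < 0: negative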

Models of Sentiment Analysis

Machine Learning

This entails training a model to identify the sentiment of a piece of text based on a set of labeled training data.

Such models can be built with a variety of ML algorithms: e.g. decision trees, support vector machines (SVMs), and neural networks.

These are typically more accurate than lexicon-based analysis, but they require a larger amount of training data.
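
As a hedged illustration, the sketch below trains a simple text classifier with scikit-learn on a toy labeled dataset; the example texts, labels, and model choice (TF-IDF plus logistic regression) are assumptions for demonstration only.

# A minimal supervised-learning sketch; the toy texts and labels are invented.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = ["I loved this", "Absolutely fantastic", "This was awful", "I hated every minute"]
labels = [1, 1, 0, 0]  # 1 = positive, 0 = negative

clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
clf.fit(texts, labels)                               # learn from the labeled examples
print(clf.predict(["What a fantastic experience"]))  # likely [1], since 'fantastic' appears in the positive examples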

Sentiment Analysis: VADER

Let us now perform sentiment analysis using VADER from NLTK in Python.

We will do this on the same data as before.

# Step 1: Load libraries
import pandas as pd
import string
import nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
from nltk.stem import WordNetLemmatizer, PorterStemmer

# Download the NLTK resources (downloading 'all' is convenient but slow)
nltk.download('all')

# Step 2: Read the CSV data into a DataFrame
import numpy as np
texts_df = pd.read_csv("/Users/bgpopescu/Library/CloudStorage/Dropbox/john_cabot/teaching/text_analysis/week8/lecture8b/data/aggression_texts.csv")

Pre-processing Text

We can use the function that we created previously to clean the text:

from nltk.tokenize import word_tokenize
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer, PorterStemmer
import string
import nltk

# Ensure necessary NLTK resources are downloaded
nltk.download('punkt')
nltk.download('wordnet')
nltk.download('stopwords')

# Initialize lemmatizer and stemmer
lemmatizer = WordNetLemmatizer()
stemmer = PorterStemmer()

# Define lemmatization function
def lemmatize_text(tokens):
    return [lemmatizer.lemmatize(word) for word in tokens]

# Define stemming function
def stem_text(tokens):
    return [stemmer.stem(word) for word in tokens]

# Preprocessing pipeline
def preprocess_text(df, text_column):
    # Step 1: Ensure the specified column contains only strings
    df[text_column] = df[text_column].astype(str)

    # Step 2: Convert text to lowercase
    df["text2"] = df[text_column].str.lower()

    # Step 3: Remove punctuation
    df["text2"] = df["text2"].str.translate(str.maketrans('', '', string.punctuation))

    # Step 4: Remove numbers
    df["text2"] = df["text2"].str.replace(r'[0-9]+', " ", regex=True)

    # Step 5: Tokenize, lemmatize, stem, and remove stopwords
    stop_words = set(stopwords.words('english'))
    df["text2"] = df["text2"].apply(
        lambda text: ' '.join(
            [stemmer.stem(lemmatizer.lemmatize(word))  # Apply lemmatization and stemming
             for word in word_tokenize(text)          # Tokenize text
             if word not in stop_words]               # Remove stopwords
        )
    )

    # Step 6: Remove rows with NaN values in the processed column
    df = df.dropna(subset=["text2"])

    return df

Pre-processing Text

We now apply the function and observe the changes.

# Randomly sample 1000 rows
#sampled_df = texts_df.sample(n=1000, random_state=42)  # random_state ensures reproducibility
preprocessed_df = preprocess_text(texts_df, 'body')
preprocessed_df['document_id'] = range(1, len(preprocessed_df) + 1)
preprocessed_df.head(10)
   person_id  ... document_id
0      10042  ...           1
1      11727  ...           2
2      24938  ...           3
3      10579  ...           4
4      25034  ...           5
5      24749  ...           6
6      10250  ...           7
7      10358  ...           8
8      11455  ...           9
9      10257  ...          10

[10 rows x 28 columns]

Sentiment Analysis

Let us first initialize a Sentiment Intensity Analyzer object from the nltk.sentiment.vader library.

Note that VADER's strengths lie in analyzing social media and other informal text. It is less well suited to texts with domain-specific vocabulary.

from nltk.sentiment.vader import SentimentIntensityAnalyzer
analyzer = SentimentIntensityAnalyzer()

We will first define a function called get_sentiment, which takes a string as input

  • the function calls the polarity_scores method to get a dictionary of sentiment scores for the text (see the example below)
  • the dictionary contains positive, negative, neutral, and compound scores
  • our function returns the positive, negative, and neutral scores, which we store as separate columns
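
As a quick illustration of the dictionary that polarity_scores returns (the numeric values shown are only indicative and depend on the VADER lexicon version):

analyzer.polarity_scores("I love this, it is wonderful!")
# Returns a dict with keys 'neg', 'neu', 'pos', and 'compound',
# e.g. {'neg': 0.0, 'neu': 0.31, 'pos': 0.69, 'compound': 0.87} (values illustrative)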

Sentiment Analysis

This is the function:

def get_sentiment(text):
    scores = analyzer.polarity_scores(text)
    return scores['pos'], scores['neg'], scores['neu']  # Return positive, negative, and neutral scores

And this is how we apply it to the speeches in our dataframe:

# Apply the function to your DataFrame and store the results in new columns
preprocessed_df[['pos_score', 'neg_score', 'neu_score']] = preprocessed_df['body'].apply(
    lambda text: pd.Series(get_sentiment(text))
)

Sentiment Analysis

This is how we now calculate sentiment by year and gender.

import numpy as np
import pandas as pd
from scipy import stats

# Group by year and calculate total speeches, mean sentiment, and standard deviation
trends = (
    preprocessed_df
    .groupby(["year", "gender"], as_index=False)
    .agg(
        total_speeches=('document_id', 'count'),  # Total number of speeches
        mean_sentiment=('pos_score', 'mean'),    # Mean sentiment
        sd_sentiment=('pos_score', 'std')        # Standard deviation of sentiment
    )
)

Sentiment Analysis

This is how we now calculate sentiment by year and gender.

# Use total speeches as the sample size (n)
trends["n"] = trends["total_speeches"]

# Calculate standard error (SE) of the mean sentiment
trends["se"] = trends["sd_sentiment"] / np.sqrt(trends["n"])

# Set confidence level for the confidence intervals
confidence_level = 0.95
t_value = stats.t.ppf(1 - (1 - confidence_level) / 2, df=trends["n"] - 1)

# Calculate confidence intervals for the mean sentiment
trends["ci_lower"] = trends["mean_sentiment"] - t_value * trends["se"]
trends["ci_upper"] = trends["mean_sentiment"] + t_value * trends["se"]

Sentiment Analysis

This is how mean sentiment varies by year and gender.

Transformers

We will be using an NLP transformer model: nlptown/bert-base-multilingual-uncased-sentiment.

Transformers are a subset of machine learning models.

This is a model fine-tuned for sentiment analysis of product reviews in languages such as:

  • English
  • German
  • French
  • Spanish
  • Italian

The model predicts the sentiment of a text as a rating between 1 and 5.

Transformers

Let us first load the relevant packages

Python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

tokenizer = AutoTokenizer.from_pretrained('nlptown/bert-base-multilingual-uncased-sentiment')
model = AutoModelForSequenceClassification.from_pretrained('nlptown/bert-base-multilingual-uncased-sentiment')

And let us now look at one example:

Python
tokens = tokenizer.encode('I loved this, absolutely the best', return_tensors="pt")

The output from the model is a vector of five scores, one for each possible star rating.

The scores are logits (unnormalized scores) and require a softmax transformation to be interpreted as probabilities (see the sketch after the results below).

The position with the highest score represents the sentiment rating.

For example, [.9, .2, .1, -.2, -5] corresponds to a rating of 1, since the highest score is in the first position.
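
A quick check of that example with torch (the vector below is the illustrative one above, not actual model output):

Python
import torch
scores = torch.tensor([0.9, 0.2, 0.1, -0.2, -5.0])  # the illustrative scores above
rating = int(torch.argmax(scores)) + 1               # the largest value is at index 0, so rating 1
print(rating)  # 1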

Results

These are the results:

Python
tokens = tokenizer.encode('I loved this, absolutely the best', return_tensors="pt")
result = model(tokens)
result
SequenceClassifierOutput(loss=None, logits=tensor([[-2.3882, -2.5314, -1.2469,  1.1947,  4.1058]],
       grad_fn=<AddmmBackward0>), hidden_states=None, attentions=None)
Python
result.logits
tensor([[-2.3882, -2.5314, -1.2469,  1.1947,  4.1058]],
       grad_fn=<AddmmBackward0>)
int(torch.argmax(result.logits))
4
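
To interpret the logits as probabilities, as noted earlier, we can apply a softmax; here is a brief sketch using the result object from the chunk above:

Python
import torch.nn.functional as F
probs = F.softmax(result.logits, dim=1)   # probabilities over the five ratings, summing to 1
print(probs)                              # the largest probability is at index 4
print(int(torch.argmax(probs)) + 1)       # 5, i.e. a 5-star rating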

Another example

Let us now look at another example:

Python
tokens = tokenizer.encode('This was pretty bad. I hated it', return_tensors="pt")
result = model(tokens)

The predicted result will be:

Python
result.logits
tensor([[ 3.4890,  2.3720,  0.4171, -2.3260, -3.1634]],
       grad_fn=<AddmmBackward0>)
int(torch.argmax(result.logits))
0

Running the model on a dataframe

Remember that we already have a dataframe called preprocessed_df.

We first should define a function:

Python
def sentiment_score_bert(review):
    tokens = tokenizer.encode(review, return_tensors="pt")
    result = model(tokens)
    # +1 converts the 0-indexed argmax position into a 1-5 star rating
    return int(torch.argmax(result.logits)) + 1

We can now apply this to the dataframe

Python
preprocessed_df2 = preprocessed_df.sample(n=1000, random_state=42)  # random_state ensures reproducibility
preprocessed_df2["sentiment_bert"] = preprocessed_df2["body"].apply(lambda x: sentiment_score_bert(x[:512]))
Python
preprocessed_df2["sentiment_bert_transform"] = preprocessed_df2['sentiment_bert'].apply(lambda x: 1 if x > 3 else 0)

Investigating the Results

Let us now investigate some of the results:

Python
good_speech = preprocessed_df2["body"][preprocessed_df2["sentiment_bert"]==5].iloc[0]
good_speech
'What I say to the right honourable Gentleman, for whom I have great respect, is that the motion says "exclusively" ISIL because that was a promise I made in this House in response to points made from both sides of the House. As far as I am concerned, wherever members of ISIL are, wherever they can be properly targeted, that is what we should do. Let me just make this point, because I think it is important when we come to the argument about ground troops. In my discussions with the King of Jordan, he made the point that in the south of Syria there is already not only co-operation among the Jordanian Government, the French and the Americans, and the Free Syrian Army, but a growing ceasefire between the regime troops and the Free Syrian Army so that they can turn their guns on ISIL. That is what I have said: this is an ISIL-first strategy. They are the threat. They are the ones we should be targeting. This is about our national security.   Several honourable Members  rose   -'

Investigating the Results

Let us now look at a low-scoring speech

Python
bad_speech = preprocessed_df2["body"][preprocessed_df2["sentiment_bert"]==1].iloc[1]
bad_speech
"I have absolutely no confidence whatever. My honourable Friend's point goes to the heart of the dodgy economics in the dodgy dossier. The figure of £100 million is entirely predicated on the assumption that there will almost be a net offset of the £3 billion static loss as a result of the change to the 50p rate, through people deciding to work harder, save less, take their income in the current year, and hide their income less. Those are the behavioural changes that the Government are assuming will take place, but there is no real evidence base for the change. It is fundamentally dodgy."

Other Sentiments

A DistilRoBERTa-base model fine-tuned for emotion recognition is a good choice for classifying emotions in text.

The j-hartmann/emotion-english-distilroberta-base model classifies texts into seven classes:

  • anger
  • disgust
  • fear
  • joy
  • neutral
  • sadness
  • surprise

Other Sentiments

We first read our dataset:

Python
import pandas as pd
import numpy as np
from transformers import pipeline

# Read the CSV file into a DataFrame
texts_df = pd.read_csv("/Users/bgpopescu/Library/CloudStorage/Dropbox/john_cabot/teaching/text_analysis/week8/lecture8b/data/aggression_texts.csv")

# Randomly sample 1,000 speeches from the "body" column
sampled_df = texts_df.sample(n=1000, random_state=42)
sampled_texts = sampled_df["body"].tolist()

Other Sentiments

We then initiate the model:

Python
import torch
# Check if a GPU is available
device = 0 if torch.cuda.is_available() else -1
# download pre-trained emotion classification model
model = pipeline("text-classification", model="j-hartmann/emotion-english-distilroberta-base", device=device)
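
Before classifying the full sample, a single call shows the output format: the pipeline returns a list with one dict per input, each holding a 'label' and a 'score' (the sentence and the numeric value below are illustrative):

Python
example = model("I am absolutely furious about this decision")
print(example)
# [{'label': 'anger', 'score': 0.97}]  # illustrative output; the keys are 'label' and 'score'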

Emotion Classification

We first need to truncate our texts so that they fit within the model's 512-token limit; as a simple approximation, we truncate by characters:

Python
# Function to truncate long texts
def truncate_text(text, max_length=512):
    """Truncate text to its first max_length characters (a rough proxy for the token limit)."""
    return text[:max_length]

# Truncate texts if needed
max_length = 512  # matches the model's 512-token limit; here applied to characters
truncated_texts = [truncate_text(text, max_length) for text in sampled_texts]

# compute the emotion of each speech using the model
all_texts = truncated_texts
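
Note that truncate_text cuts by characters, which is only a rough proxy for the model's 512-token limit. A hedged alternative, sketched below, lets the model's own tokenizer do the truncation; the tokenizer name is assumed to match the pipeline model.

Python
from transformers import AutoTokenizer

# Sketch: encode with truncation at the token level, then decode back to text.
hf_tokenizer = AutoTokenizer.from_pretrained("j-hartmann/emotion-english-distilroberta-base")

def truncate_by_tokens(text, max_tokens=512):
    ids = hf_tokenizer(text, truncation=True, max_length=max_tokens)["input_ids"]
    return hf_tokenizer.decode(ids, skip_special_tokens=True)

# Could be used in place of truncated_texts above.
truncated_texts_alt = [truncate_by_tokens(t) for t in sampled_texts]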

Emotion Classification

We can now classify our texts.

Python
import time
from tqdm import tqdm  # For progress bars

# Batch size for processing
batch_size = 10  # Adjust based on your model and system capabilities
total_batches = len(truncated_texts) // batch_size + (len(truncated_texts) % batch_size != 0)

# Initialize a list to store all emotions
all_emotions = []

# Start processing with a timer and progress bar
start_time = time.time()
print("Starting emotion analysis...")

# Use tqdm for a progress bar
for i in tqdm(range(0, len(truncated_texts), batch_size), desc="Processing batches"):
    # Get the current batch
    batch_texts = truncated_texts[i:i + batch_size]
    
    # Process the batch and extend the results
    all_emotions.extend(model(batch_texts))

# End time
end_time = time.time()

# Print the time taken
print(f"Emotion analysis completed in {end_time - start_time:.2f} seconds.")
Starting emotion analysis...

Processing batches: 100%|##########| 100/100 [00:22<00:00,  4.44it/s]
Emotion analysis completed in 22.56 seconds.

Emotion Classification

And now we can attach the labels:

Python
# Step 7: Extract labels and scores
labels = [emotion["label"] for emotion in all_emotions]
scores = [emotion["score"] for emotion in all_emotions]

# Step 8: Create a new DataFrame with the results
sampled_df["emotion_label"] = labels
sampled_df["emotion_score"] = scores
sampled_df["emotion_label"].unique()
array(['surprise', 'neutral', 'sadness', 'fear', 'joy', 'anger',
       'disgust'], dtype=object)

Mapping Anger and Disgust over Time

We can combine anger and disgust into a single aggression indicator.

# Step 1: Add a binary column for aggression
sampled_df["is_aggressive"] = sampled_df["emotion_label"].isin(["anger", "disgust"])

# Step 2: Add a numeric column for aggression scores (optional, if you want weighted metrics)
sampled_df["aggression_score"] = sampled_df["is_aggressive"] * sampled_df["emotion_score"]
sampled_df["document_id"] = range(1, len(sampled_df) + 1)

Mapping Anger and Disgust over Time

And here is how we calculate emotions per year:

import numpy as np
import pandas as pd
from scipy import stats

# Group by year and calculate total speeches, mean sentiment, and standard deviation
trends = (
    sampled_df
    .groupby(["year"], as_index=False)
    .agg(
        total_speeches=('document_id', 'count'),     # Total number of speeches
        total_aggressive_speeches=('is_aggressive', 'sum'),  # Count of aggressive speeches
        mean_aggression_score=('is_aggressive', 'mean'),  # Mean aggression score
        sd_aggression_score=('is_aggressive', 'std')      # Standard deviation of aggression scores
    )
)

Mapping Anger and Disgust over Time

This is how we compute confidence intervals for the mean aggression score by year.

# Use total speeches as the sample size (n)
trends["n"] = trends["total_speeches"]

# Calculate standard error (SE) of the mean sentiment
trends["se"] = trends["sd_aggression_score"] / np.sqrt(trends["n"])

# Set confidence level for the confidence intervals
confidence_level = 0.95
t_value = stats.t.ppf(1 - (1 - confidence_level) / 2, df=trends["n"] - 1)

# Calculate confidence intervals for the mean sentiment
trends["ci_lower"] = trends["mean_aggression_score"] - t_value * trends["se"]
trends["ci_upper"] = trends["mean_aggression_score"] + t_value * trends["se"]

trends["normalized_aggression_score"] = trends["total_aggressive_speeches"] / trends["total_speeches"]

Mapping Anger and Disgust over Time

This is how the mean aggression score varies by year.

Conclusion

Sentiment analysis is a powerful tool for extracting emotional tone and classifying the sentiment of text.

Methods:

  • Lexicon-based approaches are straightforward but less precise.
  • Machine Learning models offer improved accuracy but require substantial training data.
  • Transformer-based models like BERT provide state-of-the-art results.

Advanced models like DistilRoBERTa enable nuanced emotion classification beyond simple polarity.