Upon successful completion of this course, the students will be able to:
Write Python programs that use loops, conditional statements, and function definitions.
Employ quantitative techniques to process and analyze textual data.
Utilize Python libraries like NumPy, pandas, and NLTK for text manipulation and analysis.
Apply advanced methods such as sentiment analysis, topic modeling, word embeddings, and supervised learning to text data.
Use ChatGPT and prompt engineering to enhance text analysis tasks, including summarization, classification, and generating structured outputs.
Create and publish a professional website showcasing their portfolio and analytical capabilities.
Introduction
Jobs where these skills are valued
Data Science and Analytics
Data Scientist: Developing predictive models and conducting advanced analysis with text data.
Text/Language Data Analyst: Extracting insights from text data for business or research.
Natural Language Processing (NLP) and AI
NLP Engineer: Working on synthesizing customer reviews, language translation, and sentiment analysis.
AI Prompt Engineer: Designing and optimizing prompts for large language models like ChatGPT.
Research and Academia
Research Scientist: Conducting text-based research in political science, sociology, or computational linguistics.
Academic/Teaching Positions: Teaching Python and text analysis at universities or boot camps.
Communication and Technical Writing
Technical Writer: Explaining complex computational methods clearly and in a structured manner.
Science Communicator: Building accessible content from technical analyses.
Business and Consulting
Business Analyst: Using Python and text analysis to drive data-driven decision-making.
Consultant: Advising clients on leveraging data and text-based insights.
Logistics
Hours: MW 10:00-11:15AM
Room: Garibaldi Computer Lab
Office Hours: by appointment
You will use your own laptop (Mac or Windows), ideally with 8 GB of RAM.
Evaluation
Grading
You will be graded on four problem sets during the semester (each worth 12.5% of your grade) and on a final report and presentation (each worth 25%).
4 problem sets: 50% of the final grade (12.5% each)
final presentation: 25% of the final grade
final project report: 25% of the final grade
Initial Individual Submission
This component contributes 50% of the overall grade for the problem set.
You complete the problem set independently and submit it to the instructor. Your grade for this component is based on the quality of your independent work.
Final Submission After Group Consultation
This component also contributes 50% of the overall grade for the problem set.
After discussing the problem set with your group members and documenting the correct answers, you will submit this revised version.
Final Project
You will undertake a text analysis project emphasizing the practical application of text analysis techniques using Python programming.
The project entails a few steps:
choose a topic that involves text data
acquire data either from the course materials or from external sources
employ text analysis procedures in Python
craft a well-structured two-page report containing an introduction to the problem, objectives, data sources, methodology, results, and conclusion
include an appendix with the Python code for the text analysis procedures
Grimmer, Justin, Brandon M. Stewart, and Margaret E. Roberts. Text As Data: A New Framework for Machine Learning and the Social Sciences. Princeton University Press, 2022.
Language is at the core of nearly all forms of social interaction:
Laws are written and codified.
Political debates shape public opinion.
Historical events are recorded and remembered.
People communicate ideas, emotions, and stories.
However, until recently, analyzing these interactions at scale was nearly impossible.
Large quantities of text are now available in the form of digitized books and other documents.
Text Analysis
Today, thanks to the digitization of vast textual archives—books, articles, comments, and more—we have access to unprecedented amounts of text.
Even better, we now have cutting-edge tools and techniques to make sense of it all.
What is Quantitative Text Analysis?
Quantitative text analysis involves converting text into numbers, assigning values to words and documents, and uncovering hidden meanings.
Our focus will be on identifying latent concepts:
We cannot directly observe abstract ideas, like political ideology, aggression, or sentiment.
Instead, we infer them by analyzing patterns in text.
To achieve this, we’ll develop strategies for assigning meaningful scores to words and documents, enabling us to extract insights from even the largest text datasets.
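To make "converting text into numbers" concrete, here is a minimal sketch that counts how often each word appears in each document. The function names `tokenize` and `document_term_matrix` are illustrative choices, and the regular-expression tokenizer is a simplifying assumption rather than the tokenizer used later in the course.

```python
from collections import Counter
import re

def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def document_term_matrix(documents):
    """Return (vocabulary, rows); each row holds the word counts of one document."""
    counts = [Counter(tokenize(doc)) for doc in documents]
    vocabulary = sorted(set().union(*counts))
    rows = [[c[word] for word in vocabulary] for c in counts]
    return vocabulary, rows

docs = [
    "Laws are written and codified.",
    "Political debates shape public opinion.",
    "Debates about laws shape debates about opinion.",
]
vocabulary, rows = document_term_matrix(docs)

# Show each document as a row of counts, one column per vocabulary word.
print(vocabulary)
for row in rows:
    print(row)
```

Each document becomes a row of numbers, which is the starting point for the scoring strategies described above.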
Examples of Latent Concepts based on Observed Text
Quantitative text analysis can uncover concepts like:
Aggression in political speeches.
Economic themes in news coverage.
Hate speech in online forums.
Ideological leanings in party platforms.
Quantitative Text Analysis Techniques
1. Dictionaries
Predefined word lists categorize text (e.g., “positive” vs. “negative” words).
Documents are scored based on the presence or absence of these words.
2. Supervised Learning
Words are weighted based on their usage in labeled data (e.g., spam vs. not spam).
Documents are classified by their resemblance to different categories.
3. Text Scaling
Words are given weights reflecting their patterns of use across contexts.
Documents are positioned on scales such as “liberal” to “conservative.”
4. Topic Models
Words are grouped into topics based on co-occurrence patterns.
Documents are represented as combinations of these topics.
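The dictionary approach (technique 1 above) can be sketched in a few lines. The two word lists below are tiny, made-up examples, and the scoring rule (positive hits minus negative hits, divided by document length) is just one simple option; real applications rely on larger, validated dictionaries.

```python
import re

# Hypothetical mini-dictionaries, invented for illustration only.
POSITIVE = {"awesome", "great", "fantastic", "phenomenal"}
NEGATIVE = {"dumb", "bad", "broken", "awful"}

def dictionary_score(text):
    """Return (positive hits - negative hits) / number of tokens; 0.0 for empty text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    positive_hits = sum(token in POSITIVE for token in tokens)
    negative_hits = sum(token in NEGATIVE for token in tokens)
    return (positive_hits - negative_hits) / len(tokens)

print(dictionary_score("The controller feels great and the games are phenomenal"))
print(dictionary_score("It looks dumb in ads"))
```

A score above zero suggests more positive than negative dictionary words, a score below zero the opposite.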
5. Word Embedding Models
Words are encoded as vectors that capture their meanings in context.
Documents are analyzed by aggregating the vectors of the words they contain.
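As a rough illustration of "aggregating the vectors of the words they contain", the sketch below averages word vectors into one document vector. The three-dimensional vectors in `TOY_VECTORS` are invented for this example; real embeddings are learned from large corpora and usually have hundreds of dimensions.

```python
# Toy word vectors, made up for illustration; not taken from a trained model.
TOY_VECTORS = {
    "law":     [0.9, 0.1, 0.0],
    "court":   [0.8, 0.2, 0.1],
    "game":    [0.0, 0.9, 0.3],
    "console": [0.1, 0.8, 0.4],
}

def document_vector(tokens, vectors=TOY_VECTORS):
    """Average the vectors of the known tokens; return None if no token is known."""
    known = [vectors[token] for token in tokens if token in vectors]
    if not known:
        return None
    # zip(*known) groups the values dimension by dimension.
    return [sum(dimension) / len(known) for dimension in zip(*known)]

legal_doc = document_vector(["law", "court", "ruling"])  # "ruling" is unknown and skipped
gaming_doc = document_vector(["game", "console"])
print(legal_doc)
print(gaming_doc)
```

Averaging is only one way to aggregate; it ignores word order and treats every word as equally important.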
Some Cool Things That You Will Learn
Learning how to use OpenAI's ChatGPT for Text Analysis
We will learn how to:
do prompt engineering: writing prompts to get the language model to do what we want it to do
provide targeted summaries of customer reviews for relevant company departments: e.g., summarize customer feedback about product design for the company’s design department
determine the sentiment of texts and extract labels, names, locations, and topics
translate text and fix grammar mistakes
Example 1: Targeted Summaries
We want more targeted summaries to inform specific departments within companies.
Python
# Assuming longest_review is defined and contains text
prompt_template2 = """Task: Summarize the e-commerce product review below, delimited by triple backticks, \
to provide feedback to the engineering department responsible for the product's design.

Instructions:
1. Create a concise summary in at most 30 words.
2. Focus on aspects relevant to the product's design, craftsmanship, or engineering.
3. If the review does not mention design or engineering, respond with precisely the \
word "None" (without quotes or additional text).

Review: ```{review_text_here}```"""
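A template like this is typically filled in once per review, and the model's reply is then checked for the "None" sentinel. The sketch below shows one way to do that under stated assumptions: `PROMPT_TEMPLATE` is a shortened stand-in for the template above, and `fake_model` is a hypothetical placeholder for a real language-model call, which is not shown here.

```python
# Build the delimiter programmatically so the sketch has no literal triple backticks.
DELIMITER = "`" * 3

# Shortened, hypothetical stand-in for a targeted-summary template.
PROMPT_TEMPLATE = (
    "Summarize the product review below, delimited by triple backticks, "
    "for the engineering department. If the review does not mention design "
    'or engineering, respond with precisely the word "None".\n\n'
    "Review: {delimiter}{review_text_here}{delimiter}"
)

def build_prompt(review_text):
    """Insert one review into the template's {review_text_here} placeholder."""
    return PROMPT_TEMPLATE.format(delimiter=DELIMITER, review_text_here=review_text)

def parse_summary(model_reply):
    """Map the sentinel reply "None" to Python's None; otherwise return the text."""
    reply = model_reply.strip()
    return None if reply == "None" else reply

def fake_model(prompt):
    """Hypothetical stand-in for a real model call; it always replies "None"."""
    return "None"

prompt = build_prompt("Controller feels great.")
summary = parse_summary(fake_model(prompt))
print(prompt)
print(summary)
```

Converting the sentinel to `None` makes it easy to filter out reviews that say nothing about design before passing summaries on to a department.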
This is the original customer feedback on a PS5.
Row 0:
profile_name: J.F. Carroll
rating: 5.0 out of 5 stars
rating_date: Reviewed in the United States on May 6, 2024
title: 5.0 out of 5 stars\nIt's a PS5, everyone wants one, was there any doubt?\n
review_text: \nNobody could get their hands on this until just recently. It's a PS5, the craftmanship is awesome.Operates well, system works well with store, everything is easy, this console is a masterpiece and I keep it close. Haven't got into too many games yet. It looks sexy as heck, it looks dumb in ads but in real life it's fantastic looking. And you can buy sexy plates for it as well.Controller feels great. Considering the upgrade controller, but I haven't decided. The game choices are phenomenal. It's official: XBOX lost, buy a Playstation.Playstation all the way.\n
word_count: 95
This is what we get:
Row 0:
profile_name: J.F. Carroll
rating: 5.0 out of 5 stars
rating_date: Reviewed in the United States on May 6, 2024
title: 5.0 out of 5 stars\nIt's a PS5, everyone wants one, was there any doubt?\n
review_text: \nNobody could get their hands on this until just recently. It's a PS5, the craftmanship is awesome.Operates well, system works well with store, everything is easy, this console is a masterpiece and I keep it close. Haven't got into too many games yet. It looks sexy as heck, it looks dumb in ads but in real life it's fantastic looking. And you can buy sexy plates for it as well.Controller feels great. Considering the upgrade controller, but I haven't decided. The game choices are phenomenal. It's official: XBOX lost, buy a Playstation.Playstation all the way.\n
word_count: 95
targeted_summary: The review praises the PS5's craftsmanship, ease of use, and aesthetics. Positive feedback on controller and game choices. No negative comments on design or engineering.
Example 2: Fixing Grammar Mistakes
We can also use ChatGPT to proofread and fix grammar mistakes.
Python
text_to_fix = """This is a mock short paragraf to see how ChatGPT performs when it comes to\
 grammer and spelling. These is a wrong sentence. Speling is of."""
Here is the prompt:
Python
prompt = f"""Proofread and correct the following text, which
is delimited with triple backticks.
Rewrite the corrected version in a simple string.
If you don't find any errors, leave the text as is. ```{text_to_fix}```"""
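And this is the output, shown as a comparison with the original text (deleted text in [-...-], inserted text in {+...+}):
This is a mock short [-paragraf-] {+paragraph+} to see how ChatGPT performs when it comes [-to\ ¶ grammer-] {+to grammar+} and spelling. [-These-] {+There+} is a wrong sentence. [-Speling-] {+Spelling+} is [-of.-] {+off.+}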
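A side-by-side view of what the model changed can be produced with a word-level diff. The sketch below uses `difflib` from the Python standard library and marks deletions as `[-word-]` and insertions as `{+word+}`. This is one possible approach and not necessarily the tool that generated the output above; the function name `word_diff` is an illustrative choice.

```python
import difflib

def word_diff(original, corrected):
    """Mark deleted words as [-word-] and inserted words as {+word+}."""
    old_words, new_words = original.split(), corrected.split()
    matcher = difflib.SequenceMatcher(a=old_words, b=new_words, autojunk=False)
    pieces = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            pieces.extend(old_words[i1:i2])
            continue
        if tag in ("delete", "replace"):
            pieces.append("[-" + " ".join(old_words[i1:i2]) + "-]")
        if tag in ("insert", "replace"):
            pieces.append("{+" + " ".join(new_words[j1:j2]) + "+}")
    return " ".join(pieces)

original = "These is a wrong sentence. Speling is of."
corrected = "There is a wrong sentence. Spelling is off."
print(word_diff(original, corrected))
# [-These-] {+There+} is a wrong sentence. [-Speling-] {+Spelling+} is [-of.-] {+off.+}
```

Because the comparison is word by word, punctuation attached to a word (such as "of." versus "off.") counts as part of that word.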
Example 3: Comparing speeches by politicians
Which politicians give speeches most similar to Boris Johnson's?
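One common way to compare texts is cosine similarity between their word-count vectors. The sketch below ranks a few texts by their similarity to a target text. The snippets are invented placeholders, not real speeches, and the course may use a different representation or similarity measure.

```python
from collections import Counter
import math
import re

def word_counts(text):
    """Count lowercase word tokens in a text."""
    return Counter(re.findall(r"[a-z']+", text.lower()))

def cosine_similarity(counts_a, counts_b):
    """Cosine of the angle between two word-count vectors (0.0 if either is empty)."""
    shared = counts_a.keys() & counts_b.keys()
    dot = sum(counts_a[word] * counts_b[word] for word in shared)
    norm_a = math.sqrt(sum(v * v for v in counts_a.values()))
    norm_b = math.sqrt(sum(v * v for v in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def rank_by_similarity(target_text, other_texts):
    """Return (name, similarity) pairs sorted from most to least similar."""
    target = word_counts(target_text)
    scores = {name: cosine_similarity(target, word_counts(text))
              for name, text in other_texts.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Invented placeholder snippets, not real speeches.
target_speech = "we will deliver on trade and deliver on jobs"
others = {
    "Speaker A": "trade and jobs are what we will deliver",
    "Speaker B": "the health service needs more nurses",
}
ranking = rank_by_similarity(target_speech, others)
for name, score in ranking:
    print(name, round(score, 3))
```

A similarity of 0 means the two texts share no words; values closer to 1 mean they use words in similar proportions.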
Example 4: Word Clouds
How do we create word clouds?
Example 5: Language Complexity
How do we measure language complexity?
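There are many complexity measures. As a first intuition, the sketch below computes two very simple proxies: average sentence length in words and average word length in characters. Both the choice of proxies and the crude sentence splitting on ".", "!" and "?" are assumptions made for illustration, not the measure used in the course.

```python
import re

def complexity_profile(text):
    """Return average words per sentence and average characters per word."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        return {"words_per_sentence": 0.0, "chars_per_word": 0.0}
    return {
        "words_per_sentence": len(words) / len(sentences),
        "chars_per_word": sum(len(word) for word in words) / len(words),
    }

simple = complexity_profile("We won. It was good.")
dense = complexity_profile(
    "Notwithstanding considerable institutional resistance, reform proceeded."
)
print(simple)
print(dense)
```

Longer sentences and longer words are treated as signs of more complex language, which is also the intuition behind standard readability formulas.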
Example 6: Measuring Aggression
Have women become more aggressive in their political communication over time?
Example 7: Topics in Speeches
What does Boris Johnson talk about in his political speeches?