Applied Computational Methods for Social Sciences
Course Information
Course Number: P1012
Course Title: Text Analysis
Time: Tue/Fri, 13:10-14:50
Location: Aulas VII 103
Instructor Details
Instructors: Manuel Toral and Bogdan Popescu
Office: xx
E-Mail: xx
Office Hours: xx
Course Description
This is an introductory course on using Python, a free and versatile programming language, to analyze text data. The course begins by building foundational programming skills in Python, including working with variables, loops, conditional statements, and data structures like lists, dictionaries, and arrays. In the second part, students will explore Python’s powerful tools for analyzing text data, such as social media posts, email correspondence, customer reviews, and political debates. The course also includes an introduction to ChatGPT and prompt engineering, teaching students how to leverage large language models for advanced text analysis tasks.
Through hands-on exercises, lectures, and projects, students will learn to process, clean, and analyze textual data systematically and reproducibly. By the end of the course, participants will be equipped with the skills to derive meaningful insights from textual information and integrate generative AI tools like ChatGPT into their workflows.
Note: You need a Windows or Mac computer with at least 8 GB of RAM.
Learning Outcomes
Upon successful completion of this course the students will be able to:
- Write Python programs that use loops, conditional statements, and function definitions.
- Employ quantitative techniques to process and analyze textual data.
- Utilize Python libraries like NumPy, Pandas, and NLTK for text manipulation and analysis.
- Apply advanced methods such as sentiment analysis, topic modeling, word embeddings, and supervised learning to text data.
- Use ChatGPT and prompt engineering to enhance text analysis tasks, including summarization, classification, and generating structured outputs.
- Create and publish a professional website showcasing their portfolio and analytical capabilities.
Textbook and/or Resource Materials
No textbook required
Assessment
Assessment methods:
You will be graded on four problem sets during the semester (each 12.5% of your grade) and a final report and presentation (each 25% of your grade).
- 4 problem sets: 50% of the final grade (12.5% each)
- final presentation: 25% of the final grade
- final project report: 25% of the final grade
Problem Sets
1. Initial Individual Submission: This component contributes 50% of each problem set's grade. You complete the problem set independently and submit it to the instructor, and this component is graded on the quality of your independent work.
2. Final Submission After Group Consultation: This component contributes the remaining 50% of each problem set's grade. After discussing the problem set with your group members and documenting the correct answers, you submit a revised version individually. Note: group submissions are not permitted; each student must submit the second attempt individually. This component is graded on the quality of your final submission after group consultation.
Presentation
Each student will deliver a 10-15 minute presentation on a topic assigned in advance. Presentations should include:
- an overview of the research question
- a description of the selected text analysis method
- a description of the dataset used
- a description of how the data was validated
- a critical analysis
Please prepare your presentation in Quarto and use relevant visuals. Practice beforehand to stay within the time limit and maintain a confident, professional tone. Be prepared to answer 2-3 questions from peers or the instructor during and after the presentation. Remember to cite your sources and avoid reading verbatim from slides or notes.
In addition to summarizing the key arguments or findings, your presentation should include a critical analysis of the material. Highlight what the author does not address, the limitations of their research, or potential problems in their analysis or methodology. Think about how the research could be improved, expanded, or connected to broader themes discussed in class, and incorporate these insights into your presentation.
Final Project
Students will undertake a text analysis project emphasizing the practical application of text analysis techniques using the Python programming language. This project offers an opportunity to showcase the skills acquired in the course for manipulating text data and conducting meaningful analyses. Participants are encouraged to choose a topic of interest or relevance, using datasets provided in the course or exploring new ones. The project entails the following steps:
- choose a topic that involves text data
- acquire data either from the course materials or from external sources
Students must submit a two-page report written in Quarto and give a 20-minute in-class presentation prepared in Quarto.
Grading Criteria for the project:
- Relevance
- Methodology
- Analysis
- Presentation
- Code Quality
Here are examples of projects that previous students created:
Attendance
Students are required to attend classes in accordance with the University’s policies. Students with more than three unexcused absences are assumed to have withdrawn from the course. Students with a justified reason for missing class must email the instructors before class explaining why they cannot attend.
Academic Honesty
As stated in the university catalog, any student who commits an act of academic dishonesty will receive a failing grade on the work in which the dishonesty occurred. In addition, acts of academic dishonesty, irrespective of the weight of the assignment, may result in the student receiving a failing grade in the course. Instances of academic dishonesty will be reported to the Dean of Academic Affairs. A student reported twice for academic dishonesty is subject to summary dismissal from the University. In such a case, the Academic Council will make a recommendation to the President, who will make the final decision.
Students with Learning Difficulties and other Disabilities
The University does not discriminate based on disability. Students with approved accommodations must inform their professors at the beginning of the term. Please see the website for the complete policy.
Required Books
General Books on Python
McKinney, Wes. 2022. Python for Data Analysis. O’Reilly. https://wesmckinney.com/book/
Text Analysis
Hovy, Dirk. 2020. Text Analysis in Python for Social Scientists. Cambridge University Press. https://www.cambridge.org/core/elements/abs/text-analysis-in-python-for-social-scientists/BFAB0A3604C7E29F6198EA2F7941DFF3
Bengfort, Benjamin, Rebecca Bilbro, and Tony Ojeda. 2018. Applied Text Analysis with Python. O’Reilly Media. https://www.oreilly.com/library/view/applied-text-analysis/9781491963036/
Hvitfeldt, Emil, and Julia Silge. 2022. Supervised Machine Learning for Text Analysis in R. CRC Press. https://smltar.com
Grimmer, Justin, Brandon M. Stewart, and Margaret E. Roberts. 2022. Text as Data: A New Framework for Machine Learning and the Social Sciences. Princeton University Press.
Silge, Julia, and David Robinson. 2017. Text Mining with R: A Tidy Approach. 1st ed. O’Reilly.