Applied Computational Methods for Social Sciences
Course Information
Course Number: P1012
Course Title: Data Science
Time: Tue/Wed/Fri: 09:00am-11:00am
Location: xx
Instructor Details
Instructors: Manuel Toral, Jorge Juvenal Campos, and Bogdan Popescu
E-Mail: jmtoral@tec.mx; juvenal.campos@tec.mx; bgpopescu@tec.mx
Office Hours: xx
Course Description
This is an introductory course in data science for the social sciences and public policy analysis. The course begins with core conceptual foundations, including causality in the social sciences and the use of machine learning as a tool for empirical political and policy analysis. Students will then study predictive methods commonly used in social research, such as linear regression beyond traditional econometrics and classification techniques for categorizing social phenomena, along with sampling methods, model selection, and dimensionality reduction.
In the second part of the course, students are introduced to more flexible modeling approaches, including polynomial regression, splines, and decision trees, as well as Support Vector Machines and unsupervised learning methods for constructing social indicators. Through lectures and hands-on exercises based on real public policy problems, participants will learn to analyze data systematically, evaluate and interpret predictive models, and produce quantitative evidence relevant to contemporary social and political issues.
Throughout the exercises, lectures, and projects, students will also learn to process and clean data and to carry out their analyses systematically and reproducibly.
Note: You will need a Windows or Mac computer with at least 8 GB of RAM.
Activities
- Individual and team-based research on the different prediction methods discussed in the module.
- Solving exercises, problems, and case studies, both individually and collaboratively, to develop the ability to identify public policy problems or research questions that require accurate prediction of a phenomenon. Once the nature of the problem has been identified, students will select the statistical tools that best fit the data and the research question and carry out the statistical analysis.
- Fieldwork in real-world settings where the challenge involves making accurate predictions of social phenomena, interacting with colleagues and key stakeholders involved in solving the problem.
- Preparing documentation of the results produced with the computational tools.
Textbook and/or Resource Materials
No textbook required
Assessment
Assessment methods:
Your grade is based on four problem sets, a midterm exam, a final exam, a final presentation, and a final project report, weighted as follows:
- 4 problem sets: 40% of the final grade (10% each)
- midterm exam: 15% of the final grade
- final exam: 15% of the final grade
- final presentation: 15% of the final grade
- final project report: 15% of the final grade
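The weights above can be checked with a short calculation. The sketch below makes several assumptions: every component is scored on a 0-100 scale, and the 40% problem-set share is split evenly across the four problem sets. The function name `final_grade` and the sample scores are hypothetical and are only meant to illustrate the weighted sum.

```python
# Hypothetical sketch: weighted final grade from the syllabus component weights.
# Assumes every score is on a 0-100 scale and the four problem sets count equally.

WEIGHTS = {
    "problem_sets": 0.40,        # shared evenly by the four problem sets
    "midterm": 0.15,
    "final_exam": 0.15,
    "final_presentation": 0.15,
    "final_report": 0.15,
}


def final_grade(problem_sets, midterm, final_exam, presentation, report):
    """Return the weighted final grade on a 0-100 scale.

    problem_sets: the four problem-set scores. Their average carries the full
    40% weight, which works out to 10% per problem set.
    """
    if len(problem_sets) != 4:
        raise ValueError("expected exactly four problem-set scores")
    ps_average = sum(problem_sets) / len(problem_sets)
    return (
        WEIGHTS["problem_sets"] * ps_average
        + WEIGHTS["midterm"] * midterm
        + WEIGHTS["final_exam"] * final_exam
        + WEIGHTS["final_presentation"] * presentation
        + WEIGHTS["final_report"] * report
    )


# Sanity check: the weights must add up to the whole grade.
assert abs(sum(WEIGHTS.values()) - 1.0) < 1e-9

# Made-up scores: problem sets average 87.5, the other components total 340.
print(round(final_grade([90, 80, 85, 95], 70, 80, 90, 100), 2))  # prints 86.0
```

The sanity check makes an inconsistent weighting scheme fail immediately. For example, four problem sets at 12.5% each plus the other four components would total 110%.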
Problem Sets
1. Initial Individual Submission: This component accounts for 50% of the grade for each problem set. You complete the problem set independently and submit it to the instructor, and this component is graded on the quality of your independent work.
2. Final Submission After Group Consultation: This component accounts for the remaining 50% of the grade for each problem set. After discussing the problem set with your group members and documenting the correct answers, you submit a revised version individually. Note: group submissions are not permitted; each student must submit the second attempt individually. This component is graded on the quality of your final submission after group consultation.
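The two-part weighting above amounts to a simple average of the two submissions. The sketch below assumes both submissions are scored on the same 0-100 scale. The function name `problem_set_grade` and the example scores are hypothetical.

```python
# Hypothetical sketch: grade for one problem set from its two submissions.
# Assumes both submissions are scored on a 0-100 scale.

INITIAL_WEIGHT = 0.5  # initial individual submission
FINAL_WEIGHT = 0.5    # individual resubmission after group consultation


def problem_set_grade(initial_score, final_score):
    """Return the problem-set grade on the same 0-100 scale as the inputs."""
    for score in (initial_score, final_score):
        if not 0 <= score <= 100:
            raise ValueError("scores must be between 0 and 100")
    return INITIAL_WEIGHT * initial_score + FINAL_WEIGHT * final_score


# Made-up scores: 70 on the first attempt and 90 after group consultation.
print(problem_set_grade(70, 90))  # prints 80.0
```

Under this weighting, improving the resubmission can raise the grade, but only by half of the improvement.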
Presentation
Each student will deliver a 10-15 minute presentation on a topic assigned in advance. Presentations should include:
- the research question
- a description of the selected text analysis method
- a description of the dataset used
- a description of how the data was validated
- a critical analysis
Prepare your presentation in Quarto and use relevant visuals. Practice beforehand to stay within the time limit and maintain a confident, professional tone. Be prepared to answer 2-3 questions from peers or the instructor during and after the presentation. Remember to cite your sources and avoid reading verbatim from slides or notes.
In addition to summarizing the key arguments or findings, your presentation should include critical analysis of the material. Highlight what the author does not address, the limitations of their research, or potential problems in their analysis or methodology. Think about how the research could be improved, expanded, or connected to broader themes discussed in class, and incorporate these insights into your presentation.
Final Project
Students will undertake a text analysis project emphasizing the practical application of text analysis techniques in the Python programming language. The project is an opportunity to showcase the skills you have acquired in manipulating text data and conducting meaningful analyses. Students are encouraged to choose a topic of interest or relevance, using datasets provided in the course or exploring new ones. The project entails a few steps:
- choosing a topic that involves text data
- acquiring data either from the course materials or from external sources
Students must submit a two-page report written in Quarto and give a 20-minute in-class presentation prepared in Quarto.
Grading Criteria for the project:
- Relevance
- Methodology
- Analysis
- Presentation
- Code Quality
Here are examples of projects that previous students created:
Attendance
Students are required to attend classes in accordance with the University's policies. Students with more than three unexcused absences are assumed to have withdrawn from the course. Students with a justified reason for missing class must email the instructors before class explaining why they cannot attend.
Academic Honesty
As stated in the university catalog, any student who commits an act of academic dishonesty will receive a failing grade on the work in which the dishonesty occurred. In addition, acts of academic dishonesty, irrespective of the weight of the assignment, may result in the student receiving a failing grade in the course. Instances of academic dishonesty will be reported to the Dean of Academic Affairs. A student reported twice for academic dishonesty is subject to summary dismissal from the University. In such a case, the Academic Council will then make a recommendation to the President, who will make the final decision.
Students with Learning Difficulties and other Disabilities
The University does not discriminate based on disability. Students with approved accommodations must inform their professors at the beginning of the term. Please see the website for the complete policy.
Required Books
General Books on Python
McKinney, Wes. 2022. Python for Data Analysis. O'Reilly. https://wesmckinney.com/book/
Text Analysis
Hovy, Dirk. 2020. Text Analysis in Python for Social Scientists. Cambridge University Press. https://www.cambridge.org/core/elements/abs/text-analysis-in-python-for-social-scientists/BFAB0A3604C7E29F6198EA2F7941DFF3
Bengfort, Benjamin, Rebecca Bilbro, and Tony Ojeda. 2018. Applied Text Analysis with Python. O'Reilly Media, Inc. https://www.oreilly.com/library/view/applied-text-analysis/9781491963036/
Hvitfeldt, Emil, and Julia Silge. 2022. Supervised Machine Learning for Text Analysis in R. CRC Press. https://smltar.com
Grimmer, Justin, Margaret E. Roberts, and Brandon M. Stewart. 2022. Text as Data: A New Framework for Machine Learning and the Social Sciences. Princeton University Press.
Silge, Julia, and David Robinson. 2017. Text Mining with R: A Tidy Approach. First edition. O'Reilly.
Dates:
Week 1
- Session 1: 02/10/2026 - Introduction and course overview
- Session 2: 02/11/2026 - Installing R; Quarto files: notebooks vs. presentations
- Session 3: 02/13/2026 - Practice
Week 2
- Session 1: 02/17/2026 - Classes, Vectors, Operations, Matrices, Dataframes
- Session 2: 02/18/2026 - Importing Data
- Session 3: 02/20/2026 - Practice
Assignment 1
Week 3
- Session 1: 02/24/2026 - Reading Data and Working Directories
- Session 2: 02/25/2026 - Relational Data, Dates and Joins
- Session 3: 02/27/2026 - Practice
Resubmit Assignment 1
Week 4
- Session 1: 03/03/2026 - Data Viz 1
- Session 2: 03/04/2026 - Data Viz 2
- Session 3: 03/06/2026 - Practice
Week 5
- Session 1: 03/10/2026 - Data Viz 3
- Session 2: 03/11/2026 - Practice
- Session 3: 03/13/2026 - Practice
Assignment 2
Week 7
- Session 1: 03/24/2026 - Review for Exam
- Session 2: 03/25/2026 - Exam
- Session 3: 03/27/2026 - Intro to Machine Learning
Resubmit Assignment 2
Week 8
- Session 1: 04/07/2026 - Intro to Machine Learning: Supervised/Unsupervised Learning
- Session 2: 04/08/2026 - Linear Regression
- Session 3: 04/10/2026 - Practice
Week 9
- Session 1: 04/14/2026 - Cost Function 1
- Session 2: 04/15/2026 - Cost Function 2
- Session 3: 04/17/2026 - Practice
Assignment 3
Week 10
- Session 1: 04/21 - Gradient Descent 1
- Session 2: 04/22 - Gradient Descent 2
- Session 3: 04/24 - Practice
Resubmit Assignment 3
Week 11
- Session 1: 04/28 - Learning Rate
- Session 2: 04/29 - Gradient Descent for Linear Regression
- Session 3: 05/01 - No class
Week 13
- Session 1: 05/12 - Combining LLMs and Machine Learning
- Session 2: 05/13 - Combining LLMs and Machine Learning
- Session 3: 05/15 - Practice
Assignment 4
Week 14
- Session 1: 05/19 - Project Proposals
- Session 2: 05/20 - Project Proposals
- Session 3: 05/22 - Final Exam Review
Resubmit Assignment 4
Week 15
- Session 1: 05/26 - Final Exam
- Session 2: 05/27 - Project Development
- Session 3: 05/29 - Project Development
Week 16
- Session 1: 06/02 - Project Development
- Session 2: 06/03 - Presentations
- Session 3: 06/05 - Presentations
Week 17
- Session 1: 06/09 - Presentations
- Session 2: 06/10 - Presentations
- Session 3: 06/12 - Presentations