L: Designing Projects using Text Analysis
Bogdan G. Popescu
John Cabot University
Agenda
Tips for Final Project
- how to find a topic
- should you start with data or with a question?
- where to find data
Essentials of Scraping Websites
Ideas for Projects
Intro
The final project is your chance to shine — to apply everything you’ve learned in this course and explore a text-based research question that excites you.
This is your opportunity to move from learning methods to actually doing research with them.
- Use at least one Text-as-Data method studied in this course
- Formulate and answer a clear research question
- Demonstrate your ability to use and interpret the method
Criteria for Grading
- Clear and Answerable question
- Description of the selected method
- Description of the Dataset used in the analysis
- Implementation of the chosen method
- Accurate interpretation of the method output
- Attempts to validate the approach
- Code Quality
- Additional points for originality and ambition
How to Find a Question
Read papers that use text analysis
Think about what excites you in text analysis
You can start with a question or a dataset
What Should Come First?
Questions or Data? Which one comes first?
Both approaches are valid — here’s a quick comparison:
Start with a Question
Advantages
- Likely to lead to more interesting questions
- Less time dedicated to exploring different methods
- The question will guide the analysis
Disadvantages
- You will likely spend a lot of time collecting data
- Data may not exist in an easily accessible format
Start with Data
Advantages
- You will find out early whether there are any findings worth discussing
- You won’t spend extensive amounts of time trying to find data
Disadvantages
- The research questions may be less interesting
How to Design a Strong Project
- Explain the concept that you are trying to measure.
- are you measuring aggression, hatred, use of scientific facts?
- is the way you measure the concept new?
- Validate your measure
- hand-code a few documents to show that the quantitative text measure agrees with the way a human would code the text
- show some validity checks: does the measure correlate with events in a way that makes sense?
- Use your measure to show variation (over time, by geography, etc.)
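The validation and variation steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the word list `AGGRESSION_WORDS`, the five example documents, and their hand codes are all invented here, and a simple dictionary count stands in for whichever Text-as-Data method you actually choose. It compares the automated labels with the hand codes using raw agreement and Cohen's kappa, then averages the score by year.

```python
from collections import defaultdict

# Hypothetical toy dictionary of aggression words (invented for illustration).
AGGRESSION_WORDS = {"attack", "destroy", "enemy", "fight"}

# Invented documents: (year, text, hand code where 1 = aggressive, 0 = not).
DOCS = [
    (2019, "We will fight the enemy and destroy them.", 1),
    (2019, "The committee approved the annual budget.", 0),
    (2020, "They attack our values every day.", 1),
    (2020, "Schools reopened after the summer break.", 0),
    (2020, "The team will fight for the title.", 0),
]


def dictionary_score(text):
    """Share of tokens in the text that appear in the dictionary."""
    tokens = [t.strip(".,!?;:").lower() for t in text.split()]
    tokens = [t for t in tokens if t]
    if not tokens:
        return 0.0
    return sum(t in AGGRESSION_WORDS for t in tokens) / len(tokens)


def cohen_kappa(a, b):
    """Cohen's kappa for two equal-length lists of binary (0/1) labels."""
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    p_a1 = sum(a) / n
    p_b1 = sum(b) / n
    # Agreement expected by chance, given each coder's share of 1s.
    expected = p_a1 * p_b1 + (1 - p_a1) * (1 - p_b1)
    if expected == 1:
        return 1.0
    return (observed - expected) / (1 - expected)


def mean_by_group(values, groups):
    """Average of values within each group (for example, by year)."""
    buckets = defaultdict(list)
    for value, group in zip(values, groups):
        buckets[group].append(value)
    return {g: sum(v) / len(v) for g, v in buckets.items()}


years = [year for year, _, _ in DOCS]
hand_codes = [code for _, _, code in DOCS]
scores = [dictionary_score(text) for _, text, _ in DOCS]

# Turn the continuous score into a binary label: any dictionary hit counts.
machine_codes = [1 if s > 0 else 0 for s in scores]

agreement = sum(m == h for m, h in zip(machine_codes, hand_codes)) / len(DOCS)
kappa = cohen_kappa(machine_codes, hand_codes)
score_by_year = mean_by_group(scores, years)

print(f"Raw agreement: {agreement:.2f}")
print(f"Cohen's kappa: {kappa:.2f}")
print("Mean score by year:", score_by_year)
```

The last invented document is a deliberate false positive: "fight" is used in a sporting sense, so the dictionary flags it while the human coder does not. Reading the disagreements one by one is often more informative than the agreement statistic itself.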
How to Find a Research Question
You can pick up on research ideas from academic articles.
- you can pick up techniques, especially if you examine the replication files
- however, your questions may not be the most original
Journals
- American Journal of Political Science
- American Political Science Review
- Journal of Politics
- British Journal of Political Science
Magazines
- The Economist
- The Financial Times
- Podcasts
Existing Datasets
Example: Harvard Dataverse
The Harvard Dataverse is a data and code repository for many social science journals.
This is an excellent source of data for your projects!
Finding the files you need inside a given repository can sometimes take a while.
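Searching the repository programmatically can speed this up. The sketch below uses only the Python standard library and assumes the public Dataverse Search API at `https://dataverse.harvard.edu/api/search`, with the query parameters `q`, `type`, and `per_page` and the response fields `data`, `items`, `name`, and `global_id`. These details are assumptions to check against the current Dataverse API guide, and the function names are hypothetical helpers introduced here.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint of the Dataverse Search API; verify against the API guide.
BASE_URL = "https://dataverse.harvard.edu/api/search"


def build_search_url(query, per_page=10):
    """Build a search URL that asks for datasets matching the query."""
    params = {"q": query, "type": "dataset", "per_page": per_page}
    return f"{BASE_URL}?{urlencode(params)}"


def extract_datasets(response):
    """Pull (name, persistent identifier) pairs out of a parsed JSON response.

    Assumes the response has the shape {"data": {"items": [...]}}; missing
    keys yield an empty list or None values rather than an error.
    """
    items = response.get("data", {}).get("items", [])
    return [(item.get("name"), item.get("global_id")) for item in items]


def search_dataverse(query, per_page=10):
    """Query the repository over the network and return matching datasets."""
    with urlopen(build_search_url(query, per_page), timeout=30) as resp:
        return extract_datasets(json.load(resp))
```

Calling `search_dataverse("parliamentary speeches")` would send one request and return a list of dataset names with their identifiers, which you can then open in the browser to inspect the replication files. The search term here is just an example.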