L: Designing Projects using Text Analysis

Bogdan G. Popescu

John Cabot University

Agenda

Tips for Final Project

  • how to find a topic
  • should you start with data or with question?
  • where to find data

Essentials of Scraping Websites

Ideas for Projects

Intro

The final project is your chance to shine — to apply everything you’ve learned in this course and explore a text-based research question that excites you.

This is your opportunity to move from learning methods to actually doing research with them.

  • Use at least one Text-as-Data method studied in this course
  • Formulate and answer a clear research question
  • Demonstrate your ability to use and interpret the method

Criteria for Grading

  • Clear and answerable question
  • Description of the selected method
  • Description of the dataset used in the analysis
  • Implementation of the chosen method
  • Accurate interpretation of the method output
  • Attempts to validate the approach
  • Code quality
  • Additional points for originality and ambition

How to Find a Question

  • Read papers that use text analysis
  • Think about what excites you in text analysis
  • You can start with a question or a dataset

What Should Come First?

Questions or Data? Which one comes first?

Both approaches are valid — here’s a quick comparison:

Start with a Question

Advantages

  • Likely to lead to more interesting questions
  • Less time dedicated to exploring different methods
  • The question will guide the analysis

Disadvantages

  • Collecting the data will likely take a lot of time
  • Data may not exist in an easily accessible format

Start with Data

Advantages

  • You will find out early whether there are any findings worth discussing
  • You won’t spend much time searching for data

Disadvantages

  • The research questions may be less interesting

How to Design a Strong Project

  1. Explain the concept that you are trying to measure.
     • are you measuring aggression, hatred, use of scientific facts?
     • is the way you measure the concept new?
  2. Validate your measure.
     • hand-code a few documents to show that the quantitative text measure agrees with the way a human would code the text
     • show some validity checks: does the measure correlate with events in a way that makes sense?
  3. Use your measure to show variation (over time, by geography, etc.)
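
Steps 2 and 3 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the labels and years below are made-up toy data, Cohen's kappa is one possible agreement statistic (the slides do not prescribe a particular one), and the helper names `cohen_kappa` and `mean_by_group` are hypothetical.

```python
from collections import Counter, defaultdict


def cohen_kappa(human, machine):
    """Chance-corrected agreement between two lists of categorical labels."""
    if len(human) != len(machine) or not human:
        raise ValueError("label lists must be non-empty and equally long")
    n = len(human)
    # Share of documents on which the human and the automated measure agree.
    observed = sum(h == m for h, m in zip(human, machine)) / n
    # Agreement expected by chance, given each coder's label frequencies.
    h_counts, m_counts = Counter(human), Counter(machine)
    expected = sum(h_counts[c] * m_counts[c] for c in h_counts) / (n * n)
    if expected == 1.0:
        return 1.0  # both coders used a single identical label throughout
    return (observed - expected) / (1 - expected)


def mean_by_group(groups, scores):
    """Average score per group (for example per year or per region)."""
    buckets = defaultdict(list)
    for group, score in zip(groups, scores):
        buckets[group].append(score)
    return {g: sum(v) / len(v) for g, v in sorted(buckets.items())}


# Toy data (assumed for illustration): 1 = document coded as "aggressive".
human = [1, 0, 1, 1, 0, 0, 1, 0]      # hand-coded labels
machine = [1, 0, 1, 0, 0, 0, 1, 1]    # labels from the automated text measure
years = [2019, 2019, 2020, 2020, 2021, 2021, 2022, 2022]

kappa = cohen_kappa(human, machine)            # step 2: validation
share_by_year = mean_by_group(years, machine)  # step 3: variation over time

print(f"kappa = {kappa:.2f}")
print(share_by_year)
```

A kappa near 1 suggests the automated measure codes documents much as a human would, while a value near 0 suggests agreement no better than chance. In a real project the hand-coded sample should be larger than eight documents.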

How to Find a Research Question

You can pick up on research ideas from academic articles.

  • you can also pick up techniques, especially if you examine the replication files
  • however, questions borrowed from published work may not be the most original

Journals

  • American Journal of Political Science
  • American Political Science Review
  • Journal of Politics
  • British Journal of Political Science

Magazines

  • The Economist
  • The Financial Times
  • Podcasts

Existing Datasets

Example: Harvard Dataverse

The Harvard Dataverse is a data and code repository for many social science journals.

This is an excellent source of data for your projects!

Finding the files you need within a repository can take some time.
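
Searching can also be done programmatically. The sketch below builds a query URL for the Dataverse Search API and pulls titles out of a decoded response. It rests on assumptions that should be checked against the official API guide: the endpoint path, the `q`, `type`, `per_page`, and `start` parameters, and the `data`, `items`, `name`, and `global_id` response fields. The function names and the example query are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint of the Harvard Dataverse Search API.
BASE_URL = "https://dataverse.harvard.edu/api/search"


def build_search_url(query, per_page=10, start=0):
    """Return a search URL restricted to datasets (not files or collections)."""
    params = {"q": query, "type": "dataset", "per_page": per_page, "start": start}
    return f"{BASE_URL}?{urlencode(params)}"


def dataset_titles(payload):
    """Extract (title, persistent id) pairs from a decoded JSON response."""
    items = payload.get("data", {}).get("items", [])
    return [(item.get("name"), item.get("global_id")) for item in items]


def search(query, per_page=10):
    """Run the search over the network; defined here but not called below."""
    with urlopen(build_search_url(query, per_page)) as response:
        return dataset_titles(json.load(response))


# Example query (hypothetical); only the URL is built, no request is sent.
url = build_search_url("parliamentary speeches")
print(url)
```

Calling `search("parliamentary speeches")` would send the request and return a list of title and identifier pairs, which can then be looked up on the Dataverse website to download the replication files.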
