L19: Language Complexity

Bogdan G. Popescu

John Cabot University

Complexity

One important aspect of a corpus is the complexity of the language it uses.

At the most basic level, we can measure lexical diversity with the Type-to-Token Ratio (TTR):

\[ TTR = \frac{\textrm{Number of Types (V)}}{\textrm{Number of Tokens (N)}} \]

Complexity

Low Type-to-Token Ratio

Example: “Democracy is the best system, the best system, the best system for governance.”

  • Tokens: 13
  • Types: 7
  • TTR: \(\frac{7}{13}\approx 0.54\)

High Type-to-Token Ratio

Example: “Democracy is an excellent, effective, and universally admired system of governance.”

  • Tokens: 11
  • Types: 11
  • TTR: \(\frac{11}{11} = 1\)
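The two examples can be checked with a few lines of Python. This is a minimal sketch, assuming that tokens are lowercased and that punctuation is dropped by a simple regular expression; the helper names `tokenize` and `type_token_ratio` are illustrative and not part of any library. Other tokenization choices would give different counts.

```python
import re

def tokenize(text):
    """Lowercase the text and return its word tokens (punctuation is dropped)."""
    return re.findall(r"[a-z']+", text.lower())

def type_token_ratio(text):
    """Return (number of types, number of tokens, TTR) for a text."""
    tokens = tokenize(text)
    if not tokens:
        return 0, 0, 0.0
    types = set(tokens)  # unique word forms
    return len(types), len(tokens), len(types) / len(tokens)

low = "Democracy is the best system, the best system, the best system for governance."
high = "Democracy is an excellent, effective, and universally admired system of governance."

for label, sentence in [("low", low), ("high", high)]:
    n_types, n_tokens, ttr = type_token_ratio(sentence)
    print(f"{label}: types={n_types}, tokens={n_tokens}, TTR={ttr:.2f}")
```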

Caveats associated with Complexity

Lexical diversity is sensitive to document length.

Shorter texts contain fewer word repetitions, so their TTR tends to be higher.

Longer texts may introduce additional subjects, which increases lexical richness.
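The length effect can be made visible by computing TTR on growing prefixes of a single text. The sketch below uses an invented toy text and the hypothetical helpers `ttr` and `ttr_by_prefix`; it illustrates the caveat and is not a recommended correction method.

```python
import re

def ttr(tokens):
    """Type-to-Token Ratio of a token list (0.0 for an empty list)."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0

def ttr_by_prefix(text, sizes):
    """TTR of the first k tokens, for each k in sizes that fits the text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return {k: ttr(tokens[:k]) for k in sizes if k <= len(tokens)}

# Toy text (invented for illustration): the same vocabulary keeps recurring.
toy_text = (
    "The government proposes a budget. The opposition rejects the budget. "
    "The government defends the budget and the opposition rejects the defence. "
    "The government amends the budget and the opposition accepts the budget."
)

for k, value in ttr_by_prefix(toy_text, [5, 10, 20, 30]).items():
    print(f"first {k:>2} tokens: TTR = {value:.2f}")
```

In this toy text the TTR falls from 1.00 on the first 5 tokens to 0.40 on the first 30, even though the style of the text does not change. This is why TTR values are best compared across documents of similar length.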

Implementation of Complexity: Flesch Reading Ease

We can use the flesch_reading_ease function from the textstat package to calculate language complexity:

Python
import pandas as pd
from textstat import flesch_reading_ease

# Create a DataFrame with two sentences
df = pd.DataFrame({
    'text': [
        "Democracy is government of, by, and for the people.",
        "Democracy has multifaceted mechanisms to facilitate the distribution of power."
    ]})

# Calculate Flesch Reading Ease score
df['flesch_score'] = df['text'].apply(flesch_reading_ease)

text                                                                            flesch_score
Democracy is government of, by, and for the people.                             62.34
Democracy has multifaceted mechanisms to facilitate the distribution of power.  -6.36

Complexity: Flesch Reading Ease

The Flesch Reading Ease is calculated as:

\[ \text{Score} = 206.835 - 1.015 \left( \frac{\text{Total Words}}{\text{Total Sentences}} \right) - 84.6 \left( \frac{\text{Total Syllables}}{\text{Total Words}} \right) \]

\(\text{Total Words}\): The total number of words in the text.

\(\text{Total Sentences}\): The total number of sentences in the text.

\(\text{Total Syllables}\): The total number of syllables in all words of the text.

\(\frac{\text{Total Words}}{\text{Total Sentences}}\): The average sentence length.

\(\frac{\text{Total Syllables}}{\text{Total Words}}\): The average number of syllables per word.
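To see how the two averages drive the score, the formula can be applied directly to counts. This is a minimal sketch with hypothetical counts; the function name `flesch_reading_ease_from_counts` is illustrative. Library implementations such as textstat also have to detect sentences and count syllables, so their results on real text depend on how they do that.

```python
def flesch_reading_ease_from_counts(total_words, total_sentences, total_syllables):
    """Apply the Flesch Reading Ease formula to pre-computed counts."""
    if total_words <= 0 or total_sentences <= 0:
        raise ValueError("need at least one word and one sentence")
    avg_sentence_length = total_words / total_sentences
    avg_syllables_per_word = total_syllables / total_words
    return 206.835 - 1.015 * avg_sentence_length - 84.6 * avg_syllables_per_word

# Hypothetical counts: 100 words in 5 sentences, 150 syllables in total
baseline = flesch_reading_ease_from_counts(100, 5, 150)

# Same words and sentences, but longer words (200 syllables): the score drops
longer_words = flesch_reading_ease_from_counts(100, 5, 200)

# Same words and syllables, but one long sentence: the score also drops
longer_sentences = flesch_reading_ease_from_counts(100, 1, 150)

print(round(baseline, 3), round(longer_words, 3), round(longer_sentences, 3))
```

Because both terms are subtracted, longer sentences and longer words each lower the score, and a text with enough long words can score below zero.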

The coefficients in the formula come from Flesch (1948), who:

  • asked high-school students to read a set of texts and answer comprehension questions
  • calculated the average grade of the students who correctly answered at least 75% of the questions
  • transformed the average grades to a 0-100 scale
  • regressed these scores on the sentence-length and word-length variables

Readability of Speeches Over Time

Here is how we can calculate average complexity over time:

Python
# Load the dataset
file_path = "/Users/bgpopescu/Library/CloudStorage/Dropbox/john_cabot/teaching/text_analysis/week8/lecture8b/data/aggression_texts.csv"
aggression_texts = pd.read_csv(file_path)
# Calculate flesch_score
aggression_texts['flesch_score'] = aggression_texts['body'].apply(flesch_reading_ease)
# Group by the 'year' column and calculate the average Flesch Reading Ease score
average_complexity_by_year = aggression_texts.groupby('year')['flesch_score'].mean().reset_index()
# Rename columns for clarity
average_complexity_by_year.columns = ['year', 'average_flesch_score']
R
library(dplyr)
library(ggplot2)
result_df <- reticulate::py$average_complexity_by_year
# Create the line plot
ggplot(result_df, aes(x = year, y = average_flesch_score)) +
  geom_line() +
  geom_point() +  # Adds points for clarity
  labs(
    title = "Average Flesch Reading Ease Score by Year",
    x = "Year",
    y = "Average Flesch Score")+
  theme_bw()

Readability of Speeches Over Time

Here is how we can calculate average complexity over time by gender, with 95% confidence intervals:

Python
from textstat import flesch_kincaid_grade
import scipy.stats as stats

# Load the dataset
file_path = "/Users/bgpopescu/Library/CloudStorage/Dropbox/john_cabot/teaching/text_analysis/week8/lecture8b/data/aggression_texts.csv"
aggression_texts = pd.read_csv(file_path)
# Calculate the Flesch-Kincaid grade level (stored in the 'flesch_score' column)
aggression_texts['flesch_score'] = aggression_texts['body'].apply(flesch_kincaid_grade)
# Group by 'year' and 'gender' and calculate mean and confidence intervals
def calculate_confidence_interval(series):
    n = len(series)
    mean = series.mean()
    sem = stats.sem(series)  # Standard error of the mean
    ci_range = stats.t.interval(0.95, df=n-1, loc=mean, scale=sem) if n > 1 else (mean, mean)
    return pd.Series({'mean': mean, 'ci_lower': ci_range[0], 'ci_upper': ci_range[1]})

average_complexity_with_ci = (
    aggression_texts.groupby(['year', 'gender'])['flesch_score']
    .apply(calculate_confidence_interval)
    .reset_index()
)
# Pivot so that 'mean', 'ci_lower', and 'ci_upper' become separate columns
df_transposed = average_complexity_with_ci.pivot(index=["year", "gender"], columns="level_2", values="flesch_score").reset_index()
R
library(dplyr)
library(broom)
result_df <- reticulate::py$df_transposed

# Create the line plot
ggplot(result_df, aes(x = year, y = mean, group = gender, color=gender)) +
  geom_line() +
  geom_point() +  # Adds points for clarity
    geom_ribbon(
    aes(ymin = ci_lower, ymax = ci_upper, fill = gender),
    alpha = 0.2,  # Transparency for the confidence interval
    color = NA     # Removes outline of the ribbon
  ) +
  labs(
    title = "Average Flesch-Kincaid Grade Level by Year and Gender",
    x = "Year",
    y = "Average Flesch-Kincaid Grade")+
  theme_bw()+
    scale_x_continuous(
    breaks = seq(min(result_df$year), max(result_df$year), by = 2))
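
Two details of the code above are worth spelling out. First, the Python step scores each speech with textstat's flesch_kincaid_grade rather than flesch_reading_ease. The standard Flesch-Kincaid Grade Level formula, stated here for reference, is:

\[ \text{Grade} = 0.39 \left( \frac{\text{Total Words}}{\text{Total Sentences}} \right) + 11.8 \left( \frac{\text{Total Syllables}}{\text{Total Words}} \right) - 15.59 \]

Higher grades indicate harder text, which is the opposite direction from Reading Ease. Second, the ribbon in the plot is the usual 95% t-interval around each year-gender mean:

\[ \bar{x} \pm t_{0.975,\, n-1} \cdot \frac{s}{\sqrt{n}} \]

Here \(\bar{x}\) is the group mean, \(s\) is the sample standard deviation, and \(n\) is the number of speeches in the group. Groups with a single speech get the degenerate interval \((\bar{x}, \bar{x})\), as in the `n > 1` check in the Python function.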

Readability of Speeches Over Time

Here is how it compares to the original

R
library(dplyr)
library(broom)
result_df <- reticulate::py$df_transposed
ggplot(result_df, aes(x = year, y = mean, group = gender, color=gender)) +
  geom_line() +
  geom_point() +  # Adds points for clarity
    geom_ribbon(
    aes(ymin = ci_lower, ymax = ci_upper, fill = gender),
    alpha = 0.2,  # Transparency for the confidence interval
    color = NA     # Removes outline of the ribbon
  ) +
  labs(
    title = "Average Flesch-Kincaid Grade Level by Year and Gender",
    x = "Year",
    y = "Average Flesch-Kincaid Grade")+
  theme_bw()+
    scale_x_continuous(
    breaks = seq(min(result_df$year), max(result_df$year), by = 2))

Readability of Speeches Over Time

Here are some interesting insights

Readability of Speeches Over Time

What if we calculate the language complexity of different politicians?

name                average_flesch_score  speech_count
Mr Roger Casale     17.874                5
Mr James Molyneaux  21.24                 5
Martin Salter       25.099                10
Mr Jerry Wiggin     27.156667             3
Mr Peter Hardy      27.277143             7
Michael Dugher      29.353333             3
Mohammad Sarwar     31.89                 2
Helen Southworth    32.3725               4
Mr Tom King         32.726667             3
Helen Hayes         33.24                 2
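
A ranking like the one above can be produced by averaging the per-speech scores for each speaker and keeping only speakers with at least two speeches. The slides do not show the code behind the table, so the following is only a plain-Python sketch: the function `rank_speakers` and the toy records are invented for illustration. With a pandas DataFrame, the same result could come from a groupby on the name column followed by an aggregation.

```python
from collections import defaultdict
from statistics import mean

def rank_speakers(records, min_speeches=2, top_n=10):
    """Average score and speech count per speaker, lowest average first.

    records: iterable of (name, flesch_score) pairs, one per speech.
    """
    scores_by_name = defaultdict(list)
    for name, score in records:
        scores_by_name[name].append(score)

    rows = [
        {"name": name, "average_flesch_score": mean(scores), "speech_count": len(scores)}
        for name, scores in scores_by_name.items()
        if len(scores) >= min_speeches  # drop speakers with too few speeches
    ]
    rows.sort(key=lambda row: row["average_flesch_score"])
    return rows[:top_n]

# Invented speakers and scores, for illustration only
toy_records = [
    ("Speaker A", 20.0), ("Speaker A", 30.0),
    ("Speaker B", 80.0), ("Speaker B", 90.0), ("Speaker B", 85.0),
    ("Speaker C", 5.0),  # only one speech, so filtered out
]

for row in rank_speakers(toy_records):
    print(row)
```

Sorting in descending order instead (for example with `reverse=True` in the sort) would give the speakers with the highest average scores.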

Let us look at one example speech from the top-ranked politician (lowest average score, at least two speeches):

Will the right honourable Gentleman give way?

name                average_flesch_score  speech_count
Mr Irvine Patnick   98.72                 2
Alan Campbell       88.013333             3
James Berry         85.03                 2
Mr Jamie Cann       84.68                 2
Mr Edward Heath     83.84                 4
Mr David Ashby      83.456667             3
Ms Candy Atherton   83.006667             3
Mr Andy King        82.29                 2
Dr Rhodes Boyson    82.04                 2
Mr Piers Merchant   81.85                 8

Let us look at one example speech from the top-ranked politician (highest average score, at least two speeches):

It's being so cheerful keeps you going.

Here is another, much longer speech from the politician with the lowest average score (at least two speeches):

I am delighted to contribute briefly to the debate, although I am sorry that I am unable to be in my constituency of Wimbledon with the hundreds of thousands of British tennis fans who are celebrating the outstanding victory of Mr. Tim Henman. We all send our congratulations to him, and wish him well in the semi-final. However, my absence from Wimbledon is more than made up for by my being here as the Labour Member of Parliament for Wimbledon working with the Government and winning so convincingly the argument on Europe - so convincingly in fact that many Conservative Members are no longer present.  In the last Parliament I was a member of the Select Committee on European Scrutiny, whose 17th report, published during the 1999-2000 Session, is one of the relevant documents listed on the Order Paper. During that Session, the Committee conducted, on behalf of the House, an extensive investigation of the intergovernmental conference that led to the Nice treaty, and of the way in which the British Government had conducted themselves during the negotiations. Our conclusions appear in the report.  I want to challenge - I think the report will back up my challenge - the view expressed by many Opposition Members, notably the shadow Foreign Secretary, that the institutional reforms decided at Nice were not a necessary precursor of EU enlargement. Those who study the report - I urge Members to do so, although it seems that the shadow Foreign Secretary has not - will note our recommendation that the scope of the IGC that led to the treaty should be restricted to the practical reforms that we considered necessary to pave the way for enlargement. We did so exactly because we saw those changes as a precondition of enlargement, and we wanted enlargement to take place. 
We argued, successfully, that the Government should resist pressure from the European Commission and the European Parliament to widen the IGC's scope, in order not to jeopardise the chance of achieving the real, practical reforms that would allow enlargement.  The shadow Foreign Secretary suggested that the EU was pursuing some starry-eyed federalist agenda through the IGC, but if he reads our report he will see that the Government's focus was firmly on the ground beneath our feet, and on the practical steps that were necessary for the widening of EU membership.  Let us remember what constituted those limited but significant changes to the mechanics of the EU decision-making processes. There were changes to the size of the Commission: it was right for us to drop one of our two Commissioners if other countries that would join the EU in future were not to have a Commissioner at all. There were also changes to the weighting of votes. Now that those changes have been accepted, the three largest states, if opposed, can constitute a blocking majority. A further safeguard - an important democratic safeguard - was established, in the form of a supplementary condition that the member states constituting a majority should represent 62 per cent. of the population of the EU.  Those changes may be technical, and they may not be easily understood by those who are not aficionados of the EU and its evolving institutional architecture; but, notwithstanding what we have heard from those on the Opposition Front Bench, they were essential if we were to prepare for a potential membership of 27 rather than the present 15 EU states.  Behind the rather bald statement of  the right honourable Member for Horsham   , which flies in the face of the facts - behind the stubborn refusal to acknowledge the reality of what was achieved and decided at Nice - lies a vision of Europe, and Britain's relationship with Europe, that would take the country back half a century or more. 
I must tell Opposition Members who will oppose ratification tonight that European integration has always been about much more than the creation of a free-trade area or a common market for goods and services, and it is about much more than that today.  The Union of which we are a member has never been devoid of political content; nor does the European marketplace lack a social foundation. Not a single existing member of the European Union would want that to be the case, and certainly no applicant would wish to join the kind of Union that the right honourable Member for Horsham and his honourable Friends wish to create. Nor do I believe that any European member state or applicant state would believe that the EU described by Opposition Members could deliver the sort of practical improvements on which continued public confidence in the EU depends. However, the Conservative party has moved so far to the right on Europe that it has put itself well outside the mainstream of European Christian democracy. It would be difficult to name another party, let alone another Government, in the EU that would agree with the vision of Europe, and of its future, that the Conservative party is trying to put forward.  We know that the vision put forward by the Tories was flatly rejected by the British people on  7 June . We also know that it has not gone down well with public opinion in other member states. When I researched this speech in the Library this afternoon, it was interesting to read the press comments that appeared in Germany, France and Italy at the time of the election about the Conservative party and the virulent form of Euro-scepticism that it was putting forward. One piece, by Stepfan Berger of Germany's  Frankfurter Allgemeine Zeitung , caught my eye. Published on  6 June , it stated:  "At present, the Tories are a truly sad old bunch: a kind of cabinet of horrors, made up of religious fundamentalists, Euro-sceptics, nationalists and people hostile to foreigners . . . 
Germanophobes are particularly widespread".  The Conservative party must understand that its ideas about Europe and about Britain's relationship with Europe have been rejected not just by the British people. Labour Members and others must take on those ideas and propose a more confident and positive vision of Europe; otherwise, we risk appearing ridiculous in the eyes of European public opinion and of those European partner nations that we want to influence and with which we want to co-operate.  However, the Tory election manifesto boasted:  "The next Conservative Government will secure our independence and use Britain's great strength to help create a flexible European Union" -   as if a Conservative Government were going to twist the arms of 14 other European nations and take them in a direction that none of them wanted to go.  How were the Tories going to do that? They would do it be refusing to ratify the treaty of Nice. The manifesto stated:  "We will not ratify the treaty of Nice, but will renegotiate it so that Britain does not lose its veto" -   a veto that Britain had already lost as a result of the Maastricht treaty and other treaties to which the previous Conservative Government had signed up.  The right honourable Member for Horsham is not in his seat. However, although it is all very well for him to say that globalisation requires co-operation but not integration, it is not acceptable then to argue that Britain should fail to co-operate when the means for such co-operation exist. The right honourable Gentleman said that he wanted an EU that is stable, prosperous, outward-looking and democratic. He should therefore support the ratification of the treaty of Nice.  The right honourable Member for Horsham said that the French referendum on Maastricht nearly plunged Europe into chaos, but he wants to plunge the EU into chaos by refusing to ratify the treaty of Nice tonight. 
He should remember that the near-failure of the French to ratify the treaty of Maastricht also catapulted Britain out of the European monetary system.  The Tory party must learn the lessons of the election on Europe. It is fantasy to believe that we can reverse 50 years of European development. It is not credible to claim that we can influence the future of Europe by carping from the sidelines. Before they vote to oppose ratification of the treaty of Nice tonight, Tory Members must take a moment to reflect on the fact that the Nice treaty is a limited treaty that makes the practical reforms necessary for enlargement. To oppose the treaty of Nice would give the impression that the Opposition want little more than to see the EU paralysed by lack of reform, or to block this historic opportunity to unite the two halves of Europe -

Here is another speech from the politician with the highest average score (at least two speeches):

He is in the Strangers' Gallery.

Conclusion

Complexity Analysis:

  • Metrics like Flesch Reading Ease assess readability, offering insights into text difficulty and audience targeting.

  • Speech analysis tracks readability trends and politician-specific language complexity.

  • Quantitative techniques uncover insights into text difficulty and audience alignment.

  • Lexical diversity varies with text length, influencing complexity measurements.