Twitalectology, in short, was a project that used Twitter posts to create dialect maps. For decades, linguists have been interested in studying how dialects vary and evolve, but studying dialects on a large scale has traditionally required years of travel, data collection, and analysis. Twitalectology’s goal was to make this process much quicker and simpler for certain dialectal features, such as words and phrases.
Twitalectology was originally conducted in 2011; since then, several researchers have gone on to conduct far more exhaustive and extensive studies of language use on Twitter. If Twitalectology interests you, you may also be interested in the work of Jack Grieve, Jacob Eisenstein, Tyler Schnoebelen, and Allison Shapp.
- Python script (with installation guidelines) for Twitalectology
- Slides for the debut of Twitalectology, presented at the 2012 American Dialect Society annual meeting
- “Examining Regional Variation Through Online Geotagged Corpora”, my 2013 graduate thesis on Twitalectology and its results
- “Regional English, Tweet by Tweet”, New York Times
- “American dialects from A to Z”, Boston Globe
- “#Soda or #Pop? Regional Language Quirks Get Examined on Twitter”, TIME
- “Linguistics of Food”, Good Food (KCRW radio)
- “Soda, pop or Coke? Words Northwest natives use”, Seattle Post-Intelligencer
In the interactive maps below, the decimal part of “Value” gives the dominant variant’s share of the relevant tweets; the digit before the decimal point does not affect that share. For example, if San Francisco, CA has a Value of 0.85 (or 1.85), the dominant variant for that city is used in 85% of the relevant tweets. A Value of X.00 is equivalent to 100%.
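If you would rather read a Value programmatically, here is a minimal sketch of the rule described above. It is not part of the original Twitalectology script: the function name `dominant_share` is made up for illustration, and it assumes Values are non-negative and carry two decimal places, as in the examples.

```python
def dominant_share(value: float) -> float:
    """Return the dominant variant's share (in percent) encoded in a map Value.

    Hypothetical helper, not taken from the Twitalectology script.
    Assumes non-negative Values with two decimal places.
    """
    # Keep only the decimal part (1.85 -> 0.85); rounding removes float noise.
    fraction = round(value % 1, 2)
    # Per the description above, X.00 stands for 100%, not 0%.
    if fraction == 0:
        return 100.0
    return round(fraction * 100, 2)


print(dominant_share(0.85))
print(dominant_share(1.85))
print(dominant_share(2.00))
```

Both 0.85 and 1.85 decode to the same share, because only the decimal part is used.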
Map #1: soft drinks
- Blue: soda
- Red: coke
- Yellow: pop
Map #2: ‘hella’/‘very’
- Blue: 0-20% ‘hella’ usage (in comparison to ‘very’)
- Purple: 20-40%
- Red: 40-60%
- Green: 60-80%
- Yellow: 80-100%
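As a rough sketch of how the color bands above could be assigned from raw tweet counts, assuming the ‘hella’ share is computed as ‘hella’ tweets divided by ‘hella’ plus ‘very’ tweets. The function name `hella_band` is hypothetical, and the legend does not say which band a boundary value such as exactly 20% belongs to, so this sketch puts it in the higher band.

```python
# Upper bounds (in percent) for each band, in the order listed in the legend.
BANDS = [(20, "Blue"), (40, "Purple"), (60, "Red"), (80, "Green")]


def hella_band(hella_count: int, very_count: int) -> str:
    """Return the map color for a city's 'hella' vs. 'very' usage.

    Hypothetical helper. Assumption: share = hella / (hella + very), and a
    share sitting exactly on a boundary goes to the higher band.
    """
    total = hella_count + very_count
    if total == 0:
        raise ValueError("no relevant tweets for this city")
    share = 100 * hella_count / total
    for upper_bound, color in BANDS:
        if share < upper_bound:
            return color
    return "Yellow"  # 80-100%


print(hella_band(3, 7))  # 30% 'hella' falls in the 20-40% band
```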
Map #3: ‘needs X-ed’
- Red: needs X-ed
- Yellow: needs X-ing/needs to be X-ed
If you have any questions about this methodology or the maps, shoot me an email!