Twitalectology

Welcome to attendees of ADS 2012, and readers of the New York Times and Boston Globe!

Please click here for a copy of my presentation, and here for a copy of my Python script (with installation guidelines). A standalone app (for Windows and Mac OS X) is in the works; if you'd like to help develop it, email me at the address below.

If you're interested in reading more about this type of work, you should take a look at David Bamman's Lexicalist and the work of Jacob Eisenstein.

In the interactive maps below, the digits after the decimal point in "Value" give the percentage of relevant tweets that use the dominant variant. For example, if San Francisco, CA has a Value of 0.85 (or 1.85), the dominant variant for that city is used in 85% of the relevant tweets. A Value of X.00 is equivalent to 100%. Note that some locations have been improperly geocoded; these are in the process of being removed.
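For anyone working with the map data directly, here is a minimal sketch of how a "Value" could be turned back into a percentage. The function name dominant_share is hypothetical and is not taken from my script; the sketch assumes only what is described above, namely that the digits after the decimal point carry the percentage and that X.00 means 100%.

```python
from decimal import Decimal


def dominant_share(value):
    """Return the dominant variant's share of relevant tweets, as a percentage.

    Hypothetical helper: assumes the fractional part of "Value" is the share
    and that a fractional part of .00 stands for 100%.
    """
    # Go through str() so that a float like 1.85 becomes exactly Decimal("1.85").
    d = Decimal(str(value))
    fraction = d % 1  # keep only the part after the decimal point
    if fraction == 0:
        return Decimal(100)  # X.00 is equivalent to 100%
    return fraction * 100


print(dominant_share("0.85"))
print(dominant_share("1.85"))
print(dominant_share("2.00"))
```

Decimal is used instead of plain floats so that a Value such as 1.85 does not pick up binary rounding error when the fractional part is extracted.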

Map #1: soft drinks

Legend:
Blue: soda
Red: coke
Yellow: pop

Map #2: 'hella'/'very'

Legend:
Blue: 0-20% 'hella' usage
Purple: 20-40%
Red: 40-60%
Green: 60-80%
Yellow: 80-100%
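
The legend above amounts to a simple binning of the 'hella' percentage. A rough sketch of that binning follows; the names are hypothetical, and since the legend does not say which color a boundary value such as exactly 20% receives, the sketch assumes each boundary belongs to the higher bin.

```python
import bisect

# Upper edges of the first four bins, in percent (taken from the legend).
HELLA_BIN_EDGES = [20, 40, 60, 80]
HELLA_BIN_COLORS = ["Blue", "Purple", "Red", "Green", "Yellow"]


def hella_color(percent):
    """Map a 'hella' usage percentage (0-100) to its legend color.

    Assumption: a value sitting exactly on a boundary (e.g. 20) falls
    into the higher bin; the legend itself leaves this unspecified.
    """
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    # bisect_right counts how many edges are <= percent, which is the bin index.
    return HELLA_BIN_COLORS[bisect.bisect_right(HELLA_BIN_EDGES, percent)]


print(hella_color(12))
print(hella_color(55))
print(hella_color(100))
```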

Map #3: 'needs X-ed'

Legend:
Red: needs X-ed
Yellow: needs X-ing/needs to be X-ed

If you have any questions about this methodology or the maps, please email me at this address.