Archive for the ‘Twitter’ Category

Delving into Lent Data

Sunday, March 7th, 2010

Let’s look a little more at some of the data on what Twitterers are giving up for Lent.

Categories of Things Given up by Location

Because I track only English-language tweets, the results are concentrated in English-speaking countries.

Categories by Country
Size indicates the relative number of Twitterers in each country giving up something for Lent.

Categories by Location

Categories of Things Given up by State

These visualizations show the differences (or lack thereof) in what people are giving up among U.S. states.

Categories by State
Size indicates the relative number of Twitterers in each state giving up something for Lent. Sorry, Alaska and Hawaii.

Categories by State (%)
The composition of each state’s categories of tweets shows mostly minor variations among states. Some states (like Wyoming on the far right) have small numbers of tweets. I would have liked to use opacity or width to indicate this disparity but couldn’t figure out how to do it.

Comparison between 2009 and 2010

This treemap shows how the data changed between 2009 and 2010. The size of the box shows the number of people giving up each category and thing, while color indicates the percentage change from last year: dark blue indicates the steepest drop; dark orange indicates the steepest rise. The second chart shows the same data more conventionally expressed.

Categories and Terms: Term Changes: 2009-2010

About the Visualizations

I created these charts mostly to explore how the new data-analysis software Tableau Public works. One of its claims to fame is that you can publish interactive visualizations to the web, a feature I didn’t take advantage of here. Tableau doesn’t do treemaps, so I used Many Eyes to create the treemap; the closest Tableau equivalent appears below the treemap.

What Twitterers Are Giving up for Lent (2010 Edition)

Tuesday, February 23rd, 2010

The top 100 things that Twitterers are giving up for Lent in 2010.

Snow makes the list this year, which is understandable given the Snowpocalypse and Snowmageddon that gripped much of the Eastern U.S. in the weeks preceding Ash Wednesday. iPods also make the list after the Bishop of Liverpool asked people to consider praying instead of listening to them. This year a celebrity, Justin Bieber, cracks the top 100. He beat out the Jonas Brothers, 64 votes to 11; draw your own conclusions.

The list largely tracks last year’s list. It draws from 40,000 tweets retrieved February 14-20, 2010.

Complete List of the Top 100

Rank. Term, count, change from last year’s rank (blank if the term was not in the 2009 top 100)
1. Twitter 2089 +1
2. Facebook 1874 -1
3. Chocolate 1323 0
4. Alcohol 1258 +1
5. Swearing 1158 +5
6. Soda 1126 0
7. Lent 792 -3
8. Meat 720 0
9. Sex 701 +7
10. Fast food 695 +7
11. Sweets 627 0
12. Coffee 445 -5
13. iPod 437  
14. Candy 325 +18
15. Religion 305 -6
16. Catholicism 264 -4
17. Smoking 254 +5
18. Junk food 251 +34
19. Giving up things 241 -6
20. Beer 241 -5
21. Chips 234 +24
22. You 233 +13
23. Stuff 217 -3
24. Fried food 199 +33
25. Red meat 193 +19
26. Bread 187 +13
27. Sugar 183 -8
28. Work 176 -14
29. Shopping 174 +11
30. Food 162 -7
31. Shame 150  
32. Social networking 147 -2
33. Caffeine 136 -6
34. Rice 136 +44
35. Procrastination 127 -11
36. Internet 126 -11
37. Cheese 120 +1
38. Coke 120 +41
39. Starbucks 119 +14
40. School 118 +36
41. Ice cream 118 +13
42. Booze 117 -21
43. Texting 114 +28
44. Masturbation 111  
45. Cookies 110 +11
46. TV 97 -18
47. Christianity 96 0
48. Snow 96  
49. Wine 92 -13
50. Pizza 91 +12
51. MySpace 91 +4
52. Men 90 +31
53. Giving up 89 -19
54. Sobriety 89 -13
55. Liquor 87  
56. Desserts 87  
57. Lint 87 -20
58. Pancakes 82 -29
59. Homework 81 +28
60. Marijuana 80  
61. Diet Coke 80 -28
62. Hope 78 +15
63. Virginity 76  
64. French fries 75 -15
65. Laziness 71 +5
66. Boys 67  
67. Nothing 67 -19
68. Carbs 66 -4
69. Justin Bieber 64  
70. Pork 64  
71. Porn 63 +9
72. Me 62 0
73. Sleep 61 -42
74. Complaining 58 -16
75. Eating out 58 -8
76. Jesus 55 -26
77. McDonald’s 55  
78. Beef 54 +18
79. Church 54 +6
80. God 53 -21
81. Abstinence 53 -39
82. Cake 52  
83. Negativity 52  
84. Him 49  
85. Juice 47  
86. Celibacy 44 +13
87. Chicken 42  
88. Lying 42  
89. New Year’s resolutions 42 -29
90. Sarcasm 42 -39
91. Snacking 41  
92. My wife 39  
93. Tea 37  
94. iPhone 37  
95. Exercise 36 -6
96. Sweet tea 35  
97. People 35  
98. Vegetables 34  
99. Pasta 33  
100. Self control 33  
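
The "change from last year’s rank" column can be reproduced with a few lines of Python. This is only a sketch, not the script used for the list: the function name `rank_changes` and the dictionary layout are assumptions, and the ranks are copied from a handful of rows in the 2009 and 2010 lists.

```python
def rank_changes(current_ranks, previous_ranks):
    """Map each current term to (previous rank - current rank).

    A positive value means the term moved up the list; None means the
    term did not appear in the previous year's list.
    """
    changes = {}
    for term, rank in current_ranks.items():
        old_rank = previous_ranks.get(term)
        changes[term] = None if old_rank is None else old_rank - rank
    return changes


# A few rows copied from the 2009 and 2010 lists (hypothetical data layout).
ranks_2009 = {"Facebook": 1, "Twitter": 2, "Chocolate": 3,
              "Lent": 4, "Alcohol": 5, "Swearing": 10}
ranks_2010 = {"Twitter": 1, "Facebook": 2, "Chocolate": 3,
              "Alcohol": 4, "Swearing": 5, "iPod": 13}

changes = rank_changes(ranks_2010, ranks_2009)
for term, change in changes.items():
    # Terms that are new to the list have no change to report.
    print(term, "" if change is None else f"{change:+d}")
```

Terms that are new in 2010, such as iPod, come back as `None`, which corresponds to the blank cells in the list above.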

Image created using Wordle.

New Feature: Search for Bible Verses on Twitter

Monday, November 30th, 2009

Search over 1.2 million Bible verses on Twitter–nearly every tweet that has mentioned a Bible verse since April 2009. You can also see a list of the most popular verses on Twitter over the past few hours (“Trending Verses”).

Search Bible verses on Twitter.

This project uses several APIs from Twitter and is still in a beta stage. It could evolve in several directions, but I want to see how people use it before developing it further.

It’s not quite realtime, but the most recent tweet is rarely more than a few minutes old.

Behind the scenes, it processes tweets to try to ensure their relevance; it has about a 92% accuracy rate based on a training corpus of around 45,000 tweets. Use the “relevant” and “not relevant” buttons in the interface if you see a tweet that you think should or shouldn’t belong. (I’m mostly interested in the latter, but it seems weird not to have both–like Facebook’s lack of an unlike button.)

It currently uses Logos RefTagger to link the Bible references in the tweets.

Feel free to leave a comment here if you have a feature idea or want to make any suggestions.

Edit February 2016: This feature is no longer available; with over 200 million tweets, it’s just too much data to serve reliably. Instead, the latest tweets are visible.

Top 100 Linguistic Indicators of Bible-Related Tweets

Sunday, October 25th, 2009

When people tweet about Bible verses on Twitter, what words do they use? Here are the top 100:

  1. bible
  2. lord
  3. christ
  4. gospel
  5. psalm
  6. god
  7. psalms
  8. corinthians
  9. preach
  10. shall
  11. heaven
  12. readings
  13. church
  14. spirit
  15. righteous
  16. verse
  17. lectionary
  18. verses
  19. spiritual
  20. ministry
  21. pray
  22. enemies
  23. thou
  24. tongue
  25. creation
  26. wisdom
  27. deuteronomy
  28. testament
  29. strength
  30. refuge
  31. therefore
  32. kingdom
  33. romans
  34. holy
  35. thankful
  36. thy
  37. reading
  38. rejoice
  39. understanding
  40. faithful
  41. message
  42. earth
  43. blessed
  44. exodus
  45. deut
  46. faith
  47. wise
  48. beginning
  49. pastor
  50. chapel
  51. chapter
  52. survey
  53. anger
  54. resurrection
  55. risen
  56. read
  57. hearts
  58. chronicles
  59. salvation
  60. flesh
  61. servant
  62. glory
  63. praying
  64. kings
  65. sheep
  66. praise
  67. trust
  68. prosperity
  69. bless
  70. heavens
  71. deeds
  72. toward
  73. discussion
  74. whoever
  75. speaks
  76. ye
  77. hath
  78. amen
  79. teaching
  80. thess
  81. apostles
  82. preparing
  83. eph
  84. eccl
  85. path
  86. fear
  87. upon
  88. presence
  89. inspire
  90. search
  91. zechariah
  92. seek
  93. teach
  94. wrath
  95. commandments
  96. believers
  97. humility
  98. spoke
  99. thee
  100. devo

Background

Extracting Bible references from text means identifying whether a given piece of text is referring to a Bible verse or something else. For example, the meaning of Acts 2 depends on context:

  • Referring to Bible passage: Acts 2 recounts the early church.
  • Not referring to Bible passage: She’s 5 years old but acts 2.

When you encounter a phrase that could be a Bible reference, you have to look at its context to decide whether it really is one. Humans can make this leap pretty easily, but computers need rigorous models and lots of training data to guess whether an ambiguous phrase is a Bible reference. In the above example, the phrase “early church” is a strong indicator that “Acts 2” is a Bible reference, while the phrase “years old” points the other way.
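
To see why context matters, consider a naive pattern matcher. The sketch below is an assumption for illustration, not the matcher behind this project: the three-entry `BOOKS` list and the `find_candidates` helper are hypothetical, and a real matcher would need every book name and abbreviation. It flags both example sentences as candidates, which is exactly why a second, context-aware step is needed.

```python
import re

# Hypothetical, deliberately tiny book list for illustration only.
BOOKS = ["acts", "john", "jeremiah"]

# A book name, whitespace, a chapter number, and an optional ":verse".
CANDIDATE = re.compile(
    r"\b(" + "|".join(BOOKS) + r")\s+(\d+)(?::(\d+))?\b",
    re.IGNORECASE,
)


def find_candidates(text):
    """Return every substring that looks like a Bible reference."""
    return [match.group(0) for match in CANDIDATE.finditer(text)]


print(find_candidates("Acts 2 recounts the early church."))
print(find_candidates("She's 5 years old but acts 2."))
```

Both calls return a match, so the pattern alone cannot tell the two sentences apart.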

Twitter, with its high volume of content and decent search engine, provides lots of training data.

Methodology

Using the Twitter Search API, I downloaded 30,000 tweets possibly containing Bible references (e.g., [john 3], [jeremiah 29]) and then categorized them by hand as referring to a Bible verse or not.

I then ran a Naive Bayes algorithm on the hand-labeled tweets to produce the above list, which contains the words that most strongly indicate the presence of a Bible reference.
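
The post does not say how the Naive Bayes output was turned into a ranking. One common approach, sketched here under that assumption, scores each word by the difference between its smoothed log-probability in Bible-related tweets and in other tweets. The `tokenize` and `indicator_words` helpers and the four sample tweets are made up for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def indicator_words(labeled_tweets, top_n=10):
    """Rank words by log P(word | bible) - log P(word | other).

    Uses add-one (Laplace) smoothing so that a word unseen in one class
    does not produce a zero probability.
    """
    counts = {True: Counter(), False: Counter()}
    for text, is_bible in labeled_tweets:
        counts[is_bible].update(tokenize(text))

    vocab = set(counts[True]) | set(counts[False])
    totals = {label: sum(c.values()) for label, c in counts.items()}

    def log_prob(word, label):
        return math.log((counts[label][word] + 1) / (totals[label] + len(vocab)))

    scores = {w: log_prob(w, True) - log_prob(w, False) for w in vocab}
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


# Made-up, hand-labeled examples (True = refers to a Bible passage).
sample = [
    ("Acts 2 recounts the early church", True),
    ("Reading John 3 at church tonight", True),
    ("She's 5 years old but acts 2", False),
    ("Meeting John 3 times this week", False),
]
top_word = indicator_words(sample, top_n=1)
print(top_word)
```

With this toy sample, "church" comes out on top because it is the only word that appears twice in the Bible-related tweets and never in the others. Words like "acts" and "john" score near zero because they appear equally in both classes.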

This list suffers from sampling bias, of course: a different set of tweets would produce a different list. In addition, the list is Twitter-centric; the results may not carry over into blogs or other media. (People substitute the number “2” for the word “to” and “4” for “for” on Twitter more frequently than they do elsewhere, for example, which oversamples content like “I’m meeting Matthew 4 dinner.”)

See It in Action

Search for Bible references on Twitter. Use the relevant and not relevant buttons to improve the filtering. I haven’t formally announced this new feature of OpenBible.info yet; consider the link a preview.

Top 100 Things Twitterers Are Giving Up for Lent

Friday, February 27th, 2009

A Wordle of the words below shows the relative frequency of each one.

Some you’d expect (alcohol, chocolate), some are ironic (giving up Lent for Lent, giving up giving up things), some are odd (pants, lint), some are anti-religious (religion, Catholicism), and some are tech-related (Facebook, Twitter—even “Facebook and Twitter” makes the list).

Complete List

  1. Facebook (654)
  2. Twitter (317)
  3. Chocolate (272)
  4. Lent (216)
  5. Alcohol (187)
  6. Soda (139)
  7. Coffee (129)
  8. Meat (126)
  9. Religion (102)
  10. Swearing (94)
  11. Sweets (92)
  12. Catholicism (90)
  13. Giving up things (80)
  14. Work (70)
  15. Beer (60)
  16. Sex (59)
  17. Fast food (57)
  18. Facebook and Twitter (57)
  19. Sugar (45)
  20. Stuff (43)
  21. Booze (41)
  22. Smoking (39)
  23. Food (39)
  24. Procrastination (38)
  25. Internet (37)
  26. Cursing (36)
  27. Caffeine (35)
  28. TV (33)
  29. Pancakes (33)
  30. Social networking (33)
  31. Sleep (32)
  32. Candy (32)
  33. Diet Coke (29)
  34. Giving up (29)
  35. You (28)
  36. Wine (28)
  37. Lint (28)
  38. Cheese (28)
  39. Bread (26)
  40. Shopping (26)
  41. Sobriety (26)
  42. Abstinence (24)
  43. Cussing (24)
  44. Red meat (24)
  45. Chips (23)
  46. Internet porn (22)
  47. Christianity (22)
  48. Nothing (21)
  49. French fries (21)
  50. Jesus (21)
  51. Sarcasm (19)
  52. Junk food (19)
  53. Starbucks (18)
  54. Ice cream (18)
  55. MySpace (18)
  56. Cookies (18)
  57. Fried food (17)
  58. Complaining (17)
  59. God (16)
  60. New Year’s resolutions (15)
  61. Social media (15)
  62. Pizza (14)
  63. Tweeting (14)
  64. Carbs (13)
  65. MySpace and Facebook (13)
  66. Carbon (13)
  67. Eating out (13)
  68. Stress (13)
  69. Flaky guys (12)
  70. Laziness (12)
  71. Texting (12)
  72. Me (11)
  73. Some of your money (11)
  74. Annoying me (11)
  75. Sacrifice (11)
  76. School (11)
  77. Hope (10)
  78. Rice (10)
  79. Coke (10)
  80. Porn (10)
  81. The snooze button (10)
  82. Guilt (10)
  83. Men (9)
  84. Obama (9)
  85. Church (9)
  86. My job (9)
  87. Homework (9)
  88. Self denial (9)
  89. Moderation (9)
  90. Exercise (8)
  91. Bacon (8)
  92. Dieting (8)
  93. Paying taxes (8)
  94. Dr Pepper (8)
  95. Gossip (8)
  96. Beef (8)
  97. Pants (7)
  98. My sanity (7)
  99. Celibacy (7)
  100. Shaving (7)

About

Created using the Twitter Search API and Wordle. Data based on analysis of 15,000 tweets from February 22-26, 2009.
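
The post does not describe how terms were pulled out of the tweets. As a rough sketch under that caveat, a tally like the one above could be built by matching a phrase pattern and counting what falls between "giving up" and "for Lent". The `PATTERN` regex, the `count_terms` helper, and the sample tweets are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical pattern: capture whatever sits between "giving up" and "for lent".
PATTERN = re.compile(r"giving up (.+?) for lent", re.IGNORECASE)


def count_terms(tweets):
    """Count each captured term, normalized to lowercase."""
    counts = Counter()
    for tweet in tweets:
        for match in PATTERN.finditer(tweet):
            counts[match.group(1).strip().lower()] += 1
    return counts


# Made-up tweets for illustration.
tweets = [
    "I'm giving up chocolate for Lent",
    "Giving up Facebook for Lent this year",
    "giving up facebook for lent, wish me luck",
    "Giving up Lent for Lent",
]
counts = count_terms(tweets)
print(counts.most_common(1))
```

A real pipeline would also need to merge variants of the same term and handle other phrasings, which this sketch does not attempt.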