
The Bible on Twitter in 2014

December 30th, 2014

Bible Gateway recently shared their most-popular Bible verses of 2014, and I wanted to discuss this chart a little more:

Popular Bible verses by day in 2014 on Bible Gateway

The chart stems from the idea that if someone is equally likely to see a verse on any day of the year, each day should account for 1/365, or about 0.27%, of that verse’s yearly pageviews. The chart marks the days when a verse’s pageviews spike: whenever a single day accounts for more than 0.4% of the verse’s annual total.
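A minimal sketch of that spike test follows. It assumes per-verse daily pageview counts are already available as a list of daily numbers. The input format and the names `find_spikes` and `SPIKE_SHARE` are hypothetical, not Bible Gateway’s actual code.

```python
# Threshold from the post: a day "spikes" when it accounts for more than
# 0.4% of a verse's annual total. An even spread would be 1/365 (~0.27%).
SPIKE_SHARE = 0.004


def find_spikes(daily_views):
    """Return {verse: [(day_index, share), ...]} for days above SPIKE_SHARE.

    daily_views maps each verse to a list of daily pageview counts for the
    year (a hypothetical input format; day_index 0 is January 1).
    """
    spikes = {}
    for verse, counts in daily_views.items():
        total = sum(counts)
        if total == 0:
            continue  # a verse with no views can't spike
        hits = [(day, n / total) for day, n in enumerate(counts)
                if n / total > SPIKE_SHARE]
        if hits:
            spikes[verse] = hits
    return spikes


# Usage: one verse with a flat year plus one busy day, one perfectly flat verse.
views = [10] * 365
views[100] = 50
spikes = find_spikes({"Gen 1:1": views, "Gen 1:2": [10] * 365})
print(spikes["Gen 1:1"][0][0])  # 100
```

The perfectly flat verse never appears in the result, because each of its days holds exactly 1/365 of the total, which is below the 0.4% threshold.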

The theme of the chart is that people follow certain paths through the Bible during the year; I labeled a few of them on the chart. But there are definitely a few patterns I can’t explain:

  1. At the beginning of the year, two lines emanate from Genesis that look like they’re on track to read the full Bible in a year, but one of them is faster than the other. Why are there two?
  2. At the bottom right of the chart is a shallow line that looks like it involves reading Genesis and Exodus starting in May and ending in December. There’s a similar line in the New Testament running through Matthew from June to November. What are those?

I was curious whether the same patterns would appear in Twitter for the year, so I ran a similar analysis on the 43 million tweets this year that mentioned Bible verses. The answer is that, yes, you can see many of the same paths in both charts:

Popular Bible verses by day in 2014 on Twitter

They even include the same two (or three or four) fast readings of the Bible at the beginning of the year and the slow reading of Genesis and Exodus in the second half of the year. You can see similar peaks around the Passion stories leading to Easter and the Nativity story leading to Christmas. (Christmas is the last day that appears on this chart.) The Twitter chart more clearly shows the weekly rhythms of the devotional life, with vertical lines just barely visible every Sunday. The main difference is that there’s not as clear a path through the New Testament.

The Twitter chart also shows some horizontal bands where sharing is pretty light. These “sharing shadows” appear in the opening chapters of Numbers, 1 Kings, and 1 Chronicles.

Prolific Verse Sharers

A quirk of the Twitter chart is that some Twitterers tweet (and are retweeted) a lot. I suspect many of them are bots, but it’s hard to say whether they constitute “Bible spam”–many people do appear to find them helpful, since they retweet them. The top fifty or so Twitterers are responsible for 16 million of the 43 million tweets this year. The chart doesn’t look too different if you remove them (mostly, the frequent repetition of Matthew disappears), but that could just be because I didn’t remove enough users to affect the results meaningfully. For all I know, this chart mostly just shows how Twitter bots share the Bible during the year. However, the consistency with the Bible Gateway data (in which I have more confidence) leads me to think that this picture is reasonably accurate.

Here are the top non-bot (as far as I can tell) sharers of Bible verses–these people tweeted the most Bible verses (and, more importantly, were retweeted most) throughout the year. Some of these people I recognize, and others… not so much. The “tweet” numbers reflect only tweets containing Bible verses and include others’ retweets of their tweets.

  1. JohnPiper (105,836 tweets)
  2. DangeRussWilson (87,382 tweets)
  3. WeLiftYourName (52,638 tweets)
  4. JosephPrince (50,889 tweets)
  5. BishopJakes (49,109 tweets)
  6. siwon407 (48,994 tweets)
  7. RickWarren (42,637 tweets)
  8. JoyceMeyer (39,703 tweets)
  9. jeremycamp (32,003 tweets)
  10. DaveRamsey (28,173 tweets)
  11. RCCGworldwide (26,731 tweets)
  12. AdamCappa (25,976 tweets)
  13. Creflo_Dollar (24,422 tweets)
  14. sadierob (20,068 tweets)
  15. Carson_Case (19,846 tweets)
  16. TimTebow (18,303 tweets)
  17. Kevinwoo91 (17,230 tweets)
  18. levimitchell (16,355 tweets)
  19. jesse_duplantis (15,755 tweets)
  20. kutless (14,806 tweets)
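Because these counts include other people’s retweets, each retweet has to be credited to the original author rather than to the retweeter. The following is a minimal sketch of that bookkeeping. The record layout (a `user` key plus a `retweet_of` key on retweets) and the function name `sharer_counts` are hypothetical.

```python
from collections import Counter


def sharer_counts(tweets):
    """Credit each verse-bearing tweet to its original author.

    tweets: iterable of dicts with a "user" key; retweets also carry a
    "retweet_of" key naming the original author (hypothetical layout).
    An author's total is their own verse tweets plus every retweet of them.
    """
    counts = Counter()
    for tweet in tweets:
        author = tweet.get("retweet_of") or tweet["user"]
        counts[author] += 1
    return counts


sample = [
    {"user": "alice"},
    {"user": "bob", "retweet_of": "alice"},
    {"user": "carol", "retweet_of": "alice"},
    {"user": "bob"},
]
print(sharer_counts(sample).most_common())  # [('alice', 3), ('bob', 1)]
```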

Most-Popular Verses

Here are the most-popular verses shared on Twitter in 2014:

  1. Phil 4:13 (613,161 tweets)
  2. 1Pet 5:7 (261,417 tweets)
  3. Prov 3:5 (218,019 tweets)
  4. John 14:6 (212,883 tweets)
  5. John 13:7 (207,084 tweets)
  6. 1Cor 13:4 (197,379 tweets)
  7. Matt 28:20 (187,407 tweets)
  8. Ps 118:24 (183,475 tweets)
  9. 2Tim 1:7 (182,758 tweets)
  10. Ps 56:3 (180,139 tweets)

You can also download a text file (411 KB) with the complete list of 2014’s popular verses.

John 13:7 (“Jesus replied, ‘You do not realize now what I am doing, but later you will understand.’”) is the oddball here, but it turns out that most of its tweets come from over 100,000 retweets of a single tweet in April. (Since it was a one-off, I omitted that tweet’s author from the list of top sharers above, although his tweet count of 163,497 would put him in first place.)

How do the year’s most-popular verses compare among Bible Gateway, YouVersion, and Twitter? The answer: there’s a good deal of variation. Below are the top ten from each service. Only Proverbs 3:5 appears in all three lists, and it’s also the only verse that YouVersion and Twitter share, which surprises me (given that they’re both based on sharing).

If we look only at Bible Gateway and Twitter, the average verse differs in its ranking by about 3,000 places, or nearly 10% of the Bible. The largest differences in rank: 1 Kings 20:14 is much more popular on Twitter (rank 4,380) than on Bible Gateway (rank 27,119), while Ezra 5:14 is way more popular on Bible Gateway (rank 13,995) than Twitter (rank 30,018).

Rank | Bible Gateway | YouVersion | Twitter
1 | John 3:16 | Rom 12:2 | Phil 4:13
2 | Jer 29:11 | Phil 4:8 | 1Pet 5:7
3 | Phil 4:13 | Phil 4:6 | Prov 3:5
4 | Rom 8:28 | Jer 29:11 | John 14:6
5 | Ps 23:4 | Matt 6:33 | John 13:7
6 | Phil 4:6 | Phil 4:7 | 1Cor 13:4
7 | 1Cor 13:4 | Prov 3:5 | Matt 28:20
8 | Prov 3:5 | Isa 41:10 | Ps 118:24
9 | 1Cor 13:7 | Matt 6:34 | 2Tim 1:7
10 | Rom 12:2 | Prov 3:6 | Ps 56:3

Verses that appear in at least two lists: Jer 29:11, Phil 4:13, Phil 4:6, 1Cor 13:4, Prov 3:5, and Rom 12:2.
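The overlap claims are easy to check with set operations on the three top-ten lists. The sketch below does that and also shows one plausible reading of “differs in its ranking by about 3,000 places”: a mean absolute rank difference over verses that both services rank. The full rankings aren’t published here, so the example uses only the two extreme verses quoted above. Its result is therefore *not* the roughly 3,000-place whole-Bible average. The function name `mean_rank_difference` is hypothetical.

```python
from collections import Counter

# Top-ten lists from the post (Twitter's #10 is Ps 56:3, per its top-ten list).
bible_gateway = ["John 3:16", "Jer 29:11", "Phil 4:13", "Rom 8:28", "Ps 23:4",
                 "Phil 4:6", "1Cor 13:4", "Prov 3:5", "1Cor 13:7", "Rom 12:2"]
youversion = ["Rom 12:2", "Phil 4:8", "Phil 4:6", "Jer 29:11", "Matt 6:33",
              "Phil 4:7", "Prov 3:5", "Isa 41:10", "Matt 6:34", "Prov 3:6"]
twitter = ["Phil 4:13", "1Pet 5:7", "Prov 3:5", "John 14:6", "John 13:7",
           "1Cor 13:4", "Matt 28:20", "Ps 118:24", "2Tim 1:7", "Ps 56:3"]

in_all_three = set(bible_gateway) & set(youversion) & set(twitter)
youversion_and_twitter = set(youversion) & set(twitter)

# No list repeats a verse, so this Counter records how many lists hold each one.
membership = Counter(bible_gateway + youversion + twitter)
in_two_or_more = {verse for verse, n in membership.items() if n >= 2}


def mean_rank_difference(ranks_a, ranks_b):
    """Mean absolute rank difference over verses ranked by both services."""
    shared = ranks_a.keys() & ranks_b.keys()
    return sum(abs(ranks_a[v] - ranks_b[v]) for v in shared) / len(shared)


# Only the two extreme cases quoted in the post, not the full ranking.
gateway_ranks = {"1 Kings 20:14": 27119, "Ezra 5:14": 13995}
twitter_ranks = {"1 Kings 20:14": 4380, "Ezra 5:14": 30018}

print(sorted(in_all_three))  # ['Prov 3:5']
print(mean_rank_difference(gateway_ranks, twitter_ranks))  # 19381.0
```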

Data Source

The Twitter data comes from Bible Verses on Twitter. A program connects to the Twitter Streaming API with a query for every chapter of the Bible (“Gen 1”, “Genesis 1”, and so on). I run a Bible reference parser on each tweet to pull out all the references. Then an SVM classifier tries to guess whether the tweet is actually talking about a Bible verse or just happens to contain a string that looks like a Bible reference (“Gen 1 Xbox for sale,” where “Gen” is short for “Generation”).
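The parsing step can be sketched with a regular expression over a table of book names. This is a toy version, not the parser the site uses. The book table is a hypothetical subset, and the function name `parse_references` is illustrative. The second usage example shows why the separate classifier step is needed: the parser alone happily accepts “Gen 1” in a sales tweet.

```python
import re

# A tiny, hypothetical subset of book names; a real parser needs all 66
# books and many more abbreviations and misspellings.
BOOKS = {
    "gen": "Gen", "genesis": "Gen",
    "ps": "Ps", "psalm": "Ps", "psalms": "Ps",
    "john": "John",
    "phil": "Phil", "philippians": "Phil",
}

# Book name, optional period, chapter, and optional ":verse".
REF_RE = re.compile(
    rf"\b({'|'.join(BOOKS)})\.?\s+(\d{{1,3}})(?::(\d{{1,3}}))?\b",
    re.IGNORECASE,
)


def parse_references(text):
    """Return (book, chapter, verse or None) for each reference in text."""
    refs = []
    for m in REF_RE.finditer(text):
        verse = int(m.group(3)) if m.group(3) else None
        refs.append((BOOKS[m.group(1).lower()], int(m.group(2)), verse))
    return refs


print(parse_references("Phil 4:13 and Psalm 23 today"))
# [('Phil', 4, 13), ('Ps', 23, None)]
print(parse_references("Gen 1 XBox for sale"))
# [('Gen', 1, None)]  <- a false positive the classifier has to reject
```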

Sidenote: How I Calculate Verse Views

A note on methodology: I’ve never documented how I determine a particular verse’s popularity. Now’s a good time, because there are several ways to do it, and they produce different answers. Let’s say that someone is looking at Genesis 1, which has 31 verses. That counts as one pageview, but if you’re looking for the number of pageviews that, say, Genesis 1:1 receives, how do you attribute a chapter-length view like this? You could give each verse credit for a full pageview, but then verses in long chapters will appear to have a disproportionately high number of pageviews. Instead, I prefer to divide the pageview by the number of verses in the passage: in this case, each verse in Genesis 1 receives 1/31, or about 0.032, pageviews.

Now, what if someone is looking at, say, Genesis 1:1 and Matthew 1 (25 verses) on the same page? In this case, I first divide the pageview by the number of separate passages: Genesis 1:1 receives 0.5 pageviews, and Matthew 1 as a whole receives the other 0.5. Each verse in Matthew 1 therefore receives 0.5/25, or 0.02, pageviews.

I feel that this approach best respects people’s intentions whether they want to look at multiple verses, several independent passages, or just individual verses.
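This split is simple to express in code. The following is a minimal sketch of the two-level division described above: first an equal share per passage, then an equal share per verse within each passage. The function name `attribute_pageview` and the verse-id strings are hypothetical.

```python
from collections import defaultdict


def attribute_pageview(passages, credit=None):
    """Split one pageview among the verses shown on a page.

    passages: the separate passages on the page, each a list of verse ids.
    Each passage gets an equal share of the pageview, and that share is
    divided evenly among the passage's verses. Credit accumulates in
    `credit`, so the function can be called once per pageview.
    """
    if credit is None:
        credit = defaultdict(float)
    per_passage = 1 / len(passages)
    for verses in passages:
        for verse in verses:
            credit[verse] += per_passage / len(verses)
    return credit


genesis_1 = [f"Gen 1:{v}" for v in range(1, 32)]   # 31 verses
matthew_1 = [f"Matt 1:{v}" for v in range(1, 26)]  # 25 verses

one_chapter = attribute_pageview([genesis_1])
print(round(one_chapter["Gen 1:1"], 3))  # 0.032

two_passages = attribute_pageview([["Gen 1:1"], matthew_1])
print(two_passages["Gen 1:1"], round(two_passages["Matt 1:1"], 3))  # 0.5 0.02
```

Each pageview always distributes exactly one unit of credit, so summing credit across all verses still gives the total number of pageviews.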

What Twitterers Are Giving up for Lent (2014 Edition)

March 8th, 2014

The top 100 things that people on Twitter are giving up for Lent in 2014.

This year, “School” topped the list of things Twitterers are giving up for Lent, up 44 places from last year. Remaining in the top ten from last year are Swearing, Alcohol, Soda, Social Networking, and Fast Food. Chocolate, Twitter, Sweets, and Lent round out the new additions to the top ten.

I don’t have a great explanation for why School is #1 this year–it could be that Ash Wednesday is later this year, so spring break is closer (for some it even starts today). It’s also possible that Twitter’s audience is skewing younger than it used to, or that younger Twitter users are more likely to tweet about Lent.

Timely topics this year are Boosie, referring to rapper Lil Boosie, who was released from prison this week (people joked that the prison was giving him up for Lent); and Electricity, referring to a widespread power outage in South Africa.

This list draws from 646,000 tweets from March 2 to 8 that mention giving up something for Lent; it excludes retweets.

Rank Word Count Change from last year’s rank
1. School 11,757 +44
2. Chocolate 9,515 +15
3. Twitter 8,642 +8
4. Swearing 7,132 -2
5. Alcohol 6,325 0
6. Soda 5,446 -3
7. Social networking 4,197 -3
8. Sweets 4,188 +8
9. Fast food 4,088 0
10. Lent 2,842 +118
11. Meat 2,790 +26
12. Homework 2,760 +61
13. Junk food 2,723 +8
14. Coffee 2,678 +123
15. Sex 2,392 +112
16. Chips 2,129 -10
17. Bread 2,020 +114
18. You 2,016 +21
19. Facebook 1,926 0
20. Pizza 1,628 +122
21. Starbucks 1,566 +120
22. Candy 1,412 +87
23. Instagram 1,212 -13
24. Religion 1,147 +104
25. Virginity 1,143 -18
26. Cookies 1,053 -14
27. Work 1,031 +4
28. Ice cream 1,025 +27
29. Boys 1,021 +99
30. Marijuana 1,018 -22
31. Smoking 994 -9
32. Beer 939 +104
33. Life 933 -8
34. Food 930 +27
35. McDonalds 926 +20
36. Winter 853  
37. Netflix 851 -7
38. College 819 +16
39. My phone 777 +6
40. Shopping 748 +103
41. Stuff 733 +100
42. Selfies 731 +1
43. Chipotle 726 +100
44. Masturbation 725 -30
45. Sugar 682 +82
46. Cheese 670 +94
47. Me 656 +87
48. Sobriety 655 -17
49. Wine 652 +94
50. Carbs 648 +81
51. Boosie 581  
52. Fried food 574 -23
53. Caffeine 563 +70
54. Rice 562 +86
55. Catholicism 561 -28
56. Snapchat 543 +11
57. Coke 541 -22
58. Procrastination 517 -18
59. People 516 +54
60. Snow 506  
61. Desserts 486 -37
62. Fizzy drinks 480 +81
63. French fries 475 -29
64. Takeout 464 -49
65. Obama 452 +74
66. Makeup 451 -25
67. Taco Bell 434 +39
68. Feelings 434 -32
68. Porn 430  
69. Nothing 427 +74
70. My swag 420 -47
71. Negativity 417 +28
72. Red meat 396 +59
73. Diet Coke 390 +69
74. Sarcasm 380  
75. Breathing 369  
76. Caring 357 +66
77. Complaining 354  
78. Tea 352 +64
79. Pancakes 340 +63
80. Peanut butter 336  
81. Sweet tea 335  
82. Booze 325 +61
83. Sleep 320 +33
84. Hope 316 +46
85. Cake 313 -13
86. Pasta 303 +57
87. TV 302 +30
88. Texting 297 +52
89. Eating out 275 -29
90. Exercise 274 -47
91. Pants 270 +5
92. Electricity 268 +41
93. The gym 258 +16
94. Liquor 245  
95. Church 243 -46
96. Tinder 237 +35
97. Tumblr 236 +46
98. Math 236 +20
98. Juice 232 +35
99. Being mean 230  
100. Chick Fil A 228 -38
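The post doesn’t describe how the “giving up X for Lent” phrases were extracted. The following is only a hypothetical sketch of one simple approach: a regular expression that captures the text between “give/giving up” and “for Lent”. It skips old-style retweets that start with “RT @”, since the list excludes retweets. A real pipeline would also need to normalize synonyms and spelling, which this sketch does not attempt. The names `LENT_RE` and `count_lent_items` are illustrative.

```python
import re
from collections import Counter

# Hypothetical pattern: "give up <thing> for Lent" / "giving up <thing> for Lent".
LENT_RE = re.compile(r"\bgiv(?:e|ing) up (.+?) for lent\b", re.IGNORECASE)


def count_lent_items(tweets):
    """Count what each tweet says it's giving up for Lent, skipping retweets."""
    counts = Counter()
    for text in tweets:
        if text.startswith("RT @"):
            continue  # assumes old-style "RT @user" retweets; the list excludes them
        for match in LENT_RE.finditer(text):
            counts[match.group(1).strip().lower()] += 1
    return counts


sample = [
    "I'm giving up school for Lent",
    "Giving up chocolate for Lent!",
    "RT @someone: giving up school for lent",
    "gonna give up school for lent lol",
]
print(count_lent_items(sample).most_common())
# [('school', 2), ('chocolate', 1)]
```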

Categories

Rank Category Number of Tweets
1. food 62,453
2. school/work 18,148
3. technology 17,615
4. habits 16,616
5. smoking/drugs/alcohol 12,665
6. irony 7,319
7. relationship 6,563
8. sex 5,483
9. health/hygiene 3,476
10. religion 2,784
11. generic 2,504
12. entertainment 1,959
13. weather 1,496
14. shopping 1,183
15. celebrity 961
16. sports 780
17. politics 547
18. clothes 540
19. money 492
20. habit 393
21. possessions 217
22. clothing 62

Historical Trends

This year I added a new Historical Lent Tracker that you can use to investigate Lenten trends on your own over the past six years.

Here are some of my favorite graphs:

Second-Wave Social Media

Tumblr peaked in 2011, and WhatsApp, which Facebook recently paid $19 billion for, doesn’t register highly.

Instagram is highest, followed by Snapchat, Tumblr, Tinder, and WhatsApp.

Fast Food Restaurants

Chipotle is much higher on the list than I expected–is that because people love it or because they hate it?

McDonald's is highest, followed by Chipotle, Taco Bell, Chick-Fil-A, Dunkin Donuts, Whataburger, KFC, and Subway.

One Direction vs. Justin Bieber

One Direction has been outpacing Justin Bieber since 2012.

Snack Foods

Congratulations, Hot Cheetos, on being the snack the most people want to give up.

Hot Cheetos is highest, followed by popcorn, Doritos, potato chips, and Cheetos.

Media Coverage

The Lent Tracker got some media attention this year. In roughly chronological order:

Finally, this Wall Street Journal article doesn’t talk about the Lent tracker, but it discusses the fraught phenomenon of Ash Wednesday selfies: Selfies Bring Ashtags to Lent. (This article may or may not be behind a paywall for you.)

“Hacking the Bible” in Christianity Today

March 6th, 2014

This month’s Christianity Today cover story, The Bible in the Original Geek, talks about how programmers are using technology to change how we read, study, and interpret the Bible. If you’re interested in the Bible and technology (and if you’re reading this blog, you probably are), then you should go read it.

(Ted Olsen, the author of the article, also did an AMA on Reddit about it.)

The article talks about the “academic priesthood,” and I think it’s particularly interesting that so few universities are interested in “digital theology” (for lack of a better term). You can study at Durham (as John Dyer is doing) or King’s College London, or you can try to work a biblical emphasis into a digital rhetoric Ph.D. But I’m surprised that more institutions, especially evangelical seminaries, aren’t at the forefront of the kind of research described in the article.

Track What People Are Giving Up in 2014 for Lent in Real Time

March 3rd, 2014

See the top 100 things people are giving up in 2014 for Lent on Twitter, continually updated until March 7, 2014.

As I write this post, with about 5,000 tweets analyzed, the new hot topics so far this year are: “Netflix,” “Flappy Bird,” and “Getting an Oscar.” “Social Networking” is currently way out in front, with twice as many tweets as perennial favorites “Swearing” and “Alcohol.” (Last year, Social Networking came in at #4.)

Look for the usual post-mortem on March 8, 2014.

Christmas Timeline Visualization

December 24th, 2013

Christmas Timeline Visualization

Over at the Bible Gateway Blog, I have a post discussing the above Christmas Timeline Visualization, which uses the same xkcd-inspired format as the Holy Week Timeline from 2011.

Sequencing the events of the Christmas story in the Bible to produce this visualization raises a few questions I’d never considered before (not that they’re unique to me):

  1. When does Mary conceive Jesus? Everyone (including several commentaries) says that it happens before Mary goes to visit Elizabeth. John the Baptist’s leap for joy in the womb is generally thought of as a response to Jesus’ proximity, but the text says that Mary’s voice prompts it. Even Elizabeth’s blessing doesn’t necessarily imply that Mary is already carrying Jesus.
  2. Did any of the shepherds who visited Jesus on the night of his birth have children whom Herod would later kill in the “Slaughter of the Innocents”? If so, that adds a chilling undertone to the story.
  3. Did the magi stay in the same inn at Bethlehem that didn’t have room for Mary and Joseph?
  4. Why do angels always inspire movement? Every time they show up in the story, someone heads off somewhere.

Thanks to my assistant for putting together the spreadsheet (CSV) containing all the data used in the visualization.

So You Want to Write a Kids’ Bible

November 28th, 2013

The Christmas story as illustrated in 1880 and in 2013. The bottom illustration is copyright Lifechurch.tv.

Let’s say you want to write a children’s Bible. When the time arrives to collect the stories you want to include, you need to choose which of the hundreds of tales in the Bible will make the cut. You can approach this decision a variety of ways: thematically (stories involving children, for example), artistically (stories that are illustratable), or even mathematically.

And by “mathematically,” I just mean “counting”–gather a bunch of kids’ Bibles, look at the tables of contents, and count the number of times that each story appears. Google Books speeds up this process; it includes dozens of children’s Bibles in its index, some from the nineteenth century. My assistant went through over thirty Bibles for kids, copied out the tables of contents, and aligned all the stories. The resulting spreadsheet reflects around 415 unique stories that have appeared in kids’ Bibles over the past 180 years; around 350 of them show up in more than one kids’ Bible.


Most-Popular Old and New Testament Stories in Kids’ Bibles

Old Testament Stories

  1. Noah’s flood (Genesis 6-8)
  2. Moses’ birth and being found by Pharaoh’s daughter (Exodus 2:1-10)
  3. Joseph’s coat, dreams, and sold by his brothers for twenty pieces of silver (Genesis 37)
  4. Ruth (Ruth 1-4)
    Creation of the world (Genesis 1:1-25)
  5. David and Goliath (1 Samuel 17)
  6. Capture of Jericho (Joshua 6)
    Crossing the Red Sea (Exodus 14)
    Daniel and the lions’ den (Daniel 6)
  7. Burning bush (Exodus 3:1-4:17)
    David chosen by God (1 Samuel 16:1-13)
    The Ten Commandments (Exodus 20:1-17)

New Testament Stories

  1. Jesus’ birth (Luke 2:7)
  2. Wise men visit Jesus (Matt 2:10-12)
  3. Jesus as a boy in the Temple (Luke 2:41-52)
    Jesus’ crucifixion (Mark 15:22-40)
  4. Feeding of the 5,000 (Mark 6:32-44)
  5. Jesus and the children (Luke 18:15-17)
    Jesus chooses his disciples (Matt 4:18-22)
    Jesus calms the storm (Mark 4:35-41)
    Jairus’ daughter (Luke 8:41-42, 49-56)
    The triumphal entry (Luke 19:28-44)
  6. Peter’s miraculous escape from prison (Acts 12:1-19)
    The Last Supper (Mark 14:18-26)
    Sermon on the Mount (Matt 5)
    Jesus’ ascension (Luke 24:51-53)

(Stories appearing under the same number in each column appear in an equal number of kids’ Bibles.)
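The counting and the shared numbering can be sketched in a few lines. The sketch assumes the tables of contents have already been aligned so that the same story has the same name in every Bible, which was the manual part of the work. It counts each story at most once per Bible and then uses dense ranking, so stories that appear in the same number of Bibles share a number and the next number follows without a gap, as in the lists above. The function name `rank_stories` is hypothetical.

```python
from collections import Counter


def rank_stories(tables_of_contents):
    """Rank stories by how many kids' Bibles include them.

    tables_of_contents: one list of (already aligned) story names per Bible.
    Uses dense ranking: stories found in the same number of Bibles share a
    rank number, and the next rank number follows without a gap.
    """
    counts = Counter()
    for toc in tables_of_contents:
        counts.update(set(toc))  # count a story at most once per Bible
    levels = sorted(set(counts.values()), reverse=True)
    rank_of = {n: i + 1 for i, n in enumerate(levels)}
    return sorted((rank_of[n], story, n) for story, n in counts.items())


tocs = [
    ["Noah's flood", "Ruth"],
    ["Noah's flood", "Creation of the world"],
    ["Noah's flood", "Ruth", "Creation of the world"],
]
for rank, story, n in rank_stories(tocs):
    print(rank, story, n)
```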

Some stories fall in and out of favor over time–compare the top stories from the 1800s to those from the 2000s in the lists below. After looking at these lists, one person I talked to suggested that the composition of Bibles for children has become more theological recently and less focused on character-building.

Most-Popular Old and New Testament Stories in Kids’ Bibles (1800s vs. 2000s)

Old Testament (1800s)

  1. Moses’ birth and being found by Pharaoh’s daughter (Exodus 2:1-10)
  2. Noah’s flood (Genesis 6-8)
    Joseph’s coat, dreams, and sold by his brothers for twenty pieces of silver (Genesis 37)
    Samson’s death (Judges 16:23-31)
    Ruth (Ruth 1-4)
  3. Birth of Jacob and Esau (Genesis 25:19-26)
    Cain and Abel (Genesis 4)
    Sodom and Gomorrah (Genesis 18:16-19:29)
    David chosen by God (1 Samuel 16:1-13)
    Samson’s birth (Judges 13)
    Samuel’s birth (1 Samuel 1)
    Samson’s marriage (Judges 14)

Old Testament (2000s)

  1. Noah’s flood (Genesis 6-8)
    Creation of the world (Genesis 1:1-25)
    Daniel and the lions’ den (Daniel 6)
  2. Burning bush (Exodus 3:1-4:17)
    Capture of Jericho (Joshua 6)
    Crossing the Red Sea (Exodus 14)
    David and Goliath (1 Samuel 17)
  3. Joseph’s coat, dreams, and sold by his brothers for twenty pieces of silver (Genesis 37)
    David chosen by God (1 Samuel 16:1-13)
    Egypt’s nine plagues (Exodus 7:14-10:29)
    The Ten Commandments (Exodus 20:1-17)
    Tower of Babel (Genesis 11:1-9)
    Moses’ birth and being found by Pharaoh’s daughter (Exodus 2:1-10)

New Testament (1800s)

  1. Jesus and the children (Luke 18:15-17)
    Peter’s miraculous escape from prison (Acts 12:1-19)
    Jesus’ birth (Luke 2:7)
    Jesus talks with the Samaritan Woman (John 4:4-42)
    Wise men visit Jesus (Matt 2:10-12)
    Jesus as a boy in the Temple (Luke 2:41-52)
    The triumphal entry (Luke 19:28-44)
  2. Bartimaeus sees (Mark 10:46-52)
    Jesus calms the storm (Mark 4:35-41)
    Jesus and the woman with bleeding (Luke 8:43-48)
    Parable of the good Samaritan (Luke 10:25-37)
    Parable of the prodigal son (Luke 15:11-32)
    Feeding of the 5,000 (Mark 6:32-44)
    The stoning of Stephen (Acts 7:54-60)
    Jairus’ daughter (Luke 8:41-42, 49-56)
    Jesus’ crucifixion (Mark 15:22-40)

New Testament (2000s)

  1. Jesus’ birth (Luke 2:7)
  2. Wise men visit Jesus (Matt 2:10-12)
    Jesus’ crucifixion (Mark 15:22-40)
  3. The Holy Spirit comes at Pentecost (Acts 2:1-13)
    Feeding of the 5,000 (Mark 6:32-44)
    The Last Supper (Mark 14:18-26)
    Sermon on the Mount (Matt 5)
  4. Jesus chooses his disciples (Matt 4:18-22)
    Jesus’ temptation in the wilderness (Luke 4:1-13)
    The faith of the centurion (Luke 7:1-10)
    Jesus and the miraculous catch of fish (John 21:1-25)
    Shepherds visit Jesus (Luke 2:15-18)
    Parable of the good Samaritan (Luke 10:25-37)
    Peter heals the crippled beggar (Acts 3:1-10)
    Parable of the prodigal son (Luke 15:11-32)
    Saul’s conversion (Acts 9:1-19)
    Jesus walks on water (Mark 6:48-53)
    Jesus changes water to wine (John 2:1-11)
    Zacchaeus (Luke 19:1-10)
    Jesus as a boy in the Temple (Luke 2:41-52)
    Jairus’ daughter (Luke 8:41-42, 49-56)
    The triumphal entry (Luke 19:28-44)
    Gethsemane (Mark 14:32-42)
    Raising of Lazarus (John 11:1-45)
    Jesus’ ascension (Luke 24:51-53)

The new Bible for Kids app from Lifechurch, out today, includes six stories: Creation, Fall, Jesus’ birth, Jesus heals a paralytic (the roof story), Jesus’ crucifixion, and Jesus’ resurrection. Aside from the story of the paralytic, all these stories are popular in recent Bibles for children.

Download the data

The raw data behind these lists is available as a Google Spreadsheet for you to download if you’re interested. For those books still in copyright, the contents of each book are copyright their respective authors.

Religious Interest among Facebook Users

April 25th, 2013

I have a post on the Bible Gateway blog that briefly looks at how religious interest among Facebook users varies with age.

In particular, eighteen-year-old women appear to have an especially strong interest in religion, which drops off sharply during their 20s. (Barna in 2003 published findings that corroborate the dropoff.)

The post makes some possibly unwarranted inferences from the original data published yesterday by Stephen Wolfram:

quotes + life philosophy data by age

How to Train Your Franken-Bible

March 16th, 2013

This is the outline of a talk that I gave earlier today at the BibleTech 2013 conference.

  • There are two parts to this talk:
    • Inevitability of algorithmic translations. An algorithmic translation (or “Franken-Bible”) means a translation that’s at least partly done by a computer.
    • How to produce one with current technology.

  • The number of English translations has grown over the past five centuries and has accelerated recently. This growth mirrors the overall trend in publishing as costs have diminished and publishers have proliferated.
  • I argue that this trend of ever-more translations will not only continue but accelerate, and that we’re heading for a post-translation world, where the number of English translations becomes so high, and the market so fragmented, that Bible translations as distinct identities will have much less meaning than they do today.
    • This trend isn’t specific to Bibles, but Bible translations participate in the larger shift to more diverse and fragmented cultural expressions.

  • Like other media, Bible translations are subject to a variety of pressures.
    • Linguistic (e.g., as English becomes more gender-inclusive, pressure rises on Bible translations to also become more inclusive).
    • Academic (discoveries that shed new light on existing translations). My favorite example is Matthew 17:15, where some older translations say, “My son is a lunatic,” while newer translations say, “My son has epilepsy.”
    • Theological / doctrinal (conform to certain understandings or agendas).
    • Social (decline in public religious influence leads to a loss of both shared stories and religious vocabulary).
    • Moral (pressure to address whatever the pressing issue of the day is).
    • Institutional (internally, where Bible translators want control over their translations; and externally, with the wider, increasing distrust of institutions).
    • Market. Bible translators need to make enough money (directly or indirectly) to sustain translation operations.
    • Technological. Technological pressure increases the variability and intensity of other pressures and is the main one we’ll be investigating today.

  • If you’re familiar with Clayton Christensen’s The Innovator’s Dilemma, you know that the basic premise is that existing dominant companies are eventually eclipsed by newer companies who release an inferior product at low prices. The dominant companies are happy to let the new company operate in this low-margin area, since they prefer to focus on their higher-margin businesses. The new company then steadily encroaches on the territory previously staked out by the existing companies until the existing companies have a much-diminished business. Eventually the formerly upstart company becomes the incumbent, and the cycle begins again.
    • One of the main drivers of this disruption is technology, where a technology that’s vastly inferior to existing methods in terms of quality comes at a much lower price and eventually supersedes existing methods.
  • I argue that English Bible translation is ripe for disruption, and that this disruption will take the form of large numbers of specialized translations that are, from the point of view of Bible translators, vastly inferior. But they’ll be prolific and easy to produce and will eventually supplant existing modern translations.

  • For an analogy, let’s look at book publishing (or any media, really, like news, music, or movies; but book publishing is what I’m most familiar with, so it’s what I’ll talk about). In the past twenty years, it’s gone through two major disruptions.
    • The first is a disruption in distribution, with Amazon.com and other web retailers. National bookstore chains consolidated or folded as they struggled to figure out how to compete with the lower prices and wider selection offered online.
    • This change hurt existing retailers but didn’t really affect the way that content creators like publishers and authors did business. From their perspective, selling through Amazon isn’t that different from selling through Barnes & Noble.
    • The second change is more disruptive to content creators: this change is the switch away from print books to ebooks. At first, this change seems more like a difference in degree than in kind. Again, from a publisher’s perspective, it seems like selling an ebook through Amazon is just a more-convenient way of selling a print book through Amazon.
    • But ebooks actually allow whole new businesses to emerge.

  • I’d argue that these are the main functions that publishers serve for authors–in other words, why would I, as an author, want a publisher to publish my book in exchange for some small cut of the profit?
    • Gatekeeping (by publishing a book, they’re saying it’s worth your time and has a certain level of quality: it’s been edited and vetted).
    • Marketing (making sure that people know about and buy books that are interesting to them).
    • Distribution (historically, shipping print books to bookstores).
  • Ebooks most obviously remove the distribution pillar of this model–when producing and distributing an epub only involves a few clicks, it’s hard to argue that a publisher is adding a whole lot of value in distribution.
  • That leaves gatekeeping and marketing, which I’ll return to later in the context of Bibles.

  • But beyond just affecting these pillars, ebooks also allow new kinds of products:
    • First, a wider variety of content becomes more economically viable–content that’s too long or too short to print as a book can work great as an ebook, for example.
    • Second, self-publishing becomes more attractive: when traditional publishing shuns you because, say, your book is terrible, just go direct and let the market decide just how terrible it is. And if no one will buy it, you can always give it away–it’s not like it’s costing you anything.
  • So ebooks primarily allow large numbers of low-quality, low-priced books into the market, which fits the definition of disruption we talked about earlier.

  • Let’s talk specifically about Bible translations.
  • Traditionally, Bible translations have been expensive endeavors, involving teams of dozens of people working over several years.
    • The result is a high-quality product that conforms to the translation’s intended purpose.
    • In return for this high level of quality, Bible publishers charge money to, at a minimum, recoup their costs.
  • What would happen if we applied the lessons from the ongoing disruption in book publishing to Bible translations?
    • First, like Amazon and physical bookstores, we disrupt distribution.
    • Bible Gateway in 1993 on the web first disrupted distribution by letting people browse translations for free.
    • YouVersion in 2010 on mobile then took that disruption a step further by letting people download and own translations for free.
    • But we’re really only talking about a disruption in distribution here. Just like with print books, this type of disruption doesn’t affect the core Bible-translation process.
  • That second type of disruption is still to come; I’m going to argue that, as with ebooks, the disruption to translations themselves will be largely technological and will result in an explosion of new translations.
    • I believe that this disruption will take the form of partially algorithmic personalized Bible translations, or Franken-Bibles.
    • Because Bible translation is a specialized skill, these Franken-Bibles won’t arise from scratch–instead, they’ll build on existing translations and will be tailored to a particular audience–either a single individual or a group of people.

  • In its simplest form, a Franken-Bible could involve swapping out a footnote reading for the reading that’s in the main text.

  • A typical Bible has around 1,100 footnotes indicating alternate translations. What if a translation allowed you to decide which reading you preferred, either by setting a policy (for example, “always translate adelphoi in a particular way”) or by deciding on a case-by-case basis?
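One way to picture that choice is as data. Each footnote becomes a variant with a main-text reading and an alternate reading. A reader’s standing policy picks between them by the underlying word, and a case-by-case override picks by verse. The following is a minimal, hypothetical sketch: the `Variant` structure, `render` function, and placeholder template are illustrative assumptions, and the sample wording is not quoted from any particular translation. For simplicity, an override applies to every variant in the named verse.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    ref: str          # verse reference, e.g. "Phil 4:8"
    source_word: str  # underlying word the footnote concerns, e.g. "adelphoi"
    main: str         # reading in the main text
    footnote: str     # alternate reading offered in the footnote


def render(template, variants, policy=None, overrides=None):
    """Fill each {slot} in template with the reader's preferred reading.

    policy:    source_word -> "main" or "footnote", applied everywhere.
    overrides: ref -> "main" or "footnote", a case-by-case choice that
               takes precedence over the policy.
    Unspecified slots keep the main-text reading.
    """
    policy = policy or {}
    overrides = overrides or {}
    chosen = {}
    for slot, v in variants.items():
        pick = overrides.get(v.ref, policy.get(v.source_word, "main"))
        chosen[slot] = v.footnote if pick == "footnote" else v.main
    return template.format(**chosen)


# Illustrative wording only, not quoted from a particular translation.
verse = "Finally, {who}, whatever is true..."
slots = {"who": Variant("Phil 4:8", "adelphoi", "brothers", "brothers and sisters")}

print(render(verse, slots))
# Finally, brothers, whatever is true...
print(render(verse, slots, policy={"adelphoi": "footnote"}))
# Finally, brothers and sisters, whatever is true...
print(render(verse, slots, policy={"adelphoi": "footnote"},
             overrides={"Phil 4:8": "main"}))
# Finally, brothers, whatever is true...
```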

  • By including footnote variants at all, translations have already set themselves up for this approach. Some translations go even further: the Amplified and Expanded Bibles embrace variants by embedding them right in the text, a more extensive version of the same idea.
  • But that’s the simplest approach. A more-radical approach, from a translation-integrity perspective, would allow people to edit the text of the translation itself, not merely to choose among pre-approved alternatives at given points.
    • In many ways, the pastor who says in a sermon, “This verse might better be translated as…” is already doing this; it just isn’t propagated back into the translation.
    • People also do this on Twitter all the time, where they alter a few words in a verse to make it more meaningful to them.
    • The risk here, of course, is that people will twist the text beyond all recognition, either through incompetence or malice.

  • That risk brings us back to one of the other two functions that publishers serve: gatekeeping.
    • A translation is nothing if not gatekeeping: a group of people have gotten together and declared, “This is what Scripture says. These are our translation principles, and here’s where we stand in relation to other translations.”
    • What happens to gatekeeping–to the seal of authority and trust–when anyone can change the text to suit themselves?

  • In other words, what happens if the “Word of God” becomes the “Wiki of God?”
    • After all, people who aren’t interested in translating their own Bible still want to be able to trust that they’re reading an accurate, or at least non-heretical, translation.

  • I suggest that new axes of trust will form. Whereas today a translation “brand”–NIV, ESV, or whatever–carries certain trust signals, those signals will shift to new parties, and in particular to groups that people already trust.
    • Whom you’d trust to steward a Bible translation probably isn’t that different from whom you already trust theologically, a group that likely includes some of the following:
      • Social network.
      • Teachers or elders in your church.
      • Pastors in your church.
      • Your denomination.
      • More indirectly, a megachurch pastor you trust.
      • A parachurch organization or other nonprofit.
      • Maybe even a corporation.
    • The point is not that trust in a given translation would go away, but rather that trust would become more networked, complicated, and fragmented, before eventually solidifying.
      • We already see this happening somewhat with existing Bible translations, where certain groups declare some translations as OK and others as questionable. Like your choice of cellphone, the translation you use becomes an indicator of group identity.
      • The proliferation of translations will allow these groups who are already issuing imprimaturs to go a step further and advance whole translations that fit their viewpoints.
      • In other words, they’ll be able to act as their own Magisteriums. Their own self-published Magisteriums.
      • This, by the way, also addresses the third function that publishers serve: marketing. By associating a translation with a group you already identify with, you reduce the need for marketing.

  • I said earlier that I think technology will bring about this situation, but there are a couple of ways it could happen.
    • First, an existing modern translation could open itself up to such modifications. A church could say, “I like this translation except for these ten verses.” Or, “This translation is fine except that it should translate this Greek word this way instead of this other way.”
    • Such a flexible translation could provide the tools to edit itself–with enough usage, it’s possible that useful improvements could be incorporated back into the original translation.
  • A second alternative is to use technology to produce a new, cheap, low-quality translation with editing in mind from the get-go, to provide a base from which a monstrous hydra of translations can grow. Let’s take a look at what such a hydra translation, or a Franken-Bible, could look like.

  • The basic premise is this: there are around thirty modern, high-quality translations of the Bible into English. Can we combine these translations algorithmically into something that charts the possibility space of the original text?
    • Bible translators already consult existing translations to explore nuances of meaning. What I propose is to consult these translations computationally and lay bare as many nuances as possible.

  • You can explore the output of what I’m about to discuss at www.adaptivebible.com. I’m going to talk about the process that went into creating the site.
  • This part of the talk is pretty technical.

  • In all, there are fifteen steps, broken into two phases: alignment of existing translations and generation of a new translation.
  • It took about ten minutes of processor time for each verse to produce the result.
  • The total cost in server time on Amazon EC2 to translate the New Testament was about $10. Compared to the millions of dollars that a traditional translation costs, that’s a big savings–five or six orders of magnitude.
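The “five or six orders of magnitude” figure is just a ratio of costs. The talk doesn’t give an exact budget. Assuming “millions of dollars” means somewhere between $1 million and $10 million, the arithmetic works out like this:

```python
import math


def orders_of_magnitude(traditional_cost, machine_cost):
    """How many powers of ten separate the two costs."""
    return math.log10(traditional_cost / machine_cost)


# $10 of EC2 time versus an assumed $1M-$10M traditional budget.
for budget in (1_000_000, 10_000_000):
    print(f"${budget:,}: {orders_of_magnitude(budget, 10):.0f} orders of magnitude")
```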

  • The first phase is alignment.
  • First step. Collect as many English translations as possible. Obviously there are copyright implications with doing that, so it’s important to deal with only one verse at a time, which is something that all translations explicitly allow. For this project, we used around thirty translations.
  • Second. Normalize the text as much as possible. For this project, we’re not interested in any formatting, for example, so we can deal with just the plain text.
  • Third. Tokenize the text and run basic linguistic analysis on it.
    • Off-the-shelf open-source software from Stanford called Stanford CoreNLP tokenizes the text, identifies lemmas (base forms of words), and analyzes how words are related to each other syntactically.
    • In general, it’s about 90% accurate, which is fine for our purposes; we’ll be trying to enhance that accuracy later.
  • Fourth. Identify Wordnet similarities between translations.
    • Wordnet is a giant database of word meanings that computers can understand.
    • We take the lemmas from step 3 and identify how close in meaning they are to each other. The thinking is that even when translations use different words for the same underlying Greek word, the words they choose will at least be similar in meaning.
    • For this step, we used Python’s Natural Language Toolkit.
  • Fifth. Run an off-the-shelf translation aligner.
    • We used another open-source program called the Berkeley Aligner, which is designed to use statistics to align content between different languages. But it works just as well for different translations of the same content in the same language. It takes anywhere from two to ten minutes to run on each verse.
  • Sixth. Consolidate all this data for future processing.
    • By this point, we have around 4MB of data for each verse, so we consolidate it into a format that’s easy for us to access in later steps.
  • Seventh. Run a machine-learning algorithm over the data to identify the best alignment between single words in each pair of translations.
    • We used another Python module, scikit-learn, to execute the algorithm.
    • In particular, we used Random Forest, which is a supervised-learning system. That means we need to feed it some data we know is good so that it can learn the patterns in the data.
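Of these steps, normalization (step two) is simple enough to sketch directly. The snippet below is a hypothetical Python version. It assumes the raw verses arrive with HTML-style tags, HTML entities, and bracketed footnote letters such as `[a]`. The talk doesn’t describe the project’s real input formats or normalization rules.

```python
import html
import re
import unicodedata


def normalize_verse(raw: str) -> str:
    """Reduce one verse of marked-up text to plain text (illustrative sketch)."""
    text = re.sub(r"<[^>]+>", " ", raw)           # drop HTML-style formatting tags
    text = re.sub(r"\[[a-z]+\]", "", text)        # drop footnote markers such as [a] (assumed format)
    text = html.unescape(text)                    # decode entities like &#8220; and &amp;
    text = unicodedata.normalize("NFKC", text)    # fold compatibility characters
    text = text.translate(str.maketrans({"\u2018": "'", "\u2019": "'",
                                         "\u201c": '"', "\u201d": '"'}))
    text = re.sub(r"\s+", " ", text).strip()      # collapse runs of whitespace
    return re.sub(r" ([,.;:!?])", r"\1", text)    # no space before punctuation


print(normalize_verse("<p>Jesus <i>wept</i>.[a]</p>"))  # Jesus wept.
```

Replacing tags with a space, rather than deleting them outright, keeps words from running together across block tags. The last substitution then removes the stray space this leaves before punctuation.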

  • Where did we get this good data? We wrote a simple drag-and-drop aligner to feed the algorithm. It shows two lists of words, and you drag a word from one list onto a word in the other if they match. It’s actually kind of fun; with a little juicing up, I can totally see it becoming a game called “Translations with Friends.”
    • In total, we hand-aligned around 30 pairs of translations across 25 verses. That’s a tiny sample next to the roughly 8,000 verses in the New Testament, but the algorithm doesn’t need a lot of training to get good results.
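Part of the reason so few verses go a long way is that each hand-aligned verse pair yields an example for every combination of words, not just for the matches. The sketch below assumes that every combination the person didn’t drag together counts as a negative example; the talk doesn’t say this. `labeled_pairs` is a hypothetical helper.

```python
from itertools import product


def labeled_pairs(words_a, words_b, hand_links):
    """Turn one hand-aligned verse pair into training examples.

    words_a, words_b: token lists for the same verse in two translations.
    hand_links: set of (i, j) index pairs the person dragged onto each other.
    Every (i, j) combination becomes an example: 1 if it was linked, else 0.
    """
    return [((i, j), 1 if (i, j) in hand_links else 0)
            for i, j in product(range(len(words_a)), range(len(words_b)))]


examples = labeled_pairs(["Jesus", "wept"], ["Jesus", "shed", "tears"],
                         {(0, 0), (1, 1), (1, 2)})
print(len(examples), sum(label for _, label in examples))  # 6 3
```

At this rate, two 20-word renderings of a single verse already produce 400 labeled combinations.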

  • What the algorithm actually runs on is a big vector matrix. These are the ten factors we included in our matrix.
    • 1. One translation might begin a verse with the words “Jesus said,” while another might put that same phrase at the end of the verse. All things being equal, though, translations tend to put words in similar positions in the verse. When all else fails, it’s worth taking position into account.
    • 2. Similarly, even when translations rearrange words, they’ll often keep them in the same sentence. Again, all things being equal, it’s more likely that the same word will appear in the same sentence position across translations.
    • 3. If we know that a particular word is in a prepositional phrase, for example, it’s not unlikely that it will serve a similar grammatical role in another translation.
    • 4. If words in different translations are both nouns or both verbs, it’s more likely that they’re translating the same word than if one’s a noun and another’s an adverb.
    • 5. Here we use the output from the Berkeley Aligner we ran earlier. The aligner is bidirectional, so if we’re comparing the word “Jesus” in one translation with the word “he” in another, we look both at what the Berkeley Aligner says “Jesus” should line up with in one translation and with what “he” should line up with in the other translation. It provides a fuller picture than just going in one direction.
    • 6. Here we go more general. Even if the Berkeley Aligner didn’t match up “Jesus” and “he” in the two translations we’re currently looking at, if other translations use “he” and the Aligner successfully aligned them with “Jesus”, we want to take that into account.
    • 7. This is similar to grammatical context but looks specifically at dependencies, which describe direct relationships between words. For example, if a word is the subject of a sentence in one translation, it’s likely to be the subject of a sentence in another translation.
    • 8. Wordnet similarity looks at the similarities we calculated earlier–words with similar meanings are more likely to reflect the same underlying words.
    • 9. This step strips out all words that aren’t nouns, pronouns, adjectives, verbs, or adverbs and compares the sequence of what’s left. If two translations have different words sandwiched between the same pair of words, there’s a good chance that those two words mean the same thing.
    • 10. Finally, we look at any dependencies between major words; it’s a coarser version of what we did in #7.
    • The end result is a giant matrix of data (a ten-factor vector for every word combination in every pair of translations in every verse), and we run our machine-learning algorithm on it, which produces an alignment between every word in every translation.
    • At this point, we’ve generated between 50 and 250MB of data for every verse.
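Here is a rough illustration of how a few of these factors could become numbers in a feature vector. The Python sketch below covers factors 1, 2, 4, and 9 for a single word pair. The `Token` class, the coarse part-of-speech labels, and the exact encodings are assumptions; the talk doesn’t specify how each factor was scored.

```python
from dataclasses import dataclass

CONTENT_TAGS = {"NOUN", "PRON", "ADJ", "VERB", "ADV"}  # coarse tags kept for factor 9


@dataclass
class Token:
    lemma: str
    pos: str        # coarse part of speech, e.g. "NOUN" (assumed tag set)
    sentence: int   # index of the sentence within the verse


def relative_position(tokens, i):
    """Position of token i as a fraction of the verse length (0.0 to 1.0)."""
    return i / max(len(tokens) - 1, 1)


def content_neighbors(tokens, i):
    """Lemmas of the nearest content words before and after token i."""
    content = [k for k, t in enumerate(tokens) if t.pos in CONTENT_TAGS]
    before = [tokens[k].lemma for k in content if k < i]
    after = [tokens[k].lemma for k in content if k > i]
    return (before[-1] if before else None, after[0] if after else None)


def pair_features(a, i, b, j):
    """A few of the ten factors for aligning a[i] with b[j] (hypothetical encoding)."""
    na, nb = content_neighbors(a, i), content_neighbors(b, j)
    return [
        abs(relative_position(a, i) - relative_position(b, j)),  # factor 1: verse position
        1.0 if a[i].sentence == b[j].sentence else 0.0,          # factor 2: same sentence
        1.0 if a[i].pos == b[j].pos else 0.0,                    # factor 4: same part of speech
        1.0 if na == nb and None not in na else 0.0,             # factor 9: same content-word neighbors
    ]


a = [Token("the", "DET", 0), Token("Lord", "NOUN", 0), Token("bless", "VERB", 0), Token("you", "PRON", 0)]
b = [Token("the", "DET", 0), Token("Lord", "NOUN", 0), Token("favor", "VERB", 0), Token("you", "PRON", 0)]
print(pair_features(a, 2, b, 2))  # [0.0, 1.0, 1.0, 1.0]
```

Rows like these would be paired with the 0/1 labels from the hand-aligned verses. That combination is the shape of input scikit-learn’s `RandomForestClassifier.fit(X, y)` accepts.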

  • Eighth. Now that we have the direct alignment, we supplement it with indirect alignment data across translations. In other words, to reuse our earlier example, the alignment between two translations may not align “Jesus” and “he,” but alignments in other translations might strongly suggest that the two should be aligned.
  • At this point, we have a reasonable alignment among all the translations. It’s not perfect, but it doesn’t have to be. Now we shift to the second phase: generating a range of possible translations from this data.
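The indirect step can be pictured as voting through pivot translations. Suppose translation A links a word to a word in C, and C links that word to one in B; that is evidence for linking A and B directly. Below is a minimal sketch. The data layout, the function names, and the two-vote threshold are hypothetical.

```python
from collections import Counter


def indirect_votes(links, a, b, pivots):
    """Count how many pivot translations connect words in a and b.

    links[(x, y)] is a set of (i, j) word-index pairs aligning translation x to y.
    For each pivot p, if a[i] aligns to p[k] and p[k] aligns to b[j],
    that is one vote for aligning a[i] with b[j].
    """
    votes = Counter()
    for p in pivots:
        for i, k in links.get((a, p), set()):
            for k2, j in links.get((p, b), set()):
                if k == k2:
                    votes[(i, j)] += 1
    return votes


def supplement(links, a, b, pivots, min_votes=2):
    """Add indirect links that enough pivots agree on to the direct a-b alignment."""
    direct = set(links.get((a, b), set()))
    extra = {pair for pair, n in indirect_votes(links, a, b, pivots).items() if n >= min_votes}
    return direct | extra


links = {("A", "B"): {(0, 0)},
         ("A", "C"): {(0, 0), (3, 2)}, ("C", "B"): {(0, 0), (2, 4)},
         ("A", "D"): {(3, 1)}, ("D", "B"): {(1, 4)}}
print(sorted(supplement(links, "A", "B", ["C", "D"])))  # [(0, 0), (3, 4)]
```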

  • First. Consolidate alignments into phrases, where we look for runs of parallel words. You can see that we’re only looking at lemmas here–dealing with every word creates a lot of noise that doesn’t add much value, so we ignore the less-important words. In this case, the first two have identical phrases even though the words differ slightly, while the third structures the sentence differently.
  • Second. Arrange translations into clusters based on how similar they are to each other structurally. In this example, the first two form a cluster, and the third would be part of a different cluster.
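One simple way to consolidate word links into phrases has two parts. First, look for runs where both translations advance one lemma at a time. Then summarize the structure by the order in which those phrases appear in the second translation. This Python sketch is an assumption about how such runs could be found, not the project’s code. Under this scheme, translations sharing the same `structure` signature would fall into the same cluster.

```python
def phrases(links):
    """Group word links into phrases: maximal runs where both sides advance by one.

    links: iterable of (i, j) index pairs between the lemmas of two translations.
    Returns a list of runs, each a list of (i, j) pairs.
    """
    runs = []
    for i, j in sorted(links):
        if runs and runs[-1][-1] == (i - 1, j - 1):
            runs[-1].append((i, j))   # continues the current phrase
        else:
            runs.append([(i, j)])     # starts a new phrase
    return runs


def structure(runs):
    """Order in which the phrases appear in the second translation."""
    return tuple(sorted(range(len(runs)), key=lambda r: runs[r][0][1]))


same_order = phrases({(0, 0), (1, 1), (2, 2), (3, 5), (4, 6)})
swapped = phrases({(0, 2), (1, 3), (2, 0), (3, 1)})
print(structure(same_order), structure(swapped))  # (0, 1) (1, 0)
```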

  • Third. Insert actual keyword text. I’ve been using words in the examples I’ve been giving, but in the actual program, we use numerical ids assigned to each word. Here we start to introduce actual words.
  • Fourth. Fill in the gaps between keywords. We add in small words like conjunctions and prepositions that are key to producing recognizable English.
  • Fifth. Add in punctuation. Up to this point, we’ve been focusing on the commonalities among translations. Now we’re starting to focus on differences to produce a polished output.
  • Sixth. Reduce the possibility space to accept only valid bigrams. “Bigrams” just means two words in a row. We remove any two-word combinations that, based on our algorithm thus far, look like they should work but don’t. We check each pair of words to see whether they exist anywhere in one of our source translations. If they don’t, we get rid of them.
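The bigram check in step six can be sketched in a few lines of Python. Here `slots` holds the candidate words for each position. It is a simplified, hypothetical stand-in for the possibility space the earlier steps produce.

```python
def source_bigrams(translations):
    """Every pair of adjacent words found in any source translation."""
    seen = set()
    for words in translations:
        seen.update(zip(words, words[1:]))
    return seen


def valid_paths(slots, bigrams):
    """Expand candidate words slot by slot, keeping only paths made of attested bigrams.

    slots: list of lists; slots[k] holds the candidate words for position k.
    """
    paths = [[w] for w in slots[0]]
    for candidates in slots[1:]:
        paths = [p + [w] for p in paths for w in candidates if (p[-1], w) in bigrams]
    return paths


bigrams = source_bigrams([["jesus", "wept"], ["jesus", "shed", "tears"],
                          ["he", "wept"], ["he", "cried"]])
print(valid_paths([["jesus", "he"], ["wept", "cried"]], bigrams))
# [['jesus', 'wept'], ['he', 'wept'], ['he', 'cried']]
```

Each check only looks at two adjacent words. A path can therefore pass every test and still be nonsense, which is the weakness noted under the limitations.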

  • Seventh. Produce rendered output.

  • In this case, the output is just for the Adaptive Bible website. It shows the various translation possibilities for each verse.
    • Hovering over a reading shows what the site thinks are valid next words based on what you’re hovering over. (That’s the yellow.)
    • You can click a particular reading if you think it’s the best one, and the other readings disappear. Clicking again restores them. (That’s the green.)
    • The website shows a single sentence structure that it thinks has the best chance of being valid, but most verses have multiple valid structures that we don’t bother to show here.

  • To consider a verse successfully translated, this process has to produce readings supported by two independent translation streams (e.g., having a reading supported only by ESV and RSV doesn’t count because ESV is derived from RSV).
    • Using this metric, the process I’ve described produces valid output for 96% of verses in the New Testament.
    • On the current version of adaptivebible.com, I use stricter criteria, so only 91% of verses show up.
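The two-stream test amounts to counting distinct lineages among a reading’s supporting translations. In this sketch, the `STREAM` map is an illustrative and incomplete lineage table (ESV and NRSV are both revisions of the RSV). The talk doesn’t give the actual lineage data behind the 96% figure.

```python
# Illustrative lineage map: which stream each translation descends from.
# Translations missing from the map are treated as their own stream.
STREAM = {"RSV": "RSV", "ESV": "RSV", "NRSV": "RSV", "NIV": "NIV"}


def independently_supported(supporters, stream=STREAM, needed=2):
    """True if a reading is backed by at least `needed` distinct translation streams."""
    return len({stream.get(t, t) for t in supporters}) >= needed


print(independently_supported({"ESV", "RSV"}))  # False: one stream
print(independently_supported({"ESV", "NIV"}))  # True: two streams
```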

  • Limitations
    • Just because a verse passes the test, that doesn’t mean it’s actually grammatical, and it certainly doesn’t mean that every alternative presented within a verse is valid.
    • Because we use bigrams for validity, we can get into situations like what you see here, where all these are valid bigrams, but the result (“Jesus said, ‘Be healed,’ Jesus said”) is ridiculous.
    • There’s no handling of inter-verse transitions; even if a verse is totally valid, it may not read smoothly into the next verse.
    • Since we removed all formatting at the beginning of the process, there’s no formatting.
  • Despite those limitations, the process produced a couple of mildly interesting byproducts.

  • Probabilistic Strong’s-to-Wordnet sense alignment. Given a single Strong’s alignment and a variety of translations, we can explore the semantic range of a Strong’s number. Here we have dunamis. This seems like a reasonably good approximation of its definition in English.
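At its simplest, exploring a Strong’s number’s semantic range means tallying the English words it was aligned to across translations. This sketch stops at English lemmas. Mapping them onto Wordnet senses, as the talk describes, would need an additional lookup that isn’t shown. The counts in the example are made up for illustration.

```python
from collections import Counter


def semantic_range(aligned_lemmas):
    """Relative frequency of each English lemma aligned to one Strong's number.

    aligned_lemmas: the English lemma each translation used wherever that
    Strong's number occurs (one entry per occurrence per translation).
    """
    counts = Counter(aligned_lemmas)
    total = sum(counts.values())
    return {lemma: n / total for lemma, n in counts.most_common()}


# Made-up alignments for a Strong's number like dunamis.
print(semantic_range(["power"] * 6 + ["miracle"] * 3 + ["strength"]))
```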

  • Identifying translation similarity. This slide explores how structurally similar translations are to each other, based on the phrase clusters we produced. The results are pretty much what I’d expect: translations that are derived from each other tend to be similar to each other.
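One plausible way to turn the phrase clusters into a similarity score is to measure how often two translations land in the same cluster across verses. This is an assumed method for illustration. The talk doesn’t say how the slide’s similarities were computed, and the cluster ids below are invented.

```python
from itertools import combinations


def similarity(clusters_by_verse):
    """Fraction of verses in which each pair of translations landed in the same cluster.

    clusters_by_verse: one dict per verse mapping translation name -> cluster id.
    """
    together, seen = {}, {}
    for clusters in clusters_by_verse:
        for a, b in combinations(sorted(clusters), 2):
            seen[(a, b)] = seen.get((a, b), 0) + 1
            if clusters[a] == clusters[b]:
                together[(a, b)] = together.get((a, b), 0) + 1
    return {pair: together.get(pair, 0) / n for pair, n in seen.items()}


verses = [{"ESV": 0, "RSV": 0, "NIV": 1}, {"ESV": 0, "RSV": 0, "NIV": 0}]
print(similarity(verses))
```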

  • What I’ve just described is one pretty basic approach to what I think is inevitable: the explosion of translations into Franken-Bibles as technology gets better. In the future, we won’t be talking about particular translations anymore but rather about trust networks.
  • To be clear, I’m not saying that I think this development is a particularly great one for the church, and it’s definitely not good for existing Bible translations. But I do think it’s only a matter of time until Franken-Bibles arrive. At first they’ll be unwieldy and ridiculously bad, but over time they’ll adapt and improve, and eventually they’ll need to be taken seriously.

What Twitterers Are Giving up for Lent (2013 Edition)

February 16th, 2013

The top 100 things that people on Twitter are giving up for Lent in 2013.

This year saw a lot of churn in the top 100 things people were giving up for Lent.

The pope announced his resignation on Monday, leading many to say that he was giving up “being pope” for Lent. It came in at #1. (Relatedly, at #18, people said they were giving up “the pope” for Lent.)

Specific social networking sites like Twitter and Facebook generally dropped this year, with the generic term “social networking” (#4) taking over as a catchall. Instagram (#10), Pinterest (#52), and Snapchat (#78) were all new to the top 100.

With Valentine’s Day falling on the day after Ash Wednesday this year, it came in at #13. My wife suggests that the timing may also have contributed to the drop in “chocolate” from #2 last year to #17 this year. “Valentines” is #97.

“Horse meat” (#20) refers to the ongoing European scandal.

The only celebrity to make the list was British boy band One Direction, up substantially at #41.

I learned several new words this year: “twerking” (#34), a type of dance move; “selfies” (#46), self-shot photos taken with a phone; “subtweeting” (#57), tweeting about someone without mentioning them by name; “oomf” (#71), short for “one of my [Twitter] followers”; and “Nando’s” (#76), a chicken restaurant.

This list draws from 263,000 tweets from February 10-15, 2013, and excludes most retweets.

Rank Word Count Change from last year’s rank
1. Being pope 5,654  
2. Swearing 4,944 +1
3. Soda 2,648 +2
4. Social networking 2,264 +19
5. Alcohol 2,217 -1
6. Chips 1,690 +8
7. Virginity 933 +23
8. Marijuana 784 +17
9. Fast food 776 -2
10. Instagram 755 +270
11. Twitter 672 -10
12. Cookies 643 +19
13. Valentine’s day 514  
14. Masturbation 510 +18
15. Takeout 465 +59
16. Sweets 444 -7
17. Chocolate 417 -15
18. The pope 394 +10,224
19. Facebook 380 -13
20. Horse meat 375  
21. Junk food 362 -8
22. Smoking 355 -3
23. My swag 331 +373
24. Desserts 325 +21
25. Life 325 +40
26. New year’s resolutions 313 +47
27. My boyfriend 309 +99
28. Catholicism 255 +11
29. Straightening my hair 228 +89
30. Fried food 225 +5
31. Netflix 216 +255
32. Work 216 -5
33. Sobriety 213 +4
34. Twerking 185 +698
35. The playoffs 184 +3,556
36. French fries 173 +19
37. Coke 168 +1
38. Feelings 168 +207
39. Laziness 160 +28
40. Meat 158 -30
41. Onedirection 155 +103
42. You 154 -24
43. Procrastination 153 +1
44. Makeup 150 +16
45. Internet 149 +61
46. Selfies 149 +2,328
47. Exercise 144 +58
48. School 141 -36
49. My phone 135 +15
50. Classes 129 +84
51. Dip 127 +132
52. Pinterest 125 +133
53. Church 124 +33
54. Emotions 122 +397
55. Going to school 119 +163
56. My girlfriend 111 +207
57. Subtweeting 110 +253
58. College 106 +5
59. My face 106 +4,168
60. Ice cream 106 -27
61. McDonald’s 102 -32
62. Being ugly 101 +256
63. Snacking 99 +19
64. Spending 96 +89
65. Dunkin Donuts 96 +475
66. Chew 95 +418
67. Eating out 94 +28
68. Elevators 94 +99
69. Food 93 -47
70. Moaning 93 +123
71. Oomf 93 +78
72. Chick Fil A 90 +135
73. Healthy food 88 +180
74. Football 87 +145
75. Swimming 87 +200
76. Nando’s 86 +72
77. DVDs 84 +1,326
78. Snapchat 84  
79. Broccoli 83 +206
80. Ranch 81 +250
81. The snooze button 80 +176
82. Crystal meth 80 +219
83. Dignity 79 +116
84. Cake 77 -13
85. Unhealthy food 77 +34
86. Homework 76 -65
87. Busyness 75  
88. Schoolwork 74 +88
89. Chemistry 74 +34,949
90. Frozen yogurt 72 +480
91. iPhone 72 +100
92. FIFA 71 +143
93. Betting 70 +315
94. Doing homework 69 +158
95. Myself 68 +267
96. Supermarkets 67 +1,797
97. Valentines 66  
98. Domino’s 63 +323
99. Being negative 63 +212
100. Hookah 63 +340

Categories

Rank Category Number of Tweets
1. food 11,642
2. habits 8,083
3. religion 6,519
4. technology 4,782
5. smoking/drugs/alcohol 3,928
6. sex 1,771
7. relationship 1,399
8. health/hygiene 1,270
9. school work 1,095
10. irony 792
11. sports 648
12. entertainment 392
13. celebrity 246
14. clothes 235
15. money 133
16. shopping 111

The image is a Wordle.

Track What People Are Giving Up in 2013 for Lent in Real Time

February 11th, 2013

See the top 100 things people are giving up in 2013 for Lent on Twitter, continually updated until February 15, 2013.

As I write this post, with about 5,000 tweets analyzed, the new hot topics so far this year are meowing, Valentine’s Day, and Snapchat.

Look for the usual post-mortem on February 16, 2013.