
Track What People Are Giving Up in 2014 for Lent in Real Time

March 3rd, 2014

See the top 100 things people are giving up in 2014 for Lent on Twitter, continually updated until March 7, 2014.

As I write this post, with about 5,000 tweets analyzed, the new hot topics so far this year are: “Netflix,” “Flappy Bird,” and “Getting an Oscar.” “Social Networking” is currently way out in front, with twice as many tweets as perennial favorites “Swearing” and “Alcohol.” (Last year, Social Networking came in at #4.)

Look for the usual post-mortem on March 8, 2014.

Christmas Timeline Visualization

December 24th, 2013

Christmas Timeline Visualization

Over at the Bible Gateway Blog, I have a post discussing the above Christmas Timeline Visualization, which uses the same xkcd-inspired format as the Holy Week Timeline from 2011.

Sequencing the events of the Christmas story in the Bible to produce this visualization raises a few questions I’d never considered before (not that they’re unique to me):

  1. When does Mary conceive Jesus? Everyone (including several commentaries) says that it happens before Mary goes to visit Elizabeth. John the Baptist’s leap for joy in the womb is generally thought of as a response to Jesus’ proximity, but the text says that Mary’s voice prompts it. Even Elizabeth’s blessing doesn’t necessarily imply that Mary is already carrying Jesus.
  2. Did any of the shepherds who visited Jesus on the night of his birth have children whom Herod would later kill in the “Slaughter of the Innocents?” If so, that adds a chilling undertone to the story.
  3. Did the magi stay in the same inn at Bethlehem that didn’t have room for Mary and Joseph?
  4. Why do angels always inspire movement? Every time they show up in the story, someone heads off somewhere.

Thanks to my assistant for putting together the spreadsheet (CSV) containing all the data used in the visualization.

So You Want to Write a Kids’ Bible

November 28th, 2013

The Christmas story as illustrated in 1880 and in 2013. The bottom illustration is copyright Lifechurch.tv.

Let’s say you want to write a children’s Bible. When the time arrives to collect the stories you want to include, you need to choose which of the hundreds of tales in the Bible will make the cut. You can approach this decision a variety of ways: thematically (stories involving children, for example), artistically (stories that are illustratable), or even mathematically.

And by “mathematically,” I just mean “counting”–gather a bunch of kids’ Bibles, look at the tables of contents, and count the number of times that each story appears. Google Books speeds up this process; they include dozens of children’s Bibles in their index, some from the nineteenth century. My assistant went through over thirty Bibles for kids, copied out the tables of contents, and aligned all the stories. The resulting spreadsheet reflects around 415 unique stories that have appeared in kids’ Bibles over the past 180 years; around 350 of them show up in more than one kids’ Bible.
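
If you want to reproduce the counting step, a minimal sketch looks like the following (the file name and column layout here are hypothetical, not the actual spreadsheet):

```python
import csv
from collections import Counter

# Hypothetical input: one row per (bible_title, story_name) pair, copied out
# of each kids' Bible's table of contents and aligned to a canonical story name.
counts = Counter()
with open("kids_bible_toc.csv", newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f):
        counts[row["story_name"]] += 1

# Stories that appear in more than one kids' Bible, most popular first.
for story, n in counts.most_common():
    if n > 1:
        print(f"{n:3d}  {story}")
```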


Most-Popular Old and New Testament Stories in Kids’ Bibles

Old Testament Stories

  1. Noah’s flood (Genesis 6-8)
  2. Moses’ birth and being found by Pharaoh’s daughter (Exodus 2:1-10)
  3. Joseph’s coat, dreams, and sale by his brothers for twenty pieces of silver (Genesis 37)
  4. Ruth (Ruth 1-4)
    Creation of the world (Genesis 1:1-25)
  5. David and Goliath (1 Samuel 17)
  6. Capture of Jericho (Joshua 6)
    Crossing the Red Sea (Exodus 14)
    Daniel and the lions’ den (Daniel 6)
  7. Burning bush (Exodus 3:1-4:17)
    David chosen by God (1 Samuel 16:1-13)
    The Ten Commandments (Exodus 20:1-17)

New Testament Stories

  1. Jesus’ birth (Luke 2:7)
  2. Wise men visit Jesus (Matt 2:10-12)
  3. Jesus as a boy in the Temple (Luke 2:41-52)
    Jesus’ crucifixion (Mark 15:22-40)
  4. Feeding of the 5,000 (Mark 6:32-44)
  5. Jesus and the children (Luke 18:15-17)
    Jesus chooses his disciples (Matt 4:18-22)
    Jesus calms the storm (Mark 4:35-41)
    Jairus’ daughter (Luke 8:41-42, 49-56)
    The triumphal entry (Luke 19:28-44)
  6. Peter’s miraculous escape from prison (Acts 12:1-19)
    The Last Supper (Mark 14:18-26)
    Sermon on the Mount (Matt 5)
    Jesus’ ascension (Luke 24:51-53)

(Stories appearing under the same number in each column appear in an equal number of kids’ Bibles.)

Some stories fall in and out of favor over time–compare the following list of top stories from the 1800s to the top stories from the 2000s. After looking at the list below, one person I talked to suggested that the composition of Bibles for children has recently become more theological and less focused on character-building.

Most-Popular Old and New Testament Stories in Kids’ Bibles (1800s vs. 2000s)

Old Testament (1800s)

  1. Moses’ birth and being found by Pharaoh’s daughter (Exodus 2:1-10)
  2. Noah’s flood (Genesis 6-8)
    Joseph’s coat, dreams, and sale by his brothers for twenty pieces of silver (Genesis 37)
    Samson’s death (Judges 16:23-31)
    Ruth (Ruth 1-4)
  3. Birth of Jacob and Esau (Genesis 25:19-26)
    Cain and Abel (Genesis 4)
    Sodom and Gomorrah (Genesis 18:16-19:29)
    David chosen by God (1 Samuel 16:1-13)
    Samson’s birth (Judges 13)
    Samuel’s birth (1 Samuel 1)
    Samson’s marriage (Judges 14)

Old Testament (2000s)

  1. Noah’s flood (Genesis 6-8)
    Creation of the world (Genesis 1:1-25)
    Daniel and the lions’ den (Daniel 6)
  2. Burning bush (Exodus 3:1-4:17)
    Capture of Jericho (Joshua 6)
    Crossing the Red Sea (Exodus 14)
    David and Goliath (1 Samuel 17)
  3. Joseph’s coat, dreams, and sale by his brothers for twenty pieces of silver (Genesis 37)
    David chosen by God (1 Samuel 16:1-13)
    Egypt’s nine plagues (Exodus 7:14-10:29)
    The Ten Commandments (Exodus 20:1-17)
    Tower of Babel (Genesis 11:1-9)
    Moses’ birth and being found by Pharaoh’s daughter (Exodus 2:1-10)

New Testament (1800s)

  1. Jesus and the children (Luke 18:15-17)
    Peter’s miraculous escape from prison (Acts 12:1-19)
    Jesus’ birth (Luke 2:7)
    Jesus talks with the Samaritan Woman (John 4:4-42)
    Wise men visit Jesus (Matt 2:10-12)
    Jesus as a boy in the Temple (Luke 2:41-52)
    The triumphal entry (Luke 19:28-44)
  2. Bartimaeus sees (Mark 10:46-52)
    Jesus calms the storm (Mark 4:35-41)
    Jesus and the woman with bleeding (Luke 8:43-48)
    Parable of the good Samaritan (Luke 10:25-37)
    Parable of the prodigal son (Luke 15:11-32)
    Feeding of the 5,000 (Mark 6:32-44)
    The stoning of Stephen (Acts 7:54-60)
    Jairus’ daughter (Luke 8:41-42, 49-56)
    Jesus’ crucifixion (Mark 15:22-40)

New Testament (2000s)

  1. Jesus’ birth (Luke 2:7)
  2. Wise men visit Jesus (Matt 2:10-12)
    Jesus’ crucifixion (Mark 15:22-40)
  3. The Holy Spirit comes at Pentecost (Acts 2:1-13)
    Feeding of the 5,000 (Mark 6:32-44)
    The Last Supper (Mark 14:18-26)
    Sermon on the Mount (Matt 5)
  4. Jesus chooses his disciples (Matt 4:18-22)
    Jesus’ temptation in the wilderness (Luke 4:1-13)
    The faith of the centurion (Luke 7:1-10)
    Jesus and the miraculous catch of fish (John 21:1-25)
    Shepherds visit Jesus (Luke 2:15-18)
    Parable of the good Samaritan (Luke 10:25-37)
    Peter heals the crippled beggar (Acts 3:1-10)
    Parable of the prodigal son (Luke 15:11-32)
    Saul’s conversion (Acts 9:1-19)
    Jesus walks on water (Mark 6:48-53)
    Jesus changes water to wine (John 2:1-11)
    Zacchaeus (Luke 19:1-10)
    Jesus as a boy in the Temple (Luke 2:41-52)
    Jairus’ daughter (Luke 8:41-42, 49-56)
    The triumphal entry (Luke 19:28-44)
    Gethsemane (Mark 14:32-42)
    Raising of Lazarus (John 11:1-45)
    Jesus’ ascension (Luke 24:51-53)

The new Bible for Kids app from Lifechurch, out today, includes six stories: Creation, Fall, Jesus’ birth, Jesus heals a paralytic (the roof story), Jesus’ crucifixion, and Jesus’ resurrection. Aside from the story of the paralytic, all these stories are popular in recent Bibles for children.

Download the data

The raw data behind these lists is available as a Google Spreadsheet for you to download if you’re interested. For those books still in copyright, the contents of each book are copyright their respective authors.

Religious Interest among Facebook Users

April 25th, 2013

I have a post on the Bible Gateway blog that briefly looks at how religious interest among Facebook users varies with age.

In particular, eighteen-year-old women appear to have an especially strong interest in religion, which drops off sharply during their 20s. (Barna in 2003 published findings that corroborate the dropoff.)

The post makes some possibly unwarranted inferences from the original data published yesterday by Stephen Wolfram:

quotes + life philosophy data by age

How to Train Your Franken-Bible

March 16th, 2013

This is the outline of a talk that I gave earlier today at the BibleTech 2013 conference.

  • There are two parts to this talk:
    • Inevitability of algorithmic translations. An algorithmic translation (or “Franken-Bible”) means a translation that’s at least partly done by a computer.
    • How to produce one with current technology.

  • The number of English translations has grown over the past five centuries and has accelerated recently. This growth mirrors the overall trend in publishing as costs have diminished and publishers have proliferated.
  • I argue that this trend of ever-more translations will not only continue but will accelerate, that we’re heading for a post-translation world, where the number of English translations becomes so high, and the market so fragmented, that Bible translations as distinct identities will have much less meaning than they do today.
    • This trend isn’t specific to Bibles, but Bible translations participate in the larger shift to more diverse and fragmented cultural expressions.

  • Like other media, Bible translations are subject to a variety of pressures.
    • Linguistic (e.g., as English becomes more gender-inclusive, pressure rises on Bible translations to also become more inclusive).
    • Academic (discoveries that shed new light on existing translations). My favorite example is Matthew 17:15, where some older translations say, “My son is a lunatic,” while newer translations say, “My son has epilepsy.”
    • Theological / doctrinal (conform to certain understandings or agendas).
    • Social (decline in public religious influence leads to a loss of both shared stories and religious vocabulary).
    • Moral (pressure to address whatever the pressing issue of the day is).
    • Institutional (internally, where Bible translators want control over their translations; and externally, with the wider, increasing distrust of institutions).
    • Market. Bible translators need to make enough money (directly or indirectly) to sustain translation operations.
    • Technological. Technological pressure increases the variability and intensity of other pressures and is the main one we’ll be investigating today.

  • If you’re familiar with Clayton Christensen’s The Innovator’s Dilemma, you know that the basic premise is that existing dominant companies are eventually eclipsed by newer companies who release an inferior product at low prices. The dominant companies are happy to let the new company operate in this low-margin area, since they prefer to focus on their higher-margin businesses. The new company then steadily encroaches on the territory previously staked out by the existing companies until the existing companies have a much-diminished business. Eventually the formerly upstart company becomes the incumbent, and the cycle begins again.
    • One of the main drivers of this disruption is technology, where a technology that’s vastly inferior to existing methods in terms of quality comes at a much lower price and eventually supersedes existing methods.
  • I argue that English Bible translation is ripe for disruption, and that this disruption will take the form of large numbers of specialized translations that are, from the point of view of Bible translators, vastly inferior. But they’ll be prolific and easy to produce and will eventually supplant existing modern translations.

  • For an analogy, let’s look at book publishing (or any media, really, like news, music, or movies; but book publishing is what I’m most familiar with, so it’s what I’ll talk about). In the past twenty years, it’s gone through two major disruptions.
    • The first is a disruption in distribution, with Amazon.com and other web retailers. National bookstore chains consolidated or folded as they struggled to figure out how to compete with the lower prices and wider selection offered online.
    • This change hurt existing retailers but didn’t really affect the way that content creators like publishers and authors did business. From their perspective, selling through Amazon isn’t that different from selling through Barnes & Noble.
    • The second change is more disruptive to content creators: this change is the switch away from print books to ebooks. At first, this change seems more like a difference in degree rather than in kind. Again, from a publisher’s perspective, it seems like selling an ebook through Amazon is just a more-convenient way of selling a print book through Amazon.
    • But ebooks actually allow whole new businesses to emerge.

  • I’d argue that these are the main functions that publishers serve for authors–in other words, why would I, as an author, want a publisher to publish my book in exchange for some small cut of the profit?
    • Gatekeeping (by publishing a book, they’re saying it’s worth your time and has a certain level of quality: it’s been edited and vetted).
    • Marketing (making sure that people know about and buy books that are interesting to them).
    • Distribution (historically, shipping print books to bookstores).
  • Ebooks most-obviously remove the distribution pillar of this model–when producing and distributing an epub only involves a few clicks, it’s hard to argue that a publisher is adding a whole lot of value in distribution.
  • That leaves gatekeeping and marketing, which I’ll return to later in the context of Bibles.

  • But beyond just affecting these pillars, ebooks also allow new kinds of products:
    • First, a wider variety of content becomes more economically viable–content that’s too long or too short to print as a book can work great as an ebook, for example.
    • Second, self-publishing becomes more attractive: when traditional publishing shuns you because, say, your book is terrible, just go direct and let the market decide just how terrible it is. And if no one will buy it, you can always give it away–it’s not like it’s costing you anything.
  • So ebooks primarily allow large numbers of low-quality, low-priced books into the market, which fits the definition of disruption we talked about earlier.

  • Let’s talk specifically about Bible translations.
  • Traditionally, Bible translations have been expensive endeavors, involving teams of dozens of people working over several years.
    • The result is a high-quality product that conforms to the translation’s intended purpose.
    • In return for this high level of quality, Bible publishers charge money to, at a minimum, recoup their costs.
  • What would happen if we applied the lessons from the ongoing disruption in book publishing to Bible translations?
    • First, like Amazon and physical bookstores, we disrupt distribution.
    • Bible Gateway, launching on the web in 1993, first disrupted distribution by letting people browse translations for free.
    • YouVersion, on mobile in 2010, then took that disruption a step further by letting people download and own translations for free.
    • But we’re really only talking about a disruption in distribution here. Just like with print books, this type of disruption doesn’t affect the core Bible-translation process.
  • That second type of disruption, to the translations themselves, is still to come; I’m going to argue that, as with ebooks, it will be largely technological and will result in an explosion of new translations.
    • I believe that this disruption will take the form of partially algorithmic personalized Bible translations, or Franken-Bibles.
    • Because Bible translation is a specialized skill, these Franken-Bibles won’t arise from scratch–instead, they’ll build on existing translations and will be tailored to a particular audience–either a single individual or a group of people.

  • In its simplest form, a Franken-Bible could involve swapping out a footnote reading for the reading that’s in the main text.

  • A typical Bible has around 1,100 footnotes indicating alternate translations. What if a translation allowed you to decide which reading you preferred, either by setting a policy (for example, “always translate adelphoi in a particular way”) or by deciding on a case-by-case basis?
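
To make this simplest case concrete, here’s a minimal sketch (the data structure is invented for illustration, not any real translation’s format) of swapping a footnote reading into the main text, either per verse or by a blanket policy:

```python
# A verse whose main text contains a slot with footnote alternatives (invented format).
verse = {
    "ref": "Rom.12.1",
    "text": "I appeal to you therefore, {brothers}, by the mercies of God...",
    "alternates": {"brothers": ["brothers and sisters"]},
}

# A reader-level policy: always render this word a particular way.
policy = {"brothers": "brothers and sisters"}

def render(verse, policy=None, choices=None):
    """Fill each {slot} with a per-verse choice, a blanket policy, or the default reading."""
    out = verse["text"]
    for word in verse["alternates"]:
        replacement = (choices or {}).get(word) or (policy or {}).get(word) or word
        out = out.replace("{" + word + "}", replacement)
    return out

print(render(verse))                 # the translation's default main-text reading
print(render(verse, policy=policy))  # the reader's blanket policy applied
```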

  • By including footnote variants at all, translations have already set themselves up for this approach. Some translations go even further–the Amplified and Expanded Bibles embrace variants by embedding them right in the text. Here we see a more-extensive version of the same idea.
  • But that’s the simplest approach. A more-radical approach, from a translation-integrity perspective, would allow people to edit the text of the translation itself, not merely to choose among pre-approved alternatives at given points.
    • In many ways, the pastor who says in a sermon, “This verse might better be translated as…” is already doing this; it just isn’t propagated back into the translation.
    • People also do this on Twitter all the time, where they alter a few words in a verse to make it more meaningful to them.
    • The risk here, of course, is that people will twist the text beyond all recognition, either through incompetence or malice.

  • That risk brings us back to one of the other two functions that publishers serve: gatekeeping.
    • A translation is nothing if not gatekeeping: a group of people have gotten together and declared, “This is what Scripture says. These are our translation principles, and here’s where we stand in relation to other translations.”
    • What happens to gatekeeping–to the seal of authority and trust–when anyone can change the text to suit themselves?

  • In other words, what happens if the “Word of God” becomes the “Wiki of God”?
    • After all, people who aren’t interested in translating their own Bible still want to be able to trust that they’re reading an accurate, or at least non-heretical, translation.

  • I suggest that new axes of trust will form. Whereas today a translation “brand”–NIV, ESV, or whatever–carries certain trust signals, those signals will shift to new parties, and in particular to groups that people already trust.
    • The people you’d trust to steward a Bible translation probably aren’t much different from the people you already trust theologically, a group that probably includes some of the following:
      • Social network.
      • Teachers or elders in your church.
      • Pastors in your church.
      • Your denomination.
      • More indirectly, a megachurch pastor you trust.
      • A parachurch organization or other nonprofit.
      • Maybe even a corporation.
    • The point is not that trust in a given translation would go away, but rather that trust would become more networked, complicated, and fragmented, before eventually solidifying.
      • We already see this happening somewhat with existing Bible translations, where certain groups declare some translations as OK and others as questionable. Like your choice of cellphone, the translation you use becomes an indicator of group identity.
      • The proliferation of translations will allow these groups who are already issuing imprimaturs to go a step further and advance whole translations that fit their viewpoints.
      • In other words, they’ll be able to act as their own Magisteriums. Their own self-published Magisteriums.
      • This, by the way, also addresses the third function that publishers serve: marketing. By associating a translation with a group you already identify with, you reduce the need for marketing.

  • I said earlier that I think technology will bring about this situation, but there are a couple ways it could happen.
    • First, an existing modern translation could open itself up to such modifications. A church could say, “I like this translation except for these ten verses.” Or, “This translation is fine except that it should translate this Greek word this way instead of this other way.”
    • Such a flexible translation could provide the tools to edit itself–with enough usage, it’s possible that useful improvements could be incorporated back into the original translation.
  • A second alternative is to use technology to produce a new, cheap, low-quality translation with editing in mind from the get-go, to provide a base from which a monstrous hydra of translations can grow. Let’s take a look at what such a hydra translation, or a Franken-Bible, could look like.

  • The basic premise is this: there are around thirty modern, high-quality translations of the Bible into English. Can we combine these translations algorithmically into something that charts the possibility space of the original text?
    • Bible translators already consult existing translations to explore nuances of meaning. What I propose is to consult these translations computationally and lay bare as many nuances as possible.

  • You can explore the output of what I’m about to discuss at www.adaptivebible.com. I’m going to talk about the process that went into creating the site.
  • This part of the talk is pretty technical.

  • In all, there are fifteen steps, broken into two phases: alignment of existing translations and generation of a new translation.
  • It took about ten minutes of processor time for each verse to produce the result.
  • The total cost in server time on Amazon EC2 to translate the New Testament was about $10. Compared to the millions of dollars that a traditional translation costs, that’s a big savings–five or six orders of magnitude.

  • The first phase is alignment.
  • First step. Collect as many English translations as possible. Obviously there are copyright implications with doing that, so it’s important to deal with only one verse at a time, which is something that all translations explicitly allow. For this project, we used around thirty translations.
  • Second. Normalize the text as much as possible. For this project, we’re not interested in any formatting, for example, so we can deal with just the plain text.
  • Third. Tokenize the text and run basic linguistic analysis on it.
    • Off-the-shelf open-source software from Stanford called Stanford CoreNLP tokenizes the text, identifies lemmas (base forms of words) and analyzes how words are related to each other syntactically.
    • In general, it’s about 90% accurate, which is fine for our purposes; we’ll be trying to enhance that accuracy later.
  • Fourth. Identify Wordnet similarities between translations.
    • Wordnet is a giant database of word meanings that computers can understand.
    • We take the lemmas from step 3 and identify how close in meaning they are to each other. The thinking is that even when translations use different words for the same underlying Greek word, the words they choose will at least be similar in meaning.
    • For this step, we used Python’s Natural Language Toolkit (see the sketch after this list).
  • Fifth. Run an off-the-shelf translation aligner.
    • We used another open-source program called the Berkeley Aligner, which is designed to use statistics to align content between different languages. But it works just as well for different translations of the same content in the same language. It takes anywhere from two to ten minutes for each verse to run.
  • Sixth. Consolidate all this data for future processing.
    • By this point, we have around 4MB of data for each verse, so we consolidate it into a format that’s easy for us to access in later steps.
  • Seventh. Run a machine-learning algorithm over the data to identify the best alignment between single words in each pair of translations.
    • We used another Python module, scikit-learn, to execute the algorithm.
    • In particular, we used Random Forest, which is a supervised-learning system. That means we need to feed it some data we know is good so that it can learn the patterns in the data.
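
A minimal sketch of the Wordnet-similarity step using NLTK’s Wordnet interface (this is my reconstruction of the idea, not the project’s actual code, and the lemma pairs are invented):

```python
# pip install nltk, then run nltk.download('wordnet') once.
from nltk.corpus import wordnet as wn

def max_similarity(lemma_a, lemma_b):
    """Best Wu-Palmer similarity across all sense pairs of two lemmas."""
    best = 0.0
    for syn_a in wn.synsets(lemma_a):
        for syn_b in wn.synsets(lemma_b):
            score = syn_a.wup_similarity(syn_b)
            if score and score > best:
                best = score
    return best

# Different translations often pick near-synonyms for the same Greek word.
print(max_similarity("boat", "ship"))      # high: likely the same underlying word
print(max_similarity("boat", "mountain"))  # low: probably not aligned
```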

  • Where did we get this good data? We wrote a simple drag-and-drop aligner to feed the algorithm: there are two lists of words, and you drag them on top of each other if they match. It’s actually kind of fun; if you juiced it up a little, I could totally see it becoming a game called “Translations with Friends.”
    • In total, we hand-aligned around 30 pairs of translations across 25 verses. That’s a tiny fraction of the roughly 8,000 verses in the New Testament, but the algorithm doesn’t need a lot of training to get good results.

  • What the algorithm actually runs on is a big vector matrix. These are the ten factors we included in our matrix.
    • 1. One translation might begin a verse with the words “Jesus said,” while another might put that same phrase at the end of the verse. All things being equal, though, translations tend to put words in similar positions in the verse. When all else fails, it’s worth taking position into account.
    • 2. Similarly, even when translations rearrange words, they’ll often keep them in the same sentence. Again, all things being equal, it’s more likely that the same word will appear in the same sentence position across translations.
    • 3. If we know that a particular word is in a prepositional phrase, for example, it’s not unlikely that it will serve a similar grammatical role in another translation.
    • 4. If words in different translations are both nouns or both verbs, it’s more likely that they’re translating the same word than if one’s a noun and another’s an adverb.
    • 5. Here we use the output from the Berkeley Aligner we ran earlier. The aligner is bidirectional, so if we’re comparing the word “Jesus” in one translation with the word “he” in another, we look both at what the Berkeley Aligner says “Jesus” should line up with in one translation and with what “he” should line up with in the other translation. It provides a fuller picture than just going in one direction.
    • 6. Here we go more general. Even if the Berkeley Aligner didn’t match up “Jesus” and “he” in the two translations we’re currently looking at, if other translations use “he” and the Aligner successfully aligned them with “Jesus”, we want to take that into account.
    • 7. This is similar to grammatical context but looks specifically at dependencies, which describe direct relationships between words. For example, if a word is the subject of a sentence in one translation, it’s likely to be the subject of a sentence in another translation.
    • 8. Wordnet similarity looks at the similarities we calculated earlier–words with similar meanings are more likely to reflect the same underlying words.
    • 9. This step strips out all words that aren’t nouns, pronouns, adjectives, verbs, and adverbs and compares their sequence–if a different word appears between two identical words across translations, there’s a good chance that it means the same thing.
    • 10. Finally, we look at any dependencies between major words; it’s a coarser version of what we did in #7.
    • The end result is a giant matrix of data (ten feature values for every word combination in every translation in every verse), and we run our machine-learning algorithm on it, which produces an alignment between every word in every translation. (A sketch of this step follows the list.)
    • At this point, we’ve generated between 50 and 250MB of data for every verse.
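
Here’s a minimal sketch of step seven with scikit-learn, under my assumption (not spelled out in the talk) that each candidate word pair becomes a ten-element feature vector like the factors above, with the hand-aligned pairs supplying the labels:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Each row is one candidate word pair across two translations. The ten columns
# stand in for the ten factors above (verse position, sentence position,
# grammatical context, part of speech, aligner output, dependencies, Wordnet
# similarity, and so on), scaled to 0-1. All values here are invented.
X_train = np.array([
    [0.95, 0.90, 1.0, 1.0, 0.8, 0.7, 1.0, 0.9, 0.9, 0.8],  # a hand-aligned pair
    [0.10, 0.20, 0.0, 0.0, 0.1, 0.2, 0.0, 0.1, 0.2, 0.1],  # a pair that doesn't align
    # ...many more rows from the hand-aligned training verses...
])
y_train = np.array([1, 0])  # 1 = the two words align, 0 = they don't

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

# For a new verse, score every candidate word pair and keep the likeliest alignments.
X_new = np.array([[0.88, 0.85, 1.0, 1.0, 0.7, 0.6, 1.0, 0.8, 0.9, 0.7]])
print(model.predict_proba(X_new)[:, 1])  # probability that the pair aligns
```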

  • Eighth. Now that we have the direct alignment, we supplement it with indirect alignment data across translations. In other words, to reuse our earlier example, the alignment between two translations may not align “Jesus” and “he,” but alignments in other translations might strongly suggest that the two should be aligned.
  • At this point, we have a reasonable alignment among all the translations. It’s not perfect, but it doesn’t have to be. Now we shift to the second phase: generating a range of possible translations from this data.

  • First. Consolidate alignments into phrases, where we look for runs of parallel words. You can see that we’re only looking at lemmas here–dealing with every word creates a lot of noise that doesn’t add much value, so we ignore the less-important words. In this case, the first two have identical phrases even though the words differ slightly, while the third structures the sentence differently.
  • Second. Arrange translations into clusters based on how similar they are to each other structurally. In this example, the first two form a cluster, and the third would be part of a different cluster.
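
One simple way to do this clustering (my approximation, not necessarily the project’s exact method) is to reduce each translation’s rendering of a verse to its sequence of aligned phrase ids and group matching sequences:

```python
from collections import defaultdict

# Invented data: each translation's verse reduced to its sequence of aligned
# phrase ids (runs of parallel lemmas), with minor words dropped.
verse_structures = {
    "T1": ("P1", "P2", "P3"),
    "T2": ("P1", "P2", "P3"),   # same structure as T1, so same cluster
    "T3": ("P3", "P1", "P2"),   # rearranged, so a different cluster
}

clusters = defaultdict(list)
for translation, structure in verse_structures.items():
    clusters[structure].append(translation)

for structure, members in clusters.items():
    print(structure, "->", members)   # ('P1', 'P2', 'P3') -> ['T1', 'T2'], etc.
```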

  • Third. Insert actual keyword text. I’ve been using words in the examples I’ve been giving, but in the actual program, we use numerical ids assigned to each word. Here we start to introduce actual words.
  • Fourth. Fill in the gaps between keywords. We add in small words like conjunctions and prepositions that are key to producing recognizable English.
  • Fifth. Add in punctuation. Up to this point, we’ve been focusing on the commonalities among translations. Now we’re starting to focus on differences to produce a polished output.
  • Sixth. Reduce the possibility space to accept only valid bigrams. (“Bigrams” just means two words in a row.) We remove any two-word combinations that, based on our algorithm thus far, look like they should work but don’t: we check each pair of words to see whether it appears anywhere in one of our source translations, and if it doesn’t, we get rid of it.
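
A minimal sketch of that bigram filter (my reconstruction, with invented data): collect every adjacent word pair attested in the source translations of the verse, then reject any generated sequence containing an unattested pair.

```python
def bigrams(words):
    """All adjacent word pairs in a token list."""
    return set(zip(words, words[1:]))

# Invented example: tokenized renderings of the same verse in two source translations.
source_translations = [
    "Jesus said to him , Be healed".split(),
    "Jesus told him , Be healed".split(),
]
allowed = set()
for words in source_translations:
    allowed |= bigrams(words)

# A generated candidate is kept only if every adjacent pair is attested somewhere.
candidate = "Jesus told him , Be healed".split()
print(all(pair in allowed for pair in bigrams(candidate)))  # True

bad = "healed Jesus told".split()
print(all(pair in allowed for pair in bigrams(bad)))  # False: ('healed', 'Jesus') never occurs
```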

  • Seventh. Produce rendered output.

  • In this case, the output is just for the Adaptive Bible website. It shows the various translation possibilities for each verse.
    • Hovering over a reading shows what the site thinks are valid next words based on what you’re hovering over. (That’s the yellow.)
    • You can click a particular reading if you think it’s the best one, and the other readings disappear. Clicking again restores them. (That’s the green.)
    • The website shows a single sentence structure that it thinks has the best chance of being valid, but most verses have multiple valid structures that we don’t bother to show here.

  • To consider a verse successfully translated, this process has to produce readings supported by two independent translation streams (e.g., having a reading supported only by ESV and RSV doesn’t count because ESV is derived from RSV).
    • Using this metric, the process I’ve described produces valid output for 96% of verses in the New Testament.
    • On the current version of adaptivebible.com, I use stricter criteria, so only 91% of verses show up.
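
That success criterion can be expressed in a few lines (the stream groupings below are illustrative and incomplete, not the full mapping the project uses):

```python
# Illustrative, incomplete mapping of translations to textual "streams";
# derived translations share a stream with their parent.
STREAM = {"RSV": "rsv", "ESV": "rsv", "NRSV": "rsv", "NIV": "niv", "NLT": "nlt"}

def independently_supported(supporting_translations, minimum=2):
    """True if the supporting translations come from at least `minimum` independent streams."""
    streams = {STREAM.get(t, t) for t in supporting_translations}
    return len(streams) >= minimum

print(independently_supported(["ESV", "RSV"]))  # False: both are in the RSV stream
print(independently_supported(["ESV", "NIV"]))  # True: two independent streams
```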

  • Limitations
    • Just because a verse passes the test doesn’t mean it’s actually grammatical, and it certainly doesn’t mean that every alternative presented within a verse is valid.
    • Because we use bigrams for validity, we can get into situations like what you see here, where all these are valid bigrams, but the result (“Jesus said, ‘Be healed,’ Jesus said”) is ridiculous.
    • There’s no handling of inter-verse transitions; even if a verse is totally valid, it may not read smoothly into the next verse.
    • Since we removed all formatting at the beginning of the process, there’s no formatting.
  • Despite those limitations, the process produced a couple of mildly interesting byproducts.

  • Probabilistic Strong’s-to-Wordnet sense alignment. Given a single Strong’s alignment and a variety of translations, we can explore the semantic range of a Strong’s number. Here we have dunamis. This seems like a reasonably good approximation of its definition in English.
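
The idea behind this byproduct, as a sketch (the alignment data is invented, and the sense counting is my simplification rather than the real pipeline): collect the English words the translations align to a Strong’s number, then see which Wordnet senses those words cluster around.

```python
from collections import Counter
from nltk.corpus import wordnet as wn  # requires nltk.download('wordnet')

# Invented alignment output: English words that various translations used where
# the Greek text has dunamis (Strong's G1411).
aligned_words = ["power", "power", "might", "strength", "miracle", "ability"]

sense_counts = Counter()
for word, n in Counter(aligned_words).items():
    for synset in wn.synsets(word, pos=wn.NOUN):
        sense_counts[synset] += n

# The most heavily supported senses approximate the semantic range in English.
for synset, n in sense_counts.most_common(5):
    print(n, synset.name(), "-", synset.definition())
```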

  • Identifying translation similarity. This slide explores how structurally similar translations are to each other, based on the phrase clusters we produced. The results are pretty much what I’d expect: translations that are derived from each other tend to be similar to each other.
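
One way to compute that kind of structural similarity (my sketch, with invented data) is to ask, verse by verse, how often two translations land in the same phrase cluster:

```python
# Invented data: the phrase cluster each translation fell into, per verse.
clusters_by_verse = {
    "John.1.1": {"ESV": "A", "RSV": "A", "NLT": "B"},
    "John.1.2": {"ESV": "A", "RSV": "A", "NLT": "A"},
    "John.1.3": {"ESV": "C", "RSV": "D", "NLT": "D"},
}

def similarity(t1, t2, clusters_by_verse):
    """Fraction of verses in which two translations share a phrase cluster."""
    shared = sum(1 for v in clusters_by_verse.values() if v[t1] == v[t2])
    return shared / len(clusters_by_verse)

print(similarity("ESV", "RSV", clusters_by_verse))  # ~0.67: closely related, as expected
print(similarity("ESV", "NLT", clusters_by_verse))  # ~0.33: structurally more distant
```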

  • What I’ve just described is one pretty basic approach to what I think is inevitable: the explosion of translations into Franken-Bibles as technology gets better. In the future, we won’t be talking about particular translations anymore but rather about trust networks.
  • To be clear, I’m not saying that I think this development is a particularly great one for the church, and it’s definitely not good for existing Bible translations. But I do think it’s only a matter of time until Franken-Bibles arrive. At first they’ll be unwieldy and ridiculously bad, but over time they’ll adapt, improve, and will need to be taken seriously.

What Twitterers Are Giving up for Lent (2013 Edition)

February 16th, 2013

The top 100 things that people on Twitter are giving up for Lent in 2013.

This year saw a lot of churn in the top 100 things people were giving up for Lent.

The pope announced his resignation on Monday, leading many to say that he was giving up “being pope” for Lent. It came in at #1. (Related, at #18, people said they were giving up “the pope” for Lent.)

Specific social networking sites like Twitter and Facebook generally dropped this year, with the generic term “social networking” (#4) taking over as a catchall. Instagram (#10), Pinterest (#52), and Snapchat (#78) were all new to the top 100.

With Valentine’s Day falling on the day after Ash Wednesday this year, it came in at #13. My wife suggests that the timing may also have contributed to the drop in “chocolate” from #2 last year to #17 this year. “Valentines” is #97.

“Horse meat” (#20) refers to the ongoing European scandal.

The only celebrity to make the list was British boy band One Direction, up substantially at #41.

I learned several new words this year: “twerking” (#34), a type of dance move; “selfies” (#46), or self-shot photos taken with a phone; “subtweeting” (#57), or tweeting about someone without mentioning them by name; “oomf” (#71), or “one of my [Twitter] followers”; and “Nando’s” (#76), a chicken restaurant.

This list draws from 263,000 tweets from February 10-15, 2013, and excludes most retweets.

Rank Word Count Change from last year’s rank
1. Being pope 5,654  
2. Swearing 4,944 +1
3. Soda 2,648 +2
4. Social networking 2,264 +19
5. Alcohol 2,217 -1
6. Chips 1,690 +8
7. Virginity 933 +23
8. Marijuana 784 +17
9. Fast food 776 -2
10. Instagram 755 +270
11. Twitter 672 -10
12. Cookies 643 +19
13. Valentine’s day 514  
14. Masturbation 510 +18
15. Takeout 465 +59
16. Sweets 444 -7
17. Chocolate 417 -15
18. The pope 394 +10,224
19. Facebook 380 -13
20. Horse meat 375  
21. Junk food 362 -8
22. Smoking 355 -3
23. My swag 331 +373
24. Desserts 325 +21
25. Life 325 +40
26. New year’s resolutions 313 +47
27. My boyfriend 309 +99
28. Catholicism 255 +11
29. Straightening my hair 228 +89
30. Fried food 225 +5
31. Netflix 216 +255
32. Work 216 -5
33. Sobriety 213 +4
34. Twerking 185 +698
35. The playoffs 184 +3,556
36. French fries 173 +19
37. Coke 168 +1
38. Feelings 168 +207
39. Laziness 160 +28
40. Meat 158 -30
41. Onedirection 155 +103
42. You 154 -24
43. Procrastination 153 +1
44. Makeup 150 +16
45. Internet 149 +61
46. Selfies 149 +2,328
47. Exercise 144 +58
48. School 141 -36
49. My phone 135 +15
50. Classes 129 +84
51. Dip 127 +132
52. Pinterest 125 +133
53. Church 124 +33
54. Emotions 122 +397
55. Going to school 119 +163
56. My girlfriend 111 +207
57. Subtweeting 110 +253
58. College 106 +5
59. My face 106 +4,168
60. Ice cream 106 -27
61. McDonald’s 102 -32
62. Being ugly 101 +256
63. Snacking 99 +19
64. Spending 96 +89
65. Dunkin Donuts 96 +475
66. Chew 95 +418
67. Eating out 94 +28
68. Elevators 94 +99
69. Food 93 -47
70. Moaning 93 +123
71. Oomf 93 +78
72. Chick Fil A 90 +135
73. Healthy food 88 +180
74. Football 87 +145
75. Swimming 87 +200
76. Nando’s 86 +72
77. DVDs 84 +1,326
78. Snapchat 84  
79. Broccoli 83 +206
80. Ranch 81 +250
81. The snooze button 80 +176
82. Crystal meth 80 +219
83. Dignity 79 +116
84. Cake 77 -13
85. Unhealthy food 77 +34
86. Homework 76 -65
87. Busyness 75  
88. Schoolwork 74 +88
89. Chemistry 74 +34,949
90. Frozen yogurt 72 +480
91. iPhone 72 +100
92. FIFA 71 +143
93. Betting 70 +315
94. Doing homework 69 +158
95. Myself 68 +267
96. Supermarkets 67 +1,797
97. Valentines 66  
98. Domino’s 63 +323
99. Being negative 63 +212
100. Hookah 63 +340

Categories

Rank Category Number of Tweets
1. food 11,642
2. habits 8,083
3. religion 6,519
4. technology 4,782
5. smoking/drugs/alcohol 3,928
6. sex 1,771
7. relationship 1,399
8. health/hygiene 1,270
9. school work 1,095
10. irony 792
11. sports 648
12. entertainment 392
13. celebrity 246
14. clothes 235
15. money 133
16. shopping 111

The image is a Wordle.

Track What People Are Giving Up in 2013 for Lent in Real Time

February 11th, 2013

See the top 100 things people are giving up in 2013 for Lent on Twitter, continually updated until February 15, 2013.

As I write this post, with about 5,000 tweets analyzed, the new hot topics so far this year are: meowing, Valentine’s Day, and Snapchat.

Look for the usual post-mortem on February 16, 2013.

Street View through Israel

January 17th, 2013

Google yesterday announced that they’ve expanded their Street View functionality throughout Israel, including a number of sites that you’d visit on any tour of the Holy Land. Of particular note are the archaeological sites they walked around and photographed. Here, for example, is Megiddo:


(Embedded Street View map of Megiddo.)

Previously available were many places in Jerusalem, like the Via Dolorosa. But the new imagery covers much more area–I can imagine it being particularly useful in Sunday School and classroom settings, where a semi-immersive environment communicates more than static photographs.

Via Biblical Studies and Technological Tools.

Bible Reference Parser Code Update

November 20th, 2012

The semiannual release schedule of the JavaScript Bible Passage Reference Parser continues. This release:

  • Improves support for parentheses.
  • Adds some alternate versification systems.
  • Supports French book names.
  • Removes the “docs” folder because it was getting unwieldy; the source itself remains commented.
  • Reorganizes some of the source code.
  • Increases the number of real-world strings from 200,000 to 370,000. I ran the parser on all 85 million tweets and Facebook posts in the Realtime Bible Search database to produce the list.

One of the main goals of this parser is to give you a starting point to build your own parser, so the source is thoroughly documented and has many tests you can use to validate your code.

Try a demo or browse the source on GitHub.

Zoomable Map of the Greco-Roman World

October 4th, 2012

The Pelagios Project has produced a lovely zoomable map of the Greco-Roman world. Below, for example, is a static view of Israel during the Roman period.

A blog post about the map discusses how they created it (plus bonus technical background). I’m most impressed by how attractive the maps are—a lot of online maps present you with the data but don’t try to be beautiful; this map succeeds on both counts.

More generally, the Pelagios Project, which I admit I hadn’t heard of before today, incorporates linked data to help people study the ancient world. It encompasses a variety of efforts (need to search for an inscription from ancient Palestine? No problem)—it’s all fascinating.

Terrain-shaded map of Roman Palestine (Israel) showing topography, cities, roads, and other features.
(Note the “Mortuum Mare” instead of the “Dead Sea.”)

Via O’Reilly Radar.