When people tweet about Bible verses, which words do they use? Here are the top 100:
- bible
- lord
- christ
- gospel
- psalm
- god
- psalms
- corinthians
- preach
- shall
- heaven
- readings
- church
- spirit
- righteous
- verse
- lectionary
- verses
- spiritual
- ministry
- pray
- enemies
- thou
- tongue
- creation
- wisdom
- deuteronomy
- testament
- strength
- refuge
- therefore
- kingdom
- romans
- holy
- thankful
- thy
- reading
- rejoice
- understanding
- faithful
- message
- earth
- blessed
- exodus
- deut
- faith
- wise
- beginning
- pastor
- chapel
- chapter
- survey
- anger
- resurrection
- risen
- read
- hearts
- chronicles
- salvation
- flesh
- servant
- glory
- praying
- kings
- sheep
- praise
- trust
- prosperity
- bless
- heavens
- deeds
- toward
- discussion
- whoever
- speaks
- ye
- hath
- amen
- teaching
- thess
- apostles
- preparing
- eph
- eccl
- path
- fear
- upon
- presence
- inspire
- search
- zechariah
- seek
- teach
- wrath
- commandments
- believers
- humility
- spoke
- thee
- devo
Background
Extracting Bible references from text means deciding whether a given phrase refers to a Bible verse or to something else. For example, the meaning of “Acts 2” depends on context:
- Referring to Bible passage: Acts 2 recounts the early church.
- Not referring to Bible passage: She’s 5 years old but acts 2.
When you encounter a phrase that could be a Bible reference, you have to look at context to decide whether it is one. Humans make this leap pretty easily, but computers need rigorous models and lots of training data to guess correctly. In the example above, “early church” is a strong indicator that “Acts 2” is a Bible reference, while “years old” is an indicator the other way.
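To make the idea of "context" concrete, here is a minimal sketch of how candidate phrases and their surrounding words might be pulled out of a piece of text. It is not the code behind this post: the `BOOKS` list is a tiny hypothetical sample, and the function name `candidates_with_context` is made up for illustration.

```python
import re

# Hypothetical, deliberately tiny list of book names; a real list would
# cover every book and its common abbreviations.
BOOKS = ["acts", "john", "jeremiah", "matthew"]

# A book name, whitespace, a chapter number, and an optional ":verse".
CANDIDATE = re.compile(
    r"\b(" + "|".join(BOOKS) + r")\s+(\d+)(?::(\d+))?",
    re.IGNORECASE,
)

def candidates_with_context(text):
    """Return (matched phrase, surrounding words) for each possible reference."""
    results = []
    for match in CANDIDATE.finditer(text):
        before = text[:match.start()]
        after = text[match.end():]
        # Lowercase alphabetic tokens from everything outside the match.
        context = re.findall(r"[a-z']+", (before + " " + after).lower())
        results.append((match.group(0), context))
    return results
```

Both example sentences produce a candidate; only the context words differ, and those words are what a classifier has to work with.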
Twitter, with its high volume of content and decent search engine, provides lots of training data.
Methodology
Using the Twitter Search API, I downloaded 30,000 tweets possibly containing Bible references (e.g., [john 3], [jeremiah 29]) and then categorized them by hand as referring to a Bible verse or not.
I then ran a Naive Bayes algorithm on the categorized tweets to produce the list above, which contains the words that most strongly indicate the presence of a Bible reference.
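One plausible way to turn hand-labeled tweets into a ranked word list is to compare how likely each word is in the two categories. The sketch below assumes that approach: it scores each word by the log ratio of its smoothed Naive Bayes likelihoods. The function name `rank_indicator_words`, the tokenized input format, and the smoothing constant `alpha` are all assumptions for illustration, not details from the actual analysis.

```python
import math
from collections import Counter

def rank_indicator_words(labeled_tweets, top_n=100, alpha=1.0):
    """Rank words by log P(word | reference) - log P(word | not reference).

    labeled_tweets: iterable of (list_of_words, is_reference) pairs.
    alpha: Laplace smoothing constant (an assumed value).
    """
    pos_counts, neg_counts = Counter(), Counter()
    for words, is_reference in labeled_tweets:
        (pos_counts if is_reference else neg_counts).update(words)

    vocab = set(pos_counts) | set(neg_counts)
    # Smoothed denominators so unseen words never give a zero probability.
    pos_total = sum(pos_counts.values()) + alpha * len(vocab)
    neg_total = sum(neg_counts.values()) + alpha * len(vocab)

    scores = {}
    for word in vocab:
        p_pos = (pos_counts[word] + alpha) / pos_total
        p_neg = (neg_counts[word] + alpha) / neg_total
        scores[word] = math.log(p_pos) - math.log(p_neg)

    # Highest score first: words that most strongly suggest a reference.
    return sorted(scores, key=scores.get, reverse=True)[:top_n]
```

With a ranking like this, words that appear mostly in tweets labeled as Bible references rise to the top, and words that appear mostly in the other tweets sink to the bottom.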
This list suffers from sample bias, of course: a different set of tweets would produce a different list. In addition, the list is Twitter-centric; the results may not carry over into blogs or other media. (People substitute the number “2” for the word “to” and “4” for “for” on Twitter more frequently than they do elsewhere, for example, which oversamples content like “I’m meeting Matthew 4 dinner.”)
See It in Action
Search for Bible references on Twitter. Use the relevant and not relevant buttons to improve the filtering. I haven’t formally announced this new feature of OpenBible.info yet; consider the link a preview.
[…] Behind the scenes, it processes tweets to try to ensure their relevance; it has about a 92% accuracy rate based on a training corpus of around 45,000 tweets. […]