Bible Codes: Making an Algorithm to Find Hidden Word Sequences in Text

Mar 19, 2019

Hello, and welcome to the first installment on my path to 100 tech projects!

For this post, I created an algorithm to find equidistant letter sequences in text, similar to the algorithm written to search for codes in the Bible.

A Brief History of the Phenomenon

The idea of codes hidden in the Bible is not a new one. While studying the Hebrew Bible (Torah), a group of scholars in the early 20th century noticed that starting at one letter and repeatedly moving some fixed number X of letters forward picked out letters that, combined, spelled a different word. They found these patterns by pure chance, and began hand-searching nearby passages of text for more codes. This was a very tedious process, and the sheer algorithmic complexity of the task meant that these early researchers could not explore all of the codes hidden in the Bible.

In 1994, an interesting paper named Equidistant Letter Sequences in the Book of Genesis appeared in Statistical Science. The effort, led by Doron Witztum and Prof. Eliyahu Rips, two observant Jews and accomplished mathematicians, revealed to the scientific community that there are sequences of words in the Bible that are both located near each other and topically related. The paper was rejected a few times before publication, with the journal’s approval board skeptical that such results could hold up to statistical scrutiny and incredulous that any book could be so carefully crafted as to hide such intricate codes.

Fast forward to 1997, when author Michael Drosnin released the first book in his Bible Codes series. With the book gaining major media attention due to its controversial nature (religion might be real? there’s no way any book could have so many codes and not be a coincidence, so did a divine entity create the Bible?), people everywhere became interested in the topic.

My Introduction to Bible Codes

I first heard of the Bible Codes while attending a lecture on religion and philosophy. At first, my logical brain thought: “yep, sequences like this are bound to turn up in any text in any language ... there are only so many combinations of letters, and thus with enough letters, you can find literally any word or name.” I argued this point with the lecturer, who responded that certain related words and names appeared near each other with relatively low stride counts (the number of skipped letters between consecutive letters of the code). This was interesting, I thought. He then gave an example, the same one that appears on the front cover of Drosnin’s first book: the name ‘Yitzhak Rabin,’ the word ‘assassin,’ and the name ‘Yigal Amir’ all appear in the code, and all near each other. For context, Rabin was the Israeli prime minister who was assassinated in 1995 by Amir at the end of a peace rally in Tel Aviv.

After that lecture, I became very interested. I started reading more into the history of the codes, and read Drosnin’s book in exactly two sittings. I then attempted to write an algorithm a la the one described in the original paper. Unfortunately, I gave up early in the design of the algorithm ... my Computer Engineering underclassman brain was not up to snuff for actually coding such an algorithm. A few years later, I gave it another go, and was successful.

Notes on the Algorithm

The presentation of the algorithm must be prefaced with a few notes.

Number 1 is that there are a few ways to write an algorithm for this task. My version is only one of them, and while there is something to be said for designing for optimality, I did not try out different versions and time them for efficiency.

Number 2 is that the original algorithm operated on the Hebrew language, specifically biblical Hebrew. Hebrew, like Arabic and other Semitic languages, is written with an abjad, meaning that the letters all stand for consonants. Different vowels placed between them make different words. For example, the Hebrew word מלח can mean either salt (melach) or seafarer (malach) depending on the vowels. In modern written Hebrew, vowels are often not written at all, leaving it up to the reader to resolve the ambiguity through context. This ambiguity makes it statistically easier for a language like Hebrew (with only 22 letters) to yield more words. Adding to this, Hebrew words tend to be shorter than words in other languages, owing to the root structure (shorshim) of common words.

The Basics of the Algorithm

As anyone with a rudimentary knowledge of algorithm design can already tell, the complexity of this problem is pretty high. A lot of for loops.

Let’s start with an example. Take the sentence: ‘One small step for man, one giant leap for mankind’.

Step 1) Get rid of spaces, capitalization, and punctuation (colons, commas, quotation marks, etc.).

onesmallstepformanonegiantleapformankind
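Step 1 can be sketched in a few lines of Python. This is a minimal illustration rather than the code from the repo, and the function name `normalize` is made up for this example. It keeps anything Python considers a letter, so it would also pass Hebrew text through unchanged.

```python
def normalize(text: str) -> str:
    """Lowercase the text and keep only alphabetic characters."""
    return "".join(ch for ch in text.lower() if ch.isalpha())


sentence = "One small step for man, one giant leap for mankind"
cleaned = normalize(sentence)
print(cleaned)       # onesmallstepformanonegiantleapformankind
print(len(cleaned))  # 40
```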

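Step 2) Choose a stride length and split up the original text based on that. (In these examples, a stride length of 2 means taking every 2nd letter, a stride length of 3 every 3rd letter, and so on.)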

With a stride length of 2, and starting position of 1:

o*e*m*l*s*e*f*r*a*o*e*i*n*l*a*f*r*a*k*n*

oemlsefraoeinlafrakn— here we see the word ‘in’!

Note that we only get half the characters of the original text here. So, we have to start at the second character to complete the round.

With a stride length of 2, and starting position of 2:

*n*s*a*l*t*p*o*m*n*n*g*a*t*e*p*o*m*n*i*d

nsaltpomnngatepomnid— here we see the words ‘salt’, ‘pom’, ‘gate’, ‘omni’, and ‘id’!
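One way to reproduce the two stride-2 passes above is with Python's extended slicing. This is only a sketch: the helper name `take_every` and its 1-based `start` argument are assumptions chosen to match the numbering used in this walkthrough.

```python
text = "onesmallstepformanonegiantleapformankind"


def take_every(text: str, stride: int, start: int) -> str:
    """Return every `stride`-th letter, beginning at 1-based position `start`."""
    return text[start - 1::stride]


print(take_every(text, 2, 1))  # oemlsefraoeinlafrakn
print(take_every(text, 2, 2))  # nsaltpomnngatepomnid
```

For a stride of length k there are k possible starting positions, and together they cover every letter of the text exactly once.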

Step 3) Go back to step 2, and increase stride length on each iteration

With a stride length of 3, and starting position of 1:

o**s**l**t**f**m**o**g**n**e**f**m**k**d

osltfmognefmkd— here we see no words!

With a stride length of 3, and starting position of 2:

*n**m**l**e**o**a**n**i**t**a**o**a**i**

nmleoanitaoai— here we see the words ‘leo’, ‘an’, ‘it’, and ‘tao’!

With a stride length of 3, and starting position of 3:

**e**a**s**p**r**n**e**a**l**p**r**n**n*

easprnealprnn— here we see the word ‘alp’!
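Putting steps 2 and 3 together, the whole search can be sketched as a set of nested loops over stride, starting position, and candidate word. This is a brute-force illustration under stated assumptions, not necessarily how the repo does it: the name `find_words`, its parameters, and the toy dictionary (just the words spotted above) are all hypothetical.

```python
def find_words(text, dictionary, max_stride, min_stride=2):
    """Brute-force search: return (word, stride, start) for every hit.

    `start` is the 1-based position of the word's first letter in `text`.
    """
    longest = max(len(word) for word in dictionary)
    shortest = min(len(word) for word in dictionary)
    hits = []
    for stride in range(min_stride, max_stride + 1):
        for offset in range(stride):        # one pass per starting position
            seq = text[offset::stride]
            for i in range(len(seq)):       # where the candidate word begins
                for length in range(shortest, longest + 1):
                    candidate = seq[i:i + length]
                    if len(candidate) < length:
                        break               # ran off the end of the sequence
                    if candidate in dictionary:
                        hits.append((candidate, stride, offset + i * stride + 1))
    return hits


text = "onesmallstepformanonegiantleapformankind"
# Hypothetical toy dictionary: just the words spotted in the walkthrough.
words = {"in", "salt", "pom", "gate", "omni", "id", "leo", "an", "it", "tao", "alp"}

for word, stride, start in find_words(text, words, max_stride=3):
    print(f"{word!r}: stride {stride}, starting at letter {start}")
```

Storing the dictionary as a set keeps each lookup cheap, and capping candidates at the longest dictionary word avoids testing substrings that cannot match. A reverse pass is the same call on `text[::-1]`.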

Algorithm Results and Analysis

I implemented this algorithm in Python using a Jupyter notebook. I used a dictionary of ~3800 valid English words to determine if a sequence of letters was a valid word or not. Full code and data can be found at this project’s GitHub repo.

For a quick test, I used the first paragraph of Siddhartha, by Hermann Hesse (one of my favorite books!). The text itself is 927 letters long. We would expect the algorithm to run quickly on a text this short. However, after running the text through once, then again in reverse (yes, codes work in reverse), we see that it takes ~210 seconds. 3.5 minutes for one paragraph! This leads us into a discussion of the algorithmic complexity of the algorithm. Looking at the layers of our algorithm onion, we see that it is roughly O(m*n³), that is, cubic in the length n of the text (if anyone can analyze my algorithm and prove this differently, please do). Cubic growth is polynomial rather than exponential, but it is still steep: every doubling of the input text equates to a roughly 8x increase in computation time. Brute force, baby!
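To make the 8x figure concrete, here is a back-of-the-envelope extrapolation from the measured run. It assumes runtime grows purely with the cube of the text length, with the same dictionary and no fixed overhead, which is a simplification. The function name `scaled_runtime` is made up for this example.

```python
def scaled_runtime(base_seconds, base_len, new_len, exponent=3):
    """Extrapolate runtime assuming it scales as (text length) ** exponent."""
    return base_seconds * (new_len / base_len) ** exponent


# Measured above: ~210 seconds for a 927-letter paragraph.
doubled = scaled_runtime(210, 927, 2 * 927)
print(doubled)       # 1680.0 seconds
print(doubled / 60)  # 28.0 minutes
```

Under the same assumption, a text ten times as long would take about 1000 times as long, which is why brute force gets out of hand quickly on book-length inputs.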

Overall, we found 622 distinct words longer than two letters. The longest of these was ‘another’, at 7 letters. The most common words were ‘the,’ ‘tie,’ and ‘tea,’ coming in with 195, 185, and 183 appearances, respectively.
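Summary statistics like these can be tallied with `collections.Counter`. The sketch below is illustrative only: the input list is a stand-in (the words from the small walkthrough earlier, not the Siddhartha results), and the variable names are made up.

```python
from collections import Counter

# Hypothetical input: the words spotted in the walkthrough, in order found.
found = ["in", "salt", "pom", "gate", "pom", "omni", "id",
         "leo", "an", "it", "tao", "alp"]

# Keep only words longer than two letters, as in the results above.
long_enough = [w for w in found if len(w) > 2]
word_counts = Counter(long_enough)                       # appearances per word
length_counts = Counter(len(w) for w in long_enough)     # appearances per length

print(len(word_counts), "distinct words")
print("most common:", word_counts.most_common(3))
print("longest length:", max(len(w) for w in word_counts))
print("hits per word length:", dict(sorted(length_counts.items())))
```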

The chart below shows how often words of each length were found. Interesting that four-letter words were found the most! (This might be an artifact of the word lengths in the 3.8k-word dictionary.)

The graph below shows roughly how much time each iteration of the algorithm’s outer loop took. The two peaks are due to the forward and reverse passes over the text, and the asymptotic curves are due to the algorithm’s gradual winding down as the strides increase.

Conclusion

I presented an algorithm to search for all ‘equidistant letter sequences’ in a given text. The algorithm got the job done, and the results are pretty interesting! I hope you give it a try on any text you are interested in. Note, this algorithm _should_ work for different languages: simply find a suitable dictionary for your language of interest and swap it with the English one in the repo.
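As a rough idea of what swapping dictionaries could look like, here is a sketch that builds a lookup set from a word list with one word per line. The file format, the function name `build_dictionary`, and the `min_len` cutoff are all assumptions for illustration; the dictionary in the repo may be stored differently.

```python
def build_dictionary(lines, min_len=3):
    """Build a set of lowercase words from an iterable of lines (one word each)."""
    words = set()
    for line in lines:
        # Strip whitespace and drop any non-letter characters.
        word = "".join(ch for ch in line.strip().lower() if ch.isalpha())
        if len(word) >= min_len:
            words.add(word)
    return words


# Any iterable of lines works, including an open file object, e.g.:
#   with open("my_wordlist.txt", encoding="utf-8") as f:   # hypothetical path
#       dictionary = build_dictionary(f)
sample = ["Salt\n", "gate\n", "id\n", "מלח\n", "\n"]
print(sorted(build_dictionary(sample)))
```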

I hope you enjoyed this post!

As always, I’d love to hear your comments below!

Post Scriptum

Right before publishing this, I did a quick Google image search for ‘bible codes.’ I found an app that was a much better engineered version of mine (with word search, no less! Searching for specific codes would entail a slightly different algorithm than the one presented). Apparently, you can really find anything you want and use it to fuel conspiracy theories, a la this guy.

Written by Felcjo Ringo