
If you were to apply a bijective function to each letter in each word of a language (e.g. English), how many pre-existing words would you obtain in the resulting image?

Since that's a pretty convoluted way of explaining things, let's try a more concrete example.

We'll take the well-known rot13 substitution cipher (a simple example of a bijection from the set of letters in the English alphabet to itself), and apply it to every letter in a chosen word. For most words, the result will be nonsensical gibberish. There does exist, however, a subset of valid English words that map into other valid English words.

Example

rot13('sync') = 'flap'
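You can check this mapping yourself with Python's standard `codecs` module, whose `'rot_13'` codec operates directly on text:

```python
import codecs

# rot13 shifts each letter 13 places, wrapping around the 26-letter
# alphabet; applying it twice therefore returns the original word.
print(codecs.encode('sync', 'rot_13'))   # -> flap
print(codecs.encode('flap', 'rot_13'))   # -> sync
```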

How many of these words exist? To answer this, I wrote a small Python script that loads the words from my system dictionary into a Redis set. A second set containing the rot13'ed words is then stored, and the intersection of the original and transformed sets is calculated:

import codecs

import redis

def encode(word):
    # str.encode('rot13') only exists in Python 2; the codecs
    # module provides the same transformation in Python 3.
    return codecs.encode(word, 'rot_13')

def cleanup(db):
    db.delete('eng')
    db.delete('eng-rot13')

if __name__ == "__main__":
    count = 0
    db = redis.Redis(decode_responses=True)
    cleanup(db)

    for line in open('/usr/share/dict/words', 'r'):
        word = line.strip()  # drop the trailing newline before storing
        count += 1
        db.sadd('eng', word)
        db.sadd('eng-rot13', encode(word))
        if count % 10000 == 0:
            print("Loaded %d words so far" % count)

    db.sinterstore('eng-intersect', 'eng', 'eng-rot13')
    msg = "English dictionary contains %d words, and %d rot13'ed words"
    print(msg % (db.scard('eng'), db.scard('eng-rot13')))
    print("Cardinality of intersection: %d" % db.scard('eng-intersect'))
    cleanup(db)

After a simple cardinality check, we have our answer: 256 words¹.
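Redis isn't strictly necessary for this experiment; Python's built-in sets can compute the same intersection entirely in memory. A minimal sketch (`rot13_words` is my own name, not something from the script above):

```python
import codecs

def rot13_words(words):
    """Return the words whose rot13 image is also in the word list."""
    vocab = set(words)
    rotated = {codecs.encode(w, 'rot_13') for w in vocab}
    return vocab & rotated

# Tiny illustrative word list; rot13('cat') is 'png', which is not
# in the list, so 'cat' drops out of the intersection.
sample = ['sync', 'flap', 'cat', 'dog']
print(sorted(rot13_words(sample)))   # -> ['flap', 'sync']
```

Since rot13 is its own inverse and fixes no letter, the surviving words always come in pairs like 'sync'/'flap'.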


¹ This result does not exclude single-letter words (e.g. "a"), which will always trivially map to another single letter under rot13.