Jump to content

Library of Babel


Recommended Posts

Hi all.

I recently read a story by Jorge Luis Borges called the Library of Babel. If you haven't read it, it's about a huge library. Each book of the Library consists of 410 pages, each one of 40 lines of 80 signs. The original character set includes 22 letters, dot, comma and space. The books are disposed in adjacent octagonal rooms, with a stairwell every few octagons. The special thing about this library is that it contains every single combination of 22 letters, dot, comma, and space, that are possible within the length of a book. In one of the books, there is an accurate 410 page summary of your life. In another, this post is written repeatedly for 410 pages. So I was thinking it would be fun to simulate this on a computer. Can anyone estimate how much storage it would take, in plaintext? I assume a huge amount, but 1-terabyte hard drives don't cost too much anymore. I was also wondering if there is an easy way to make a visual 3-d simulation of this. I was thinking of a simple, wireframe tower of single hexagons stacked on top of each other, and you could click the hexagon to enter it and scroll around the shelves, and click on a book to open its file. The books would have numbers on the spine. I was also wondering about a search function perhaps.


P.S. Here is a link to a pdf of the story if you want to read it: https://maskofreason.files.wordpress.com/2011/02/the-library-of-babel-by-jorge-luis-borges.pdf . you really should, it blew my mind for several hours straight.

Link to comment
Share on other sites

Within those constraints, you have:

25 signs

80 signs per line, meaning 25^80 unique lines =


20520927053718196475529111921787261962890625 lines

40 lines per page means (25^80)^40 unique pages =

421124750635840039425517034032964147627353668212890625 pages.

410 pages per book means ((25^80)^40)^410 unique books. It took my calculator about a minute to produce the result which is a number consisting of 1807193 digits so you'll forgive me for not posting it here.

Apparently it's a fact that there are 10^80 particles in the observable universe. That means that if each particle of the observable universe were to contain a unique line for such a book, you'd need approximately 5000000000000000000000000000000 times that amount of particles to represent all possible lines. Never mind pages and books.

So yeah, I don't think your terabyte harddisk is going to cut it...

Edited by Cooper
Link to comment
Share on other sites


I don't know if that's really cool or really disturbing. I guess it's not possible on that scale. However, if I were to modify it, the number of combinations can change. For instance, with 25 characters, 50 characters per line, 30 lines per page, and 50 pages per book, the number is "only" 104846 digits long. This still seems somewhat unreasonable, but a little less. What do you think are the largest parameters I could have, storing the books in plaintext, on a reasonable number of hard drives (not much more than 4)?

Thanks, and sorry about the overly-ambitious project.

Link to comment
Share on other sites

With 25 characters in your language and a line length of 9 characters you can store every possible line on 1 4TB harddisk.

If you want a 10th character, you'll need another 19 of those harddisks.

Link to comment
Share on other sites

Oh... That's a shame. I guess what I'm wondering is if there is any possible way to make this library, with say 10-page manuscripts. If I were to lower my standards and use a wordlist, would that help? But I guess that might take some of the fun out of it. A friend suggested generating the books on the fly, based an a seed associated with the file for each book. This way, each individual book would stay the same, and I could bookmark some. Of course, then the search function would take an extremely long time, as it would have to individually open and close each book, to search for a sequence. Do you know how I would go about trying this system? Also, do you have any idea how to create the graphics I described?

Thanks again!

Link to comment
Share on other sites

If we make the language have 4 characters, 3 characters to a line, 7 lines to a page, you can fit every possibly page on a 5 TB hardisk.
If we assume that we can somehow compress these books such that we get a 1:10 ratio (so only 10% of original storage is needed), we'd be able to have a language of 5 characters, 4 characters to a line, 5 lines to a page, you could fit every possible page on 2 5TB harddisks.

If we were to search for a book and the library was unsorted, we'd have to go through half of all the books in the library to end up with a 50% chance of finding the page. If we take that last scenario, so 5^4^5 pages, and assume it takes 0.001 second to assess if this is the page we need, then for a 50% chance of finding that page we'd need 1512 years plus change.
Assuming the library is sorted and accessible via and AVL tree. Everything else identical except for the fact that we require a match this time. It would take at most a mere 47 assessments to find the page, so a guaranteed hit in less than 0.05 seconds.
In case anybody didn't know, here's how the algorithm works:

In a sorted set of a known/fixed size where 'smallest' is 'left' and 'largest' is 'right', you look at the item in the center of the set. If this is the item you want, you're done. If it's not, but this item is 'smaller' than the one you need, you can discard the 'left' half of the set. If the item is 'greater' than the one you need, you can discard the 'right'. Repeat this on the remainder of the set until you find the item you need.

The way to visualise the set is almost certainly in the form of a fractal.

Edited by Cooper
Link to comment
Share on other sites

So there's no way to have each book generate randomly when you select from a set seed, then delete the file when you're done reading? I don't know if you've heard of the game No Man's Sky, but it has a huge amount of data and that, as I understand it, is approximately how they handle it. Assuming the 4 character language is the only way, is there a way to interpret 4 characters as a full alphabet (with combinations I guess?) Thank you for the AVL tree, although I'm not entirely sure how it would works, especially given that if I chose the random on-the-fly generation from a seed option, it would have to open each document.

Thanks for all the help!

P.S. What do you mean by a fractal to visualize the set of documents? How would it be organized?

Link to comment
Share on other sites

It would be trivial to interpret 4 characters as a full alphabet. Your computer has been doing it for years with a 2-character alphabet so yeah, combinations are the way to go.

In the situation of 4^3^7, so 4 char alphabet, 3 chars per line, 7 lines per page, and the set being sorted so let's say each page is numbered with a value between 0 and 4.398.046.511.104 which in binary can be represented in exactly 42 bits ((4^3)^7 = (2^6)^7 = 2^(6*7) = 2^42). The problem is that in order to find a page in this construct, you'd need to have the full page to begin with since the contents of the page is the only information of that very page. You could write the number of the page down in binary and every 2 bits is a character, every block of 6 bits is a line and the full 42 bits is the page. So by identifying the page, you've effectively created the page. And since you did that, why even look it up in the library or, for that matter, write it down?

If you were to store the page itself in full, then you'd need (2^42)*42 bits of storage.

So let's assume that each of these pages has an author, there was no logic applied when an author was chosen - some did 1 page, some did 10000, all completely random. There were in total 256 different authors (just go with me on this one) so each byte on that 4.4 TB chunk of storage identifies the author whose page number is the page and is also the index within the chunk of storage to the byte with the author id.

You know of a page and you need to look it up so you can see who wrote it (let's assume that the organization on the harddisk isn't such that you can just jump to the page using the page itself as an index). In terms of the AVL tree, since the organization implies that you can start in the center of the set, halve and repeat, you could find the page you want in exactly 42 steps or less because you can halve the set exactly 42 times before you end up with a set of length 1 which must be the page you're looking for.

You would visualize the original library in a fractal because a fractal infinitely repeats itself. And if you go by the original libraries' dimensions, you end up with something between 2^6084400 and 2^6100800 total books.

Doing an AVL search here you'd need, at worst, 6100800 iterations and if an evaluation on average takes 0.1 seconds you'd still need 7 days to find the book. Finding the book means comparing the full book (as the 'number' of the book is the book) which means that the 0.1 seconds to evaluate might be a bit low plus you'd once again wonder why you're even bothering to do this unless you really need to know something else about the book which isn't thus far described as being on the page.

You van visualize this search process as entering a chamber with 2 exit doors and 1 page. You compare the page against the one you need. If you got it, YAY! You're done. If it's a page that comes after yours (='larger') you take the left exit door otherwise you take the right exit door and in both cases you've wound up in an identical chamber as the one before, except that the page is now the 'center' page of the remaining set.

Now imagine moving from chamber to chamber like that, changing chambers 10 times per second, non-stop, for a bit over a full week...

Edited by Cooper
Link to comment
Share on other sites

Thanks so much for explaining all that! After much thought, I think I finally understand. I just a few questions. Assuming each 4-letter combination, is one character once interpreted, I still only have room for 24 characters. And assuming that a 4-letter combination is one character and each letter is 2 bits, I would have a total of 8 bits per character, and 42 doesn't divide by 8. Also, if I understand correctly, each "book" from the original library will be one page in the system you propose, and each book will only contain a few characters, which isn't that interesting to read (I was interested in getting long chunks). Would a possibility be to write a function that takes a random combination of books? Also, it sounds like in your search technique, you search for a 42-bit long number and after some searching around, the computer locates that 42-bit number somewhere in the system. Is there no way to search for a string (e.g. To be or not to be) and then get a list of the books that contain that string, one of which would contain "To be or not to be, that is the question"? Last question: how would I go about coding this and arranging my hard drive?


Link to comment
Share on other sites

You have an alphabet of 4 which means every 2 bits is a char and at 42 bits you have 21 of your characters on a page. You want to assume a 4-letter combination because you think that's what ASCII is, but it's not. Extended ASCII is 8 bit. The original is only 7. So if you stop viewing your alphabet on the page and instead map it to 7-bit ASCII you can very cleanly and efficiently represent 6 ASCII characters on a single page.

Indeed, 6 characters isn't much when viewed in context. This is in large part because of the overly loose definition of what a book/page is. Of the first 2097152 books/pages, the latter half of the page is all character 0 which is probably going be rather boring to look at. It gets worse - there are also that many pages where the first half of the page is all character 0. In fact, for each character in your alphabet there will be that many pages for which this is true. If we could somehow discard those pages beforehand we would need to address 4*2*2.097.152 = 16.777.216 less pages. If you were to place additional constraints on the concept of 'a book', making a book something other than simply 'letters on a page' you could do a lot more effective culling.

To search for a text on a page, you need to visualize it as [X bytes of whatever]To be or not to be[Y bytes of whatever]. If we take the length of "To be or not to be" as Z, the amount of characters in a book(/page...) as L and the amount of characters in the alphabet as C the formula for how long X and Y should be becomes X+Z+Y = L and since "To be or not to be" is exactly 18 in length it means that X+18+Y = L which becomes X+Y-L = 18

If L is, say, 21, X is something of 0, 1, 2 or 3 characters long where Y is 3, 2, 1 or 0 characters long, respectively. To compute the amount of pages with that unique sequence on it somewhere, you would thus compute (C^3)*4 which, for an alphabet of 25 becomes 62500 unique books/pages. If you were looking for pages that started with that string, so X would always be 0, you'd still have 25^3 = 15625 pages.

How to go about arranging this on your harddrive... Well, I already explained that there's no point in doing so since unless you have additional information to retain the index to the page is the page. And since you need the index to find the page, why store the page? You already have it. You need to further clarify what you would want to store and, thus, code.

Link to comment
Share on other sites

Thanks so much for explaining all this to me! I'm sorry if it's frustrating. Basically what the ideal definition of "book" would be for me would be at least the length of a page you might read in an ordinary book, or enough room to convey some meaning, plot or idea. The problem with that, I guess, is that it would be extremely expensive to save that much information, which is why I was clinging to the on-the-fly generation idea. What I meant by arranging and coding was basically, what do I do? Basically what I want to have is this huge number of "books", most with gibberish, some with something that is interesting to read. I want to be able to visualize them, be able to search for a sequence, or randomly click on one and see what comes up. Thanks again for explaining mathematically what I would have to do to achieve this. The part that I still have no clue about is how to execute this on a computer.

Thanks so much for all your help!

Link to comment
Share on other sites

This is where software development meets its evil and quite resilient foe: shitty requirements definition.

Pointing at something and saying "I want you to make me one of those" invariably results in conflict because you can't tell what part of 'one of those' is particularly relevant. I could code the concept of a book, but what was really required was the spiffy cover. Also, when you want to generate something that isn't just garbled text, you should specify language, so instead of an alphabet you need to specify a dictionary and grammar rules. That's a feat of its own right there. You then want to take it even one step further and specialise the requirements to not just generate any valid english-language (say) sequence of words that combine to a sentence, you want that sentence to convey abstract things like meaning, have some spiritual depth to it or elements of suspense. These are very abstract concepts so next to impossible to formalise. As an example I'd suggest you check out the movie Angel Heart where Robert De Niro elevates the peeling of an egg to something considerably more interesting. How would you formalise that?

Hollywood however is a very good example of restricting oneself to formulaic prose but at some point people wake up and notice the eerie similarities. If you were to create your library based on this, it would hold essentially the same book a few gazillion times - the only variation being the gender of the protagonist, the tool used in the imminent slaying of the friend and the unlikely yet succesful move at the end that produces a satisfying outcome for the couple at the end.

Bottom line, you're going to have to say a *whole* lot more about what you want before I can help you with how you would achieve that.

Link to comment
Share on other sites

Hi, sorry about the vagueness. When I said I want something interesting, I was thinking that it would just show up randomly, since there is a book for every combination of characters of a certain length. So there is no need to intentionally generate something for any definition of "interesting," only to generate enough books that many of them will have interesting parts. My ideal definition of book would be something long enough to convey information in English, say about the length of a page in a physical book you might read, so about 30 lines and 50 characters per line. That's probably not very realistic though, as it would be difficult to get that much storage. Is that enough information?


Link to comment
Share on other sites

Do you know of the Turing test? The idea is that a human would chat with something via text. When the human is unable to determine if what's on the other end is an AI or another human, but it really is an AI, that AI has passed the Turing test.

AI folks have been working decades on software to achieve this feat. According to that Wikipedia page, in 2014 one piece of software convinced 33% of humans it was human, which caused them to claim to have found the first actual passing AI (which got them a lot of flak since other bots in the past did better with their judges at the time).

Now, the bot in that 2014 Turing Test was under development since 2001. I think you'd prefer results in somewhat less than 13 years?

This bot's task is to communicate. What you want is not only something that can form a somewhat coherent sentence (=communicate), but to do so in a creative and inventive manner. No small feat for humans, given the amount of absolutely abysmal books I've managed to come across in just my years. If you were to reduce the length of the text to a single page, you would end up with something like poetry, which is possibly even more difficult. If you scale it back to a diary of sorts, you'd still need to convey creative thoughts. Those don't lend themselves particularly well to digital creation at the current state of technology.

So, no, you're going to have to provide a *HELL* of a lot more information.

Link to comment
Share on other sites

Sorry, maybe I wasn't being clear enough. What I'm looking for is every single combination of characters for a given length. Most books won't even have any real words at all, but since every combination of characters exists, some books with meaning will definitely show up, since all written information is just a sequence of characters. So I'm not looking to intentionally create coherence or meaning, only to let it occur randomly, in the middle of a huge amount of meaningless books that probably don't even contain any English words. Is that any better?

Thanks and sorry about the lack of clarity again

Link to comment
Share on other sites

Because the point of this is to have a huge bank of every combination. I could use a random number generator to get a number between 1 and 1 quadrillion, then write it out in binary and read the text, or inversely turn text into a binary number and then a decimal number, but that would be the same as just creating something and then being surprised it exists. I want what I get to be unpredictable. What I want is to be able to navigate around the library/fractal/whatever, click on a book, and read it, or search for a phrase and see all the books that exist for it. In other words, I want the book to already exist and me to find it. Would this not involve storing at least some amount of information?

Link to comment
Share on other sites

I guess what I want help coding is a) something to generate all the possible combinations of characters for a certain length (but I guess you said the way to do this would be to take the binary 1-however many combinations there are of the length and turn that into text?), b) a method to keep these combinations on my hard drive, c) a way to search through them for a certain combination, which could be in the middle of a book and d) the graphics to visualize my "library" of "books".


Link to comment
Share on other sites

a) That's indeed how I would do such a thing.

b) You shouldn't since to define it is to have it. There's no need to store it.

c) I explained that in general terms on the assumption that you're generating the text rather than searching stored data since you can generate faster than you can read data from a harddisk (by an order of magnitude) and you'd have to consider all possible combinations anyway.

d) Look at

and keep on looking at that 'horizon' if you will for the duration. The image comes towards you from a single point. Now watch the clip again with your face pressed so close to the screen that you actually see 2 of those points. Imagine that for every color change you pick either the left or the right point as an access to the area beyond. *THAT* would be how you visualize it.

If you want to present an image to a classroom, simply state that the amount of options is so vast, it's impossible to visualize. Or alternatively, show a standard graph, kinda like this. Then say "Imagine this line moving up continuing indefinately. The amount of books for this scenario would be the Y axis when we're roughly here (move *WAY* to the right) on the X axis.

Link to comment
Share on other sites

Great! So if I were to follow this method, how do I go about coding it (i. e. Which language do I use? Do you know any reading I should do?) I assume I need a script for turning text into binary and vis versa, a script to search for all the possible sequences with a given sequence in it, and a script to make the graphics. Unfortunately, I'm not sure how to write any of these.


Link to comment
Share on other sites

For short tasks, the best tool for the job is always the one you're familiar with. For bigger projects (months..) it can pay to figure out if learning a new language is feasible.

You're the one who's going to be doing this, so you get to pick your poison.

Link to comment
Share on other sites

Join the conversation

You can post now and register later. If you have an account, sign in now to post with your account.

Reply to this topic...

×   Pasted as rich text.   Paste as plain text instead

  Only 75 emoji are allowed.

×   Your link has been automatically embedded.   Display as a link instead

×   Your previous content has been restored.   Clear editor

×   You cannot paste images directly. Upload or insert images from URL.

  • Recently Browsing   0 members

    • No registered users viewing this page.
  • Create New...