Rainbow tables from a wordlist?

Recommended Posts

I have a nice wordlist with something like 1.2 billion entries. It's about 13.5gb and it takes a long time to run dictionary attacks.

How do I create a procomputed table from a wordlist?

I'm willing to make this wordlist and any tables I generate available for download.

What are the steps here? I've been looking around and I've been only found articles about creating tables for character sets for certain lengths or strings. I just want to use the list I have to make a variety of tables for cracking.

Share on other sites

What are you planning to crack? Be specific.

You see, a rainbow table only works when you run an algorithm from a fixed state over your word. Most algorithms these days allow you to seed them so the initial state isn't fixed. The problem then is that you can't precompute anything without knowing that initial state and even if you did know the initial state, it should be sufficiently random to make it nonsensical to do all that work in advance for the one-off chance of having the password in the list. This is why so much work is put into making the list contain actual, real passwords people are known to (re)use - precomputing is, and you have no idea how old I feel saying this, sooooo 2004.

To summarize, unless the thing you're hoping to crack is, say, a database load of encrypted passwords of which you know for a fact they all use... let's go for broke and say MD5 to encrypt the provided password, it's effectively impossible to precompute anything.

Share on other sites

To add to the other Cooper's answer, here's the basic theory of rainbow tables:

```Seed your current hash with a known value.
for 1 to X {
Store the current hash in the rainbow table
for 1 to Y {
Generate a password from the current hash
Hash the password and set the result to be the current hash.
}
}
Store the current hash in the rainbow table
```

The X value controls the number of hash chains you want to include in your rainbow table and the Y value controls the lengths of those hash chains.

To decrypt a hash with the final rainbow table you take the hash, generate a password from it and then calculate that password's hash. If this new hash isn't present in your rainbow table then repeat the process. Keep repeat untill you either get a hit in your rainbow table or you have repeated it more than Y times. If you've repeated the process more than Y times then you haven't got the password in your rainbow table.

Assuming that you got a hit from your rainbow table then you simply take the previous entry to the one you found in your hash table and start the process of generating passwords from the hash, and then a hash from the password. As each hash is generated you compare it against the hash you want to decrypt and when you hit it you'll have the password that generated it.

As you can see rainbow tables are very different to a precomputed hash value look up table. The rainbow table takes up less space but at the cost of having to do more calculations when decrypting a hash value.

Now the real issue here is that both rainbow tables and a precomputed hash value look up table are only worth anything if the hash doesn't include salt (or only includes very small quantities of salt). A salt is simply a random value that is stored with the hash and used to alter the way the password is hashed (e.g. added to the end of the password before hashing). If a salt is used you would need to precompute every hash + salt combinations. Note every bit of salt added doubles the number of the hashes you would need to precompute and store, so even a small about of salt can render precomputation impractical.

Are rainbow tables completely worthless these days? No, but once you've generated ones for the usual hashes without salts (md5, sha1, sha256, etc.) then you're done, and I suspect you will rarely use them.

Share on other sites

You know what? This is the first time someone explained the difference between a hash table and a rainbow table in a way that I can wrap my head around. I read the code for creating rainbow tables back in the day and I never got the concept of 'in case of a hit, take the previous entry' but it totally makes sense now. Thank you sir!

Share on other sites

So am I to understand that a rainbow table is much more than plaintext hash pairs binary form?

Also I'm not understanding what initial state is. Can you clarify this? Any recommended reading?

Mostly doing this to understand how it works.

So basically what I want to do is create an md5 or sha256 table from a wordlist.

I'm also happy to do some coding. What is a barebones program for rainbow tables?

I was thinking this would be pretty straight forward like opening a file stream and reading through a text file line by line and running the hashing algorithm against each line and outputting the plaintext : hash pairs to a file. I'll apply this to something more advanced and possibly useful when I have the general concept down.

Share on other sites

So am I to understand that a rainbow table is much more than plaintext hash pairs binary form?

(Note any figures in this post aren't exact, the're just approximations)

That's correct. A plaintext and hash look up table can take up a lot of room as each hash will be 128bits (md5) or 256bits (sha256) in size. That would result in your look up table for md5 taking up about 50GB and the sha256 version taking up about 100GB. These days this sort of storage size id available on SSD's so you could create a very fast look up table your word list, now back when rainbow tables were popular storage space was a lot smaller, and 50GB was a significant amount of space to sacrifice for a simple look up table.

Now consider what would happen if instead of just your word list you wanted to cover 90% of possible 8 character passwords consisting of numbers, uppercase and lowercase letters? Well for an md5 you'd be looking at 3,000TB or space requirement and for sha256 you'd need about 6,000TB, even with today's storage surplus we can't get near to that (government agencies probably have the resources, but your average technical person won't).

With a rainbow table you can set your chain length to be something your computer can calculate in a set period, let's we can calculate a chain with a length of 1,000,000 in a minute, so we choose a chain length of 1,000,000. Now the rainbow table will only need to store an one entry for every 1,000,000 hashes. This brings our space requirements for md5 down to a tiny 3GB and sha256 down to 6GB at the expense of taking up to one minute per hash being cracked.

Also I'm not understanding what initial state is. Can you clarify this? Any recommended reading?

The initial state is simply that when using a rainbow table to crack a hash you need to know what value to start your hash chain with. The wikipedia entry for rainbow tables is a good place to start.

As to where to start, I would suggest that initially you look at creating a simple password hash pair lookup table and try it with subsets of your dictionary. Once you've got that done you have a good understanding of the limitations and then you can revisit rainbow tables.

Share on other sites

Regarding initial state, look at the rand() function in C.

```#include <stdio.h>
#include <stdlib.h>

int main(void) {
printf("%d\n", rand());
printf("%d\n", rand());
printf("%d\n", rand());
}```

If you compile and run this you'll find that you get the _exact same_ set of numbers every time. This is because the pseudo-random number generator starts from a known state (all zeros) which also means that if I can guesstimate how many randoms were requested from the generator, I don't have to try whatever algorithm that is using this random for every possible random it can produce, but a very small subset of that. This is what the Pixiedust attack exploits.

Let's change this program to vary the initial state of the PRNG:

```#include <stdio.h>

#include <stdlib.h>
#include <time.h>

int main(void) {
srand( (unsigned int)time(NULL) );
printf("%d\n", rand());
printf("%d\n", rand());
printf("%d\n", rand());
}```

Now the initial state of the PRNG is set to the number of seconds since epoch (1-1-1970) meaning that if you start the program twice within a second you still get the same values, but if you wait at least a second between invocations you'll get a different set of numbers every time. For simple stuff this is good enough but if you're working on something important you're better off with a less predictable source because if this is for your AP and I somehow know you reset your AP... Maybe I get lucky and there was a power blackout that lasted until somewhere between 12 and 18 and nobody's gotten home since then so no randoms were yet taken from the PRNG. That means with the above algorithm I can request a random from your AP as part of the WPS connection phase, discover the initial state by setting it to each possible value for that timeframe (6 hours is just 21600 seconds - your laptop can chew through that in milliseconds) and then use this knowledge to attack the hashed PIN that is sent to me (because yes, in WPS, it is) which takes just 11000 tries and hey-presto. I now know the WPS PIN number. Again, Google the Pixiedust attack to see this in action. User DataHead on this forum (co-)wrote a program that does just that for various PRNGs in popular APs.

Say someone did a hack on a website and got the users database with hashed passwords in it. If all passwords were just hashed with a known algorithm, someone could see if there are entries in there where multiple users have the same hashed value which means they all used the same password and it's probably something ludicrously simple. Also, someone could simply compute the hashes for his wordlist and see which ones match. If the person who wrote the code for this website was clever enough to seed his hash with something like the MD5 of the user's email address, identical passwords would be extremely unlikely to produce the same hash in the database, and to crack it using a wordlist someone would have to compute the hashes for his entire wordlist, preseeded with the MD5 of the specific user whose account they're trying to break into. Since this computational work would be specific to this 1 user on this 1 system, the chance of being able to reuse the computed data is effectively 0 so it doesn't make sense to store/retain these hashes. Just crack on the fly and hope you get a hit. This is the approach people currently take when cracking hashes and why so much work is put into making the computation itself as fast as possible.

Share on other sites

• 1 month later...
```#include <stdio.h>

#include <stdlib.h>
#include <time.h>

int main(void) {
srand( (unsigned int)time(NULL) );
printf("%d\n", rand());
printf("%d\n", rand());
printf("%d\n", rand());
}```

So without:

```srand( (unsigned int)time(NULL) );
```

The first code example would continue to produce the same output? Hence the hashes would be pretty easy to crack if you knew the initial random number. OK this is starting to make sense.

I had to read a book on crypto and come back to this. I remembered this thread because I'm at this spot a Python tutorial with random numbers. All of the cool kids were some Python scripts and I felt left out so I resolved to learn the language. Mostly I was curious about Veil-Evasion which also uses some random functions for code obfuscation.

Is seeding salting the same? Or is does salting happen later in the hashing function?

So I'm thinking to crack a bunch of passwords it would be ideal to have to hashing function used to create them or be able to recreate it. If it were a website a person could create a user account before dumping the database and use the hash from the his/her user account to help determine what operations were done against the plaintext because the plaintext and the hash values are known. But that still seems like a lot of work. Am I far off here?

What are some other elements to consider in secure hashing or cracking stronger hashes?

Share on other sites

Seeding is something you do to random number generators to give them their initial state. Salting is something you do to your hashing algorithm to ensure you get a context-specific sequence. Like what I explained above (using the word 'seed' where I should've used 'salt'... doh!). When you store the hash for (say) a user's password, you need to initialize the hashing algorithm and be able to repeat those steps exactly the next time you log in. So a hashing algorithm, like a PRNG, has an initial state (presumably all zeroes again, but there's no guarantee that's the value plus it doesn't matter really what the value is so long as it's the same initial value every time). Because a PRNG uses the last-produced value as a source for the next value to produce, they're calling it a seed. With a hashing algorithm, the state gets thorougly garbled with each subsequent character being hashed so rather than the initial state being a seed from which new values are sprouted, the initial state is something to give a bit of flavor to your hash. Hence: salt.

Note that when you store a salted password hash, you need to store the hash AND the salt so that when you come back for your next login the exact steps can be repeated. Using the md5 of a long-ish value you already have stored on someone which isn't allowed to change without also demanding the user provides his password to accept the change makes it easier to keep track of things as well as provide a bit of security by obscurity since it's not obvious based on the stored data alone which bit of information you used for the salting.

Share on other sites

So I'm thinking to crack a bunch of passwords it would be ideal to have to hashing function used to create them or be able to recreate it. If it were a website a person could create a user account before dumping the database and use the hash from the his/her user account to help determine what operations were done against the plaintext because the plaintext and the hash values are known. But that still seems like a lot of work. Am I far off here?

Most password hashes stored in a database will be using a standard hashing algorithm (SHA-512, SHA-256, MD5, etc. Note MD5 should be considered a poor choice these days) and will either indicate it in the hash some how. for example in a Linux shadow passwd file hashes starting with \$1\$ are using MD5, those starting with \$5\$ are SHA-256 and those starting with \$6\$ are SHA-512.

Failing the hashes indicating what they are then the next logical place to look would be the source code, finding the line that says

```hash=sha256(password+salt);
```

is much easier and quicker than trying to figure it out from a plaintext/ciphertext comparisons.

This leads onto the question of why would developers make it so easy for you to identify the hash being used, the answer is simply that it shouldn't matter that you know what type of hash is being used. The algorithm isn't the secret part, the password is. If the password it too simple or short then the hashing algorithm used is irrelevant as it will quickly be brute forced.

On the other hand, if the password is long and complex enough then cracking it would take too long, even with knowledge of the hash, salt and and hashing algorithm. Once the need to keep the algorithm used secret is dismissed as worthless, then actually indicating the type of algorithm used gives the developers power to respond to issues that arise. E.g. If an algorithm is becoming a security concern then you can add a modern hashing algorithm to be used when setting new passwords and keep the old algorithm for verifying older accounts, effectively phasing out the old algorithm over time. You could even update the users hash to new algorithm then next time they successfully login.

Share on other sites

So a salted hash is going to be something like this?

```hashfunction(password + salt)

hashfunction(salt + password + salt)
```

Or something like that? So we are just appending some other characters to the string before running the hashing algorithm against it.

So really to brute to a strong hash you need to brute the plaintext, the salt(s), and the initial state? Is that pretty much it in a nutshell?

So if you have know the plaintext of a given output you can try salts and initial states until you get the desired output?

I like the idea of the email address being a salt rather than each hash having the same salt that sound a little tough to break. Or I guess anything like that. But I think non-personal information would make it more secure. Sites like pipl. I have done some skip tracing with pipl and I can usually find somebody's d0x in about 15 minutes. (Find people to serve court papers. Not target reconnaissance but that would be just as easy.) As far as asking your mother's maiden name, pet's name, city you are from, email address, that stuff isn't any good because a lot of times you can find that kind of stuff on a person's facebook or LinkedIn profile. But that would still be more secure than just hash(password).

Maybe something like:

```string = password;
string hash = sha512(password + dateofbirth)
string hash1 = sha512(hash + currentcity)
string hash2 = sha512(hash1 + mothersmaidenname)
print sha512(hash2 + todaysdate)```

or

```string = password;
string hash = sha512(password + randomstring)
string hash1 = sha512(hash + randomstring)
string hash2 = sha512(hash1 + randomstring)
print sha512(hash2 + randomstring)
```
Share on other sites

So really to brute to a strong hash you need to brute the plaintext, the salt(s), and the initial state? Is that pretty much it in a nutshell?

Indeed. Although in general the initial state is known (all-zeroes).

Your last 2 code examples are basically:

```print( sha512( pwd + dob + city + momname + dateoftoday ) );
print( sha512( pwd + random + random + random + random ) );```

Now you should understand the reason for making and salting your hashes in the first place. You create a hash of something when you're not actually interested in the input itself, you just want to verify that it's the same thing the user provided last time. The benefit is that if somehow someone gains access to your dataset, that someone won't be able to read from that data the input used during the verification process. Enter brute forcing and hash-collisions: we don't know anything but we have all the time in the world so in an offline attack, after having extracted the data, we simply try just about any combination of characters and see if it matches. If the hash employed is nothing more than the hash of a word, then for each hash produced in the brute-force attack the ENTIRE database can be checked, meaning the chance of ending up with either a match or a collision (which is just as good) for any given user just shot up.

To prevent this, several approaches have been taken. The best and easiest is to salt the hash: from the initial state, run a long-ish sequence of characters that is unique to your user through the hash function first and only then the user input. Now an attacker needs to figure out what you used for this salting, which is a bit of security by obscurity right there. You could hardcode an initial sequence of characters to run through the hash, then push through the user-specific sequence and then the user input. Now the attacker needs to have access to the data, know the algorithm you're using AND find that initial sequence in your code. It's certainly not impossible for him/her to get all that once your box is hacked, but downloading the application code, assuming the hacker's access is sufficient for that, is probably not the first thing that comes to the hacker's mind.

You could 'enhance' your hashing methodology by for instance doing repeated runs of the hashing algorithm, but be aware that only a few are specifically designed for this whereas most get weaker on each iteration. You could 'enhance' it by running more data through the hashing algorithm, but the question then becomes what you hope to gain? By adding a user-specific item, preferably constant and long, you achieve that the hash becomes user-specific so the attacker would need to target a specific user and invest substantial resources trying to break that one hash. By adding a hardcoded constant, or even some sequence from a file that gets populated with a random sequence during installation, you now force your attacker to find that needle in the haystack. If all that your hacker has is database access by virtue of a SQL injection problem, (s)he will be missing a vital piece of information and as such won't be able to start the brute-force attack. If the hacker has full access you're still slowing him/her down.

Feeding more data through the hash function that the hacker reasonably has access to when the hacker can access the password hash doesn't gain you anything over what you currently already have. You just spend fractionally more time verifying the hash and (s)he spends fractionally more time brute-forcing it. And the big downside for you is that you need to retain ALL that data to verify a user is who (s)he says (s)he is.

On the systems I make, we typically use a GUID for the user id. That's a PERFECT value to use for salting - the user never sees it, it's fairly random and long but not too long.

Share on other sites

Are there some programs that will attack salts from a dictionary file? Something like:

```cracker -w <wordlist> -s <saltlist> -h <hashfile>
```

If the salt were something like a date of birth or a telephone number it would be trivial to attack that along with the.

Running a dictionary attack against the password and salt concurrently would require two open filestreams and some kind of nested looping structure.

So it would work something like this pseudocode:

```open wordlist;
while wordlist is open read line
for each line
open saltlist;
while saltlist !=eof
if sha512(word+salt) = hashfromhashfile
print word
print salt
print hash```

So the program would essentially read through the wordlist and for each iteration of the wordlist it would open the salt file and iterate through the salt file appending each salt to the current word and creating a compound word + salt for every possible word + salt combination. Conceptually it's not difficult to code. But the compute time.

Edited by vailixi
Share on other sites

The problem is that more often than not the salt is in fact something more or less random. Trying a wordlist for that is hoping and praying for the ridiculously unlikely. For each salt you repeat your password wordlist which makes it an INSANE amount of guessing attempts. Since SHA512 is already a pretty kick-ass hash function I wouldn't be surprised if the used salt is stored in the DB as-is. It would be produced from a pretty decent (P)RNG and, given the use of SHA512, will probably be a 64-byte binary, stored using MIME or some such.

Share on other sites

• 3 weeks later...

Really good point on changing the database. I've been making this way to complicated.

Share on other sites

• 9 months later...

Sorry to reopen an old topic as my debut here, but I feel this is probably the appropriate location for my question.

I have forgotten the password to a laptop that is more than 14 characters. Ophcrack's default rainbow tables won't do the job.

I know from memory the password contains words of fruits and possibly a few numbers, that is all.

I also know I can remove the password and reset the admin account using another program but am now far past the point of reasonableness and too curious, because this password was not difficult to recall.

What do you expert hackers recommend to crack it?

Thanks

Share on other sites

You felt wrong since the best approach would've been to make a new topic. Regardless, did you check out Crunch? Here's a page describing its use.

Share on other sites

You felt wrong since the best approach would've been to make a new topic. Regardless, did you check out Crunch? Here's a page describing its use.

Thanks for the link. I suppose the next logical step after generating a word list with Crunch is using a secondary program to convert it to a rainbow table for Ophcrack to read. Do you recommend a way for this?

Share on other sites

My guess is something in the ophcrack 'suite' will be able to convert a word list into an appropriate rainbow table.

Join the conversation

You can post now and register later. If you have an account, sign in now to post with your account.

Reply to this topic...

×   Pasted as rich text.   Paste as plain text instead

Only 75 emoji are allowed.