chriscook.uk

The Importance of Passwords

Christopher Cook

Despite over 40 years of computer systems development and recent advances in multi factor authentication systems, passwords still represent the primary authentication mechanism employed by the vast majority of online systems. Any cursory browse of news sites will illustrate just how fragile the current situation is with respect to online account breaches; barely a day goes by before another story breaks about stolen credentials, compromised accounts and widespread identity theft.

The reasons for these breaches are many and varied, but in most cases result in the user database being stolen and either published online or sold to criminal groups in underground forums. Once this database is in the wild, it is essentially a race against time to persuade users to change their passwords before the attackers break those passwords and gain access to the accounts.

As an end user of these systems you have little control over how your passwords are protected, or how well the system is protected from external breaches. In most cases however, you do have control of the password you choose. As we will see, this choice can make the difference between having your account compromised, or remaining closed to the attackers until you get to change your credential before they do.

Hashing vs. encryption

When you see a news article about account breaches, one of the first reactions for the company involved is to state that the stolen passwords were encrypted. The standard follow up then involves security researchers attempting to determine if the passwords were encrypted or hashed. The distinction between encryption and hashing is important.

Encryption is the process of taking a plaintext (in this case a password) and changing it to a form that is not readable, or decipherable (the ciphertext). This process involves the use of an encryption key, either the same key in the case of symmetrical encryption, or by the corresponding key of a key pair in the case of asymmetrical encryption. In both cases the plaintext can be derived from the ciphertext with the appropriate key. In other words, the process of storing the password becomes reversible.

Hashing is the process of taking an input of variable size, and calculating a fixed size representation of that input. The original input can never be determined from the calculated output. In some texts, this hashing process is referred to as a digest, because the original input, no mater what it's size is digested to a fixed sized output. Logically, if the input is of variable length (i.e. an infinite number of input possibilities) and the output is of fixed length (i.e. a finite number of output possibilities), there must exist an infinite number of inputs that result in the same output.

An aside: For a satirical view of such paradoxes, see Douglas Adams: The Hitch Hikers Guide to the Galaxy. "It is known that there are an infinite number of worlds, simply because there is an infinite amount of space for them to be in. However, not every one of them is inhabited. Therefore, there must be a finite number of inhabited worlds. Any finite number divided by infinity is as near to nothing as makes no odds, so the average population of all the planets in the Universe can be said to be zero."

There are indeed many inputs that correspond to the same output, and these are referred to as collisions. However, the important property of hashes in general is that they are repeatable, or deterministic; they will always give the same output for the same input. The output of most hash functions is extremely large, usually upwards of 256 bits in size, and when we use hash functions for cryptography they are generally referred to as cryptographically secure hashes. This means that although there must be many collisions on a particular input, they are statistically and computationally almost impossible to find. Given the size of the output space on most modern cryptographic hash functions, the statistical chances of a collision are extremely remote.

If passwords are stored using encryption, the fate of all stored passwords rests entirely with the secrecy of the key used to encrypt those passwords. Since the system must be able to decrypt the passwords when a user logs in, it must have access to the encryption key in order to perform this task. In many cases, this key will reside on the same system as the password that was originally compromised. If however the passwords are stored as one-way hashes, neither the system nor the attacker can derive the original password from the hash. There is an important design point here: the system should never know your plain password.

Assume the user chooses their password, and the system passes this password through a one-way hash function and stores the resulting hash value for future comparison. The system then immediately forgets the original un-hashed password. Since the stored hash value is non-reversible, it is not possible for the system to recover the original password from the hash value, nor is it possible for an attacker to do the same. When a user who knows the password subsequently logs into the system (hopefully the same user who created the password in the first instance), the same hashing process is repeated using the password entered by the user, and the resulting hash value is compared against the stored hash value.

Because the hash function is deterministic, the same output will always be calculated for the same input password. The system can then compare the newly calculated hash value with the stored hash value. If the hash values match, the input password must also match the original password.

If an attacker gains visibility of the password hashes and wishes to derive the original input for the hash, the only options are to break the hash function through cryptanalysis (very challenging), or try many inputs in the hope of finding the correct output. This process of repeatedly trying inputs and comparing to a desired output is called brute forcing.

The success and practicality of brute force techniques depend on the number of possible password combinations, or key space to be searched. For example, if it is known there are only 100 possible inputs, it is a simple matter to iterate through each input from 1 to 100 and test the output for a match. However, if there are 2¹²⁸ (or 3.4028237e+38 in decimal) inputs, the search space becomes impractical using current technology. This is not to say the input space will not be searchable in the future; just that it is not yet possible within reasonable timescales using current techniques today.

Entropy, key size, and key space

The fundamental property of any password is its complexity. A password is just a secret, and the more complex the secret the harder it is for an adversary to guess. When we discuss password complexity, we usually refer to a property called entropy which is simply a measure of the unpredictability, or randomness of a password, usually measured in the number of binary bits of entropy. The higher the number of bits, the more password combinations are possible.

Consider the following 128 bit binary number:

01001101011111111111110111010011
10110010001101101100101011010100
10101111110000010011101101100100
10001001001110111111001011110000

(128 bit number, split into four 32 bit blocks for clarity)

This is simply a large 128 bit number generated from a random source. Even though the character set is limited to the characters '0' and '1', it still makes a very strong password containing 128 bits of entropy (assuming the target system will allow passwords of 128 characters or more). This same 128 bit number can also be expressed in a number of other forms:

Octal		1153777672354433312651277011666221116771360
Decimal		103015125598399031288671372001077359344
Base 16		4D7FFDD3B236CAD4AFC13B64893BF2F0
Base 32		JV773U5SG3FNJL6BHNSISO7S6A
UUID		JV77-3U5S-G3FN-JL6B-HNSI-SO7S-6A
Base 64		TX/907I2ytSvwTtkiTvy8A

Notice how, as the size of the character set increases, fewer characters are required to maintain the same level of entropy. Also notice how the Base 64 encoding of the original number begins to look more like what most of us understand as a password following a typical password policy such as upper and lower case letters, numbers, and a selection of special characters.

Further, consider the PIN numbers commonly assigned by banks in order to protect payment and cash withdraws. A typical bank PIN number consists of a 4 digit decimal number, which gives a possible character set of the numeric digits 0 to 9, and a maximum number of combinations of 10⁴ (or 10,000). In cryptographic terms, this PIN number will provide a maximum entropy of just over 13 bits since each individual decimal digit provides approximately 3.3 bits of entropy. If we were to attempt to brute force this PIN, we would require a maximum of 10,000 attempts before we are guaranteed to succeed.

Reduced entropy

Suppose we have an attaché case containing some documents, which has been stored away for some time. The case has a 4 digit combination lock, the number for which we have long forgotten. We could either break the lock with a large screwdriver (the equivilent of breaking the encryption through cryptanalysis), or we could sit down and try every combination from the 10,000 possibilities, in other words brute force the combination. By trying random combinations, it would also be reasonable to assume we would find the correct combination before having to try all 10,000 possibilities - enter something called the birthday paradox.

The birthday paradox states that if there are 30 people in a room, the probability of two people having the same birthday exceeds 50%. This seems counter-intuitive as there are 365 possible days (excluding leap years) and only 30 people. This paradox tells us that collisions (in this case birthdays), occur more frequently than we may otherwise expect. When we are trying to find PIN numbers, attaché case combinations or passwords, we will statistically succeed using brute force techniques far faster than we would otherwise expect. To account for this paradox, we take the square root (halve the power) of the keyspace. So for our 128 bit random number, we reduce the entropy to 64 bits (2^128/2 = 2⁶⁴). For our PIN number or attaché case examples, if we take the square root of the keyspace (10,000 combinations), the birthday paradox states that we have a greater than 50% chance of hitting the correct PIN after only 100 random attempts.

We can even utilize human behavioral characteristics and look for common PIN codes, trying those first. We could for example try 0000, 1111, 2222, 3333 to 9999 first. Then, assuming the keypad used to enter the PIN is of a standard layout, we could try various common geometric patterns, such as 1234 (left to right, top to bottom), 1478 (down left side and across), 4753 (a tick), and so on.

Extending our PIN number analogy to passwords, it is common to try large dictionaries of known passwords, sometimes referred to as wordlists before attempting linear attacks. These wordlists will contain the most common passwords in use, and are usually sourced from actual data breaches. For systems that allow users to create their own PIN numbers or passwords, this approach is highly effective because humans like to choose PINs and passwords they can remember easily. If we have a list of these statistically common combinations, it makes sense to try these first before considering a brute force approach. In fact, this approach is so effective that many data breaches can be attributed to a technique known as password spraying which involves taking the top 10 most common passwords and trying these with any accounts the attacker has knowledge of.

Consider also the following scenario. When you enter your PIN number at the ATM, someone observes that the last digit you hit on the keypad was a 5. Since the observer now knows the last digit of your PIN code, the number of unknown combinations has now dropped from 10,000 to only 1,000. If they observe the first key you hit as well as the last, the number of combinations has now dropped to only 100 since they would only need to find the middle 2 digits.

PIN numbers are usually considered insecure due to the limited number of combinations available, however we could improve this in a number of ways. Firstly, we could make the PIN number longer, say 8 digits. This would raise the total number of combinations to 10⁸ (or 100,000,000). Adding another 4 digits has increased the key space from 10,000 to 100 million, which has immediately raised the bar for a brute force attack by a factor of 10,000.

Secondly, we could also increase the character set size. If we used a keypad with letters A to Z instead of digits 0 to 9, and we retained the four character 'PIN' code, we would now have 26⁴ (or 456,976 combinations). In other words, by expanding the character set we are increasing the multiples of each character position by the size of the character set.

Online vs. offline attacks

Most PIN codes used by banks are still 4 decimal digits, giving a maximum number of 10,000 combinations. This works because the system using the PIN number enforces a maximum number of attempts before rejecting further attempts, which is essentially the difference between an online and an offline attack. With an online attack, the system using the PIN or password has the opportunity to enforce controls and restrictions on how and when the credential is used. With an offline attack, there is no system to 'protect' the credential, and hence the attacker has unlimited attempts to guess the credential. Offline attacks are the most dangerous, and typically occur when a user database is stolen.

To guard against offline attacks, the method of password storage becomes important, as does the cryptographic strength of the password itself. To show real world implications of these aspects of password choice and password storage, we will make use of a freely available piece of open-source software specifically designed to break passwords; HashCat. HashCat has only one task - to brute force passwords as quickly as possible.

HashCat is optimized to compute cryptographic hash functions, such as those typically used for password storage. To do this, it will utilize the hardware components at its disposal such as multi-core CPUs and Graphics Processing Units (GPUs) to accelerate this task as far as possible. Additionally, it can utilise wordlists and rule sets to further optimise its job. For example, we can ask HashCat to use a list of 1000 common English words and check those first. This will find obvious passwords such as 'password', 'Jenny', 'computer', and so forth. We can even ask HashCat to apply rules to this worldlist such as adding numbers at the end or beginning of the word. This will find other passwords such as 'password1234', 'Jenny2014', etc.

More advanced word manipulation is also possible. We can for instance, ask HashCat to try common letter replacements such as number zero for letter 'O', number 1 for letter 'L', swapping upper and lower case letters and so on. Finally, we can also ask HashCat to try multiple wordlists in combination with these rules. This will allow us to find passwords such as 'C0mputer2o14RuLes2015'.

HashCat is designed to work with a large range of hashing algorithms, but for our discussion we will cover just 3 of them for comparison purposes:

MD5 is a message digest algorithm which dates back to 1991. Although it is old and has been considered insecure for password storage, it is still widely used.
SHA-512 is a secure hashing algorithm which is part of the SHA-2 family of hash functions. It is the currently used default function for hashing Linux user passwords.
bcrypt is a hashing algorithm specifically designed for the hashing of user passwords.

The table below gives an indication of how fast HashCat can compute each of these hash functions:

MD5		20,000,000 per second
SHA-512		40,000 per second
bcrypt		25 per second

These results are from a low to mid-range GPU accelerated desktop PC. Neither the CPU, nor the GPU have been over-clocked, or represent the cutting edge of available technology. However, as can be seen from the results in the table above the choice of hashing algorithm is very important to slowing down the process of brute forcing an offline password.

Additionally, it is important to choose a password that cannot be manipulated in a way that is advantageous to cracking software such as HashCat. We have already seen above that using numbers as letter replacements, concatenating multiple words, or generally using dictionary words present very little problem for such software.

Summary

On the basis of the background given above, consider the following when choosing and using your passwords

Choose long and random passwords
The closer your password is to 128 bits of entropy the better. This is generally where password managers are required as they are able to both generate and store such complex passwords. As a rule of thumb, if you can remember the password it is too weak; therefore a good password manager becomes essential.

Check the chosen password does not exist in breached wordlists
There exist online resources that allow you to test a password against known wordlists, but they should be treated with caution. Unless you know how the service is performing this check, you could be inadvertently giving away your password to a third party. If you choose a long and random password as per the first point, this should not be a problem. Some password managers will also have an internal weak password dictionary to help you choose a strong password.

If possible, establish how the system is storing your password
This is generally very difficult, but a good test is to use the password reset function. If this function returns your original password, the system is almost certainly storing your password as plain text and all bets are off. Your password will be exposed in any future breach no matter how strong it is.
Websites should be using storage schemes such as BCRYPT or SCRYPT which are specifically designed to resist efficient implementation in CPUs and GPUs. This can be seen clearly from the results in the table above. That said, if any form of modern cryptographic hashing is used by the system, and you have chosen a good password with good entropy, the chances of brute forcing your password are small.

Use different passwords for each resource you are trying to protect
A common tactic used by cyber criminals is to take a stolen identity and try large numbers of passwords for that entity. However, another common approach is to take a single breached identity and test this against a large number of known systems. This is called credential stuffing and the upshot is that if you use the same password on multiple systems and that password is exposed, all systems using that password are potentially open to attack.

Don't just rely on the password alone if at all possible
Many systems are now starting to support multi factor authentication, and the basic premise of multi factor authentication is more layers of security. From our bank PIN number analogy, this is the form of multi factor authentication most people will be familiar with. The card represents something physical you have in your possession, and the PIN number you (hopefully) remember is something you know. Independently they are useless, but together they allow you to withdraw cash from the ATM.

Pay attention to secret questions
Some authentication systems allow you to set up additional security questions during the registration process. The idea of these questions is to provide additional data points that can be used to recover the account if you forget your original password. In most cases the choice of these questions is poor, and if answered with correct information provides very little additional security. The questions themselves may have obvious answers, or answers that can be found easily on the internet with a little background reasearch on the account holder. If you wish to use these security questions, and you are able to freely enter their answers, a good strategy is to regard the answers as additional passwords, using strong random character sequences in the same manner as the primary password.