Sprouting Synapses

Jul 26

Written by: SkySigal
7/26/2008 2:38 AM

One branch of cryptography deals with generating hashes, for which there are many uses.

Hashes – What are they exactly?

In cryptography, a cryptographic hash function is a transformation that takes an input and returns a fixed-size string, which is called the hash value. (Src: Wikipedia)

A hash is basically a fixed-length signature / fingerprint of the data, designed so that:

  • it is computationally infeasible to find a message that corresponds to a given message digest;
  • it is computationally infeasible to find two different messages that produce the same message digest;
  • any change to a message (including a single-bit change) will, with exceedingly high probability, result in a completely different message digest.
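
The fixed length and the "small change, completely different digest" behaviour are easy to see for yourself. Here is a minimal sketch, using MD5 (covered further down) purely as a convenient example; the class and helper names (DigestDemo, Md5Hex) and the use of UTF-8 are my own choices for illustration:

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

public static class DigestDemo {
  // Hex-encoded MD5 digest of a string (helper name is illustrative;
  // UTF-8 is an assumption, any fixed encoding would do).
  public static string Md5Hex(string text) {
    using (MD5 md5 = MD5.Create()) {
      byte[] hash = md5.ComputeHash(Encoding.UTF8.GetBytes(text));
      return BitConverter.ToString(hash).Replace("-", "");
    }
  }

  public static void Main() {
    // Inputs of different lengths still give 32 hex characters each.
    Console.WriteLine(Md5Hex("a"));
    Console.WriteLine(Md5Hex("a much longer message than the first one"));

    // 'd' (0x64) and 'e' (0x65) differ by a single bit,
    // yet the two digests look unrelated.
    Console.WriteLine(Md5Hex("hello world"));
    Console.WriteLine(Md5Hex("hello worle"));
  }
}
```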

Note:
A hash value is also called a "digest" or a "checksum".

 

What are they good for?

Hashes are good for creating message integrity checks and digital signatures.
They also happen to be very good for storing passwords – which is generally the first encounter that new programmers have with them…

Using Hashes to store passwords

One use for hashes is to save a username/password in such a way that even if the db table is compromised, the attacker can’t read the password.

This is done by saving not the password, but the hash of the password.

This is more secure than saving the password directly, yet still works, because the hash can be calculated from the original password, but the password cannot be ‘rehydrated’ (re-determined) from the hash. 
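
As a rough sketch of that flow: at sign-up you store the hash, and at login you hash the attempt and compare the two hashes. The names below (PasswordStore, HashPassword, Verify) are hypothetical, and SHA-256 with UTF-8 is an assumption on my part; any of the algorithms discussed in this post would slot in the same way.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

public static class PasswordStore {
  // The value to save in the user table instead of the password itself.
  // SHA-256 and UTF-8 are assumptions made for this sketch.
  public static string HashPassword(string password) {
    using (SHA256 sha = SHA256.Create()) {
      byte[] hash = sha.ComputeHash(Encoding.UTF8.GetBytes(password));
      return BitConverter.ToString(hash).Replace("-", "");
    }
  }

  // At login: hash the attempt and compare it with the stored hash.
  public static bool Verify(string attempt, string storedHash) {
    return string.Equals(HashPassword(attempt), storedHash, StringComparison.Ordinal);
  }

  public static void Main() {
    string stored = HashPassword("correct horse"); // written to the db at sign-up
    Console.WriteLine(Verify("correct horse", stored)); // True
    Console.WriteLine(Verify("wrong guess", stored));   // False
  }
}
```

Note that the original password never needs to be kept anywhere: only the hash is stored and compared.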

 

MD5 Hash algorithm:

The most common hash algorithm used is the MD5 hash algorithm:

  • In cryptography, MD5 (Message-Digest algorithm 5) is a widely used, partially insecure cryptographic hash function with a 128-bit hash value.
  • As an Internet standard (RFC 1321), MD5 has been employed in a wide variety of security applications, and is also commonly used to check the integrity of files.
  • An MD5 hash is typically expressed as a 32 digit hexadecimal number.
    Src: Wikipedia

 

Here is an example of one way to convert a string to an MD5 hash string, in C#:

/// <summary>
/// Calculate the MD5 for the given string.
/// <para>
/// Uses <see cref="Encoding.Default"/>
/// for the character encoding.
/// </para>
/// <para>
/// Note: 
/// the returned 32 char string is 
/// valid as a Guid constructor argument.
/// </para>
/// </summary>
/// <param name="text">The text.</param>
/// <returns>a string of 32 chars.</returns>
public static string TextToMD5(string text) {
  return TextToMD5(text, Encoding.Default); 
}




/// <summary>
/// Returns the Cryptographic Hash of the given text.
/// </summary>
/// <remarks>
/// <para>
/// 
/// </para>
/// </remarks>
/// <param name="text">The text to encode.</param>
/// <param name="encoding">The encoding to use.</param>
/// <returns></returns>
/// <exception cref="System.ArgumentNullException">
/// An exception is raised if 
/// <paramref name="text"/> is null.
/// </exception>
public static string TextToMD5(string text, Encoding encoding) {
  Guard.StringNotNullOrEmpty(ref text,"text");
  Guard.ArgumentNotNull(encoding, "encoding");
  //Convert string to byte buffer:
  byte[] buffer = encoding.GetBytes(text);

  HashAlgorithm hashAlgorithm =
   MD5CryptoServiceProvider.Create();

  //Create the hash value from the array of bytes.
  byte[] hashBuffer
  = hashAlgorithm.ComputeHash(buffer);

  //Method a:
  //System.Text.StringBuilder sb =
  //  new System.Text.StringBuilder();
  //foreach (byte b in hashBuffer) {
  //  sb.Append(b.ToString("x2"));//'X2' for uppercase
  //}
  //return sb.ToString();


  //Method B: 
  //string hashString =
  //  BitConverter.ToString(hashBuffer);
  //The output looks like:
  //"19-E2-62-AE-3A-84-0D-72-1F-EF-32-C9-25-D1-A1-89"
  //so you want to remove the dashes in most cases:
  //return hashString.Replace("-", "");

  //Method C: Homebrewed speed:
  //Convert 16byte buffer to string...
  return ConvertHexByteArrayToString(hashBuffer);

}

private static string ConvertHexByteArrayToString(byte[] buffer) {
  char[] result = new char[buffer.Length * 2];

  int i = 0;
  foreach (byte b in buffer) {
    result[i] = GetHexValue(b / 0x10);      //high nibble
    result[i + 1] = GetHexValue(b % 0x10);  //low nibble
    i += 2;
  }
  return new string(result, 0, result.Length);
}

private static char GetHexValue(int X) {
  //0-9 map to '0'-'9' (from 0x30), 10-15 map to 'A'-'F' (from 0x41): uppercase hex.
  return (char)((X < 10) ? (X + 0x30) : ((X - 10) + 0x41));
}

Notice:

  • MD5 produces a 128-bit result (16 bytes of 8 bits), generally expressed as 32 hex digits,
    which is easily expressed as a string (as done above).

Converting Byte Arrays to Hex Strings
This gets asked a lot in forums, so I might as well address it here, head on, as the issue will come up with MD5 and every other hash algorithm out there.

Notice that the hash is returned as a byte array. Since it’s probably going to be used as a string, we need to convert it. But how? And how to do it efficiently?

Usually the first thing people try is

string result = (string) byteBuffer.ToString();

or

string result = Convert.ToString(hashBuffer);

Both of these return a useless string containing the following phrase:

"System.Byte[]"

Which is why you’ll see across the net one of the two following solutions:

//Build a lowercase hex string from the hash bytes.
StringBuilder sb = new StringBuilder();
foreach (byte b in hashBuffer) {
  sb.Append(b.ToString("x2"));
}
string result = sb.ToString();

Or:

string hashString =
  BitConverter.ToString(hashBuffer);
//But the output has dashes in between:
//"19-E2-62-AE-3A-84-0D-72-1F-EF-32-C9-25-D1-A1-89-67-13-5F-58"
//so you want to remove the dashes in most cases:
return hashString.Replace("-", "");

As to which one is faster, I suspect the first.

A little bugged by the question, I Reflector’ed in to see, and found that the code within BitConverter could be fixed up as follows. I suspect that it’s the fastest of the options shown here (although it only produces uppercase hex strings, not lowercase):

private static string ConvertHexByteArrayToString(byte[] buffer) {
  char[] result = new char[buffer.Length * 2];

  int i = 0;
  foreach (byte b in buffer) {
    result[i] = GetHexValue(b / 0x10);
    result[i + 1] = GetHexValue(b % 0x10);
    i += 2;
  }
  return new string(result, 0, result.Length);
}

private static char GetHexValue(int X) {
  return (char)((X < 10) ? (X + 0x30) : ((X - 10) + 0x41));
}

(PS: if anybody sees an efficient way to combine the above two methods into one, drop me a comment, would you? Thanks.)
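
One possible way to fold the two methods into one is to index into a string of hex digits instead of doing the arithmetic in a helper. This is only a sketch under my own naming (HexSketch, ToHex, HexDigits are all hypothetical), and I have not measured whether it is any faster:

```csharp
using System;

public static class HexSketch {
  // Swap in "0123456789abcdef" if you want lowercase output.
  private const string HexDigits = "0123456789ABCDEF";

  public static string ToHex(byte[] buffer) {
    if (buffer == null) throw new ArgumentNullException("buffer");
    char[] result = new char[buffer.Length * 2];
    for (int i = 0; i < buffer.Length; i++) {
      result[2 * i] = HexDigits[buffer[i] >> 4];       // high nibble
      result[2 * i + 1] = HexDigits[buffer[i] & 0x0F]; // low nibble
    }
    return new string(result);
  }

  public static void Main() {
    Console.WriteLine(ToHex(new byte[] { 0x19, 0xE2, 0x0D })); // 19E20D
  }
}
```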

 

Right. Now that we are all experts at converting the results of hash algorithms to strings, let’s investigate the other hash algorithms.

 

The Encoding

One question that has to arise in all circumstances is which encoding to use.

You could use Default, UTF7, UTF8, UTF32, Unicode, or any of the other encodings available. For hashing it doesn’t much matter which encoding you pick, as long as you use the same encoding each time (i.e. when you write the password’s hash to the db for the first time, and later, when you test upon login).
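
To see why consistency matters: the hash is computed over bytes, not characters, and different encodings turn the same string into different bytes. A small sketch (EncodingDemo and Md5Hex are illustrative names of my own):

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

public static class EncodingDemo {
  // MD5 of the bytes that the given encoding produces for the text.
  public static string Md5Hex(string text, Encoding encoding) {
    using (MD5 md5 = MD5.Create()) {
      byte[] hash = md5.ComputeHash(encoding.GetBytes(text));
      return BitConverter.ToString(hash).Replace("-", "");
    }
  }

  public static void Main() {
    string password = "secret";

    // UTF-8 encodes "secret" as 6 bytes, UTF-16 (Encoding.Unicode) as 12.
    Console.WriteLine(Encoding.UTF8.GetBytes(password).Length);    // 6
    Console.WriteLine(Encoding.Unicode.GetBytes(password).Length); // 12

    // Different bytes in, different hash out.
    Console.WriteLine(
      Md5Hex(password, Encoding.UTF8) == Md5Hex(password, Encoding.Unicode)); // False
  }
}
```

So a hash written with one encoding at sign-up will never match a hash computed with another encoding at login, even when the password is right.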

That said, there’s a little more to be mentioned about encoding.

…The Default Encoding
You’ll notice that I used the Default encoding in the above example. 
Why? Because the first time I wrote the MD5 algorithm in .NET, I needed to use it to check MD5s I had generated on the PHP platform, and had to use an Encoding that would match PHP’s results. This post explains it in much more detail.

…Why ASCII is generally wrong…
I suggest you never use ASCII for your encoding, as it will only give you bad habits: for the trivial job of encoding short texts such as passwords, this won’t bite you immediately, but later, when you use Encoding to encrypt streams of text, it will. So it really is best to use a better Encoding right from the start.

What’s so wrong about ASCII, you ask? It’s a 7-bit encoding: any character outside that range is lost (Encoding.ASCII replaces it with '?'). If you can guarantee that your text will only contain characters with codes below 128, then it will work fine. But nowadays, with globalization, that’s a guarantee you can rarely make.
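
A quick way to see the loss is to dump the bytes. In this sketch (AsciiLossDemo and Bytes are names I made up for illustration), an accented password and a different password containing a literal question mark end up as the same bytes under ASCII, and would therefore hash to the same value:

```csharp
using System;
using System.Text;

public static class AsciiLossDemo {
  // Hex dump of the bytes an encoding produces for a string.
  public static string Bytes(string text, Encoding encoding) {
    return BitConverter.ToString(encoding.GetBytes(text));
  }

  public static void Main() {
    string accented = "caf\u00E9"; // "café", written with an escape

    Console.WriteLine(Bytes(accented, Encoding.ASCII)); // 63-61-66-3F
    Console.WriteLine(Bytes("caf?", Encoding.ASCII));   // 63-61-66-3F
    Console.WriteLine(Bytes(accented, Encoding.UTF8));  // 63-61-66-C3-A9
  }
}
```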

…Just Curious …what does Microsoft use when using MD5?

If you look within the System.Web.Security.FormsAuthentication class you’ll see the following (UTF8):

public static string HashPasswordForStoringInConfigFile(string password, string passwordFormat) {
  ...
  ... algorithm.ComputeHash(Encoding.UTF8.GetBytes(password)), 0);
  ...
}

…But why UTF-8 and Unicode may not always be much better

It’s a bit premature to get too involved with this subject right now (I’m getting to stream encryption in a future post), but you may as well know that there could be an issue when we get there: this post suggests that there are other problems waiting to bite you in the other Encodings…
(Note: I’m still digesting it to see what I think.)

Great! I got it…We’re done?

It depends on how safe you want your passwords to be…

Because in 1996 a flaw was found in MD5…and in 2007 even more so (see Wikipedia notes).
Which is why they started pushing SHA-1…

SHA-1 (Secure Hash Algorithm)

The SHA hash functions are five cryptographic hash algorithms designed by the National Security Agency (NSA).

The five algorithms are denoted SHA-1, SHA-224, SHA-256, SHA-384, and SHA-512. The latter four variants are sometimes collectively referred to as SHA-2.
[Ed: we’ll discuss those 4 variants in a second.]

Src: Wikipedia

We stated above that a hash is also known as a digest, so whereas MD5 stands for Message-Digest algorithm 5, SHA stands for Secure Hash Algorithm.
In other words, it does exactly the same thing.

Whereas MD5 produced a hash digest that was 128 bits long, SHA-1 produces a message digest that is 160 bits long, which when represented as a hex string is 160/8 *2 = 40 characters.
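
The same arithmetic can be checked in code: HashAlgorithm exposes the digest size in bits through its HashSize property, and each byte becomes two hex characters. A small sketch (DigestSizes and HexLength are hypothetical names):

```csharp
using System;
using System.Security.Cryptography;

public static class DigestSizes {
  // Hex characters needed for a digest: bits / 8 gives bytes, 2 hex digits per byte.
  public static int HexLength(HashAlgorithm algorithm) {
    return algorithm.HashSize / 8 * 2;
  }

  public static void Main() {
    using (MD5 md5 = MD5.Create())
    using (SHA1 sha1 = SHA1.Create()) {
      Console.WriteLine(HexLength(md5));  // 32
      Console.WriteLine(HexLength(sha1)); // 40
    }
  }
}
```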

An example of its use, in C# is:

/// <summary>
/// Calculates SHA1 hash,
/// using <c>Encoding.Default</c>
/// </summary>
/// <param name="text">input string</param>
/// <returns>SHA1 hash</returns>
public static string CalculateSHA1(string text) {
  return CalculateSHA1(text, Encoding.Default,false);
}

/// <summary>
/// Calculates SHA1 hash
/// </summary>
/// <param name="text">input string</param>
/// <param name="enc">Character encoding</param>
/// <param name="keepdashes">
/// Indicate whether to keep the dashes 
/// between hex values.
/// </param>
/// <returns>SHA1 hash</returns>
public static string CalculateSHA1(string text, Encoding enc, bool keepdashes) {
  //Convert string to byte buffer:
  byte[] buffer = enc.GetBytes(text);

  //SHA1CryptoServiceProvider is IDisposable:
  using (SHA1CryptoServiceProvider cryptoTransformSHA1 =
    new SHA1CryptoServiceProvider()) {
    string hash =
      BitConverter.ToString(
        cryptoTransformSHA1.ComputeHash(buffer));
    //The output looks like:
    //"19-E2-62-AE-3A-84-0D-72-1F-EF-32-C9-25-D1-A1-89-67-13-5F-58"
    //so you want to remove the dashes in most cases:
    return keepdashes ? hash : hash.Replace("-", "");
  }
}

 

First of all, you’ll notice that the code is nearly identical to what we did for MD5, which will come in handy when we get to the bottom of this post.

The similarity in code is due to the fact that both algorithms derive from the same abstract HashAlgorithm class, so the same ComputeHash call works for either.

So doing SHA-1 is not any more difficult than doing MD5.

SHA-2 (ie SHA224, SHA256, SHA384 and SHA512)

So SHA1’s better than MD5, and good enough, right?

Not so fast…

In light of the results for SHA-0, some experts suggested that plans for the use of SHA-1 in new cryptosystems should be reconsidered. After the CRYPTO 2004 results were published, NIST announced that they planned to phase out the use of SHA-1 by 2010 in favor of the SHA-2 variants.
Src: Wikipedia

The good news is that using the SHA-2 algorithms takes just about exactly the same code as SHA1 and MD5 (and is therefore no more difficult) -- bar the name of the class instantiated -- so I’m not going to waste your time writing out those methods as well.

I’m just going to say that you will be seeing more and more references to SHA512 in your code.
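
For anyone who wants to see it spelled out anyway, here is a minimal sketch for SHA-512. The method name CalculateSHA512 is my own, modelled on CalculateSHA1 above, and the wrapping class name is likewise made up; the only real difference is the class that gets created:

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

public static class Sha512Sketch {
  // Same shape as CalculateSHA1 above, with SHA512 swapped in.
  public static string CalculateSHA512(string text, Encoding enc) {
    using (SHA512 sha = SHA512.Create()) {
      byte[] hash = sha.ComputeHash(enc.GetBytes(text));
      return BitConverter.ToString(hash).Replace("-", "");
    }
  }

  public static void Main() {
    string hex = CalculateSHA512("password", Encoding.UTF8);
    Console.WriteLine(hex.Length); // 128 (512 bits / 8 * 2)
  }
}
```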

 

Can all this be simplified into one Method?

Who wants tons of methods, all doing just about the same thing? I don’t.

So is there a way to use one method and be done with it?

Sure! Turns out that the HashAlgorithm class has a static method called Create that makes it possible to generate all the HashAlgorithms you need.

So the modified, and final method is:

/// <summary>
/// Calculates the hash of the given text,
/// using the named hash algorithm.
/// </summary>
/// <remarks>
/// <para>
/// Acceptable Terms are the following 
/// (case sensitive):
/// <list>
/// <item>"MD5"</item>
/// <item>"SHA1"</item>
/// <item>"SHA256"</item>
/// <item>"SHA384"</item>
/// <item>"SHA512"</item>
/// </list>
/// </para>
/// </remarks>
/// <param name="text">input string</param>
/// <param name="enc">Character encoding</param>
/// <param name="hashType">Hash Algorithm Type</param>
/// <returns>The hash, as an uppercase hex string</returns>
public static string CalculateHash(string text, Encoding enc, string hashType) {

  //Convert string to byte buffer:
  byte[] buffer = enc.GetBytes(text);

  //HashAlgorithm is IDisposable:
  using (HashAlgorithm hashAlgorithm =
    HashAlgorithm.Create(hashType)) {

    byte[] hashBuffer =
      hashAlgorithm.ComputeHash(buffer);

    //Method a:
    //System.Text.StringBuilder sb =
    //  new System.Text.StringBuilder();
    //foreach (byte b in hashBuffer) {
    //  sb.Append(b.ToString("x2"));//'X2' for uppercase
    //}
    //return sb.ToString();

    //Method B: 
    //string hashString =
    //  BitConverter.ToString(hashBuffer);
    //The output looks like:
    //"19-E2-62-AE-3A-84-0D-72-1F-EF-32-C9-25-D1-A1-89-67-13-5F-58"
    //so you want to remove the dashes in most cases:
    //return hashString.Replace("-", "");

    //Method C: Homebrewed speed:
    //Convert the hash buffer (16 to 64 bytes, depending on the algorithm) to string...
    return ConvertHexByteArrayToString(hashBuffer);
  }
}


private static string ConvertHexByteArrayToString(byte[] buffer) {
  char[] result = new char[buffer.Length * 2];

  int i = 0;
  foreach (byte b in buffer) {
    result[i] = GetHexValue(b / 0x10);
    result[i + 1] = GetHexValue(b % 0x10);
    i += 2;
  }
  return new string(result, 0, result.Length);
}

private static char GetHexValue(int X) {
  return (char)((X < 10) ? (X + 0x30) : ((X - 10) + 0x41));
}   
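
As a usage sketch, here is a self-contained variant that can be run on its own. Note that it swaps the HashAlgorithm.Create(hashType) lookup for an explicit switch over the names listed in the doc comment above (my own substitution, so that a misspelled name raises an exception straight away); CreateAlgorithm and HashSketch are hypothetical names.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

public static class HashSketch {
  // Explicit name-to-algorithm mapping (an alternative to the string lookup).
  private static HashAlgorithm CreateAlgorithm(string hashType) {
    switch (hashType) {
      case "MD5": return MD5.Create();
      case "SHA1": return SHA1.Create();
      case "SHA256": return SHA256.Create();
      case "SHA384": return SHA384.Create();
      case "SHA512": return SHA512.Create();
      default:
        throw new ArgumentException("Unknown hash type: " + hashType, "hashType");
    }
  }

  public static string CalculateHash(string text, Encoding enc, string hashType) {
    using (HashAlgorithm algorithm = CreateAlgorithm(hashType)) {
      byte[] hash = algorithm.ComputeHash(enc.GetBytes(text));
      return BitConverter.ToString(hash).Replace("-", "");
    }
  }

  public static void Main() {
    // One method, five algorithms: only the name changes.
    foreach (string name in new[] { "MD5", "SHA1", "SHA256", "SHA384", "SHA512" }) {
      string hex = CalculateHash("password", Encoding.UTF8, name);
      Console.WriteLine(name + ": " + hex.Length + " hex chars");
    }
  }
}
```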

 

Salts

MD5, NT User table hashes, etc. can be cracked by brute force and Rainbow Table attacks (http://en.wikipedia.org/wiki/Rainbow_tables).

A very simple, easy-to-understand introduction to how this would happen is given in this article.

One way to thwart that is to prefix the data with a Salt.

Note:
I said prefix it (not append it, which you will often see on the net), as I’m led to believe that the math behind the algorithms works better if the salt is at the front.

One other thing about salts that leaves me a bit confused…
I’ve seen examples of methods (such as this one) that generate a random salt, if no salt is provided.
Frankly, I don’t understand that at all. That does hash it, but doesn’t that make it impossible to ever compare against it, since you don’t know the salt that was used? Strange world…
(i.e. if anybody understands that better than me, drop me a line…)
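
One common answer to that question is that the random salt is not meant to be secret: it is stored in the clear next to the hash (for example in its own column), so it can be read back at login and the calculation repeated. The sketch below illustrates that arrangement; SaltedHash, NewSalt and Hash are hypothetical names, and SHA-256, UTF-8 and the 16-byte salt size are assumptions of mine.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

public static class SaltedHash {
  // A fresh random salt; 16 bytes is an arbitrary choice for this sketch.
  public static byte[] NewSalt() {
    byte[] salt = new byte[16];
    using (RandomNumberGenerator rng = RandomNumberGenerator.Create()) {
      rng.GetBytes(salt);
    }
    return salt;
  }

  // Hash of salt + password bytes, with the salt as a prefix.
  public static string Hash(byte[] salt, string password) {
    byte[] pwd = Encoding.UTF8.GetBytes(password);
    byte[] input = new byte[salt.Length + pwd.Length];
    Buffer.BlockCopy(salt, 0, input, 0, salt.Length);
    Buffer.BlockCopy(pwd, 0, input, salt.Length, pwd.Length);
    using (SHA256 sha = SHA256.Create()) {
      return BitConverter.ToString(sha.ComputeHash(input)).Replace("-", "");
    }
  }

  public static void Main() {
    // Sign-up: generate a salt, then store BOTH the salt and the hash.
    byte[] storedSalt = NewSalt();
    string storedHash = Hash(storedSalt, "s3cret");

    // Login: read the stored salt back and repeat the calculation.
    Console.WriteLine(Hash(storedSalt, "s3cret") == storedHash); // True
    // With a different salt the same password no longer matches.
    Console.WriteLine(Hash(NewSalt(), "s3cret") == storedHash);  // False
  }
}
```

Because every user gets a different salt, two users with the same password end up with different hashes, and a precomputed rainbow table no longer applies directly.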

 

 

Going Crazy

Another way to thwart these kinds of attacks is to double up the hashing, i.e. MD5(MD5(…)), or even SHA1(MD5(…)). Or go even nuttier, which I have seen done:

MD5( MD5(Password + UserName) + Salt)

(my suggestion would have been to consider changing algorithm :-))

 

 

What Next?

Right – that just about covers the use of Cryptographic Hash Algorithms. Any questions?

Next step is to cover Cryptographic encryption of streams…

 

 

 

 

Further Reading: Other Hash algorithms to know about

Although the vast majority of the time you will come across MD5, SHA-1, and SHA-n you may encounter some of the following alternatives in your travels…

Algorithm               Output size (bits)     Collision
HAVAL                   256/224/192/160/128    Yes
MD2                     128                    Almost
MD4                     128                    Yes
MD5                     128                    Yes
PANAMA                  256                    Yes
RadioGatún              Arbitrarily long       No
RIPEMD                  128                    Yes
RIPEMD-128/256          128/256                No
RIPEMD-160/320          160/320                No
SHA-0                   160                    Yes
SHA-1                   160                    With flaws
SHA-256/224             256/224                No
SHA-512/384             512/384                No
Tiger(2)-192/160/128    192/160/128            No
WHIRLPOOL               512                    No