Understanding Hash Codes in C# – ASP.NET

To address the issue of integrity, it is common to make use of hash codes. In a nutshell, a hash code is a fixed-size numerical value computed from a given input. One interesting aspect of hash code values is that they provide a form of one-way transformation (often loosely called one-way encryption): the generated numeric value cannot practically be turned back into the original message data. For example, in the previous section, we examined how a strongly named assembly is assigned a digital signature based (in part) on a hash code value obtained from the assembly contents. Clearly a numerical value such as 79BB0DA9D45C6AE29F8 has no trace of the original assembly contents (types, methods, etc.).

To further illustrate the nature of hash codes, consider the method System.Object.GetHashCode. This virtual method may be overridden by derived types to generate a hash value based on their internal state data. The System.String class has overridden this method to return a hash value based on the current character data. Thus, if you have two identical strings (in the same case), System.String.GetHashCode will return the same value within a given run of the program. If even one character differs by case or content, you will usually (but not always) receive a different numerical value.

Please note: collisions are always possible, because there are far more possible inputs than hash values. With a 32-bit value such as the one returned by GetHashCode they are easy to find; with longer hashes such as MD5 or SHA256 an accidental collision is very unlikely. Either way, hash codes are not unique identifiers; think of a hash code as a checksum. Ponder the following class definition:

using System;

class Program
{
    static void Main(string[] args)
    {
        Console.WriteLine("***** Fun with Hash Codes *****");
        // Identical strings yield identical hash codes.
        Console.WriteLine("Hash of 'Hello': {0}", "Hello".GetHashCode());
        Console.WriteLine("Hash of 'Hello': {0}", "Hello".GetHashCode());
        // A change in case (usually) yields a different hash code.
        Console.WriteLine("Hash of 'HellO': {0}", "HellO".GetHashCode());
        Console.ReadLine();
    }
}

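Notice that the first two string objects have identical content and case, while the final string has a capitalized letter O. Now ponder the output: the first two lines report the same hash value, while the third will usually report a different one. The exact numbers depend on the runtime, and on recent .NET versions string hash codes are randomized per process, so they can change from one run to the next.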

Of course, when you’re interested in generating hash codes for large blocks of data or sensitive user information, you won’t leverage GetHashCode. Truth be told, overriding this virtual method is mainly useful when you’re designing types that may be placed in a hash-based collection such as Hashtable. Luckily, the .NET platform ships with types that provide implementations of many well-known hash code algorithms. The algorithms differ in the block size they operate on and in the size of the hash code they generate.
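To make the point about Hashtable concrete, here is a minimal sketch of a type that overrides GetHashCode together with Equals. The Point class, its fields, and the multiplier 397 are assumptions chosen purely for illustration; they do not come from the framework or from the earlier example.

```csharp
using System;
using System.Collections;

// Hypothetical type used only for illustration.
public sealed class Point
{
    public int X { get; }
    public int Y { get; }

    public Point(int x, int y) { X = x; Y = y; }

    // Equals and GetHashCode must agree: equal objects must return equal hash codes.
    public override bool Equals(object obj)
    {
        Point other = obj as Point;
        return other != null && X == other.X && Y == other.Y;
    }

    public override int GetHashCode()
    {
        // Combine both fields; 'unchecked' lets the arithmetic wrap instead of throwing.
        unchecked
        {
            return (X * 397) ^ Y;
        }
    }
}

public static class PointDemo
{
    public static void Main()
    {
        Hashtable labels = new Hashtable();
        labels[new Point(1, 2)] = "first";

        // A different instance with the same state finds the same entry.
        Console.WriteLine(labels[new Point(1, 2)]);              // prints "first"
        Console.WriteLine(labels.ContainsKey(new Point(2, 1)));  // prints "False"
    }
}
```

Without the two overrides, the second Point(1, 2) would be treated as a different key, because the default implementations are based on object identity rather than on the field values.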

Hashing a File

Once you have determined the hash code algorithm you wish to use, you can create an instance of the algorithm using the static HashAlgorithm.Create method. Simply pass in a string name of the algorithm you require (MD5, SHA1, SHA256, SHA384, or SHA512). Note that recent .NET versions mark this name-based factory as obsolete in favor of the algorithm-specific factories such as MD5.Create() or SHA256.Create(). Assume you wish to generate a hash code for a file on your local machine:

using System;
using System.IO;
using System.Security.Cryptography;

class Program
{
    static void Main(string[] args)
    {
        // Open a local file on the C drive; 'using' closes it even if hashing throws.
        using (FileStream fs = new FileStream(@"C:\MyData.txt", FileMode.Open, FileAccess.Read))
        // Now generate a hash code for this file using MD5.
        using (HashAlgorithm alg = HashAlgorithm.Create("MD5"))
        {
            byte[] fileHashValue = alg.ComputeHash(fs);
            // Print out the generated hash code, one hex pair per byte.
            Console.WriteLine("Hash code of MyData.txt");
            foreach (byte x in fileHashValue)
                Console.Write("{0:X2} ", x);
        }
        Console.ReadLine();
    }
}

Notice how hash values are represented using a simple array of bytes, whose length is fixed by the algorithm (16 bytes for MD5) regardless of the input size. Therefore, even if MyData.txt contained thousands of lines of text, the entire contents might be represented as:

79 DC DA F4 5B F6 5C 0B B0 DA 9D 45 C6 AE 29 F8

If you were to change even a single character within MyData.txt, the new hash code would almost certainly be different:

B3 E3 DD 14 96 2D D2 EB 0E C3 68 BF 08 04 D5 80
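The same effect can be reproduced without a file on disk. The following is a minimal sketch that hashes in-memory strings; it swaps in SHA256 (via the algorithm-specific SHA256.Create factory) for the MD5 used above, and the Sha256Hex helper is a hypothetical name introduced only for this example.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

public static class HashCompareDemo
{
    // Hypothetical helper: returns the SHA-256 hash of a string as space-separated hex pairs.
    public static string Sha256Hex(string text)
    {
        using (SHA256 alg = SHA256.Create())
        {
            byte[] hash = alg.ComputeHash(Encoding.UTF8.GetBytes(text));
            // BitConverter emits "XX-XX-..."; swap the dashes for spaces.
            return BitConverter.ToString(hash).Replace("-", " ");
        }
    }

    public static void Main()
    {
        string a = Sha256Hex("Hello");
        string b = Sha256Hex("Hello");
        string c = Sha256Hex("HellO");

        Console.WriteLine(a);
        Console.WriteLine(c);
        Console.WriteLine(a == b); // prints "True": same input, same hash
        Console.WriteLine(a == c); // prints "False": one character changed
    }
}
```

SHA256 always yields 32 bytes, so each printed line holds 32 hex pairs no matter how long the input string is.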

Again, using hash codes you’re able to represent sensitive data as a fixed-size byte array that contains no trace of the original message data. In a distributed system, one of the most common uses of this technology is storing password information. By storing a user’s password in a hash code format, you increase the security of your system, given that this numerical value has no trace of the original password. When the end user attempts to log into your system again, you simply rehash the submitted password and compare the result against the persisted value.

Many password hashing schemes also incorporate a salt value. Simply put, salting is the process of mixing a random value into the input of the hash algorithm, so that two users with the same password end up with different stored hashes and precomputed lookup tables become far less useful.
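As a rough sketch of the store-and-compare flow with a salt, the example below prepends a random salt to the password bytes and hashes the result with SHA256. The helper names (NewSalt, HashPassword, Verify), the 16-byte salt size, and the sample password are all assumptions made for illustration. Production systems typically prefer a deliberately slow key-derivation function such as PBKDF2 over a single fast hash, so treat this as a demonstration of the idea rather than a recommended scheme.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

public static class SaltedHashDemo
{
    // Hypothetical helper: creates a cryptographically random salt of the given size.
    public static byte[] NewSalt(int size)
    {
        byte[] salt = new byte[size];
        using (RandomNumberGenerator rng = RandomNumberGenerator.Create())
        {
            rng.GetBytes(salt);
        }
        return salt;
    }

    // Hypothetical helper: hashes (salt + password) with SHA-256.
    public static byte[] HashPassword(string password, byte[] salt)
    {
        byte[] pwd = Encoding.UTF8.GetBytes(password);
        byte[] input = new byte[salt.Length + pwd.Length];
        Buffer.BlockCopy(salt, 0, input, 0, salt.Length);
        Buffer.BlockCopy(pwd, 0, input, salt.Length, pwd.Length);
        using (SHA256 alg = SHA256.Create())
        {
            return alg.ComputeHash(input);
        }
    }

    // Rehashes the login attempt with the stored salt and compares it to the stored hash.
    public static bool Verify(string attempt, byte[] salt, byte[] storedHash)
    {
        byte[] candidate = HashPassword(attempt, salt);
        if (candidate.Length != storedHash.Length) return false;

        // Accumulate differences so every byte is always examined.
        int diff = 0;
        for (int i = 0; i < candidate.Length; i++)
            diff |= candidate[i] ^ storedHash[i];
        return diff == 0;
    }

    public static void Main()
    {
        // At registration: persist both the salt and the hash, never the password.
        byte[] salt = NewSalt(16);
        byte[] stored = HashPassword("s3cret", salt);

        // At login: rehash the attempt with the same salt.
        Console.WriteLine(Verify("s3cret", salt, stored)); // prints "True"
        Console.WriteLine(Verify("S3cret", salt, stored)); // prints "False"
    }
}
```

The salt is not a secret; it is stored next to the hash so that the same computation can be repeated at login.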

Feel free to post your comments or questions to this tutorial!

16 comments

#1 Jeff Quaz on 05.30.06 at 10:54 pm

Thanks for this nice Tutorial! Like to see more of such stuff!

#2 SysLord on 05.07.07 at 7:57 pm

😎 Great! I like things that work that well! Thanks very much.

#3 Kevin on 07.21.07 at 12:35 am

AWESOME! 😆 One of the better articles of explaining HASH to beginners.

#4 Liu on 09.17.07 at 9:09 pm

There is a lot of junk on this topic on the web, but this one is the real thing.
Thanks!

#5 rajiv on 10.23.07 at 8:10 am

very good article… Thanks a loooooooooot 😆

#6 TC on 01.01.08 at 12:21 am

Well written article! Thanks

#7 Premkumar on 01.02.08 at 9:14 am

Good piece!!
I was looking for the various hashing and compression-mapping methods implemented in the .Net framework. Please mail me if known.

#8 Rolle on 01.06.08 at 1:53 pm

This article saved me time. Thanks! 🙂

#9 shikhachamoli on 02.28.08 at 9:01 am

what is the meaning of “salt”?

#10 julian on 06.07.08 at 12:01 am

I am new to programming. What is “hash”, what is it used for? thanks

#11 gandhi mathi on 06.27.08 at 7:45 am

superb article….really this article very useful to me

#12 Jens on 08.05.08 at 9:10 pm

EDIT: @Jens, thanks for pointing this out, I updated the post to reflect the correct circumstances.

Quoted: “If even one character differs by case or content, you receive a unique numerical value.”

No! It is false. Many strings will have the same hash code, and it *has* to be that way — there are many more strings than integers. Depending on the hash code algorithm the collision will be for different strings, but there will always be an enormous number of strings with the same hash code. For example, with the default C# String hash code, the strings “699391” and “1241308” have the same hash code (9FDCD8F4). Sure, “Hello” and “HellO” happen to have different hash codes, but they don’t need to — a valid implementation could assign them the same value.

(With longer hash values (such as from MD5) you can get away with luck in practice since the chance of collision becomes negligible, but the hash values are still *not* unique, and cannot be.)

#13 Red on 11.10.08 at 8:52 pm

Really cool stuff, well versed and easy to understand, really cool.

#14 VM on 01.05.10 at 10:16 pm

Pretty cool. Thanks

#15 Gaurav on 06.18.10 at 1:36 am

Thanks, very useful information for hash code………

#16 Prakash.Kr on 02.23.12 at 12:54 pm

Nice. This code helps my project. thanks a lot!
