Sprouting Synapses

             
I’ve been looking at several posts (I started searching from this position) on MD5 (I’m currently going over some cryptographic hash algorithm stuff) and found several heated arguments about why getting the same results as PHP on the .NET platform is difficult, but no clear answer as to which Encoding actually works in the end…

So I put together a test to sort out the answer…

 

The issue is that the result of the MD5 hash algorithm on .NET depends a lot on the Encoding used to turn the string into bytes.

So I first pushed some tests through PHP to get something to compare against:

<li><? echo MD5("hello")?> Gives 5d41402abc4b2a76b9719d911017c592
<li><? echo MD5("La Defense")?> Gives 5fa39a733b79d60627e00a62aeebe8a3
<li><? echo MD5("La Défense")?> Gives 2d445db41da3dc087b35ca16f6ff8a45

(I’ve edited the code I used so that each line also shows the answer I got.)

Ok. So we now know what we want, let’s push this through some code in .NET.

The MD5 method I am using is:

using System;
using System.Security.Cryptography;
using System.Text;

/// <summary>
/// Returns the MD5 hash of the given text as a string of hex digits.
/// </summary>
/// <param name="text">The text to hash.</param>
/// <param name="encoding">The encoding used to convert the text to bytes.</param>
/// <returns>The 16-byte hash, rendered as hex digits.</returns>
public static string TextToMD5(string text, Encoding encoding) {

    //Convert string to byte buffer:
    byte[] buffer = encoding.GetBytes(text);

    //Create the hash value from the array of bytes.
    //(MD5.Create() is the factory that MD5CryptoServiceProvider.Create() resolves to.)
    byte[] hashBuffer;
    using (HashAlgorithm hashAlgorithm = MD5.Create()) {
        hashBuffer = hashAlgorithm.ComputeHash(buffer);
    }

    //Method A:
    //System.Text.StringBuilder sb =
    //  new System.Text.StringBuilder();
    //foreach (byte b in hashBuffer) {
    //  sb.Append(b.ToString("x2"));//'X2' for uppercase
    //}
    //return sb.ToString();


    //Method B:
    //string hashString =
    //  BitConverter.ToString(hashBuffer);
    //The output for "hello" looks like:
    //"5D-41-40-2A-BC-4B-2A-76-B9-71-9D-91-10-17-C5-92"
    //so you want to remove the hyphens in most cases:
    //return hashString.Replace("-", "");

    //Method C: Homebrewed speed:
    //Convert the 16-byte MD5 buffer to a hex string
    //(ConvertHexByteArrayToString is a helper not shown in this post)...
    return ConvertHexByteArrayToString(hashBuffer);
}
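
ConvertHexByteArrayToString isn’t shown in the post, so here is a minimal sketch of what such a helper might look like. The class and method names (HexSketch, ToHex, Md5Hex) are hypothetical, not the original code; the only thing assumed from the post is that the output is uppercase hex with no separators.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

public static class HexSketch
{
    private const string HexDigits = "0123456789ABCDEF";

    // Hypothetical stand-in for ConvertHexByteArrayToString:
    // two uppercase hex digits per byte, no separators.
    public static string ToHex(byte[] bytes)
    {
        char[] chars = new char[bytes.Length * 2];
        for (int i = 0; i < bytes.Length; i++)
        {
            chars[2 * i] = HexDigits[bytes[i] >> 4];       // high nibble
            chars[2 * i + 1] = HexDigits[bytes[i] & 0x0F]; // low nibble
        }
        return new string(chars);
    }

    // MD5 of the text under the given encoding, as 32 hex digits.
    public static string Md5Hex(string text, Encoding encoding)
    {
        using (MD5 md5 = MD5.Create())
        {
            return ToHex(md5.ComputeHash(encoding.GetBytes(text)));
        }
    }
}
```

An MD5 hash is always 16 bytes, so ToHex returns 32 characters for it; HexSketch.Md5Hex("hello", Encoding.ASCII) should line up with the PHP value for "hello" once case is ignored.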

 

What I wanted to test was every Encoding class I could find in the system, and see which one would produce the same results.

The code I used was:

[Test]
public void Test_MD5_All_Possibilities() {

    Encoding[] encoders =
        new Encoding[] {
            Encoding.ASCII,
            Encoding.UTF7, Encoding.UTF8, Encoding.UTF32,
            Encoding.Unicode,
            Encoding.BigEndianUnicode,
            Encoding.Default};

    string[] texts = new string[] { "hello", "La Defense", "La Défense" };

    foreach (Encoding encoder in encoders) {
        Console.WriteLine("---");
        foreach (string text in texts) {
            string result =
                XAct.Core.Utils.Cryptography.TextToMD5(text, encoder);
            string s =
                string.Format("{0} [{1}, {2}]", result, text, encoder.ToString());
            Console.WriteLine(s);
        }
    }
}

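Hopefully I’ve missed one somewhere, because the answers I got are listed below. (Only the last 24 of each hash’s 32 hex digits appear here, which is still enough to compare against the tail of the PHP values. Encoding.Unicode and Encoding.BigEndianUnicode both report themselves as System.Text.UnicodeEncoding, hence the two blocks with that name.)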

---
BC4B2A76B9719D911017C592 [hello, System.Text.ASCIIEncoding]
3B79D60627E00A62AEEBE8A3 [La Defense, System.Text.ASCIIEncoding]
9835A2ED875F7BDA6746FD5C [La Défense, System.Text.ASCIIEncoding]
---
BC4B2A76B9719D911017C592 [hello, System.Text.UTF7Encoding]
3B79D60627E00A62AEEBE8A3 [La Defense, System.Text.UTF7Encoding]
D5F8CA0EA7D2D8B895F0A186 [La Défense, System.Text.UTF7Encoding]
---
BC4B2A76B9719D911017C592 [hello, System.Text.UTF8Encoding]
3B79D60627E00A62AEEBE8A3 [La Defense, System.Text.UTF8Encoding]
20302DD82236C68DFC0362DA [La Défense, System.Text.UTF8Encoding]
---
1AD0127E555C051D15806EB5 [hello, System.Text.UTF32Encoding]
97FB8D648CD2B2125064D13D [La Defense, System.Text.UTF32Encoding]
6189EB82E53C05A5D3604FA8 [La Défense, System.Text.UTF32Encoding]
---
9A16B1BF2BD2F44E495E14C9 [hello, System.Text.UnicodeEncoding]
3340D4383A4646EB5D8AF519 [La Defense, System.Text.UnicodeEncoding]
95C80B5E2907CD4D3005BCEB [La Défense, System.Text.UnicodeEncoding]
---
13CA2631D3982CD37FBDCD8B [hello, System.Text.UnicodeEncoding]
D976518CD4ECCA6F1F7F6B38 [La Defense, System.Text.UnicodeEncoding]
3B780D6F19021EF62A11D167 [La Défense, System.Text.UnicodeEncoding]
---
BC4B2A76B9719D911017C592 [hello, System.Text.SBCSCodePageEncoding]
3B79D60627E00A62AEEBE8A3 [La Defense, System.Text.SBCSCodePageEncoding]
1DA3DC087B35CA16F6FF8A45 [La Défense, System.Text.SBCSCodePageEncoding]

 

Why people in English-speaking countries can easily get fooled by their results…
What’s significant is that if you work only in English, you could easily fool yourself into thinking that ASCII, UTF7 and UTF8 would all work, since all three give the PHP result for plain ASCII text…

But as soon as you use French characters (basically anything outside plain English), you see that the only one that matches the PHP output is the Default Encoding, which on my machine actually maps to something called SBCSCodePageEncoding.
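
To make the difference visible, here is a small sketch that dumps the bytes each encoding produces for a single "é". The class and method names are made up for illustration, and ISO-8859-1 is used as a stand-in for a single-byte Western code page (an assumption on my part, chosen because it is available on every .NET runtime). The hash is computed over these bytes, so different bytes mean a different MD5.

```csharp
using System;
using System.Text;

public static class EncodingBytesSketch
{
    // Hex dump of the bytes an encoding produces for the given text.
    public static string Dump(string text, Encoding encoding)
    {
        return BitConverter.ToString(encoding.GetBytes(text));
    }

    public static void Demo()
    {
        // 'é' written as an escape so this file's own encoding cannot interfere.
        string eAcute = "\u00E9";

        Console.WriteLine(Dump(eAcute, Encoding.ASCII));            // 3F (replaced by '?')
        Console.WriteLine(Dump(eAcute, Encoding.UTF8));             // C3-A9
        Console.WriteLine(Dump(eAcute, Encoding.Unicode));          // E9-00
        Console.WriteLine(Dump(eAcute, Encoding.BigEndianUnicode)); // 00-E9
        Console.WriteLine(Dump(eAcute, Encoding.UTF32));            // E9-00-00-00

        // Assumed stand-in for a single-byte Western code page.
        Console.WriteLine(Dump(eAcute, Encoding.GetEncoding("iso-8859-1"))); // E9

        // Plain ASCII text gives identical bytes in ASCII and UTF8,
        // which is why English-only tests look fine.
        Console.WriteLine(Dump("e", Encoding.ASCII) == Dump("e", Encoding.UTF8)); // True
    }
}
```

Note that ASCII silently turns "é" into "?", so its hash for "La Défense" is really the hash of "La D?fense".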

 

What is this undocumented SBCSCodePageEncoding?

First of all, I found no documentation for it.

This post thinks the SBCS stands for Single Byte Character Set, but isn’t 100% sure.

Secondly, with Reflector, I see that it’s marked internal (great…)

Again with Reflector, I see that the Default encoding is picked based on the system’s ANSI code page:

private static Encoding CreateDefaultEncoding(){
    //This function retrieves the current
    //ANSI code-page identifier for the system
    int aCP = Win32Native.GetACP();
    //0x4e4 is 1252, the Windows Western European code page (Windows-1252)
    if (aCP == 0x4e4){
        return new SBCSCodePageEncoding(aCP);
    }
    return GetEncoding(aCP);
}

…hum… “code page identifiers” is a bit beyond me at this point, so I’m not sure what I’m looking at…

But even if I don’t have enough knowledge of this particular area, it’s the if that bugs me, along with the fact that the SBCS class is marked internal and therefore can’t be specified directly if I really need to…
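
On the worry about not being able to name the encoding: the concrete class is internal, but a code page can be requested through the public Encoding.GetEncoding call. A minimal sketch, assuming 1252 (0x4e4) is the code page in question. The class and method names are made up for illustration, and the ISO-8859-1 fallback is my own assumption for runtimes that do not provide code page 1252 without an extra encoding provider. It gives the same bytes for this test string, though not for every character.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

public static class CodePageSketch
{
    // Asks for Windows-1252 by number instead of relying on Encoding.Default.
    public static Encoding WesternSingleByte()
    {
        try
        {
            return Encoding.GetEncoding(1252);
        }
        catch (Exception ex) when (ex is NotSupportedException || ex is ArgumentException)
        {
            // Assumed fallback: ISO-8859-1 agrees with Windows-1252
            // everywhere except the 0x80-0x9F range.
            return Encoding.GetEncoding("iso-8859-1");
        }
    }

    // Lowercase hex, to match the way PHP prints its hashes.
    public static string Md5Hex(string text, Encoding encoding)
    {
        using (MD5 md5 = MD5.Create())
        {
            byte[] hash = md5.ComputeHash(encoding.GetBytes(text));
            return BitConverter.ToString(hash).Replace("-", "").ToLowerInvariant();
        }
    }

    public static void Demo()
    {
        // PHP value for "La Défense" as quoted in this post.
        string phpValue = "2d445db41da3dc087b35ca16f6ff8a45";
        string hash = Md5Hex("La D\u00E9fense", WesternSingleByte());
        Console.WriteLine(hash);
        Console.WriteLine(hash == phpValue);
    }
}
```

Pinning the code page this way means the result no longer depends on what Encoding.Default happens to be on the machine running the code.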

 

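Are PHP5 MD5s good enough to emulate – or should it be the other way around?
Thinking about it for a second, is it an issue?

Not if the data is hashed and checked by code that turns the text into the same bytes. As far as I can tell, PHP strings are plain byte sequences and md5() hashes those bytes as-is, so my PHP results reflect whatever encoding the text had when it reached PHP (here, presumably, the single-byte Western encoding my .php file was saved in). That would explain why only the Default encoding matched on the .NET side.

But…for the larger picture, that means the hash is only portable if everyone agrees on the encoding: correct me if I am wrong – but I think an MD5 of the same text produced on a station in China is not necessarily the same as one produced on a station in France, or in the US…or NZ, unless the encoding is pinned down explicitly on both sides (UTF-8, say).

Wow.

Their md5_file function should be safe from this, since it hashes a file’s bytes directly and no text encoding comes into it.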
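
For comparison, here is a minimal C# sketch of hashing at the byte level, the way a file hash works. No Encoding appears anywhere, so there is nothing machine-specific to disagree on. The class and method names are hypothetical.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

public static class StreamHashSketch
{
    // Hashes the raw bytes of a stream; no text encoding is involved.
    public static string Md5Hex(Stream stream)
    {
        using (MD5 md5 = MD5.Create())
        {
            byte[] hash = md5.ComputeHash(stream);
            return BitConverter.ToString(hash).Replace("-", "").ToLowerInvariant();
        }
    }

    // Convenience wrapper for a file on disk (the path is whatever the caller supplies).
    public static string Md5HexOfFile(string path)
    {
        using (FileStream file = File.OpenRead(path))
        {
            return Md5Hex(file);
        }
    }
}
```

Two machines hashing the same file this way should get the same value whatever their default code page is, because the bytes on disk are the input.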

 

Responses Always Appreciated

If anyone has more data or simply insight into this, let me know, so that I can correct any of the above if needed.

Copyright 2007 by Sky Sigal