Talk:MD5/Archive 1


Another online hashing tool

I don't know if it's OK to add this to the external links as well:

[1]

Online hash reversal and generator —Preceding unsigned comment added by 68.151.45.101 (talk) 02:33, 20 January 2009 (UTC)

May be interesting: among the other links there's one specific to MD5 hashes. This one is generic and, moreover, allows files to be uploaded and hashed online. —Preceding unsigned comment added by 87.17.86.109 (talk) 20:37, 12 January 2008 (UTC)

Byte-ordering of test hashes

I implemented the pseudo-code in Delphi and tested the empty string result - my output was correct, but the byte-ordering was reversed. For the sake of program simplicity, I split the hash into four DWORDs.

Wikipedia's result:

d41d8cd9 8f00b204 e9800998 ecf8427e

My result:

d98c1dd4 04b2008f 980980e9 7e42f8ec

Each 32-bit block has the bytes reversed.

Can anyone confirm if the hashes on the page have big-endian 32-bit blocks or if I need to make changes to my own implementation? (Note: the test program was run on Windows XP, which is naturally little-endian, and no changes were made to byte ordering after the algorithm)

WhiteCrane 23:57, 13 April 2007 (UTC)

I tried implementing it and had the same results as you, with the bytes reversed:
d98c1dd4 04b2008f 980980e9 7e42f8ec
(Using Windows 98, Intel Pentium III processor, which I think is little-endian). Make what you will of this.
Seanqtx 20:44, 12 May 2007 (UTC)
Use byte arrays then. -- intgr 10:26, 13 May 2007 (UTC)
The empty MD5 array is d41d8cd98f00b204e9800998ecf8427e, as appears on RFC 1321. --Platonides (talk) 22:28, 25 November 2007 (UTC)
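The reversed output reported in this thread is what one would expect if the four 32-bit state words are printed as numbers instead of being written out low-order byte first. A minimal Python sketch, assuming the standard-library hashlib and struct modules (the helper name words_as_hex is made up for illustration):

```python
import hashlib
import struct

def words_as_hex(digest: bytes) -> str:
    """Read a 16-byte digest as four little-endian 32-bit words and print each as a number."""
    words = struct.unpack("<4I", digest)
    return " ".join(f"{w:08x}" for w in words)

digest = hashlib.md5(b"").digest()
print(digest.hex())          # d41d8cd98f00b204e9800998ecf8427e
print(words_as_hex(digest))  # d98c1dd4 04b2008f 980980e9 7e42f8ec
```

The first line is the byte sequence as published; the second matches the per-word reversed form, so the computation itself may well be right and only the final serialisation differs.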
The example C code appears to have an endianness problem here:
uint32_t bits_len = 8*initial_len; // note, we append the len
memcpy(msg + new_len, &bits_len, 4); // in bits at the end of the buffer
Using memcpy on the memory location of a uint32_t variable would presumably produce the wrong result on a little endian system. RFC 1321 states (while referring to the appending of the length):
"(These bits are appended as two 32-bit words and appended low-order word first in accordance with the previous conventions.)"
The code should copy one byte of the length at a time, or use some function to correctly order the data into big endian order. Also, use of uint64_t would be more proper.
Dk white 7:51, 18 Feb 2013 (UTC)
Another endian problem:
uint32_t *w = (uint32_t *) (msg + offset);
This will likely produce an incorrect value for w[n] on little endian systems, when referenced.
Dk white 15:14, 19 Feb 2013 (UTC)
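One way to sidestep host byte order entirely is to serialise the length and read the message words with an explicit byte order. RFC 1321 describes the length as a 64-bit value appended low-order word first, and its conventions put the low-order byte first within each word, so the sketch below assumes little-endian packing; on that reading a raw memcpy of a native integer would match on a little-endian host and differ on a big-endian one. The function names md5_pad and block_words are hypothetical, not taken from the article's C example.

```python
import struct

def md5_pad(message: bytes) -> bytes:
    """Append the 0x80 marker, zero padding and the 64-bit bit length (little-endian)."""
    bit_len = (8 * len(message)) & 0xFFFFFFFFFFFFFFFF  # length in bits, modulo 2**64
    padded = message + b"\x80"
    padded += b"\x00" * ((56 - len(padded)) % 64)       # pad to 56 bytes modulo 64
    padded += struct.pack("<Q", bit_len)                # explicit byte order, host independent
    return padded

def block_words(block: bytes) -> tuple:
    """Split one 64-byte block into sixteen 32-bit words, low-order byte first."""
    return struct.unpack("<16I", block)

padded = md5_pad(b"abc")
print(len(padded), padded[-8:].hex())
print(f"{block_words(padded[:64])[0]:08x}")
```

Because the byte order is spelled out in the format strings, the same code gives the same bytes on either kind of machine.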

Sfv format article

Hello, can someone help clean up Sfv checksum format article, as well as perhaps have a list of checksum formats as well? crc32 is a format, which sfv uses, there must be others. --ShaunMacPherson 04:33, 14 Apr 2004 (UTC)

Input

We need to reword this better:

Wikipedia --> 20ee8f504f73e6894f328d1194280bcb
WIKIPEDIA --> b2f4895c3df311be0e3b07edc0974534

Firstly, we should probably avoid self-references, so we might be better off changing Wikipedia to something else; secondly, we need to be explicit in how "Wikipedia" is interpreted into the input bitstring used by MD5. Is this string represented as ASCII? — Matt

  • Feel free to have at it. I'll tell you what I was trying to do. I'd like to show somehow to the casual reader that hashes carry no easily observable characteristics of the inputs, so that something like ABC and ABCD or ABD will likely have dissimilar looking hashes. Probably belongs in cryptographic hash function or something like that, but might be a nice exercise here. Or if not, we can scrap it entirely :) Jewbacca 22:32, Aug 18, 2004 (UTC)

P.S. Found this in cryptographic hash function which expresses it well:

"Broadly speaking, the security properties are required to ensure that the digest is 'random' to prospective attackers, and does not leak any information about the message itself, and that other messages cannot be found that produce the same digest. Any change to the message, even a single bit, should result in a dramatically different message digest when re-generated from the received message."

Jewbacca 22:35, Aug 18, 2004 (UTC)
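On the question of how the string becomes a bitstring: MD5 works on bytes, so the result depends on the chosen encoding. A small sketch, assuming Python's standard hashlib (md5_hex is a hypothetical helper); for plain letters ASCII and UTF-8 give the same bytes, while UTF-16 does not:

```python
import hashlib

def md5_hex(text: str, encoding: str = "ascii") -> str:
    """Encode the text explicitly, then hash the resulting bytes."""
    return hashlib.md5(text.encode(encoding)).hexdigest()

for word in ("Wikipedia", "WIKIPEDIA"):
    print(word, "-->", md5_hex(word))

# Same characters, different bytes, hence a different digest.
print("UTF-16-LE -->", md5_hex("Wikipedia", "utf-16-le"))
```

Stating the encoding next to any example hash would remove the ambiguity raised above.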

Round nonsense

Has anyone ever noticed that the concept of breaking X rounds out of a particular hash function's total is a little silly? For example, if you make a hash function out of just one group of 16 rounds in MD5, a chosen-hash attack can be done with Windows Calculator. The concept of "round" really ought to be taken to mean the number of times each input bit is reused. It is the simultaneous congruences that gives hash functions their security, and how you make those is by using each input bit more than once. -- Myria 07:21, 19 Oct 2004 (UTC)

What? --Ihope127 21:05, 11 July 2005 (UTC)

What an effort - Unrealistic

I understand that people are busy cracking these algorithms but to me, the effort required is just impossible and unrealistic. You really have to work hard, perhaps for the rest of your life, to get anything meaningful, and by the way there is no system that is foolproof!

Simba

Hey Simba; what makes you think that the people that are busy cracking these algorithms are the ones writing encyclopedia articles, or that they will read this talk page? I suggest you duplicate your comments on sci.crypt for a better interaction with your target audience. — Matt 12:17, 19 Oct 2004 (UTC)


Does this mean that it is possible to get the reverse of an MD5 hash within minutes now?

What do you mean by a reverse? mic 01:33, 29 May 2006 (UTC)

Sorry, I meant a collision. Basically if I gave you an MD5 hash, could you tell me a collision for it?

That's not a collision. A collision is where you create two texts which have the same hash - this has been done for MD5 many times and can be done within minutes. You're talking about a preimage, or a second preimage; no practical algorithm is currently known for this, except for generic algorithms based on guessing the original string.
As Matt Crypto says, sci.crypt is a better place for these questions; this page is for discussion of the article, not for discussion of MD5. — ciphergoth 07:26, 10 June 2006 (UTC)
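The difference between the two problems can be illustrated on a deliberately weakened hash. The sketch below is a toy, assuming a 16-bit hash made from the first two bytes of MD5 (tiny_hash, find_collision and find_preimage are made-up names); it uses plain guessing only and says nothing about the real attacks on full MD5.

```python
import hashlib
from typing import Optional, Tuple

def tiny_hash(data: bytes) -> bytes:
    """First 2 bytes of MD5: a 16-bit toy hash, weak on purpose."""
    return hashlib.md5(data).digest()[:2]

def find_collision() -> Tuple[bytes, bytes, int]:
    """Collision: any two different messages with equal hashes; the attacker picks both."""
    seen = {}
    counter = 0
    while True:
        msg = str(counter).encode("ascii")
        h = tiny_hash(msg)
        if h in seen:
            return seen[h], msg, counter + 1
        seen[h] = msg
        counter += 1

def find_preimage(target: bytes, max_tries: int = 2_000_000) -> Optional[Tuple[bytes, int]]:
    """Preimage: a message matching a hash value fixed in advance by someone else."""
    for counter in range(max_tries):
        msg = str(counter).encode("ascii")
        if tiny_hash(msg) == target:
            return msg, counter + 1
    return None

if __name__ == "__main__":
    a, b, tries = find_collision()
    print("collision after", tries, "tries:", a, b)
    print("preimage search:", find_preimage(tiny_hash(b"some secret input")))
```

With a 16-bit output the collision search must stop within 65537 tries and usually stops far sooner, while matching a fixed target takes on the order of 65536 guesses; the gap widens enormously at 128 bits.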

Diagram

The picture does not correspond with the description of the algorithm:

In the picture the long expression is assigned to b; in the algorithm it is assigned to a.

You're right... I fixed it. Sorry about that. -- Myria 18:19, 28 Dec 2004 (UTC)

The diagram also does not appear to correspond with the intuition of how A,B,C, and D correspond to the inputs with X, Y, and Z. In the diagram, D is given as the 'top' input, which might cause someone to mistakenly assume that D, C, and B correspond (respectively) to X, Y, and Z. Maslen (talk) 04:59, 4 November 2012 (UTC)

Implementations section

I'm worried about this new section becoming a spam trap; Wikipedia isn't really meant to be a link farm. If people start adding lots of things to external links, then they can be easily removed under the argument that we only select a few high-quality links. However, if they are in an "Implementations" section, then that argument is weakened; it's hard to argue that someone's VB script MD5 school project (or whatever!) is not a valid addition to such a section. My suggestion is that we keep it as before. There's likely hundreds of MD5 implementations, and we don't really want to be listing them all. — Matt Crypto 17:12, 24 October 2005 (UTC)

We cannot keep it as it was before, because some of the links have become internal and cannot be listed under external links anymore. I didn't name the section "All implementations" however, and I'm all for culling and keeping only links that really add value. Personally I think one good implementation in C++, Java and VB is enough. People shouldn't use {VB|Java}Script for crypto in any case. Shinobu 16:41, 25 October 2005 (UTC)
Just get rid of it. It's just a spamtrap that will need constant surveillance, and Wikipedia is as always not a linkfarm. Haakon 20:07, 26 March 2006 (UTC)
I think I agree with you. We did this on SHA hash functions, and it's better for it. People can't seem to resist the temptation to advertise on our hash function pages, for some reason. It's been happening for months, if not years. Ideally, we should link instead to a page that lists MD5 implementations, and indeed we do, the "Unofficial MD5 homepage". — Matt Crypto 22:39, 26 March 2006 (UTC)
If you like you can direct these people to my wiki LiteratePrograms.org where I would be happy to accept their implementations as contributions. Sometimes they just need an outlet. Deco 10:13, 10 June 2006 (UTC)
There is a case for linking to the Dyalog implementation or listing it as well as the pseudocode. Kenneth Iverson derived the APL programming language from the mathematical notation he devised for describing algorithms; describing algorithms is what the language was designed for. While an IBM Fellow he launched Expository Programming, a forerunner of Literate Programming, using APL as a descriptive and executable notation for teaching and exploring algorithms in various fields, such as X-ray crystallography. The CMN versions of the F G H and I functions in the algorithms section of this article are clearly recognisable in the APL source. 5jt (talk) 14:34, 10 April 2008 (UTC)

Disputed

"... because the current collision-finding techniques allow the preceding hash state to be specified arbitrarily, a collision can be found for any desired prefix."

This seems to be claiming that preimage attacks exist for MD5, whereas I thought only collision attacks had been demonstrated. If it's not saying that, I think it needs to be rephrased for clarity. -- Antaeus Feldspar 22:33, 16 November 2005 (UTC)

It's not saying that preimage attacks exist, but that you can specify an arbitrary prefix in your collision. I believe that's accurate, or at least if the length of the prefix is a multiple of the message block size. That is, if you have a prefix X, you can find a collision of the form X || Y1 and X || Y 2 such that H(X || Y1) = H(X || Y2) (where || means concatenation, H is the hash function). — Matt Crypto 23:01, 16 November 2005 (UTC)
I think that's called length-extension, and it is a feature of most digest functions, so they can operate on unbounded streams of data, without requiring unbounded memory. 193.230.245.6 13:28, 17 November 2005 (UTC)
OK, I think I see what you're saying -- it's not saying that for any prefix, you can find another prefix which collides with it (which equates to a preimage attack); rather, it's saying that you can start with any one prefix and create two colliding files which share that prefix. Can we rephrase it to make the distinction more clear? -- Antaeus Feldspar 23:05, 17 November 2005 (UTC)
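The "same prefix, two colliding continuations" idea can be sketched with a toy iterated hash. Everything here is an assumption for illustration: a 16-bit chaining value, 8-byte blocks, and a compression function built from truncated MD5 (compress, absorb, toy_hash and collide_after_prefix are made-up names). The collision is found by brute force, not by the published MD5 techniques.

```python
import hashlib

BLOCK = 8            # toy block size in bytes
IV = b"\x01\x23"     # arbitrary 16-bit starting state

def compress(state: bytes, block: bytes) -> bytes:
    """Toy compression function: 16 bits of MD5 over state || block."""
    return hashlib.md5(state + block).digest()[:2]

def absorb(state: bytes, data: bytes) -> bytes:
    """Feed whole blocks through the compression function."""
    assert len(data) % BLOCK == 0
    for i in range(0, len(data), BLOCK):
        state = compress(state, data[i:i + BLOCK])
    return state

def toy_hash(msg: bytes) -> bytes:
    """Iterated hash with 0x80 / zero / length padding, in the style of MD5."""
    padded = msg + b"\x80"
    padded += b"\x00" * (-(len(padded) + 4) % BLOCK)
    padded += len(msg).to_bytes(4, "little")
    return absorb(IV, padded)

def collide_after_prefix(prefix: bytes):
    """Find blocks y1 != y2 that reach the same internal state after the given prefix."""
    assert len(prefix) % BLOCK == 0
    mid = absorb(IV, prefix)   # the "preceding hash state" fixed by the prefix
    seen = {}
    counter = 0
    while True:
        block = counter.to_bytes(BLOCK, "little")
        state = compress(mid, block)
        if state in seen:
            return seen[state], block
        seen[state] = block
        counter += 1

if __name__ == "__main__":
    x = b"prefix__" * 2
    y1, y2 = collide_after_prefix(x)
    print(toy_hash(x + y1) == toy_hash(x + y2))                      # H(X || Y1) == H(X || Y2)
    print(toy_hash(x + y1 + b"tail") == toy_hash(x + y2 + b"tail"))  # still equal with a shared suffix
```

The prefix is chosen freely, yet nothing here produces a message matching a hash fixed beforehand, which is the distinction drawn in the replies above.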


MD5 is no longer safe. plz look at: MD5 Collision Generation http://www.stachliu.com.nyud.net:8090/collisions.html

Well, we're aware that MD5 is vulnerable. The problem is that many people wrongly interpret what they've heard about MD5's vulnerability. Many of the security applications which use MD5 which people think are now broken are not, because in order to exploit them, you would have to find a way to derive, for a given MD5 hash, a file which has that hash. This is called a "preimage attack"; no preimage attacks against MD5 are known. What can be done against MD5 now, which compromises some security applications, is a "collision attack"; which means being able to create two files which will have the same hash -- even though it is not (currently) possible to control which hash that will be. Therefore, while some attacks are now possible with MD5, it's an exaggeration to declare with no clarification that it is "no longer safe". -- Antaeus Feldspar 18:59, 20 November 2005 (UTC)

Move Down Photo

uhh motion to move down that photo to the first paragraph under heading 1... at first glance, it looked like i was looking at a biography of some dude named MD5...--Htmlism 00:11, 13 February 2006 (UTC)

Agreed and done. Deco 01:09, 13 February 2006 (UTC)

Example?

I've added (or am just about to add) an external link to a site that gives someone an md5 hash. That is one thing that this page is missing: http://www.instantmd5.com/

Moonrat506

I removed your link. There are countless such sites online, and they don't have much value for this article. Also, instantmd5.com is your own site, and you should generally not use Wikipedia as a vehicle for promoting your own sites. Please refer to WP:EL if you want to learn more about Wikipedia's policy for external linking. Thanks. Haakon 18:57, 11 April 2006 (UTC)

Agreed: please don't add your own site. — Matt Crypto 23:24, 11 April 2006 (UTC)

Thought it would be handy. Moonrat506 14:19, 12 April 2006 (UTC)

How come the external links section currently has a link to one site that can be used to look up MD5 hashes, whereas all the other links to similar sites have been removed? Any particular reason for doing so?

It's useful to have a link to one, but not particularly useful to have a dozen. — Matt Crypto 05:49, 9 May 2006 (UTC)

Pseudocode

  • That isn't pseudocode.
  • I second that! Is it possible for someone with an understanding of the algorithm to rewrite pls?

To the above, I don't know if you're getting unexpected results or not, but at first I did. But then I noticed it said "LITTLE-ENDIAN", after converting my big-endians to little-endians, everything worked. The pseudocode is good, just be aware of endian-ness.

Pseudocode and comments both cause the above problem. Because they can't be executed, they require human interpretation, and can't be tested for correctness. Supplement the pseudocode with the executable (and tested) version in Dyalog APL, a language originally designed as a mathematical notation for describing algorithms. (See Implementations section for more on this.) 5jt (talk) 11:25, 16 April 2008 (UTC)

key strengthening, wtf!?

"Also, it is a good idea to apply the hashing function (MD5 in this case) more than once—see key strengthening. It increases the time needed to encode a password and discourages dictionary attacks."

That is stupid. double md5 makes dictionary attacks and rainbow crack easier, because it makes content have known, fixed size and limited character set. It makes input with (theoretically) infinite complexity a simple string with 2^128 known combinations.

You are mistaken. 2^128 combinations is far too many for any table lookup approach to be practical today. However, applying a hash function multiple times increases the cost of a dictionary attack proportionately and can be an effective security measure. The term you are looking for is key stretching, not key strengthening.
Of course MD5 is no longer recommended for any application, but the point stands.
Also, please sign your messages by appending ~~~~ when you write them. Thanks! — ciphergoth 08:19, 13 June 2006 (UTC)
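The cost argument can be sketched as follows: each extra round multiplies the work per guess for an attacker and for the legitimate user alike. This is a minimal illustration with a made-up helper named stretch, not a recommendation to use MD5 this way; Python's hashlib also ships a standard construction, pbkdf2_hmac, shown for comparison.

```python
import hashlib

def stretch(password: bytes, salt: bytes, rounds: int) -> bytes:
    """Hash salt || password, then re-hash the digest (rounds - 1) more times."""
    if rounds < 1:
        raise ValueError("rounds must be at least 1")
    digest = hashlib.md5(salt + password).digest()
    for _ in range(rounds - 1):
        digest = hashlib.md5(digest).digest()
    return digest

# A dictionary attacker must repeat all the rounds for every candidate password.
print(stretch(b"hunter2", b"example-salt", 1000).hex())

# A standardised alternative from the same module.
print(hashlib.pbkdf2_hmac("sha256", b"hunter2", b"example-salt", 10_000).hex())
```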

MD5 encryption? Should it not be MD5 coding.

MD5 with SHA-1?

This article says SHA-1 is now preferred over MD5. Naively, it seems that an MD5 sum and a SHA-1 sum, together, would be stronger than either one, even if MD5 is suspect on its own. Is that right in theory? In practice? —Ben FrantzDale 15:24, 15 June 2006 (UTC)

By "together", do you mean the new hash function being the concatenation of the two sums, or function composition? — Matt Crypto 19:21, 15 June 2006 (UTC)
Yes. As in, the MD5+SHA-1 of the empty file would be
   d41d8cd98f00b204e9800998ecf8427e da39a3ee5e6b4b0d3255bfef95601890afd80709
—Ben FrantzDale 20:35, 15 June 2006 (UTC)

I have been wondering about almost the same thing - see Talk:SHA_hash_functions#Combining_SHA1_.2F_MD5 - this would be function composition. I assume that there is an answer to this question around somewhere, as it appears to not be a new idea - any ideas where?

-Preceding unsigned comment presumably added 2006.

Using both MD5 and SHA-1, in case one of them is broken, is as old as SSL, the old Netscape predecessor of TLS. Both are now of course obsolete, and SHA-1 MUST NOT be used in applications requiring NIST approval. -82.113.106.31 (talk) 04:29, 8 February 2011 (UTC)
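For reference, the concatenated form discussed in this thread can be sketched like this, assuming Python's hashlib (md5_plus_sha1 is a hypothetical name). Composition would instead feed one digest into the other function.

```python
import hashlib

def md5_plus_sha1(data: bytes) -> str:
    """Concatenation: the MD5 digest followed by the SHA-1 digest of the same input."""
    return hashlib.md5(data).hexdigest() + " " + hashlib.sha1(data).hexdigest()

def sha1_of_md5(data: bytes) -> str:
    """Composition, for contrast: SHA-1 applied to the MD5 digest."""
    return hashlib.sha1(hashlib.md5(data).digest()).hexdigest()

print(md5_plus_sha1(b""))
# d41d8cd98f00b204e9800998ecf8427e da39a3ee5e6b4b0d3255bfef95601890afd80709
```

Note that in the composed form any MD5 collision carries straight through to the final output, whereas the concatenated form requires both functions to collide on the same pair of inputs.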

Infinite collisions

Since there are an infinite number of inputs but only a finite number of outputs, does that mean MD5 technically has an infinite number of colliding inputs? If not, is there a way to calculate exactly how many collisions there are? --Tim1988 talk 17:39, 21 August 2006 (UTC)

Yep. It must have at least one collision by the pigeonhole principle. Further, assume you have a complete finite list of any inputs that are part of at least one colliding pair. Then consider the remaining inputs not on this list, of which there are a still-infinite number. Using the pigeonhole principle, you can again prove there's another collision amongst those remaining, contradicting the assumption that the finite list is complete. Hence there must be an infinite number of colliding inputs. Of course, proving existence is easy; finding them is, or rather was, the tricky bit. — Matt Crypto 19:05, 21 August 2006 (UTC)
Thank you for the explanation :) --Tim1988 talk 21:42, 22 August 2006 (UTC)

Colliding executable files.

User:Superm401 added a [citation needed] tag to this text in section "Applications": "Now that it is easy to generate MD5 collisions, though, it is possible for the person who creates the file to create a second file with the same checksum, so this technique cannot protect against some forms of malicious tampering."

I removed the tag since it is a true statement and we already do have a reference for it. There is a link to a detailed description of how to do it in the "External links" section. That is the link Two colliding executable files.

If you know a little about how executables work and how the 2005 MD5 attack works and a little about hacking it is a pretty easy trick to do. Here is an explanation of how to do that attack, which I came up with even before seeing that link:

The 2005 MD5 attack works by manipulating a small number of bytes in a special fixed position near the beginning of a file. So only some bytes are changed when doing collisions with that attack. The rest of the file will keep the exact same content. So the attacker codes up a nice executable that contains a bunch of functions that do good work. But he also designs the functions so they can do evil work if called in the right order or with the right parameters. And in the special position in the executable that he will have to change to create a colliding executable he puts a constant value in the source code. That is, he stores some constant in that position. When he manipulates that position to find collisions that constant will change, but no other part of the program. Then at runtime the program can check that constant in an if-statement. If the constant is the original one the executable behaves nicely. If the constant is the changed one the executable then chooses to do its evil work. So the attacker first publishes the nice version of the file and people will test it and say it is nice and publish the MD5 sum of it. Later he will publish the evil version of the file instead. And that version has the same MD5 sum but does evil work instead.

There are many other ways to use the (random) change of bytes in that position. This was just one simple example.

--David Göthberg 23:53, 25 October 2006 (UTC)

Pseudocode wrong?

The pseudocode looks wrong...

I'm not an expert, but things seem off right from the beginning:

RFC1321(http://www.faqs.org/rfcs/rfc1321.html) says:

3.3 Step 3. Initialize MD Buffer

   A four-word buffer (A,B,C,D) is used to compute the message digest.
   Here each of A, B, C, D is a 32-bit register. These registers are
   initialized to the following values in hexadecimal, low-order bytes
   first):

          word A: 01 23 45 67
          word B: 89 ab cd ef
          word C: fe dc ba 98
          word D: 76 54 32 10


However, the pseudocode says the following (which looks more like the initial values for SHA):

//Initialize variables:
var int h0 := 0x67452301
var int h1 := 0xEFCDAB89
var int h2 := 0x98BADCFE
var int h3 := 0x10325476

—The preceding unsigned comment was added by 74.122.57.137 (talk) 20:02, 11 April 2007 (UTC).

What exactly is the discrepancy? As far as I can see from a cursory look, the pseudocode does exactly what the quote from the spec says. Oli Filth 20:38, 11 April 2007 (UTC)
Never mind! It's just confusing due to differences in endianness. —The preceding unsigned comment was added by 74.122.57.137 (talk) 21:09, 11 April 2007 (UTC).
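The two listings agree once the RFC's byte sequences are read low-order byte first. A quick check, assuming Python's struct module (the dictionary name RFC_BYTES is made up):

```python
import struct

# Byte sequences exactly as printed in RFC 1321, section 3.3.
RFC_BYTES = {
    "A": "01 23 45 67",
    "B": "89 ab cd ef",
    "C": "fe dc ba 98",
    "D": "76 54 32 10",
}

words = {}
for name, text in RFC_BYTES.items():
    # "<I" reads four bytes as one little-endian 32-bit word.
    (words[name],) = struct.unpack("<I", bytes.fromhex(text))
    print(f"word {name}: 0x{words[name]:08X}")
```

The printed values are 0x67452301, 0xEFCDAB89, 0x98BADCFE and 0x10325476, the same as h0 to h3 in the pseudocode.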

Pseudocode Exponents

Can we find a better solution to represent "2 to the power of 32" in the pseudocode? (Currently represented as "2^32".) I looked at the code and couldn't figure out if it was supposed to be an exponent or an XOR operation. I had to go dig up a few external source implementations to confirm that it is "2 to the power of 32" and not "2 XOR 32".

I looked at the Wikipedia:Algorithms on Wikipedia article for guidance, but found no reference to XOR operations. Based on the way it is written, however, I assume that the two operations should be:

2 pow 32
2 xor 32

That would clarify which operation we're talking about. Does anyone mind if I change the pseudocode to match? --Jbanes 15:10, 26 April 2007 (UTC)

I'd be in favour of a change to the words "pow" and "xor" as per "mod" on Wikipedia:Algorithms on Wikipedia. Martin Hinks 16:35, 26 April 2007 (UTC)
It's done then. Hopefully the next person who comes along won't be as confused by it. :) --Jbanes 20:17, 26 April 2007 (UTC)
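The ambiguity is real in many programming languages, where the caret means XOR. In Python, for instance:

```python
power = 2 ** 32   # "2 pow 32": exponentiation
xor = 2 ^ 32      # "2 xor 32": bitwise exclusive or

print(power)  # 4294967296
print(xor)    # 34

# Addition modulo 2 pow 32, as used throughout the algorithm, on example operands.
a, b = 0xFFFFFFFF, 2
print((a + b) % power)  # 1
```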

Under the algorithm section, you could also change the symbols into their engineering notations (being + is and, - is or, ' is not, etc.) instead of the \oplus, \wedge, \vee, \neg signs. --Snugg (talk) 02:29, 30 July 2008 (UTC)

One step decryption

If anyone knows the RFC for the one step decryption of the MD5 algorithm please fix it, I can't find it. I am just using the number I found from another website for now - but it is wrong Comperr 02:58, 29 April 2007 (UTC)

Yes, RFC 3876 is unrelated to MD5. I can neither find an RFC for MD5x nor any other reliable definition of MD5x. It may not be a standard at all. In any case it is better to just delete that part than to promote wrong information. 85.2.53.198 16:16, 29 April 2007 (UTC)
True - I know it exists but I can't seem to find it...looking now —The preceding unsigned comment was added by Comperr (talk • contribs) 02:51, 11 May 2007 (UTC).

Example hashes

  MD5("The quick brown fox jumps over the lazy dog") 
   = 9e107d9d372bb6826bd81d3542a419d6

Even a small change in the message will (with overwhelming probability) result in a completely different hash, e.g. changing d to c:

  MD5("The quick brown fox jumps over the lazy cog") 
   = 1055d3e698d289f2af8663725127bd4b

Surely it would be better to use "dog" and "eog"?

The ASCII character codes for "c", "d" and "e" are 99, 100, and 101, respectively, or, in binary:
99 = 1100011
100 = 1100100
101 = 1100101

Using "c" changes three bits. Using "e" only changes one. The smallest change, in terms of bits, in the message would be using "e" instead of "d".

- Seanqtx 22:43, 11 May 2007 (UTC)
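The bit counts can be checked directly, and the same helper shows how far apart the resulting digests are. A minimal sketch, assuming Python's hashlib (bit_diff is a made-up helper that expects inputs of equal length):

```python
import hashlib

def bit_diff(a: bytes, b: bytes) -> int:
    """Number of differing bits between two equal-length byte strings."""
    assert len(a) == len(b)
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))

print(bit_diff(b"d", b"c"))  # 3
print(bit_diff(b"d", b"e"))  # 1

base = b"The quick brown fox jumps over the lazy "
for ending in (b"dog", b"cog", b"eog"):
    print(ending, hashlib.md5(base + ending).hexdigest())

# Differing bits in the 128-bit digests for the one-bit input change.
print(bit_diff(hashlib.md5(base + b"dog").digest(),
               hashlib.md5(base + b"eog").digest()))
```

For a well-behaved hash one would expect roughly half of the 128 output bits to differ whichever letter is used.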

External Links change today

Hi, I wanted to let everyone (esp Feezo) know that I was the one who added the External link [2] to the code comparison wiki. I know that spam is a problem, and that there is a notice not to add External links, AND that I did that anonymously. I apologize. But I think it's ok to put this link back since the site seems to be working, and it wasn't spam to begin with. Then again, I was having a problem with the site all last week and the site looks a little sparse, so maybe we should wait a while before putting it back up. What do you think? -- C-Blade 00:10, 10 July 2007 (UTC)

I agree with the removal of the link. The target page only contains stub-like info, and there's already a list of links to language-specific implementations. Oli Filth 00:42, 10 July 2007 (UTC)

By the way, I noticed that one of the references: The Status of MD5 After a Recent Attack is broken now. I couldn't find another copy of it online. C-Blade 04:07, 10 July 2007 (UTC)

Link to useful app

I just found this useful program and I think it should be linked to from this article. It is a windows app that generates a md5 hash of any given file. I am not sure if it is appropriate, so I am posting here to see if anyone else will do it for me. The address is www.way2web.net/?md5app 220.101.108.93 08:00, 17 July 2007 (UTC)

I have built a simple webpage that calculates md5 and sha1 checksums, because I was constantly referring to the web to get those. The link provided in Wikipedia is broken. This is my site: www.md5generator.tk or md5generator.awardspace.com. It would be nice if you could include it on Wikipedia. Thanks 190.64.107.104 (talk) 18:44, 13 February 2008 (UTC)

Odds to get the same hash for different messages ?

I could not find an authoritative source for my last addition to the article, please check it, and add a good source if you can find one. My figures come from various Usenet threads, for instance this one: [3]. Nicolas1981 16:07, 19 July 2007 (UTC)

I'm pretty sure it's 1/2^128, not 1/2^64. Therefore I'm going to remove the addition for now (as Usenet is not a reliable source). Oli Filth 17:36, 19 July 2007 (UTC)
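Both figures may come from the same picture. For an idealised 128-bit hash, two given messages collide with probability 2^-128, while the number of messages needed before some pair is likely to collide is on the order of 2^64 (the birthday bound). A sketch using the usual approximation p ≈ 1 - exp(-n(n-1)/2^(b+1)), with a made-up function name and under the assumption that outputs behave as uniformly random:

```python
import math

def collision_probability(num_messages: int, hash_bits: int) -> float:
    """Approximate chance that at least two of n random b-bit hashes coincide."""
    exponent = num_messages * (num_messages - 1) / 2 ** (hash_bits + 1)
    return -math.expm1(-exponent)   # 1 - exp(-exponent), accurate for tiny values

print(collision_probability(2, 128))        # one pair: about 2**-128
print(collision_probability(2 ** 64, 128))  # about 0.39
```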

md5compress / pseudocollision

Writing the equation md5compress(I,X)=md5compress(J,X) is meaningless if you don't tell us the role of X in this equation, or how md5compress is related to the pseudocode below. -- 62.3.242.174 14:07, 29 August 2007 (UTC)

Is it any better after this edit? -- intgr #%@! 08:08, 30 August 2007 (UTC)

External Link Suggestion

I have a suggestion for an external link

Simple tool to calculate MD5 of a file on MS Windows

It is a very simple free program for MS Windows that allows you to right click on a file and get the MD5 checksum.

I tried some of the other external links to get a checksum of a file but I found some of the md5 tools unhelpful or hard to use. This tool is very very easy to use and install. Like I said it is free and the author has no ads on his page which is also nice.

This is my first addition to a Wikipedia discussion page. I don't fully understand how this discussion works so use the link if you want, or don't use, I don't care. I found it useful and I was trying to give something back to Wikipedia which I use all the time.

71.42.178.154 15:13, 1 October 2007 (UTC)

Another link.

Microsoft's free download utility to calculate MD5 or SHA1 checksums JBadger169 19:41, 25 October 2007 (UTC)

An online MD5 calculator link suggestion:

– Anylyze.com Online File Analyzer - Tools - Hashing page —Preceding unsigned comment added by Jospedia (talk • contribs) 09:48, 3 July 2008 (UTC)

Pseudocode for k-table is dubious

It is correct that the constants in the k-table were derived by the function given (based on sin(i)). But that is hardly a guarantee that they can be reproduced that way, unless the behaviour of the pseudocode sin() is specified in greater detail. The k-table should be defined by the actual constants used by the MD5 reference implementation. Athulin 06:18, 23 October 2007 (UTC)

Done
JohnAdriaan (talk) 08:34, 15 September 2011 (UTC)
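The concern can be made concrete by deriving the table and then checking it against literal constants, so that a differing sin() would be caught instead of silently producing another table. A sketch assuming Python's math.sin in double precision; the three reference values are constants from the RFC 1321 reference implementation.

```python
import math

# K[i] = floor(2 pow 32 * abs(sin(i + 1))), with i + 1 taken in radians.
K = [math.floor(abs(math.sin(i + 1)) * 2 ** 32) for i in range(64)]

# Spot-check against the literal constants before trusting the derived table.
EXPECTED = {0: 0xD76AA478, 1: 0xE8C7B756, 63: 0xEB86D391}
for index, value in EXPECTED.items():
    assert K[index] == value, f"sin() disagrees with the reference at K[{index}]"

print(" ".join(f"{k:08x}" for k in K[:4]))
```

Shipping the 64 literals, as suggested above, makes the check unnecessary.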

The "google cracked MD5" comment

At the end of the "Vulnerability" section I changed:

Supposedly, Google has also been reported to been able to decode MD5 hash to figure out passwords.

To:

The use of MD5 in some websites' URLs means that Google can also sometimes function as a limited tool for reverse lookup of MD5 hashes.[6] This technique is rendered ineffective by the use of a salt.

The referenced article is very clear about the fact that Google just happens to have picked up on MD5 hashes in webpages (along with enough context to figure out what key had been used for that hash), not that Google has deliberately decoded them. One comment talks about generating lookup tables as webpages for Google to index, although they didn't have much luck and would be restricted to some reasonable dictionary. Another comment talked hypothetically about how Google's many servers could be used to brute-force a single hash, which is certainly not worth mentioning.

I'm not even convinced that the text I've left now is worth mentioning, since it's less useful than other methods (as that article's author pointed out). It's simply an enjoyable gimmick (one that I certainly enjoyed), and further encouragement to use salting. Quietbritishjim (talk) 16:23, 25 November 2007 (UTC)
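Why a salt defeats this kind of lookup can be sketched with a tiny stand-in for the search index. The word list, the salt value and the names LOOKUP and reverse_lookup are all made up for illustration:

```python
import hashlib

# Stand-in for "pages that happen to show a word next to its MD5 hash".
WORDLIST = ["password", "letmein", "qwerty"]
LOOKUP = {hashlib.md5(w.encode("utf-8")).hexdigest(): w for w in WORDLIST}

def reverse_lookup(hex_digest: str):
    """Return the indexed word for this digest, or None if it was never indexed."""
    return LOOKUP.get(hex_digest)

unsalted = hashlib.md5(b"password").hexdigest()
salt = bytes.fromhex("9f12c4a7e05b3d68")   # example salt, normally random per user
salted = hashlib.md5(salt + b"password").hexdigest()

print(reverse_lookup(unsalted))  # password
print(reverse_lookup(salted))    # None
```

The salted digest is absent because nobody has published the hash of that particular salt and password together.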

Code examples at Wikia:Code

I wonder if it would be helpful to include a link to code.wikia.com examples in different languages. Still looks a bit sparse, but maybe by adding the link, it will be filled out more. I definitely don't think the code examples belong in the article on Wikipedia, but they're a nice reference. C-Blade (talk) 08:05, 29 December 2007 (UTC)

Broken footnote link

I think that the link in footnote 1 (next to the word "insecure") in the opening portion of the article returns a "page not found" error. If others are getting the same thing then I suggest that the footnote be removed. Thanks. --Wam067 (talk) 19:42, 22 February 2008 (UTC)

md5 breaker with GPU support

http://www.elcomsoft.com/md5crack.html achieves 600 million passwords per second on a decent graphics card - looks scary. Worth mentioning in the article?--134.147.252.130 (talk) 13:10, 28 August 2008 (UTC)

md5 database

http://www.tmto.org/ 13:42, Oct 3 2008 (CEST) —Preceding unsigned comment added by 80.132.191.59 (talk) 11:50, 3 October 2008 (UTC)

LabVIEW implementation of MD5

I found this implementation and it's NI supported : LabVIEW MD5 --JCFC (talk) 22:32, 5 October 2008 (UTC)

hashr - MD5 hash maker

hashr: Create hashes in more than 40 algorithms including MD5, in a website or firefox extension. —Preceding unsigned comment added by Rogeriopvl (talk • contribs) 00:03, 15 December 2008 (UTC)

PS3 Cluster decrypts!! MD5 hash

http://www.ps3fanboy.com/2008/12/30/researchers-use-ps3-cluster-to-reveal-internet-security-flaw/ —Preceding unsigned comment added by 217.136.26.207 (talk) 12:06, 31 December 2008 (UTC)

Why can some people not even get the most basic things right? MD5 is a hash function not a cipher. It is generally not used for encryption. The web page by the author of the attack is already referenced in this article, is much more detailed, easier to understand and above all correct. 81.62.38.43 (talk) 12:21, 31 December 2008 (UTC)

Is the recent PS3 attack a preimage attack or a collision attack?

I have not seen it made clear in any of the news articles I've yet read, so I'm hoping someone can find this out and include it in the article: does the recent faking of an SSL certificate represent a successful preimage attack, or only new usage of already known collision attacks? -- 192.250.112.200 (talk) 13:35, 31 December 2008 (UTC)

A collision. They're using dead space within the certificate to jig it so the MD5 signature passes. --Blowdart | talk 13:45, 31 December 2008 (UTC)
Check out reference 5 "MD5 considered harmful today" of this Wikipedia article. The authors of the attack put a lot of work into writing a web page that is detailed and comprehensible. News articles on the other hand are (as you write) usually unclear and leave interesting details out. 92.106.157.89 (talk) 16:41, 31 December 2008 (UTC)

Verisign claims during the RapidSSL fiasco aftermath

Tim Callan's post goes to great lengths to explain that the SSL system has been fixed. However, this is not factual:

  • The SSL system remains broken as long as there is at least one Certification Authority still issuing MD5-signed certificates. The researchers mention at least one other CA other than RapidSSL that still uses MD5.
  • Even after the issuance of MD5-signed certs has ceased, rogue CAs set up using certificates issued up to that point will still be able to issue new rogue certificates, indefinitely (or until the expiry date on the rogue certificate, which is under the attackers' control).

That blog should not be used as a source. —Preceding unsigned comment added by Rdancer (talk • contribs) 19:05, 31 December 2008 (UTC)

If you're going to call them liars you need to prove it. Whilst I agree with your points, CRLs negate the second point (and pretty much all browsers and most OSes now support them) and making the fake cert is not simple (as the researchers state). Whatever your beef with VeriSign the blog is an official blog and so would be a reliable source --Blowdart | talk 19:18, 31 December 2008 (UTC)
Yes, CRLs make it possible to revoke the rogue certificate, once and if it is identified, that is true. If you don't use CRLs, you are asking for trouble, that's true. It will however take time before the certificate is noticed and reported, and it will take time to process. Whenever a short window of opportunity is sufficient for the attack -- perhaps a single transaction -- CRLs can't help you. We can debate how smart it would be for somebody to spend expertise and time upfront plus $1,500 per certificate to enable them to snoop on a few hours of traffic with a big financial institution's online website, but we must respect the fact that it is doable. Apart from this grossly misleading blog post, I have no beef with VeriSign. I suggest it would be prudent to treat the whole post with suspicion, and not as a sole source. Also, it seems less than ideal that Wikipedia be linking to a factually wrong article. If there is a less controversial source, we should use it. rdancer (talk) 05:46, 1 January 2009 (UTC)
Verisign could revoke their own root keys that are using MD5RSA. This would make all certificates that have been issued by this root key invalid. Hence it is not up to the attacker to decide when his/her rogue key expires. I agree that claims made by Verisign need to be confirmed by a second party. They were lazy and needed a real "kick in the butt" before they stopped using MD5. However, I don't see evidence of a real lie in their post. E.g., can you provide us with an MD5-signed certificate that was issued recently? Is there any reliable information about when root keys using MD5 will be or have been revoked? 85.3.120.185 (talk) 10:22, 1 January 2009 (UTC)
It's not the root keys that are the problem - and no-one has stated the Verisign root certs use MD5; it's that a certificate issued using MD5 as its checksum mechanism can be faked to be root cert - remember it gets checked via the checksum algorithm within the cert - so even if the true root has SHA256 that doesn't matter, the fake cert says "Check me using MD5" so the checking mechanisms will obey that. The problem we have with confirmation by a second party is how? Verisign would have to list every certificate issued since they said they stopped and someone would have to check them. Frankly you're asking the impossible.
Out of interest I fired up the certificate console on Vista; for classes 1 & 3 the root cert uses md2RSA, but the primary root cert, the one they issue SSL certs under is SHA1. Thawte is all MD5 for their root cert; but look at the ones they've issued me those certs are SHA1. Equifax however issued me with one using MD5. --Blowdart | talk 11:05, 1 January 2009 (UTC)
Ok, I clearly wasn't talking about root certificates. As you said the signature there is not that important. What I said is that if you used your key to generate signatures using MD5 then you should revoke that key now. I hope this is what Verisign is indeed doing, but I have no confirmation of them doing so. It is not impossible to show that Verisign has revoked their keys. All we need is their signature for the key revocation. 62.203.2.31 (talk) 14:47, 1 January 2009 (UTC)
Except cancelling the root key will, of course, cancel any certificate derived from it, including the root CAs. If you read the paper you'll see that the authors state "Our method required the purchase of a specially crafted digital certificate from a CA and does not affect certificates issued to any other regular website.". So unless there are other "specially crafted" certs from vulnerable root CAs it's not a real worry, they've been very explicit in stating this - you could even, as a CA, watch for those types of certs and reject the request. If you simply mean any signature you generated from an X509 certificate you have which uses MD5 is vulnerable, that's true, but that's been known for years and isn't new. --Blowdart | talk 15:06, 1 January 2009 (UTC)
Of course, I read the paper. Note that right now we must assume that there are other rogue certificates out there signed by Verisign. They may not look like the ones crafted in the attack. Do you know a reliable way to distinguish these rogue certificates from legitimate ones? I don't. I only see two solutions: (1) don't accept any signatures using MD5. (2) Revoke CA keys that are no longer reliable because they were used to sign using MD5. Yes, it may be hard to revoke your keys. But if someone is not careful about what they sign, this may be the only solution, unfortunately. I hope you do not think we should wait for another attack that allows even more freedom choosing the messages before accepting that the keys have been compromised.
Regarding the problem with the text in the article. Verisign clearly said they stopped using MD5, but as you agree we can't confirm this. It is customary in the news to make clear that this is a claim by Verisign only and has not been confirmed independently. Adding "said that" does in no way mean we don't trust them and is more accurate than just posing a claim as a fact. 62.203.2.31 (talk) 15:45, 1 January 2009 (UTC)

info hash - infohash

There are no current WP articles about "info hash", which is used with torrents. It should at least be mentioned in this article, with links to more information. The passing mention in BitTorrent protocol encryption seems to be the only current content. There should be more user-relevant content along these lines:

"The info hash is strictly for the bittorrent client. The BT Client you use uses the info hash to make sure that what you are downloading is legit from the server's point of view. You don't do anything with it." -96.237.7.209 (talk) 13:48, 18 January 2009 (UTC)

Actually it shouldn't be here; according to the BitTorrent protocol the info hash is a SHA1 value. --Blowdart | talk 13:54, 18 January 2009 (UTC)

A Small Question about MD5

If MD5 ALWAYS outputs a 32-character-long string, doesn't it mean that there are infinitely many strings that will match the same combination? --80.179.184.162 (talk) 18:08, 23 March 2009 (UTC)

Indeed it does. But finding hash collisions is a non trivial computational problem in the general case. There are very few special cases. —EncMstr (talk) 00:01, 24 March 2009 (UTC)
All hash functions have collisions, by definition. A hash or fingerprint is not a lossless compression. If that is not yet the case the article should explain hash vs. compression, and cryptographic vs. general hash. Cryptographic hash vs. pseudo-random number generator would be also nice - AFAIK that is completely unrelated (if I'm wrong please add a reference explaining this detail to the article). -82.113.106.31 (talk) 04:08, 8 February 2011 (UTC)
The theory is that, yes, all hash functions have collisions — but collisions in cryptographic hashes are so rare, that finding them with Earth-bound technology is infeasible. Of course this does not apply to MD5 anymore, since the algorithm has been broken. Secure hash functions, like the SHA-2 series, are still collision resistant for practical purposes.
The cryptographic hash function article (linked from the second link in the article) explains these things. Explaining all those concepts in every hash function article (SHA-1, SHA-2, MD4, MD2, RIPEMD, Whirlpool, Tiger and many more) would mean lots of unnecessary duplication. -- intgr [talk] 15:07, 8 February 2011 (UTC)
Makes sense, thanks. But cryptographic hash functions apparently says that any cryptographic hash can be used as PRNG. With a collision for two different MD5 output values a PRNG based on h(n+1,x) := h(h(n,x)) for h(1,x) = MD5(x) cannot reach all possible output values. –79.246.39.1 (talk) 03:56, 8 May 2011 (UTC) (same user as above)
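For anyone who wants to check the pigeonhole argument in this thread by hand, here is a minimal Python sketch (an illustration only, not taken from the article). It assumes the standard hashlib module; the helper names md5_hex and find_truncated_collision are made up for this example. It shows that the hex digest is 32 characters regardless of input length, and that collisions become easy to find once the output space is small, here by keeping only the first two bytes of the digest. It does not find a full MD5 collision.

```python
import hashlib


def md5_hex(data: bytes) -> str:
    """Return the MD5 digest of data as 32 hexadecimal characters."""
    return hashlib.md5(data).hexdigest()


def find_truncated_collision(prefix_bytes: int = 2):
    """Find two distinct inputs whose MD5 digests share their first bytes.

    With a 2-byte prefix there are only 65536 possible values, so by the
    pigeonhole principle a repeat must occur within 65537 attempts.
    """
    seen = {}
    counter = 0
    while True:
        message = str(counter).encode("ascii")
        prefix = hashlib.md5(message).digest()[:prefix_bytes]
        if prefix in seen:
            return seen[prefix], message
        seen[prefix] = message
        counter += 1


# Inputs of very different lengths all map to 32 hex characters.
lengths = {len(md5_hex(b"x" * n)) for n in (0, 1, 100, 10_000)}

# Two different inputs that agree on the truncated digest.
first, second = find_truncated_collision()
print(lengths, first, second)
```

The same counting argument applies to the full 128-bit output; only the search effort changes.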

MD5 not recommended for new applications

A while back I edited the article lede to say that MD5 was broken and should not be used in new applications, and it was reverted for being opinion/commentary. I think something pretty close to what I wrote is defensible as the consensus of the crypto community, not just my opinion. Here are some third-party sources on MD5's security; the bold quotes in effect say new applications should not use MD5 (or its successor, SHA-1):

The thing is that the article lede is noting only that certain uses of MD5 are broken, not that use of MD5 in new applications is generally not recommended, full stop. The bold quotes recommend moving away from MD5 (or SHA-1) without limiting to applications that require collision resistance. That may be because the collision attacks serve as certificational weaknesses -- they indicate that the algorithm might have other, as yet undiscovered, flaws, possibly even second-preimage or preimage attacks. As Schneier is fond of saying, "Attacks always get better; they never get worse." [4].

Some of the same links above say not to panic about the collision attacks (and they're right), and some say not to replace MD5 in most applications, but I don't think you'll find reputable sources recommending MD5's use in new applications.

So, is it kosher to say "MD5 is not collision-resistant and is generally not recommended for use in new applications," referencing the sources of the bold quotes? If not, is there appropriate language that's stronger than what's there now?

24.7.68.35 (talk) 03:44, 2 April 2009 (UTC)

Each of your bullet points makes a better statement than the final conclusion, because each of the bullet points that you list is a precise statement, stating exactly who was doing/saying what at what time, and they are all verifiable statements. The conclusion "MD5 ... is generally not recommended for use in new applications" however is an imprecise statement and can be regarded as an opinion, even if this opinion is held by a vast majority. Generally it is preferable to make factual statements and just let the reader make his/her own judgement on the matter. If some people think that after all the attacks against MD5 it is still ok to use for example HMAC-MD5, then that is their problem. It is not the purpose of wikipedia to make recommendations. NIST etc. can do a much better job here. 92.107.159.108 (talk) 18:18, 15 April 2009 (UTC)
Eh, I meant "generally not recommended" as a factual statement -- a description of what most cryptographers recommend, supported by that list of references -- not as a position that I'm taking/Wikipedia's taking. That is, I was aiming for a statement like "most biologists believe humans developed through evolution" -- not Wikipedia endorsing evolution but Wikipedia describing what biologists verifiably think. Still, fair enough that it helps to make it explicit who's doing the recommending in the text. I'll put something in that reflects that.
24.7.68.35 (talk) 04:07, 21 April 2009 (UTC)
is it kosher to say "MD5 is not collision-resistant and is generally not recommended for use in new applications,"
For new applications, supporting SHA-1 instead of MD5 would be a bad idea; SHA-1 is not really better. You could say "deprecated in cryptographic applications", that's harsher and precise. As a general hash function MD5 is still fine, only cryptographic uses are deprecated. Let alone abuses, I never figured out why MD5 is used as a pseudo-random number generator (PRNG) in some applications. -82.113.106.31 (talk) 03:53, 8 February 2011 (UTC)

how can one make two different strings with the same MD5 hash?

or you may give an example —Preceding unsigned comment added by 85.250.197.2 (talk) 23:32, 31 May 2009 (UTC)

 d131dd02c5e6eec4693d9a0698aff95c 2fcab58712467eab4004583eb8fb7f89 
 55ad340609f4b30283e488832571415a 085125e8f7cdc99fd91dbdf280373c5b 
 d8823e3156348f5bae6dacd436c919c6 dd53e2b487da03fd02396306d248cda0 
 e99f33420f577ee8ce54b67080a80d1e c69821bcb6a8839396f9652b6ff72a70
 and
 d131dd02c5e6eec4693d9a0698aff95c 2fcab50712467eab4004583eb8fb7f89 
 55ad340609f4b30283e4888325f1415a 085125e8f7cdc99fd91dbd7280373c5b 
 d8823e3156348f5bae6dacd436c919c6 dd53e23487da03fd02396306d248cda0 
 e99f33420f577ee8ce54b67080280d1e c69821bcb6a8839396f965ab6ff72a70 
 Each of these blocks has MD5 hash 79054025255fb1a26e4bc422aef54eb4. from [5] Save monkey love 4 me (talk) 14:26, 16 February 2010 (UTC)
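The claim above can be checked directly. The following is a minimal Python sketch, assuming the standard hashlib module and assuming the two hex dumps are copied exactly as posted; the names BLOCK_A, BLOCK_B and QUOTED_DIGEST are just labels for this example.

```python
import hashlib

# The two 128-byte blocks quoted above, transcribed as hexadecimal.
BLOCK_A = bytes.fromhex(
    "d131dd02c5e6eec4693d9a0698aff95c2fcab58712467eab4004583eb8fb7f89"
    "55ad340609f4b30283e488832571415a085125e8f7cdc99fd91dbdf280373c5b"
    "d8823e3156348f5bae6dacd436c919c6dd53e2b487da03fd02396306d248cda0"
    "e99f33420f577ee8ce54b67080a80d1ec69821bcb6a8839396f9652b6ff72a70"
)
BLOCK_B = bytes.fromhex(
    "d131dd02c5e6eec4693d9a0698aff95c2fcab50712467eab4004583eb8fb7f89"
    "55ad340609f4b30283e4888325f1415a085125e8f7cdc99fd91dbd7280373c5b"
    "d8823e3156348f5bae6dacd436c919c6dd53e23487da03fd02396306d248cda0"
    "e99f33420f577ee8ce54b67080280d1ec69821bcb6a8839396f965ab6ff72a70"
)

# Digest value as quoted in the comment above.
QUOTED_DIGEST = "79054025255fb1a26e4bc422aef54eb4"

digest_a = hashlib.md5(BLOCK_A).hexdigest()
digest_b = hashlib.md5(BLOCK_B).hexdigest()

# Count the byte positions where the two inputs differ.
differing_bytes = sum(1 for x, y in zip(BLOCK_A, BLOCK_B) if x != y)

print("inputs differ in", differing_bytes, "bytes")
print("same digest:", digest_a == digest_b)
print("matches quoted digest:", digest_a == QUOTED_DIGEST)
```

Note that this only verifies an existing collision; producing a new one requires the differential techniques described in the referenced papers.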

Rainbow tables, hash collisions, and salts

I have put the rainbow tables and Google indexing in a subsection to make it clear that salting doesn't resolve the larger issue of hash collisions. I'd like to see a reference that explains why salting helps in the specific case of MD5 (I think I see why, but I, a lay person, am not convinced that there are no shortcuts). —Eric S. Smith (talk) 00:25, 22 November 2009 (UTC)
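On the question of why a salt helps against precomputed tables: a rough Python sketch, assuming the standard hashlib and os modules. The helper names unsalted and salted, the sample passwords, and the 16-byte salt length are all assumptions for illustration. It only shows that a precomputed lookup table stops matching once a random salt is mixed in. It says nothing about collision resistance, and it is not a recommended password storage scheme.

```python
import hashlib
import os


def unsalted(password: str) -> str:
    """MD5 of the password alone: identical passwords give identical hashes."""
    return hashlib.md5(password.encode("utf-8")).hexdigest()


def salted(password: str, salt: bytes) -> str:
    """MD5 of salt followed by password: the hash depends on the salt too."""
    return hashlib.md5(salt + password.encode("utf-8")).hexdigest()


# A tiny stand-in for a rainbow table or search-engine index:
# hashes of common passwords computed once, ahead of time.
table = {unsalted(p): p for p in ("123456", "password", "letmein")}

# An unsalted hash is recovered by a single lookup.
recovered = table.get(unsalted("password"))

# With per-user random salts, the same password yields different hashes,
# and neither appears in the precomputed table.
salt_a, salt_b = os.urandom(16), os.urandom(16)
hash_a = salted("password", salt_a)
hash_b = salted("password", salt_b)

print(recovered, hash_a in table, hash_b in table, hash_a == hash_b)
```

An attacker who knows the salt can still guess passwords one by one, so the salt removes the precomputation advantage rather than the underlying weakness of a fast hash.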

HBGary attack was NOT a result of MD5 cryptographic vulnerabilities

The last sentence of the summary says that Anonymous hacked into the HBGary website by "exploiting MD5 vulnerabilities". This is wrong. As the referenced article indicates, HBGary stored its password file using unsalted MD5 hashes. However, passwords stored with ANY hash algorithm are vulnerable to rainbow table attacks if the hash is not salted. That is, it wasn't any cryptographic weakness of MD5 that was responsible for the HBGary break-in - it was the poor security procedures of HBGary itself. We should remove this sentence. Sashaman (talk) 02:05, 19 February 2011 (UTC)

I pretty much have to disagree with you - failing to use a salt is a bad practice, but using a known-broken hash for the password table is a worse one. Failing to use a salt with your known-broken hash is worse than making either mistake alone, of course. Having said that, the material in question does not belong here under any circumstances, since it's not related to the hash function by anything more than incidental circumstances. I removed it. Gavia immer (talk) 02:12, 19 February 2011 (UTC)
Thanks for removing it. Note, though, that your statement "failing to use a salt is a bad practice, but using a known-broken hash for the password table is a worse one" is not correct for a number of reasons. First, as this article points out, the only non-theoretical vulnerability of MD5 is a lack of collision resistance, but password hashing is NOT vulnerable to collision attacks (see collision attack scenarios). Second, it would be much easier to crack, for example, a database table of passwords stored with unsalted SHA256 hashes than salted MD5 hashes. Indeed, my original point was only that the attack against HBGary was unrelated to any MD5 vulnerabilities. The same attack would have been possible had HBGary used unsalted SHA256 hashes instead. Sashaman (talk) 05:06, 19 February 2011 (UTC)
> First, as this article points out, the only non-theoretical vulnerability of MD5
> is a lack of collision resistance, but password hashing is NOT vulnerable to
> collision attacks
First, finding two passwords such that MD5(pwd1) == MD5(pwd2) is very relevant under some threat models (why build the tables if I can find collisions?). I'm very suspicious of the claim that password hashing is not subject to an attack. Second, MD5 provides a theoretical security level of 2^64, which is well below the recommended level of 2^112. For those who claim MD5 is safe to use (secure), they should provide a reference. I know that NIST, ISO/IEC and ECRYPT will not provide the reference - you'll have to find a practicing cryptographer. I don't believe folks like David Wagner or Alfred Menezes will provide a reference either - so best of luck to you. —Preceding unsigned comment added by Noloader (talkcontribs) 07:47, 15 May 2011 (UTC)

External links

JFTR, I've added MD5 Homepage (unofficial), the famous MD5 link collection for developers in various programming languages. If some overzealous bots or admins remove the link please add it again until it sticks. -82.113.106.31 (talk) 03:34, 8 February 2011 (UTC)

Well, that didn't take long, User:EncMstr removed the link without comment after I added this section to the talk page. -82.113.106.31 (talk) 04:53, 8 February 2011 (UTC)

Removal of external links is almost always explained by the external links guideline. For a simple revert like that (for something we see thousands of times a day), there is no easy way to add a descriptive comment.
The external link adds little to the article. The application links are the only new information, but the section seems to be dated: six of the first ten links I tried (randomly selected) are dead. —EncMstr (talk) 05:42, 8 February 2011 (UTC)
Actually I wanted to check if this page uses the current MD5 test suite 1.7, or a link to the older version 1.6 as on the MD5 Homepage (unofficial). Surprisingly there were no external links to resources for developers at all in the article (years ago that used to be different). For a minimalistic approach one link to the "MD5 Homepage (unofficial)" is better than nothing, and nothing (for developers) is what you have now in the article.
I'll now add a link to the MD5 test suite, because it covers obscure MD5 bugs in some RFCs (= officially verified and published RFC errata). Please copy all other working links from the "MD5 homepage (unofficial)". -82.113.106.28 (talk) 08:53, 8 February 2011 (UTC)
Somewhat unrelated, I replaced the shaky "AUTH48" reference by the now published RFC 6151, and added this RFC also to the external links (it updates RFC 1321). -82.113.121.52 (talk) 11:59, 7 March 2011 (UTC)

Application of MD5 and 'Citation needed'

If I recall correctly, MD5 is no longer recommended for [security] use by most sanctioning bodies (if not all), including NIST, NESSIE, ECRYPT, and ISO/IEC. Claiming that MD5 can be made 'secure' by including a salt or using multiple applications (ie, 'key stretching') is misleading at best. I understand that MD5 is widely used by Free and Open Source Software (FOSS), but that does not make it secure either.

MD5 provides a theoretical security level of 2^64. After nearly 20 years, there is clearly a difference between MD5's theoretical and practical security level. In contrast, the current accepted security level is 2^112, which is achieved by 3-key TripleDES and SHA-224. For those who observe security levels, December 2010 marked the end of 2^80 - 2-key TripleDES and SHA-1.

Folks who claim the algorithm is secure if peppered with a salt or multiple applications must provide a reference - other than, "that's what everybody else uses" or "that's what WWW uses".

Jeffrey Walton 01:48, 15 May 2011 (UTC)

I took out the tags because salt (or key stretching) will usually improve things somewhat, even with a bad algorithm. There's no claim that this will make MD5 'secure'. That said, I would actually propose removing those two sentences entirely, since they don't reflect any recommended security practice that I know of. Feezo (send a signal | watch the sky) 12:00, 15 May 2011 (UTC)
>I would actually propose removing those two sentences entirely
That's actually what I wanted to do, but I was worried I would offend the free software world. Adding a 'citation needed' was a compromise. 96.244.81.47 (talk) 23:39, 15 May 2011 (UTC)
It's cool, I've taken it out. Feezo (send a signal | watch the sky) 08:40, 16 May 2011 (UTC)
AFAIK MD5 and SHA-1 are still okay for uses in HMAC, but that's irrelevant for your fix wrt salted MD5 applications — a horrible example is the "beerware" BSD $1$ or Apache $APR1$ algorithm. –89.204.153.164 (talk) 09:46, 31 May 2011 (UTC)
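The hash figures quoted at the top of this thread follow from the generic birthday bound: an n-bit hash offers at most n/2 bits of collision resistance. A minimal Python sketch, assuming the standard hashlib module (and that the local build exposes these algorithms); generic_collision_bits is a made-up helper name. It does not cover the Triple DES figures, which come from a different analysis, and for MD5 the practical cost of finding collisions is far below the generic bound.

```python
import hashlib


def generic_collision_bits(name: str) -> int:
    """Upper bound on collision resistance in bits: half the digest length."""
    digest_bits = hashlib.new(name).digest_size * 8
    return digest_bits // 2


levels = {name: generic_collision_bits(name) for name in ("md5", "sha1", "sha224")}

for name, bits in levels.items():
    print(f"{name}: at most 2^{bits} work for a generic collision search")
```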

Archiving of talk pages

Archiving of talk pages used to be a perfectly simple job, limited to the "permalink method" for unregistered users (= cannot create subpages for the "subpage method"). I recall a case where I used this method in 2011, when some users speculated that this simple job requires a bot. Today on this page it was rather tricky, some "unconstructive edit" filter blocked me from blanking sections 1…39 in what is now the first permalink archive.

I reported the bug (if it is a bug) and used "plan B", blank the 39 sections individually. For some short sections it worked, for some longer sections it didn't work (= potentially unconstructive edit blocked). No harm done, the talk page now simply still contains obsolete sections: Anything before MD5 with SHA-1? should be blanked, please check if registered users can do this. If even registered users cannot refactor a talk page the "unconstructive edit" filter is broken. –89.204.137.163 (talk) 05:26, 17 July 2011 (UTC)

Seems like a lot of work just to archive a talk page. If you don't want to create an account, (which would solve your problems with the filter) you could ask another editor to do it for you. You should also be aware that excessive editing creates problems for people trying to verify the contents of the page. I'm not saying you did anything wrong, but from a bookkeeping perspective, the best way to clean this up would be to revert your edits and then archive the page as usual. Feezo (send a signal | watch the sky) 06:10, 17 July 2011 (UTC)
Well, if you want to know it I'll explain why I cannot create an account without violating WP:SOCK of an intentionally "destroyed" account five years ago. In any case please try to revert anything after my "step 1 of 39", and then try to blank the remaining obsolete sections as a registered user. If that doesn't work undo your revert, then it is clearly a bug in the filter, and my "manual" attempt already went as far as possible for any user. –89.204.137.133 (talk) 06:19, 17 July 2011 (UTC)
I think you're using an overly strict interpretation of WP:SOCK—multiple accounts are only a problem if they're abused. If you disclose the name of your former account on your new user page, it's very unlikely there will be any problems. Feezo (send a signal | watch the sky) 06:30, 17 July 2011 (UTC)
The reasons why the creator of Template:! left en:Wikipedia under "right to vanish" are still valid, but admittedly he or she (might be me) could follow your advice on m: and mediazilla: - but it might require a steward to join the accounts destroyed in 2006 with a new single-sign-on account on commons: created in 2011. Back to the issue at hand, I've saved this section on "my" talk page, you can try your revert approach. Or just try to blank the remaining old sections, it would be good to know that the new (2011) filter issue is limited to unregistered users. –89.204.137.133 (talk) 06:44, 17 July 2011 (UTC)
What does that template have to do with anything? Anyway, accounts can't be merged, and since this page is < 40 KB, there's no pressing need to archive it. Feezo (send a signal | watch the sky) 07:00, 17 July 2011 (UTC)
Obviously you refuse to test the refactoring of talk pages as a registered user, and you prefer to keep this messy page as it was before I tried to refactor it. Sigh…–89.204.137.133 (talk) 07:09, 17 July 2011 (UTC)
Of course, my cunning plan was to keep this talk page as messy as possible :) Feezo (send a signal | watch the sky) 07:16, 17 July 2011 (UTC)
Hopefully the folks investigating the bug report check out if the current behaviour of the filter is as it should be, from my POV it still worked on Wikipedia Talk:Book sources earlier this year. That you like this messy talk page better is perfectly okay, but I'm now more interested in this edit filter bug or feature. –89.204.137.133 (talk) 07:49, 17 July 2011 (UTC)
There is no bug. The filter was created last month. Feezo (send a signal | watch the sky) 07:58, 17 July 2011 (UTC)
Interesting, refactoring talk pages (never mind this case, where you didn't like it - I'm talking about the filter) used to work for all users, is there a new policy? My policy knowledge is very old (2006) and vague. Quick caveat for others: If you try to refactor this page start from scratch, User:Feezo reinserted the moved section at its original place (section 17). –89.204.137.133 (talk) 08:10, 17 July 2011 (UTC)

If you want the page archived, there's nothing stopping you from creating an account. I can understand how you could be upset that the software is no longer allowing you to do something you used to do, but Wikipedia changes like everything else. If you leave a popular public website for five years, should you be surprised that things aren't exactly the way you left them? Feezo (send a signal | watch the sky) 08:25, 17 July 2011 (UTC)

I have no general problem with new features, and your use of {{unindent}} is very nice, I didn't know this template. It's also nice that {{tlx|unindent}} still works as it used to work in 2006. If the edit filter was introduced last month this has nothing to do with 2006, it is apparently a new modification of what some users can do. Refactoring talk pages is not per se dangerous (cf. book sources example), therefore I expect that this modification is documented in a new policy (or anything in this direction). Please simply say no if you're sure that this is not the case. –89.204.137.133 (talk) 08:44, 17 July 2011 (UTC)
In the sense that last month is chronologically between five years ago and now—yes, it does. You can read about the edit filter extension at WP:FILTER, and a link to the specific filter you encountered is given in my original reply. I hope that this allays your concerns. Feezo (send a signal | watch the sky) 08:53, 17 July 2011 (UTC)
Thanks, I missed your link (age, glasses, the works), and clearly namespace 1 (Talk, here) is not the same as 5 (Project talk, book sources). So now this weird business has to be documented on WP:REFACTOR or better WP:ARCHIVE, or the filter has to be disabled as not exactly necessary, or if the best approach is unclear I can note it in the pump triggering more informed guesses. –89.204.137.133 (talk) 09:34, 17 July 2011 (UTC)
Kudos, you were certainly very neutral about the minor detail that you belong to the selected few creating these abuse filters :-) –89.204.137.133 (talk) 12:12, 17 July 2011 (UTC)
Feel free to bring it up at the pump, but my guess is that the consensus will be to leave things as they are. Edit filters aren't normally mentioned in policy pages—not because there's a rule against it, but because they're supposed to be transparent. Your situation is relatively unusual; most editors interested in maintenance issues like page archiving create accounts. It's too bad that we can't give full technical access to everyone, but this would result in an unacceptable level of vandalism. If you look at the log for that filter, I think you'll agree that overall it's doing a fine job. Feezo (send a signal | watch the sky) 18:43, 17 July 2011 (UTC)
Yes, but now after I began to grok the details see the 2nd paragraph of this section, I'm not against a filter affecting only unregistered users (= working as designed, not broken). Documenting this in WP:ARCHIVE is only a wish for transparency (this particular filter cannot hide any secrets, it's designed to be very fast and simple). –89.204.153.227 (talk) 11:32, 18 July 2011 (UTC)
One of several things I still like about en:w: is its SoFixIt style of doing things. The WP:FILTER business is now (or maybe was, if reverted) documented on WP:ARCHIVE. –82.113.99.166 (talk) 13:18, 23 July 2011 (UTC)
Yep, it's the power of the wiki :) I've edited your wording slightly; feel free to tweak it again if you want. (We could also cite the specific filter like a reference.) Feezo (send a signal | watch the sky) 15:36, 25 July 2011 (UTC)

Formatting of pseudocode implies a bug

In the pseudocode section, the (lack of) indenting in the different blocks makes the code incorrect. As given, the //Main loop section effectively does 63 nugatory assignments to f and g, and only the results of the 64th are actually processed in the next block that starts with temp := d. Surely the temp := d section needs to be indented so that it is inside the for loop?

I'd do it myself, but I'm not sure if the section after that, which starts //Add this chunk's hash..., should also be part of the for loop and indented.

Of course, this is compounded by the previous for each 512-bit chunk loop also not having its children sections indented: this time, though, it is more obvious that everything from there up until just before the final var char digest[16]... section belongs.

Or maybe there just needs to be explicit end for statements inserted at the relevant positions.

JohnAdriaan (talk) 04:32, 15 September 2011 (UTC)

I went ahead and implemented the algorithm, getting the given results for the given text. I also tested against Cygwin's implementation, and got the same results. I then went back and indented the for loops and added end for keywords also.
JohnAdriaan (talk) 08:13, 15 September 2011 (UTC)
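For reference, the nesting discussed above can be sketched in Python, where indentation is significant and the question cannot arise. This is an unoptimised illustration written from the RFC 1321 description, not the article's pseudocode verbatim; md5_sketch and left_rotate are names chosen for this example. The state update sits inside the 64-step loop, and the "add this chunk's hash" step sits inside the per-chunk loop but outside the 64-step loop.

```python
import hashlib
import math
import struct

# Per-step left-rotation amounts and sine-derived constants.
S = [7, 12, 17, 22] * 4 + [5, 9, 14, 20] * 4 + [4, 11, 16, 23] * 4 + [6, 10, 15, 21] * 4
K = [int(abs(math.sin(i + 1)) * 2**32) & 0xFFFFFFFF for i in range(64)]


def left_rotate(x: int, c: int) -> int:
    """Rotate a 32-bit value left by c bits."""
    return ((x << c) | (x >> (32 - c))) & 0xFFFFFFFF


def md5_sketch(message: bytes) -> str:
    a0, b0, c0, d0 = 0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476
    # Padding: a 0x80 byte, zeros up to 56 mod 64, then the bit length
    # as a 64-bit little-endian integer.
    padded = message + b"\x80"
    padded += b"\x00" * ((56 - len(padded)) % 64)
    padded += struct.pack("<Q", (len(message) * 8) % 2**64)

    for offset in range(0, len(padded), 64):      # for each 512-bit chunk
        m = struct.unpack("<16I", padded[offset:offset + 64])
        a, b, c, d = a0, b0, c0, d0
        for i in range(64):                       # main loop
            if i < 16:
                f, g = (b & c) | (~b & d), i
            elif i < 32:
                f, g = (d & b) | (~d & c), (5 * i + 1) % 16
            elif i < 48:
                f, g = b ^ c ^ d, (3 * i + 5) % 16
            else:
                f, g = c ^ (b | ~d), (7 * i) % 16
            # The state update belongs INSIDE the main loop.
            f = (f + a + K[i] + m[g]) & 0xFFFFFFFF
            a, d, c = d, c, b
            b = (b + left_rotate(f, S[i])) & 0xFFFFFFFF
        # Adding this chunk's result is inside the chunk loop only.
        a0 = (a0 + a) & 0xFFFFFFFF
        b0 = (b0 + b) & 0xFFFFFFFF
        c0 = (c0 + c) & 0xFFFFFFFF
        d0 = (d0 + d) & 0xFFFFFFFF

    # The digest is the four state words serialised little-endian.
    return struct.pack("<4I", a0, b0, c0, d0).hex()


print(md5_sketch(b"") == hashlib.md5(b"").hexdigest())
```

Comparing against hashlib on a few inputs of different lengths (empty, shorter than one chunk, longer than one chunk) is a quick way to catch a misplaced block.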

POV claim

User 208.54.5.209 has put a POV flag on the page together with the claim

Entry seems confusingly tilted in favor of viewpoint of anti-MD5 hackers

As far as I can see there are no explanations why 208.54.5.209 thinks the page is biased. After reviewing the page I do not understand this claim. In particular:

  • Since MD5 is a cryptographic hash function it is important to address its security. Discussions of cryptographic primitives typically also contain results that are of theoretical nature.
  • The sections discussing weaknesses of MD5 are large because there is a large amount of literature on the subject, not because there is some bias.
  • The requirements and the severity of the attacks are appropriately stated. In most cases the complexity for an attack is explicitly given.
  • Recommendations against using MD5 for cryptographic applications are properly cited.

Therefore the POV flag is in my opinion not justified and can be removed. 178.195.230.127 (talk) 06:13, 10 April 2012 (UTC)

Agree. The fact that MD5 is broken is, in my opinion, a very important fact to document, and warrants this amount of content as proof. 99.187.241.4 (talk) 01:02, 19 April 2012 (UTC)
I agree too. Security problems with MD5 need to be discussed, and as far as I can see they're in their own section anyway, which is pretty unobtrusive. I removed the POV tag, if the person that put it there (or anyone else) disagrees then they should discuss it on this talk page. Quietbritishjim (talk) 09:16, 21 April 2012 (UTC)

C source code broken

The pseudocode for the MD5 algorithm defines an integer array, k, via a table or a sine function. The C program/function posted uses k without defining it. On compilation, following the comment at the top of the code, I receive the following error:

$ gcc -o md5 -O3 -lm md5.c
md5.c: In function ‘md5’:
md5.c:103:21: error: ‘k’ undeclared (first use in this function)
md5.c:103:21: note: each undeclared identifier is reported only once for each function it appears in

--128.8.120.3 (talk) 21:39, 10 December 2012 (UTC)

Yup. The code is buggy. Besides the undefined array k the code also suffers from not being endian neutral. Also copying the message before hashing is unnecessary and not very elegant since one does not want to make extra copies of potentially confidential input data. Not sure if having code in the article is helpful at all. 178.195.225.28 (talk) 20:39, 20 December 2012 (UTC)
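To make the two problems concrete, here is a small sketch in Python rather than C, so it is not a patch for the article's code. It assumes the sine-based definition of the constant table from RFC 1321; encode_bit_length is a hypothetical helper name. It builds the 64-entry table that the posted program left undefined, and serialises the message length with an explicit little-endian format instead of copying the in-memory bytes of an integer, so the result does not depend on the host byte order.

```python
import math
import struct

# k[i] = floor(abs(sin(i + 1)) * 2^32), with i + 1 taken in radians.
K = [int(abs(math.sin(i + 1)) * 2**32) & 0xFFFFFFFF for i in range(64)]


def encode_bit_length(message_length_bytes: int) -> bytes:
    """Return the message length in bits as 8 bytes, low-order byte first."""
    bit_length = (message_length_bytes * 8) % 2**64
    # "<Q" forces little-endian regardless of the machine running the code.
    return struct.pack("<Q", bit_length)


print(hex(K[0]), hex(K[63]))
print(encode_bit_length(3).hex())
```

In C the equivalent endian-neutral approach would be to write the length out byte by byte with shifts instead of calling memcpy on the integer.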

Figure is wrong

The figure uses the message word M_i. In the pseudocode and also in the RFC, M_g is used, where g, depending on the round, equals i, (5i + 1) mod 16, (3i + 5) mod 16, or (7i) mod 16. I don't know how to edit the figure, otherwise I would do the corrections.--213.135.238.211 (talk) 14:52, 8 February 2013 (UTC)

Thanks, but please spell out exactly what needs to be changed. Are we talking about the figure at File:MD5.svg which is in the MD5#Algorithm section with caption starting "Figure 1. One MD5 operation. MD5 consists of 64 of these operations"? If you have an SVG editing program (like Inkscape) you could download the file, then edit it, then upload the new version. The file is actually at commons:File:MD5.svg, and the download/edit happens there, and you probably would need an account. Alternatively, if you spell out what needs to be done, someone here might do it (I could, but it's a long time since I thought about MD5 and I would need to check the RFC). Changing Mi to Mg would be easy, but would something about g be needed, as per your comment? If so, could that extra text be in the caption? Johnuniq (talk) 23:01, 8 February 2013 (UTC)
I would suggest the following changes in the figure: Mi replaced by Mgj(i), s replaced by sj(i) and F replaced by Fj. The text I would adjust to this: "One MD5 operation. MD5 consists of i=0..63 of these operations, grouped in four rounds j=0..3 of 16 operations. Fj is a nonlinear function (F, G, H or I); one function is used in each round. Mgj(i) denotes a 32-bit block of the message input, and Ki denotes a 32-bit constant, different for each operation. gj(i) differs for each round and each operation. sj(i) denotes a left bit rotation by sj(i) places; s varies for each round and each operation. <symbol> denotes addition modulo 2^32." — Preceding unsigned comment added by 80.144.96.182 (talk) 11:10, 9 February 2013 (UTC)
Thanks for that, but I am now hoping we get comments from other editors as I'm concerned about the complexity for a reader of putting too much detail into the diagram. The suggested replacement for Mi has two levels of subscript–that's readily achievable, but would it actually help? Perhaps just fixing the caption would be a better option: the text could hint that Mi depends on more than i, and that might be sufficient? Johnuniq (talk) 11:19, 10 February 2013 (UTC)
I can understand that. Then I would suggest that you just replace Mi by Mg and change the text this way: "Mg denotes the selection of a 32-bit block of the message input (g depends on i), and Ki denotes a 32-bit constant, different for each operation.". But somehow it should be mentioned in the main text what is going on there, because from reading it, there is a gap between the description and the pseudocode.213.135.238.211 (talk) 09:40, 11 February 2013 (UTC)
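
To make the dependence of g on i concrete, here is a small Python sketch of the message-word schedule quoted in this thread. The function name message_index is made up for illustration; the four formulas are the ones given above.

```python
def message_index(i: int) -> int:
    """Return g, the index of the 32-bit message word used in operation i (0..63)."""
    if not 0 <= i < 64:
        raise ValueError("i must be in 0..63")
    if i < 16:                  # round 0: words in natural order
        return i
    if i < 32:                  # round 1
        return (5 * i + 1) % 16
    if i < 48:                  # round 2
        return (3 * i + 5) % 16
    return (7 * i) % 16         # round 3

# Show the order in which each round consumes the 16 message words.
for j in range(4):
    print(f"round {j}:", [message_index(i) for i in range(16 * j, 16 * j + 16)])
```

Each round visits all 16 words exactly once, only in a different order, which is why a caption saying "g depends on i" may be enough for the figure.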

MD5 test suite

Just let it stay, the presence of this link was already a compromise here after somebody insisted on removing the link to an admittedly outdated "unofficial MD5 homepage" years ago. Nobody would be interested in creating a list of verified RFC errata related to MD5 here, the test suite covers it, and the collection of test vectors from various public sources (mostly RFCs) can be easily adapted to any programming language. I've removed the "external links" check request, because that was obviously done in September 2013. There's another "dmy" request (1st line) with no visible effect, I have not yet checked what that is. –82.113.106.176 (talk) 23:28, 8 February 2014 (UTC)

License

What kind of licence is MD5 under? Can it be used in proprietary software?

I don't believe MD5 is patented, so you wouldn't need a license to use it. You might need a license to use Rivest's source code (in the RFC), though, since it's copyrighted. Some pieces of proprietary software (such as mIRC) use various prewritten libraries to perform MD5 hashing, so you might be able to use one of those libraries. -- Olathe November 17, 2003

"md5-announcement.txt" is the announcement from RSA Data Security that MD5 is being placed in the public domain for free general use. Anyone may write a program implementing the MD5 algorithm for any purpose.

RSA has written a reference implementation which is the source code in this directory. This source code is copyrighted by RSA. Here are the few copyright restrictions *with using this source code*. There is no restriction on any code which implements MD5 that you write yourself.

RSA's MD5 disclaimer

Copyright (C) 1991-2, RSA Data Security, Inc. Created 1991. All rights reserved.

License to copy and use this software is granted provided that it is identified as the "RSA Data Security, Inc. MD5 Message-Digest Algorithm" in all material mentioning or referencing this software or this function.

License is also granted to make and use derivative works provided that such works are identified as "derived from the RSA Data Security, Inc. MD5 Message-Digest Algorithm" in all material mentioning or referencing the derived work.

RSA Data Security, Inc. makes no representations concerning either the merchantability of this software or the suitability of this software for any particular purpose. It is provided "as is" without express or implied warranty of any kind.

These notices must be retained in any copies of any part of this documentation and/or software.


Indeed, today we'd say that this is some kind of CC-BY license, common courtesy for centuries. All those detailed "cite web" references with quotes in Wikipedia are also attributions, not only verifications. –82.113.106.176 (talk) 23:58, 8 February 2014 (UTC)

Link to IBM p690 is broken

I am just trying to report a broken link IBM p690. According to IBM (http://www-03.ibm.com/servers/eserver/pseries/hardware/highend/p690.html) the p690 series is no longer on the market.— Preceding unsigned comment added by Kenster (talk • contribs) 2006-02-12

No idea what the problem was eight years ago, but FWIW I've redirected and extended the hopeless IBM p690 stub. –82.113.106.176 (talk) 00:32, 9 February 2014 (UTC)

Algorithm description unclear

What happens if the original message length is just under some multiple of 512? You won't have room for the 64 bits. Do we then pad to the next multiple of 512? — Preceding unsigned comment added by 82.141.130.38 (talk) 10:33, 8 August 2014 (UTC)

Yes, that is exactly right. If there isn't room for the 64 bits, both md5 and sha256 use zero padding to the next multiple of 512, and the 64 bits are put at the end of that last 512-bit block (which is otherwise all zeros). How could we make this clearer for the next reader? --DavidCary (talk) 18:39, 9 December 2014 (UTC)
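
One way to make this clearer is a worked example. The sketch below pads a message the way RFC 1321 describes (a single 1 bit, written as the byte 0x80, then zero bytes up to 56 mod 64, then the 64-bit length in bits, low-order byte first). The function name md5_pad is made up for this illustration.

```python
import struct

def md5_pad(message: bytes) -> bytes:
    """Return the message with MD5 padding appended (length becomes a multiple of 64 bytes)."""
    bit_len = (8 * len(message)) % 2**64           # length in bits, modulo 2^64
    padded = message + b"\x80"                     # the mandatory single 1 bit
    padded += b"\x00" * ((56 - len(padded)) % 64)  # zeros until 8 bytes short of a block
    return padded + struct.pack("<Q", bit_len)     # 64-bit little-endian length

# Messages of 55 bytes or fewer fit in one block; 56 bytes already needs two.
for n in (0, 55, 56, 63, 64):
    print(f"{n:3d} bytes -> {len(md5_pad(b'a' * n)):3d} bytes padded")
```

With a 56-byte message there is no room left for the 8-byte length after the 0x80 byte, so the first block is filled with zeros and a second block carries the length, which is the case asked about above.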

Collision reproduction

I cannot reproduce the MD5 hash for the Collision vulnerabilities section. I have tried the message blocks as is, without spaces, and without spaces and newlines. The reference points to a broken link[1].

I tried reproducing it with md5sum (GNU coreutils) 8.23:

echo $codeblock | md5sum
echo $codeblock | tr -d " " | md5sum
echo $codeblock | tr -d "\n " | md5sum

194.75.78.178 (talk) 14:08, 10 April 2015 (UTC)

  1. ^ Eric Rescorla (17 August 2004). "A real MD5 collision". Educated Guesswork (blog).
It's your lucky week, I recall the article, search engines confirm that it's no hallucination, and I added the collision to an MD5 test suite a decade ago, where it still works as expected. REXX code:
  • ignore function TEST(), it only counts errors (= unexpected outcomes)
  • ignore x2c() for hex. to bytes, and bitxor() for what you think it is
  • ignore MD5(), because you have your own implementation
   /* - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - */
   X =   'd1 31 dd 02   c5 e6 ee c4   69 3d 9a 06   98 af  f9 5c'
   X = X '2f ca b5 87   12 46 7e ab   40 04 58 3e   b8 fb  7f 89'
   X = X '55 ad 34 06   09 f4 b3 02   83 e4 88 83   25 71  41 5a'
   X = X '08 51 25 e8   f7 cd c9 9f   d9 1d bd f2   80 37  3c 5b'
   X = X 'd8 82 3e 31   56 34 8f 5b   ae 6d ac d4   36 c9  19 c6'
   X = X 'dd 53 e2 b4   87 da 03 fd   02 39 63 06   d2 48  cd a0'
   X = X 'e9 9f 33 42   0f 57 7e e8   ce 54 b6 70   80 a8  0d 1e'
   X = X 'c6 98 21 bc   b6 a8 83 93   96 f9 65 2b   6f f7  2a 70'
   C = x2c( X )
   Y = '79054025255fb1a26e4bc422aef54eb4'
   TXT = 'MD5 collision test, 6 of 1024 bits modified'
   BAD = BAD + TEST( MD5( C ), Y, TXT '- see also at URL:' )

   X =   '00 00 00 00   00 00 00 00   00 00 00 00   00 00  00 00'
   X = X '00 00 00  80  00 00 00 00   00 00 00 00   00 00  00 00'
   X = X '00 00 00 00   00 00 00 00   00 00 00 00   00  80 00 00'
   X = X '00 00 00 00   00 00 00 00   00 00 00  80  00 00  00 00'
   X = X '00 00 00 00   00 00 00 00   00 00 00 00   00 00  00 00'
   X = X '00 00 00  80  00 00 00 00   00 00 00 00   00 00  00 00'
   X = X '00 00 00 00   00 00 00 00   00 00 00 00   00  80 00 00'
   X = X '00 00 00 00   00 00 00 00   00 00 00  80  00 00  00 00'
   C = bitxor( C, x2c( X ))      /* toggle 6 bits of 1024 =16*8*8 */
   TXT = 'www.rtfm.com/movabletype/archives/2004_08.html#001055'
   BAD = BAD + TEST( MD5( C ), Y, '<http://' || TXT || '>' )
Have fun, and if you can please post pseudo-code for PHPASS(), the MD5 code by Solar Designer, I never managed to get this right, after in essence all RFC examples and seriously weird stuff like APR1. –Be..anyone (talk) 04:52, 13 April 2015 (UTC)
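
For readers without REXX, the same check can be sketched in Python with the standard hashlib module. The hex data and the six toggled bit positions are copied from the REXX code above; the variable names are made up here. One likely reason the md5sum attempts failed is that echo feeds the hexadecimal text (plus a newline) to md5sum, whereas the collision is between the 128 raw bytes that the text encodes.

```python
import hashlib

HEX_BLOCK = (
    "d131dd02 c5e6eec4 693d9a06 98aff95c "
    "2fcab587 12467eab 4004583e b8fb7f89 "
    "55ad3406 09f4b302 83e48883 2571415a "
    "085125e8 f7cdc99f d91dbdf2 80373c5b "
    "d8823e31 56348f5b ae6dacd4 36c919c6 "
    "dd53e2b4 87da03fd 02396306 d248cda0 "
    "e99f3342 0f577ee8 ce54b670 80a80d1e "
    "c69821bc b6a88393 96f9652b 6ff72a70"
)
EXPECTED = "79054025255fb1a26e4bc422aef54eb4"

# Decode the hex text into the 128 raw bytes (spaces are ignored by fromhex).
m1 = bytes.fromhex(HEX_BLOCK)

# Byte offsets whose top bit (0x80) is toggled in the second message,
# taken from the XOR mask in the REXX code: 6 of 1024 bits differ.
FLIPPED_OFFSETS = (19, 45, 59, 83, 109, 123)
m2 = bytearray(m1)
for offset in FLIPPED_OFFSETS:
    m2[offset] ^= 0x80
m2 = bytes(m2)

print(hashlib.md5(m1).hexdigest())
print(hashlib.md5(m2).hexdigest())
print("messages differ:", m1 != m2)
```

Both digests should equal the value Y from the REXX test, while hashing HEX_BLOCK as text gives an unrelated digest.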

The pseudocode is missing variable "g" declaration

I've noticed that the variable g is not defined in the pseudo-code before its use, while all other are. Would anyone know the proper definition and type for that variable? --185.112.167.100 (talk) 08:07, 4 December 2015 (UTC)

It looks like the work variables (F, g, dTemp and possibly more) are not declared because they do not need to be initialized. Each has a type that can be inferred from what it stores—for example, i is an integer from 0 through 63, and g is an integer calculated from i. It is pseudocode, not a compiled program. Johnuniq (talk) 10:08, 4 December 2015 (UTC)
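
As an illustration of how the undeclared work variables behave in a real language, here is a compact Python sketch that follows the structure of the pseudocode. It is not the article's code: the names md5_digest, rotl, S and K are chosen for this example, and F and g are plain integers that simply come into existence when first assigned.

```python
import hashlib
import math
import struct

# Per-operation left-rotation amounts: four values repeated within each round.
S = [7, 12, 17, 22] * 4 + [5, 9, 14, 20] * 4 + [4, 11, 16, 23] * 4 + [6, 10, 15, 21] * 4
# Per-operation additive constants derived from the sine function.
K = [int(abs(math.sin(i + 1)) * 2**32) for i in range(64)]
MASK = 0xFFFFFFFF

def rotl(x: int, n: int) -> int:
    """Rotate the 32-bit value x left by n bits."""
    return ((x << n) | (x >> (32 - n))) & MASK

def md5_digest(message: bytes) -> bytes:
    a0, b0, c0, d0 = 0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476
    # Padding: 0x80, zeros up to 56 mod 64, then the bit length (little-endian).
    padded = message + b"\x80"
    padded += b"\x00" * ((56 - len(padded)) % 64)
    padded += struct.pack("<Q", (8 * len(message)) % 2**64)
    for offset in range(0, len(padded), 64):
        M = struct.unpack("<16I", padded[offset:offset + 64])  # 16 little-endian words
        A, B, C, D = a0, b0, c0, d0
        for i in range(64):
            if i < 16:
                F, g = (B & C) | (~B & D), i
            elif i < 32:
                F, g = (D & B) | (~D & C), (5 * i + 1) % 16
            elif i < 48:
                F, g = B ^ C ^ D, (3 * i + 5) % 16
            else:
                F, g = C ^ (B | ~D), (7 * i) % 16
            F = (F + A + K[i] + M[g]) & MASK  # masking also normalizes negative values from ~
            A, D, C = D, C, B
            B = (B + rotl(F, S[i])) & MASK
        a0, b0 = (a0 + A) & MASK, (b0 + B) & MASK
        c0, d0 = (c0 + C) & MASK, (d0 + D) & MASK
    return struct.pack("<4I", a0, b0, c0, d0)  # output words low-order byte first

# Compare against the standard library for the empty string.
print(md5_digest(b"").hex())
print(hashlib.md5(b"").hexdigest())
```

Here g is an integer in 0..15 and F is a 32-bit value, consistent with the types inferred in the reply above.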

Acronym MD

The following text was added to the lead by NetBlues on 12 January 2017 (diff):

The MD5 hash function receives its acronym MD from its structure using Merkle–Damgård construction.

Computer people are well known for enjoying humor and there well may be a backronym explanation for what "MD" means, but given MD4 (which has the same unsourced claim diff) and RFC 1321 a good source for that assertion is needed. The text should be removed from here and MD4 if no reference is available. Johnuniq (talk) 23:46, 13 February 2017 (UTC)

The acronym MD actually refers to the term Message Digest, rather than Merkle–Damgård construction. While Merkle–Damgård construction is used in MD5, I could not substantiate that it was used in any previous iterations of MD* algorithms, leading me to now believe that there is no correlation between the names, and the acronyms for each are a coincidence. My original assertion is withdrawn. NetBlues (talk) 15:27, 20 September 2017 (UTC)

External links modified

Hello fellow Wikipedians,

I have just modified one external link on MD5. Please take a moment to review my edit. If you have any questions, or need the bot to ignore the links, or the page altogether, please visit this simple FaQ for additional information. I made the following changes:

When you have finished reviewing my changes, you may follow the instructions on the template below to fix any issues with the URLs.

This message was posted before February 2018. After February 2018, "External links modified" talk page sections are no longer generated or monitored by InternetArchiveBot. No special action is required regarding these talk page notices, other than regular verification using the archive tool instructions below. Editors have permission to delete these "External links modified" talk page sections if they want to de-clutter talk pages, but see the RfC before doing mass systematic removals. This message is updated dynamically through the template {{source check}} (last update: 18 January 2022).

  • If you have discovered URLs which were erroneously considered dead by the bot, you can report them with this tool.
  • If you found an error with any archives or the URLs themselves, you can fix them with this tool.

Cheers.—InternetArchiveBot (Report bug) 14:43, 10 January 2018 (UTC)

Figure wrong

The figure uses the part of the message Mi.

But in the code and the RFC it is Mg with g either i, (5i + 1) mod 16, (3i + 5) mod 16 or (7i) mod 16.

Additionally, the rotation inside the figure should be si and not only s.

I cannot correct the wrong Mi in figure but maybe someone else can.--158.64.4.213 (talk) 10:10, 15 March 2019 (UTC)

Comment on content management systems is irrelevant

The section on MD5 security notes "As of 2019, one quarter of widely used content management systems were reported to still use MD5 for password hashing."

However, this is irrelevant, because the security of password hashes is not affected by collisions; it is affected by the speed of hashing. MD5 is a very fast hash, so it's no longer appropriate for password hashing. I propose moving this to another section, and will probably do so within a few days unless I hear otherwise. Simsong (talk) 01:40, 11 September 2020 (UTC)
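
To illustrate the speed argument, here is a hedged Python sketch of a deliberately slow password hash using PBKDF2 from the standard library, as one common alternative to a single fast MD5 call. The function names and the iteration count are assumptions made for this example, not a recommendation from the article or the cited report.

```python
import hashlib
import hmac
import os

ITERATIONS = 600_000  # assumed work factor for illustration; tune for the target hardware

def hash_password(password: str, salt: bytes = b"", iterations: int = ITERATIONS):
    """Return (salt, derived_key); a random 16-byte salt is generated when none is given."""
    salt = salt or os.urandom(16)
    key = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return salt, key

def verify_password(password: str, salt: bytes, expected_key: bytes,
                    iterations: int = ITERATIONS) -> bool:
    """Recompute the key with the stored salt and compare in constant time."""
    _, key = hash_password(password, salt, iterations)
    return hmac.compare_digest(key, expected_key)
```

The point of the iteration count is that every guess costs an attacker as much work as a legitimate login does, whereas one MD5 evaluation costs almost nothing.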

little endian

bit76543210 is 8bit
  %10000000 = 128 or 0x80
bitFEDCBA9876543210 is 16bit
  %0000000010000000 = 128 or 0x0080

but what language is the pseudocode supposed to be, with that ':=' sign? 85.149.83.125 (talk) 15:49, 4 March 2021 (UTC)
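
The byte-order point can be checked with a short Python sketch using the standard struct and hashlib modules. It shows the value 128 (0x0080) stored as a 16-bit integer in both byte orders, and then reads the MD5 digest of the empty string as four little-endian 32-bit words, which gives the byte-swapped form reported in the earlier byte-ordering thread.

```python
import hashlib
import struct

# The same 16-bit value, 128 = 0x0080, laid out in memory in both byte orders.
print("little-endian:", struct.pack("<H", 0x0080).hex())  # low-order byte first
print("big-endian:   ", struct.pack(">H", 0x0080).hex())  # high-order byte first

# The digest is a sequence of 16 bytes; interpreting it as four
# little-endian 32-bit words reverses the bytes inside each word.
digest = hashlib.md5(b"").digest()
words = struct.unpack("<4I", digest)
print("digest bytes: ", digest.hex())
print("as LE words:  ", " ".join(f"{w:08x}" for w in words))
```

Printing the digest byte by byte, rather than word by word, avoids the apparent reversal on little-endian machines.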