Wikipedia:Reference desk/Archives/Computing/2010 April 25

Computing desk
< April 24 << Mar | April | May >> April 26 >
Welcome to the Wikipedia Computing Reference Desk Archives
The page you are currently viewing is an archive page. While you can leave answers for any questions shown below, please ask new questions on one of the current reference desk pages.


April 25

edit

Impossible hashes

edit

Do the MD5, SHA-1 or SHA-2 hash functions have any strings of the same length as their output which it has been proven they won't output for any input, or will output only for inputs longer than the hash? (A use for such a string would be disabling a user account without deleting it on a system that wasn't designed to make that possible -- just set the "hash of the password" to that value, and then it's safe even if someone breaks the hash function.) NeonMerlin 03:30, 25 April 2010 (UTC)[reply]

"*" is traditional in the world of /etc/passwd. I think the easy proof that it works is that it's the wrong length, which doesn't fit your criteria, but does make it clearer what is going on.
Being able to find out anything about the documents that could hash to a particular value is a weakness in a cryptographic hash so the answer is "hopefully not". Skimming the vulnerabilities discussed in our MD5 article, I didn't see any that would allow one to generate an unreachable hash, however. Paul (Stansifer) 04:15, 25 April 2010 (UTC)[reply]
There is a trivial probabilistic argument that those strings almost certainly exist, but as Paul Stansifer says, if you can find one you will have in some sense broken the hash. 69.228.170.24 (talk) 07:20, 25 April 2010 (UTC)[reply]
Per above, if such a value could be found, it would indicate a weakness in the hash function — It would not be complete. For example, if there are several 160 bit inputs that map to the same SHA-1 output, then there are situations where every output bit does not depend on all input bits. See also Correlation immunity. decltype (talk) 23:08, 25 April 2010 (UTC)[reply]
The problem with using '*' is that in my app at least, SHA224 hashes are stored in a MySQL binary(28) column, and MySQL pads input that's too short (so '*' would become 0x280000...0000) and truncates input that's too long. I could make the column nullable, but an unreachable value would have the same effect without costing an extra bit per row. I'm allowing UTF-8 strings of up to 128 bytes as passwords. NeonMerlin 06:18, 28 April 2010 (UTC)[reply]
Just use 0xDEADBEEF in all 7 words. That's a standard sentinel value. Nobody will ever choose a password that hashes to it by accident, or even (if the hash function is secure) find such a password on purpose. 06:36, 28 April 2010 (UTC) —Preceding unsigned comment added by 69.228.170.24 (talk)

Rainbow table size

edit

I have a password (not on Wikipedia, and I'm not saying where) that's 15 characters long and consists of numbers and lowercase letters. How large would a rainbow table have to be to crack it? NeonMerlin 03:45, 25 April 2010 (UTC)[reply]

With all upper/lower case letters and numbers (in ASCII) there are 26+26+10=62 characters per position. Because not all positions are filled, you have to add "null" to make it 63 characters per position. However, null is special in that no non-null character fill follow null. If every possible combination (which obeys the no character after null rule) is used. you have 62^15 combinations with all 15 characters used, 62^14 combinations with 14 characters used, 62^13 with 13 characters used, etc... So, you have 62^(15+14+13+12...) which is 62^120 possibilities (or 1.22e215). Add to that one completely null combination. -- kainaw 04:16, 25 April 2010 (UTC)[reply]
ab + ac isn't a(b+c).   = 781 514 782 079 074 318 856 775 915, which, while well short of 62120, is still roughly 289. Orders of magnitude (data) suggests you have little to worry about.—Korath (Talk) 08:33, 25 April 2010 (UTC)[reply]
And with just lower-case and letters, you have   = 227 390 317 427 040 025 268 341, about 277.5. Still infeasible. —Korath (Talk) 08:36, 25 April 2010 (UTC)[reply]
That said, a proper rainbow table doesn't need a separate entry for for each combination. —Korath (Talk) 10:30, 25 April 2010 (UTC)[reply]

AES and multiple encrypted copies

edit

Does a plaintext encrypted with the Advanced Encryption Standard algorithm become significantly easier to crack if the attacker has multiple encrypted copies, each produced by a different key, but the keys were all generated independently? NeonMerlin 04:45, 25 April 2010 (UTC)[reply]

Well, it depends on the encryption mode. If you just mean multiple ECB encryptions of a single block, unless I'm missing something youre fine if the keys are random and AES meets its security specification, of being a pseudorandom permutation. For other modes, who knows. If you want to learn some crypto theory to be able to answer this kind of question yourself, you might like to read Phillip Rogaway and Mihir Bellare's online notes.[1] 69.228.170.24 (talk) 07:23, 25 April 2010 (UTC)[reply]
To your original question, no. If the keys are independent, then there's no decreasing level of security with a subsequent encryption. If there was then if I was an attacker I'd just re-encrypt it over and over til it was sufficiently weak. On the other hand, if those keys are related, then now you're talking about a different type of attack. Yes, there are some concerns with it, but I don't know much more beyond that. I think you're probably thinking more about concerns with triple DES, which are slightly similar on face, but fundamentally quite different. If I'm misunderstanding please tell me and I'll try to fix it. Shadowjams (talk) 08:13, 26 April 2010 (UTC)[reply]
On the problems with ECB with similar/related keys, this blog post is relevant. --Mr.98 (talk) 02:39, 27 April 2010 (UTC)[reply]
That's not about multiple keys, it's about ciphertext malleability. ECB mode is a building block for more complex protocols. It's not a reasonable way to encrypt messages. You really have to know what you're doing to choose modes with the right properties for your application, but for simple message encryption with authentication (and you do want authentication), EAX mode or GCM mode eliminate the most common errors. 69.228.170.24 (talk) 06:11, 27 April 2010 (UTC)[reply]

Weakness of overly long hashes?

edit

It occurred to me tonight that once the hash length exceeds the number of possible plaintexts, increasing the length of the hash function might actually decrease the theoretical security of hashed and salted passwords by making the salt more feasible to guess. If we use a 128-bit hash function to store a password of up to 19 ASCII characters, this makes just under 2^125 hashes possible for a given salt value, meaning that a cracker with a huge precomputed table can eliminate seven eighths of the possible salt values for every known hashed and salted password, even without knowing any plaintext passwords. This means he needs 43 hashes to crack a 128-bit salt. Granted, the table would be 2^5511 bits long (2^(128*43) possible lists of hashes times 128=2^7 bits per hash, neglecting the relatively small amount we'd save by skipping lists that contain duplicate hashes) and take 2^253 iterations of the hash function to generate; but if we stored only 120 bits of the hash then it wouldn't be possible even then. Further, if we switch to a 256-bit hash function, then each known hash narrows the salt space by 2^131, and cracking a 256-bit salt requires only two hashes and a table occupying 2^520 bits. Is my analysis correct? NeonMerlin 05:51, 25 April 2010 (UTC)[reply]

tldr, but the idea of cryptography normally is to pick your security parameters so that the attacker has an unfeasible amount of work to do, rather than worrying about attackers with unbounded resources. Those collisions are useless if the attacker can't exploit them within the lifetime of the universe. 69.228.170.24 (talk) 07:29, 25 April 2010 (UTC)[reply]
Yeah. Even if your analysis is correct, "only... a table occupying 2^520 bits" is rather a lot. The information capacity of the observable universe is only 2^305 bits. See Orders of magnitude (data). —Korath (Talk) 08:22, 25 April 2010 (UTC)[reply]
I was thinking this earlier but screwed up the calculations so confused myself and didn't respond somewhat unnecessary now with the existing responses but in the same vein I guess... Let alone 2^5511 which is so big that neither the Bing or Google calculators handle it (2^1023 being the limit for both). The Windows one does so I know you're discussing 1.18363e+1658 bytes. If you can find something to store and handle that then great. In the mean time...
Of course you have a lot of time to find a solution to your problem since even if you can compute 1 billion iterations of the hash per second, which sounds like an awful lot, you'd still need 4.58663e+59 years (which is 3.299746e+49 times longer then current upper bound estimated age of the universe) to generate your table requiring 2^253 hash iterations.
Nil Einne (talk) 11:29, 25 April 2010 (UTC)[reply]

Edit gzipped file without decompressing?

edit

Under what conditions, if any, would it be faster to perform a search-and-replace on a gzipped text file by inspecting and updating the compressed data than by unzipping, performing the search-and-replace and rezipping? NeonMerlin 06:28, 25 April 2010 (UTC)[reply]

If decompressing the file would make it too large to fit into memory, and thus require paging or some other method to store the excess data, that may be a point where keeping it compressed is the better strategy. Of course, if it's possible to decompress only a portion at a time, such that it fits into memory, then this changes the equation. StuRat (talk) 06:36, 25 April 2010 (UTC)[reply]
What is the application? In a pure sense the answer is "almost never", but zlib lets you insert synchronization points into the compressor output, so that you can (for example) edit blocks in the middle of it. This has a slight cost in compressed output size. 69.228.170.24 (talk) 07:32, 25 April 2010 (UTC)[reply]
That's a good point about decreasing the efficiency of the compression. If the replace string is the same length as the original (specifically if this is true of the compressed versions), then that should work better. StuRat (talk) 07:39, 25 April 2010 (UTC)[reply]

Geographical Information software: OS OpenData

edit

Hello. Can anyone recommend me a free (preferably open-source) and simple GIS which will be capable of processing all the UK government datasets that have just been released? Thanks! ╟─TreasuryTagperson of reasonable firmness─╢ 10:40, 25 April 2010 (UTC)[reply]

Try Data.gov.uk. Kittybrewster 11:16, 25 April 2010 (UTC)[reply]
Er—no. Was my query poorly-worded? I'm looking for a Geographical Information System: software with which to load the date released by Ordnance Survey. ╟─TreasuryTagvoice vote─╢ 12:10, 25 April 2010 (UTC)[reply]
I expect you have already looked at Comparison of geographic information systems software, including the links at the end of the page. 89.243.216.99 (talk) 12:49, 25 April 2010 (UTC)[reply]
Yes, I have, but it's not sufficiently clear to indicate which software is able to process which formats of data :( ╟─TreasuryTagFirst Secretary of State─╢ 13:45, 25 April 2010 (UTC)[reply]
Quantum GIS should be able to open most, if not all of the formats listed. I know it can do ESRI shapefiles and (I think) TIFF images, plus any software should be able to import ASCII text files. It isn't the most user-friendly program (the manual for the latest version is 270 pages) but I think it's easier to use than most other GIS programs under active development -- plus it's free and open-source. I would be happy to respond to QGIS questions through e-mail, if some come up while you're working. Xenon54 / talk / 14:06, 25 April 2010 (UTC)[reply]

A problem with Facebook

edit

Why do my friends who have type 1700 friends Facebook tell me that they only have 400 and not see all the other 1300? If by the "show all", tells me that the person has only 400 friends e. .. but it is not so and more than 400 I do not display them ... Makes me with all my friends. For example, a friend of mine who has 287 friends I have only displayed 285. I have a friend who has 490 friends. I clicked on "490 friends" of his board or the "show all" of its board, but Facebook has shown me only its first 398 friends and all others not as well not exist. Perhaps that might be Facebook, with the new change, it makes that you can see only a limited number of friends of your friends because if not, with new subscriptions to facebook, the server becomes overloaded.? What do you think? Let me know. Hello and thanks in advance. —Preceding unsigned comment added by Plantigrados (talkcontribs) 16:51, 25 April 2010 (UTC)[reply]

I am having trouble understanding your question. Do you mean your friend has written on their Facebook page that they have 1700 friends, but you can only see 400 of them from your account? Do you mean your friend has written on their Facebook page that they have 1700 friends, but they have told you (in an email/chat for example) they only have 400? Or do you mean they said they had 1700 friends, but you can only find 400? Finally, does it really matter? 1700 friends is one big christmas card list! Astronaut (talk) 17:13, 27 April 2010 (UTC)[reply]
  Resolved

Say I have a text file with the data "1" "2" "3" etc all on new lines. How would I call this data to display using php, so that it would display something like:

<font color='red'>1</font><br>
<font color='red'>2</font><br>
<font color='red'>3</font><br>

and if I changed the data in the text file to say "1" "2" "hello" it would display something like:

<font color='red'>1</font><br>
<font color='red'>2</font><br>
<font color='red'>hello</font><br>

To clarify, the text file itself would look like:

1
2
3

Thank you 82.43.89.71 (talk) 16:52, 25 April 2010 (UTC)[reply]

There are many ways to do this. Here is an easy one that is pretty flexible:
<?php
ini_set('auto_detect_line_endings', true); 
$lines = file($filename);
foreach($lines as $line) {
    echo "<font color='red'>$line</font><br>\n";
}
?>
That loads the contents of $filename into an array named $lines, then just goes through each line in the array and outputs it in your preferred formatting. The first line just tries to compensate for the fact that different file formats use different line endings (and will make your life effortlessly easier when reading from text files in this way). --Mr.98 (talk) 17:11, 25 April 2010 (UTC)[reply]
Squee!! Thank you :D 82.43.89.71 (talk) 17:30, 25 April 2010 (UTC)[reply]

Computer math operations

edit

How does a computer perform addition and other operations. I assume it's done by the ALU, but I don't know how it physically works. I'd guess that each number is transformed into it's binary equivalent, which you can represent by having a switch on or off, but (even if this was right), I don't know what happens next. Would calling the computer to do an arithmetic operation be similar to calling a program, in that it goes into the memory location, etc.?

Also, would stuff like integration be done by the ALU or a program you write? Could you just point me in the broad direction for how someone would write a program for integration? Do I have to convert the function into Taylor series?

Thanks! 66.133.196.152 (talk) 17:51, 25 April 2010 (UTC)[reply]

For an explanation of the how hardware performs the simplest arithmetic operation, addition, see Adder (electronics). As for where the two numbers to be added are located, some computers require that they first be moved from memory into a register, while other computers allow addition directly from memory locations.
As for numerical integration, we have an article about that. There are many methods, but I would be very surprised if there is any ALU or CPU that can do integration directly; a program would be needed. Jc3s5h (talk) 18:20, 25 April 2010 (UTC)[reply]
(edit conflict)If you're talking about integer arithmetic, addition is done by a circuit in the ALU, the adder. Negation is done by taking the two's compliment. Subtraction is done (often) by negation followed by addition. Integer multiplication is performed by a number of schemes described at binary multiplier and linked articles; likewise division (digital) describes how division is done. As shifting is very easy, many multiplication and division implmentations have shortcuts that try to do some or all of the work by shifting, as applicable; similarly as division and multiplication by small integers is common (you'll find many more programs that ask for division by 3 than by 197) many processors have special shortcuts for a select few integers (that are implemented using the algorithms described above, but with some of the steps hardwired). Most modern cpus had ADD, SUB, MULT, DIV, and NEG instructions. For non-integer operations, floating point operations are implemented in most CPU's "floating point unit", a part of the chip that does arithmetic ops for floating point numbers. Some applications (such as number theory and cryptography) need accurate handling of very large integers (which overflows the CPU's ordinary arithmetic unit, and for which the FPU isn't sufficiently accurate); these use fixed-point arithmetic schemes; unlike the others fp arithmetic is mostly done in software. More advanced operations, like logarithms, exponents, and the values of functions like sine, are done using software that builts on the elements described above (usually floating-point); the specific wikipedia article for a given function will often describe the computer algorithm that calculates it - see Trigonometric functions#Computation for example. SOmetimes these are mathematical expansions like Maclaurin series, but very often they'll use special algorithms that take into account the computational characteristics (and the need for accuracy, or not) of the computer and the desired application. -- Finlay McWalterTalk 18:28, 25 April 2010 (UTC)[reply]
High-level summary: numbers are stored as a binary representation all the time. A rather large proportion of the primitive CPU instructions are the four basic math instructions (things like "add two floating point numbers", "divide two integers", etc.). The logic gates that accomplish these instructions look a great deal like elementary school arithmetic, but in base 2 (this makes some things much simpler: the 100-element multiplication table we're used to now has only 4 elements). In many cases, you'd be able to match up individual wires to individual digits in the intermediate results that base-2 schoolchildren would get. (However, in order to go as fast as possible, the design will usually turn out to be more complicated than that.) Paul (Stansifer) 04:26, 26 April 2010 (UTC)[reply]
Besides numerical integration there is also symbolic integration (a not very helpful article right now). Symbolic integration means your program accepts a formula like f(x)=3x2 and integrates it to give the formula x3+C. This can be done with pencil-and-paper methods or with fancy algebra (Risch algorithm) or some combination of both. 69.228.170.24 (talk) 10:00, 26 April 2010 (UTC)[reply]