Wikipedia:Reference desk/Archives/Computing/2013 August 20

Computing desk
Welcome to the Wikipedia Computing Reference Desk Archives
The page you are currently viewing is an archive page. While you can leave answers for any questions shown below, please ask new questions on one of the current reference desk pages.


August 20

Windows 8.1 question

From the official Microsoft announcement: "on October 18th ..., Windows 8.1 and Windows RT 8.1 will begin rolling out worldwide as a free update for consumers with Windows 8 or Windows RT devices through the Windows Store. Windows 8.1 will also be available at retail and on new devices starting on October 18th by market."

It mentions getting it from the Windows Store or from retail places. What about normal Windows Update, since it is supposed to be a free upgrade for W8 users? Bubba73 You talkin' to me? 02:32, 20 August 2013 (UTC)[reply]

Without a source, I can't say, but chances are it will come as an automatic update, or at least a prompted one. All the current updates I get for Windows 8 software come through the Store, though. I don't need to pay; it's just the channel they use to send me update notifications. Mingmingla (talk) 02:51, 20 August 2013 (UTC)[reply]
Since it's changing from 8.0 to 8.1, it might be a much larger upgrade, which can't just be done in the background while you work. So, they might not want to use the same method as for smaller updates. StuRat (talk) 06:32, 20 August 2013 (UTC)[reply]
I think that they mean you'll be able to buy a full copy of 8.1 for installation on new devices at retail stores, not an 8 -> 8.1 upgrade disc. Jessica Ryan (talk) 11:48, 20 August 2013 (UTC)[reply]
And if the upgrade is in the Windows store, then it will probably be free. I agree that it seems strange not to mention Windows Update, although the store is used for updating apps now, so I guess it seems plausible that the update would be handled by the store updater. Jessica Ryan (talk) 11:50, 20 August 2013 (UTC)[reply]
I have Win 8 but I've never used the store. Bubba73 You talkin' to me? 14:21, 20 August 2013 (UTC)[reply]
From [1]:
"The Windows 8.1 update will be a free update to Windows 8 for consumers through the Windows Store. You won't need to leave the house to get it, but you will need an internet connection." This is said elsewhere, too. --.Yellow1996.(ЬMИED¡) 16:03, 20 August 2013 (UTC)[reply]

Library hygiene for child processes

Traditional Unix programming is single-threaded and (disregarding certain nonsense about spurious wakeups and bad packet checksums) operates entirely by blocking until an appropriate event occurs. Careful use of pselect() allows any number of simultaneous conversations with child processes, for instance, to be orchestrated without ever looping over a set of O_NONBLOCK file descriptors.
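A minimal sketch of that blocking pselect() pattern, for reference (wait_for_events() and its single descriptor are illustrative, not part of any real API):

    #include <signal.h>
    #include <sys/select.h>

    static volatile sig_atomic_t got_sigchld;
    static void on_sigchld(int sig) { (void)sig; got_sigchld = 1; }

    /* Keep SIGCHLD blocked except while pselect() sleeps, so a child's exit
       can only be observed at this one well-defined point (no race with the
       subsequent waitpid()). */
    void wait_for_events(int fd)               /* fd: e.g. a pipe from a child */
    {
        struct sigaction sa;
        sa.sa_handler = on_sigchld;
        sigemptyset(&sa.sa_mask);
        sa.sa_flags = 0;
        sigaction(SIGCHLD, &sa, NULL);

        sigset_t block, orig;
        sigemptyset(&block);
        sigaddset(&block, SIGCHLD);
        sigprocmask(SIG_BLOCK, &block, &orig); /* blocked outside pselect() */

        fd_set readfds;
        FD_ZERO(&readfds);
        FD_SET(fd, &readfds);
        /* pselect() atomically uses the original mask while waiting, so a
           pending SIGCHLD interrupts it (EINTR) instead of being lost. */
        if (pselect(fd + 1, &readfds, NULL, NULL, NULL, &orig) == -1) {
            if (got_sigchld) { /* a child changed state: reap it with waitpid() */ }
        } else if (FD_ISSET(fd, &readfds)) {
            /* data ready on fd: read it without blocking */
        }
    }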

As seen in the waitpid() rationale, libraries are expected to use waitpid() (of which many variants exist like waitid() and wait4()) instead of wait() (or the obsolete wait3()) to avoid accidentally reaping other child processes. Is there any standard/effective technique for the case where a library organizes the work of many child processes?

Obviously a loop over all of them with WNOHANG "works", but it multiplies the inefficiency of polling with the inefficiency of waiting on each child separately. (A careful client process would invoke such a poll_children() only upon receiving SIGCHLD, but that is still very inefficient if multiple such libraries are in use!) Ideally, a clever library API would support the traditional Unix programming model (including communicating with the child processes as well as waiting on them) without assuming that it controlled all the subprocess-oriented business of the client program. (In particular, this disqualifies the strategy of, say, Twisted; in general, I'm not looking for a non-composable framework.)
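As a concrete (if inefficient) baseline, a per-library poll_children() of the kind described above might look roughly like this; lib_children[], lib_nchildren and lib_child_exited() are illustrative placeholders, not a real interface:

    #include <sys/types.h>
    #include <sys/wait.h>
    #include <stddef.h>

    extern pid_t  lib_children[];      /* PIDs this library forked */
    extern size_t lib_nchildren;
    extern void   lib_child_exited(pid_t pid, int status);  /* library callback */

    /* Reap only this library's children; the client would call this after
       observing SIGCHLD. Note the loop waits on each child separately. */
    void poll_children(void)
    {
        for (size_t i = 0; i < lib_nchildren; i++) {
            int status;
            pid_t r = waitpid(lib_children[i], &status, WNOHANG);
            if (r > 0)                 /* this child exited or changed state */
                lib_child_exited(r, status);
            /* r == 0: still running; r == -1: already reaped or not a child */
        }
    }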

This goal is already difficult to achieve in the select() case: I don't know a better solution than for each library to publish its set of file descriptors so that main() can select() across all of them at once. I thus have little hope of doing better in the child process case than to similarly push the multiplexing responsibility off entirely onto main(), which must then inform each library about the fate of its children (and perhaps maintain a table mapping child processes onto libraries!).
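A sketch of that "publish your descriptors" convention for the select() case (the libA_*/libB_* names are invented stand-ins for whatever interface the libraries would actually expose):

    #include <sys/select.h>

    extern int  libA_collect_fds(fd_set *readfds);   /* returns highest fd added */
    extern void libA_handle_ready(const fd_set *ready);
    extern int  libB_collect_fds(fd_set *readfds);
    extern void libB_handle_ready(const fd_set *ready);

    /* main() owns the multiplexing: merge every library's descriptors into one
       fd_set, block once, then let each library service only its own fds. */
    static void event_loop(void)
    {
        for (;;) {
            fd_set readfds;
            FD_ZERO(&readfds);
            int maxfd = libA_collect_fds(&readfds);
            int m     = libB_collect_fds(&readfds);
            if (m > maxfd) maxfd = m;
            if (select(maxfd + 1, &readfds, NULL, NULL, NULL) > 0) {
                libA_handle_ready(&readfds);
                libB_handle_ready(&readfds);
            }
        }
    }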

We can relax the requirements a bit; for instance, we could allow threading. (I think traditionalist Unix programmers would say that threads used for managing I/O and IPC are a lazy, inefficient solution, but this is a case where they bring elegance to the table as well.) Consider that it solves the select() case trivially: each library defines a blocking process_next_message() function and main() has but to create a thread for each. However, for child processes it is still complicated: the concept closest to a fd_set appears to be a process group, which is ugly for various reasons (it is externally visible, externally mutable, and designed for interactive job control rather than automated coprocessing).
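For what it's worth, the thread-per-library version of the select() case could be as small as the sketch below (the process_next_message() names are assumed, following the description above); the child-process analogue is exactly the part that has no clean equivalent:

    #include <pthread.h>

    extern void libA_process_next_message(void);  /* blocks until one event is handled */
    extern void libB_process_next_message(void);

    static void *run_libA(void *arg) { (void)arg; for (;;) libA_process_next_message(); return NULL; }
    static void *run_libB(void *arg) { (void)arg; for (;;) libB_process_next_message(); return NULL; }

    int main(void)
    {
        pthread_t ta, tb;
        pthread_create(&ta, NULL, run_libA, NULL);  /* one blocking loop per library */
        pthread_create(&tb, NULL, run_libB, NULL);
        pthread_join(ta, NULL);                     /* in practice main() does its own work */
        pthread_join(tb, NULL);
        return 0;
    }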

One could also turn the child process problem into the select() problem by having each library create a single child process which then creates all the others and reports on them over a pipe, but that adds considerable complexity (consider the control protocol needed to ask for further child processes).

Subprocesses that work so closely with the parent arise particularly in the fork()-but-not-exec() case, where it is more standard to simply use threads instead of subprocesses in the first place. However, in some cases subprocesses are indispensable (e.g., languages with a GIL), and (given the single-threaded Unix heritage) it surprises me that the facilities for dealing with them are so primitive.

Well: that got long, even merely mentioning simultaneous communication and organization. Any suggestions? --Tardis (talk) 06:37, 20 August 2013 (UTC)[reply]

Process handling is horrible on standard POSIX - the real fun starts when your subprocesses also launch their own subprocesses, and you want to be able to clean up the mess (and if necessary, SIGKILL runaway forking processes). systemd might be an interesting example - they basically gave up on the standard POSIX APIs and switched to control groups to solve this problem. It's probably efficient, but not standard. Unilynx (talk) 20:21, 20 August 2013 (UTC)[reply]

RJ Cables

What is the difference between an RJ11-4P4C connector and an RJ10 connector?

Joneleth (talk) 09:09, 20 August 2013 (UTC)[reply]

See Modular connector - Q Chris (talk) 09:28, 20 August 2013 (UTC)[reply]

The problem is, I have a cable that is RJ45 on one end and what seems to be either an RJ11-4P4C or an RJ10 on the other end. I need to buy a new cable now, and thus I need to know if there's any difference between the two connectors.

Reading through the Modular connector section does not answer this.

Joneleth (talk) 10:02, 20 August 2013 (UTC)[reply]

See this. --.Yellow1996.(ЬMИED¡) 16:06, 20 August 2013 (UTC)[reply]

Returns

Yesterday I typed in a celebrity's name and he got 40 million returns on Google. Today he has 9 million. Why did 11 million returns disappear? 11:52, 20 August 2013 (UTC)

Your math seems really questionable. Joneleth (talk) 12:03, 20 August 2013 (UTC)[reply]

Google says "About", for example "About 11,000,000 results". I don't know the details but to save resources they make an estimate without going through all the pages containing at least one of the searched words. This often gives results which appear inconsistent. For example, adding further words to a search will often increase the "About" number. PrimeHunter (talk) 12:47, 20 August 2013 (UTC)[reply]

Google result counts are a meaningless metric.--Shantavira|feed me 16:12, 20 August 2013 (UTC)[reply]

A trick I learned is to go to the end of the search results; you'll probably find there are more like hundreds of results rather than millions. I've done this many times. Do the search again, and click on page 10. Then turn to page 14, then 18, and keep clicking. For example, the 1st celebrity that popped into my head that would have lots of results was Bono; on the 1st page Google says "About 43,900,000 results (0.25 seconds)", yet it only took me 43 pages to get to the last result and it now says "Page 42 of 420 results (0.40 seconds)". This is NOT unusual; why it's SO far off so frequently I do not know. Mind you, I actually find it hard to believe there are only 420 results for someone like Bono who has been around for years and frequently appears in the media... Might have to do a bit more research. Vespine (talk) 04:43, 22 August 2013 (UTC)[reply]
Just tried that, with Justin Timberlake (first celebrity to come into my mind! ;) and I initially got About 227,000,000 results (0.25 seconds). I kept going like you said but wasn't getting any change. But... lo and behold on page thirty-four I got Page 34 of about 324 results (0.43 seconds). Very peculiar. --.Yellow1996.(ЬMИED¡) 17:50, 22 August 2013 (UTC)[reply]
Google will only display the first 1000 results of a search. And many of those are usually removed because they are too similar to others, often because a website has different URLs for the same content. You don't have to keep clicking the next page to see how many there are. After doing it once, just modify the URL to say start=990 instead of start=10. PrimeHunter (talk) 13:37, 23 August 2013 (UTC)[reply]
Here's more strangeness - I searched for JT again, and first time it got 377 million (0.16 seconds) results. Then, I repeated the same search and got 220 million (0.32 seconds). Hmmm... --.Yellow1996.(ЬMИED¡) 16:30, 23 August 2013 (UTC)[reply]

RedHat 5 - gdb & pstack

I'm chasing a weird bug and I have a clue, but I don't know what it means.

I have a program that uses OpenGL and Motif running on Red Hat Enterprise Linux 5 workstations. On two machines, the program takes 5 minutes or so to come up, while on all the other machines (five of them) it comes up immediately. All seven machines are loaded the same and have the same hardware.

So here's my clue. If I run the process on either of the two "slow" machines under gdb, or do a pstack while it is hung, it comes up right away. I know this must be telling me something important, but I cannot figure out what.

Any ideas? Tdjewell (talk) 13:46, 20 August 2013 (UTC)[reply]

Isn't pstack just a gdb wrapper? Attaching debuggers does funny things with system calls, signals and interrupting them - what happens if you throw a STOP+CONT at the process, or some other funny non-fatal signal while it hangs? If the process is stuck in a system call, ps with the proper options should show it.
I'd think http://stackoverflow.com/ might have more luck at answering these types of debugging questions. Unilynx (talk) 20:11, 20 August 2013 (UTC)[reply]
For a first quick try I'd kill(1) the hanging process with a signal that causes it to drop core, then run the debugger on the core file to see where it is hanging. Also remove all environment variables (except DISPLAY and whatever is strictly necessary) before running the program, in case that makes a difference. And the good old-fashioned debugging method of sprinkling the code with printf statements. 88.112.41.6 (talk) 17:29, 21 August 2013 (UTC)[reply]

Encoding files by adding random data to binary files

To prevent the police from looking into files stored on my memory cards, I was wondering if there are simple programs that scramble files by adding random bits to the bits of binary files. I could, of course, do this myself using a hex editor, but I was thinking that there probably exist ready-to-use programs that do this. You would then send a file containing the random bits to yourself via one channel (e.g. email) and put the modified files on your memory card and carry that with you. And, of course, one can encrypt both of these files; the splitting into two parts adds an extra layer of security. Count Iblis (talk) 17:42, 20 August 2013 (UTC)[reply]

That method is only secure if you use a new random bit for each bit that you want to encrypt. If you do so, the method is known as a one-time pad. If you Google for "one time pad software", you can find programs that use the technique. Looie496 (talk) 17:51, 20 August 2013 (UTC)[reply]
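For reference, the XOR form of this is tiny; the sketch below assumes pad[] holds truly random bytes, is at least as long as the data, and is never reused (encrypting and decrypting are the same operation). In the scheme described in the question, the pad would be the file mailed to yourself and the XORed result the file kept on the memory card:

    #include <stddef.h>

    /* One-time-pad sketch: XOR each data byte with the corresponding pad byte.
       Applying it a second time with the same pad restores the original data. */
    void otp_xor(unsigned char *data, const unsigned char *pad, size_t len)
    {
        for (size_t i = 0; i < len; i++)
            data[i] ^= pad[i];
    }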
Thanks, that's exactly what I was looking for! Count Iblis (talk) 18:58, 20 August 2013 (UTC)[reply]
If you use cryptographic-quality pseudorandom bits it's a stream cipher, which is no less secure than a one-time pad. -- BenRG (talk) 22:59, 20 August 2013 (UTC)[reply]
Well, that's more or less the definition of "cryptographic-quality pseudorandom", isn't it? So this is tautological, and doesn't in itself answer the question of whether a cryptographic PRNG actually exists, and if so, what is an example of one. (My view is that they do exist, and actually aren't that hard to find — but it seems to be extremely difficult to prove this.) --Trovatore (talk) 01:09, 21 August 2013 (UTC)[reply]
Yes, I was just pointing out to Looie that there are such things as stream ciphers and they seem to be secure. -- BenRG (talk) 17:39, 21 August 2013 (UTC)[reply]
By definition if the number of bits needed to seed the PRNG is less than the number of bits in the longest message you will encode then it is less secure than a one-time pad. -- Q Chris (talk) 14:37, 21 August 2013 (UTC)[reply]
Why not use white noise? Count Iblis (talk) 14:56, 21 August 2013 (UTC)[reply]
White noise = random bits; the question is how to obtain them (and after obtaining them, how you hide them from the attacker). -- BenRG (talk) 17:39, 21 August 2013 (UTC)[reply]
In fundamental information-theoretical terms, yes, but that notion of security is useless in practice. Fort Knox is vulnerable to a brute force attack in which you keep sending people in until one makes it because all the guards are looking the other way and all the electronic systems are malfunctioning because of cosmic rays. Security is about making that unlikely, not impossible, and a probability of 2−128 is unlikely enough. -- BenRG (talk) 17:39, 21 August 2013 (UTC)[reply]
A few comments:
1) Coming up with the "truly random" bits to add is more difficult than you might think. Computer "random" functions actually generate pseudo-random numbers which aren't random at all, but just follow a pattern too complex to understand. However, another computer might figure out the pattern and crack the code.
2) The other obvious flaw is if the snooper manages to get hold of both halves of the message.
3) There are much shorter encryption codes than one bit of encryption key per bit of data, which are still unbreakable (either just with current equipment, or, with a long enough key, because the universe will end before it's mathematically possible to break it). StuRat (talk) 21:25, 20 August 2013 (UTC)[reply]
4. Encryption is really hard to get right and you shouldn't try to invent your own. Use a product like GnuPG or TrueCrypt. -- BenRG (talk) 22:59, 20 August 2013 (UTC)[reply]
Concur with StuRat and BenRG - a truly complex password would take ages to crack (see also: brute-force attack). --.Yellow1996.(ЬMИED¡) 00:43, 21 August 2013 (UTC)[reply]
I mostly agree with BenRG, with the caveat that it is conceivable that there are specific exploits built into computers by spy agencies to defeat known popular encryption. (That's paranoid speculation, but these days, paranoid speculation has been rendered respectable by things like PRISM and Lavabit and the Guardian hard drive smashing spree) But if you're that worried, you should do your homemade encryption first then encrypt that with a real product. Wnt (talk) 04:52, 21 August 2013 (UTC)[reply]
Yes, it should be an additional measure. To get the necessary big file with random numbers, you could use white noise. Pseudo-random generators can have problems with generating a large number of bits if the seed numbers used are just a handful of bits long. You'll then end up repeating the same pattern of random numbers. The problem with a password-protected file is that the password can fall into the hands of the enemy. A problem is also what Wnt mentions about computers being seized. So, while the enemy may not be able to decrypt, they can prevent you from using it, which may be good enough for them. Or you can be thrown in jail, and if no one else knows where the password is, they have shut your operations down.
So, you may want to keep the secret data online, so that it can't be easily physically destroyed by the authorities. But if all the information one needs to decode the files, including the passwords, is online (so that the police can't prevent access to the files), then it's vulnerable to being read by the authorities. But if you have split the files up using the one-time-pad technique (including the ones containing the passwords), it will be far more difficult for the authorities to try out all the possible ways this could have been done. And you can easily communicate to someone the way the files should be combined. Compare that to giving a big password to someone who should use it in case something happens; it could well be that the person will have lost it many years later when it is actually needed. Count Iblis (talk) 14:10, 21 August 2013 (UTC)[reply]
As Bruce Schneier likes to point out when these discussions come up, the cryptographic primitives are basically never the weak point of real security systems. The point was made pithily in xkcd #538. -- BenRG (talk) 17:39, 21 August 2013 (UTC)[reply]
My few thoughts
  • As for generating random bits, I prefer to start with an analog input. You can write a script that analyses a picture or sound clip, and generate a bit string from that. If using that, you don't actually get 'random' bits, but they are not recognisable without the picture/sound.
  • Using the above method is potentially more secure (as long as you use a different source per encryption), as you can transfer the source file with less chance of the bits being discovered; without the exact method that the script used to generate the bits from the source, anyone who can intercept the two pieces is still missing the script.
  • Unless I'm missing something, you're inserting random bits into predefined locations of the file. How are these locations chosen? If they are in the same place on every file, it's fairly simple to (once unencrypted) remove the offending bits. If not, you'd need to remember where you put them (problematic in larger files).
  • My suggestion to resolve the above issue: generate a second bit string the same length as the file. For every 1, append a bit after that location. For every 0, skip. This has the effect of limiting added bits to 1 at a time (can be overcome with a slightly more complex algorithm), but increases security again. This second bit string could be generated from the same source as the first by using a script with different parameters. That way, one image or sound file contains all the information on the bit strings, but the manner of unpacking that data is only known by your script.
  • Both the original file with bits added and the bit source should be encrypted, preferably using different keys that are unique each time. The standard key selection rules apply. It's better if you could remember these keys rather than record them where they may be intercepted.
The above solutions would protect most elements of your proposition against readable interception. With the original file, interceptors have a pile of encrypted bits that are extremely difficult to decrypt; when using brute force, algorithms will look for certain signs to flag possible outcomes. With the inserted bits, these patterns will not appear as regularly. With the file and the bit source (both encrypted), they'd have to perform a decryption of both, and then calculate not only what the inserted bits are, but where they were inserted. This basically produces a two-fold brute-force attack, where each bit must be tested in each state in both files.  drewmunn  talk  15:49, 21 August 2013 (UTC)[reply]
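For concreteness, a rough sketch of the bit-insertion step described above (get_bit, put_bit and interleave_bits are made-up names; note the reply just below, which explains why leaving every plaintext bit in its original order makes this weak):

    #include <stddef.h>

    /* Read/write bit i of a byte buffer, most significant bit first. */
    static int get_bit(const unsigned char *buf, size_t i)
    {
        return (buf[i / 8] >> (7 - i % 8)) & 1;
    }
    static void put_bit(unsigned char *buf, size_t i, int v)
    {
        if (v) buf[i / 8] |=  (unsigned char)(1u << (7 - i % 8));
        else   buf[i / 8] &= (unsigned char)~(1u << (7 - i % 8));
    }

    /* Copy nbits of 'in' to 'out'; wherever the control string has a 1,
       insert one extra bit (taken from 'extra') after that position.
       'out' must hold up to 2*nbits bits. Returns the output bit count. */
    size_t interleave_bits(const unsigned char *in, const unsigned char *ctrl,
                           const unsigned char *extra, unsigned char *out,
                           size_t nbits)
    {
        size_t o = 0, e = 0;
        for (size_t i = 0; i < nbits; i++) {
            put_bit(out, o++, get_bit(in, i));    /* keep the original bit */
            if (get_bit(ctrl, i))                 /* control bit set: pad here */
                put_bit(out, o++, get_bit(extra, e++));
        }
        return o;
    }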
If you insert bits between the plaintext bits you're leaving all plaintext bits in order in the output, which leaks a lot of information, probably enough to recover the whole plaintext. The usual information-theoretically secure way of combining the OTP with the plaintext is modular addition. This is a good example of why you should use cryptosystems designed by experts. (It's okay to implement them yourself as long as you test the implementation to make sure it follows the spec; both GnuPG and TrueCrypt come with human-readable file format specifications, though you will also need a source of good random bits, which is harder.) -- BenRG (talk) 17:39, 21 August 2013 (UTC)[reply]
I personally use industry-standard encryption for my projects; except for the project where I created my own, but that was an exercise. I think it's always important to think of other ways, even if you don't use them. That's how progress happens. Also, my proposition of encoding a bit string key in an image makes it a little harder for drugs to get it out of you.  drewmunn  talk  18:06, 21 August 2013 (UTC)[reply]

You guys have assumed from post 2 that the line "by adding random bits to the bits of binary files" means XORing those... taken literally Iblis wants to just append random bits to files, or, maybe add them on top... I'm not really sure. You guys seem to have gone down the same information hole that the RD is familiar with as usual. Shadowjams (talk) 04:16, 22 August 2013 (UTC)[reply]

I'm quite confident that my assumption was correct. We're not operating in a void here -- Count Iblis has a pretty sophisticated physics background, and the chances that he would suggest anything so silly are very low. Looie496 (talk) 15:22, 23 August 2013 (UTC)[reply]