Wikipedia:Reference desk/Archives/Computing/2015 December 2

Computing desk
Welcome to the Wikipedia Computing Reference Desk Archives
The page you are currently viewing is an archive page. While you can leave answers for any questions shown below, please ask new questions on one of the current reference desk pages.


December 2

Neural Nets, FOL, and Set Theory

I've been doing some readings in evolutionary psychology (EP) lately and one thing keeps nagging at me. If we accept the idea that the human mind uses rules of inference and has ontologies (relations between sets, subsets, part-of and other relations) to represent domains and knowledge then a natural question is how can such formalisms be represented in neural networks? The EP stuff I've been reading is filled with examples of various rules that are hypothesized to be part of the modular structure of the human mind as well as ontologies to represent things like kinds of animals, physical objects, social hierarchies, etc.

Really, this question probably belongs more in general science, because what I'm most interested in is the question of representation in human neurons, but I thought I would start here because more is probably known about how (if?) they can be represented in computer neural nets. I have a good background in AI, but it's from the standpoint of expert systems, ontologies, logic, etc. I'm familiar with the basics of neural nets, neurons, and synapses, but just the basics. From what little I know, representing things like if-then rules or sets and subsets via neural nets is not a solved problem. I seem to remember Michael Arbib talking about the importance of representing frames, which I think were roughly what I would think of as a set (if you are familiar with frame-based environments such as KL-One, Loom, or OWL/Protege), but it's just a vague memory; it was a long time ago that I heard Arbib present some of his work, and I could be remembering it wrong. My other impression is that connectionist researchers tend not to be so interested in representing this kind of thing, but again that's just an impression, and I'm not sure how accurate it is. Any opinions, pointers to papers, etc. would be welcome. --MadScientistX11 (talk) 02:42, 2 December 2015 (UTC)[reply]

Connectionists have been interested in that, but so far their ideas have been pretty much untestable because of limits on our ability to observe brain activity. People like David Touretzky and Paul Smolensky have worked on schemes for using neural networks to represent high-level structure -- if you go to the Parallel Distributed Processing books edited by Rumelhart and McClelland you'll find a number of chapters addressing those issues; our article on parallel distributed processing gives a few more relevant references. Looie496 (talk) 12:03, 2 December 2015 (UTC)[reply]
Thanks a lot Looie496 very helpful. I'll check those out. --MadScientistX11 (talk) 02:26, 3 December 2015 (UTC)[reply]

IPv4 and IPv6

How can one convert between IPv4 and IPv6? Are there any IPv6 addresses that cannot be converted to IPv4? GeoffreyT2000 (talk) 04:20, 2 December 2015 (UTC)[reply]

IPv4 has only 2^32 possible addresses, while IPv6 has 2^128 addresses. So, there's no way most of the IPv6 addresses could be converted to IPv4, unless you want many different IPv6 addresses to convert to the same IPv4 address. On the other hand, there's no reason why every IPv4 address can't be converted to an IPv6 address, by whatever method you choose. For example, if we were to phase out IPv4, such a conversion would be in order, so every IPv4 device could keep the same address under the new system. StuRat (talk) 06:00, 2 December 2015 (UTC)[reply]
Stateless_IP/ICMP_Translation specifies that IPv4 addresses can be mapped to IPv6 addresses consisting of 80 zero bits, followed by the 16-bit group ffff, followed by the 4 bytes of the target IPv4 address. LongHairedFop (talk) 10:39, 2 December 2015 (UTC)[reply]
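As a concrete sketch of the mapping described above (using Python's standard ipaddress module; the function name is my own):

```python
import ipaddress

def ipv4_to_mapped_ipv6(v4: str) -> ipaddress.IPv6Address:
    """Embed an IPv4 address in the IPv4-mapped range ::ffff:0:0/96:
    80 zero bits, then ffff, then the 32-bit IPv4 address."""
    v4_int = int(ipaddress.IPv4Address(v4))
    return ipaddress.IPv6Address((0xffff << 32) | v4_int)

addr = ipv4_to_mapped_ipv6("192.0.2.1")
print(addr)              # ::ffff:c000:201
print(addr.ipv4_mapped)  # 192.0.2.1 -- the embedding is reversible
```

Note that the reverse direction only works for the tiny sliver of IPv6 space that actually embeds an IPv4 address; for any other IPv6 address, `ipv4_mapped` is None, which is StuRat's point about most IPv6 addresses having no IPv4 equivalent.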

Does Excel (or its accomplices) change files if you just open them without saving?

Does any program of the MS Office package change a file that you opened for viewing only? Especially in Excel, would it change a number with leading zeros like 001, 002, ..., to 1, 2, ...? Or would it insert end-of-line characters or anything else? If the user does not press the save button, is he safe from any automatic change? --Denidi (talk) 14:32, 2 December 2015 (UTC)[reply]

Not a technical expert, but from personal experience in a work context: if you open an Excel file created by a particular version of Excel using a different version, it may change some formatting automatically. Hopefully someone else can contribute a deeper and more informed answer. {The poster formerly known as 87.81.230.195} 185.74.232.130 (talk) 14:48, 2 December 2015 (UTC)[reply]
No, if you don't explicitly save, the file on disk should not be modified. It is possible that you will see unexpected things when you look at the display, but there should not be any alteration of the file stored on disk unless you specifically "save" it. You might get a warning when you try to close the program that changes will be lost, and an offer to save before exiting, but if you decline to do so, the file should stay the same as before. Looie496 (talk) 15:08, 2 December 2015 (UTC)[reply]
Excel is a presentation tool. It takes the file and decides how to show it to you. So, it can (and almost always will) change the information. It will round off numbers. It will change fonts if you don't have the font specified in the saved file. It will convert values to dates if they look like they might be dates. However, it doesn't change the actual file on the drive. 209.149.113.52 (talk) 15:30, 2 December 2015 (UTC)[reply]
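The display-versus-storage distinction drawn above can be illustrated with a toy sketch (my own illustration, not Excel's actual parsing logic): a viewer that guesses cell types will show "001" as 1, while the stored text stays untouched.

```python
raw_cells = ["001", "002", "total"]  # what the file actually contains

def display(cell: str):
    """Guess a cell's type the way a spreadsheet viewer might:
    anything that parses as an integer is shown as a number."""
    try:
        return int(cell)   # "001" is displayed as 1
    except ValueError:
        return cell        # non-numeric text is shown as-is

print([display(c) for c in raw_cells])  # [1, 2, 'total']
print(raw_cells)                        # the underlying data is unchanged
```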
If you open the file, convert it to another format, and Excel saves it, then it changes the file. The same applies to Word. --Scicurious (talk) 19:25, 2 December 2015 (UTC)[reply]
The question states that the file is opened for "viewing only". How can you save it when it is opened for viewing only? 209.149.113.52 (talk) 13:32, 3 December 2015 (UTC)[reply]
If you are opening a text format like CSV in Excel, I believe the other answers here are correct: When you open a CSV file, Excel will make educated guesses about what number formats the text is using and show you the results of those conversions, but the file will only be modified if you save.
If you are opening an Excel workbook and notice the modified date of the file change, read this: Excel changes Modified date and time when you open the workbook. For some file formats and some versions of Excel, "when you open an Excel workbook, Excel writes the name of the current user to the header of the file. This is necessary so that other users receive the "file in use" notification." --Bavi H (talk) 01:55, 3 December 2015 (UTC)[reply]
Excel does change the file just by opening it, even if you don't save. To see for yourself in Windows, right-click on the spreadsheet and choose 'Properties.' Observe the modification date. Then, open the file without changing it and then observe the modification time again while the file is open. Note that it has changed. Likewise, you can observe that the MD5 checksum of the file will change after you open it. After you close the file, the modification time will return to the previous date, but the MD5 checksum will not. This is proof that Excel changes the file even if you close it without saving it.—Best Dog Ever (talk) 02:08, 3 December 2015 (UTC)[reply]
I'll admit I was skeptical, so I just did a file compare on an Excel file that I have, which is only 200 kB but includes some macros. I opened the file, enabled the macros, and closed it without saving; this is the result: 102 differences: 1014 lines, 4466 inline differences in 1014 changed lines. Color me surprised! Vespine (talk) 23:46, 3 December 2015 (UTC)[reply]
You can simply open a spreadsheet (with no macros or date/time-based fields) then close it again (without changing, saving, or being prompted to) and Excel will change the file contents (but not modified time). This is easily verified by copying the file, opening it, closing it, then doing a binary comparison with the copy (which has not been opened since copying). You can use Windows command line tools fc.exe or comp.exe, or your preferred compare tool. Note that you need to do a binary/hex compare - "smart" tools like Beyond Compare may report no difference because by default they compare "interpreted contents", eg cell values, formulae in this case, although BC will report the difference if you open the files with "Hex compare". I've seen this with Office/Excel 97 and Office/Excel 2007 and .xls files (I haven't tried with .xlsx). 165.225.98.89 (talk) 00:56, 4 December 2015 (UTC)[reply]
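If you want to repeat this experiment on your own files, one way (a sketch; the file name is hypothetical) is to hash the raw bytes before and after opening, rather than relying on timestamps, which Excel may or may not restore:

```python
import hashlib

def file_sha256(path: str) -> str:
    """Return the SHA-256 hex digest of a file's raw bytes."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()

# before = file_sha256("book.xls")   # hypothetical file
# ... open and close book.xls in Excel without saving ...
# after = file_sha256("book.xls")
# before == after  ->  False would confirm the bytes were rewritten
```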

Share my Location

Until a few days ago whenever I opened Google Maps in my browser (Firefox) the default map would be one of Birmingham (UK), where I live. However, it has now started centring the map on Edinburgh, about 300 miles to the north. I haven't been able to figure out any way to correct this, but Googling the problem suggests my assumed location is based on my IP address. I have a fixed IP address, so I used Geolocate to check it out, and that says I am in London, 100 miles in the other direction! This is mad. I've trawled through endless help files without success. Does anyone know how I can get Google maps to recognise my correct location please?--Shantavira|feed me 16:36, 2 December 2015 (UTC)[reply]

An extension (particularly one designed to enhance your privacy) may have blocked the geolocation API. Try starting Firefox in safe mode (which disables all extensions) to see if the problem persists. In addition, this extension says it allows you to manually set (whether accurately or deliberately otherwise) your location; I've not used it, so I'm not in a position to vouch for it. -- Finlay McWalterTalk 17:00, 2 December 2015 (UTC)[reply]

How much of the accessible information is already indexed?

Has Google, Bing or Yahoo released any stats about the amount of web-pages already indexed and web-pages that exist? --Scicurious (talk) 19:39, 2 December 2015 (UTC)[reply]

I see claims in various places that Google indexed about 30 trillion pages as of 2014 (e.g. [1]). Re total pages (including unindexed pages), Deep web (search) is the Wikipedia article, but it looks like it could use some work. There is probably no way to reliably estimate the number of unindexed pages. -- BenRG (talk) 00:52, 3 December 2015 (UTC)[reply]
What makes the number of unindexed pages so difficult to estimate? Wouldn't a random sample of thousands of IPs or domains be representative? --Scicurious (talk) 13:49, 3 December 2015 (UTC)[reply]
This is getting old, but it is a Google blog about how much they've indexed and how much there is left to index. Basically, the search engines are working in "millions" of web pages and the Internet contains "trillions" of web pages. 209.149.113.52 (talk) 13:31, 3 December 2015 (UTC)[reply]
I'd say that search engines "were" working in millions, in a web of trillions of web pages. According to BenRG, search engines have already reached trillions of indexed pages. But maybe the number of pages has increased faster than the search engines have indexed them. --Scicurious (talk) 13:49, 3 December 2015 (UTC)[reply]
I'll have to read the article again to see if it makes the clarification or not... "Indexed" means that the search engine knows the URL exists. That does not mean it is searchable. To be searchable, the web page must be analyzed and given some sort of keywords, tokens, etc... to allow a search query to say "Hey! I know a page like that!" Obviously, search engines want to make that distinction very very fuzzy. They want to say that they are your access to every page on the Internet. The reality is that they give access to a very very tiny part of the Internet. 209.149.113.52 (talk) 14:52, 3 December 2015 (UTC)[reply]
"Indexed" means that the search engine knows the URL exists. That does not mean it is searchable. That is incorrect. In this context,Web_indexing DOES actually mean to make it "searchable". In computer terms in general, to "index" something does mean to make it "searchable". If an email client "indexes" your email, or your operating system "indexes" your hard disk, in both those cases the content already exist and you CAN even perform a search on it without an index, but "indexing" makes the search a lot quicker, because instead of performing the search against the whole raw data you can search against a much smaller optimized index. Vespine (talk) 23:29, 3 December 2015 (UTC)[reply]

Firefox vs. Evercookie

I am very much unimpressed with Evercookie being something that would continue to seem like a viable business model. Still ... I understand that it would work with something like Chrome that is sold by an evil corporation and distributed mostly by drive-by download, but what about a real browser like Firefox?

As I understand it, Firefox set to delete history on close (i.e. cookies) can delete the regular cookie and the Flash cookie. If the cache is included in the history fields to delete, it should resist cache cookies -- not that I really understand how those work. I'm less clear on some of the more arcane things mentioned in the Evercookie article ... and in any case I'm not too sure about my statements above.

As a bonus: why would a regularly updated copy of Firefox be meat for Panopticlick? Specifically, what is unusual about a user agent string like "Mozilla/5.0 (Windows NT 6.3; WOW64; rv:42.0) Gecko/20100101 Firefox/42.0", and is there a way to fix it? Likewise, how can a recently downloaded Shockwave Flash plug-in be incredibly unique?

N.B. despite frequent representations to the contrary, WMF is apparently not too reluctant to use such fancy hacking to help collect better data for its friends in the NSA. See Meta:2015 Community Wishlist Survey/Moderation and admin tools. Wnt (talk) 23:25, 2 December 2015 (UTC)[reply]

The Mozilla Foundation, like Google, is supported almost entirely by advertising revenue. If you're going to trust one of the big three browsers on that basis it should be IE. In reality, I think they are all pretty similar from a privacy perspective.
When the browser has a document in cache, it sends a request to the server to check if a newer version of the document is available; with that request it sends some identifier of the version it has, and that functions as a cookie if the server sends a unique identifier to every client. Clearing the cache should prevent that. Fingerprinting through browser history used to be possible but I think it's fixed (in all major browsers). Clearing all cookie-like data is hard because it's open ended: any browser add-on you have installed could potentially store something you don't know about. Private browsing mode is probably your best bet. Some fingerprinting techniques, like canvas fingerprinting, work even then, but probably can't uniquely identify you by themselves.
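The cache-revalidation trick described above (the ETag / If-None-Match headers of HTTP conditional requests) can be sketched as server-side logic -- my own simplified simulation, not any real server's code:

```python
import secrets

def respond(if_none_match, known_tags):
    """Simulate ETag-based tracking: a first-time visitor is issued a
    unique tag; a returning visitor echoes it back in If-None-Match."""
    if if_none_match in known_tags:
        return 304, if_none_match      # recognised: same client as before
    tag = secrets.token_hex(8)         # unique identifier per new client
    known_tags.add(tag)
    return 200, tag

known = set()
status, tag = respond(None, known)     # first visit: 200 plus a fresh tag
status2, tag2 = respond(tag, known)    # revisit: 304, client re-identified
print(status, status2)                 # 200 304
```

Clearing the cache discards the stored tag on the client side, which is why it defeats this particular trick.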
One possible explanation of Panopticlick's strangely high User-Agent entropy is that they keep old fingerprints for a long time or forever; Firefox has gone through about 40 major version numbers since Panopticlick launched, so even if almost everyone uses the latest version now it would still be a small fraction of all Panopticlick visitors. Of course, the same would apply to new plugin versions but not most other identifiers.
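For reference, Panopticlick's "bits of identifying information" is just self-information: a trait shared by a fraction p of visitors contributes -log2(p) bits toward uniqueness. A quick sketch (the 1-in-1500 figure is an invented example, not a Panopticlick statistic):

```python
import math

def self_information(fraction):
    """Bits of identifying information carried by a trait that a
    given fraction of visitors share."""
    return -math.log2(fraction)

print(self_information(0.5))               # 1.0 bit: half of visitors share it
print(round(self_information(1 / 1500), 2))  # ~10.55 bits for a 1-in-1500 trait
```

This is why a long-tail User-Agent string, shared by few retained fingerprints, scores surprisingly high even when the browser is simply up to date.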
I can't divine the meaning of your last paragraph. -- BenRG (talk) 06:20, 3 December 2015 (UTC)[reply]
One of the leading options in that discussion is to implement Evercookie-like techniques to block users returning with a new account name -- and of course, there's no use in implementing it on blocked users if they don't collect the same data from everyone else. Once they've abused script access they've been given to collect a database about the readers, it is bound to fall into all the usual wrong hands. This would be infuriating and appalling even if they didn't make a big song and dance about how they're anti-NSA... Wnt (talk) 10:31, 3 December 2015 (UTC)[reply]
The proposal only asks that the Evercookie mechanism be investigated. In my opinion there is no chance that anything like that will be implemented. I can just imagine the publicity: if Wikipedia blocks you, it will try to hack your computer to enforce the block. Not going to happen. Regarding the statement that there's no use in implementing it on blocked users if they don't collect the same data from everybody else, that's false. It would be necessary to check all requests for the presence of a cookie, but that's not an intrusive act. Giving a cookie is intrusive, checking for one (an ordinary cookie, not an "Evercookie") is not. Looie496 (talk) 12:39, 3 December 2015 (UTC)[reply]
Evercookie is JavaScript. Using it to enforce a user block is silly. Do you really think that it would be difficult to block the Evercookie script? Even if the script was written to be required to make the website functional, all it takes is one visit to write a replacement script that provides the functionality without Evercookie - and then share that with everyone else. 209.149.113.52 (talk) 14:26, 3 December 2015 (UTC)[reply]
That sort of escape valve is actually convenient for those doing mass surveillance. The spies are administrators more than ideologues --- for example, they may rail against the notion that anyone might blaspheme them through the use of encryption, but so long as it is a few geeks on Tor they don't care; it's only when major market players start trying to secure the users' communications that they bring on the threat and bluster. In the meanwhile, the geeks' escape passivates them and keeps them off the case -- but that doesn't mean that, if the tech they rely on weren't there, they'd be able to do anything about it.
As for the "no chance" ... in what universe? I'd have thought there was no chance I'd hear even Republicans calling for registering Muslims and beating up protesters, let alone Obama calling for the no-fly list to be turned into a generic prosecutionless mechanism of revoking any and all rights without a legal proceeding. It used to be that people would reassure me for ten years about how something fascistic would never happen before it was enacted ... now it's more like six months. Reality is, Wikipedia will do whatever it can, as soon is it can, to aid the authorities around which the datacenters in Northern Virginia have sprouted, and the only input we have to the conversation is at a technical level. Wnt (talk) 15:03, 3 December 2015 (UTC)[reply]
And yet you seem to have no issue posting on Wikipedia about how Wikipedia is an arm of the fascist police state. You're not worried that the WMF is going to send people to knock on your door at 1 in the morning? --71.119.131.184 (talk) 21:13, 3 December 2015 (UTC)[reply]
Based on watching the WMF try to implement Visual Editor, Flow, and Mobile App, I have my doubts as to whether they could empty water from a boot with instructions printed on the heel, much less find someone's door at 1 AM. See User talk:Guy Macon#Depiction of Wikimedia Foundation destroying Wikipedia with Visual Editor, Flow, and Mobile App. --Guy Macon (talk) 15:56, 4 December 2015 (UTC)[reply]
The intent is/was obviously to set the supercookie on blocked users after they're blocked, not on everyone just in case they get blocked some day. Supercookie-blocked vandals would still be less trackable in principle than you already are, as they could use a different profile/browser/computer, whereas every Wikipedia page you've visited while logged in as Wnt is tied to the unique person that knows that account's password, no matter how often you change other things. -- BenRG (talk) 17:57, 4 December 2015 (UTC)[reply]
@BenRG: Actually, this gets at a side issue, which is that my understanding, at least, is that the same-origin policy is a cookie thing, not necessarily extending to these other features (though some may emulate it; I don't really know enough). It's one thing to be tracked on one site, another to be tracked web-wide. Wnt (talk) 21:51, 7 December 2015 (UTC)[reply]