Talk:Bush hid the facts

Latest comment: 2 months ago by Artoria2e5 in topic Self-published source

More lines

"i'll get the linux" also works, according to an old revision of the Easter eggs in Microsoft products page. —Woddfellow2| 22:56, 24 June 2008 (UTC)

Bug

It's not really a bug. It's just a best effort guess that's not good enough. --193.43.89.206 18:30, 28 June 2007 (UTC)

It's still a bug. Software bug: "A software bug (or just "bug") is an error, flaw, mistake, failure, or fault in a computer program that prevents it from behaving as intended (e.g., producing an incorrect result)." --Goblin talk 19:09, 12 January 2008 (UTC)
You are incorrect, sir. Entering the Chinese characters and saving would be storing the exact same information, yet when the text file was opened up the result of showing Chinese characters would be correct. The bug is with the formats, not with the application that adheres to them. —Preceding unsigned comment added by 66.119.171.18 (talk) 00:42, 9 April 2009 (UTC)
Yup, seconded. There is no way of determining the correct encoding without applying a natural language model. —Preceding unsigned comment added by 150.101.232.226 (talk) 04:06, 1 June 2009 (UTC)
It's not a bug. A bug means behavior the programmers did not intend, something they didn't want the program to do. This is by design; the function IsTextUnicode is not buggy, it works as specified. —Preceding unsigned comment added by 188.112.72.45 (talk) 17:02, 14 March 2010 (UTC)
It IS a bug. The correct implementation would have been to require the BOM on the new, incompatible encoding, i.e. UCS-2/UTF-16. There should NOT be a BOM on text in old encodings. Whether there should be one on UTF-8 is questionable; I don't think so, because an IsTextUnicode to distinguish UTF-8 and CP1252 is almost impossible to get wrong, but others disagree. The fact that Microsoft thought there should be no distinguishing mark on the new encoding that is easily confused with the old one is a design error and thus a bug. Spitzak (talk) 22:23, 21 April 2010 (UTC)
However, note that even though Microsoft Notepad saves a BOM in UTF-16 files, it still has to be able to process text files storing raw text for compatibility purposes. The 8-bit text file for "bush hid the facts" is short enough to cause ambiguity in encoding detection. In newer versions of Notepad, Microsoft slightly altered the detection heuristic. 2A01:119F:220:9400:58C3:7988:DAF7:149F (talk) 14:37, 12 February 2023 (UTC)

Cleanup

Why is there a 'cleanup' sticker on this? It seems fine. I am removing it. Seth J. Frantzman 18:06, 17 August 2007 (UTC)

What do the Chinese characters say?

just a nonsensical string —Preceding unsigned comment added by 204.174.12.18 (talk) 22:42, 5 May 2008 (UTC)

The characters are: 畂桳栠摩琠敨映捡獴. You can see one analysis at Wikipedia:Reference_desk/Archives/Language/2008_September_25#Pseudo-Chinese_question.... AnonMoos (talk) 06:12, 8 October 2008 (UTC)
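
The mapping quoted above can be reproduced outside Notepad. This is a minimal Python sketch, assuming the file holds the 18 plain ASCII bytes of the phrase and that the reader misinterprets them as little-endian UTF-16. The variable names are illustrative only.

```python
# The 18 ASCII bytes of the phrase, as an 8-bit text file would store them.
raw = "Bush hid the facts".encode("ascii")

# Reinterpret each pair of bytes as one UTF-16LE code unit.
misread = raw.decode("utf-16-le")

print(len(raw), len(misread))  # 18 bytes become 9 characters
print(misread)                 # 畂桳栠摩琠敨映捡獴
print([f"U+{ord(c):04X}" for c in misread])
```

Each character is just two adjacent letters packed into one 16-bit value, which fits the "nonsensical string" description above.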

Sounds like a lie

I don't buy it. If this sentence were terminated because of the number of letters in the four words, it's all a lie. I tried typing "reagan hid the facts" and the same happened. The only thing I can think is that Microsoft doesn't like people typing negative things about their favourite politicians when editing an html-page, or what looks like it. —Preceding unsigned comment added by 80.163.25.176 (talk) 13:28, 21 September 2007 (UTC)

Well it does work with Reagan too, so the bug is more extensive than originally thought. Odd. violet/riga (t) 14:05, 21 September 2007 (UTC)

Removal of "Explanation" section.

I removed the "Explanation" section from this page because technically, it makes no sense, or is incomplete. The section read as follows (wrapped so it doesn't cause long lines - an unedited version is in this page's source):

==Explanation==
Text files containing [[Unicode]] [[UTF-16]]-encoded Unicode start with a
"[[Byte-Order Mark]]" (BOM), which is a 2-byte flag that tells a reader how the
following UTF-16 data is encoded.  When you save a file in Notepad, by default you
are saving to 8-bit Extended [[ASCII]].  When the file is opened again, the bit
pattern tells notepad that you are reading from 16-bit [[Unicode]].  This causes
the eighteen 8-bit [[ASCII]] characters to be displayed as nine 16-bit [[Unicode]]
characters.
'''
Verified by JT (KNUSTComputerScience2004)'''

It certainly seems as though there's a misinterpretation of the file as UTF-16, but there's a vital part missing from this; the BOM itself isn't present in the text entered, or in the saved file. (The BOM being U+FEFF). Therefore, the explanation that the BOM is causing this makes no sense.

Maybe something in the text is causing Notepad to mistakenly think that it's UTF-16 and that it found a BOM, but the explanation doesn't mention that and it's technically incomplete. I don't know what the bug is in Notepad, but I have confirmed it myself. --Ciaran H 15:57, 1 October 2007 (UTC)
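
The point about the missing BOM can be checked directly. This is a small Python sketch, assuming the saved file contains only the ASCII bytes of the phrase; it says nothing about what Notepad does internally.

```python
import codecs

raw = "Bush hid the facts".encode("ascii")  # what an 8-bit save would contain

# UTF-16 BOMs are the two bytes FF FE (little-endian) or FE FF (big-endian).
has_bom = raw.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE))

print(raw[:2].hex())  # 4275, the letters "Bu", not a BOM
print(has_bom)        # False
```

So whatever makes a reader treat these bytes as UTF-16, it cannot be a BOM found in the file.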

Notepad calls IsTextUnicode, which runs a heuristic on the text. (Heuristic is programmerese ≈ educated guess.) Sometimes it guesses wrong, and the likelihood of guessing wrong is bigger for short texts. Shinobu (talk) 19:55, 29 November 2007 (UTC)
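
For anyone who wants to query the heuristic directly, here is a hedged Python sketch that calls the Win32 IsTextUnicode function through ctypes. The helper name `is_text_unicode` is hypothetical, the call only works on Windows, and the answer may differ between Windows versions, so no expected output is given.

```python
import ctypes
import sys


def is_text_unicode(data: bytes):
    """Return IsTextUnicode's verdict for data, or None when not on Windows."""
    if sys.platform != "win32":
        return None  # the API only exists on Windows
    func = ctypes.windll.advapi32.IsTextUnicode
    func.argtypes = [ctypes.c_char_p, ctypes.c_int, ctypes.POINTER(ctypes.c_int)]
    func.restype = ctypes.c_int  # Win32 BOOL
    # Passing NULL for lpiResult asks the function to run all of its tests.
    return bool(func(data, len(data), None))


print(is_text_unicode(b"Bush hid the facts"))
```

This only asks the operating system for its guess; whether a given Notepad version acts on that guess in the same way is an assumption.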

Removal of the comment at the bottom

I removed that "This article is bullshit" comment at the bottom of the article. —Preceding unsigned comment added by 87.180.56.34 (talk) 15:36, 19 December 2007 (UTC)

Good call. It's original research, and would need to be presented in the form "according to NOTABLE SOURCE, QUOTE this article is bullshit UNQUOTE REFERENCE". -Ashley Pomeroy (talk) 20:30, 28 September 2008 (UTC)

Not a notepad-only bug?

I saved "bush hid the facts" with Metapad and it still bugged when I closed and reopened the text file. The exact error message I received: "Detected non-ANSI characters in this Unicode file. Data will be lost if this file is saved!" When I hit OK the text became nine question marks. Does this occur with other text editors, perhaps across operating systems? 71.115.6.93 (talk) 04:07, 27 April 2008 (UTC)

Metapad is yet another Win32 editor so the problem is probably caused by IsTextUnicode(), too. There is some technical insight if you care. saimhe (talk) 20:37, 21 June 2008 (UTC)

Conspiracy theory

I think the following points should be made (provided they're true and you can prove it...):

  • Bush is a reference to George W. Bush
  • The trigger text "Bush hid the facts" was chosen for a lark after the original bug had been discovered, and helped knowledge of the bug spread as it made a fairly mundane bug seem funnier/weirder
  • Some people probably thought the "Bush hid the facts" was a specific Easter Egg phrase planted by hackers or anti-Bush programmers inside MS.
  • Did MS go to the trouble of denying any such conspiracy?

jnestorius(talk) 20:55, 25 September 2008 (UTC)

There are so many combinations that make this happen, it's actually nothing to do with Bush at all, e.g. "Pete ate the pasta" and "Bush ate the files" both trigger it in Notepad in WinXP SP2. 79.78.200.240 (talk) 16:05, 7 January 2009 (UTC)
That's what the guy above you just said. Note his second point in particular. It should be made more clear in the article that it's not anything to do with Bush himself, but that the phrase was chosen on purpose because it worked. —Preceding unsigned comment added by 66.119.171.18 (talk) 00:46, 9 April 2009 (UTC)

Newline

If you save the file with a trailing newline (0D 0A, since we're talking about the DOS/Windows world), this bug won't be triggered. This may go some way toward explaining why the bug is less frequent than it could be (as misparsing ASCII text as UTF-16 is not statistically rare). —Preceding unsigned comment added by 89.0.2.3 (talk) 00:22, 4 February 2009 (UTC)

All even-length ASCII text when decoded as UTF-16 will result in code points in the first half of the BMP. Some of those code points are unallocated but afaict all of them are potentially allocatable. CRLF interpreted as UTF-16LE would map to U+0A0D, which is an unallocated character in the "Gurmukhi" block. Presumably the MS implementation either knows that character is unallocated or considers a mixture of Chinese and Gurmukhi unlikely. Plugwash (talk) 16:59, 7 February 2018 (UTC)
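
The two claims about code points can be checked with a short Python sketch. It relies on the Unicode database bundled with the Python interpreter, so the reported category reflects that database's version, not whatever tables Windows uses.

```python
import unicodedata

# CRLF (0D 0A) read as one UTF-16LE code unit.
crlf = b"\r\n".decode("utf-16-le")
# A category of "Cn" means the code point is unassigned in Python's database.
print(f"U+{ord(crlf):04X}", unicodedata.category(crlf))

# Every ASCII byte is at most 0x7F, so no byte pair can reach U+8000.
highest = b"\x7f\x7f".decode("utf-16-le")
print(f"U+{ord(highest):04X}")
```

The second value is the upper bound for any pair of ASCII bytes, which is why such text always lands in the first half of the BMP.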

"This app can break" doesn't work! :)

It's funny, but I tried this bug and it behaves in the described way for all strings I tried except for "This app can break". If I changed any letter in this (e.g. "app" to "apa") and saved in the new Notepad window, it still changes to Chinese, but this one phrase fails. :)

Anybody can confirm that? I restarted Notepad for each phrase and saved only new files, so I shouldn't have made any mistakes. m_gol (talk) 12:25, 1 March 2009 (UTC)

I suspect that the sequence " app" isn't mapping to two valid Unicode Chinese characters. But I'm unable to verify that. --Alvestrand (talk) 17:28, 4 March 2009 (UTC)
It might be the capital T. Since Chinese doesn't have capitals, in some cases, it may cause the (istextunicode) thingy to recognize it as English. I don't know the exact cause, only speculating. Annihilatron (talk) 16:46, 25 March 2009 (UTC)
It is the capital T, but it has nothing to do with the fact that Chinese may not have capitals. A capital T has a different 8-bit value than a small t, so "Th" has a different 16-bit value than "th". Obviously that different 16-bit value doesn't represent a Chinese character (or perhaps a "typical" Chinese character), which is why it doesn't work. —Preceding unsigned comment added by 66.119.171.18 (talk) 00:50, 9 April 2009 (UTC)
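
The arithmetic behind the "Th" versus "th" point is easy to show. This Python sketch only demonstrates that the two spellings give different 16-bit values when read little-endian; whether that difference is what stops the phrase from triggering the behaviour is the commenters' speculation and is not tested here.

```python
# Byte values of the first two letters in each spelling.
upper = b"Th"  # 54 68
lower = b"th"  # 74 68

# Read little-endian, the first byte becomes the low byte of the code unit.
upper_unit = int.from_bytes(upper, "little")
lower_unit = int.from_bytes(lower, "little")

print(f"U+{upper_unit:04X}")  # U+6854
print(f"U+{lower_unit:04X}")  # U+6874
```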

Exceptions

There are exceptions to this bug. For example, if you type 1234 123 123 12345, it stays the same!--Jupiter.solarsyst.comm.arm.milk.universe (talk) 22:14, 13 May 2009 (UTC)

�F�v���ֲ� —Preceding unsigned comment added by 149.169.212.191 (talk) 00:19, 8 January 2010 (UTC)Reply

Inaccuracy

"Bush hid the facts" becomes 畂桳栠摩琠敨映捡獴. The characters shown in the article (only the first one is different) are for "bush hid the facts" instead (first letter uncapitalised). I would edit the article myself but these characters probably also have a different transliteration than the one provided, so I fear I'd only make it more inaccurate. 91.107.57.148 (talk) 15:35, 5 January 2011 (UTC)

Fixed in SP1 of Windows 7?

If it was already there, I think it has been fixed in Service Pack 1 of Windows 7. An updated Windows doesn't have this behavior. When I type anything like Bush hid the facts etc. and follow the procedure, it remains the same. Others, please confirm this and edit it here if you find the same! — Preceding unsigned comment added by 223.181.0.206 (talk) 03:23, 30 January 2012 (UTC)

Doesn't work for me either. 2Awwsome (talk) 17:27, 23 May 2013 (UTC)

According to this article it was fixed well before that, in Vista. Spitzak (talk) 03:25, 29 May 2013 (UTC)

Bug does still exist in Windows 10

I am using Windows 10. This bug just occurred to me while I was writing to a txt file through the "ofstream" class in C++. To take the question away, no, the sequence written to the file was not "Bush hid the facts" ;) I randomly typed things on my keyboard. — Preceding unsigned comment added by Copperazide (talk • contribs) 02:13, 30 January 2018 (UTC)

Dubious phrase

In the Workarounds section, I find the dubious phrase "Notepad prepends a UTF-8 byte order mark". WTF is a "UTF-8 byte-order mark"? UTF-8 is a byte-stream encoding; unlike UTF-16 it has no endianness. Accordingly, BOMs on UTF-8-encoded files are the exception, and getting rarer. Wegesrand (talk) 15:33, 26 January 2022 (UTC)

It's sometimes useful to begin a file with the three-byte UTF-8 encoding of FEFF to indicate unambiguously that a text file is in UTF-8 to a program which can understand it. (Of course in other cases the three bytes can cause problems.) There's discussion of this at Byte order mark... AnonMoos (talk) 18:20, 27 January 2022 (UTC)
Most Windows programs (at least those supplied by Microsoft) insist on writing the code for U+FEFF at the start of any file that is "unicode", including UTF-8. This of course destroys the compatibility of UTF-8 with ASCII, which is the entire point behind the design of UTF-8, and likely delayed the adoption of Unicode in files and the internet by decades. However, this pattern does not trigger the Bush hid the facts bug, so it "fixes" this. Spitzak (talk) 18:49, 27 January 2022 (UTC)
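
The "three-byte UTF-8 encoding of FEFF" mentioned above can be inspected with Python's standard codecs. This is a sketch of the byte pattern only; it makes no claim about which Notepad versions write it.

```python
import codecs

# U+FEFF encoded as UTF-8 is the three bytes EF BB BF.
signature = "\ufeff".encode("utf-8")
print(signature.hex())               # efbbbf
print(signature == codecs.BOM_UTF8)  # True

# The "utf-8-sig" codec writes the signature on encode and strips it on decode.
data = "Bush hid the facts".encode("utf-8-sig")
print(data[:3].hex())                # efbbbf
print(data.decode("utf-8-sig"))      # Bush hid the facts
```

With the signature in front, the file is 21 bytes and no longer starts with two ASCII letters, which matches the remark that the prefix sidesteps the misdetection.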

Self-published source

Following a YouTube video about the subject (arguably demonstrating that the Wikipedia article is wrong), that video has now been added as a source. Can we consider the author, FlyTech Videos, a subject matter expert? I am reminded of the recent discussion here where Karl Jobst is rejected as a subject matter expert on something he made a video about. --Renerpho (talk) 18:04, 4 July 2023 (UTC)

I have asked the relevant Wikipedia projects to comment on this.[1][2][3] --Renerpho (talk) 19:39, 5 July 2023 (UTC)
A tool to create sequences that trigger the bug, or check if a given sequence will do so, has been put on GitHub.[4] It also explains how the bug works. Of course it is also self-published (same person who made the YouTube video). --Renerpho (talk) 19:45, 5 July 2023 (UTC)
Why should we care about self-publishing? This video not only explains in more detail why the bug happens, it also provides a (close enough) oracle to prove that what he says is true (up to newlines) and that the IsTextUnicode() function is faulty. What more do you want as proof? I mean, if the article were about history or politics, well, OK. But this is science; any repeatable source, self-published or not, should be accepted. (Link to the video for info: https://www.youtube.com/watch?v=sPShnuBSvBg) 2001:861:4286:8CA0:79D0:F61A:A524:1AF8 (talk) 17:02, 7 July 2023 (UTC)
I don't care about proof. This is an encyclopedia, not a truth-seeking machine. If reputable sources mention it, it belongs in the article. If the user has to confirm it themselves, by doing their own tests or by trusting the video, it does not. The fact that this is science, not politics, doesn't change that. If anything, the standards regarding WP:OR and WP:NOTTRUTH should be stricter, not looser. Renerpho (talk) 02:00, 8 July 2023 (UTC)
I wonder what kind of publication this sort of thing can go in. In the old days it would probably end up in a computer magazine and become citable for us. There would at least be an editor.
(The author noticeably hand-waves over how the Oracle is produced: probably decompilation of the IsTextUnicode function, not a wise thing to admit from a legal perspective indeed. What did the old magazines do with this sort of knowledge?) Artoria2e5 🌉 09:13, 13 February 2024 (UTC)