Talk:Escape sequences in C

Latest comment: 2 years ago by DarkShadow4774 in topic Wrong values for hex encoding?

About the merge with Escape sequence

I believe such a merge is inappropriate. The article Escape sequence refers to character sequences used to change the state of a machine. Escape sequences in C refers to character sequences used in source code to represent special characters that cannot be typed directly in a character or string literal, for example because of the language syntax. Merging this article with equivalents for other programming languages would be fine, but I don't agree that it should be merged with Escape sequence, because over the years the two terms have diverged. Microphonics (talk) 18:45, 24 June 2013 (UTC)

Confusion about number of digits, other errors

This does seem to describe the C99 standard, but it may be useful to point out that many other languages that interpret C-like escapes (and lots of C compilers) treat runs of digits differently. It would also help to describe some of the stranger, unexpected aspects of the C99 standard, where you might very well expect the alternative behavior described here:

\x may consume only 2 digits rather than all following hex digits, i.e. "\x066" produces "^F6" (the control character 0x06 followed by "6") rather than "f". Python at least does this, and so do lots of C compilers.
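For comparison, a minimal sketch of what standard C (C99 and later) does with "\x066": the escape consumes every following hex digit, so the literal holds a single character with value 0x66 ("f" in ASCII). Ending the literal and relying on adjacent-literal concatenation is one common way to get the two-digit behavior. The names greedy, split and demo_hex_escapes are hypothetical, and the block is a translation unit without main; call demo_hex_escapes() from your own main to run the checks.

```c
#include <assert.h>

/* Standard C: \x takes ALL following hex digits, so "\x066" is one char, 0x66. */
static const char greedy[] = "\x066";

/* Splitting the literal ends the escape: \x06 (a control char), then '6'.
   Escapes are converted before adjacent literals are concatenated. */
static const char split[] = "\x06" "6";

/* Hypothetical helper that checks both literals at run time. */
void demo_hex_escapes(void) {
    assert(sizeof greedy == 2);          /* 0x66 + terminating NUL */
    assert(greedy[0] == 0x66);
    assert(sizeof split == 3);           /* 0x06, '6', NUL */
    assert(split[0] == 0x06 && split[1] == '6');
}
```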

\uXXXX may end at the first non-hex digit (and thus allow fewer than 4 digits). It may also extend beyond 4 digits, which makes \U redundant. Often it works for values that gcc --std=c99 rejects. So far I have found that gcc does not accept any values less than \u00A0 (the standard does allow \u0024, \u0040 and \u0060) or the "surrogate halves" \uD800..\uDFFF, which makes it really hard to put these characters into strings portably. But that appears to be the standard, sigh.
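A sketch of the standard C behavior for \u, assuming a C11 or later compiler (the u8 prefix, which guarantees UTF-8 bytes, is not in C99). In standard C, \u takes exactly four hex digits, so a following "1" is an ordinary character. The names ucn4 and demo_ucn4 are hypothetical; the rejected cases are left in a comment because a conforming C99/C11 compiler must diagnose them.

```c
#include <assert.h>

/* Assumes C11 or later: u8"..." literals are encoded as UTF-8.
   \u takes exactly four hex digits, so "\u00C01" is U+00C0 followed by '1'. */
static const unsigned char ucn4[] = u8"\u00C01";

/* Rejected under the C99/C11 constraints, so left commented out:
   \u0041 is below U+00A0 and is not $, @ or `; \uD800 is a surrogate half.
   static const unsigned char bad[] = u8"\u0041\uD800";                     */

/* Hypothetical helper that checks the encoded bytes. */
void demo_ucn4(void) {
    assert(sizeof ucn4 == 4);            /* C3 80 31 + NUL */
    assert(ucn4[0] == 0xC3 && ucn4[1] == 0x80 && ucn4[2] == 0x31);
}
```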

\UXXXXXXXX often does not require the full 8 digits, ending at the first non-hex digit instead. I have no idea why the standard did not do this, considering that the highest useful Unicode value is 0x10FFFF, so the two leading digits are always zero. Again, many languages do not restrict the values the way gcc --std=c99 does.
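Similarly, a sketch of \U in standard C, again assuming C11 or later for u8 literals: \U always takes exactly eight hex digits, and because the largest code point is U+10FFFF the first two are always 00. The array and function names are hypothetical.

```c
#include <assert.h>

/* Assumes C11 or later. \U takes exactly eight hex digits. */
static const unsigned char max_cp[] = u8"\U0010FFFF";    /* highest code point */
static const unsigned char smiley_1[] = u8"\U0001F6001"; /* U+1F600, then '1' */

/* Hypothetical helper that checks the UTF-8 bytes. */
void demo_ucn8(void) {
    /* U+10FFFF encodes as F4 8F BF BF (4 bytes + NUL). */
    assert(sizeof max_cp == 5);
    assert(max_cp[0] == 0xF4 && max_cp[1] == 0x8F &&
           max_cp[2] == 0xBF && max_cp[3] == 0xBF);
    /* U+1F600 encodes as F0 9F 98 80; the ninth digit '1' is a separate char. */
    assert(sizeof smiley_1 == 6);
    assert(smiley_1[0] == 0xF0 && smiley_1[1] == 0x9F &&
           smiley_1[2] == 0x98 && smiley_1[3] == 0x80 && smiley_1[4] == 0x31);
}
```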

There is a significant subset of languages that do not interpret octal escapes but do interpret "\0" as NUL, i.e. "\012" is "^@12" (NUL followed by "12").
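By contrast, standard C does interpret octal escapes: one to three octal digits, stopping early at the first non-octal character. A sketch with hypothetical names:

```c
#include <assert.h>

/* Standard C octal escapes: 1 to 3 octal digits. */
static const char oct3[]   = "\012";  /* one char with value 10 (line feed in ASCII) */
static const char oct3_3[] = "\0123"; /* at most 3 digits: '\012' followed by '3' */
static const char nul_8[]  = "\08";   /* '8' is not octal: '\0' followed by '8' */

/* Hypothetical helper that checks the three literals. */
void demo_octal(void) {
    assert(sizeof oct3 == 2 && oct3[0] == 10);
    assert(sizeof oct3_3 == 3 && oct3_3[0] == 10 && oct3_3[1] == '3');
    assert(sizeof nul_8 == 3 && nul_8[0] == 0 && nul_8[1] == '8');
}
```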

\n being converted when it's written to a file or the terminal?

I'm not sure why this section (in "Notes") is here. Whether \n is converted to \r\n when written to a terminal or a file is very much library-dependent. If we want to mention it at all, we should probably add a note linking to Binary_file#Manipulation or something similar. gwinkless (talk) 08:49, 25 June 2020 (UTC)

The article very much conflates three separate ideas: embedding a "newline" character in a string literal, sending a newline to an output stream, and what a particular output stream implementation might do with that newline when the stream has been opened in "text" mode. 173.75.33.51 (talk) 18:54, 31 August 2020 (UTC)
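To illustrate the distinction: the literal "a\nb" always contains exactly one newline character, and any translation to "\r\n" happens only in a stream opened in text mode (the Windows C runtime does this, for example). This sketch writes to a tmpfile(), which the C standard opens in binary "wb+" mode, so no translation should occur; text and bytes_written_binary are hypothetical names.

```c
#include <assert.h>
#include <stdio.h>

/* The literal holds 'a', '\n', 'b' and a NUL: one newline character,
   whatever the platform's line-ending convention. */
static const char text[] = "a\nb";

/* Hypothetical helper: writes s to a binary stream and returns the number
   of bytes written, or -1 on error. tmpfile() opens the file in "wb+" mode,
   so '\n' is written as a single byte. */
long bytes_written_binary(const char *s) {
    FILE *f = tmpfile();
    if (f == NULL)
        return -1;
    if (fputs(s, f) == EOF) {
        fclose(f);
        return -1;
    }
    long n = ftell(f);  /* position after writing = bytes written */
    fclose(f);
    return n;
}
```

Writing the same string with fputs to a file opened with fopen(path, "w") on Windows would typically store four bytes, because the text-mode stream, not the literal, adds the carriage return.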

Wrong values for hex encoding?

> char s2[] = "\u00C0"; // Two bytes with values 0xC2, 0x80, the UTF-8 encoding of U+00C0

Shouldn't that be values 0xC3, 0x80? Also see https://www.fileformat.info/info/unicode/char/00c0/index.htm -- Preceding unsigned comment added by DarkShadow4774 (talk • contribs) 17:44, 6 October 2021 (UTC)
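The arithmetic can be checked by hand: as an 11-bit value, U+00C0 is 00011 000000. A two-byte UTF-8 sequence has the form 110xxxxx 10xxxxxx, so the top five bits (00011) go into the first byte and the low six bits (000000) into the second, giving 0xC3 0x80. A minimal encoder sketch that does the same arithmetic; utf8_encode is a hypothetical name, not a standard library function.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical UTF-8 encoder sketch: writes the encoding of cp into out
   and returns the byte count, or 0 for surrogates and values above U+10FFFF. */
size_t utf8_encode(uint32_t cp, unsigned char out[4]) {
    if (cp < 0x80) {                       /* 1 byte: 0xxxxxxx */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                      /* 2 bytes: 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF)      /* surrogate halves are not encodable */
        return 0;
    if (cp < 0x10000) {                    /* 3 bytes */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                  /* 4 bytes */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;                              /* beyond U+10FFFF */
}
```

Calling utf8_encode(0xC0, buf) stores 0xC3, 0x80 in buf and returns 2.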