The Mudcat Café TM
Thread #135056   Message #3092864
Posted By: JohnInKansas
10-Feb-11 - 09:08 PM
Thread Name: Tech: Entering special characters (moderated)
Subject: RE: Tech: Moderated thread on ampersand escapes
In "plain English" an escape character is one designated in the language the computer is using that tells the computer to treat what follows in a special way.

The usual and most common usage is to tell the interpreter that what follows immediately after the "escape" is a code of some kind, and not just simple text to be copied and displayed.

The & character is designated in HTML as an "escape" character, meaning that what follows it has a special meaning. Because of this special usage, in ancient times it was necessary to "use the escape code to code the display of the & character," so we had to type &amp; to "just type" &.
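
For anyone who wants to see the ampersand escape outside a browser, here is a minimal sketch in Python. It uses the standard library's html module as a stand-in for an HTML interpreter; that is an assumption for illustration, and nothing here says how mudcat itself does the work.

```python
import html

# What a poster has to type so that a literal ampersand gets displayed.
typed = html.escape("Tom & Jerry")
print(typed)        # Tom &amp; Jerry

# What an HTML interpreter shows after decoding the escape.
displayed = html.unescape(typed)
print(displayed)    # Tom & Jerry
```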

At mudcat, instead of "the & character is an escape" the interpreter has been told "the & character is an escape unless it's followed by a blank space" and we can now (usually) just type & if we just want to display & - as long as there's a space before and after it.
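
A rule of that kind might be sketched roughly as below. The function name escape_lone_ampersands is hypothetical, and the whole thing is only a guess at the behavior described, not mudcat's actual code: an & followed by whitespace is turned into the escaped form, and anything else is left for the interpreter to treat as a code.

```python
import re

def escape_lone_ampersands(text: str) -> str:
    """Hypothetical helper: treat "&" as literal only when whitespace follows."""
    # The lookahead (?=\s) matches an ampersand that is followed by
    # whitespace, without consuming the whitespace itself.
    return re.sub(r"&(?=\s)", "&amp;", text)

print(escape_lone_ampersands("Tom & Jerry"))   # Tom &amp; Jerry
print(escape_lone_ampersands("&copy; 2011"))   # &copy; 2011 (left alone)
```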

The HTML standard includes the ability to designate any typographical character by using the character number assigned to the character. Each character is assigned a unique number.

A variety of "character definitions" have been used, beginning with DOS and Unix, later merged into the ANSI standard definition for most uses. The current "most complete" definition of character numbers is the Unicode Standard.

All of the later sets of character definitions are intended to "include" all earlier ones, but this doesn't always work. Artful Codger will explain to us when and why it sometimes fails.

If you want to use the character numbers to be sure that the character you intend is sent to the html interpreter, you must type an & to tell the interpreter that "code is coming."

To use character numbers, the & must be immediately followed by a # character to tell the interpreter that what follows is a number.

The decimal number for the character must follow immediately after, and the "code" must be ended with a semicolon ;.

If you type &#0169; the "copyright" symbol © should be displayed in your html post, since the decimal number 0169 is assigned to that symbol in both the ANSI and Unicode definitions.
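
The three pieces of a decimal code (the &, the #, the number, and the closing semicolon) can be put together and checked with a short Python sketch. The helper name decimal_reference is made up for this example, and html.unescape again stands in for the html interpreter.

```python
import html

def decimal_reference(char: str) -> str:
    """Build the decimal code for one character (hypothetical helper)."""
    # ord() gives the character's assigned number; & # ... ; wrap it.
    return f"&#{ord(char)};"

ref = decimal_reference("©")
print(ref)                        # &#169;
print(html.unescape(ref))         # ©
print(html.unescape("&#0169;"))   # ©  (a leading zero is accepted)
```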

Because of variations in different Operating Systems, and different ways in which information is sent between computers, there are several different ways in which information "in transit" can be "encoded." Failures to get an accurate transmission and interpretation of the characters you attempt to send, when someone else receives and displays them, may result from these differences.
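
As one small illustration of that kind of failure (a sketch only, using UTF-8 and Latin-1 as two example encodings of my own choosing), the same character becomes different bytes depending on the encoding, and a receiver that guesses wrong displays garbage. The numeric code, being plain ASCII, comes through the same either way.

```python
text = "©"

utf8_bytes = text.encode("utf-8")        # two bytes: C2 A9
latin1_bytes = text.encode("latin-1")    # one byte:  A9
print(utf8_bytes.hex(), latin1_bytes.hex())   # c2a9 a9

# The receiver guesses the wrong encoding: UTF-8 bytes read as Latin-1.
garbled = utf8_bytes.decode("latin-1")
print(garbled)                                # Â©

# A numeric code is plain ASCII, so its bytes match in both encodings.
ref = "&#169;"
same_bytes = ref.encode("utf-8") == ref.encode("latin-1")
print(same_bytes)                             # True
```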

Artful Codger will explain these problems, and what you can do about them.

Fewer errors may result if hexadecimal numbers are used for characters that are assigned "bigger numbers." The Unicode Standard uses only hex numbers to define the characters.

As before, to "post" a specific character using the "hex number" assigned to it, you begin with the & "escape" to tell the interpreter that you're using a code.

The & must be immediately followed by the # character to say that the code is a number.

The # character must be immediately followed by an x (or X) to tell the interpreter that the number is in hexadecimal format.

The hex number follows immediately, and the code is ended with a semicolon ;.

The decimal number 169 is the same as the hexadecimal number 00A9.

Typing &#x00A9; should display the "copyright" symbol © in an html post.
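
The decimal-to-hex arithmetic and the hex form of the code can be checked the same way. This is a minimal Python sketch, with html.unescape assumed as a stand-in for the html interpreter.

```python
import html

code_point = ord("©")
print(code_point)                   # 169
print(format(code_point, "04X"))    # 00A9
print(int("00A9", 16) == 169)       # True

# & then # then x, the hex digits, and the closing semicolon.
hex_ref = f"&#x{code_point:04X};"
print(hex_ref)                      # &#x00A9;
print(html.unescape(hex_ref))       # ©
print(html.unescape("&#X00a9;"))    # ©  (X and lowercase digits also work)
```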

The above is the "simple" explanation of the basics of posting characters using the numerical "names" assigned to characters in the standards.

Additional information will follow in later posts.

John