The Mudcat Café™
Thread #137789   Message #3153697
Posted By: JohnInKansas
13-May-11 - 08:26 PM
Thread Name: Mudcat rewriting preformatted text
Subject: RE: Mudcat rewriting preformatted text
The details are a bit fuzzy since I haven't had my obsolete HTML book out recently, but I believe the "escape" form cited above applies only to URLs, and there's a different description for "body text" in posts. The escapes used for web addresses are a carryover from another standard (maybe it's RFC 1738?), included in the HTML standard for reference. Web addresses use a different "language" than HTML posts.
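To show the difference between the two "languages," here is a rough sketch using Python's standard library. The sample string is made up for illustration, and this is not a claim about how Mudcat itself handles posts.

```python
from urllib.parse import quote, unquote
from html import escape

# Made-up sample text, for illustration only.
text = "two words & a <tag>"

# URL-style percent escape: the space becomes %20 (hex 20 = decimal 32),
# and the other reserved characters are percent-escaped as well.
url_form = quote(text)

# HTML body-text escape: the space is left alone; only the characters
# that would be mistaken for markup (&, <, >) are replaced.
html_form = escape(text, quote=False)

print(url_form)   # two%20words%20%26%20a%20%3Ctag%3E
print(html_form)  # two words &amp; a &lt;tag&gt;
```

The space needs an escape only in the URL form. In the body-text form it passes through untouched.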

Within a post, the space (ASCII decimal 32, Hex 20) is a "printable" character, and no substitution/escape is necessary. The "control characters," like CR (decimal 13, Hex 0D) and LF (decimal 10, Hex 0A), appear to be "nonfunctional" if included in a post, regardless of how you code them. The "tag" <BR> is the only thing that works in the page code.
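For anyone who wants to check those code values without hunting for a chart, a small Python sketch will do it. The variable names are just for illustration.

```python
# The three characters discussed above.
chars = {"space": " ", "CR": "\r", "LF": "\n"}

# For each one: decimal code, two-digit hex code, and whether Python
# counts it as a printable character.
table = {
    name: (ord(ch), f"{ord(ch):02X}", ch.isprintable())
    for name, ch in chars.items()
}

for name, (dec, hex_code, printable) in table.items():
    print(f"{name}: decimal {dec}, hex {hex_code}, printable={printable}")
```

The space comes back as printable, while CR and LF do not.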

It actually is somewhat difficult to find (on the web) an ASCII/ANSI Character chart that shows the "unprintable" characters. I did find one at ASCII Table on about the third or fourth page of a Google search return.

I don't recall all the history, but the line break has never been coded the same way everywhere: DOS and Windows use CR LF, Unix-type systems use LF alone, and older Macs used CR alone. (Microsoft said the "paragraph" break is a "special character" but has never defined a "code value" for it.) Little endian versus big endian shouldn't matter here, since each of these characters is a single byte. That means that the "Automatic Linebreak" function has to recognize any of these forms and substitute <br> for whichever one it finds, and the browser on each user's machine then has to display <br> as a line break in whatever way is appropriate for that machine.
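As a rough sketch of what an "Automatic Linebreak" step might look like, here is one way to do it in Python. The function name auto_linebreak and the sample text are my own inventions. I have no idea what Mudcat's actual code does, so treat this as an assumption-laden illustration only.

```python
import re

# Matches CR LF, LF CR, a lone CR, or a lone LF. The two-character forms
# are listed first so a pair counts as one break rather than two.
_LINE_BREAK = re.compile(r"\r\n|\n\r|\r|\n")


def auto_linebreak(text: str) -> str:
    """Hypothetical helper: swap every line-break form for an HTML <br> tag."""
    return _LINE_BREAK.sub("<br>", text)


# Made-up sample mixing three different line-break conventions.
sample = "line one\r\nline two\nline three\rline four"
converted = auto_linebreak(sample)
print(converted)  # line one<br>line two<br>line three<br>line four
```

Whichever form the poster's machine sends, the stored text ends up with the same <br> tag.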

The "extra" space could come from Mudcat's interpretation when it saves the post into the database, or it could appear when the browser "reads" the post back from the code.

John