The Mudcat Café™
Thread #135686   Message #3095524
Posted By: JohnInKansas
15-Feb-11 - 05:20 AM
Thread Name: BS: HTML & Ampersand Code testing
Subject: RE: BS: HTML & Ampersand Code testing
& = escape character to tell the interpreter that what follows is code

# = modifier to say that the code is a number

x or X = modifier to say that the number is a hexadecimal one, omitted for decimal character numbers.

nnnn = the number

; = code ending mark to say the code is finished

&#nnnn; for decimal codes and &#xhhhh; for hex-numbered codes. Example: &#9731; or &#x2603; = ☃
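If you'd rather generate these codes than look them up, here is a minimal sketch in Python (my choice of language, not anything Mudcat requires). `to_decimal_ref` and `to_hex_ref` are hypothetical helper names. `html.unescape` and the `xmlcharrefreplace` error handler are standard-library features that do the decoding and bulk encoding for you.

```python
import html

def to_decimal_ref(ch: str) -> str:
    """Return the &#nnnn; (decimal) form of a single character."""
    return f"&#{ord(ch)};"

def to_hex_ref(ch: str) -> str:
    """Return the &#xhhhh; (hexadecimal) form of a single character."""
    return f"&#x{ord(ch):x};"

snowman = "\u2603"
print(to_decimal_ref(snowman))  # &#9731;
print(to_hex_ref(snowman))      # &#x2603;

# html.unescape accepts decimal, lowercase-x and uppercase-X forms alike.
print(html.unescape("&#9731; &#x2603; &#X2603;"))

# Turn every non-ASCII character in a post into a decimal reference at once.
print("Caf\u00e9 \u2603".encode("ascii", "xmlcharrefreplace").decode("ascii"))  # Caf&#233; &#9731;
```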


Recent Windows versions should all include the "Arial Unicode MS" font, which contains about 39,000 characters (IIRC). It should contain all the characters for "alphabetic" languages, but two-thirds(?) or more of the Unicode characters are for "pictographic" languages, mostly East Asian. Even if you only use alphabetic languages, the font file is about 22.5 MB, so using it can slow down your computer. People who didn't get it with their version of Windows should be able to download it from Microsoft, but it's in one of the fringe "group" areas ("Internationalization" or something like that).

Times New Roman is the next largest "alpha based" font on my machine, and it's only 809 KB. It's an "extended" font that contains virtually all of the chars anyone is going to have any reason to post at Mudcat. I've been using it for quite a while, and have never seen a broken char on Mudcat caused by the char being missing from the font.

The Arial (non-Unicode) font that comes with WinXP and later is also, I think, an "extended" font, so it should be a good choice if you want a sans-serif.

Recent Windows installations have set IE's default to a font that was supposedly designed "just for the web." It includes most of the European chars, but deliberately omits all but the most common variants to keep the file size as small as possible. I recall being annoyed that it kept "resetting" itself for a while, but don't recall what it was called. It's a good one for general browsing, but not so good if you visit a significant number of non-English sites.

Even the much-maligned "curly quotes" come through in text if they're coded correctly, although, as MUST BE REMEMBERED, they don't work in code (or URLs).
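A minimal sketch, assuming the typical trouble case: curly quotes pasted (say, from a word processor) into HTML markup, where the attribute delimiters must be straight quotes. The hypothetical `straighten_quotes` helper swaps the four common curly quotes for straight ASCII ones before the markup is posted. The example.com link is a placeholder.

```python
# Map the four common "curly" quotes to their straight ASCII equivalents.
CURLY_TO_STRAIGHT = str.maketrans({
    "\u2018": "'",  # left single quotation mark
    "\u2019": "'",  # right single quotation mark
    "\u201c": '"',  # left double quotation mark
    "\u201d": '"',  # right double quotation mark
})

def straighten_quotes(code: str) -> str:
    """Replace curly quotes so HTML attributes and URLs parse as intended."""
    return code.translate(CURLY_TO_STRAIGHT)

broken = "<a href=\u201chttps://example.com/\u201d>link</a>"
print(straighten_quotes(broken))  # <a href="https://example.com/">link</a>
```

In ordinary text, where curlies are wanted, they can instead be written as &#8220; and &#8221; (U+201C and U+201D).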

(The rules for URLs are a separate subject, but they are significantly different from the general HTML rules, and it appears that some of our people have been confusing the two in recent posts.)
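To illustrate the difference, here is a short Python sketch using the standard-library `urllib.parse.quote` and `html.escape` functions. In page text, a non-ASCII char becomes a numeric reference. In a URL, the same char is UTF-8 encoded and percent-escaped instead. The example.com address and its query parameters are made up for illustration.

```python
import html
from urllib.parse import quote

word = "caf\u00e9 \u2603"  # "café ☃"

# In page text: HTML numeric character references.
print(word.encode("ascii", "xmlcharrefreplace").decode("ascii"))  # caf&#233; &#9731;

# In a URL: UTF-8 bytes, percent-escaped.
print(quote(word))  # caf%C3%A9%20%E2%98%83

# When that URL goes inside an HTML attribute, its literal "&" is written &amp;.
url = "https://example.com/search?q=" + quote(word) + "&lang=en"
print(html.escape(url))  # https://example.com/search?q=caf%C3%A9%20%E2%98%83&amp;lang=en
```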

Unicode deliberately leaves large "areas" of numbers unassigned so that users can give them "local definitions." Unfortunately, the Mac especially appears to use local code pages that put "curlies" and some other (mostly punctuation) marks in an unassigned area, and it seemingly doesn't convert them to Unicode as well as Windows does. Windows can also post broken characters, but for the most part they aren't letters that affect "language and spelling"; they're mostly "marks" that don't affect meaning.
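The best-known of these reserved areas is the Private Use Area in the Basic Multilingual Plane, U+E000 to U+F8FF, and Python's `unicodedata.category` reports such characters as `Co`. Below is a minimal sketch for spotting them in a post before submitting it. `flag_private_use` is a hypothetical helper, and U+F022 is just an arbitrary private-use code point chosen for illustration.

```python
import unicodedata

def is_private_use(ch: str) -> bool:
    """True if ch is a private-use character (Unicode general category 'Co')."""
    return unicodedata.category(ch) == "Co"

def flag_private_use(text: str) -> list[str]:
    """Return a U+XXXX label for every private-use character in text."""
    return [f"U+{ord(c):04X}" for c in text if is_private_use(c)]

# U+201C and U+201D are the standard, assigned curly double quotes...
print(flag_private_use("\u201cquoted\u201d"))  # []
# ...while U+F022 lies in the BMP Private Use Area.
print(flag_private_use("\uf022quoted\uf022"))  # ['U+F022', 'U+F022']
```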

(NO COMPETENT TYPIST should ever use Office's "Insert Symbol," because the characters it inserts will nearly all be "broken" in HTML, and will earn you curses and incantations from your typographer and prepress setup people.)

John