The Mudcat Café TM
Thread #135626   Message #3093668
Posted By: Artful Codger
11-Feb-11 - 10:04 PM
Thread Name: Tech: Non-ASCII character display problems
Subject: RE: Tech: Non-ASCII character display problems
There is a related problem: font coverage of the Unicode character set. No font contains every defined Unicode character; each font covers only some parts of the range, often with holes within the parts it does cover. So a character you specify may be visible to you yet invisible to another user whose font is similar, though not identical, and has holes where yours has glyphs.

The good news is that, for the European languages at least, coverage is fairly complete for virtually all the characters you're likely to encounter. Also, browsers are savvy enough that if they can't find a character in the default or specified font, they will fall back to another installed font which does have it. So you're pretty sure of getting the character displayed, as long as some font on your system contains it.

But people are finicky: they like things to look uniform, so it's a bit jarring if they're reading Times New Roman text and suddenly there's a jump to Arial just to display an Irish long-r. Better if the entire text (or a certain span) were displayed in Arial to begin with. (Though I personally dislike reading sans-serif text.) In many cases, users can address this by bracketing their posts with style or font directives, if they know how. But they're still guessing that (1) the font will exist on all or most other users' systems and that (2) all the needed characters will be defined in those users' versions of the font.
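
For anyone who wants to see what such bracketing might look like, here is a rough sketch in Python that wraps a post in a span with a list of fallback fonts. The function name, the default font list and the assumption that the forum passes span and style markup through untouched are all mine, for illustration only; the two guesses above still apply to every font named in the list.

```python
from html import escape


def wrap_in_font_stack(text, fonts=("Times New Roman", "Georgia"), generic="serif"):
    """Wrap text in a span whose font-family lists several fallback fonts.

    Hypothetical helper: the font names are examples, not recommendations.
    """
    # Quote each named family; the generic keyword (serif) must stay unquoted.
    families = [f"'{name}'" for name in fonts] + [generic]
    style = "font-family: " + ", ".join(families)
    # Escape &, < and > in the post so it cannot break the surrounding markup.
    return f'<span style="{style}">{escape(text, quote=False)}</span>'


if __name__ == "__main__":
    print(wrap_in_font_stack("Tom & Máire"))
```

The browser tries each family in order for the whole span, so the text stays in one face as long as the first font that is installed also has every character needed.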

Font issues also tie into the high-bit character problem. For old, improperly encoded posts, some have suggested fixing the display problems by bracketing the text with encoding or font specs, or changing the encoding for the whole page. I don't believe encoding can be changed on the fly: an encoding spec applies to the entire web page file (though you might be able to specify a different encoding for an included file). And hopefully fonts which cater to specific codepages rather than to Unicode are becoming historical relics. If you know the encoding, it's much better to re-encode the text than to dance around the problems it causes.
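
As an illustration of re-encoding, here is a minimal Python sketch, assuming the old post's raw bytes are available and its original encoding is known. The function name and the choice of UTF-8 as the target are my assumptions, not a description of how the site stores anything.

```python
def reencode_to_utf8(raw: bytes, source_encoding: str) -> bytes:
    """Convert bytes from a known legacy encoding to UTF-8.

    Hypothetical helper: raises UnicodeDecodeError if the bytes are not
    valid in source_encoding, which is a hint that the guess was wrong.
    """
    # Step 1: interpret the stored bytes using the encoding they were written in.
    text = raw.decode(source_encoding)
    # Step 2: write the same characters back out as UTF-8.
    return text.encode("utf-8")


if __name__ == "__main__":
    old_post = b"Caf\xe9"  # "Café" as stored under Windows codepage 1252
    fixed = reencode_to_utf8(old_post, "cp1252")
    print(fixed.decode("utf-8"))
```

The important part is step 1: the same high-bit byte stands for different characters in different codepages, so the conversion is only as good as your knowledge of the original encoding.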