The Mudcat Café TM
Thread #135626   Message #3096718
Posted By: JohnInKansas
16-Feb-11 - 05:05 PM
Thread Name: Tech: Non-ASCII character display problems
Subject: RE: Tech: Non-ASCII character display problems
Since WinXP, Microsoft has supplied some fonts in a form that they call by the technical name of "Big Fonts." Each of those fonts includes all the ASCII/ANSI English characters and all the principal "European Language" characters.

The character numbers "printed" by those fonts are the Unicode numbers.

Internal font coding in Office programs since Office 2003 has been UTF-16 for the fonts provided by Microsoft.

Anyone typing from a Windows computer with WinXP/Office 2003 or later and using one of the big fonts should not need to code any character they can type.

If a person gets a font from somewhere else, it probably is NOT a "big font," and coding of characters outside the "font range" would be necessary. A font from another source may also force "font page" use, which could cause characters to be differently encoded.
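For anyone who does end up having to "code" a character, here is a rough Python sketch of what that amounts to. It assumes "coding" means writing an HTML numeric character reference (the &#NNNN; form), which is my reading of the thread rather than anything stated above. The function name code_non_ascii is made up for the illustration.

```python
def code_non_ascii(text: str) -> str:
    """Replace every non-ASCII character with an HTML numeric reference.

    Hypothetical helper: Python's built-in "xmlcharrefreplace" error
    handler swaps each character that ASCII cannot hold for &#NNNN;,
    where NNNN is the decimal Unicode number of that character.
    """
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


coded = code_non_ascii("café costs 5 €")
print(coded)  # caf&#233; costs 5 &#8364;
```

Plain ASCII text passes through unchanged, so only the characters outside the "safe" range get coded.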

Fonts with the same names existed on earlier Windows versions, but installing WinXP/Office2003 or anything later should have upgraded them to "big."

The Microsoft Office "big fonts" are:

Arial Black
Arial Bold
Arial Narrow
Bookman Old Style
Courier New
Times New Roman
Trebuchet (Central and Eastern European languages only)

At least since Office 2003, Office program installation has selected Tahoma as the default font, and claims have been made that it was "designed for improved web visibility" (but I've failed to find any real advantage to using it).

Nearly all surviving Windows computers should have, or can get, the Microsoft "Arial Unicode" font which contains all Unicode characters up to hex FFFD, but typing the chars beyond those in the "Big Fonts" without coding will probably require an East Asian keyboard or input method.

"Currently in the Microsoft Windows operating systems, the two systems of storing text — code pages and Unicode — coexist.
However, Unicode-based systems are replacing code page–based systems.
For example,
Microsoft Windows® NT 4.0, Microsoft Windows 2000, Microsoft Windows XP,
Microsoft Office 97 and later,
Microsoft Internet Explorer 4.0 and later,
Microsoft SQL Server 7.0 and later all support Unicode."
Unicode Support in Office 2003

Code pages are used by Office only for (some of) the "little fonts." They might also be used for a "free font" from someplace chosen at random.

As an example, US users who don't have a "euro key" can use the "shortcut" Alt-Numpad 0128 to enter €. With Times New Roman (a Big Font), using the Alt-X toggle in Word to flip that character back to its char value returns the correct Unicode hex character number, 20AC. Any character that comes "off the keyboard" should be sent by its Unicode char number.
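The relation between the 0128 shortcut and 20AC can be checked outside Word. This small Python sketch assumes the Alt-Numpad number is a position in Windows code page 1252 (the usual "ANSI" page on US systems). That is my assumption about the shortcut, not something Word itself reports.

```python
# Assumption: Alt-Numpad 0128 refers to byte 128 (hex 80) in code page 1252.
euro = bytes([128]).decode("cp1252")

# The Unicode number of the resulting character, as four hex digits.
code_point = f"{ord(euro):04X}"

# The same character as UTF-16 (big-endian) bytes, shown as hex.
utf16_hex = euro.encode("utf-16-be").hex().upper()

print(euro, code_point, utf16_hex)  # € 20AC 20AC
```

So the code-page position (128) and the Unicode number (hex 20AC) are two different numbers for the same character, and for this character the UTF-16 value matches the Unicode number.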

So the broken characters that appear at mudcat are due to people using "little fonts" that still use code paging rather than Unicode, loss of the Unicode char values by the mudcat database, or people using something other than Windows programs.
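To show how that kind of mismatch produces broken characters, here is a short Python sketch. The sample text and the choice of code page 1252 are only illustrative assumptions. It says nothing about what the mudcat database actually does.

```python
original = "café €"

# Case 1: text stored as Unicode (UTF-8) but read back as code page 1252.
misread = original.encode("utf-8").decode("cp1252")
print(misread)  # cafÃ© â‚¬

# Case 2: text stored as code page 1252 but read back as UTF-8.
# Bytes that are not valid UTF-8 are swapped for the U+FFFD marker.
lost = original.encode("cp1252").decode("utf-8", errors="replace")
print(lost)

# Case 1 can be undone if nothing was dropped along the way.
repaired = misread.encode("cp1252").decode("utf-8")
print(repaired == original)  # True
```

In the first case the text is garbled but recoverable. In the second the original values are gone once the replacement markers have been stored.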