The Mudcat Café TM
Thread #41856   Message #1058690
Posted By: JohnInKansas
21-Nov-03 - 03:21 PM
Thread Name: Help: HTML practice
Subject: RE: Help: HTML practice
For those using the Unicode tables to look up characters, note that you can code a character using the hex character number without converting to decimal, if you insert an x after the # in the character code.

Mark gave the example of the beh which is decimal 387 and hex 183:

&#387; (decimal) should print ƃ

&#x183; (hex) should print the same ƃ
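
For anyone who wants to check the decimal/hex equivalence away from a browser, here is a minimal sketch in Python. It assumes only the standard library's html module; the variable names are made up for illustration.

```python
import html

# The same character number written two ways.
decimal_ref = "&#387;"    # decimal form
hex_ref = "&#x183;"       # hex form, marked by the x after the #

decimal_char = html.unescape(decimal_ref)
hex_char = html.unescape(hex_ref)

# Both references should decode to the same single character.
print(decimal_char == hex_char)
print(f"U+{ord(decimal_char):04X}")

# The arithmetic behind it: hex 183 is decimal 387.
print(int("183", 16))
```

If the two references decode to the same character, any difference you see on screen comes from the font, not from the way the number was written.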

Win98 was built back in ancient times, when large font sets weren't much used. It is limited, by its word length, unless a programmer deliberately incorporates DWORD (double word) characters into specific functions. Sixteen bits cover only the first 65,536 Unicode character numbers; the full range runs up to hex 10FFFF, which needs 21 bits and is usually stored in 32. Win98 isn't equipped to do that.
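
On the question of how many bits it takes, a quick check is possible in Python. This is only a sketch: it relies on sys.maxunicode being the highest Unicode character number (true for current Python 3), and the sample character U+1D11E is just an arbitrary one from beyond the 16-bit range.

```python
import sys

# Highest Unicode character number, hex 10FFFF.
max_code_point = sys.maxunicode
bits_needed = max_code_point.bit_length()

print(hex(max_code_point), bits_needed)

# A character inside the 16-bit range takes one 16-bit unit in UTF-16;
# one beyond it takes two (a "surrogate pair").
inside_16 = "\u0183"
beyond_16 = "\U0001D11E"

units_inside = len(inside_16.encode("utf-16-le")) // 2
units_beyond = len(beyond_16.encode("utf-16-le")) // 2

print(units_inside, units_beyond)
```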

To some extent with Win2K, and fully with WinXP, the ability to use true Unicode fonts is available; but it's not a recommended practice except where necessary because of the very large size of full Unicode fonts.

Windows operates from "Code Pages" that contain instructions for how it responds to each given instruction. One of the things included in the code pages is the instruction for how to replace a given character number with a glyph - i.e. a graphic that displays the "character" in a selected font - for screen display. The setup of the machine, and the fonts selected, will determine which of the (literally thousands of) possible code pages are loaded. Windows can flip through the pages that are loaded, to see whether it can represent an unusual character - is there a glyph for it in another code page - but it can only use the pages it has loaded.
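
The idea that one character number means different things on different code pages can be seen with Python's built-in codecs. This is a rough illustration, not a picture of what Windows does internally: the three code pages (cp1252, cp437, cp1251) and the byte value hex E9 are simply examples picked for the sketch.

```python
# One byte value, looked up on three different code pages.
raw = bytes([0xE9])
pages = ("cp1252", "cp437", "cp1251")

decoded = {page: raw.decode(page) for page in pages}

for page, ch in decoded.items():
    print(page, ch, f"U+{ord(ch):04X}")

# A character that a code page has no slot for cannot be stored in it at all;
# with errors="replace" the encoder substitutes a question mark.
fallback = "\u0183".encode("cp1252", errors="replace")
print(fallback)
```

The same number comes back as a Western European, a Greek, and a Cyrillic letter, depending only on which page is consulted.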

[Persons with ancient experience will recall adding ANSI.SYS to their Autoexec.bat or Config.sys file to load the page for the ANSI, or "Extended ASCII" characters in the good old days, which did nothing but load an additional code page with glyphs for character numbers above decimal 127.]

Microsoft has produced a number of what they call "big fonts," that load additional code pages to permit display of characters that are not commonly used. Most of these contain glyphs only for "Western European" alphabets (with a few spares thrown in to fill out the pages). NONE of them contain a full Unicode character set.

With some limitations, Win98 can use "big font" pages, but you have to go get them - almost on purpose. Win98 users who happen to have them most likely got them with an IE upgrade, since it's the program displaying the document that determines what the characters should look like. Some "big fonts" may have been included in Office updates, but it's a little tricky to say what's available. Win98 is obsolete, and is NOT fully supported, and the Office versions commonly used with it may, or may not, have similar status. Update tools for identifying what you can add may not be readily available. Existing downloads are mostly still there, at Microsoft's download site(s) but you may have to figure out for yourself what you want.

A particular problem arises from Microsoft's failure to rename the newer "big fonts." There is no simple way, in Win98, to tell whether a given font, such as Times New Roman, is on your machine in the "old" version or in the "big font" version. Most Unicode fonts will include "Unicode" or "UC" in the font name, but even that is not guaranteed.

An additional difficulty comes from the "Internationalization" of Windows and Windows applications. Those who purchase their Windows in Germany will get a different set of code pages than those who get theirs in the UK, which is in turn different from what you get in the US. In general, the characters each can use are pretty much the same, but they may be arranged differently on the code pages to facilitate access to the characters most frequently used in the language expected to be used most often.

If you see the little rectangle, it's because, in the font you're using to view the document, the "little rectangle" IS THE GLYPH used by that font to represent the character number called for in the document. Nothing that can be done to the document will make that character number show a different glyph. If you want to see something else there, you must change to a display font that associates some other "picture" with that character number on the machine displaying the document.
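
As a toy model of the point above, the sketch below treats a font as nothing more than a set of character numbers it has pictures for. Everything in it is hypothetical: "OldFont" and "BigFont" are invented names with invented coverage, and real fonts and real Windows rendering are a good deal more involved.

```python
# Stand-in for the "little rectangle" a font shows when it has no glyph.
NOTDEF = "\u25AF"

# Hypothetical coverage tables: font name -> character numbers it has glyphs for.
FONTS = {
    "OldFont": set(range(0x20, 0x100)),   # assumed: basic Western European range only
    "BigFont": set(range(0x20, 0x250)),   # assumed: wider coverage
}

def render(text, font_name):
    """Return what the reader would see when text is shown in the named font."""
    covered = FONTS[font_name]
    return "".join(ch if ord(ch) in covered else NOTDEF for ch in text)

# The document is identical in both cases; only the display font changes.
doc = "character 387: \u0183"
old_view = render(doc, "OldFont")
big_view = render(doc, "BigFont")

print(old_view)
print(big_view)
```

The document never changes between the two calls, which is the whole point: the rectangle goes away only when the font does.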

The only real problem is in figuring out which of the fonts you have, or can get, uses that different picture.

John