The Mudcat Café TM
Thread #53410   Message #822509
Posted By: JohnInKansas
09-Nov-02 - 08:49 PM
Thread Name: BS: Non-ASCII Character wanted
Subject: RE: BS: Non-ASCII Character wanted
Bill D -

I downloaded AllChars some time ago - I think it was on your recommendation, and if so, thanks.

This is a useful tool, but unfortunately not as useful for most people as it may at first appear. The difficulty is that, while it lists all the "Unicode" characters that could be in a font set, nobody (in my real world) actually has (or can use) full Unicode character sets.

The "original" ASCII character set used 7 of the eight bits in a "single byte" to define what character is wanted. This is why there are only 128 characters in the ASCII set - that's all the numbers you can represent with 7 bits. When a program sees a "character number," it goes to a character code "page" to look up how to "draw the character."
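A minimal sketch of that arithmetic, in Python (the post itself names no language, so the choice is an assumption made purely for illustration):

```python
# Number of distinct values that fit in 7 bits versus 8 bits.
ascii_count = 2 ** 7   # codes 0..127
byte_count = 2 ** 8    # codes 0..255

# Every ASCII code fits in 7 bits, so the eighth (top) bit is always 0.
top_bit_clear = all(code & 0x80 == 0 for code in range(ascii_count))

print(ascii_count, byte_count, top_bit_clear)

# The letter A is code 65, which needs exactly 7 binary digits.
print(ord("A"), format(ord("A"), "b"))
```

The second count is where the "other page" comes from: setting the eighth bit gives another 128 codes on top of the ASCII ones.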

Quite a long time ago, someone realized that the eighth bit in the "character byte" could be used to tell a program "look on the other page." In simple terms, this allowed the use of 256 characters. It is not generally appreciated, though, that there is no single standard "other page." Some may remember that back in the DOS days you had to "boot" a program called ANSI.SYS if you wanted to use the Symbol Font or "graphical characters." Booting ANSI.SYS told your machine "the other page is the ANSI EXTENDED CHARACTER SET."

The Windows systems most of us work with can have a rather large number of "other pages," but unless you have taken specific action to install them, most systems only use a few. It should be noted that the "default other page" that you get if you buy Windows in London is NOT the same as what you get if you buy it in Chicago (or Warsaw, Paris, or Hong Kong, etc.). The "other page" is different for a US Mac than for a US PC. The differences between locales come under what Micro$oft refers to as "localization of software." The differences for the Mac are because "that's the way he did it."
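To make the "no single other page" point concrete, here is a small Python sketch that reads one byte through several pages. The page names are Python codec names picked as stand-ins (an assumption, not something named in the post): cp437 for an old DOS page, cp1252 and cp1251 for two Windows localizations, and mac_roman for a Mac.

```python
# One byte with the eighth bit set, read through several "other pages".
# The page list is illustrative; these are Python codec names.
raw = bytes([0xE9])

pages = ["cp437", "cp1252", "cp1251", "mac_roman"]
decoded = {page: raw.decode(page) for page in pages}

for page, char in decoded.items():
    print(f"{page:10s} 0xE9 -> {char!r} (U+{ord(char):04X})")
```

On the Western European Windows page (cp1252) that byte is an e with an acute accent; the other pages in the list draw something else for the very same byte.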

Characters that are less frequently used but are common to quite a few "localizations" may appear on virtually everyone's "other page," but it is very difficult to find specific information that will allow you to guess whether an "extended character" that you use will be legible (as the same character) on someone else's machinery.
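One rough way to make that guess is to simulate the trip: turn the character into a byte using the sender's page, then read the byte back using the receiver's page. The sketch below does that in Python. The function name `survives` and the page names are hypothetical choices for illustration, and Python's codec tables are assumed to be a fair stand-in for what real machines do.

```python
def survives(char, sender_page, receiver_page):
    """True if char arrives as the same character on the receiver's page."""
    try:
        raw = char.encode(sender_page)           # what the sender's machine stores
        return raw.decode(receiver_page) == char  # what the receiver's machine draws
    except (UnicodeEncodeError, UnicodeDecodeError):
        # The character is not on the sender's page, or the byte is
        # unassigned on the receiver's page.
        return False

accented_e = "\u00e9"  # e with acute accent
for receiver in ["cp1250", "cp437", "mac_roman"]:
    print(receiver, survives(accented_e, "cp1252", receiver))

# Plain ASCII letters sit in the same place on all of these pages.
print(survives("A", "cp1252", "cp437"))
```

This only compares code tables. It says nothing about whether the receiver's font actually contains a glyph for the character.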

When you "swap languages" with the systems most of us use, all you're doing is using a different "other page." You still have only 256 characters available until you make the next swap.

To be able to make use of anything approaching all the characters shown by AllChars (or in Unicode tabulations) you need to be able to "code" the full (at least two byte) Unicode character number, which brings us back out of the thread drift and back to the original topic of the thread.
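As a sketch of what a two-byte character number looks like, the Python lines below take the euro sign, whose Unicode number is U+20AC, and write it out in two bytes. The post does not name an encoding; UTF-16 (big-endian) is assumed here because it is the common two-byte form.

```python
# The euro sign is Unicode number U+20AC, too large for one byte (max 0xFF).
euro = "\u20ac"
code_number = ord(euro)

# Two-byte form: the character number spread across two bytes.
two_bytes = euro.encode("utf-16-be")

print(hex(code_number), code_number > 0xFF, two_bytes.hex())
```

No amount of swapping single-byte pages gets you that number directly, since each page only has room for codes 0 through 255.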

People considering "keeping up" with things may want to plan on an operating system (as well as hardware) upgrade sometime in the future, because:

WINDOWS 2000 PROFESSIONAL (OR BETTER) AND WINDOWS XP PROFESSIONAL (OR BETTER) ARE THE ONLY EXISTING MICROSOFT OPERATING SYSTEMS CAPABLE OF FULLY "HANDLING" DOUBLE-BYTE CHARACTER SETS.

The "or better" above is to include "Server" and "Enterprise" versions of those two operating systems.

At present, even with an op system that can handle "double byte character sets" (DBCS) it is a real "trial of perseverance" to actually implement full Unicode usage. As an example, even though there is a "kit" to let Win2K Pro do the right-to-left and top-to-bottom (Hebrew/Chinese) character alignment, it appears that you can only get it from the Win2K Server "superpack" - only available to owners of 2K Server.

I suspect that it will all come, ...someday..., but I'm not going to predict when we can call it "useful" for us po' folk. But you might want to "get ready(?)"

(I will note, for the record, that other op systems have varying capabilities with DBCS and MBCS (multi-byte character sets). Unix can, in principle, handle up to four-byte characters, but I've not seen it implemented.)
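For a feel of what "up to four-byte characters" can mean, the Python sketch below counts the bytes per character in two encodings. UTF-8 and UTF-32 are assumptions here, since the post does not say which multi-byte scheme it has in mind.

```python
# Four sample characters: A, e with acute accent, euro sign, musical G clef.
samples = ["A", "\u00e9", "\u20ac", "\U0001d11e"]

# UTF-8 is variable width: anywhere from one to four bytes per character.
utf8_lengths = [len(ch.encode("utf-8")) for ch in samples]

# UTF-32 is fixed width: always four bytes per character.
utf32_lengths = [len(ch.encode("utf-32-be")) for ch in samples]

print(utf8_lengths)
print(utf32_lengths)
```

The G clef is the one that needs the full four bytes in UTF-8; its Unicode number, U+1D11E, does not fit in two bytes at all.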

Reference: Developing International Software, 2d edition, Microsoft Press, ISBN 0-7356-1583-7, about $49 (US). [My copy shows copyright 2003 - so it may not be in your bookstore yet.]

John