The Mudcat Café TM
Thread #135626   Message #3096047
Posted By: Artful Codger
15-Feb-11 - 06:51 PM
Thread Name: Tech: Non-ASCII character display problems
Subject: RE: Tech: Non-ASCII character display problems
It doesn't matter what format is used for transport: as long as the target encoding (the one affecting the web page entry area) is UTF-x, the user input will be received and handled unambiguously. The browser and OS take care of that; neither the user nor Mudcat need be concerned.

The web pages are not stored; they are constructed from constituent information. So the page encoding should be whichever makes things easiest on the Mudcat scripting side. Yes, UTF-16 may mean that the pages sent (for display) are nearly twice the size (though probably not, if the transport is always UTF-8), and that may mean that UTF-8 is preferable, even though it increases the burden on Mudcat's input handling.
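To put rough numbers on the size question, here is a minimal Python sketch that compares the byte counts of the same text in the two encodings. The function name and sample strings are my own, purely for illustration; nothing here reflects how Mudcat's scripts actually work.

```python
from typing import Dict


def encoded_sizes(text: str) -> Dict[str, int]:
    """Return the byte length of `text` in UTF-8 and in UTF-16 (without a BOM)."""
    return {
        "utf-8": len(text.encode("utf-8")),
        "utf-16": len(text.encode("utf-16-le")),
    }


# Plain ASCII text: UTF-16 takes exactly twice the bytes of UTF-8.
print(encoded_sizes("Whiskey in the Jar"))  # {'utf-8': 18, 'utf-16': 36}

# Accented Latin letters cost two bytes in either encoding, so the gap
# narrows as the share of non-ASCII characters grows.
print(encoded_sizes("Óró, sé do bheatha 'bhaile"))
```

Since most Mudcat text is ASCII, "nearly twice the size" is about right for UTF-16 before any transport-level conversion or compression.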

Escaping is encouraged at present only because failing to do so results in inconsistent display. If the encoding is changed to UTF-x, there is no longer any need for users to escape text (except for &, < and >).
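As a sketch of what "escape only &, < and >" amounts to, Python's standard `html.escape` with `quote=False` does exactly that and leaves accented characters untouched. The wrapper name is hypothetical, and I'm assuming the page itself is served in a UTF encoding.

```python
import html


def escape_for_utf_page(text: str) -> str:
    """Escape only &, < and >; non-ASCII characters pass through unchanged."""
    return html.escape(text, quote=False)


print(escape_for_utf_page("Seán Ó Riada & <friends>"))
# Seán Ó Riada &amp; &lt;friends&gt;
```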

Can the HTML tag dictate whether a new tab, rather than a new window, is opened? I greatly prefer tabs, but since many pop-ups make no sense as tabs (and may resize the viewing window(!)), I must leave my default browser setting to open windows. Managing separate windows, however, is a pain, so I'm strongly opposed to forcing a new tab/window to be opened for input. It also adds to management problems when one is simultaneously researching and composing a response; I have enough tabs and windows to manage as it is!

Changing emulation could be streamlined by popping up a "change encoding" tab/window, like the one you're proposing for input. Then the entire thread would not have to be redisplayed, and the display of other messages would not be affected. Selecting a new encoding directly from the window would be much simpler and more efficient timewise (even with the round-trip to the server) than having to find the encoding using the browser's interface; most users don't even know how to do this! And the setting I proposed to "fix" the encoding (i.e., apply that encoding for other users thereafter) could be incorporated into that interface. Then, only one user has to go through the pain of finding the right encoding for a message. Since ISO-Latin-1 would be assumed as the emulation default, most legacy messages would display properly from the start (as they do now, to most users), even if they were improperly encoded. Without emulation, these messages would appear blotto in a page with UTF-8 encoding specified. Leaving the display pages with no encoding specified mires them in the obsolescent past. The sooner legacy messages are updated to Unicode, the better for Mudcat's future.
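Here is a rough Python sketch of the emulation idea, under my own assumptions: legacy messages are held as raw bytes, ISO-Latin-1 is the default emulation, and a per-message "fixed" encoding, once some user has chosen one, overrides the default. The names `render_legacy` and `fixed_encoding` are hypothetical, not anything in Mudcat's code.

```python
from typing import Optional

DEFAULT_EMULATION = "iso-8859-1"  # assumed default for legacy messages


def render_legacy(raw: bytes, fixed_encoding: Optional[str] = None) -> str:
    """Decode a legacy message with its "fixed" encoding if one has been
    recorded, otherwise with the default emulation."""
    return raw.decode(fixed_encoding or DEFAULT_EMULATION)


# A message originally entered as ISO-Latin-1 bytes.
latin1_msg = "Seán Ó Riada".encode("iso-8859-1")

# Without emulation: treated as UTF-8, the accented letters are invalid
# byte sequences and turn into replacement characters.
print(latin1_msg.decode("utf-8", errors="replace"))

# With the default emulation, it displays properly from the start.
print(render_legacy(latin1_msg))  # Seán Ó Riada

# A message with Windows "smart quotes" needs a one-time fix by some user;
# after that, the recorded encoding is applied for everyone.
cp1252_msg = "\u201cShenandoah\u201d".encode("cp1252")
print(render_legacy(cp1252_msg, fixed_encoding="cp1252"))
```

Once the right encoding is known, the decoded text can be re-saved as Unicode, so the emulation step is only ever needed for messages nobody has fixed yet.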