The Mudcat Café TM
Thread #112265   Message #2373957
Posted By: JohnInKansas
25-Jun-08 - 10:26 AM
Thread Name: Tech: htmlesc.py: Mac script to escape text
Subject: RE: Tech: htmlesc.py: Mac script to escape text
It's really ^#$@%! irritating when someone points out progress that I didn't notice, and makes me go back and study up. Maybe if I hadn't been so busy trying to make Office 2007 work I'd have noticed. (cheap excuse, obviously).

I found only 1,048 characters in the first 3,070 Unicode char numbers (  thru జ) that failed to display (in Preview) in my browser when coded in a post preview. That's a remarkable improvement over a few years ago when I tried out the first few thousand Unicode chars here. I think I was using Win98SE and IE5 way back then?

(Fortunately, for the reader, the preview function has been added since then, so I didn't need to actually post all of them to try them out. Maybe that thread was the reason they decided to add the preview?)

Since the charDict given contains only 141 characters, and these should all be "printable" ones, the objection to the possibility of a posted result being unreadable by significant numbers of people may be disregarded. (But you were gonna do that anyway.)

While the browsers have quite obviously been updated to handle a broader range of characters, I'm not sure whether our Win95/Win98 users can benefit from the improvement; but for the limited character set involved it's quite possible that they'll see truth and beauty in posts using the converter.

Surprisingly, when I copied the nonprinting characters from the mudcat preview and pasted them (unformatted) into Word to make my list of unprintables, a few (<100?) of them did display apparently good glyphs in Word. By accident I found that there appeared to be a very few that displayed in my browser but not in Word, although I didn't search specifically for this effect, and the few that may have shown this behavio(u)r may have been just sloppiness on my part. Word 2007 is so slow that the ones that didn't display may simply have been ones it was still hunting glyphs for.

There also appeared to be one or two characters that might have displayed a different glyph in Word than in the browser. Since this was just a quickie test, it will take some additional checking to make sure both of the above weren't just operator hastiness effects.

While it's been pointed out before, probably few remember that in (PC) Word (someone with Mac Word might want to check this for the Mac version) if you place your cursor immediately to the right of an unknown character and hit Alt-X the Unicode char value will replace the character. It generally is not necessary that the glyph be displayed correctly to get the hex charvalue from it. You can also type the Unicode char number (hex of course), and with the cursor immediately to the right of the last character typed, hit Alt-X to transform it to the appropriate glyph – if it's one your computer knows about. Unfortunately this only works one char at a time.
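
For those without Word handy, the same two-way lookup can be roughed out in Python. This is only a sketch assuming Python 3, and the function names are made up for illustration; it says nothing about how Word does it internally. Unlike Alt-X, it will take a whole string at once.

```python
def char_to_hex(ch):
    """Return the hex code point of one character, padded to at least four digits."""
    return "%04X" % ord(ch)


def hex_to_char(hex_digits):
    """Return the character for a hex code point string such as '0C1C'."""
    return chr(int(hex_digits, 16))


def text_to_hex(text):
    """Return the hex code points for every character in text, space separated."""
    return " ".join(char_to_hex(ch) for ch in text)


if __name__ == "__main__":
    print(text_to_hex("Hi"))
    print(hex_to_char("0C1C"))
```

Whether the character you get back actually shows a glyph still depends on the fonts installed on your machine, same as in Word.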

I don't know how far back into old versions this worked, but it was good in Office 2002 and later for the versions we've had. I don't think Win98 knew about Unicode, so it may not work in Word95/97 and other versions from around that era. Maybe one of our ancient ones can try it and let us know.

Back to the drawin' board for some more study.

John