The Mudcat Café TM
Thread #112488   Message #2380638
Posted By: Artful Codger
03-Jul-08 - 10:12 PM
Thread Name: Tech: HtmlEsc.java: Convert special chars
Subject: Tech: HtmlEsc.java: Convert special chars

Program description updated 12 Feb 2011. HtmlEsc is now available as a downloadable, precompiled JAR file which can be run with just a double-click. The program may still be run from the command line, but this is no longer described.

Download from HERE (joeweb) or HERE (my SkyDrive).

Last JAR file update: 2011 July 3 (No longer converts ASCII single and double quotes)

-Artful Codger-


HtmlEsc is a simple Java program to encode text on the clipboard so that, once pasted into Mudcat messages, it will display the same way to all users. It converts all non-ASCII characters to HTML escapes (character references). To understand why this conversion is desirable, see the guide Entering special characters. Basically, if your text contains any characters outside the following list, encode it. Be particularly careful about quote, apostrophe, dash and ellipsis characters in text you copy/paste from other sources (like word processors and web pages). Often, those characters are not the same as the characters above.
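
For the curious, here is a minimal sketch of the kind of conversion described above: every non-ASCII character becomes a numeric character reference, and plain ASCII passes through unchanged. This is not the HtmlEsc source; the class and method names are made up for illustration, and it uses only numeric escapes.

```java
// Minimal sketch: replace each non-ASCII code point with a numeric
// character reference. Names here are hypothetical, not from HtmlEsc.
final class NumericEscapeSketch {

    static String escapeNonAscii(String text) {
        StringBuilder out = new StringBuilder(text.length());
        // Walk by code point so a character outside the Basic Multilingual
        // Plane becomes one reference instead of two surrogate halves.
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (cp < 128) {
                out.append((char) cp);                    // ASCII is copied as-is
            } else {
                out.append("&#").append(cp).append(';');  // decimal numeric escape
            }
            i += Character.charCount(cp);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Curly quotes and an ellipsis, as a word processor might produce them.
        System.out.println(escapeNonAscii("\u201CWait\u2026\u201D"));
        // prints: &#8220;Wait&#8230;&#8221;
    }
}
```

Straight ASCII quotes and apostrophes would come through this sketch untouched; only their "curly" look-alikes get encoded.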

This program is available as a pre-compiled Java JAR file, which should work across all platforms. To download the JAR file, go HERE, select Download, and save the file to your desktop.
(Actually, you can save it anywhere and just put a shortcut or alias on your desktop or similar quick-access location.)

To use HtmlEsc:

  1. Copy text to the clipboard.
  2. Double-click on the HtmlEsc.jar icon. Wait for the program to run and exit.
  3. Click on the Mudcat message entry area where you want to insert the text, then paste (Control-V on Windows, Command-V on Mac).

The program doesn't pop up a window (it doesn't need any user interaction or provide any feedback), so how do you know if it has run or finished? Usually, you'll see a Java helper application (Java Web Start) briefly start up, then go away.
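
The copy, convert, paste cycle above can be sketched with the standard java.awt.datatransfer classes. This is only an outline under assumptions: the class name and the encode stand-in are hypothetical, and the real utility may read and write the clipboard differently.

```java
import java.awt.Toolkit;
import java.awt.datatransfer.Clipboard;
import java.awt.datatransfer.DataFlavor;
import java.awt.datatransfer.StringSelection;

// Sketch of a windowless clipboard round trip. Names are hypothetical.
final class ClipboardRoundTripSketch {

    // Stand-in for the real conversion: numeric escapes only.
    static String encode(String text) {
        StringBuilder out = new StringBuilder();
        text.codePoints().forEach(cp -> {
            if (cp < 128) {
                out.append((char) cp);
            } else {
                out.append("&#").append(cp).append(';');
            }
        });
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        Clipboard clipboard = Toolkit.getDefaultToolkit().getSystemClipboard();
        // Nothing to do if the clipboard holds no text (an image, say).
        if (!clipboard.isDataFlavorAvailable(DataFlavor.stringFlavor)) {
            return;
        }
        // Asking for the plain string flavor leaves any styling behind.
        String text = (String) clipboard.getData(DataFlavor.stringFlavor);
        // Put the encoded text back, ready to paste; then simply exit.
        clipboard.setContents(new StringSelection(encode(text)), null);
    }
}
```

Because the sketch opens no window and just exits when done, it would behave much as described: nothing to see except the Java launcher coming and going.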

To encode text you've already entered or pasted into the Mudcat message area, you do essentially the same thing: Select your text and copy it to the clipboard. Then double-click the HtmlEsc utility, let it run, then click the browser's title bar (returning focus to the message entry area without clearing the selection) and paste. This should replace the text you just copied with an encoded version.

NOTE: Any text styling is stripped, and any embedded HTML tags are treated as literal text. DO NOT run the converter on text which you've already converted, or on text which includes HTML tags.

Known problems: This utility only uses mnemonic escapes (character entity references) when they are known to be supported across all major browsers. Numeric escapes should always work, but they are less human-readable in the text you are composing, so this utility uses mnemonic escapes when they're viable.

The following message contains the full source code for the utility, in case you need to customize it for your system, or you're just curious how the conversion is performed. If you remove a mnemonic and its preceding value from the big table, that character will be converted to a numeric escape instead (except in the case of &, ", < and >, which would be copied literally--but you can always make their translations numeric).
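
To illustrate the table behaviour just described, here is a small sketch with a five-entry table. It is an assumption about how such a lookup could work, not the actual code: the real table is far larger and laid out differently, and the names below are invented.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of a mnemonic table with numeric fallback. Names are hypothetical.
final class MnemonicTableSketch {

    // Illustrative subset only: code point -> entity name.
    static Map<Integer, String> defaultTable() {
        Map<Integer, String> table = new LinkedHashMap<>();
        table.put((int) '&', "amp");
        table.put((int) '<', "lt");
        table.put((int) '>', "gt");
        table.put(0x00E9, "eacute");
        table.put(0x2014, "mdash");
        return table;
    }

    static String escape(String text, Map<Integer, String> table) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            String name = table.get(cp);
            if (name != null) {
                out.append('&').append(name).append(';');   // mnemonic escape
            } else if (cp > 127) {
                out.append("&#").append(cp).append(';');    // numeric fallback
            } else {
                out.append((char) cp);                      // ASCII copied literally
            }
            i += Character.charCount(cp);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        Map<Integer, String> table = defaultTable();
        System.out.println(escape("caf\u00E9 & bar", table)); // caf&eacute; &amp; bar

        // Drop two entries: the accented letter falls back to a numeric
        // escape, while the ASCII ampersand is now copied literally.
        table.remove(0x00E9);
        table.remove((int) '&');
        System.out.println(escape("caf\u00E9 & bar", table)); // caf&#233; & bar
    }
}
```

The same sketch shows why already-converted text should not be converted again: with the full table, running escape on "caf&eacute;" yields "caf&amp;eacute;", which displays as the literal text "caf&eacute;" instead of the accented word.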

See further compilation and usage notes at the start of the script. These notes are somewhat inaccurate, are oriented toward command-line execution, and do not describe how to create an executable JAR file. Note that three class files are produced, not just one, as stated in the source notes. You'll need all three: HtmlEsc.class, HtmlEscaper.class and MyClipboard.class.