The Mudcat Café

Tech: HtmlEsc.java: Convert special chars



Subject: Tech: HtmlEsc.java: Convert special chars
From: Artful Codger
Date: 03 Jul 08 - 10:12 PM


Program description updated 12 Feb 2011. HtmlEsc is now available as a downloadable, precompiled JAR file which can be run with just a double-click. The program may still be run from the command line, but this is no longer described.

Download from HERE (joeweb) or HERE (my SkyDrive).

Last JAR file update: 2011 July 3 (No longer converts ASCII single and double quotes)

-Artful Codger-


HtmlEsc is a simple Java program to encode text on the clipboard so that, once pasted into Mudcat messages, it will display the same way to all users. It converts all non-ASCII characters to HTML escapes (character references). To understand why this conversion is desirable, see the guide Entering special characters. Basically, if your text contains any characters outside the following list, encode it.
         A-Z a-z 0-9 ! " # $ % ' ( ) * + , - . / : ; = ? @ [ ] \ ^ _ ` { } | ~ space tab newline
Be particularly careful about quote, apostrophe, dash and ellipsis characters in text you copy/paste from other sources (like word processors and web pages). Often, those characters are not the same as the characters above.
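
As a rough illustration of the rule above, here is a small Java sketch that reports whether a piece of text contains anything outside the safe list. It is not part of HtmlEsc; the class and method names (SafeCharCheck, isSafe, needsEncoding) are made up for this example, and treating carriage return as part of "newline" is my own assumption.

```java
final class SafeCharCheck {
    // Punctuation copied from the safe list; &, < and > are deliberately absent.
    private static final String SAFE_PUNCTUATION = "!\"#$%'()*+,-./:;=?@[]\\^_`{}|~";

    // True if the code point appears in the safe list.
    static boolean isSafe(int cp) {
        return (cp >= 'A' && cp <= 'Z') || (cp >= 'a' && cp <= 'z')
            || (cp >= '0' && cp <= '9')
            || cp == ' ' || cp == '\t' || cp == '\n'
            || cp == '\r'   // assumption: carriage return counts as part of a newline
            || SAFE_PUNCTUATION.indexOf(cp) >= 0;
    }

    // True if any character in the text falls outside the safe list.
    static boolean needsEncoding(String text) {
        for (int ix = 0; ix < text.length(); ) {
            int cp = text.codePointAt(ix);
            if (!isSafe(cp)) {
                return true;
            }
            ix += Character.charCount(cp);
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(needsEncoding("It's plain ASCII."));     // straight apostrophe
        System.out.println(needsEncoding("It\u2019s not ASCII."));  // curly apostrophe
    }
}
```

The first call prints false and the second prints true: the curly apostrophe (U+2019) looks much like the ASCII one but is not in the list.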

This program is available as a pre-compiled Java JAR file, which should work across all platforms. To download the JAR file, go HERE, select Download, and save the file to your desktop.
(Actually, you can save it anywhere and just put a shortcut or alias on your desktop or similar quick-access location.)

To use HtmlEsc:

  1. Copy text to the clipboard.
  2. Double-click on the HtmlEsc.jar icon. Wait for the program to run and exit.
  3. Click on the Mudcat message entry area where you want to insert the text, then paste (Control-V on Windows, Command-V on Mac).
The program doesn't pop up a window (it doesn't need any user interaction or provide any feedback), so how do you know if it has run or finished? Usually, you'll see a Java helper application (Java Web Start) briefly start up, then go away.

To encode text you've already entered or pasted into the Mudcat message area, you do essentially the same thing: Select your text and copy it to the clipboard. Then double-click the HtmlEsc utility, let it run, then click the browser's title bar (returning focus to the message entry area without clearing the selection) and paste. This should replace the text you just copied with an encoded version.

NOTE: Any text styling is stripped, and any embedded HTML tags are treated as literal text. DO NOT run the converter on text which you've already converted, or on text which includes HTML tags.
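
To see why converting twice (or converting text with tags) goes wrong, here is a simplified stand-in for the converter. It is only a sketch: unlike HtmlEsc it uses numeric escapes for everything non-ASCII, and the class name DoubleEncodingDemo is invented for the example.

```java
final class DoubleEncodingDemo {
    // Simplified stand-in for the real converter: escapes &, <, > and non-ASCII.
    static String escape(String text) {
        StringBuilder buf = new StringBuilder();
        for (int ix = 0; ix < text.length(); ) {
            int cp = text.codePointAt(ix);
            if (cp == '&') {
                buf.append("&amp;");
            } else if (cp == '<') {
                buf.append("&lt;");
            } else if (cp == '>') {
                buf.append("&gt;");
            } else if (cp >= 0x80) {
                buf.append("&#x").append(Integer.toHexString(cp)).append(';');
            } else {
                buf.append((char) cp);
            }
            ix += Character.charCount(cp);
        }
        return buf.toString();
    }

    public static void main(String[] args) {
        String once = escape("caf\u00e9 <b>bold</b>");
        String twice = escape(once);
        System.out.println(once);   // caf&#xe9; &lt;b&gt;bold&lt;/b&gt;
        System.out.println(twice);  // caf&amp;#xe9; &amp;lt;b&amp;gt;bold&amp;lt;/b&amp;gt;
    }
}
```

After one pass the tags display as literal text instead of making the word bold. After a second pass every ampersand has itself been escaped, so readers would see the raw escape codes rather than the characters.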

Known problems:
  • If the clipboard is empty or the expected format can't be retrieved, an exception is thrown (the program crashes, but at least you know no conversion was done).
  • A clipboard interface problem on Suse Linux systems may prevent this utility from working there.
  • Bottom double quote and hi-bar are rendered as numeric escapes rather than mnemonic escapes (both work).
This utility only uses mnemonic escapes (character entity references) when they are known to be supported across all major browsers. Numeric escapes should always work, but they are less human-readable in the text you are composing, so this utility uses mnemonic escapes when they're viable.
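
For reference, a numeric escape is just the character's Unicode value written in hex or decimal. The sketch below shows both forms; the class and method names are invented for illustration, and I'm assuming "bottom double quote" means U+201E.

```java
final class NumericEscapes {
    // Hexadecimal form, the one the utility's source produces: &#x201e;
    static String hexEscape(int cp) {
        return "&#x" + Integer.toHexString(cp) + ";";
    }

    // Decimal form, equivalent to the hex form in HTML: &#8222;
    static String decimalEscape(int cp) {
        return "&#" + cp + ";";
    }

    public static void main(String[] args) {
        int bottomDoubleQuote = 0x201E;  // assumption: this is the "bottom double quote"
        System.out.println(hexEscape(bottomDoubleQuote));      // &#x201e;
        System.out.println(decimalEscape(bottomDoubleQuote));  // &#8222;
    }
}
```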

The following message contains the full source code for the utility, in case you need to customize it for your system, or you're just curious how the conversion is performed. If you remove a mnemonic and its preceding value from the big table, that character will be converted to a numeric escape instead (except in the case of &, ", < and >, which would be copied literally--but you can always make their translations numeric).
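
The effect of removing a table entry can be sketched with a three-entry table. This is a cut-down imitation of the lookup, not the utility's actual code; EntityTableDemo, newTable and escape are names made up for the example.

```java
import java.util.HashMap;
import java.util.Map;

final class EntityTableDemo {
    // A tiny stand-in for the big table: code point -> mnemonic escape.
    static Map<Integer, String> newTable() {
        Map<Integer, String> table = new HashMap<Integer, String>();
        table.put((int) '&', "&amp;");
        table.put(0x00E9, "&eacute;");
        table.put(0x2014, "&mdash;");
        return table;
    }

    // Use the mnemonic if the table has one; otherwise fall back to a numeric
    // escape for non-ASCII, or to the literal character for ASCII.
    static String escape(int cp, Map<Integer, String> table) {
        String named = table.get(cp);
        if (named != null) {
            return named;
        }
        if (cp >= 0x80) {
            return "&#x" + Integer.toHexString(cp) + ";";
        }
        return String.valueOf((char) cp);
    }

    public static void main(String[] args) {
        Map<Integer, String> table = newTable();
        System.out.println(escape(0x2014, table));  // &mdash;
        table.remove(0x2014);
        System.out.println(escape(0x2014, table));  // &#x2014;
        table.remove((int) '&');
        System.out.println(escape('&', table));     // & (copied literally)
    }
}
```

The last line shows the exception mentioned above: an ASCII character with no table entry is copied as-is, which is why & and friends must keep some entry in the table, mnemonic or numeric.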

See further compilation and usage notes at the start of the script. These notes are somewhat inaccurate, are oriented toward command-line execution, and do not describe how to create an executable JAR file. Note that three class files are produced, not just one, as stated in the source notes. You'll need all three: HtmlEsc.class, HtmlEscaper.class and MyClipboard.class.



Subject: RE: Tech: HtmlEsc.java: Convert special chars
From: Artful Codger
Date: 03 Jul 08 - 10:15 PM


Script updated 12 Feb 2011 to remove several mnemonics which aren't well-supported: hibar, Zcaron, zcaron and bdquote. The script supplies numeric escapes for these characters instead.

-Artful Codger-



/*
This program converts text on the clipboard to plain text with all the non-ASCII and HTML special
characters converted to HTML entities or character references (like &eacute; or &#x1c0;).
This includes a number of word processing characters (like quotes and dashes) which aren't ASCII,
and thus get trashed when copied into postings. It is especially useful when posting text in
languages other than English.

The character reference values correspond to Unicode values (specifically, to UTF-16 code points.)

Installation:
(1) Save this script to a plain-text file named "HtmlEsc.java".
(2) In a command window, change to the directory where the script file resides and compile it:
       javac HtmlEsc.java
    This produces a class file named "HtmlEsc.class". This is all the Java interpreter needs to
    run the program.
See further compilation notes below.

Usage:
(1) Copy the text you wish to convert to the clipboard.
(2) In a command window, change to the directory where the class file resides and run:
       java HtmlEsc            Do not include the .class extension.
    The clipboard will now contain the converted text as plain text.
(3) Paste the modified text from the clipboard to your post.
    Please preview your message (after any further editing) before you save it.

If you wish to execute the command from any directory, you will either have to supply a full path
to the class file (either on the command line or in a shell script, expressed in UNIX forward-slash
format) or place the class file in a directory you have defined in the CLASSPATH environment
variable. See Java tutorials or documentation for details.

Compilation notes:
This script can only be compiled on a system that has the Java Development Kit installed. The
minimum version is v1.5 (JDK 5). You should be able to copy and run the resulting class file on
any system that has an equivalent or later Java Runtime Environment (JRE). Both of these are
available as free downloads from http://java.sun.com/

Author: Artful Codger
*/

import java.util.HashMap;
import java.awt.datatransfer.Clipboard;
import java.awt.datatransfer.StringSelection;
import java.awt.datatransfer.DataFlavor;

public class HtmlEsc {
    public static void main(String[] args) {
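       // Get the current clipboard text.
       MyClipboard cb = new MyClipboard();
       String cbText = cb.getText();
       // getText() returns null when the clipboard is empty or holds no plain text.
       if (cbText == null) {
            throw new IllegalStateException("No plain text on the clipboard; nothing converted.");
       }
       String outText = HtmlEscaper.convertText(cbText);
       // Write the converted text back to the clipboard.
       cb.setText(outText);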
    }
}
//----------------------------------------------------------------------------

class MyClipboard {
   
    public String
    getText() {
       String rtn = null;
       Clipboard cb = getClipboard();
       DataFlavor flavor = DataFlavor.stringFlavor;
       if (cb.isDataFlavorAvailable(flavor)) {
            try {
                rtn = (String) cb.getData(flavor);
            } catch (Exception e) {} // UnsupportedFlavorException should never be thrown here.
       }
       return rtn;
    }
   
    public void
    setText(String text) {
       Clipboard cb = getClipboard();
       StringSelection sel = new StringSelection(text);
       cb.setContents(sel, null);
    }
   
    private Clipboard
    getClipboard() {
       return java.awt.Toolkit.getDefaultToolkit().getSystemClipboard();
    }
}
//----------------------------------------------------------------------------

class HtmlEscaper {
    private HtmlEscaper() {}    // Construction disallowed.
   
    // Takes a Unicode code point (not a single UTF-16 unit) so that characters
    // outside the Basic Multilingual Plane yield one valid reference.
    static String convertChar(int cp) {
       String rslt;
       Integer key = cp;
       if (charMap.containsKey(key)) {
            rslt = charMap.get(key);
       } else {
            if (cp >= 0x80) {
                rslt = "&#x" + Integer.toHexString(cp) + ";";
            } else {
                rslt = Character.toString((char) cp);
            }
            charMap.put(key, rslt);   // Cache the result for later lookups.
       }
       return rslt;
    }
   
    static String convertText(String inText) {
       StringBuilder buf = new StringBuilder();
       int textLen = inText.length();
       for (int ix = 0; ix < textLen; ) {
            int cp = inText.codePointAt(ix);   // Joins surrogate pairs.
            buf.append(convertChar(cp));
            ix += Character.charCount(cp);
       }
       return buf.toString();
    }
   
    // A map of Unicode values to HTML/XML char "entities" (named escape seqs).
    static private HashMap<Integer,String> charMap;
    // Key/value pairs to populate the charMap.
    static private final String[] charMapData = {
       "&", "&amp;", "<", "&lt;", ">", "&gt;",
       "\u00A0", "&nbsp;",
       "\u00A1", "&iexcl;", "\u00A2", "&cent;", "\u00A3", "&pound;", "\u00A4", "&curren;",
       "\u00A5", "&yen;", "\u00A6", "&brvbar;", "\u00A7", "&sect;", "\u00A8", "&uml;",
       "\u00A9", "&copy;", "\u00AA", "&ordf;", "\u00AB", "&laquo;", "\u00AC", "&not;",
       "\u00AD", "&shy;", "\u00AE", "&reg;", "\u00B0", "&deg;",
       "\u00B1", "&plusmn;", "\u00B2", "&sup2;", "\u00B3", "&sup3;", "\u00B4", "&acute;",
       "\u00B5", "&micro;", "\u00B6", "&para;", "\u00B7", "&middot;", "\u00B8", "&cedil;",
       "\u00B9", "&sup1;", "\u00BA", "&ordm;", "\u00BB", "&raquo;", "\u00BC", "&frac14;",
       "\u00BD", "&frac12;", "\u00BE", "&frac34;", "\u00BF", "&iquest;", "\u00C0", "&Agrave;",
       "\u00C1", "&Aacute;", "\u00C2", "&Acirc;", "\u00C3", "&Atilde;", "\u00C4", "&Auml;",
       "\u00C5", "&Aring;", "\u00C6", "&AElig;", "\u00C7", "&Ccedil;", "\u00C8", "&Egrave;",
       "\u00C9", "&Eacute;", "\u00CA", "&Ecirc;", "\u00CB", "&Euml;", "\u00CC", "&Igrave;",
       "\u00CD", "&Iacute;", "\u00CE", "&Icirc;", "\u00CF", "&Iuml;", "\u00D0", "&ETH;",
       "\u00D1", "&Ntilde;", "\u00D2", "&Ograve;", "\u00D3", "&Oacute;", "\u00D4", "&Ocirc;",
       "\u00D5", "&Otilde;", "\u00D6", "&Ouml;", "\u00D7", "&times;", "\u00D8", "&Oslash;",
       "\u00D9", "&Ugrave;", "\u00DA", "&Uacute;", "\u00DB", "&Ucirc;", "\u00DC", "&Uuml;",
       "\u00DD", "&Yacute;", "\u00DE", "&THORN;", "\u00DF", "&szlig;", "\u00E0", "&agrave;",
       "\u00E1", "&aacute;", "\u00E2", "&acirc;", "\u00E3", "&atilde;", "\u00E4", "&auml;",
       "\u00E5", "&aring;", "\u00E6", "&aelig;", "\u00E7", "&ccedil;", "\u00E8", "&egrave;",
       "\u00E9", "&eacute;", "\u00EA", "&ecirc;", "\u00EB", "&euml;", "\u00EC", "&igrave;",
       "\u00ED", "&iacute;", "\u00EE", "&icirc;", "\u00EF", "&iuml;", "\u00F0", "&eth;",
       "\u00F1", "&ntilde;", "\u00F2", "&ograve;", "\u00F3", "&oacute;", "\u00F4", "&ocirc;",
       "\u00F5", "&otilde;", "\u00F6", "&ouml;", "\u00F7", "&divide;", "\u00F8", "&oslash;",
       "\u00F9", "&ugrave;", "\u00FA", "&uacute;", "\u00FB", "&ucirc;", "\u00FC", "&uuml;",
       "\u00FD", "&yacute;", "\u00FE", "&thorn;", "\u00FF", "&yuml;",
       "\u0152", "&OElig;",
       "\u0153", "&oelig;", "\u0160", "&Scaron;", "\u0161", "&scaron;", "\u0178", "&Yuml;",
       "\u0192", "&fnof;", "\u02C6", "&circ;",
       "\u02DC", "&tilde;", "\u03A9", "&Omega;", "\u03C0", "&pi;",
       "\u2013", "&ndash;",
       "\u2014", "&mdash;", "\u2018", "&lsquo;", "\u2019", "&rsquo;", "\u201A", "&sbquo;",
       "\u201C", "&ldquo;", "\u201D", "&rdquo;", "\u2020", "&dagger;",
       "\u2021", "&Dagger;", "\u2022", "&bull;", "\u2026", "&hellip;", "\u2030", "&permil;",
       "\u2039", "&lsaquo;", "\u203A", "&rsaquo;", "\u2044", "&frasl;", "\u20AC", "&euro;",
       "\u2122", "&trade;", "\u2202", "&part;", "\u220F", "&prod;", "\u2211", "&sum;",
       "\u221A", "&radic;", "\u221E", "&infin;", "\u222B", "&int;", "\u2248", "&asymp;",
       "\u2260", "&ne;", "\u2264", "&le;", "\u2265", "&ge;", "\u25CA", "&loz;"
    };
    static {    // Static block to init the char map from the data array.
       charMap = new HashMap<Integer,String>();
       for (int ix = charMapData.length; ix > 0; ) {
            String val = charMapData[--ix];
            String sKey = charMapData[--ix];
            Integer key = (Integer) (sKey.codePointAt(0));
            charMap.put(key,val);
       }
    }
}



Subject: RE: Tech: HtmlEsc.java: Convert special chars
From: GUEST,Jon
Date: 05 Jul 08 - 06:55 PM

I've just tried building it using NetBeans but it isn't working, and at least sometimes it seems to empty the clipboard.

getClipboard takes about 30 seconds to return. rtn in getText gets set to an empty string ("").

OpenSuse 11.0/KDE 4/
java version "1.6.0_06"
Java(TM) SE Runtime Environment (build 1.6.0_06-b02)
Java HotSpot(TM) 64-Bit Server VM (build 10.0-b22, mixed mode)



Subject: RE: Tech: HtmlEsc.java: Convert special chars
From: Artful Codger
Date: 05 Jul 08 - 10:48 PM

I don't know what to tell you. I'm only using standard calls. If getClipboard takes that long (before any text is even requested!) it indicates a problem with the Suse/Java interface. I'd be curious whether users on other Linux systems encounter a similar problem.

Try this: compile specifying an execution version of 1.5. On Mac at least, the current 1.6 runtime is only available in a 64-bit version. That may somehow be related to the problem. I don't think I've used any constructs that aren't available in v1.5. FWIW, my environment is 1.6.0_05.

Another thought is that whatever application you're copying from doesn't actually make the data available on the clipboard, but rather waits to be notified when someone wants the data it offers, in some particular format. I don't do that kind of handshaking on the script end--none of my source apps have failed to put the text directly onto the clipboard.



Subject: RE: Tech: HtmlEsc.java: Convert special chars
From: GUEST,Jon
Date: 06 Jul 08 - 07:38 AM

I've had another look this am. The long delay only happens when I run in debug mode.

It didn't seem to last night, but it is now reading the clipboard contents and converting the text (btw, if the clipboard is empty it throws a null pointer exception).

The setting of the new clipboard contents is random though. It can leave the contents unchanged, empty the clipboard or update correctly.

I'll try to have another play another day and see if I can work it out.
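
One possible explanation, which nothing in this thread confirms: on X11 desktops the clipboard data is served by the program that owns it, so if the JVM exits right after setContents the new contents can vanish unless a clipboard manager has already grabbed a copy. A sketch of a workaround under that assumption is to linger for a while after writing. The class name LingeringClipboardWriter and the 30-second figure are arbitrary choices for illustration, not anything taken from HtmlEsc.

```java
import java.awt.Toolkit;
import java.awt.datatransfer.Clipboard;
import java.awt.datatransfer.ClipboardOwner;
import java.awt.datatransfer.StringSelection;
import java.awt.datatransfer.Transferable;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

final class LingeringClipboardWriter implements ClipboardOwner {
    private final CountDownLatch lost = new CountDownLatch(1);

    // Called by AWT when something else takes over the clipboard.
    public void lostOwnership(Clipboard clipboard, Transferable contents) {
        lost.countDown();
    }

    // Waits up to the given time; returns true if ownership was lost in that time.
    boolean awaitLoss(long millis) {
        try {
            return lost.await(millis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        LingeringClipboardWriter owner = new LingeringClipboardWriter();
        Clipboard cb = Toolkit.getDefaultToolkit().getSystemClipboard();
        cb.setContents(new StringSelection("caf&eacute;"), owner);
        // Stay alive so the text can still be pasted; return early if another
        // program (or a clipboard manager) takes ownership.
        owner.awaitLoss(30000);
    }
}
```

The cost is that the program no longer exits immediately, so on systems where the original behaves correctly this would only add a delay.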



Subject: RE: Tech: HtmlEsc.java: Convert special chars
From: Artful Codger
Date: 03 Jul 11 - 05:25 PM

2011 July 3: New JAR version.

JAR updated to remove conversion of ASCII double quotes and single quotes (the "apos" character reference is non-standard).
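
If you ever do need to escape ASCII quotes yourself (say, inside an HTML attribute value), a numeric reference avoids relying on "apos". The sketch below is not something HtmlEsc does; QuoteEscapes and escapeQuotes are hypothetical names for the example.

```java
final class QuoteEscapes {
    // Escapes ASCII quotes without using the non-standard "apos" reference.
    static String escapeQuotes(String text) {
        return text.replace("\"", "&quot;").replace("'", "&#39;");
    }

    public static void main(String[] args) {
        System.out.println(escapeQuotes("say \"hi\", it's"));  // say &quot;hi&quot;, it&#39;s
    }
}
```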

The only change is in the first entry line of the charMapData definition, which should now read (after indentation):
"&", "&amp;", "<", "&lt;", ">", "&gt;",



All original material is copyright © 2022 by the Mudcat Café Music Foundation. All photos, music, images, etc. are copyright © by their rightful owners. Every effort is taken to attribute appropriate copyright to images, content, music, etc. We are not a copyright resource.