The Mudcat Café TM
Thread #112265   Message #2373051
Posted By: Artful Codger
24-Jun-08 - 04:08 AM
Thread Name: Tech: htmlesc.py: Mac script to escape text
Subject: Tech: htmlesc.py: Mac script to escape text
Below is a Python script for Macs which converts text into a format suitable for pasting into messages here.

For a cross-platform Java version, see this thread: HtmlEsc.java
Why would you need this? Because:
  1. it encodes all the accented characters in your source text, so they show up properly when posted.
  2. it encodes characters in non-Roman character sets like Cyrillic and Arabic. If you can create or copy the text, the script can encode it.
  3. it encodes the non-ASCII characters that word processors sprinkle into your text but which Mudcat and other sites don't properly recognize except in escaped form.
  4. it operates directly on the text on your clipboard/pasteboard, so you can massage the text with only one extra step, right before you paste to the web site.

For the complete scoop on why you need a tool like this, see the guide "Entering special characters."
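
To make "encoding" concrete, here is a minimal sketch of the kind of conversion described above. It is not the script itself: the function name html_escape is made up for illustration, and it assumes that every non-ASCII character becomes a decimal numeric character reference and that &, < and > are escaped as well.

```python
def html_escape(text):
    """Escape markup characters and every non-ASCII character (a sketch)."""
    out = []
    for ch in text:
        if ch == "&":
            out.append("&amp;")
        elif ch == "<":
            out.append("&lt;")
        elif ch == ">":
            out.append("&gt;")
        elif ord(ch) > 127:
            # Decimal numeric character reference built from the code point.
            out.append("&#%d;" % ord(ch))
        else:
            out.append(ch)
    return "".join(out)


# An accented letter, a tag, and word-processor "curly" quotes.
print(html_escape("caf\u00e9 <b> \u201cquote\u201d"))
# prints: caf&#233; &lt;b&gt; &#8220;quote&#8221;
```

Because the result is pure ASCII, it survives being pasted into a form no matter what character encoding the site assumes.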

The bad news:
  1. It only works on Macs (because clipboard interfaces are platform-specific and most Windows boxes don't have Python installed), and probably only on Mac OS X 10.4 or later. It requires the PyObjC bridge, which I believe is standard issue on Macs, but which might only be available if you've installed Developer Tools.
  2. There may be cases in which the script won't be able to access the clipboard text, because it wasn't made available in the standard format the script relies upon.
  3. The script wipes out all word processor formatting and treats HTML directives as literal text to be escaped.
  4. If you aren't used to entering UNIX commands in a Terminal window, you'll need to recruit a techie type (like your 10-year-old) to get things set up. I've kept my installation instructions terse because a techie will know instantly what I mean.
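
For a feel of the clipboard round trip, here is a rough sketch that deliberately swaps in a different technique: instead of PyObjC it shells out to the pbpaste and pbcopy command-line tools that come with Mac OS X. The function names are hypothetical, the UTF-8 decoding is an assumption about your locale settings, and the sketch assumes pbpaste prints nothing when the clipboard holds no plain text (the situation in item 2).

```python
import subprocess


def read_clipboard(cmd=("pbpaste",)):
    """Return the clipboard's plain text, or "" if the tool printed none."""
    result = subprocess.run(cmd, check=True, stdout=subprocess.PIPE)
    # Assumption: the tool emits UTF-8 (this depends on the locale settings).
    return result.stdout.decode("utf-8", errors="replace")


def write_clipboard(text, cmd=("pbcopy",)):
    """Replace the clipboard contents with the given text."""
    subprocess.run(cmd, check=True, input=text.encode("utf-8"))


def convert_clipboard(convert, read_cmd=("pbpaste",), write_cmd=("pbcopy",)):
    """Run `convert` over the clipboard text; return False if there was none."""
    text = read_clipboard(read_cmd)
    if not text:
        return False  # nothing usable on the clipboard
    write_clipboard(convert(text), write_cmd)
    return True


# On a Mac, with whatever escaping function you use:
#     convert_clipboard(my_escape_function)
```

The commands are parameters only so the functions can be tried on a machine without pbpaste and pbcopy. Note that whatever goes back onto the clipboard is plain text, which is why the word processor formatting in item 3 can't survive.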

Here's the one-time setup:

  1. Start a Terminal window.
  2. Determine a local folder in your search path in which to create the script. Change to that folder.
         echo $PATH
         cd <folder>
  3. Copy and paste the script to a plain text file (NOT a word processor file) and name it "htmlesc.py". Actually, you can give it any name you want, as long as it ends in a .py extension. If your editor tacks on a .txt extension, rename the file after saving it.
  4. Give yourself execute privileges:
         chmod u+x htmlesc.py
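
If you want to double-check the setup, the sketch below asks the two questions the shell will ask: is the script on the search path with execute permission, and does it start with a "#!" interpreter line? The helper names are made up, shutil.which needs Python 3.3 or later, and the "#!" requirement is an assumption about how the script gets launched when you type its bare name.

```python
import shutil


def find_on_path(name="htmlesc.py", search_path=None):
    """Return the full path the shell would run, or None if nothing qualifies.

    shutil.which only reports files that are executable, so a script that
    was saved but never chmod'ed comes back as None.
    """
    return shutil.which(name, path=search_path)


def starts_with_shebang(filename):
    """True if the file begins with "#!", which names its interpreter."""
    with open(filename, "rb") as f:
        return f.read(2) == b"#!"


location = find_on_path()
if location is None:
    print("htmlesc.py is not on the search path, or is not executable")
elif not starts_with_shebang(location):
    print("found %s, but it has no #! line" % location)
else:
    print("ready to run: %s" % location)
```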

And now you're ready to use it:

  1. Select the text that you want to encode and copy it to the clipboard.
  2. In a Terminal window, type:
         htmlesc.py
    Note: If you set things up right, it doesn't matter what directory you're in.
  3. Paste your text where you want it.

If a file with the same name as your script but a .pyc extension ever shows up, that's cached p-code. The Python interpreter normally writes it only when a file is imported as a module, not when it's run directly as above. It's harmless either way; if you delete it, the interpreter will just recreate it when it needs to.

I may add an option to preserve HTML tags, but you'd have to make sure any other angle brackets in your text were escaped.
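
Here is one way such an option might work, strictly as a sketch under stated assumptions and not a planned implementation. It treats anything shaped like "<", an optional "/", a letter, and then everything up to the next ">" as a tag and leaves it alone; everything else is escaped. The names TAG, escape_chunk and escape_keeping_tags are invented for the example.

```python
import re

# Assumption: a tag is "<" or "</", a letter, then anything up to the next ">".
TAG = re.compile(r"(</?[A-Za-z][^<>]*>)")


def escape_chunk(text):
    """Escape &, <, > and all non-ASCII characters in non-tag text."""
    text = text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")
    return "".join(ch if ord(ch) < 128 else "&#%d;" % ord(ch) for ch in text)


def escape_keeping_tags(text):
    """Escape everything except the spans that look like HTML tags."""
    # Splitting on a capturing group puts the tags at the odd indexes.
    parts = TAG.split(text)
    return "".join(p if i % 2 else escape_chunk(p) for i, p in enumerate(parts))


print(escape_keeping_tags("<b>caf\u00e9</b> 1 < 2"))
# prints: <b>caf&#233;</b> 1 &lt; 2

print(escape_keeping_tags("a<b and c>d"))
# prints: a<b and c>d
```

The second call shows the catch: text that merely resembles a tag passes through unescaped, so stray angle brackets would still have to be escaped by hand. Non-ASCII characters inside a tag (in an attribute value, say) are also left untouched by this sketch.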

My script follows in the next message.