The Mudcat Café TM
Thread #85210   Message #1579202
Posted By: JohnInKansas
09-Oct-05 - 04:11 AM
Thread Name: Tech: Copyright Sign? Windows
Subject: RE: Tech: Copyright Sign? Windows
Arne L offers a good point. It's not enough to know how to produce a symbol; you've also got to know when it's appropriate to do so. We've had many threads relating to copyright, and some of them even have apparently useful information posted.

A summary:

The copyright symbol can be typed (c) in Word or in many other Windows compliant programs, and will be "autocorrected" to ©.
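In most office programs you can also use the Alt-0169 method (hold Alt and type 0169 on the numeric keypad) to insert ANSI character number 169, which many Windows compliant programs will display as ©. The leading zero matters: without it, Windows looks the number up in the older DOS (OEM) character set and you get a different character.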
You can paste the © character from Word into HTML and it should work just fine.
In HTML, you can code &#169; or &copy; to display ©.

The "registered" mark can be typed (r) in Word or in many other Windows compliant programs and will be "autocorrected" to ®.
In most office programs you can also use the Alt-0174 method (with the leading zero, typed on the numeric keypad) to insert ANSI character number 174, which many Windows compliant programs will display as ®.
You can paste the ® character from Word into HTML and it should work just fine.
In HTML, you can code &#174; or &reg; to display ®.

The "trademark" symbol can be typed (tm) in Word or in many other Windows compliant programs, and will be "autocorrected" to ™.
In most office programs you can also use the Alt-0153 method (with the leading zero, typed on the numeric keypad) to insert ANSI character number 153, which many Windows compliant programs will display as ™.
You can paste the ™ character from Word into HTML and it should work just fine.
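In HTML, you can code &#153; or &trade; to display ™. Since 153 falls in the range the HTML standard does not officially recognize (see below), &trade; or the Unicode number &#8482; is the safer choice.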

Anything noted as "Word will … " or "Office programs will … " can be turned on or off, so your results may reflect settings you've changed in your setup. The above are usual results with default settings.
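
For anyone who wants to check the HTML codes in the summary above, here is a minimal sketch in Python (my choice of language for illustration only; nothing above depends on it). It uses the standard library's html.unescape, which decodes numeric and named codes roughly the way a current browser would. The names SUMMARY_CODES and decoded are made up for the example, and &#8482; is the Unicode number for the trademark sign, included for comparison.

```python
import html

# Codes from the summary, grouped by the symbol each one should produce.
SUMMARY_CODES = {
    "©": ["&#169;", "&copy;"],
    "®": ["&#174;", "&reg;"],
    "™": ["&#153;", "&trade;", "&#8482;"],
}

# Decode every code and keep the results next to the expected symbol.
decoded = {
    symbol: [html.unescape(code) for code in codes]
    for symbol, codes in SUMMARY_CODES.items()
}

for symbol, results in decoded.items():
    print(symbol, results)
```

Note that html.unescape follows the HTML5 rule of treating &#153; as the Windows character, which is why it comes out as ™ here. Older or stricter software is not obliged to do the same.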

ASCII characters numbered 0 through 31 are generally reserved for "control characters" that have no consistent text representation, and character 127 (delete) is a control character as well. ASCII characters go only as far as character 127. The "extended ASCII," or ANSI character set extends this to character number 255, but the HTML standard does not officially recognize characters in the range from 128 through 159 (which takes in the 130 through 159 block where Windows puts most of its extra symbols), so those characters may not be consistently rendered.

For characters 32 through 126, and 160 through 255, in HTML you should be able to safely use the &#xxx; coding, where the xxx is the ASCII/ANSI character number. (Truly dedicated propellerheads among us may wish to note that the format &#xhh; may also be used, where the hh is the hexadecimal representation of the same character number.)
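
As a small illustration of the two numeric forms, here is a sketch in Python. The function names decimal_ref and hex_ref are hypothetical, invented for this example; ord() is the built-in that returns a character's number.

```python
def decimal_ref(ch: str) -> str:
    """Build the decimal code, e.g. '©' becomes '&#169;'."""
    return f"&#{ord(ch)};"


def hex_ref(ch: str) -> str:
    """Build the hexadecimal code, e.g. '©' becomes '&#xA9;'."""
    return f"&#x{ord(ch):X};"


for ch in "©®ÿ":
    print(ch, decimal_ref(ch), hex_ref(ch))
```

Both forms name the same character number; the hex form is just the same number written in base 16.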

You may get by with using some or all of the character numbers 130 through 159, but the standard does not assure consistent rendering of these characters.
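
The reason this range is unreliable can be shown with a short Python sketch: the same character number means different things depending on which character set the software assumes. Here "cp1252" is Python's name for the Windows ANSI set and "latin-1" is its name for ISO-8859-1; the variable names are made up for the example.

```python
import unicodedata

raw = bytes([153])  # character number 153, as a single byte

as_windows = raw.decode("cp1252")   # Windows ANSI reading: the trademark sign
as_latin1 = raw.decode("latin-1")   # ISO-8859-1 reading: a control code

print(repr(as_windows))                  # the printable ™ character
print(repr(as_latin1))                   # an unprintable character
print(unicodedata.category(as_latin1))   # 'Cc' is the Unicode class for control codes
```

A browser that takes the Windows reading shows ™; one that takes the ISO reading has nothing printable to show.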

You may sometimes get by with using higher numbered characters from the UNICODE character charts, but given the uncertainty about how various browsers will render them their use should be minimized. Theoretically all IE 4.0 and later browsers, and all Netscape 4.0 and later are supposed to be able to render all UNICODE characters if a suitable font is installed. The use of full set UNICODE fonts is not widespread at present, so far as I know.

A few HTML characters in the ranges described above have "character entity names" that may be easier to remember. The officially adopted ones should be just as readily rendered as what you get with the character number coding, so by all means use them if you can remember them.

The only named entities for characters that can normally be typed on standard keyboards (for Western European or US keyboards) are the:

doublequote ("), coded &#034; or &quot;
ampersand (&), coded &#038; or &amp;
lessthan (<), coded &#060; or &lt;
greaterthan (>), coded &#062; or &gt;

By coincidence(?) these are the four characters that have "logical significance" in HTML, and it is recommended that they always be coded. In mudcat posts, an & that's followed by a space usually will be recognized as the ampersand character, rather than as the start of a code sequence, and double quotes don't seem to cause a problem when they appear in plain text; but you should be aware of the "reserved nature" of these four characters. If in doubt, code them when posting to HTML.
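
If you generate HTML from a program rather than typing it, most languages have a helper that codes these four characters for you. A minimal sketch in Python, using the standard library's html.escape (the sample text and variable names are invented for the example):

```python
import html

sample = 'Tom & Jerry said "2 < 3 > 1"'

# quote=True also codes the double quote (and the single quote).
escaped = html.escape(sample, quote=True)

print(escaped)
print(html.unescape(escaped) == sample)  # decoding gets the original text back
```

html.escape emits the named forms (&amp;, &lt;, &gt;, &quot;), which are equivalent to the numeric ones listed above.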

Characters from number 160 (nonbreaking space &nbsp;) through 255 (small y umlaut, &yuml;) all have named entity codes, and include most of the characters with diacritical markings that are commonly used in the ISO-Latin character set. The HTML spec actually gives names to about 256 "named entity" characters, with the highest numbered one I've noted at about 9830 (coded &#9830; = ♦ - a diamond if your browser renders it).
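
Python ships a copy of the HTML 4 named entity table, so the names mentioned here can be looked up rather than memorized. In this sketch, entity_for is a hypothetical helper written for the example; codepoint2name is the real table in the standard library's html.entities module.

```python
from html.entities import codepoint2name


def entity_for(ch: str) -> str:
    """Return the named code if HTML 4 defines one, else the numeric code."""
    code = ord(ch)
    name = codepoint2name.get(code)
    return f"&{name};" if name else f"&#{code};"


# Nonbreaking space, small y umlaut, the diamond, and a Cyrillic letter
# that has no HTML 4 name and so falls back to its number.
for ch in ["\u00a0", "ÿ", "♦", "Ж"]:
    print(repr(ch), entity_for(ch))
```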

It may be of minor interest that these characters (160 thru 255, and beyond?), along with characters from languages such as Russian and Japanese, are called "international characters" by Microsoft, and are always embedded in HTML pages by either their named character code or by their UNICODE numerical code. (Generally this encoding applies to any character not in the char set you use when you create the page.) This means that even if you can't see a character, it's in the document and should be visible if you load a "language" for which its glyph appears in the character page(s) associated with that language.
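
The same idea, coding any character that is not in the page's character set as its UNICODE number, can be sketched in Python. This is an illustration of the principle, not a description of how any Microsoft program does it; "xmlcharrefreplace" is a standard Python error handler, and the sample text is invented.

```python
import html

text = "Café ™"

# Anything outside plain ASCII is replaced by its decimal &#NNN; code.
encoded = text.encode("ascii", errors="xmlcharrefreplace").decode("ascii")

print(encoded)
print(html.unescape(encoded) == text)  # a browser-style decode restores the text
```

The result contains only plain ASCII characters, so it survives being saved or pasted in any character set.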

Clinkers:

1. Microsoft warns that inserting symbols using the "Insert|Symbol" method (esp in Word) may give variable results when text is pasted into HTML (as in 'cat posts), unless you are very careful about inserting them in the font that contains them as defined glyphs. My own recommendation would be that you use the CharMap, select the font that contains the glyph/symbol you want, and copy and paste into your document. Ignore the existence of the "Insert|Symbol" utility entirely, as it can cause serious problems in many places in addition to HTML posts. You can also safely use the Alt-NumPad method to insert a character number, and then format the individual character to a font where the correct glyph appears. With most fonts you're likely to use, you can paste such symbol characters into an HTML post and most readers' browsers should render them correctly.

2. Office programs automate a lot of quick-key routines for getting "special characters" but Word is the "Master Program" where you set up which of these routines you want to use. The macros that implement these are contained only in Word. All other Office programs, and some other Windows compliant programs, "inherit" autocorrect functions from Word. If for some reason you don't have Word installed, or you delete/uninstall it when you elect to use another word processor, it is likely that none of these functions will work in any Office programs or generally in other compliant programs. There simply is no way to tell Windows that the functions should be "turned on" for other programs except through settings in Word, and the required macros don't exist if Word is removed.

3. Several standard/default Word autocorrect substitutions change things to "noncompliant characters." Curly quotes (single and double), n-dashes, m-dashes, and the ellipsis character all appear in the "nondefined" range of characters between 130 and 159, and your results may be unpredictable if you paste text containing these characters into an HTML post. Curly quotes likely cause the most difficulty, since the plain double-quote is a "reserved character" in HTML, used to mark the beginning and end of quoted strings such as attribute values. HTML browsers may not recognize the curly ones, or – disaster at work – may recognize one end of the string and not the other if curlies are used.
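
One way around clinker 3, if you would rather clean up text after the fact than turn autocorrect off, is to swap the Word characters back to plain keyboard ones before posting. A minimal sketch in Python; the table WORD_TO_PLAIN and the function plain_text are hypothetical names for this example, and the choice of replacements (two hyphens for an m-dash, three periods for the ellipsis) is just one reasonable convention.

```python
# Map Word's autocorrected characters back to plain keyboard characters.
WORD_TO_PLAIN = str.maketrans({
    "\u2018": "'",    # left single curly quote
    "\u2019": "'",    # right single curly quote / apostrophe
    "\u201c": '"',    # left double curly quote
    "\u201d": '"',    # right double curly quote
    "\u2013": "-",    # n-dash
    "\u2014": "--",   # m-dash
    "\u2026": "...",  # ellipsis character
})


def plain_text(s: str) -> str:
    """Return s with the Word substitutions undone."""
    return s.translate(WORD_TO_PLAIN)


print(plain_text("\u201cHello\u201d \u2013 it\u2019s late\u2026"))
```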

There may be a quiz at our next meeting. (or not.)

John