mudcat.org: HTML Practice Thread



HTML Practice Thread

GUEST	30 Jan 12 - 04:56 PM
GUEST	30 Jan 12 - 04:53 PM
Jeri	30 Jan 12 - 09:37 AM
GeoffLawes	30 Jan 12 - 08:10 AM
GeoffLawes	30 Jan 12 - 08:06 AM
GeoffLawes	30 Jan 12 - 08:04 AM
JohnInKansas	29 Jan 12 - 09:44 PM
JohnInKansas	29 Jan 12 - 09:39 PM
GeoffLawes	29 Jan 12 - 08:05 PM
GeoffLawes	29 Jan 12 - 08:03 PM
GeoffLawes	29 Jan 12 - 06:39 PM
GeoffLawes	29 Jan 12 - 06:36 PM
GeoffLawes	29 Jan 12 - 06:26 PM
GeoffLawes	29 Jan 12 - 06:24 PM
Q (Frank Staplin)	29 Jan 12 - 03:55 PM
Q (Frank Staplin)	29 Jan 12 - 03:52 PM
JohnInKansas	22 Dec 10 - 06:26 AM
Joe Offer	22 Dec 10 - 01:37 AM
Artful Codger	21 Dec 10 - 11:56 PM
JohnInKansas	21 Dec 10 - 03:01 PM
Artful Codger	20 Dec 10 - 11:19 AM
Artful Codger	20 Dec 10 - 10:55 AM
JohnInKansas	20 Dec 10 - 03:46 AM
Artful Codger	19 Dec 10 - 06:25 PM
JohnInKansas	19 Dec 10 - 05:29 AM
MudGuard	19 Dec 10 - 02:47 AM
Joe Offer	19 Dec 10 - 02:24 AM
GUEST,.gargoyle	18 Dec 10 - 09:19 PM
Artful Codger	18 Dec 10 - 07:12 PM
JohnInKansas	18 Dec 10 - 02:47 AM
JohnInKansas	18 Dec 10 - 02:24 AM
Joe Offer	17 Dec 10 - 07:26 PM
Joe Offer	17 Dec 10 - 07:25 PM
wysiwyg	15 Apr 10 - 05:49 AM
Mr Red	20 Mar 10 - 06:01 AM
Mr Red	20 Mar 10 - 05:58 AM
Mr Red	17 Mar 10 - 09:31 AM
Mr Happy	16 Mar 10 - 09:45 AM
GUEST	15 Mar 10 - 02:06 PM
CapriUni	15 Mar 10 - 12:56 PM
topical tom	11 Mar 08 - 03:16 PM
topical tom	09 Mar 08 - 09:45 PM
Melbert	20 Oct 99 - 04:03 PM
Okiemockbird	19 Oct 99 - 04:41 PM
Joe Offer	19 Oct 99 - 04:32 PM
Okiemockbird	18 Oct 99 - 11:23 AM
Okiemockbird	14 Oct 99 - 01:47 PM
T in Oklahoma (Okiemockbird)	14 Oct 99 - 01:07 PM
bill\sables	14 Oct 99 - 07:13 AM
bill\sables	14 Oct 99 - 07:11 AM


Subject: RE: HTML Practice Thread
From: GUEST
Date: 30 Jan 12 - 04:56 PM

SONG TITLE, THE

By Writer's Name

Post - Top - Home - Printer Friendly - Translate

Subject: RE: HTML Practice Thread
From: GUEST
Date: 30 Jan 12 - 04:53 PM

*** Songtitle,The.........(36spaces).............. SONGWRITER(S) …27 (spaces)… Performer(s) #


Subject: RE: HTML Practice Thread
From: Jeri
Date: 30 Jan 12 - 09:37 AM

There's no space between the hash (#) and the target. (If that's what you're talking about.)


Subject: RE: HTML Practice Thread
From: GeoffLawes
Date: 30 Jan 12 - 08:10 AM

Result! Now I have just got to spot which item of code I missed out on the previous attempts.


Subject: RE: HTML Practice Thread
From: GeoffLawes
Date: 30 Jan 12 - 08:06 AM

THE ABRAHAM LINCOLN BRIGADE


Subject: RE: HTML Practice Thread
From: GeoffLawes
Date: 30 Jan 12 - 08:04 AM

Thanks John, but it's a link problem I am testing out rather than columns.
Here goes again
>Abraham Lincoln Brigade, The .......... JOHN McCUTCHEON ........... John McCutcheon #


Subject: RE: HTML Practice Thread
From: JohnInKansas
Date: 29 Jan 12 - 09:44 PM

I forgot to code the code for the nonbreaking space, so the code I typed for the code printed a nonbreaking space.

Of course the code should be &nbsp; which has to be coded &amp;nbsp; to display &nbsp;
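For anyone who wants to check the double-escaping without posting, here is a minimal sketch in Python (my choice of language, nothing to do with how Mudcat itself works). It only assumes the standard `html` module; the variable names are made up for illustration.

```python
import html

# What you would type in the reply box so that readers see the
# literal text "&nbsp;" instead of an invisible non-breaking space.
typed = html.escape("&nbsp;")
print(typed)   # &amp;nbsp;

# A browser undoes one level of escaping when it renders the post.
shown = html.unescape(typed)
print(shown)   # &nbsp;
```

The same trick applies to any entity you want to show rather than use: escape the leading ampersand once more.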

John


Subject: RE: HTML Practice Thread
From: JohnInKansas
Date: 29 Jan 12 - 09:39 PM

GeoffLawes -

It appears that you might be trying to "line things up," perhaps in columns. This can be tricky in html for a couple of reasons.

Most posts are done with common fonts, in which the width of individual characters varies. It's easier if you use a monospaced font like Courier.

One reason is that html ignores multiple "spaces" and a string of several spaces will only display one space. You can force all of the spaces to be displayed by using a non-breaking space, but you need to code it as &nbsp;. Of course when you insert the code in your layout the code takes more than one space and makes it difficult to get the same alignments as in the html when it displays on the web.

An alternative is to use a <pre> tag at the beginning of the stuff you want to line up, and of course a closing </pre> tag at the end of it all. The <pre> tag means "preformatted" and forces the text to display in a monospaced font and nearly always to display all the spaces.


  <pre>
  displays          text     like
  this in           sort     of columns
  among             other    things</pre>
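If typing the non-breaking spaces by hand gets tedious, the two approaches above could be sketched roughly like this in Python. The helper names `keep_spaces` and `as_pre` are hypothetical, invented for this illustration, and the sketch does not escape any other HTML characters in the text.

```python
import re

def keep_spaces(line: str) -> str:
    """Turn each run of 2+ spaces into one plain space plus &nbsp; fillers."""
    def pad(match):
        run = len(match.group())
        return " " + "&nbsp;" * (run - 1)
    return re.sub(r" {2,}", pad, line)

def as_pre(lines) -> str:
    """Wrap lines in <pre> so the browser keeps the spacing as typed."""
    return "<pre>\n" + "\n".join(lines) + "</pre>"

print(keep_spaces("this in   sort"))   # this in &nbsp;&nbsp;sort
print(as_pre(["displays   text", "this in    sort"]))
```

The `<pre>` version is usually the easier one to line up, since what you type is what displays.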

John


Subject: RE: HTML Practice Thread
From: GeoffLawes
Date: 29 Jan 12 - 08:05 PM

SONG TITLE, THE

By Writer's Name


Subject: RE: HTML Practice Thread
From: GeoffLawes
Date: 29 Jan 12 - 08:03 PM

*** Songtitle,The.........(36spaces).............. SONGWRITER(S) …27 (spaces)… Performer(s) #


Subject: RE: HTML Practice Thread
From: GeoffLawes
Date: 29 Jan 12 - 06:39 PM

*** Songtitle,The.........(36spaces).............. SONGWRITER(S) …27 (spaces)… Performer(s) #


Subject: RE: HTML Practice Thread
From: GeoffLawes
Date: 29 Jan 12 - 06:36 PM

SONG TITLE, THE

By Writer's Name


Subject: RE: HTML Practice Thread
From: GeoffLawes
Date: 29 Jan 12 - 06:26 PM

Songtitle,The

By Writer's name


Subject: RE: HTML Practice Thread
From: GeoffLawes
Date: 29 Jan 12 - 06:24 PM

*** Songtitle,The.........(36spaces).............. SONGWRITER(S) …27 (spaces)… Performer(s) #


Subject: RE: HTML Practice Thread
From: Q (Frank Staplin)
Date: 29 Jan 12 - 03:55 PM

Èr'


Subject: RE: HTML Practice Thread
From: Q (Frank Staplin)
Date: 29 Jan 12 - 03:52 PM

Èè
É


Subject: RE: HTML Practice Thread
From: JohnInKansas
Date: 22 Dec 10 - 06:26 AM

Joe -

The disappearing posts feature was the reason I started this last little bit here. Since I really had little notion of what was happening, I sort of hoped my babble would disappear, as soon as someone with an explanation came along.

I find it easy enough to save mudcat threads for the most part, so I've archived (for my own use) the thread.

I generally save as "archived page" (.mht) since it avoids the breakage that results if you move or rename a simple save (.html) - in case that might be handy for anyone else to explore.

It should also be noted that if you save an edited thread that gets edited, the next time you save that same edited thread it overwrites the previous save, since the thread name doesn't change. If you want to save and keep a particular version, you need to edit the name of the saved file you don't want overwritten.

(It's just like at the US Supreme Court decisions site, where EVERYTHING you save from there has the same filename unless you change each one as you tuck it away. (At least theirs are nearly always PDFs.) It also happens at a lot of web pages that offer a "print version." The "printable" page is nearly always named the same - the name of the "make printable" machine rather than the article name - if you pull up the printable and then just save a copy of it.)

Also, if you go into a thread here "by pages" and save "page 1" and then "page 2" you'll find you only have the last one saved when you get done - unless you edit the filename for each one so that each group of 50 posts has a different filename on your machine.

And thanks to all who commented, and especially to Artful Codger (21 Dec 10 - 11:56 PM). I'm not sure what I can do with it, but that last one was a lot better than "explanations" I've been able to find elsewhere.

John


Subject: RE: HTML Practice Thread
From: Joe Offer
Date: 22 Dec 10 - 01:37 AM

Some of this stuff you guys are posting looks very interesting. Remember that this is a practice thread, and messages are subject to deletion. If you come up with anything worth saving, please put a copy in the Mudcat HTML Guide. I may move some of your messages there myself.

-Joe-


Subject: RE: HTML Practice Thread
From: Artful Codger
Date: 21 Dec 10 - 11:56 PM

[Pardon if this is a duplicate post, but my previous posting of it appears to have disappeared in the Mudcat ether.]

If you view the page source for Amos's message, you'll see that the characters in question are unescaped--improper HTML. If you view the page using the Mac-Roman encoding, his message shows properly. (If your browser lacks the Mac-Roman encoding as an option, it is deficient. Use a real browser instead--one that recognizes a world beyond Windows.)

Similarly, if you source-view posts from Windows users, you'll often see literal Windows smart quotes rather than the escape sequences which are the only proper HTML for non-ASCII characters (when the default page encoding is used, as on Mudcat). If you view these posts with your browser's encoding set to Mac-Roman, which is a feasible setting for Mac users, these quotes will appear munged. Select a number of other feasible encodings for different locales, and the quotes will also appear munged in some.

So Amos has only done (from a Mac) what Windows users commonly do here. To him the pasted text appears and previews properly--he can't tell just from looking that there will be any problem for other users. No more than can Windows users, who see their own miscoded messages properly, in both previews and after posting. The problem is far more extensive than you've noticed.

I performed another little test. First, I created a WP document containing mixed text (as in my test above) followed by the string "hello", which gave me smart quotes. This I copied to a plain-text editor (TextEdit) and saved as UTF-8. Then I copied the hello string with quotes to another plain-text editor window and saved as Mac-Roman. (I didn't copy the other characters, as some would not be expressible in the Mac-Roman encoding.) Then I pasted from these three sources into a Mudcat entry box and previewed. Here is what I found.

Pasting from the Mac-Roman plain-text file when my browser's encoding setting for the page was Mac-Roman (as Amos probably has set by default) produced Mac-Roman smart quotes, which Mudcat failed to translate to HTML smart quote escapes. When I switched the view encoding to Win-1252 (as most Windows users would use), the quotes showed up as accented O's. So I've been able to easily duplicate what's happening in Amos's messages. NOTE: The quotes in the entry box itself continued to show as quotes; only the preview was affected.
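The accented-O effect can be reproduced away from the browser. Here is a small sketch in Python, assuming its built-in `mac_roman` and `cp1252` (Windows-1252) codecs are fair stand-ins for the browser's Mac-Roman and Windows settings; the function name `view_as` is made up for the illustration.

```python
def view_as(text: str, posted_encoding: str, viewer_encoding: str) -> str:
    """Encode text the way the poster's system did, then read the raw
    bytes back the way a viewer with a different setting would."""
    raw = text.encode(posted_encoding)
    return raw.decode(viewer_encoding)

# "hello" wrapped in curly double quotes (U+201C and U+201D).
mac_text = "\u201chello\u201d"

# Posted as Mac-Roman bytes, viewed as Windows-1252.
print(view_as(mac_text, "mac_roman", "cp1252"))   # ÒhelloÓ
```

The bytes never change in this round trip; only the table used to interpret them does, which is why the poster sees nothing wrong on his own screen.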

More unexpectedly, if I previewed again after changing the encoding, but not the text, I got unescaped Windows smart quotes.

Pasting the Mac-Roman plain-text directly to the browser when the browser's encoding setting was Win-1252 produced unescaped Windows smart quotes as well.

Pasting the mixed text directly from the word processor (iWorks Pages, which is Unicode-based), with browser encoding Win-1252 gave me only two properly encoded characters: the Czech and the Russian; the accented e and the smart quotes remained as literals with the quotes converted to Windows smart quotes. Pasting with browser encoding Mac-Roman yielded Mac-Roman smart quotes. I got the same results pasting from the UTF-8 plain-text file.

Conclusions:
Regardless of the source, the text is being converted to a native encoding according to the browser's encoding setting. It would appear that for any character not within the selected encoding, the (16-bit) Unicode value is returned; otherwise, the 8-bit code for the encoding is returned.

The problem is not because Amos is doing anything odd; it's because Mudcat is inadequately handling the input. If Mudcat is going to allow unescaped characters outside the 7-bit ASCII range, then it needs to query for the encoding information, convert everything to Unicode and then convert the non-ASCII characters properly to HTML escapes. Note that the retrieved text might contain sections with different encodings. It may also contain characters (like certain wingdings) with no Unicode equivalents; these should just be dropped or replaced by a question mark.
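As a rough sketch of that "convert to Unicode, then escape" step, here is how it might look in Python. This is only an illustration under the assumption that the source encoding is somehow known; it is not Mudcat's code, and the function name `to_html_escapes` is hypothetical.

```python
def to_html_escapes(raw: bytes, source_encoding: str) -> str:
    """Decode bytes using the declared encoding, then turn every
    non-ASCII character into a numeric HTML escape."""
    # Bytes with no mapping in the source encoding become U+FFFD ...
    text = raw.decode(source_encoding, errors="replace")
    # ... which is swapped for a plain question mark, as suggested above.
    text = text.replace("\ufffd", "?")
    # xmlcharrefreplace writes &#NNNN; for anything outside ASCII.
    return text.encode("ascii", errors="xmlcharrefreplace").decode("ascii")

# The same curly quotes, arriving in two different native encodings.
print(to_html_escapes(b"\xd2hello\xd3", "mac_roman"))   # &#8220;hello&#8221;
print(to_html_escapes(b"\x93hello\x94", "cp1252"))      # &#8220;hello&#8221;
```

Once the encoding is known, both inputs end up as the same pure-ASCII markup, which is the whole point of escaping.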

As it stands, Mudcat doesn't escape any characters within the 8-bit range, that is, within your native encoding (or the view encoding active for the page). Thus, even though the Unicode glyphs for smart quotes, apostrophes, long dashes, copyright symbols and such are outside the 8-bit range, other encodings map them (inconsistently) to values within the 8-bit range, so they remain unescaped by Mudcat. The same applies to the most common accented characters a European might use, and to the native language characters for Slavic, Greek and Hebrew users--all remain unescaped, and hence munged for viewers in different locales!

Why don't you see the problem from other Mac users more often? Due to the Windows bias on the net, most of us end up setting our default browser encoding to Win-1252, so the majority of improperly encoded web pages will view as they were intended. Consequently, when we post text containing unescaped smart quotes, even though they're Unicode or Mac quotes in our source, they're converted to unescaped Windows quotes. I stress that this is no more "correct", even if fewer users are disconcerted. Only escaped smart quotes and such are universally viewed correctly.

Web page input widgets must be able to indicate both the encodings used and the switches between them. And I know most programming/scripting environments provide conversion functions to translate natively-encoded text to Unicode and back. So I would expect that a solution is rather straightforward to implement. If Max would give me read access to the source code, perhaps I could whip up a fix, or at least describe more specifically the changes to try.


Subject: RE: HTML Practice Thread
From: JohnInKansas
Date: 21 Dec 10 - 03:01 PM

At the link given in the post that started it all:

BS: Lies about MSNBC 18 Dec 10 - 11:07 AM

I see broken characters for all the "curly quotes" (double and single) and apostrophes in the post by Amos. In other of his posts hyphens appear to come through okay, but the (rare) n-dash and the more common m-dash are also broken. Since nobody has argued about whether they see them as broken, I must assume that others also see them displayed that way - or weren't interested enough to look at what was being discussed.

I can go to the source page he copied from and paste it into Word. When I copy it back into a mudcat "Reply to Thread" box and preview it (or post it) it posts normally with no broken characters.

Comments by Grishka, at the link and in one following a couple of posts below, may be close to explaining the "problem" I've been playing with.

At the second link, Guest,Grishka suggests changing encoding. If I change the encoding for my browser to UTF-8 the broken characters change to char FFFD (which normally represents an undefined char) and I can no longer tell whether they're left double quotes or right double quotes. A couple of other encodings show slightly different chars for the broken ones; but I can't get rid of them by using any of the different encodings I've tried. Of course I haven't tried all the possible languages (128 of them IIRC), or even the 18 different flavors of English that IE and Office offer. I didn't see "Canuck" as an encoding.
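The FFFD result is what a UTF-8 decoder is expected to produce for a lone high byte. Here is a small Python sketch, assuming the posted bytes were Mac-Roman curly quotes (that part is a guess about the source, not something confirmed).

```python
# Curly-quoted "hello" as it would be stored in Mac-Roman bytes.
posted = "\u201chello\u201d".encode("mac_roman")
print(posted)   # b'\xd2hello\xd3'

# Read as UTF-8, each stray high byte is an invalid sequence and is
# replaced by U+FFFD, the replacement character.
as_utf8 = posted.decode("utf-8", errors="replace")
print([hex(ord(ch)) for ch in as_utf8 if ord(ch) > 0x7F])   # ['0xfffd', '0xfffd']
```

Both quotes collapse to the same replacement character, which matches not being able to tell left from right any more.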

Note that although it's generally believed that Amos uses a Mac, what he does should not be assumed to be representative of Mac users in general. The broken pastes may actually be done by his weasel dogs when he's not looking. His posts, however, are generally consistent whenever he pastes from outside sources, with the same (and some other) char breaks as in the sample.

I've archived comments here that are of interest to me. Since this is an edited thread I assume it will all go away when the rest of you are finished playing with the problem.

John


Subject: RE: Input conversion
From: Artful Codger
Date: 20 Dec 10 - 11:19 AM

Now, John contends that only a few users (like Amos) are entering text which will misdisplay. As a Mac user, with different native codepage mappings from Windows users, I know this to be false. I see many pages where folks are posting raw 8-bit characters instead of escaped characters, and the result is munged text to non-Windows users (or users in different locales). I don't know when the most recent auto-conversion changes were implemented, but within the last month or so I have seen postings entered in Cyrillic which were not converted to proper escapes. Presumably, if they had been received as Unicode, they would have been (like the fourth character in my test). And presumably they appeared correct to the poster, both in the input box before posting and in the thread after posting. Conclusion: what YOU see (as an American or English Windows user), even in preview, is not necessarily what others will see. And a quick peek at source views will bear out what I'm saying. If it ain't escaped and it ain't 7-bit ASCII, it ain't right.

That said, why would text arrive natively encoded, in this age where Unicode transfer is the preferred method? I've got some ideas, but I don't have time to present them now.


Subject: RE: HTML Practice Thread
From: Artful Codger
Date: 20 Dec 10 - 10:55 AM

I ran a simple conversion test, pasting mixed text from a source plain text file (Unicode by default, or I couldn't have mixed these particular characters). What you see below corresponds to what I saw; I've escaped characters to ensure this message will display properly.

Input:
e é č э

Output (per preview page):
e é č э

What do I conclude from this?

Since the text all appeared in the message box exactly as I pasted it from the source document, the text area widget must be accepting the text as Unicode--no single 8-bit code page contains all these characters; the only other alternative would have been for the text to contain embedded codepage change directives, which I seriously doubt.

The text widgets don't know they're specifically dealing with HTML or text to be converted to HTML, so the conversions which were performed (on the latter two characters) were done by Mudcat. This, by the way, is relatively new behavior; I've tried similar tests with Czech and Cyrillic before, and there was no automatic conversion.

Because the Czech and Cyrillic characters were converted automatically, Max must have realized that any character value greater than xFF must be a Unicode value, and hence safe to convert. Presumably, this includes all smart quotes, long dashes, Euro symbols, etc. whenever the text arrives in the message box as Unicode (but not as codepage-encoded text). I haven't tested what happens if you directly type in text, since my presumption is that all direct text entry will be Unicode by default.
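The behavior I'm inferring could be sketched like this in Python. It is a guess at the rule, not Mudcat's actual code; the function name `escape_above_ff` is invented, and the decimal form of the escapes is an assumption.

```python
def escape_above_ff(text: str) -> str:
    """Leave code points 0x00-0xFF alone; escape everything above."""
    return "".join(
        ch if ord(ch) <= 0xFF else f"&#{ord(ch)};"
        for ch in text
    )

# The four test characters: plain e, e-acute, c-caron, Cyrillic e.
print(escape_above_ff("e \u00e9 \u010d \u044d"))   # e é &#269; &#1101;
```

Note that the accented e (U+00E9) passes through untouched, which is exactly the hole described in the next paragraphs.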

The fact that Max has not employed a similar automatic conversion for characters in the 00-FF range suggests that he knows text is still arriving in the text area encoded according to locale-specific codepage mappings; it is not always being converted to Unicode by the user's paste operation. And because no information is transferred to indicate which encoding was used, it's unsafe to apply a default (like Unicode/ISO-Latin-1) which would munge text pasted as ISO-Latin-5, Win-1252 or KOI8-R. These would likely misdisplay to most users anyway, but at least one can get the right display by changing display settings. Once the characters are escaped, they will only display according to the Unicode characters for those escapes.

Note that, even when raw 8-bit characters are posted, they are likely converted to Unicode byte pairs, preserving the 8-bit value even if the corresponding Unicode mapping does not correspond to that character. It is up to the recipient of the entered text (here, Mudcat) to decide how to handle the values. On the other hand, for the text to be displayed "correctly" still to the user (prior to previewing) there must be some intelligence informing the browser what encoding was used. This is where I'm still confused, as the text box appears to behave according to both worlds simultaneously! Can it be that the input encoding is available from the text area widget, but Mudcat isn't asking for it?

Because the 80-FF range is not converted automatically, these "raw" characters constitute improper HTML, and will display according to locale and browser vagaries as previously indicated. It's the last big hole.

This doesn't get us much closer to solving Amos's problem, but it does clarify a few things about input handling.


Subject: RE: HTML Practice Thread
From: JohnInKansas
Date: 20 Dec 10 - 03:46 AM

david -

Your "explanation" appears to be quite correct, but it's not the explanation for the problem at hand.

When Amos copies from another website and pastes and posts it here, it looks like garbage to the rest of us. (Not a comment on content, just on the broken characters.)

If I go to the same website Amos copied from and copy the same thing he copied, and paste and post it here, it looks like the original site we both copied. This works even if I don't replace curly quotes with straight ones, although it's easy enough to do on my machine and I would if the post preview showed that I needed to.

We don't know whether Amos pastes the copied article into his wordprocessor and then copies from there to paste here, although it would be suspected that he does. IFF he pastes it into his Mac WP, his computer apparently translates the "real world curly quotes" into "Mac curly quotes" that have different code numbers than the rest of the world uses. That "translation" would be necessary in order for it to look right in his WP. But when he posts it here the "curlies" appear as junk characters.

The problem is NOT with mudcat or with the html interpreters at mudcat or on our individual machines, since I can code "anything that works" and it's okay in the post. I can also code "anything that works," preview it here, copy the preview pane - converted to display html - back into the input pane, (I'd preview again) and post, and it ALWAYS WORKS. I can put all sorts of strange characters in a Windows Word document, and for nearly all of them I can copy from Word and paste directly into the mudcat reply box, and IT WORKS without coding "extended characters" in nearly all cases. Only characters that my computer codes in non-standard ways will break, and thus far the webdings font is the only one I've found with significant numbers of "non-standard" char codes on Windows machines.

In other words, everybody except Amos can copy from any html that displays correctly in their browser and paste to mudcat and it will display correctly here. Only (mostly) Amos always f*cks up his pastes. (There actually are a couple of other people who suffer from the same curse occasionally, but for the most part sometimes the others get one right.)

The puzzle here is to guess what Amos could do to make his paste-ups more legible for the rest of us, since I suspect that every so often one of his pastes might be interesting to read. Then we'll have to decide whether to tell Amos, or just sit back and smirk at his warped chars.

Note that this isn't just a mudcat problem. Several years of professional wordprocessing and publishing layout/setup work have made it clear that there are differences between outputs from Mac and Win machines, and pinning down where the differences come from would make it a lot easier to handle them outside the real world at mudcat.

John


Subject: RE: HTML Practice Thread
From: Artful Codger
Date: 19 Dec 10 - 06:25 PM

The reason is simple: the typical Windows and (older) Mac code pages don't correspond fully to the Unicode set (values in the 00-FF range correspond to ISO-Latin-1, not Win-1252 or Mac-Roman). And the default character encoding for HTML is 7-bit ASCII only, so any characters outside the standard ASCII set, regardless of encoding, are technically illegal HTML; their representation is left to the vagaries of the browser, the user's locale and other inconsistent factors across the user base.
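To make the mismatch concrete, here is a short Python sketch that reads one and the same byte through three tables. It assumes Python's `latin-1`, `cp1252` and `mac_roman` codecs match the encodings under discussion; `code_points` is a made-up helper name.

```python
def code_points(byte_value: int) -> dict:
    """Return the Unicode code point one byte maps to in each encoding."""
    raw = bytes([byte_value])
    return {
        name: ord(raw.decode(name))
        for name in ("latin-1", "cp1252", "mac_roman")
    }

result = code_points(0x93)
for name, cp in result.items():
    print(f"{name:10s} U+{cp:04X}")
```

Only the Latin-1 reading keeps the byte value as the code point (U+0093); the Windows table reads the same byte as the left curly quote U+201C, and the Mac table gives something else again.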

Since Mudcat pages deal strictly with raw 8-bit characters, not 16-bit as for Unicode's UTF-16 sets, any text posted into Mudcat messages must be coerced by either the user's system or Mudcat itself into some 8-bit mapping. Unfortunately, the source system doesn't know what encoding Mudcat assumes or will apply once the text is submitted, and Mudcat doesn't know what encoding the source text is represented in--by the time it hits Mudcat, it's just a string of bytes (with, hopefully, ASCII as a subset of the source encoding).

Nowadays, when someone copies text onto their clipboard, it becomes available in multiple formats, and usually, Unicode is one of those options--any Unicode encoding can be converted on the fly to any other without confusion. And the problem would not exist if (1) all text pasted into the text area were converted to Unicode by the paste operation itself and (2) Mudcat automatically converted all text beyond 00FF to HTML escape sequences (7-bit ASCII chars only). But we know that neither of these requirements is being met. The characters aren't arriving in the entry box as Unicode because, if they were, the same encoding would always be used for the same source characters. (Is there an option to specify the input encoding expected for a text area widget?) And we know that Mudcat doesn't automatically convert (most) raw characters/Unicode codepoints to escaped HTML just from all the unescaped "foreign" characters visible in HTML source views. If specifying a Unicode input encoding would force incoming text to Unicode automatically, Max could force the HTML escaping unconditionally, without ambiguity, and the problem should never (or only rarely) arise again--Wingdings aside.

But for some reason--probably because the text entry box is configured (by default?) to accept only 8-bit character input--the browser paste operation has to bypass Unicode and default instead to some locale-specific 8-bit encoding, and there the major problem lies.


Subject: RE: HTML Practice Thread
From: JohnInKansas
Date: 19 Dec 10 - 05:29 AM

Practice/Test post for recreational purposes only.
Not ready for publication.

From a web page where I found what appears to be the article pasted by Amos at BS: Lies about MSNBC 18 Dec 10 - 11:07 AM, I find the broken characters that appear in Amos' post, copied and pasted into Windows Word where they display the same, as:

Ò has hex code 00D2 and is intended to be a left double quote.
In Windows Word a left double quote is coded as hex code 201C which displays as “

Ó has hex code 00D3 and is intended to be a right double quote.
In Windows Word a right double quote is coded as hex code 201D which displays as ”

Ô has hex code 00D4 and is intended to be a left single quote.
In Windows Word a left single quote is coded as hex code 2018 which displays as ‘

Õ has hex code 00D5 and is intended to be a right single quote.
In Windows Word a right single quote is coded as hex code 2019 which displays as ’

Ð has hex code 00D0 and is intended to be an n-dash.
In Windows Word an n-dash is coded as hex code 2013 which displays as –

Windows Word codes an m-dash as hex code 2014 which displays as —
If the process Amos used in the paste viewed continues the same "character order" it would be expected that his source would code an m-dash as hex code 00D1, which would display as Ñ, but no example was found in the text I found for comparison, although I believe the Ñ appears in other of his paste jobs.
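The pairs listed above line up with what one would get if Mac-Roman bytes were read as Windows-1252, so a repair can be sketched by running that misreading backwards. This Python snippet is only an illustration under that assumption; `repair` is a hypothetical name, and it would fail on text containing characters that Windows-1252 cannot encode.

```python
def repair(garbled: str) -> str:
    """Undo a Mac-Roman-read-as-Windows-1252 mix-up: recover the
    original bytes, then decode them with the Mac-Roman table."""
    return garbled.encode("cp1252").decode("mac_roman")

# The six broken characters: codes 00D2, 00D3, 00D4, 00D5, 00D0, 00D1.
broken = "\u00d2\u00d3\u00d4\u00d5\u00d0\u00d1"
fixed = repair(broken)
print([f"{ord(b):04X} -> {ord(f):04X}" for b, f in zip(broken, fixed)])
```

The output pairs each broken code with 201C, 201D, 2018, 2019, 2013 and 2014 in turn, including the predicted 00D1 for the m-dash.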

Unicode assigns a hex code to each typographical character using the range of numbers from 0000 through FFFF. The assignment is to a "typographical function" represented by the character name, and a "generic glyph" representing a common symbol that may be used to display the character is shown for reference. The glyph (picture) of the character is for reference only and is not a required part of the specification.

A "complete" Unicode font that includes all the printable characters runs to approximately 32MB per font, for the two known Windows fonts available. Loading one is NOT RECOMMENDED except where absolutely required for a specific purpose.

A font (or typeface for purists) may assign any "picture" to any hex code. Your computer loads character pages, usually two or three or up to nine or ten at a time, as necessary to make the characters in the font(s) you select available to a "document." It is normal to load additional char pages containing characters beyond the single font you select, since it is common to use "out of font characters" fairly frequently. If a code is entered that is not in one of the loaded char pages, the program should try to find and load a new char page for another font that contains a character assigned to that code number, and usually the glyph that results will resemble the illustrative generic glyph for the code number in Unicode, unless a char page that is loaded has assigned that code to a different glyph.

The codes assigned to the curly quotes and m and n-dashes in Windows Word correspond to the Unicode assignments for those characters. (But they still may not be recognized if they creep into html codings.)

The codes that Windows Word reads in the paste by Amos appear in a range assigned for non-printing control codes. Since these control codes are almost never used in desktop programs, this appearance suggests that the Mac has created font pages containing the glyphs at hex codes that would otherwise be "unused" in any normal Mac program. The glyphs displayed in Windows are "real" but it would take some searching to find each one at a "proper" Unicode hex position. (Lots of the glyphs look almost identical but have different uses.)

Since the page from which I copied the "original" that I believe Amos copied from displays normally in my Windows browser and pastes into a mudcat post (preview) that displays without corrupted characters, and I can copy his corrupted posts and they display on my machine as posted, it is reasonable to assume that the corruption happens on Amos' computer and is an artifact of his setup.

The Unicode specification permits assignment of arbitrary glyphs to unused codes, and in fact groups of code numbers are deliberately left unassigned for that purpose. Windows Word in US versions assigns the "euro" symbol to the decimal code 0128 (hex 0080) simply because it's unassigned in ASCII/ANSI and in Unicode, and permits the Alt-Numpad method of entry. The assigned Unicode euro is at hex 20AC €, and should be on most European keyboards. (not to be confused with the deprecated hex 20A0 ₠)
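The two euro codes can be compared directly. A small Python sketch, assuming its `cp1252` codec reflects the Windows code page that Alt-0128 draws on:

```python
euro = "\u20ac"   # the Unicode euro sign

# In the Windows-1252 table the euro sits at byte 0x80 (decimal 128).
print(euro.encode("cp1252"))   # b'\x80'

# For a web posting, the numeric escape uses the Unicode value instead.
escape = f"&#{ord(euro)};"
print(escape)                  # &#8364;
```

So typing Alt-0128 in Word and writing the escape for hex 20AC name the same character through two different numbering schemes.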

The Windows Character Map will show either a decimal (for ASCII/ANSI range) or hex code for each character when you hover over it. For common fonts, the lower numbered chars correspond to the Unicode character assignments, but "out of range" char codes shown may not be the Unicode number for the glyph displayed/printed. A few fonts (as discovered for Webdings recently) have no codes that correspond to Unicode characters that even remotely resemble the Webdings chars, so you cannot use the code numbers alone to display those glyphs in a web posting.

The use of a code that's technically assigned to a Unicode typographical character or control function because of "unlikely use" is a bit shady, but may meet some purpose appropriate to Macs, and is in fact also done in some Windows fonts. The codes that Windows Word reads from the pasting show glyphs that may be similarly mapped in a Windows char page when displayed in Windows browsers, but may or may not display the intended characters or the ones displayed in Windows in a Mac browser.

It's also possible that Amos has font/char pages loaded that are not "standard Mac" that might account for him seeing something the rest of us don't. In that case we can only guess whether what he sees are "visions" or just "hallucinations."

Changing his setup to use straight quotes rather than curly quotes should eliminate the problem, since the necessary straight quotes all exist in the ASCII/ANSI range and are unlikely to be "substituted." There might still be a problem with the "typographical dashes" but he'd have to try it (and admit to it) for us to know, unless someone else with a Mac would care to try to check it out.

Comments welcome if received before I get tired of playing with it.

John

Subject: RE: HTML Practice Thread
From: MudGuard
Date: 19 Dec 10 - 02:47 AM

>> In the earliest Max Days ... an entire thread could be tinted ... by accident of course.

It was also possible to make everything after your own text disappear, including the form for answering. So any user could "close" a thread.

This was possible by opening (but not closing) an html comment ( <!-- ).
That is no longer possible: when you submit such a comment opening, you get the error message "contains an invalid html element".
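
A minimal sketch of how such a check might work is given here. It is not Mudcat's actual code, which isn't shown in this thread; the function name `has_unclosed_comment` is hypothetical.

```python
def has_unclosed_comment(post: str) -> bool:
    """Return True if the post opens an HTML comment that is never closed."""
    position = 0
    while True:
        start = post.find("<!--", position)
        if start == -1:
            return False          # no (further) comment opening
        end = post.find("-->", start + 4)
        if end == -1:
            return True           # opened but never closed: would hide the rest of the page
        position = end + 3        # keep scanning after the closing marker

print(has_unclosed_comment("My text <!-- and the rest of the thread vanishes"))
print(has_unclosed_comment("My text <!-- a harmless note --> more text"))
```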

Subject: RE: HTML Practice Thread
From: Joe Offer
Date: 19 Dec 10 - 02:24 AM

Hi, Garg-
I thought that would work - paste the text into notepad and save it as a text file, and close it; then open the text file and copy it and paste it into Mudcat - but it didn't work. The curly apostrophes and quotation marks were still there, and turned out as garble on Mudcat. What I did above was copy the defective text from a browser, and then close the browser window and paste it into Mudcat in another browser window.

But now I have to wait for Amos to post another garbagy post so I can see if I can duplicate my solution.

-Joe-

Subject: RE: HTML Practice Thread
From: GUEST,.gargoyle
Date: 18 Dec 10 - 09:19 PM

Whatever you are posting...
Cut and Paste
Word Processor

Before uploading ... run it through a basic (any) TEXT editor.
Reduce it to the lowest common element, ASCII....
Add the HTML afterwards.
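
The "reduce it to ASCII" step could be sketched roughly as follows. The replacement table is an assumption about which characters cause trouble (curly quotes, typographical dashes, the ellipsis), and `to_plain_ascii` is a name made up for this example.

```python
# Assumed list of troublesome characters and their plain stand-ins.
PLAIN_EQUIVALENTS = {
    "\u2018": "'",    # left single quote
    "\u2019": "'",    # right single quote / curly apostrophe
    "\u201c": '"',    # left double quote
    "\u201d": '"',    # right double quote
    "\u2013": "-",    # en dash
    "\u2014": "--",   # em dash
    "\u2026": "...",  # ellipsis
}

def to_plain_ascii(text: str) -> str:
    """Swap known typographical characters for ASCII, then mark anything else with '?'."""
    swapped = text.translate(str.maketrans(PLAIN_EQUIVALENTS))
    return swapped.encode("ascii", "replace").decode("ascii")

print(to_plain_ascii("\u201cCan\u2019t Understand\u201d"))
```

Anything outside the table (accented letters, for instance) comes out as a question mark here, so it is a blunt tool.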

There are multiple sources across the net to "test" your HTML ... unfortunately, today ... few Net-A-Zins do.... In the "old days" bad HTML was like farting-in-an-elevator. Blessedly...the Mudcat has exhaust-fans in the form of Joe-Clones to exit the fumes.

Of course, MC has preview...which should, perhaps, maybe, become a default posting process.

Sincerely,
Gargoyle

In the earliest Max Days ... an entire thread could be tinted ... by accident of course.

Subject: RE: HTML Practice Thread
From: Artful Codger
Date: 18 Dec 10 - 07:12 PM

The PRE tag doesn't solve the problem, since the raw characters are simply illegal characters for HTML (per the character encoding specified for Mudcat pages, which users here have no control over). In fact, the PRE tag may cause proper escape sequences not to be recognized as such, but to be printed literally as they were input, with the ampersands, digits and semicolons.

See these threads for my scripts (htmlesc.py and HtmlEsc.java) which encode text on the clipboard, and search for other text conversions threads for links to online utilities that will do roughly the same thing, if you don't mind a bit more copy-and-pasting. If you use them, your "smart" quotes, dashes, symbols, non-English characters and what have you should end up looking correct, though you'll still need to add formatting tags for bolding, italics and such.
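
For readers who just want the idea, here is a minimal sketch of that kind of encoding. It is not the htmlesc.py script mentioned above and does nothing with the clipboard; it only relies on Python's standard `xmlcharrefreplace` error handler, and the function name `escape_non_ascii` is hypothetical.

```python
def escape_non_ascii(text: str) -> str:
    """Replace every non-ASCII character with a decimal numeric character reference."""
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")

# Curly quotes and an accented letter become &#NNNN; codes; plain ASCII is untouched.
print(escape_non_ascii("\u201csmart\u201d quotes and caf\u00e9"))
```

Formatting tags for bolding, italics and such still have to be added by hand afterwards.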

This topic has been covered to F'ing death in other threads, though I'm too lazy to locate or index them. It would be nice to have a moderated PermaThread that summarizes the info and links to the other, more exhaustive threads, so that when this topic arises again and again (as it invariably will), one can simply link to the PermaThread.

Subject: RE: HTML Practice Thread
From: JohnInKansas
Date: 18 Dec 10 - 02:47 AM

Well, I left out a #x in the straight double quote code, but you get the message. (")

The reason Amos posts stuff full of curlies is that he's kinky and has his WP set to "Fancy Dan Mode." (I'm told.) The reason lots of his posts are full of "unintelligible meaningless characters" is because he uses a Mac and/or copies lots of his stuff from people who do, and he NEVER PROOFS HIS CUT-N-PASTE POSTS. . . . ; > }

(A more likely explanation is that he's copied from html that used quotes to identify the lines as "text strings." Those quotes would usually be hidden in the "interpreted html" on his browser but would copy. If he pasted in his WP program and re-copied to a post, they might get converted to "curlies" which would (sometimes) show in his post.)

Since they didn't show in your test post, it's a guess as to where they were.

Macs do still have some deviant characters on the keyboards, although it varies with the OS version.

John

Subject: RE: HTML Practice Thread
From: JohnInKansas
Date: 18 Dec 10 - 02:24 AM

Joe - re quotes & apostrophes

The HTML specification uses straight quotes to mark "strings" of characters that don't require interpretation, and "curly quotes" don't work universally in code, although a few sites have added interpreters to read them in code.

Since it mucks up the code, it's best to "turn off curly quotes" in your Word, which turns them off in all your Office programs. When you turn them on/off for quotes, they're also turned on/off for apostrophes.

It's handy to know that in Word, whether curly quotes are turned on or off, you can use Edit|Replace to Replace All " with " and all the quotes will be changed to whatever's the current setting. (i.e. Word Replace recognizes both forms, and replaces with whatever's the current setting.) You have to separately replace ' with ' if you want them changed also.

The straight double quote is hex code 0022, so &#x0022; should print it, but it gets cumbersome to code all of them in codes, so it's better to set up to type them "straight."

Straight Single Quote can be coded &#x0027;      '
Straight Double Quote can be coded &0022;        "
Left Single Quote can be coded &#x2018;          ‘
Right Single Quote can be coded &#x2019;        ’
Left Double Quote can be coded &#x201C;        “
Right Double Quote can be coded &#x201D;       ”

Of course you can convert the hex numbers to their decimal equivalents and leave out the "x" in the codes, but I'm too lazy to work the conversions at the moment.
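
For anyone equally lazy, the conversion can be sketched in a few lines. This assumes the codes are written in the usual `&#x....;` form; the function name `hex_refs_to_decimal` is made up for this example.

```python
import re

# Matches hex character references such as &#x201C; (the "x" may be upper or lower case).
HEX_REFERENCE = re.compile(r"&#[xX]([0-9A-Fa-f]+);")

def hex_refs_to_decimal(text: str) -> str:
    """Rewrite each hex reference as its decimal equivalent, leaving out the 'x'."""
    return HEX_REFERENCE.sub(lambda m: f"&#{int(m.group(1), 16)};", text)

print(hex_refs_to_decimal("&#x2018; &#x2019; &#x201C; &#x201D;"))
```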

You can also use "curlies" in preformatted text with <pre> in front of the text and </pre> at the end, and they should come across in a post, although using the preformat tag can also get clumsy.

John

Subject: RE: HTML Practice Thread
From: Joe Offer
Date: 17 Dec 10 - 07:26 PM

Hmmm. They all disappeared. No problems now.....

Subject: RE: HTML Practice Thread
From: Joe Offer
Date: 17 Dec 10 - 07:25 PM

I'd like to figure out how to deal with the apostrophes and quotation marks in this post. Mudcat will read apostrophes and quotation marks if they're straight up and down, but not if they're angled. I keep wondering if this is a Mac problem, since I most often see the problem in posts from Amos, our most infamous Mac user.
Anybody have a solution?

"Can't Understand" - A song composed about the devastating earthquake in Christchurch in New Zealand - where with stringent building codes and the early morning timing when city streets were empty, there was major property damage and some injuries, but nobody died.

http://www.youtube.com/watch?v=tUmov1ElyTY

At 4:36am on September the 4th 2010 Christchurch was woken by a huge 7.1 magnitude earthquake, causing buildings to collapse, roads to crack and power and water outages right throughout Christchurch and neighbouring areas.

Can't Understand written and sung by Jeremy Cooper
(Levin, New Zealand)

On the 4th of September 2010
4:35 in the morning
Christchurch New Zealand began to shake
Earthquake came without warning

Throughout the region new fault lines appeared
Cracks in the earth were alarming
Soil turned to mud, And bricks tumbled down
Flooding and horror abounding x2

I can't understand in the darkness that day
How nobody died.
Some people gave thanks to their God
And some, oh.... the tears they cried

Bob Parker the mayor was quick to the scene
Calm and cool as cucumber
John Key the leader of the land
Said to the queen: "Can't make it"
No time for tea 'cos Canterbury's
Recovery needs funding.
Both were surprised, no-one was killed
Having viewed the quake's devastation

(NZ Prime Minister was due to fly to the UK to attend an afternoon tea function in London with the queen; cancelled it)

We can't understand in the darkness that day
How nobody died
Some people gave thanks to their god
And some, oh...... the tears they cried

Sad to say looters came the next day
But alongside heroes a-plenty
Some gave their money, some gave their time
Some with love did their duty
Many worked hour upon hour
Depriving the sandman his bounty
All of them had a cry in their hearts,
A question they kept on repeating,
A question they kept on repeating

We can't understand in the darkness that day
How nobody died
Some people gave thanks to their God
And some, oh...... the tears they cried

We can't understand in the darkness that day
How nobody died.. How nobody died....How nobody died

Subject: RE: HTML Practice Thread
From: wysiwyg
Date: 15 Apr 10 - 05:49 AM

All the HTML I know, I learned at Mudcat. With it I built the Spirituals permathread. It's serving me well at our parish website, too:

New!

CLICK HERE
to see parish life in the summertime
(a short video montage).

And it's great to have a thread to practice little tricks before screwing up a project!

Thanks, Mudcat!

~Susan

Subject: HashCodes-Я -Us
From: Mr Red
Date: 20 Mar 10 - 06:01 AM

Hey! It worked in the subject line,
see above.
though it does throw in spaces I didn't intend. Edit them out.
Best of Luck.

Subject: HashCodes-Я-Us
From: Mr Red
Date: 20 Mar 10 - 05:58 AM

hash codes generator

My hash code page works fine on FF2 & FF3.6 and Mozilla 1.7.8 and IE5.5 but not IE 7.0.573.blah blah blah

If anyone has an idea why, I would appreciate a clue.

I am sure it is a case of JavaScript Objects and finding an attribute that still is valid. (Bloody Micro$oft still vainly trying to rule the world).

It is sort of useful for odd languages (like Russian or Tamil) and it is graphical. Copy the code and paste it in a text editor (e.g. the Mudcat text box). I may add a feature to convert addresses to hash codes for copy & paste purposes.

But the raison d'être is to hide text from spammers, such as addresses. I have done it for years and it works, because they trawl for addresses on web pages and inspect the text. Translating from codes would take too long for the billions of pages they trawl.
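
The address-hiding idea can be sketched like this. It is not the code behind the hash code page, which isn't shown here; the function name `to_hash_codes` and the example address are invented for illustration. Whether it actually defeats address harvesters is the claim made above, not something the sketch demonstrates.

```python
import html

def to_hash_codes(text: str) -> str:
    """Turn every character into a decimal numeric character reference."""
    return "".join(f"&#{ord(ch)};" for ch in text)

coded = to_hash_codes("someone@example.com")
print(coded)                 # what goes into the page source
print(html.unescape(coded))  # what a browser would display
```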

Now did the Subject line translate?

Subject: RE: HTML Practice Thread
From: Mr Red
Date: 17 Mar 10 - 09:31 AM

To get any of the characters using standard fonts, I will put a page on my site tonight.

I used it for the Russian page and the Norske page in the meantime - if you want to look for hash codes.

Subject: RE: HTML Practice Thread
From: Mr Happy
Date: 16 Mar 10 - 09:45 AM

[Japanese text, garbled by a character-encoding mismatch: a machine translation of the htmlhelp.com passage on the LI TYPE attribute quoted in the 15 Mar 10 GUEST post.]

Subject: RE: HTML Practice Thread
From: GUEST
Date: 15 Mar 10 - 02:06 PM

It looks like there ain't an open square - htmlhelp.com says:

The deprecated TYPE attribute of LI suggests the rendering of the list item marker. Possible values are as follows:

•Case-insensitive values for LI within a UL, DIR, or MENU:
◦disc (a filled-in circle)
◦square (a square outline)
◦circle (a circle outline)
•Case-sensitive values for LI within an OL:
◦1 (decimal numbers: 1, 2, 3, 4, 5, ...)
◦a (lowercase alphabetic: a, b, c, d, e, ...)
◦A (uppercase alphabetic: A, B, C, D, E, ...)
◦i (lowercase Roman numerals: i, ii, iii, iv, v, ...)
◦I (uppercase Roman numerals: I, II, III, IV, V, ...)

Style sheets provide greater flexibility in suggesting list item styles. The list-style property of CSS includes the added abilities to suppress list item markers, use images as markers, and more.

Subject: RE: HTML Practice Thread
From: CapriUni
Date: 15 Mar 10 - 12:56 PM

Trying to keep track of Unordered Lists, and the names for their "bullets."

List item type="circle"

List item type="square"

List item Type="disc"

List Item type="box"

second

did

Subject: RE: HTML Practice Thread
From: topical tom
Date: 11 Mar 08 - 03:16 PM

Hope this works

Subject: RE: HTML Practice Thread
From: topical tom
Date: 09 Mar 08 - 09:45 PM

(font colours)

Subject: RE: HTML Practice Thread
From: Melbert
Date: 20 Oct 99 - 04:03 PM

Mary had a little lamb The shepherd go five years Did that work?

Subject: RE: HTML Practice Thread
From: Okiemockbird
Date: 19 Oct 99 - 04:41 PM

That was me, Joe. Just wanted to make sure my first success with an image wasn't a fluke. T.

Subject: RE: HTML Practice Thread
From: Joe Offer
Date: 19 Oct 99 - 04:32 PM

I'd say you've got the technique down, Anonymous. Just remember that Max doesn't want images in the regular threads - it can make the threads very slow to load. If you want to refer to images elsewhere on the Web, it's better to use clickable links (but practicing in this thread is considered admirable, and you did well indeed).
-Joe Offer-

Subject: RE: HTML Practice Thread
From: Okiemockbird
Date: 18 Oct 99 - 11:23 AM

try again,

Subject: RE: HTML Practice Thread
From: Okiemockbird
Date: 14 Oct 99 - 01:47 PM

Subject: RE: HTML Practice Thread
From: T in Oklahoma (Okiemockbird)
Date: 14 Oct 99 - 01:07 PM

MudGuard, it was a typo! Sheesh! T.

Subject: RE: HTML Practice Thread
From: bill\sables
Date: 14 Oct 99 - 07:13 AM

Many thanks, MudGuard and Allan. I have got them to work now. Cheers, Bill

Subject: RE: HTML Practice Thread
From: bill\sables
Date: 14 Oct 99 - 07:11 AM

Now for the web site one: click here

All original material is copyright © 2022 by the Mudcat Café Music Foundation. All photos, music, images, etc. are copyright © by their rightful owners. Every effort is taken to attribute appropriate copyright to images, content, music, etc. We are not a copyright resource.