The Mudcat Café TM
Thread #6390   Message #3890488
Posted By: GUEST,Grishka
26-Nov-17 - 08:57 AM
Thread Name: tech: HTML Ampersand Codes
Subject: RE: tech: HTML Ampersand Codes
The "charset" problem is even more complicated. The backlog of old posts accumulated without any codepage ever being specified, so many different codepages, all of them 8-bit, are in use side by side.
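To illustrate why an unlabelled 8-bit backlog is ambiguous, here is a small Python sketch (Python is my choice for illustration; it has nothing to do with the Mudcat server). The helper name decode_under and the sample byte 0x92 are assumptions of mine. It only shows that one and the same byte reads differently depending on which codepage the reader assumes.

```python
def decode_under(raw: bytes, codepages):
    """Decode the same bytes under each codepage and collect the results."""
    return {cp: raw.decode(cp) for cp in codepages}

# 0x92 is the byte Windows software commonly stores for a "prettified" apostrophe.
readings = decode_under(b"\x92", ["cp1252", "iso-8859-1", "cp437"])

for cp, ch in readings.items():
    # repr() makes invisible control characters visible
    print(cp, repr(ch))
```

In cp1252 the byte is the curly apostrophe (U+2019), in iso-8859-1 it is an invisible control character, and in the old DOS codepage 437 it is something else again. Without a record of what the poster used, the stored byte alone cannot tell us which reading was meant.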

Max seems to be experimenting right now, once again. The setting "charset=iso-8859-1" was added quite recently; he tried UTF-8 about a year ago. Either of these has well-known consequences, some good, some bad. We have discussed methods of working through the backlog "ad nauseam".

More importantly: future handling should make the result independent of the posters' codepage settings, and relieve them of the burden of replacing special characters (such as those prettified apostrophes) with their HTML ampersand codes.

Max has now found a method which may be a good solution to this problem:

<FORM ACTION="ThreadNewMess-Sub.cfm" METHOD="POST" accept-charset="utf-8">

This should ensure that any (Unicode) character ever posted arrives at the server script in UTF-8 format. It is up to that script to transform it to an ampersand code if necessary. (For umlauts, it may instead choose to transform them to iso-8859-1 to save space.)
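What the server script would have to do can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the real script is ColdFusion, I have not seen it, and the function names below are hypothetical. The sketch relies on Python's built-in "xmlcharrefreplace" error handler, which replaces every character the target codepage cannot hold with its numeric ampersand code.

```python
def utf8_to_latin1_with_codes(posted: bytes) -> bytes:
    """Keep characters that fit into iso-8859-1 (e.g. umlauts) as single bytes;
    turn everything else into numeric ampersand codes."""
    text = posted.decode("utf-8")
    return text.encode("iso-8859-1", errors="xmlcharrefreplace")

def utf8_to_ascii_with_codes(posted: bytes) -> bytes:
    """Stricter variant: every non-ASCII character becomes an ampersand code."""
    text = posted.decode("utf-8")
    return text.encode("ascii", errors="xmlcharrefreplace")

# "Don't" with a curly apostrophe (U+2019), plus a German sharp s (U+00DF),
# as the bytes a browser would send under accept-charset="utf-8".
sample = "Don\u2019t forget the Gru\u00df".encode("utf-8")

print(utf8_to_latin1_with_codes(sample))  # b'Don&#8217;t forget the Gru\xdf'
print(utf8_to_ascii_with_codes(sample))   # b'Don&#8217;t forget the Gru&#223;'
```

The first variant corresponds to the "save space" option: the sharp s stays one byte, and only the apostrophe, which iso-8859-1 does not contain, becomes &#8217;. Note that the sketch does not escape characters such as "&" or "<" that the poster typed literally; a real script would have to decide about those separately.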

For some hours the other day, this actually seemed to work, but only for posts from and to the Preview page, in other words: when clicking "Preview" on the Preview page.

When we pointed out the discrepancy, Max removed the "accept-charset" clause from the Preview page, so that it now works even worse than before, because "charset=iso-8859-1" is still present.

I have the strong feeling that accept-charset="utf-8" is the right thing to have on both pages. Since they call the very same script "ThreadNewMess-Sub.cfm" on the server, things may be going wrong inside it, and only in some cases. Another possible explanation is that our browsers are responsible, in some obscure way.
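One way things could go wrong inside a script is sketched below in Python. I stress that this is a guess at a possible failure mode, not a diagnosis of "ThreadNewMess-Sub.cfm", and the function name is hypothetical. It shows what happens when bytes that were sent as UTF-8 are read as if they were iso-8859-1.

```python
def misread_utf8_as_latin1(text: str) -> str:
    """Encode as a browser would under accept-charset="utf-8",
    then decode as a script assuming iso-8859-1 would."""
    return text.encode("utf-8").decode("iso-8859-1")

# u-umlaut (U+00FC) is two bytes in UTF-8: 0xC3 0xBC
garbled = misread_utf8_as_latin1("\u00fc")
print(repr(garbled))  # two characters: A-tilde followed by the one-quarter sign

# As long as nothing else has touched the bytes, the damage is reversible:
repaired = garbled.encode("iso-8859-1").decode("utf-8")
print(repr(repaired))
```

If posts show pairs of odd characters where a single umlaut or apostrophe was typed, a mismatch of this kind between what was sent and what was assumed would be a candidate explanation.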

We have more than once offered Max a collective analysis of that script, although real experts seem to be absent. (I myself am not even a semi-expert, alas. "Google is my scholarship.")

Any insider information or progress reports will be appreciated.
