The Mudcat Café TM
Thread #173516   Message #4207806
Posted By: GUEST,Grishka
02-Sep-24 - 05:19 AM
Thread Name: Tech: obscure characters app native to Win10+
Subject: RE: Tech: obscure characters app native to Win10+
Dave, "why they exist" is not difficult to understand. The actual question is why browsers combine them correctly when entered via UTF-8 but fail to combine them when entered as entities, such as the two halves of a surrogate pair written as separate numeric entities (which display as ��). The answer, I guess, is simply that emojis and entities come from different millennia.
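
To make the difference concrete, here is a minimal JavaScript sketch. The helper names `codePointEntity` and `surrogateEntities` are made up for illustration and do not come from any library. It writes the same emoji (U+1F600) as numeric entities in two ways: once as a single entity for the whole code point, and once as one entity per UTF-16 code unit. As far as I know, HTML parsers treat a numeric entity in the surrogate range as an error and replace it with U+FFFD, so only the first form displays as intended.

```javascript
// Hypothetical helper: one numeric entity for the whole code point.
function codePointEntity(ch) {
    return '&#x' + ch.codePointAt(0).toString(16).toUpperCase() + ';';
}

// Hypothetical helper: one numeric entity per UTF-16 code unit.
// For characters above U+FFFF this yields two surrogate entities,
// which browsers do not combine.
function surrogateEntities(ch) {
    let out = '';
    for (let i = 0; i < ch.length; i++) {
        out += '&#x' + ch.charCodeAt(i).toString(16).toUpperCase() + ';';
    }
    return out;
}

const smiley = '\u{1F600}';             // one character, two UTF-16 code units
console.log(smiley.length);             // 2
console.log(codePointEntity(smiley));   // &#x1F600;
console.log(surrogateEntities(smiley)); // &#xD83D;&#xDE00;
```

For characters up to U+FFFF both helpers give the same result, which is probably why the error is easy to miss.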

I asked ChatGPT to write a converter for me. The first version had the same error as MrRed's. After my complaint "It should work for codepoints as well", this is what it came up with – just copy to a text file and save it as "GrishkaConverter.html"; feel free to add conversion of < to &lt; etc.:

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>Grishka's Text to HTML Code Converter</title>
    <style>
       body {
            font-family: Arial, sans-serif;
            margin: 20px;
       }
       #output {
            white-space: pre-wrap;
            border: 1px solid #ccc;
            padding: 10px;
            margin-top: 10px;
       }
    </style>
</head>
<body>
    <h1>Grishka's Text to HTML Code Converter</h1>
    <textarea id="input" rows="10" cols="50" placeholder="Paste your text here..."></textarea>
    <div id="output"></div>

    <script>
       document.getElementById('input').addEventListener('input', function() {
            const inputText = this.value;
            let outputText = '';

            for (let i = 0; i < inputText.length; i++) {
                const charCode = inputText.codePointAt(i);
                if (charCode <= 127) {
                   outputText += String.fromCodePoint(charCode);
                } else {
                   outputText += '&#x' + charCode.toString(16) + ';';
                }
                if (charCode > 0xFFFF) {
                   i++; // Skip the next code unit if it's a surrogate pair
                }
            }

            document.getElementById('output').innerText = outputText;
       });
    </script>
</body>
</html>
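
The suggested extension (converting < to &lt; etc.) could be sketched roughly as follows. This is only one possible way to do it, not what ChatGPT produced: the function name `toHtmlCode` and the choice of which characters get named entities are my assumptions. It uses `for...of`, which walks a string by code point, so the manual skipping of the second surrogate is no longer needed.

```javascript
// Hypothetical helper, not part of the page above.
// Characters that get a named entity (an assumed, minimal selection).
const NAMED_ENTITIES = new Map([
    ['&', '&amp;'],
    ['<', '&lt;'],
    ['>', '&gt;'],
    ['"', '&quot;'],
]);

function toHtmlCode(text) {
    let out = '';
    // for...of iterates by code point, so a surrogate pair arrives as one item.
    for (const ch of text) {
        const codePoint = ch.codePointAt(0);
        if (NAMED_ENTITIES.has(ch)) {
            out += NAMED_ENTITIES.get(ch);                 // markup-significant ASCII
        } else if (codePoint <= 127) {
            out += ch;                                     // plain ASCII stays as it is
        } else {
            out += '&#x' + codePoint.toString(16) + ';';   // everything else as hex entity
        }
    }
    return out;
}

console.log(toHtmlCode('a < b & \u00e9 \u{1F600}'));
// a &lt; b &amp; &#xe9; &#x1f600;
```

To use it in the page, the body of the `input` listener could be reduced to assigning `toHtmlCode(this.value)` to the output element's `innerText`.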