The Mudcat Café TM
Thread #67676   Message #1133104
Posted By: HuwG
10-Mar-04 - 11:47 AM
Thread Name: Tech: Copy and paste
Subject: RE: Tech: Copy and paste
Jim Dixon, normal text is encoded as one byte (8 binary digits) per character. This allows only 256 values in a given character set, some of which are reserved for control codes (e.g. 0x09 is the "tab" character). This doesn't allow for all the combinations of letter and accent, and special punctuation marks, to be found in all Western European scripts, and is hopelessly inadequate for many non-Western European languages or character sets (e.g. kanji, katakana, etc).
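
If it helps to see the one-byte limit in action, here is a rough Python sketch. The helper name fits_in_one_byte is made up for illustration, and Latin-1 is only my assumption of a typical single-byte Western European character set; other code pages behave similarly.

```python
# A single byte holds 8 bits, so it can take 2**8 distinct values (0 through 255).
BYTE_VALUES = 2 ** 8


def fits_in_one_byte(text, codec="latin-1"):
    """Return True if every character of text can be stored as one byte in codec.

    The codec defaults to Latin-1, an assumed example of a single-byte
    Western European character set.
    """
    try:
        return len(text.encode(codec)) == len(text)
    except UnicodeEncodeError:
        # At least one character has no slot in this single-byte set.
        return False


print(BYTE_VALUES)                     # 256
print(hex(ord("\t")))                  # 0x9 (the tab control code)
print(fits_in_one_byte("caf\u00e9"))   # True: e-acute has a Latin-1 slot
print(fits_in_one_byte("\u6f22\u5b57"))  # False: kanji do not fit
```

The accented letter gets through only because Latin-1 happens to include it; a different single-byte set might not.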

In the 16-bit form of Unicode (UCS-2, or UTF-16 for most characters), each character is encoded as one word (2 bytes). This allows for 65536 possible values. The values 0x0000 to 0x007f (all right, 0 to 127) in Unicode are the same as the corresponding ASCII values, and 0x0080 to 0x00ff match the Latin-1 (ISO 8859-1) set, so for most copy and paste operations from normal displayed text, there is no difference as far as the user is concerned.
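
As a quick check of the two-byte point, this Python sketch splits a string into 16-bit words. It assumes UTF-16 (little-endian) as the two-byte encoding, and utf16_units is a hypothetical helper name, not anything from a real clipboard API.

```python
import struct

# A 16-bit word can take 2**16 distinct values (0 through 65535).
WORD_VALUES = 2 ** 16


def utf16_units(text):
    """Return the 16-bit code units of text as integers (UTF-16, little-endian)."""
    raw = text.encode("utf-16-le")
    # Every code unit is exactly 2 bytes, so unpack len(raw) // 2 unsigned shorts.
    return list(struct.unpack("<%dH" % (len(raw) // 2), raw))


print(WORD_VALUES)                              # 65536
print([hex(u) for u in utf16_units("Copy")])    # ['0x43', '0x6f', '0x70', '0x79']
# For plain ASCII text the 16-bit values equal the one-byte ASCII values.
print(utf16_units("Copy") == list("Copy".encode("ascii")))  # True
# e-acute (U+00E9) keeps the value it has in Latin-1.
print([hex(u) for u in utf16_units("\u00e9")])  # ['0xe9']
```

So ordinary text keeps the same numeric values when widened to 16 bits, with the extra byte simply zero.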