(This message is encoded in Latin-1, ISO-8859-1, btw. ISO-8859-15 won't show it correctly.)

On Sun, 02 Oct 2005 12:17:38 -0400, David Backeberg <dave at math.mit.edu> wrote:
> My real problem is in KDE, with the ´/¨ key, which may look funny in
> your mailer.

When you read my message, the line above might well look even more strange. Beware of repeated quoting; the short, oddball character strings you see might double in length with every quote.

This is, believe it or not, my fourth draft of a reply! I should have hit the pad a lot earlier in the wee hours and gone for a good walk, to air out the old brain. After staring at the evidence, and blaming my software, I finally got a clue...

David's message has no spec. for content-transfer encoding, unless I'm really out of it. Afaik, it's a very good idea to see that outgoing messages do have such a line in their headers.

David probably thought that he was typing ['] and ["], but what actually appeared in his message (I looked at it in a hex editor to confirm) were [´] and [¨], which at small pixel/point sizes can look similar to the intended characters. Why the substitution happened is likely to be an interesting, and perhaps extended, sleuthing session.

['] is U+0027, to use the hex Unicode designation; it can be called "apostrophe-quote", and it's an "overloaded" character, standing in for at least two other typographic-quality characters. ["] is U+0022, a "neutral" quotation mark; again, it stands in for two separate typographic quotes, U+201C (opening) and U+201D (closing) (in English! Not necessarily so in other well-known languages.)

[´] is U+00B4, acute accent, a spacing character. (It's decimal 0180.) I found this in David's message (consistently). [¨] is U+00A8, dieresis, spacing, decimal 0168, also in David's message. (My ref: Unicode 3.0 book, p. 336 and following (code charts); they are also online at <unicode.org>.) All four are in the Latin-1 (ISO-8859-1) repertoire, but not necessarily in other encodings.

(A dieresis usually appears above a vowel, as in German "für", also spelled "fuer", meaning "for". It does change the sound of the vowel. When Mötley Crüe appeared on stage in Germany, the story goes that the audience chanted the name of the band happily and in unison, following the German pronunciation. Would love to know what the band thought!)

I got sidetracked, thinking that KDE was substituting typographic quotes (as with MS "smart quotes", removable with the De-Moronizer, the latter being a script, I think). Barking quietly up the wrong tree.

This reminds us of the recent hacks of Web addresses where similar-looking characters are substituted, such as a Cyrillic "o" for a Latin "o" in the middle of a Latin-alphabet character string. (I typed Latin o's for both, there.) Even within Latin-1, there are [?] and [?], which might look similar in some fonts (not in Verdana, though!).

Fwiw, one of my hobbies is writing systems and character encodings; I'm also interested in typography (and too much else :) ).

HTH,
-- 
Nicholas Bodley /*|*\ Waltham, Mass. (Not "MA")
The curious hermit -- autodidact and polymath
Opera: No more banner ads in its free version.
Midnight "hacker" in 1960 (DIP, Colo. Springs)
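The code points, the Latin-1 vs. ISO-8859-15 warning, and the Cyrillic-"o" trick can all be checked with a few lines of Python. This is a minimal sketch using only the standard library (`unicodedata` and the built-in `latin_1` / `iso8859_15` codecs). The `mixed_scripts` helper is a hypothetical, deliberately crude heuristic: it takes the first word of each letter's Unicode name. It is not how browsers actually vet internationalized Web addresses.

```python
import unicodedata

# The four characters discussed above: apostrophe, neutral quote,
# spacing acute accent, and spacing dieresis (Unicode name DIAERESIS).
CHARS = ["\u0027", "\u0022", "\u00b4", "\u00a8"]


def describe(ch: str) -> str:
    """Return 'U+XXXX (decimal N) NAME' for a single character."""
    cp = ord(ch)
    return f"U+{cp:04X} (decimal {cp}) {unicodedata.name(ch)}"


def decode_both(raw: bytes) -> tuple[str, str]:
    """Decode the same bytes as ISO-8859-1 and as ISO-8859-15 (Latin-9)."""
    return raw.decode("latin_1"), raw.decode("iso8859_15")


def mixed_scripts(text: str) -> set[str]:
    """Hypothetical heuristic: collect the first word of each letter's
    Unicode name (e.g. LATIN, CYRILLIC). More than one entry in a string
    that should use a single alphabet is a red flag for a lookalike spoof."""
    return {unicodedata.name(ch).split()[0] for ch in text if ch.isalpha()}


if __name__ == "__main__":
    for ch in CHARS:
        print(describe(ch))  # e.g. U+00B4 (decimal 180) ACUTE ACCENT
    # Bytes 0xB4 and 0xA8: Latin-1 yields the accent and dieresis,
    # while ISO-8859-15 reassigns those two positions to other letters.
    print(decode_both(b"\xb4\xa8"))
    # "g" + Cyrillic small o (U+043E) + "ogle": Latin and Cyrillic mixed.
    print(sorted(mixed_scripts("g\u043eogle")))
```

Decoding bytes 0xB4 and 0xA8 as ISO-8859-15 gives "Žš" instead of "´¨". That is why a message declared as Latin-1 looks wrong when a mailer assumes Latin-9.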
BLU is a member of BostonUserGroups.
We also thank MIT for the use of their facilities.