
BLU Discuss list archive



stupid keyboard question



(This message is encoded in Latin-1, ISO-8859-1, btw.
  ISO-8859-15 won't show it correctly.)

On Sun, 02 Oct 2005 12:17:38 -0400, David Backeberg <dave at math.mit.edu>  
wrote:

> My real problem is in KDE, with the ´/¨ key, which may look funny in  
> your mailer.

When you read my message, the line above might well look even more  
strange. Beware of repeated quoting; the short, oddball char. strings you  
see might double in length for every quote.
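
A rough sketch of one way that doubling can happen, assuming (and it is only an assumption; the mailers actually involved are unknown) that each hop encodes the text as UTF-8 and the next one misreads those bytes as Latin-1. The helper name `requote` is made up for the illustration.

```python
def requote(text: str) -> str:
    """Simulate one bad hop: encode as UTF-8, then misread the bytes as Latin-1."""
    return text.encode("utf-8").decode("latin-1")


s = "\u00b4"  # U+00B4 ACUTE ACCENT, a single character
lengths = [len(s)]
for _ in range(3):
    s = requote(s)
    lengths.append(len(s))

print(lengths)  # [1, 2, 4, 8]
```

Every character from U+0080 to U+00FF takes two bytes in UTF-8, and each of those bytes comes back as another character in that same range, so the string keeps doubling.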

This is, believe it or not, my fourth draft of a reply! Should have hit  
the pad a lot earlier in the wee hours, and gone for a good walk, to air  
out the old brain.

After staring at the evidence, and blaming my software, I finally got a  
clue...

David's message has no spec. for content-transfer encoding, unless I'm  
really out of it. Afaik, it's a very good idea to see that outgoing  
messages do have such a line in their headers.
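
For anyone who wants to check their own outgoing mail, here is a minimal sketch using Python's standard `email` module. The two sample messages and the helper name `declared_encoding` are invented for the example; note that the charset itself is declared in the Content-Type header, so the sketch reports both.

```python
from email import message_from_string


def declared_encoding(raw: str):
    """Return (Content-Transfer-Encoding, charset); each is None if not declared."""
    msg = message_from_string(raw)
    return msg.get("Content-Transfer-Encoding"), msg.get_content_charset()


# Hypothetical message with no encoding information at all.
bare = "Subject: stupid keyboard question\n\nbody text\n"

# Hypothetical message that labels itself properly.
labeled = (
    "Subject: stupid keyboard question\n"
    "Content-Type: text/plain; charset=ISO-8859-1\n"
    "Content-Transfer-Encoding: 8bit\n"
    "\n"
    "body text\n"
)

print(declared_encoding(bare))     # (None, None)
print(declared_encoding(labeled))  # ('8bit', 'iso-8859-1')
```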

David probably thought that he was typing ['] and ["], but what actually  
appeared in his message (I looked at it in a hex editor to confirm) were  
[´] and [¨], which at small pixel/point sizes can look similar to the  
intended characters. Why the substitution happened is likely to be an  
interesting, and perhaps extended, sleuthing session.
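
If no hex editor is handy, a few lines of Python can do the same kind of check. This is only a sketch: the sample bytes are made up to resemble the quoted line (Latin-1 encoded), not copied from David's message, and `non_ascii_bytes` is a hypothetical helper name.

```python
def non_ascii_bytes(data: bytes) -> list:
    """List (offset, hex value) for every byte above the 7-bit ASCII range."""
    return [(i, f"{b:02X}") for i, b in enumerate(data) if b > 0x7F]


# Made-up sample: U+00B4 and U+00A8 encoded as Latin-1.
sample = "the \u00b4/\u00a8 key".encode("latin-1")
print(non_ascii_bytes(sample))  # [(4, 'B4'), (6, 'A8')]
```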

['] is U+0027, to use the hex unicode designation; it can be called  
"apostrophe-quote", and it's an "overloaded" character, standing in for
at least two other typographic-quality chars.

["] is U+0022, a "neutral" quotation mark; again, it stands in for two  
separate typographic quotes, U+201C (opening) and U+201D (closing) quotes  
(in English! Not necessarily so in other well-known langs.)

[´] is U+00B4, acute accent, a spacing character. (It's decimal 0180.) I  
found this in David's message (consistently).

[¨] is U+00A8, dieresis, spacing, decimal 0168, also in David's message.
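
The four code points above can be double-checked with Python's `unicodedata` module, which reports the official Unicode names (these differ a little from the informal names used here). A small sketch:

```python
import unicodedata

# Apostrophe, quotation mark, acute accent, dieresis.
chars = ["'", '"', "\u00b4", "\u00a8"]
rows = [(f"U+{ord(ch):04X}", ord(ch), unicodedata.name(ch)) for ch in chars]

for code, decimal, name in rows:
    print(code, decimal, name)
# U+0027 39 APOSTROPHE
# U+0022 34 QUOTATION MARK
# U+00B4 180 ACUTE ACCENT
# U+00A8 168 DIAERESIS
```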

(My ref: Unicode 3.0 book, p. 336 and following (code charts); they are  
also online at <unicode.org>)

All four are in the Latin-1 (ISO-8859-1) repertoire, but not necessarily  
in other encodings.
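
To make that concrete, here is a small sketch that tries the four characters against three encodings, using Python's built-in codecs. The choice of ASCII and ISO-8859-15 for comparison is mine; the helper name `encodable` is made up.

```python
CHARS = ["'", '"', "\u00b4", "\u00a8"]


def encodable(ch: str, encoding: str) -> bool:
    """True if the character has a representation in the given encoding."""
    try:
        ch.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


results = {enc: [encodable(c, enc) for c in CHARS]
           for enc in ["ascii", "latin-1", "iso-8859-15"]}
print(results["ascii"])        # [True, True, False, False]
print(results["latin-1"])      # [True, True, True, True]
print(results["iso-8859-15"])  # [True, True, False, False]

# The same two byte values read as ISO-8859-15 give different letters entirely.
misread = b"\xb4\xa8".decode("iso-8859-15")
print([f"U+{ord(c):04X}" for c in misread])  # ['U+017D', 'U+0161']
```

ISO-8859-15 reassigned both of those byte positions (to Z-caron and s-caron), which is why a reader set to that encoding would not show the accent and dieresis correctly.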

(A dieresis usually appears above a vowel, as in German "für", also  
spelled "fuer", meaning "for". It does change the sound of the vowel. When  
Mötley Crüe appeared on stage in Germany, story goes that the audience  
chanted the name of the band happily and in unison, following the German  
pronunciation. Would love to know what the band thought!)

I got sidetracked, thinking that KDE was substituting typographic quotes  
(as with MS "smart quotes", removable with the De-Moronizer, the latter  
being a script, I think.) Barking quietly up the wrong tree.

This reminds us of the recent hacks of Web addresses where similar-looking  
characters are substituted, such as a Cyrillic "o" for a Latin "o" in the  
middle of a Latin-alphabet char. string. (I typed Latin o's for both,  
there.) Even within Latin-1, there are [)B?] and [?], which might look  
similar in some fonts (not in Verdana, though!).
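
A sketch of how such a look-alike can be spotted in software, using only `unicodedata` from the Python standard library. Taking the first word of each character's Unicode name as its "script" is a crude heuristic of mine, not a proper script lookup, and the sample strings are invented.

```python
import unicodedata


def scripts_used(text: str) -> set:
    """Rough script guess: the first word of each letter's Unicode name."""
    return {unicodedata.name(ch).split()[0] for ch in text if ch.isalpha()}


plain = "boston"
spoofed = "b\u043eston"  # second letter is U+043E CYRILLIC SMALL LETTER O

print(plain == spoofed)               # False
print(sorted(scripts_used(plain)))    # ['LATIN']
print(sorted(scripts_used(spoofed)))  # ['CYRILLIC', 'LATIN']
```

The two strings can look identical on screen, yet they compare unequal, and the mixed-script result is the giveaway.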

Fwiw, one of my hobbies is writing systems and character encodings; also  
interested in typography (and too much else :) .)

HTH,

-- 
Nicholas Bodley  /*|*\ Waltham, Mass. (Not "MA")
The curious hermit -- autodidact and polymath
Opera: No more banner ads in its free version.
Midnight "hacker" in 1960 (DIP, Colo. Springs)







BLU is a member of BostonUserGroups
We also thank MIT for the use of their facilities.




Boston Linux & Unix / webmaster@blu.org