Ed Hill wrote:
>>
>> The problem with Unix/Linux is that it is still based on 8-bit characters,
>> and an internationalized program must be set up to use either 16-bit or
>> wider. Java was written where its native character type is 16 bits, which
>> is sufficient for a majority of languages, but not for Asian languages.
>
> The above, as written, is simply not true. UTF-8 is a perfectly valid
> Unicode encoding and, for the characters that match the ASCII 0x00 to
> 0x7F, it uses the *identical* 8 bits/character encoding and is therefore
> largely (read: as much as possible) backwards-compatible with older
> programs, text files, etc.

The standard Unix string-handling libraries don't know from UTF-8, so, for example, they will assume that every character is one byte wide. You could encode "avión.txt" in UTF-8 and use it as a file name, and a terminal window configured to use UTF-8 would be able to display that name. But in order for "ls avi?n.txt" to work, the shell's globbing algorithm would have to recognize that "\xc3\xb3" is the single UTF-8 character "ó" (and not, say, the two ISO-8859-1 characters "Ã³").
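To make the byte-versus-character point concrete, here is a minimal sketch in Python. It uses the standard library's `fnmatch` module as a stand-in for shell globbing (an assumption for illustration; it is not what any particular shell actually runs). Matching on raw bytes treats "?" as one byte, which is the one-byte-per-character assumption described above, while matching on the decoded string treats "?" as one character.

```python
import fnmatch

# The file name from the message, written with an explicit code point
# so this source stays pure ASCII: U+00F3 is "ó".
name = "avi\u00f3n.txt"

# UTF-8 leaves the ASCII letters unchanged and encodes "ó" as two bytes.
raw = name.encode("utf-8")
print(raw)                  # b'avi\xc3\xb3n.txt'
print(len(name), len(raw))  # 9 10  (characters vs. bytes)

# Read the same bytes as ISO-8859-1 and each byte becomes its own character.
print(raw.decode("iso-8859-1"))  # aviÃ³n.txt

# Byte-oriented matching: "?" matches exactly one byte, so it cannot match "ó".
print(fnmatch.fnmatchcase(raw, b"avi?n.txt"))   # False
print(fnmatch.fnmatchcase(raw, b"avi??n.txt"))  # True (two bytes, two "?")

# Character-oriented matching on the decoded name: "?" matches one character.
print(fnmatch.fnmatchcase(name, "avi?n.txt"))   # True
```

The byte-level pattern only succeeds with two "?" wildcards, one per byte of "\xc3\xb3", which is exactly the mismatch a UTF-8-unaware globbing routine would run into.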
BLU is a member of BostonUserGroups.
We also thank MIT for the use of their facilities.