BLU Discuss list archive



i18n



On Friday 17 March 2006 12:52 pm, David Hummel wrote:
> Jerry's statement above is misleading. It's not really a problem, since
> the user interacts not with the kernel, but with applications. glibc
> has had support for utf-8 and multi-byte locales for years now (since
> 2.2 I believe).
I specifically noted that Linux is still 8-bit and that it is based on the
C language, where the standard char data type is 8 bits. You are absolutely
correct about applications. As I mentioned WRT lint(1), back in the early
days of OSF1, we did not use printf; we used a message catalog that had
its own printing functions, set up for the wider character sets.

The fact is that Unix and Linux are still based on character strings that
are composed of 8-bit characters. Certainly, most of the standard C
functions, such as printf(3), use locales. A locale not only contains the
character sets, but also contains information such as how to format numbers
and dates. Even the character sets themselves contain information on how
they are sorted. In the original C language, there were functions in
ctype.h such as isupper(), islower() et al. Before C89, these were
generally all simple macros. For instance, upper case A-Z is in the range
65 (0x41) through 90 (0x5A), and lower case a-z is in the range 97 (0x61)
through 122 (0x7A). To convert from upper to lower, all you needed to do
was OR in 0x20, and to go the other way you'd mask out that bit. But once
locales were supported, ctype.h had to resort to a set of tables. This is
the type of thing that had to be done in glibc to implement locales. (Note
that I am talking about locales and not specifically about UTF-8 or UTF-16
in this case.)
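
To make the 0x20 trick concrete, here is a rough sketch of what those old
ASCII-only conversions amount to, next to the locale-aware route through
ctype.h. The function names (ascii_to_lower() and friends) are made up for
this illustration and are not taken from any historical header or from
glibc. The sketch assumes an ASCII-compatible execution character set, and
the self-check assumes the program is still in the default "C" locale.

```c
#include <assert.h>
#include <ctype.h>

/* Hypothetical helpers: ASCII-only range tests, as described above. */
int ascii_is_upper(int c) { return c >= 0x41 && c <= 0x5A; }  /* 'A'..'Z' */
int ascii_is_lower(int c) { return c >= 0x61 && c <= 0x7A; }  /* 'a'..'z' */

/* Upper to lower: OR in the 0x20 bit. Non-letters pass through unchanged. */
int ascii_to_lower(int c) { return ascii_is_upper(c) ? (c | 0x20) : c; }

/* Lower to upper: mask the 0x20 bit out again. */
int ascii_to_upper(int c) { return ascii_is_lower(c) ? (c & ~0x20) : c; }

/* Locale-aware route: let tolower() from ctype.h decide, based on the
   current locale. The cast keeps the argument in unsigned char range,
   which the standard requires for values other than EOF. */
int locale_to_lower(int c) { return tolower((unsigned char)c); }

/* In the "C" locale both approaches agree for every ASCII character. */
void ascii_case_self_check(void)
{
    for (int c = 0; c < 128; c++)
        assert(ascii_to_lower(c) == locale_to_lower(c));
}
```

The bit trick knows nothing about characters outside A-Z and a-z, which is
exactly why it stops being enough once a locale can define additional
letters in the upper half of an 8-bit character set.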

But it is always up to the application developer. A well written application 
should be able to utilize 
-- 
Jerry Feldman <gaf at blu.org>
Boston Linux and Unix user group
http://www.blu.org PGP key id:C5061EA9
PGP Key fingerprint:053C 73EC 3AC1 5C44 3E14 9245 FB00 3ED5 C506 1EA9




BLU is a member of BostonUserGroups
We also thank MIT for the use of their facilities.
