i18n

Jerry Feldman gaf at blu.org
Fri Mar 17 13:25:25 EST 2006


On Friday 17 March 2006 12:52 pm, David Hummel wrote:
> Jerry's statement above is misleading.  It's not really a problem, since
> the user interacts not with the kernel, but with applications.  glibc
> has had support for utf-8 and multi-byte locales for years now (since
> 2.2 I believe).
I specifically noted that Linux is still 8-bit and that it was based on the 
C language, where the standard char data type is 8 bits. You are absolutely 
correct about applications. As I mentioned WRT lint(1), back in the early 
days of OSF/1, we did not use printf; we used a message catalog that had 
its own printing functions, which were set up for the wider character sets. 

The fact is that Unix and Linux are still based on character strings that 
are composed of 8-bit characters. Certainly, most of the standard C 
functions, such as printf(3), use locales. The locale not only contains the 
character sets, but also contains information such as how to format numbers 
and dates. And even the character sets themselves contain information on 
how they are sorted. In the original C language, there were functions in 
ctype.h, such as isupper(), islower() et al. Before C89, these were 
generally all simple macros. For instance, in ASCII, upper case A-Z is in 
the range of 65 (0x41) through 90 (0x5A) and lower case a-z is in the range 
of 97 (0x61) through 122 (0x7A). To convert from upper to lower, all you 
needed to do was to OR in 0x20, and to go the other way you'd mask out that 
bit. But once locales were supported, ctype.h had to resort to a set of 
tables. This is the type of thing that had to be done in glibc to implement 
locales. (Note that I am talking about locales and not specifically about 
UTF-8 or UTF-16 in this case). 
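
To make the 0x20 trick concrete, here is a minimal sketch in C that puts it 
next to the locale-aware tolower()/toupper() from ctype.h. The names 
ascii_tolower(), ascii_toupper() and count_mismatches() are made up for this 
illustration; they are not from glibc or from any historical ctype.h, and 
the sketch assumes an ASCII execution character set. 

```c
#include <assert.h>
#include <ctype.h>

/* Pre-locale style: set bit 0x20 to get lower case.
 * Only meaningful for the ASCII letters 'A'..'Z'. */
int ascii_tolower(int c)
{
    return (c >= 'A' && c <= 'Z') ? (c | 0x20) : c;
}

/* The reverse: mask out bit 0x20. Only meaningful for 'a'..'z'. */
int ascii_toupper(int c)
{
    return (c >= 'a' && c <= 'z') ? (c & ~0x20) : c;
}

/* Count the case conversions of A-Z/a-z for which the locale-aware
 * <ctype.h> functions and the bit trick disagree in the current locale. */
int count_mismatches(void)
{
    int mismatches = 0;

    /* The bit trick assumes ASCII code points. */
    assert('A' == 0x41 && 'a' == 0x61);

    for (int c = 'A'; c <= 'Z'; c++) {
        if (tolower(c) != ascii_tolower(c))
            mismatches++;
        if (toupper(c | 0x20) != ascii_toupper(c | 0x20))
            mismatches++;
    }
    return mismatches;
}
```

A program starts in the "C" locale, so count_mismatches() returns 0 until 
setlocale(LC_CTYPE, "") is called. After that, the library consults the 
tables of whatever locale was selected, and the result depends on that 
locale. 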

But it is always up to the application developer. A well-written application 
should be able to utilize 
-- 
Jerry Feldman <gaf at blu.org>
Boston Linux and Unix user group
http://www.blu.org PGP key id:C5061EA9
PGP Key fingerprint:053C 73EC 3AC1 5C44 3E14 9245 FB00 3ED5 C506 1EA9


