To: <discuss-bounces at blu.org> (Robert La Ferla)

| BTW - One of the most powerful features of Mac OS X (and its ancestor
| NextStep) is the NSText class (and related classes) in the
| ApplicationKit API. The NSText class is what makes OS X applications
| easy to localize. It supports East Asian, Arabic/Hebrew (right to left
| scripts), etc... natively. It was a monster to program but the results
| are fantastic. Nearly all OS X apps use it for everything from simple
| text fields (shared text object) to whole documents. As a result, you
| can enter text in any language nearly anywhere in an app and copy/paste
| it to other apps, print it, etc...

Hmmm ... You must have a very different instantiation of OS X than what's on my wife's and my Powerbooks. Shelley and I have done a fair bit of experimenting with text in languages with non-Roman charsets, and our experience is very different. I can hardly find any apps that correctly implement copy/paste for anything but English text. Most of them produce gibberish for non-Latin-1 charsets at least part of the time. They can hardly even handle simple Cyrillic text sanely.

As a test case, I recently set up a stress-test example in my online music collection:

    http://trillian.mit.edu/~jc/music/abc/China/GIS_4T_D_W.utf8.abc
    http://trillian.mit.edu/~jc/music/abc/China/GIS_4T_D_W.utf8.txt
    http://trillian.mit.edu/~jc/music/abc/China/GIS_4T_D_W.utf8.pdf
    http://trillian.mit.edu/~jc/music/abc/China/GIS_4T_D_W.utf8.ps

The .abc and .txt files are the same file, with different MIME types. If you look at the .txt file in firefox, you'll see several versions of the title: in the original Chinese, in a Pinyin transliteration (note the u-umlaut in the first word), in English, and in Arabic. With firefox, all are displayed correctly. I can also cat this file in a Mac Terminal window, and the titles render correctly. I can ssh to this FreeBSD system or to my linux box and cat the file, and they come out correct.
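One way to separate "is the file really UTF-8?" from "does this program display it right?" is to check the bytes directly, with no terminal or font involved. Below is a minimal Python sketch of that check; `describe_utf8` is a hypothetical helper, not part of any tool mentioned in this thread. It decodes strictly, so malformed UTF-8 raises an error instead of being papered over, and it lists the Unicode name of each distinct non-ASCII character it finds.

```python
import unicodedata


def describe_utf8(data: bytes) -> dict[str, str]:
    """Strictly decode `data` as UTF-8 and name each distinct non-ASCII character.

    bytes.decode() uses errors="strict" by default, so malformed UTF-8 raises
    UnicodeDecodeError instead of being silently replaced. A clean return means
    the encoding is sound; any garbling seen elsewhere is the viewer's doing.
    """
    text = data.decode("utf-8")
    names: dict[str, str] = {}
    for ch in text:
        if ord(ch) > 0x7F and ch not in names:  # skip plain ASCII
            names[ch] = unicodedata.name(ch, "<unnamed>")
    return names


if __name__ == "__main__":
    # Hypothetical sample text (Pinyin u-umlaut, one CJK ideograph, one Arabic
    # letter), not the actual titles from the tune file.
    sample = "L\u00fc \u4e2d \u0628".encode("utf-8")
    for ch, name in describe_utf8(sample).items():
        print(f"U+{ord(ch):04X}  {name}")
```

On a downloaded copy of the tune, `describe_utf8(open("GIS_4T_D_W.utf8.txt", "rb").read())` would either list the Chinese, Pinyin and Arabic characters by name or stop with a `UnicodeDecodeError` giving the offset of the first bad byte.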
I can also ssh from an xterm on my linux box to the other machines, cat the file, and it's fine. This seems to prove that Unicode fonts are installed, that the terminal emulators understand UTF-8, and that none of the software in the ssh+tcsh+cat chain breaks anything.

With Safari, which is an Apple browser, the Arabic is rendered badly. The right-to-left order is OK, but the letters are all initial forms, and they aren't connected. This isn't acceptable (in civilized Arabic society ;-). Getting the letter forms wrong makes the text very difficult to read.

The .ps file is an interesting case that fails badly with all the PS renderers that I have. Download it to disk and run the command:

    head -1115 GIS_4T_D_W.utf8.ps | tail -20

You'll see the PS versions of the titles, and if your Mac or linux box is like mine, the Chinese, Pinyin and Arabic titles will all be correct, if not very aesthetic. This proves that the abc2ps translator got the titles correct (and that the terminal emulator can handle the UTF-8 encoding even when it isn't explicitly labelled as such).

The .pdf file was derived from the .ps file by feeding it to ps2pdf, which comes with FreeBSD and linux; there's a pstopdf on Macs that seems similar. I don't know how to verify that the .pdf file has the titles correct.

But try downloading either the .ps or .pdf file via a browser. I just did it with Safari, which renders PDF inside a window. The three non-English titles are all trashed; the Chinese and Arabic are Latin-1 gibberish. If you download them with firefox, they get fed to various renderers. I've got them on my Mac's screen with both Preview and Acrobat, and both show the same Latin-1 gibberish as Safari does. So they're all making similar mistakes.

The evidence for what has gone wrong is in the Pinyin title, where the second letter, u-umlaut, comes out as "Ã¼" (an A with a tilde, followed by the 1/4 sign). The UTF-8 encoding of ü is the two bytes C3 BC, and in Latin-1 those two bytes are exactly Ã and ¼. So what they're apparently doing is interpreting the charset as Latin-1 (ISO 8859-1). Why would this be?
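The u-umlaut clue is easy to reproduce. Here is a minimal Python sketch (the function names are hypothetical, chosen for illustration) that encodes text as UTF-8 and then reads the bytes back one byte per character as ISO 8859-1, which is what the viewers appear to be doing, plus the reverse step that undoes the damage as long as no bytes were lost along the way.

```python
def misread_as_latin1(text: str) -> str:
    """Encode `text` as UTF-8, then wrongly decode the bytes as ISO 8859-1.

    Latin-1 maps every byte 0x00-0xFF to exactly one character, so this never
    fails; each multi-byte UTF-8 sequence simply turns into 2-4 wrong characters.
    """
    return text.encode("utf-8").decode("latin-1")


def repair_latin1_misreading(garbled: str) -> str:
    """Reverse the mistake: recover the original bytes, decode them as UTF-8."""
    return garbled.encode("latin-1").decode("utf-8")
```

For example, `misread_as_latin1("ü")` returns `"Ã¼"`, the same two characters that Preview, Acrobat and Safari show in the Pinyin title, and `repair_latin1_misreading("Ã¼")` returns `"ü"` again. A CJK ideograph, which takes three bytes in UTF-8, comes out as three Latin-1 characters.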
Well, the main way that a program learns the charset of a file downloaded via HTTP is from the HTTP headers. Here are the headers that this machine's apache server delivered to me:

    HTTP/1.1 200 OK
    Date: Sun, 19 Mar 2006 14:46:27 GMT
    Server: Apache/1.3.34 (Unix)
    Last-Modified: Sat, 11 Mar 2006 01:07:51 GMT
    ETag: "321f4b-4f1-441222e7"
    Accept-Ranges: bytes
    Content-Length: 1265
    Connection: close
    Content-Type: text/vnd.abc; charset=utf-8

We can see here that the content is clearly labelled "charset=utf-8", as we'd expect from the ".utf8." in the file name. So the browsers know that the text is UTF-8. But all the attempts to display the .ps file, whether via Preview, Acrobat, or the Safari browser, garble the titles the same way.

Anyway, I'm not too impressed by all this. It's not just that Apple gets it so wrong; so do Adobe and Mozilla programs. What's really annoying is that the Apple crowd just keeps chanting "It Just Works", even when I post examples like this and try to get them to explain why it fails so badly on our Powerbooks.

My wife has been using the Middle-East news as an excuse to improve her Arabic, and she does things like read Al Jazeera's Arabic pages. There are also a lot of local blogs in Arabic, only a few of which are also in English. Interesting stuff. But there's an ongoing frustration with getting software to work right. She also has a Windows box. It does some things right that her Mac messes up, but the MS software also garbles some other stuff that is fine on the Mac. Both can be summarized as "not quite there and frustrating as all hell".

One of my motives here is to extend this musical example. I'd like to have music files like this that mix languages, not just in the titles but also in the lyrics. Getting even the simplest examples to display right is an ongoing nightmare. In particular, I have a lot of songs in Hebrew and Yiddish, which aren't quite the typographical nightmare that Arabic is, but have similar problems.
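Part of what makes Arabic the hard case is contextual shaping: most letters have distinct isolated, initial, medial and final forms, and the renderer must pick one from each letter's position in the word. Showing every letter in its initial form, as Safari did with the song title, is what you get when that choice isn't made. The Python sketch below makes the choice visible by substituting Unicode presentation-form characters. It is an illustration only: modern renderers normally do this through the font's OpenType shaping rules, and the sketch assumes, as a loud simplification, that every letter is dual-joining, which ignores right-joining letters such as alef, dal, reh and waw. `positional_forms` and `all_initial_forms` are hypothetical names.

```python
import unicodedata


def positional_forms(word: str) -> str:
    """Map each letter of an Arabic word (logical order) to the presentation
    form its position calls for: isolated, initial, medial or final.

    SIMPLIFYING ASSUMPTION: every letter is treated as dual-joining. Real
    shaping uses per-letter joining types (Unicode ArabicShaping.txt); letters
    like alef, dal, reh and waw only join on one side and break the chain.
    """
    n = len(word)
    shaped = []
    for i, ch in enumerate(word):
        if n == 1:
            form = "ISOLATED"
        elif i == 0:
            form = "INITIAL"  # first letter in logical order = rightmost on screen
        elif i == n - 1:
            form = "FINAL"
        else:
            form = "MEDIAL"
        # e.g. "ARABIC LETTER BEH" + " INITIAL FORM" -> U+FE91
        shaped.append(unicodedata.lookup(f"{unicodedata.name(ch)} {form} FORM"))
    return "".join(shaped)


def all_initial_forms(word: str) -> str:
    """The failure seen in Safari: every letter in its initial form."""
    return "".join(
        unicodedata.lookup(f"{unicodedata.name(ch)} INITIAL FORM") for ch in word
    )


if __name__ == "__main__":
    word = "\u0628\u064a\u062a"  # beh, yeh, teh
    for label, shaped in (("correct", positional_forms(word)),
                          ("all-initial", all_initial_forms(word))):
        print(label, [unicodedata.name(c) for c in shaped])
```

For the three-letter word above, the correct run comes out as initial, medial and final forms, while the all-initial run repeats the initial form three times, so none of the letters connect at their trailing edge.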
I used Arabic in the above song title simply because it's known to be the worst case, so it's a slightly better test case than Hebrew would have been.

BTW, Textedit seems to work fine on Macs with multiple languages, other than the usual problem of getting the charset right at the start. But copying from a Textedit window to other windows fails so often that I'm often surprised when it works correctly.

It'd be interesting to see this example work on linux. I've been considering testing Ubuntu, since it's aimed at exactly this sort of multilingual user population. But it would be interesting to find some good info on how to do such things right on any system. And Macs are nice in some ways; it's too bad that the ad claims fall down so badly.

--
   _,
   O    John Chambers
 <:#/>  <jc at trillian.mit.edu>
   +    <jc1742 at gmail.com>
  /#\   in Waltham, Massachusetts, USA, Earth
  | |