Saturday, April 28, 2007

The Mystery of the Bogus Hebrew

I often field questions on how to get Hebrew to input and display correctly, but rarely does anyone ask me how to get rid of it. The primary example of the latter is a strange problem that occurs with iTunes, where Hebrew replaces normal English in some circumstances. You can see an example here.

I've never been able to figure out the details of why this occurs, but the cure, which someone else found by chance, is normally to remove any copies of the font Lucida Grande.ttf (NOT Lucida Grande.dfont) found on your machine.

Solving a Cyrillic Puzzle

Recently someone in the Apple forums had a problem with text he knew was in Russian but which would only display in Latin letters. It was just one word, "Dybvfybt". Since no encoding currently used for Cyrillic maps this script to ASCII, the text must have been made with some kind of custom font that did just that. But normally anyone making such a font would map the Cyrillic characters to their Latin equivalents, so the text should be recognizable as transliterated Russian, which it wasn't.

This stumped me, until I wondered whether someone had made a font designed especially to allow typing according to the standard Russian keyboard layout but on a Western keyboard. Instead of the QWERTY used in the U.S., the Russian layout goes YTsUKEN (ЙЦУКЕН). Sure enough, typing Dybvfybt on my U.S. keyboard, but with the layout set to Russian, produced a recognizable word, Внимание = Vnimanie = Attention.
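For anyone who wants to check a decoding like this without switching keyboard layouts, here is a minimal Python sketch. The key tables are my own transcription of the U.S. and standard Russian layouts (letter and punctuation keys only; ё on the ` key is left out), and the names `LATIN_KEYS`, `LATIN_SHIFTED`, `RUSSIAN_KEYS`, and `qwerty_to_russian` are purely illustrative.

```python
# Characters produced by a U.S. QWERTY keyboard, row by row (unshifted).
LATIN_KEYS = "qwertyuiop[]asdfghjkl;'zxcvbnm,."
# The same physical keys with Shift held down.
LATIN_SHIFTED = 'QWERTYUIOP{}ASDFGHJKL:"ZXCVBNM<>'
# What the standard Russian layout produces on those same keys (unshifted).
RUSSIAN_KEYS = "йцукенгшщзхъфывапролджэячсмитьбю"

# Build one translation table covering both the unshifted and shifted levels.
KEY_TABLE = str.maketrans(
    LATIN_KEYS + LATIN_SHIFTED,
    RUSSIAN_KEYS + RUSSIAN_KEYS.upper(),
)


def qwerty_to_russian(text):
    """Return what the same keystrokes would give with the Russian layout active."""
    return text.translate(KEY_TABLE)


print(qwerty_to_russian("Dybvfybt"))  # Внимание
```

Characters that are not in the tables (digits, spaces) pass through unchanged, which is only an approximation of the real layout.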

It's the first time I've seen Cyrillic text encoded that way, and I haven't yet found an available font that supports it.

Tuesday, April 24, 2007

Typing Kyrgyz

Kyrgyz is a Turkic language spoken in Kyrgyzstan and a few other areas. While both the Arabic and Latin scripts have been used to write it in the past, Cyrillic is now the standard. The alphabet is essentially the same as Russian but with 3 additional characters: Ң, Ү, Ө.

OS X comes with fonts that cover Kyrgyz, but with no keyboard layout. On my iDisk you can find two versions: KyrgyzCYR, which is the same as used on Windows machines, and KyrgyzPH, which is modeled on the Apple Russian-Phonetic layout and may be easier for people used to QWERTY. The extra three characters are found on the Option and Option + Shift levels for н, у, and о.

Switching from Cyrillic back to Latin (which was used for a number of years before 1940) is reportedly being considered.

Tuesday, April 17, 2007

Pitfalls of Working with Complex Scripts

It's not unusual to have to do publishing or design work in languages and scripts you do not understand. In some scripts, if you use the wrong font or app, you can easily generate nonsense without realizing it. So having a native speaker (or at least someone who knows the script) check things can be important. Below are some examples of common pitfalls you may run into. In particular, be aware that MS Word for Mac, even the 2008 version, does not yet support correct display of these scripts.

Arabic script, used for many languages besides Arabic, can wind up disconnected or backwards.

Indic scripts, used for example for Hindi/Sanskrit, Punjabi (Gurmukhi), Gujarati, Tamil, and Tibetan, can easily wind up with letters in the wrong order, overlapping, or uncombined (when combination is mandatory).

Thai and other S.E. Asian languages don't use spaces to separate words, so line breaking can occur in totally wrong places. Apple Cocoa apps can access a dictionary built into OS X that enables them to do Thai line breaking fairly well, but MS, Adobe, and other apps cannot.

Monday, April 16, 2007

Typing Catalan

Catalan is a co-official language along with Spanish in the region on the eastern edge of Spain. It uses the same alphabet as Spanish except that it has no ñ and adds the digraph l·l (ela geminada).

OS X 10.3 had a Catalan keyboard layout which was identical to Spanish-ISO except for the flag icon. It no longer seems to be present in my copy of 10.4. From my iDisk you can download the Catalan folder, which contains the .keylayout file and the .icns file with the right flag. The layout is almost the same as Spanish-ISO. The middle dot · (U+00B7), which is the standard character for producing l·l, is at shift + 3. Also included are ŀ (U+0140) and Ŀ (U+013F) at option + o and option + shift + o, in case you want to use these versions for local printing or similar purposes.
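If you end up with text that mixes the two ways of writing the ela geminada, a small Python sketch like the one below can convert the single-character forms back to the standard l + middle dot sequence. It relies on the fact that Unicode gives U+0140 and U+013F compatibility decompositions to l/L followed by U+00B7; the function name `standardize_ela_geminada` is just an illustrative choice.

```python
import unicodedata

MIDDLE_DOT = "\u00b7"        # · the standard character used in l·l
L_DOT_SMALL = "\u0140"       # ŀ single-character form
L_DOT_CAPITAL = "\u013f"     # Ŀ single-character form


def standardize_ela_geminada(text):
    """Replace ŀ/Ŀ with l/L followed by the middle dot, leaving all else alone."""
    return (
        text.replace(L_DOT_SMALL, "l" + MIDDLE_DOT)
            .replace(L_DOT_CAPITAL, "L" + MIDDLE_DOT)
    )


# The targeted replacement matches what NFKC normalization does for these
# two characters (NFKC also changes many other characters, so it is not
# applied to the whole text here).
assert unicodedata.normalize("NFKC", L_DOT_SMALL) == "l" + MIDDLE_DOT

print(standardize_ela_geminada("co\u0140legi"))  # col·legi
```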

Thursday, April 12, 2007

Typing Kazakh

Kazakh is a Turkic language spoken in Kazakhstan and a few other areas. While both the Arabic and Latin scripts have been used to write it in the past, Cyrillic is now the standard. The alphabet is essentially the same as Russian but with 9 additional characters: Ә, Ғ, Қ, Ң, Ө, Ұ, Ү, Һ, І.
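When checking whether a font or keyboard layout really covers these letters, it helps to have their Unicode code points at hand. The short Python sketch below prints them using the standard `unicodedata` module; the names `KAZAKH_EXTRA` and `describe` are only illustrative.

```python
import unicodedata

# The nine letters Kazakh adds to the Russian alphabet (capital forms):
# Ә, Ғ, Қ, Ң, Ө, Ұ, Ү, Һ, І
KAZAKH_EXTRA = "\u04d8\u0492\u049a\u04a2\u04e8\u04b0\u04ae\u04ba\u0406"


def describe(letters):
    """Return (capital, small, code point, Unicode name) for each letter."""
    return [
        (ch, ch.lower(), f"U+{ord(ch):04X}", unicodedata.name(ch))
        for ch in letters
    ]


for capital, small, code_point, name in describe(KAZAKH_EXTRA):
    print(capital, small, code_point, name)
```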

OS X comes with fonts that cover Kazakh, but with no keyboard layout. On my iDisk you can find two versions: KazakhCYR, which is the same as used on Windows machines, and KazakhPH, which is modeled on the Apple Russian-Phonetic layout and may be easier for people used to QWERTY.

Switching from Cyrillic back to Latin (which was used for a number of years before 1940) is reportedly under consideration by Kazakhstan.

Sunday, April 1, 2007

Publishing with Complex Scripts

Those wishing to do serious desktop publishing, including books, in scripts like Arabic, Devanagari, or Tamil in OS X are handicapped by the fact that standard applications like Word, InDesign, and Quark cannot yet render OS X Unicode fonts for these languages correctly. Alternatives are the limited TextEdit or Pages (which is buggy for Arabic), or using Windows fonts with OpenOffice. A special ME (Middle East) version of InDesign does exist for Arabic/Hebrew.

Recently I came across a DTP program called iCalamus that appears to combine a wide range of DTP capabilities with proper rendering of complex scripts. Users with this kind of requirement may want to download the free trial here and give it a try. Unfortunately iCalamus does not yet support direct input and editing of Indic scripts; you have to copy/paste your text from TextEdit or another app where this can be done. A somewhat similar app which does support direct input is Create.