Tuesday, January 30, 2007

Typing Kurdish

Kurdish (كوردی) is the official language of the Kurdistan Region of Iraq, and ever-increasing amounts of Kurdish text are being produced in Arabic script. While OS X 10.4 does not include a Kurdish keyboard layout, there is one you can download from my iDisk. A large number of free Kurdish fonts have recently been made available by KurdITGroup. Unfortunately these are for Windows, but you should be able to use them correctly on a Mac with the applications Mellel or OpenOffice. For other apps, the normal OS X default font, Geeza Pro, probably does not do totally correct Kurdish orthography, and it is better to use one of the SIL fonts, Lateef AAT or Scheherazade AAT.

Friday, January 26, 2007

Want Your Browser in Your Own Language?

For most people, the web browser is the most frequently used piece of software, and having it in one's native language would be nice. Different browsers offer different possibilities in that regard. Safari comes with the 15 localizations standard in OS X, and you may be able to find other non-official versions by searching MacUpdate. Opera offers the same 15. Firefox has the best selection, with over 3 dozen localizations available, including Arabic, Hebrew, Basque, Kurdish, Mongolian, Greek, Russian, Georgian, Lithuanian, Punjabi, and Turkish. OmniWeb has only Danish, Dutch, English, French, German, Japanese, and Swedish. Camino has Danish, Dutch, French, German, Italian, Japanese, Korean, Lithuanian, Polish, Portuguese, Russian, Slovak, Spanish, and Swedish. Mozilla Suite has Traditional Chinese, French, German, Polish, Swedish, and Turkish. iCab has 8 of the OS X languages plus Russian. Netscape and AOL seem to be only in English.

Thursday, January 25, 2007

Translation and Localization Tools

If you are a professional translator using a Mac, you may be interested in tools for CAT (Computer Assisted Translation). One of these is a specialized word processor that makes provision for keeping source and target text carefully organized and has a "translation memory" to automatically help the translator use earlier work to process new text. Two such applications for OS X users are AppleTrans and OmegaT. An excellent survey of the field can be found at the Wikipedia CAT page.

One important area for translation work is the localization of OS X applications, i.e. providing the means for the menus and dialogues of a program to be displayed in the user's native language. For this special tools are required to extract and replace the parts of the application containing the texts. Apple provides the programs AppleGlot and ADViewer for this purpose. There are also 3rd-party alternatives including LocFactoryEditor, iLocalize, and Localization Suite.

Monday, January 22, 2007

Spell-Checking in Other Languages

A long time ago in a galaxy far, far away, elementary education drilled correct spelling into you so thoroughly it was hard to forget it. These days computer spell-checking allows those brain cells to be used for other tasks. OS X comes with spell-checking for Australian, British, and Canadian English, German, Spanish, French, Italian, Dutch, Portuguese, and Swedish. (Note: Leopard adds Danish and Russian.) But what if you need a different language?

CocoAspell is one answer. First you install it, then download the dictionary you want from the list of several dozen. Decompress the file with Stuffit Expander, then just put the resulting folder in /Library/Application Support/cocoAspell/. Enable your dictionary by going to System Preferences/Spelling (a new item created by CocoAspell).

Another option for Hebrew is Hebrew Spelling Service. For Finnish you can try Soikko.

A commercial alternative is SpellCatcher X, which has a dictionary/thesaurus for English, French, German, Italian, Spanish, Swedish, Dutch, Portuguese, and Danish.

Sunday, January 21, 2007

Getting Your Mac to Speak Other Languages

OS X includes excellent text-to-speech capabilities, activated in System Preferences/Speech/Text-To-Speech or by pressing Command + F5 to turn on VoiceOver (System Preferences/Universal Access/Seeing/VoiceOver). Unfortunately the supplied voices only do English. Users who need other languages have to acquire third-party voices and sometimes use non-Apple applications.

Speechissimo and Cepstral offer French, German, Italian, and Spanish.

AssistiveWare has several different products. TextParrot offers French, German, Italian, Danish, Dutch, Finnish, Flemish, Spanish, Portuguese, Norwegian, and Swedish. For more features, there are Infovox iVox, VisoVoice, and Proloquo.

The application Key5 has a Chinese text-to-speech module.

DTalker is said to be able to do Japanese text-to-speech.

OS X 10.5 Leopard, to be released in the spring of 2007, is expected to include expanded support for foreign language add-on speech synthesizers, including Chinese and Japanese.

Friday, January 19, 2007

Advanced Japanese Input

Japanese is no doubt one of the most complex of all languages to write, since it can use no fewer than four different scripts: Kanji (Chinese characters), Hiragana, Katakana, and Latin.

The Hiragana and Katakana syllabaries play an especially important role, because their Latin equivalents are used for Japanese computer input (with subsequent conversion to Kanji as appropriate) via a Latin keyboard, and they are also used to represent the pronunciation of non-Japanese words or possibly unfamiliar Kanji.

There are well over 150 kana syllables which can be created by the Mac Kotoeri Japanese Input Method, some fairly rare, and info on how to make all of them is buried in the Japanese-only Kotoeri Help. Anyone who needs this in more usable form can find a copy of the list here, which you can enlarge in your browser or drag onto your desktop to print for reference.

Sunday, January 14, 2007

Typing Navajo

Navajo uses the Latin letters of English plus a number of extras that make it not that easy to render in ordinary print: Łł Ńń Áá Éé Íí Óó Ąą Ęę Įį Ǫǫ Ą́ą́ Ę́ę́ Į́į́ Ǫ́ǫ́. The last four do not have any precomposed version in Unicode, which means some apps will probably not place the two required diacritics quite correctly. In addition, Navajo makes heavy use of the glottal stop character, ʼ , which is probably best represented by the modifier letter apostrophe (U+02BC). Some people have used the ASCII apostrophe (U+0027) or the right single quote (U+2019) instead, but these are punctuation marks rather than real letters and it is better to avoid them.
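To see what "no precomposed version" means in practice, here is a small Python sketch using the standard unicodedata module. It is only an illustration: the helper name nfc_length is made up for this example, and it assumes the letters are typed as a base vowel followed by combining ogonek (U+0328) and combining acute (U+0301).

```python
import unicodedata

OGONEK = "\u0328"  # COMBINING OGONEK
ACUTE = "\u0301"   # COMBINING ACUTE ACCENT


def nfc_length(text):
    """Return the number of code points left after NFC normalization.

    Hypothetical helper for this illustration only.
    """
    return len(unicodedata.normalize("NFC", text))


# a + ogonek composes into the single precomposed letter U+0105.
a_ogonek = "a" + OGONEK
# a + ogonek + acute has no precomposed form, so NFC still needs
# a separate combining acute after U+0105.
a_ogonek_acute = "a" + OGONEK + ACUTE

print(nfc_length(a_ogonek))
print(nfc_length(a_ogonek_acute))

# Compare the three candidates for the glottal stop: only U+02BC is
# classed as a letter (Lm); the other two are punctuation (Po, Pf).
for ch in ("\u02BC", "\u0027", "\u2019"):
    print(f"U+{ord(ch):04X}", unicodedata.category(ch), unicodedata.name(ch))
```

The general category is the reason to prefer U+02BC: software that breaks text into words usually treats punctuation as a word boundary and letters as part of the word.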

Typing Navajo can be done using the US Extended keyboard layout in OS X. The glottal stop ʼ is made via Option + i, then space.

Navajo characters (with the addition of ṉ) can also be used for the closely related language Western Apache.

For an example of Navajo text, see this copy of a 1956 school reader.

For an example of Navajo on the Web, see here.

For a specialized keyboard, try Languagegeek's.

Wednesday, January 10, 2007

Multilingual iPhone?

Watching the very impressive demo of the new Apple iPhone, I was wondering whether the version of OS X (as well as the email client and the browser) incorporated in it will support the same multilingual features that 10.4 does. I haven't seen any mention of this in reports from MacWorld or elsewhere so far. It would be cool to point the browser at a test site like this one and see what all can at least be displayed.

Monday, January 8, 2007

Typing Tagbanwa

Tagbanwa is a language spoken by a few thousand people in the Philippines. Although its script became part of Unicode with version 3.2, only recently do we have a font, thanks to Samuel Thibault. You can also find an OS X keyboard for Tagbanwa on my iDisk.

Saturday, January 6, 2007

Your Multilingual iPod

While the iPod still lacks the capability to display Arabic, Hebrew, Hindi, Thai, Vietnamese, and various other complex scripts, it is fully Unicode-savvy for the 28 languages that are listed in its tech specs. You can display all of them on a single page of plain text as long as it is encoded in UTF-16. If you want to be able to demonstrate this for yourself, download the file ipodlangs16.txt from my iDisk, and put it in the Notes folder of your iPod. This is a selection from the UTF-8 Sampler Page, which provides the phrase "I can eat glass and it doesn't hurt me" in many languages.

The iPod can also display multilingual text in UTF-8 encoding as long as the text starts with a BOM (Byte Order Mark). TextEdit does not create UTF-8 with a BOM, so you need to save the text with a program like TextWrangler which has that option. If you are curious about what a BOM does, see here.
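If you would rather produce such a file with a script than with a text editor, a minimal Python sketch follows. The function names and the sample string are made up for this illustration, and I have not checked which UTF-16 byte order the iPod prefers; Python's "utf-16" codec simply writes a BOM followed by the machine's native byte order.

```python
import codecs


def encode_note_utf8_bom(text):
    """Encode text as UTF-8 preceded by the 3-byte BOM (EF BB BF)."""
    return text.encode("utf-8-sig")


def encode_note_utf16(text):
    """Encode text as UTF-16; Python's "utf-16" codec prepends a BOM."""
    return text.encode("utf-16")


# Hypothetical sample text, standing in for a real note.
sample = "Ég get etið gler án þess að meiða mig."

utf8_bytes = encode_note_utf8_bom(sample)
utf16_bytes = encode_note_utf16(sample)

# Show the leading bytes so the BOM is visible.
print(utf8_bytes[:3] == codecs.BOM_UTF8)
print(utf16_bytes[:2] in (codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE))
```

To save a note, write the bytes in binary mode, for example open("note.txt", "wb").write(utf8_bytes), then copy the file to the Notes folder.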

iPod notes can only be 4K in length, but you can break longer texts into pieces of the right size with links from page to page using a program like Book2Pod.
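The splitting itself is easy to sketch in Python. This is not how Book2Pod works, just an illustration under two assumptions: that "4K" means 4096 bytes, and that the notes are stored as UTF-8. The function name split_note is made up for this example. The sketch does not add the page-to-page links, and it does not reserve room for a BOM, so a somewhat lower limit may be wise in practice.

```python
def split_note(text, max_bytes=4096):
    """Split text into pieces whose UTF-8 encoding fits within max_bytes.

    Hypothetical helper; the 4096-byte default is an assumption about
    what the "4K" note limit means. Characters are never cut in half.
    """
    chunks = []
    current = []
    size = 0
    for ch in text:
        n = len(ch.encode("utf-8"))  # 1 to 4 bytes per character
        if current and size + n > max_bytes:
            # Adding this character would overflow: close the chunk.
            chunks.append("".join(current))
            current, size = [], 0
        current.append(ch)
        size += n
    if current:
        chunks.append("".join(current))
    return chunks


# Each "é" takes 2 bytes in UTF-8, so a 5-byte limit fits 2 per chunk.
pieces = split_note("é" * 10, max_bytes=5)
print(len(pieces))
```

Counting bytes rather than characters matters for multilingual text, since most non-Latin characters take two or three bytes each in UTF-8.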

Monday, January 1, 2007

Comparing MS Windows Internationalization

Info on the current relatively good state of OS X localization can be found here. MS Win XP has, by contrast, only been available to consumers in one language at a time. But Win Vista, which should be out at the end of January, 2007, promises a new Multilingual User Interface (MUI) which is likely to equal OS X capabilities and also go further by adding Arabic, Hebrew, Russian, Czech, Hungarian, Polish, Turkish, Greek, and possibly more. On the other hand, the new MUI will apparently only be provided with Vista Ultimate (retail price $400) and not with the cheaper Home Basic, Home Premium, and Business editions.

A list of various types of Vista language packs and input keyboards can be found here.

As for the Zune, I understand that its current software only does Latin script, which makes the device very limited indeed compared to the iPod.

Typing Multilingual Text in Terminal

I personally don't have much need to read/write Unicode in OS X's Unix command-line environment, but some people do find this useful or essential. Whether you can get it to work satisfactorily depends on the languages and apps you are using and probably also on the fonts you have. Getting started involves setting Terminal's preferences correctly and creating .inputrc and .profile files in your Home directory to change the bash shell default behavior. Details on how to do this can be found here. That may be enough for some purposes. If not, some suggestions for further refinements are here.