Friday, December 29, 2006

????'s When You Post in a Forum With Safari?

Someone in the Apple discussions asked why his Greek and Cyrillic postings in another forum turned into question marks when he used Safari but displayed correctly when he used Firefox.

This occurs because some forums set the encoding of their web pages to Latin-1 even though members are expected to post in languages that charset cannot cover. In that case, each non-Latin character has to be converted into a "numeric character reference" (NCR) escape code. For example, Greek alpha α becomes &#945;, where the number is the character's Unicode code point in decimal. This is essentially an HTML kludge that dates from years ago, when computer and internet technology was so limited that the only safe way to display non-Latin characters was to convert them to such ASCII-only codes.
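The conversion itself is simple: each character is replaced by `&#`, its decimal code point, and `;`. Here is a minimal Python sketch, assuming you want every non-ASCII character escaped. ASCII is the same in every encoding, so escaping everything else is the safest choice. The function name `to_ncr` and its `keep_up_to` parameter are just for illustration.

```python
def to_ncr(text: str, keep_up_to: int = 0x7F) -> str:
    """Replace every character above keep_up_to with a decimal NCR such as &#945;."""
    out = []
    for ch in text:
        cp = ord(ch)  # the Unicode code point as an integer
        if cp > keep_up_to:
            out.append(f"&#{cp};")
        else:
            out.append(ch)
    return "".join(out)


print(to_ncr("α"))       # &#945;
print(to_ncr("Привет"))  # one NCR per Cyrillic letter
```

Passing `keep_up_to=0xFF` would leave Latin-1 letters such as é alone and escape only what Latin-1 cannot hold.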

It happens that Firefox, when faced with a page whose encoding cannot represent the characters being typed, automatically sends NCRs instead of the real characters. Safari does not do that, so what it puts into the forum appears as question marks when viewed as Latin-1.
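Python's codecs can model the two outcomes. This is only an illustration of the end result, not of what either browser actually does internally:

- `errors="xmlcharrefreplace"` turns every character missing from Latin-1 into an NCR, which resembles what Firefox submits.
- `errors="replace"` puts a `?` in place of each such character, which matches what the forum shows for Safari's input.

```python
greek = "Καλημέρα"  # "good morning"; none of these letters exist in Latin-1

# Firefox-like outcome: characters outside Latin-1 become decimal NCRs
as_ncr = greek.encode("latin-1", errors="xmlcharrefreplace")

# The outcome seen with Safari: each unmappable character becomes "?"
as_qmarks = greek.encode("latin-1", errors="replace")

print(as_ncr)     # starts with b'&#922;&#945;'
print(as_qmarks)  # one "?" per Greek letter
```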

If you must use Safari, download and install UnicodeChecker, set its Preferences/XHTML option to Decimal, and use it to convert your non-Latin text into NCRs before posting. To do this, select the text and go to Safari > Services > Unicode > Unicode to HTML Entities.

Forums intended to accommodate languages beyond English and those of Western Europe should, if at all possible, use the UTF-8 encoding rather than Latin-1. The Apple forums themselves are a good example of the correct approach: there, Safari works perfectly for input in any language you want.

Tuesday, December 26, 2006

Foreign Language Broadcasts via Internet

For listening to foreign language news and other broadcasts, internet radio is taking the place that used to be occupied by short-wave listening for many people. A Google search will quickly bring you to the web pages of a huge number of stations that offer their live broadcasts in RealAudio, Windows Media, or other formats. If you want a more systematic way of doing this, I recommend the Reciva Radio Portal, which has over 5000 stations in its database and lets you create customized lists of the ones you want to listen to.

For something more radio-like, check out the AE Wifi Internet Radio, which is essentially a small dedicated computer that connects to the Reciva Portal over your local wireless network. It's available from various sources in the U.S. I got one for Christmas and it works beautifully.

Sunday, December 24, 2006

Writing Phags-Pa

Phags-pa is a script once used for Mongolian and Chinese and occasionally today for Tibetan. Andrew West has just produced a font for this which can be downloaded here. A keyboard layout for OS X which mirrors Andrew's version for Windows can be gotten from my iDisk.

Unfortunately, as far as I can tell, neither OS X nor X11 apps can yet display Phags-pa fully correctly; only Windows can. Here is a test page.

Update Nov. 2007: OS X 10.5 Leopard can display Phags-pa correctly in Safari and Pages, but not in TextEdit.

Thursday, December 21, 2006

Where Is The Romanian S-Comma?

A Mac user in California asked today where they could find the ș (s-comma) needed for Romanian in the OS X keyboard layout for that language. The answer is that this character happens to be on a key which does not exist on keyboards sold in the US (known as ANSI or 101 keyboards), but only on keyboards sold in Europe (known as ISO or 102).

The solution is to download and install an alternative Romanian layout. Two sources for one can be found here.

How could Apple do this? I don't have the details, but I think it may be because Macs sold in Romania have a physical keyboard labeled for use in more than one country/language in the region, which required some compromises in key placement. The ș wound up on the extra key, and the layout provided with OS X has to follow that, so Romanians in Romania can type their own language.

Wednesday, December 20, 2006

Problems Using PC Keyboards?

Recently someone was trying to type on his Mac with a PC Arabic keyboard and reported that the layout did not match what OS X was using. Unfortunately that's true for a lot of languages other than US English -- Mac and PC keyboards may have somewhat different layouts, and OS X only has software for the Mac versions. Since the creation of the Mac Mini more people seem to be using PC keyboards.

The solution is to install custom layouts that match the PC models. On my iDisk you can find some for Arabic, Russian, Azeri, Urdu, Mongolian, and Tamil. For various European languages, you can get layouts that may work better with PC keyboards from Logitech, as explained here. Beyond that, you can use Ukelele to make your own.

Monday, December 18, 2006

Macedonian Keyboard Error

Yesterday a poster in the Apple Forums pointed out that the Macedonian keyboard layout supplied by Apple has a mistake: the character з (U+0437) has been replaced by э (U+044D). The uppercase version is correct, however. The layout also has a dead key that does not belong there. A corrected keyboard, Macedonianz.keylayout, is available on my iDisk.
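A .keylayout file is plain XML, so a stray character like this is easy to hunt for. Below is a minimal sketch using Python's standard ElementTree parser. A few assumptions apply:

- The `SAMPLE` fragment and its key codes are hypothetical and are not taken from Apple's Macedonian layout.
- `find_output` is an illustrative name.
- The search only checks `output` attributes. It does not follow keys that trigger dead-key `action`s.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment in .keylayout form; real files also have a <layouts>
# section, modifier maps, more keyMaps, and dead-key actions.
SAMPLE = """<keyboard group="7" id="-1" name="Example">
  <keyMapSet id="ANSI">
    <keyMap index="0">
      <key code="6" output="&#x044D;"/>
      <key code="7" output="&#x0437;"/>
    </keyMap>
  </keyMapSet>
</keyboard>"""


def find_output(root, char):
    """Return (keyMapSet id, keyMap index, key code) for every key whose output is char."""
    hits = []
    for key_map_set in root.iter("keyMapSet"):
        for key_map in key_map_set.iter("keyMap"):
            for key in key_map.iter("key"):
                if key.get("output") == char:
                    hits.append((key_map_set.get("id"), key_map.get("index"), key.get("code")))
    return hits


root = ET.fromstring(SAMPLE)  # the parser decodes &#x044D; into the real character
print(find_output(root, "\u044d"))  # where э (U+044D) is typed
print(find_output(root, "\u0437"))  # where з (U+0437) is typed
```

For a real file, `ET.parse("SomeLayout.keylayout").getroot()` (placeholder file name) would take the place of `ET.fromstring(SAMPLE)`.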

I checked the Panther layout and found the same errors. It seems strange I've never seen this reported earlier. Is it possible that before now no one has used OS X to type Macedonian since Panther was released?

Friday, December 15, 2006

Writing Ancient Egyptian

The most common ways of representing Ancient Egyptian are hieroglyphs and their Latin transcription.

The total number of hieroglyphs, as recorded for example in Hieroglyphica, is nearly 7000. They have not yet been encoded in Unicode, though a proposal covering a basic set of about 1200 of them is in the works. In the meantime, the solution is to use custom non-Unicode fonts along with special editing programs that let you arrange the symbols in the various ways they naturally occur. On OS X, MacScribe can be used for this. An example of how it works can be seen here.

For Latin transcription alone, Unicode is possible. The most common standard alphabet currently used has 24 letters (all consonants, no vowels):

ȝ ỉ y ʿ w b p f m n r h ḥ ḫ ẖ s š ḳ k g t ṯ d ḏ

10 of these are not found in English. They can all be entered via the Character Palette in OS X, but it is a lot easier to use a custom keyboard, such as the EgyptTrans.keylayout you can download from my iDisk. There are also transcription systems which use only ASCII, such as found in the Manuel de Codage. MacScribe uses a system like this for input.
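If you want to know exactly which characters a font or keyboard must supply, you can list their code points. Here is a small Python sketch that does this for the 24-letter alphabet above, using the standard `unicodedata` module. It also confirms that 10 of the letters fall outside ASCII.

```python
import unicodedata

ALPHABET = "ȝ ỉ y ʿ w b p f m n r h ḥ ḫ ẖ s š ḳ k g t ṯ d ḏ".split()

special = [letter for letter in ALPHABET if not letter.isascii()]
print(len(ALPHABET), "letters,", len(special), "outside ASCII")  # 24 letters, 10 outside ASCII

for letter in special:
    # A letter may be stored as one precomposed code point or as base + combining mark
    points = " + ".join(f"U+{ord(c):04X}" for c in letter)
    names = " + ".join(unicodedata.name(c, "?") for c in letter)
    print(letter, points, names)
```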

Egyptian was also written in the hieratic and demotic scripts and in Coptic. For the first two I am unaware of any Unicode proposals or fonts, but Coptic is in Unicode 4.1 and you can download a Coptic2005.keylayout from my iDisk. You also need one of the fonts that covers Coptic -- ALPHABETUM Unicode, Code2000, MPH 2B Damase, or New Athena Unicode.

Friday, December 8, 2006

Writing Esperanto

Esperanto, invented in 1887 by L. L. Zamenhof, is the most popular of the various artificial languages. For good info, check out the Wikipedia article.

Esperanto uses essentially the same alphabet as English, but with the extra letters ĉ, ĝ, ĥ, ĵ, ŝ and ŭ. To type these in OS X, you can activate the US Extended keyboard layout in System Preferences/International/Input menu. The letters with the ^ (circumflex) over them are typed with Option + 6 followed by the letter itself. The ŭ (u-breve) is made with Option + b followed by u.
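A dead key just means "put this accent on the next letter". The Python sketch below models the two dead keys mentioned above by adding a Unicode combining mark and normalizing to NFC. This gives the single precomposed letter. It is a model only: OS X implements dead keys with tables inside the keyboard layout, not with normalization. The names `DEAD_KEYS` and `dead_key` are illustrative.

```python
import unicodedata

# Hypothetical model of the US Extended dead keys described above
DEAD_KEYS = {
    "option-6": "\u0302",  # COMBINING CIRCUMFLEX ACCENT
    "option-b": "\u0306",  # COMBINING BREVE
}


def dead_key(combo: str, base: str) -> str:
    """Return the precomposed letter for a dead key followed by a base letter."""
    return unicodedata.normalize("NFC", base + DEAD_KEYS[combo])


letters = [dead_key("option-6", c) for c in "cghjs"] + [dead_key("option-b", "u")]
print("".join(letters))  # ĉĝĥĵŝŭ
```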

You can also download an Esperanto keyboard layout from my iDisk, which will let you type the special letters more easily (the accented characters are at Option + the base character).

Wednesday, November 29, 2006

Typing Tifinagh

Tifinagh is a script used for some Berber languages, in particular Tamazight in Morocco. For more info see this page. An experimental keyboard, Tifinagh.keylayout, is available on my iDisk.

Fonts which contain Tifinagh include Code2000, Hapax Berbère, Hapax Touareg, Hapax Touareg DàG, MPH 2B Damase.

Tuesday, November 28, 2006

Typing Shavian

Shavian is a script named after G. B. Shaw, designed to represent English with simple, phonetic characters. For more info see this page. "Shavian" written in Shavian looks like this:

·𐑖𐑱𐑝𐑾𐑯

There are only three fonts that contain the Unicode Shavian range: Code2001, Andagii, and MPH 2B Damase. For input I have made an experimental Shavian.keylayout available on my iDisk.
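Since Shavian lives outside the Basic Multilingual Plane (U+10450 to U+1047F), it can help to see the actual code points. The Python sketch below builds the word from the official Unicode letter names. It assumes the usual Shavian spelling of the name: the letters sure, age, vow, ian, nun, preceded by the "namer dot" (U+00B7) that marks proper names.

```python
import unicodedata

# "Shavian" spelled letter by letter, looked up by Unicode character name
LETTERS = ["SURE", "AGE", "VOW", "IAN", "NUN"]
word = "".join(unicodedata.lookup(f"SHAVIAN LETTER {name}") for name in LETTERS)

print("\u00b7" + word)                      # namer dot + the five letters
print([f"U+{ord(c):X}" for c in word])      # all five lie above U+FFFF
```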

Monday, November 27, 2006

Typing Mongolian (Cyrillic)

Mongolian Cyrillic uses the same alphabet as Russian, but with two extra characters, Өө and Үү. Unfortunately these are not included as options in any of the Cyrillic keyboard layouts included with OS X, so you need to use the Character Palette or install a custom layout. You can download MongolianCYR and MongolianQWERTY layouts that do have them here.

Lucida Grande is the only font that comes with OS X that has the two extra characters. If you have Office 2004, then the Arial, Monaco, Times, and Times New Roman that come with it should also work. Others you can download are Everson Mono, Charis SIL, Doulos SIL, Code2000, and Gandhari Unicode. Some MS Office Chinese fonts have them, but these are double-width.
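If you want to test some other font yourself, the question is simply whether its character map contains the four code points U+04E8, U+04E9, U+04AE and U+04AF. Here is a minimal Python sketch of that check. `missing_letters` is an illustrative name, and `example_cmap` is made-up data rather than any real font.

```python
# The four letters Mongolian adds to the Russian alphabet
MONGOLIAN_EXTRA = "\u04e8\u04e9\u04ae\u04af"  # Ө ө Ү ү


def missing_letters(cmap, letters=MONGOLIAN_EXTRA):
    """Return the letters with no entry in a font's character map.

    cmap is any mapping from code point (int) to glyph name.
    """
    return [ch for ch in letters if ord(ch) not in cmap]


# Hypothetical cmap for a font that has Ө ө but lacks Ү ү
example_cmap = {0x04E8: "uni04E8", 0x04E9: "uni04E9"}
print(missing_letters(example_cmap))  # ['Ү', 'ү']
```

With the third-party fontTools package, `TTFont("SomeFont.ttf").getBestCmap()` (placeholder file name) returns such a mapping for a real TrueType or OpenType file.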

Some users have reported that the keyboard works in every app except MS Word. If that is a problem for you, try typing some Mongolian in another app and then copy/pasting into Word. This may force it to recognize the keyboard.

Saturday, November 25, 2006

Typing Yi/Lolo

Yi (or Lolo) is spoken by 4-5 million people in SW China and is written using 1165 characters representing individual syllables. For some interesting info, including lists of the syllables and sample text, see the Babelstone Yi Page.

The logical way to type Yi is with an input method like those used for pinyin Chinese. OS X includes a facility for creating custom IMs, so I made an experimental one for Yi. You type the Latin letters for a syllable, hit Return, and the corresponding character is placed in your text.

To install this IM, download the file yi16.txt.dat from my iDisk. Then use the "Generate IM Plug-in" command in the Traditional Chinese input method menu, and then select Yi in the input menu. If you are using a version earlier than Tiger, the procedure is different; see this page for details.
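At its core, such an input method is a lookup table from Latin spellings to Yi syllables. As a rough Python sketch, the table can be built from the Unicode character names of the Yi Syllables block (U+A000 to U+A48C). That block holds exactly the 1165 syllables, and each name ends in the Latin spelling, e.g. "YI SYLLABLE IT". This is a stand-in for the idea only. It is not the format of yi16.txt.dat, and it omits the candidate window a real IM shows. `YI_TABLE` and `yi_input` are illustrative names.

```python
import unicodedata

PREFIX = "YI SYLLABLE "

# Build a syllable -> character table from the Unicode names of the Yi block
YI_TABLE = {}
for cp in range(0xA000, 0xA48D):
    name = unicodedata.name(chr(cp), "")
    if name.startswith(PREFIX):
        YI_TABLE[name[len(PREFIX):].lower()] = chr(cp)


def yi_input(latin_syllables: str) -> str:
    """Convert space-separated Latin syllables to Yi; unknown input passes through."""
    return "".join(YI_TABLE.get(s, s) for s in latin_syllables.lower().split())


print(len(YI_TABLE))           # 1165
print(yi_input("it ix i ip"))  # the first four syllables of the block
```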

Friday, November 24, 2006

For Unicode Sanskrit, Try OpenOffice

Doing Unicode Sanskrit in OS X apps like TextEdit or Pages faces two problems. First, you are restricted to the one font Devanagari MT. Second, this font cannot produce some conjunct forms or correctly handle the stress marks of Vedic texts. My tests indicate that OpenOffice/X11 (not OpenOffice 3) can use Windows fonts to display correct Devanagari. Some of these fonts, like Sanskrit 2003, position the stress marks as they should be and also form proper conjuncts. A screenshot of the first line of the Rig Veda showing the stress marks can be found here.
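Why the font matters becomes clear from how the text is stored. A conjunct is encoded as consonant + virama + consonant, and the stress marks are separate combining characters. Turning these sequences into the right shapes is entirely the font's (and the app's) job. The Python sketch below lists the code points for the conjunct क्ष and the two Devanagari stress signs. These are just convenient examples: "Lo" is a letter and "Mn" a non-spacing combining mark.

```python
import unicodedata

KSSA = "\u0915\u094d\u0937"    # क्ष: KA + VIRAMA + SSA, drawn by the font as one conjunct
STRESS_MARKS = "\u0951\u0952"  # Devanagari stress signs, stored after the syllable they mark

for ch in KSSA + STRESS_MARKS:
    print(f"U+{ord(ch):04X}  {unicodedata.category(ch)}  {unicodedata.name(ch)}")
```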

Aurabesh?

Aurabesh is not a language or script, but an alternative alphabet for English which is used in the Star Wars Saga. It is not in Unicode and never will be, but you can play with it by downloading a font like this one, and using it while typing normally with the US keyboard layout. My name in Aurabesh looks like this:



Here is some more info.

Thursday, November 23, 2006

Doing Sanskrit with Dvorak

Anyone who wants to input Devanagari and transliterated Sanskrit using Dvorak keyboard layouts is welcome to try those located here in the folder sanskritdvorak. Our thanks to Paul Alix and also David Mundie.

Wednesday, November 22, 2006

Why Does My French Turn Into Chinese?

One of the common questions I see in the Apple Mail forum is from people using European languages who find their messages contain strange Chinese characters when received on a PC with Windows Outlook.

Here is my understanding of how this happens.

Certain kinds of messages are sent by Mail in two copies: one in plain text with the charset UTF-8, and one in HTML with the charset Latin-1. There appear to be two bugs in Outlook. The first causes it to confuse the two encodings and read non-ASCII Latin-1 characters in the HTML copy as if they were UTF-8. So, for example, in the French phrase

pensé qu'il

it sees é + space + q as a series of 3 bytes, E9 20 71, forming one character. (In UTF-8, a lead byte whose first hex digit is E, i.e. binary 1110xxxx, starts a 3-byte character.)

E9 20 71 is not in fact a valid UTF-8 sequence, but Windows or Outlook has a second bug: it doesn't check whether the sequence is valid. It looks at the binary for the last two bytes of this sequence, which is

(E9) 00100000 01110001

and only reads the last 6 bits of each of them, assuming that the first 2 are 10 (which is what valid UTF-8 should normally have) instead of 00 and 01. So it interprets this as (E9) 10100000 10110001 or E9 A0 B1, which is valid UTF-8 for 頱. Thus "pensé qu'il" becomes "pens頱u'il."
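The decoding error described above can be reproduced with a deliberately careless UTF-8 decoder. The Python sketch below trusts the lead byte and takes the low 6 bits of the following bytes without checking that they begin with 10. This is a model of the hypothesis in this post, not Outlook's actual code. The function name is illustrative, it covers only 2- and 3-byte sequences, and any other byte becomes U+FFFD.

```python
def lenient_utf8_decode(data: bytes) -> str:
    """Decode UTF-8 without validating continuation bytes (the suspected bug)."""
    out = []
    i = 0
    while i < len(data):
        b = data[i]
        if b < 0x80:                                   # plain ASCII byte
            out.append(chr(b))
            i += 1
        elif 0xC0 <= b < 0xE0 and i + 1 < len(data):   # 2-byte lead, 110xxxxx
            out.append(chr(((b & 0x1F) << 6) | (data[i + 1] & 0x3F)))
            i += 2
        elif 0xE0 <= b < 0xF0 and i + 2 < len(data):   # 3-byte lead, 1110xxxx
            out.append(chr(((b & 0x0F) << 12)
                           | ((data[i + 1] & 0x3F) << 6)   # low 6 bits only
                           | (data[i + 2] & 0x3F)))
            i += 3
        else:                                          # anything else: replacement char
            out.append("\ufffd")
            i += 1
    return "".join(out)


latin1 = "pensé qu'il".encode("latin-1")  # b"pens\xe9 qu'il"
print(lenient_utf8_decode(latin1))        # pens頱u'il
print(bytes([0xE9, 0xA0, 0xB1]).decode("utf-8") == "\u9831")  # True: the valid bytes give the same character
```

A strict decoder such as `latin1.decode("utf-8")` raises an error on the same bytes instead of inventing a character.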

Other accented characters may give different results, including question marks or complete absence of the character.

I don't know whether Vista will have the same behavior.

For fixes for this problem, see this note.

Unicode CJK Extension C

Are you dying to know what Chinese characters will likely be added to Unicode when CJK Extension C is approved? The contents of the current proposal for 4000+ new characters can be found at these urls:

http://std.dkuug.dk/JTC1/SC2/WG2/docs/N3134.doc
http://std.dkuug.dk/JTC1/SC2/WG2/docs/N3134A1.pdf
http://std.dkuug.dk/JTC1/SC2/WG2/docs/N3134A2.pdf
http://std.dkuug.dk/JTC1/SC2/WG2/docs/N3134A3.pdf
http://std.dkuug.dk/JTC1/SC2/WG2/docs/N3134AB.xls

Vietnamese in OpenOffice/X11

A poster in the Apple Forums has pointed out that none of the standard keyboards used for Vietnamese input will work in OpenOffice, because it doesn't recognize their deadkeys, which are essential for typing the large number of diacritics used in this language. It turns out that you have to make a custom X11 keymapping file in order to do Vietnamese in this app. The basic info is contained in this note:

Making a Keyboard for OpenOffice/X11

And more details can be found here.

Tibetan in OS X

Until recently, I thought that the only way to do correct Unicode Tibetan in OS X was to purchase the Tibetan language kit from XenoTypeTech, because OS X requires an AAT font for this script and that was the only source. But I have discovered that OpenOffice/X11 can display correct Tibetan using free Windows OpenType fonts, and that some free keyboard layouts work with OpenOffice. For full info, see my note at

Typing Tibetan

Missing Keyboards

Scripts included in Unicode 5.0 for which fonts exist but there is as yet no Mac input keyboard are:

Syloti Nagri, Kharoshthi, Tagalog, Hanunoo, Buhid, Deseret

Unicode 5 Scripts

Unicode 5, released July 2006, includes 5 new scripts. Here is the current status of their usability on the Mac as far as I know. Where there is a font but no keyboard, the characters can of course still be entered directly from the Character Palette.

N'ko

Fonts: Code2000
Keyboards: Xenotypetech

Phoenician

Fonts: ALPHABETUM Unicode, Code2001 and MPH 2B Damase
Keyboards: Phoenician.keylayout

Phags-Pa

Fonts: Babelstone
Keyboards: Phags-pa.keylayout

Balinese

Fonts: None
Keyboards: None

Sumero-Akkadian Cuneiform

Fonts: (Hittite glyphs) FreeIdg
Keyboards: None

Unicode 4.1 Scripts

Unicode 4.1, released in the Spring of 2005, included 8 new scripts. Here is the current status of their usability on the Mac as far as I know. Where there is a font but no keyboards, characters can of course still be entered directly using the Character Palette. But it seems like we should have more keyboards by now.

Buginese

Fonts: Code2000, MPH 2B Damase
Keyboards: Xenotypetech

Glagolitic

Fonts: Dilyana, MPH 2B Damase
Keyboards: Redlers.com

Coptic

Fonts: ALPHABETUM Unicode, Code2000, MPH 2B Damase, New Athena Unicode
Keyboards: Coptic2005

Tifinagh

Fonts: Code2000, Hapax Berbère, Hapax Touareg, Hapax Touareg DàG, MPH 2B Damase
Keyboards: Tifinagh.keylayout

Syloti Nagri

Fonts: MPH 2B Damase
Keyboards: none

Old Persian

Fonts: ALPHABETUM Unicode, Code2001, MPH 2B Damase
Keyboards: OPersian

Kharoshthi

Fonts: ALPHABETUM Unicode, MPH 2B Damase
Keyboards: none

New Tai Lue

Fonts: none
Keyboards: none

Unicode 4.0 Scripts

Unicode 4.0, released in the Spring of 2003, included 7 new scripts. Here is the current status of their usability on the Mac as far as I know. Where there is a font but no keyboards, characters can of course still be entered directly using the Character Palette.

Limbu

Fonts: Code2000, MPH 2B Damase
Keyboards: Xenotypetech


Tai Le

Fonts: Fixedsys Excelsior, MPH 2B Damase, Tai Le Valentinium
Keyboards: Xenotypetech


Linear B

Fonts: ALPHABETUM Unicode, Code2001, MPH 2B Damase, Penuturesu
Keyboards: LinearB.keylayout


Cypriot

Fonts: ALPHABETUM Unicode, Code2001, MPH 2B Damase
Keyboards: Cypriot.keylayout

Ugaritic

Fonts: ALPHABETUM Unicode, Andagii, Code2001, MPH 2B Damase
Keyboards: Ugaritic.keylayout

Osmanya

Fonts: Andagii, Code2001, MPH 2B Damase
Keyboards: Xenotypetech

Shavian

Fonts: Andagii, Code2001, MPH 2B Damase
Keyboards: Shavian.keylayout

Unicode 3.2 Scripts

Unicode 3.2, released in March 2002, included 4 scripts not currently part of OS X. Here is the status of their usability on the Mac as far as I know. Where there is a font but no keyboards, characters can of course still be entered directly using the Character Palette.

Tagalog

Fonts: Baybayin Lopez, Bikol Mintz, Bisaya Hervas, Fixedsys Excelsior, Tagalog Doctrina 1593, Tagalog Stylized
Keyboards: none


Hanunoo

Fonts: MPH 2B Damase
Keyboards: none


Buhid

Fonts: Code2000
Keyboards: none


Tagbanwa

Fonts: Tagbanwa Font
Keyboards: Tagbanwa.keylayout

Unicode 3.1 Scripts

Unicode 3.1, released in March 2001, included 3 scripts not currently part of OS X. Here is the status of their usability on the Mac as far as I know. Where there is a font but no keyboards, characters can of course still be entered directly using the Character Palette.

Old Italic

Fonts: ALPHABETUM Unicode, Cardo, Code2001, MPH 2B Damase
Keyboards: Redlers.com


Gothic

Fonts: ALPHABETUM Unicode, Cardo, Code2001, MPH 2B Damase, Vulcanius
Keyboards: Gothic.keylayout


Deseret

Fonts: Code2001, MPH 2B Damase, Apple Symbols
Keyboards: None

Unicode 3.0 Scripts

Unicode 3.0, released in September 1999, included 10 scripts not currently part of OS X. Here is the status of their usability on the Mac as far as I know. Where there is a font but no keyboards, characters can of course still be entered directly using the Character Palette.

Syriac

Fonts: Beth Mardutho
Keyboards: Pormann and AramaicNT


Thaana

Fonts: Code2000, Free Serif, MPH 2B Damase, MV Boli, Mv Elaaf, Mv GroupX Avas, Mv Iyyu, Mv Lady Luck, Mv MAG Round, Mv Sega, Thaana Unicode Akeh, TITUS Cyberbit Basic
Keyboards: Quinon


Sinhala

Fonts: Xenotypetech
Keyboards: Xenotypetech


Myanmar

Fonts: Xenotypetech
Keyboards: Xenotypetech

Ethiopic

Fonts: SIL
Keyboards: SIL

Ogham

Fonts: ALPHABETUM Unicode, Caslon, Code2000, Everson Mono Unicode, Fixedsys Excelsior, TITUS Cyberbit Basic, Beith-Luis-Nion, Beth-Luis-Nion, Cog, Craobh Ruadh, Crosta, Everson Mono Ogham, Maigh Nuad, Pollach, Ragnarok Ogham, TITUS Ogham
Keyboards: Evertype

Runic

Fonts: ALPHABETUM Unicode, Cardo, Caslon, Chrysanthi Unicode, Code2000, Everson Mono Unicode, Fixedsys Excelsior, Free Monospaced, Hnias, Junicode, TITUS Cyberbit Basic
Keyboards: Thomaswebb Rune Keyboard

Khmer

Fonts: Xenotypetech
Keyboards: Xenotypetech

Mongolian

Fonts: Code2000, NSimSun-18030, SimSun-18030, STFangsong, STHeiti, STKaiti, STSong
Keyboards: Manchu Keyboard

Yi

Fonts: Code2000, NSimSun-18030, SIL Yi, SimSun-18030, STFangsong, STHeiti, STKaiti, STSong
Keyboards: yi16.txt.dat

Unicode 1.0 and 2.0 Scripts

Unicode 1.0, released in October 1991, and 2.0, released in July 1996, included 8 scripts not currently part of OS X. Here is the status of their usability on the Mac as far as I know. Where there is a font but no keyboards, characters can of course still be entered directly using the Character Palette.

Bengali

Fonts: Ekushey
Keyboards: Ekushey


Oriya

Fonts: None
Keyboards: None


Telugu

Fonts: Nick Shanks and Xenotypetech
Keyboards: Xenotypetech and Telugu.keylayout


Kannada

Fonts: Nick Shanks and Xenotypetech
Keyboards: Nick Shanks and Xenotypetech

Malayalam

Fonts: Xenotypetech
Keyboards: Xenotypetech, and here.

Lao

Fonts: Alice0–Alice5, Arial Unicode MS, Code2000, JG Basic Lao, JG Chantabouli Lao, JG Lao Old Arial, JG Lao Oldface, JG LaoTimes, Lao Unicode, Phetsarath OT, Saysettha OT, Saysettha Unicode, VanVieng Unicode, XiengThong Unicode
Keyboards: lao.keylayout

Georgian

Fonts: Arial Unicode MS, BPG Classic 99U, BPG Paata Khutsuri U, Code2000, Everson Mono Unicode, MPH 2B Damase, Sylfaen, TITUS Cyberbit Basic
Keyboards: Apple Georgia and Quinon

Tibetan

Fonts: Xenotypetech
Keyboards: Xenotypetech, and here.

Missing Scripts

Below is a list of the scripts included in Unicode 5.0 but not yet part of the fonts/keyboards supplied with OS X 10.4. In many cases these scripts can nonetheless be used on the Mac by downloading or purchasing components from the Internet.

N'ko, Phoenician, Balinese, Phags-Pa, Sumero-Akkadian Cuneiform, Buginese, Glagolitic, Coptic, Tifinagh, Syloti Nagri, Old Persian, Kharoshthi, New Tai Lue, Limbu, Tai Le, Linear B, Cypriot, Ugaritic, Osmanya, Shavian, Tagalog, Hanunoo, Buhid, Tagbanwa, Old Italic, Gothic, Deseret, Syriac, Thaana, Sinhala, Myanmar, Ethiopic, Ogham, Runic, Khmer, Mongolian, Yi, Bengali, Oriya, Telugu, Kannada, Malayalam, Lao, Georgian, Tibetan.

Apple Internationalization Status (2006)

The OS X user interface supports the following language localizations for menus and dialogues: English, Japanese, French, German, Spanish, Italian, Dutch, Swedish, Danish, Norwegian, Finnish, Traditional Chinese, Simplified Chinese, Korean, and Brazilian Portuguese. Russian is also available for download from apple.ru.

OS X display can handle any language covered by Unicode and for which an appropriate font has been installed, although individual apps may have lesser capabilities. Apple itself provides input keyboards and fonts for Arabic, Azeri, Armenian, Bulgarian, Byelorussian, Catalan, Cherokee, Chinese (simplified and traditional), Croatian, Czech, Danish, Dari, Devanagari, Dutch, English, Estonian, Faroese, Finnish, French, German, Greek (regular and polytonic), Gujarati, Gurmukhi (Punjabi), Hawaiian, Hebrew, Hungarian, Icelandic, Inuktitut, Irish, Italian, Japanese, Korean, Latvian, Lithuanian, Macedonian, Maori, Nepali, Northern Sami, Norwegian, Pashto, Persian, Polish, Portuguese, Romanian, Russian, Serbian, Slovak, Slovenian, Spanish, Swedish, Tamil, Thai, Turkish, Ukrainian, Uzbek, Vietnamese, and Welsh.

The iPod user interface supports Czech, Danish, Dutch, English, Finnish, French, German, Greek, Hungarian, Italian, Japanese, Korean, Norwegian, Polish, Portuguese, Russian, Simplified Chinese, Spanish, Swedish, Traditional Chinese and Turkish.

iPod display of song info and notes covers the interface languages plus Bulgarian, Croatian, Romanian, Serbian, Slovak, Slovenian and Ukrainian.