Friday, December 29, 2006

????'s When You Post in a Forum With Safari?

Someone in the Apple discussions asked why his Greek and Cyrillic postings in another forum turned into question marks when he used Safari but displayed correctly when he used Firefox.

This occurs because some forums set the encoding of their web pages to Latin-1 even though members are expected to post in languages that this charset cannot cover. When that is the case, each non-Latin character has to be converted into a "Numeric Character Reference" (NCR) escape code. For example, Greek alpha α becomes &#945; (including the closing semicolon), where the number is the decimal Unicode code point of the character. This is essentially an HTML kludge developed years ago, when computer and internet technology was so limited that the only safe way to display non-Latin characters was to convert them to such ASCII-only codes.
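To make the conversion concrete, here is a minimal Python sketch of it. The function name to_ncr is made up for illustration, and the sketch leans on Python's built-in "xmlcharrefreplace" error handler, which emits a decimal NCR for any character the target encoding cannot hold. It is not what any browser actually runs, just the same idea in a few lines.

```python
def to_ncr(text: str) -> str:
    """Replace every character that Latin-1 cannot represent with a
    decimal Numeric Character Reference such as &#945;.

    to_ncr is a hypothetical helper name, not part of any library.
    """
    # Encoding to Latin-1 fails for Greek, Cyrillic, etc.; the
    # "xmlcharrefreplace" handler substitutes &#NNN; for those characters.
    latin1_bytes = text.encode("latin-1", errors="xmlcharrefreplace")
    # The result is pure Latin-1, so decoding it back is lossless.
    return latin1_bytes.decode("latin-1")


if __name__ == "__main__":
    print(to_ncr("α"))        # Greek alpha, code point 945
    print(to_ncr("Привет"))   # Cyrillic text becomes a run of NCRs
    print(to_ncr("café"))     # é is in Latin-1, so it is left alone
```

Characters that Latin-1 already covers pass through unchanged, so only the Greek or Cyrillic parts of a post get escaped.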

It happens that Firefox, when faced with a page whose encoding cannot represent the characters being input, will automatically produce these NCRs instead of the real characters. Safari does not do that, so what it puts into the forum appears as question marks when viewed as Latin-1.
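The question marks are easy to reproduce. The sketch below assumes the lossy step behaves like a plain "replace unencodable characters with ?" conversion to Latin-1; that is a simplification chosen for illustration, not a description of Safari's internals, and the function name post_as_latin1 is hypothetical.

```python
def post_as_latin1(text: str) -> str:
    """Simulate submitting text through a Latin-1 form with no NCR fallback.

    post_as_latin1 is a hypothetical name; the errors="replace" behavior is
    an assumption standing in for whatever the browser really does.
    """
    # Any character outside Latin-1 is replaced by "?" during encoding,
    # and the original character cannot be recovered afterwards.
    return text.encode("latin-1", errors="replace").decode("latin-1")


if __name__ == "__main__":
    print(post_as_latin1("αβγ"))      # every Greek letter is lost
    print(post_as_latin1("Привет"))   # same for Cyrillic
    print(post_as_latin1("abc é"))    # Latin-1 text survives intact
```

Once a character has been turned into "?", the information is gone, which is why the fix has to happen before posting rather than after.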

If you must use Safari, download and install UnicodeChecker, set its Preferences/XHTML to Decimal, and use it to convert your non-Latin text into NCRs before posting. This can be done by selecting the text and going to Safari > Services > Unicode > Unicode to HTML Entities.

Forums intended to accommodate languages beyond English and those of Western Europe should, if at all possible, use UTF-8 as their encoding and not Latin-1. The Apple forums themselves are a good example of the correct approach, where Safari works perfectly to input any language you want.
