National language support

The ASCII set of 128 characters includes upper-case and lower-case letters of the English alphabet, numbers, punctuation, and 33 control codes. Numerous extensions to ASCII have been devised and quite a few have become national or international standards. Notable among them is a family of international standards, ISO-8859, that defines extensions appropriate to certain language groups.

Unicode version 2 uses 16 bits per character instead of the 7 or 8 bits of ASCII or extended ASCII. The character repertoire and the codes assigned in Unicode are identical to those specified by ISO 10646, the international Universal Character Set (UCS) standard.

If a WML deck contains text in several different languages, specify the UTF-8 encoding, for instance:

<?xml version="1.0" encoding="utf-8"?>
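For context, here is a minimal sketch of a complete deck that uses this declaration and mixes several languages. The card id, title, and text are made up for illustration, and the WML 1.1 document type is an assumption; use the one that matches your target devices.

```xml
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE wml PUBLIC "-//WAPFORUM//DTD WML 1.1//EN" "http://www.wapforum.org/DTD/wml_1.1.xml">
<!-- Illustrative deck: the id, title, and greetings are hypothetical. -->
<wml>
  <card id="hello" title="Hello">
    <!-- English, Russian, Greek, and Polish text in one deck -->
    <p>Good morning</p>
    <p>Доброе утро</p>
    <p>Καλημέρα</p>
    <p>Dzień dobry</p>
  </card>
</wml>
```

The file must actually be saved as UTF-8 for the declaration to be truthful; a mismatch between the declared and the real encoding produces garbled characters.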

In this sample, the encoding declaration specifies a particular Unicode encoding for the WML deck and lets you use any Unicode character. Alternatively, you can specify a non-Unicode encoding, for instance:

<?xml version="1.0" encoding="iso-8859-2"?>
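As a rough sketch, a deck declared this way can hold Central European text such as Polish, but not Cyrillic or Greek letters, because those are outside the ISO-8859-2 character set. The card id, title, and text are hypothetical, and the WML 1.1 document type is an assumption.

```xml
<?xml version="1.0" encoding="iso-8859-2"?>
<!DOCTYPE wml PUBLIC "-//WAPFORUM//DTD WML 1.1//EN" "http://www.wapforum.org/DTD/wml_1.1.xml">
<!-- Illustrative deck: every character below exists in ISO-8859-2. -->
<wml>
  <card id="powitanie" title="Powitanie">
    <p>Dzień dobry</p>
    <p>Zażółć gęślą jaźń</p>
  </card>
</wml>
```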

In this case you are limited to the characters defined by the iso-8859-2 specification. The editor displays Unicode characters well, so there is no reason to use one of the ISO-8859 standards instead of the more flexible UTF-8 encoding. If you prefer to use a particular code page, see the list of Supported code pages. The current WML deck encoding appears in the bottom-right corner of the code editor window, as shown below.

When you compile a WML deck to WMLC, the encoding specified in the deck (strictly speaking, in the XML declaration <?xml ...?>) is used. If you do not set the encoding attribute, the encoding is treated as unknown. The editor, however, must know which encoding to use.

To change the default character set encoding (unknown), select the Options - Preferences menu. The Preferences dialog appears. On the General tab, select one of the character sets, as shown below.

Note that the editor font must support the national symbols of the languages you use. Therefore, do not change the default Courier New font in the editor preferences to a non-TrueType font such as Courier. Windows TrueType fonts such as Times New Roman, Courier New, and some others can contain different sets of Unicode characters depending on which localization of Windows you have. If you use only the ANSI and OEM character sets (which depend on the system localization) and no others, you can use any font, such as FixedSys or MS Sans Serif.

Some elements, such as the go element, use the accept-charset attribute, for instance:

<go accept-charset="iso-8859-2" ...>

In this sample, the accept-charset attribute specifies the list of character encodings for data that the web server must accept when processing input. The default value is unknown; in that case the user agent uses the character encoding that was used to transmit the WML deck containing this attribute.
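To show the attribute in context, the fragment below sketches a card that posts one input field to a server. The URL, card id, and field name are hypothetical placeholders, not part of the editor or of any real service.

```xml
<!-- Illustrative card: href, id, and field name are made up. -->
<card id="search" title="Search">
  <p>
    Name: <input name="name"/>
    <anchor>Send
      <!-- The posted data is to be accepted by the server as iso-8859-2 -->
      <go href="http://example.com/search" method="post" accept-charset="iso-8859-2">
        <postfield name="name" value="$(name)"/>
      </go>
    </anchor>
  </p>
</card>
```

Omitting accept-charset here would leave the value at unknown, so the user agent would fall back to the encoding in which the deck was transmitted.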

In addition to the Unicode encoding support provided by XML, WML implements its own national-language support mechanism. With it, you can mark different parts of the WML deck as written in different languages using the xml:lang attribute. To do this, set the xml:lang attribute of a WML element to a language abbreviation, for instance, en-us for American English.
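A minimal sketch of how this might look in a card follows. The card id, title, and text are hypothetical. Note that xml:lang only labels the language of an element's content; the character encoding of the deck is still the one given in the XML declaration.

```xml
<!-- Illustrative card: the default language is set on the card
     and overridden on individual paragraphs. -->
<card id="greet" title="Greetings" xml:lang="en-us">
  <p>Good morning.</p>
  <p xml:lang="de">Guten Morgen.</p>
  <p xml:lang="fr-ca">Bonjour.</p>
</card>
```

An element without its own xml:lang takes the language of its nearest enclosing element that has one, so the first paragraph above is treated as en-us.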

To simplify entering the most commonly used language and country abbreviations in the Attributes window, you can select them from the list of countries and languages supported by your version of Windows in the Preferences dialog window:

Select the Languages support tab and mark the languages you want to use as abbreviations for the xml:lang attribute. Then select an element in the Element tree window, click the combo box of the xml:lang attribute in the Attributes window, and select an abbreviation, as shown below:


See also: Supported code pages, Automatic document conversion