International Messaging Associates

Internet Exchange Documents

An Introduction to Japanese Language Support Using Internet Exchange for cc:Mail (Version 2.x)

Introduction
From version 2.03beta3 onwards, Internet Exchange for cc:Mail includes Japanese language support. This paper summarises the changes made in successive releases and describes the status as of the final release of Version 2.11.

Operating System Requirements
To run Internet Exchange with Japanese language support, one of the following configurations must be used.

For DB6 Post Office:

English Windows 3.1 + Unionway + English VIM 2.x or 6.x
English Windows 95 + Unionway + English VIM 2.x or 6.x
Japanese Windows 3.1x / 95 / NT 3.5x / NT 4.0 + Japanese VIM 2.x or 6.x

For DB8 Post Office:

Japanese Windows 3.1 / 95 / NT 3.5x / NT 4.0 + Japanese VIM 6.x

UnionWay's AsianSuite is a suite of multilingual Windows applications which run on Microsoft Windows products and offer simultaneous support for Chinese, Japanese, and Korean. More information may be found at http://www.unionway.com

Japanese Character Set Standards
There are three basic Japanese encoding methods: JIS, Shifted-JIS and EUC. JIS encoding is typically used for electronic transmission, Shifted-JIS encoding on MS-DOS based machines, and EUC encoding on UNIX-based machines. Since cc:Mail Post Offices hold Shifted-JIS text and Internet mail is transmitted in JIS, Internet Exchange only needs to support the conversion between JIS and Shifted-JIS.

The standard for JIS encoding is known as ISO-2022-JP and handles several character sets. In text that contains embedded Japanese characters, a special escape sequence is emitted before the first Japanese character to switch to a Japanese character set. A second escape sequence switches back to ASCII at the end of the run of Japanese characters. The following table shows the escape sequences and the corresponding character sets.

Escape Sequence	Character Set
ESC ( B	ASCII
ESC ( J	JIS X 0201-1976 ("Roman" set)
ESC $ @	JIS X 0208-1978
ESC $ B	JIS X 0208-1983

JIS X 0208-1990 is the newest revision of the character set and has two more characters than the older revisions. However, it is recommended not to make use of this revision in ISO-2022-JP text messages. Moreover, the current Shifted-JIS encoding does not support the two new characters.

The Roman character set of JIS X 0201-1976 is the same as ASCII except for backslash and tilde: the backslash is replaced by the Yen sign, and the tilde by the overline.

The JIS X 0208 character set consists of Kanji, Hiragana, Katakana, and some other symbols and characters. Each character takes two bytes, both of which are 7-bit. Shifted-JIS encoding uses 8-bit characters, and no escape sequences are involved in switching between ASCII and Japanese. Both JIS and Shifted-JIS support half-width Katakana, which is not frequently used.
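As an illustration of the encodings described above, the following Python sketch encodes the same mixed ASCII/Japanese string in each of them. It relies on the codecs shipped with Python's standard library (iso2022_jp, shift_jis, euc_jp), which are not part of Internet Exchange, and the sample string is arbitrary.

```python
# Illustrative only: Python's built-in codecs, not Internet Exchange code.
# The sample is "Hello " followed by the three kanji for "Japanese language".
text = "Hello \u65e5\u672c\u8a9e"

jis = text.encode("iso2022_jp")   # 7-bit JIS, switched by escape sequences
sjis = text.encode("shift_jis")   # 8-bit, no escape sequences
euc = text.encode("euc_jp")       # 8-bit, typical on UNIX machines

print(jis)    # b'Hello \x1b$BF|K\\8l\x1b(B'
print(sjis)
print(euc)

# Every byte of the JIS form fits in 7 bits; the Shifted-JIS form does not.
print(all(b < 0x80 for b in jis))
print(any(b >= 0x80 for b in sjis))
```

In the JIS form, the bytes 0x1B "$B" switch to JIS X 0208-1983 and the bytes 0x1B "(B" switch back to ASCII.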

Conversion criteria
To support the Japanese language, Internet Exchange for cc:Mail converts between the Shift-JIS character set used in Japanese Post Offices (Codepage 932) and the ISO-2022-JP encoding used on the Internet, as recommended by RFC 1468.
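The conversion in both directions can be sketched with Python's standard cp932 and iso2022_jp codecs. This is only an approximation of what the gateway does: the function names are hypothetical, and Python goes through Unicode as an intermediate step, which the gateway is not known to do.

```python
def sjis_to_jis(data: bytes) -> bytes:
    """Codepage 932 (Shift-JIS) bytes -> ISO-2022-JP bytes (hypothetical helper)."""
    return data.decode("cp932").encode("iso2022_jp")


def jis_to_sjis(data: bytes) -> bytes:
    """ISO-2022-JP bytes -> Codepage 932 (Shift-JIS) bytes (hypothetical helper)."""
    return data.decode("iso2022_jp").encode("cp932")


# Three kanji as they would be stored in a Japanese Post Office.
post_office_bytes = b"\x93\xfa\x96\x7b\x8c\xea"
wire_bytes = sjis_to_jis(post_office_bytes)

print(wire_bytes)
print(jis_to_sjis(wire_bytes) == post_office_bytes)
```

Characters that exist in Codepage 932 but not in the character sets of ISO-2022-JP cannot be converted this way; in Python the encode step raises UnicodeEncodeError for them.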

Limitations of the Conversions

From cc:Mail to the Internet
Messages originating from a Japanese cc:Mail Post Office are Shifted-JIS encoded when they are composed in cc:Mail. Therefore, a conversion from Shifted-JIS to JIS is required before the message can be transmitted through the Internet. If a Japanese message created in cc:Mail contains half-width Katakana, these are changed to full-width Katakana before the full conversion takes place.

Since not all 8-bit characters are Shifted-JIS encoded Japanese, each one is checked against the valid range of the Shifted-JIS encoding. However, some characters (for example, some European characters) share the coding range of Shifted-JIS, and such overlapping characters may be mistakenly interpreted as Japanese. If any 8-bit non-Japanese characters are detected in a message, they are either copied out for further processing or reported with the specified error messages.

To keep the output uniform, all Shifted-JIS encoded messages are converted into JIS encoded messages using JIS X 0208-1983 as their character set. Therefore, only the escape sequences ESC $ B and ESC ( J are seen in properly encoded outgoing messages.
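Two of the points above, the widening of half-width Katakana and the overlap between Shifted-JIS and European 8-bit codes, can be sketched in Python as follows. The function names are hypothetical, Unicode NFKC normalisation stands in for whatever table the gateway uses, and the byte ranges are the commonly documented ones for Shift-JIS rather than values taken from the gateway.

```python
import re
import unicodedata

# Half-width Katakana and related punctuation occupy U+FF61..U+FF9F.
HALFWIDTH_KATAKANA = re.compile("[\uff61-\uff9f]+")


def widen_halfwidth_katakana(text: str) -> str:
    """Replace half-width Katakana with their full-width equivalents."""
    # NFKC maps each half-width form to its full-width form and merges a
    # following voicing mark into the preceding kana.
    return HALFWIDTH_KATAKANA.sub(
        lambda m: unicodedata.normalize("NFKC", m.group()), text)


def looks_like_sjis_pair(lead: int, trail: int) -> bool:
    """Range check only: it cannot tell what the author actually intended."""
    lead_ok = 0x81 <= lead <= 0x9F or 0xE0 <= lead <= 0xEF
    trail_ok = 0x40 <= trail <= 0x7E or 0x80 <= trail <= 0xFC
    return lead_ok and trail_ok


# Five half-width bytes (three kana, two of them voiced) in Shift-JIS.
halfwidth = b"\xb6\xde\xb2\xc4\xde"
widened = widen_halfwidth_katakana(halfwidth.decode("cp932"))
print(widened.encode("iso2022_jp"))

# ISO-8859-1 e-acute (0xE9) followed by "t" passes the same range check,
# so a French word could be mistaken for a two-byte Japanese character.
print(looks_like_sjis_pair(0xE9, ord("t")))
```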

From the Internet to cc:Mail
All messages must be 7-bit when they are transmitted through the Internet. Therefore, a conversion from JIS to Shifted-JIS is required before the message can be delivered into cc:Mail for proper reading. The escape sequences in a JIS encoded message are examined to determine whether it uses JIS X 0208-1978 or JIS X 0208-1983. All the escape sequences are then removed and the message is converted into a proper Shifted-JIS encoded message.

The function also checks for incomplete escape sequences (sequences that have lost their ESC character) that indicate the presence of JIS-encoded characters. Since every line must end in ASCII, followed by a carriage return and newline character, scanning for incomplete escape sequences is done on a per-line basis. If "$ B" and "( J", or "$ @" and "( J", are detected together within a line, the text between the pair is treated as JIS encoded Japanese and converted into Shifted-JIS encoded Japanese, and the broken escape sequences are removed. Consequently, if a message is intended to contain "$ B" and "( J" in the same line as plain ASCII characters, that text is erroneously treated as Japanese.
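The per-line scan for broken escape sequences can be sketched as follows. This is a guess at the kind of heuristic involved, not the gateway's actual algorithm: the regular expression, the function names, and the decision to accept "(B" as well as "(J" as the closing marker are all assumptions.

```python
import re

ESC = b"\x1b"

# "$B" or "$@" with no ESC in front, then an even number of printable 7-bit
# bytes (two per Japanese character), then "(J" or "(B".
DAMAGED = re.compile(rb"(?<!\x1b)(\$[B@])((?:[\x21-\x7e]{2})*?)(\([JB])")


def repair_line(line: bytes) -> bytes:
    """Re-insert the ESC bytes that a mail transport may have stripped."""
    return DAMAGED.sub(
        lambda m: ESC + m.group(1) + m.group(2) + ESC + m.group(3), line)


def convert_line(line: bytes, repair: bool = False) -> bytes:
    """Convert one line from JIS to Shifted-JIS, optionally repairing it."""
    if repair:
        line = repair_line(line)
    return line.decode("iso2022_jp").encode("cp932")


def convert_message(message: bytes, repair: bool = False) -> bytes:
    """Every line ends in ASCII, so each line is handled on its own."""
    lines = message.split(b"\r\n")
    return b"\r\n".join(convert_line(line, repair) for line in lines)


damaged = b"Hello $BF|K\\8l(B!\r\nplain text\r\n"
print(convert_message(damaged, repair=True))

# The false positive described above: ASCII that merely resembles a pair.
print(repair_line(b"x$BAAAA(Jy"))
```

In the last line the bytes "AAAA" were meant as ASCII, but once the escape characters are re-inserted they would be decoded as two Japanese characters.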

Outbound (cc:Mail->Internet) messages
If "ISO-2022-JP" is selected in the gateway setup as the local character set, the gateway assumes that all the messages in the Internet queue are encoded in a mixture of ASCII and Shift-JIS. 8-bit codes (128 and higher) trigger a Shift-JIS to ISO-2022-JP conversion, instead of the quoted-printable (QP) encoding used when the character set is ASCII or ISO-8859-X.

NOTE: This implies that a single Internet PO cannot export a mix of messages containing Japanese characters and messages containing ISO-8859-X characters. At present the only solution for mixed Japanese/European environments is to use two gateways, each serving one Internet PO (say, INTERNET-I for messages whose 8-bit codes should be QP-encoded, and INTERNET-J for messages whose 8-bit codes should be transcoded into ISO-2022-JP 7-bit JIS).
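The choice between the two outbound treatments can be sketched in Python as follows. The function name and its parameters are hypothetical; the sketch only shows why the local character set, rather than the message content, has to decide how 8-bit codes are handled.

```python
import quopri


def encode_outbound(body: bytes, local_charset: str) -> bytes:
    """Pick the outbound treatment of 8-bit codes (hypothetical helper)."""
    if all(b < 0x80 for b in body):
        return body  # pure 7-bit text needs no treatment
    if local_charset.upper() == "ISO-2022-JP":
        # 8-bit codes are assumed to be Shift-JIS and transcoded to 7-bit JIS.
        return body.decode("cp932").encode("iso2022_jp")
    # Otherwise 8-bit codes are quoted-printable encoded.
    return quopri.encodestring(body)


same_bytes = b"\x93\xfa"  # one kanji in Shift-JIS, or two ISO-8859-X codes

print(encode_outbound(same_bytes, "ISO-2022-JP"))
print(encode_outbound(same_bytes, "ISO-8859-1"))
```

The same two bytes yield entirely different output depending on the setting, which is why one Internet PO cannot serve both kinds of message.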

Invalid Shift-JIS codes can be removed from the outbound message with the option:

InvalidCharacterMark=

or they can be mapped to an arbitrary string (e.g., *INVALID*) with:

InvalidCharacterMark=*INVALID*
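The effect of the two settings can be imitated in Python as follows. This is a sketch under stated assumptions: the function name is hypothetical, and Python's strict shift_jis codec stands in for the gateway's own validity check, which may accept or reject different codes.

```python
def sjis_to_jis_with_mark(data: bytes, invalid_character_mark: str = "") -> bytes:
    """Convert Shift-JIS to JIS, replacing invalid codes with a mark."""
    # errors="replace" turns each undecodable code into U+FFFD, which is
    # then swapped for the configured mark (an empty mark removes it).
    text = data.decode("shift_jis", errors="replace")
    text = text.replace("\ufffd", invalid_character_mark)
    return text.encode("iso2022_jp")


bad = b"ok \xff end"  # 0xFF is not a valid Shift-JIS code

print(sjis_to_jis_with_mark(bad))                # InvalidCharacterMark=
print(sjis_to_jis_with_mark(bad, "*INVALID*"))   # InvalidCharacterMark=*INVALID*
```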

Inbound (Internet->cc:Mail) messages
MIME messages are converted from 7-bit JIS to Shift-JIS if they bear the "charset=ISO-2022-JP" parameter in the "Content-type" header, and if under the [Options] section of IMA.INI there is the line:

ConvertJISinMIME=YES

The default is YES if the local character set is ISO-2022-JP, NO otherwise.

Non-MIME inbound messages will be converted only if the file IMA.INI (in the Windows directory) contains, under the [Options] section, the line:

ScanForJIS=YES

The default value for this option is YES if the local character set is ISO-2022-JP, NO otherwise.

In addition, if the line

RepairDamagedJIS=YES

is present, the gateway will attempt to recover JIS messages whose ESC characters are missing (some mail transports, unfortunately, filter them out). This feature uses heuristic criteria and might produce incorrect results in certain cases. The option applies to both MIME and non-MIME messages.
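The rule for inbound MIME messages (convert only when the Content-type header declares ISO-2022-JP and ConvertJISinMIME is enabled) can be sketched with Python's standard email package. The function name is hypothetical, and the sketch handles only single-part messages without a transfer encoding.

```python
from email import message_from_bytes


def inbound_body_to_sjis(raw: bytes, convert_jis_in_mime: bool) -> bytes:
    """Return the body, converted to Shift-JIS when the rule applies."""
    msg = message_from_bytes(raw)
    body = msg.get_payload(decode=True)  # None for multipart messages
    # get_content_charset() returns the charset parameter in lower case.
    if convert_jis_in_mime and msg.get_content_charset() == "iso-2022-jp":
        return body.decode("iso2022_jp").encode("cp932")
    return body


raw_message = (
    b"MIME-Version: 1.0\r\n"
    b"Content-Type: text/plain; charset=ISO-2022-JP\r\n"
    b"\r\n"
    b"\x1b$BF|K\\8l\x1b(B\r\n"
)

print(inbound_body_to_sjis(raw_message, convert_jis_in_mime=True))
print(inbound_body_to_sjis(raw_message, convert_jis_in_mime=False))
```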

Headers
The Shift-JIS<->ISO-2022-JP conversion also affects the headers. Should this cause problems, it is possible to disable it (in both directions) by adding to IMA.INI, under the [Options] section, the following line:

DisableJISinHeaders=YES

The default value for this option is NO if the local character set is ISO-2022-JP; YES otherwise.

By default, the automatic recovery of ISO-2022-JP escape sequences lacking the escape character will not be attempted. It is possible to enable this feature by adding to IMA.INI, under the [Options] section, the following line:

RepairDamagedJISinHeaders=YES
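The defaults of the [Options] settings described above can be summarised in a short Python sketch that reads an IMA.INI-style text with the standard configparser module. The function name is hypothetical, and the default of NO for RepairDamagedJIS is an assumption, since the text above only says what happens when the line is present.

```python
import configparser


def load_jis_options(ini_text: str, local_charset: str) -> dict:
    """Resolve the JIS-related options, applying the documented defaults."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    japanese = local_charset.upper() == "ISO-2022-JP"
    defaults = {
        "ConvertJISinMIME": japanese,
        "ScanForJIS": japanese,
        "RepairDamagedJIS": False,           # assumed default
        "DisableJISinHeaders": not japanese,
        "RepairDamagedJISinHeaders": False,
    }
    # getboolean() accepts YES/NO in any letter case.
    return {
        name: parser.getboolean("Options", name, fallback=default)
        for name, default in defaults.items()
    }


sample_ini = "[Options]\nScanForJIS=NO\nRepairDamagedJIS=YES\n"
for name, value in load_jis_options(sample_ini, "ISO-2022-JP").items():
    print(name, "=", "YES" if value else "NO")
```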

Using Japanese VIM
The recommended platform for Japanese language support is based on the Japanese VIM library and Japanese Microsoft Windows (3.1, 95 or NT). The Japanese VIM is not available for download, but can be put in place by installing the Japanese version of Lotus cc:Mail. An update kit containing the latest version of some of the DLLs for VIM 2.x can be downloaded from:

http://www.lotus.co.jp/ftpmodul/214e.htm

Japanese cc:Mail Client version 6.x may be purchased from Lotus.
The local Post Office should be created with the Japanese ADMIN.

IMPORTANT: in order to use the Japanese VIM, the line:

VIMCharSet=CP932

must be added to IMA.INI under the [Gateway] section.

Using English VIM 2.x
A kit for the latest version, 2.21, is available free of charge from Lotus's FTP site:

ftp://ftp.ccmail.com/pub/comm/ccmail/dev_tools/vdlwin.zip

This version of VIM has been found to handle Japanese characters only on the following platforms:

Japanese Windows 3.1
English Windows 3.1 with Unionway add-on
English Windows 95 with Unionway add-on

We discourage this configuration, and recommend the use of the Japanese VIM instead.

Published: August 1997