An Introduction to Japanese Language Support Using Internet Exchange for cc:Mail (Version 2.x)
Introduction
From version 2.03beta3 onwards, Internet Exchange for cc:Mail includes
Japanese language support. This paper summarises the changes made in successive releases
and describes the status as of the final release of Version 2.11.
Operating System Requirements
To run Internet Exchange with Japanese language support, one of the
following configurations is required.
For DB6 Post Office:
- English Windows 3.1 + Unionway + English VIM 2.x or 6.x
- English Windows 95 + Unionway + English VIM 2.x or 6.x
- Japanese Windows 3.1x / 95 / NT 3.5x / NT 4.0 + Japanese VIM 2.x or 6.x
For DB8 Post Office:
- Japanese Windows 3.1 / 95 / NT 3.5x / NT 4.0 + Japanese VIM 6.x
UnionWay's AsianSuite is a suite of multilingual applications that run on Microsoft
Windows and offer simultaneous support for Chinese, Japanese, and Korean. More information
on Unionway is available at http://www.unionway.com
Japanese Character Set Standards
There are three basic Japanese encoding methods: JIS, Shifted-JIS and EUC.
JIS encoding is typically used for electronic transmission, Shifted-JIS on MS-DOS based
machines, and EUC on UNIX-based machines. Since cc:Mail Post Offices store Japanese text
as Shifted-JIS and Internet mail carries it as JIS, Internet Exchange only needs to support
conversion between JIS and Shifted-JIS.
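To make the difference concrete, the snippet below encodes the same two characters ("日本", Japan) in all three encodings using Python's standard codecs. It only illustrates the byte formats and is not taken from Internet Exchange; note that Python's `iso2022_jp` codec switches back to ASCII with ESC ( B.

```python
# Encode the same text ("日本", "Japan") with Python's standard codecs for
# the three Japanese encodings described above.
text = "日本"

jis = text.encode("iso2022_jp")  # JIS (ISO-2022-JP): 7-bit, escape sequences
sjis = text.encode("shift_jis")  # Shifted-JIS: 8-bit, no escape sequences
euc = text.encode("euc_jp")      # EUC-JP: 8-bit, common on UNIX

for name, data in (("JIS", jis), ("Shifted-JIS", sjis), ("EUC", euc)):
    print(f"{name}: {data.hex(' ')}")
# JIS: 1b 24 42 46 7c 4b 5c 1b 28 42
# Shifted-JIS: 93 fa 96 7b
# EUC: c6 fc cb dc
```

Only the JIS form stays within 7 bits, which is why it is the form used for transmission.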
The standard for JIS encoding on the Internet is ISO-2022-JP, which switches between
several character sets. In text containing embedded Japanese characters, a special escape
sequence is inserted before the first Japanese character to start the Japanese encoding;
another escape sequence marks the end of the run of Japanese characters and switches back
to ASCII. The following table shows the escape sequences and the corresponding character
sets.
Escape Sequence | Character Set
ESC ( B         | ASCII
ESC ( J         | JIS X 0201-1976 ("Roman" set)
ESC $ @         | JIS X 0208-1978
ESC $ B         | JIS X 0208-1983
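A minimal sketch of recognising the designations in the table: the hypothetical `designations` function below scans a byte string for the four escape sequences (ESC is byte 0x1B) and reports which character sets are selected, in order. It assumes well-formed ISO-2022-JP input, in which the bytes between escape sequences never include ESC.

```python
# The four escape sequences from the table above, as raw bytes.
ESCAPES = {
    b"\x1b(B": "ASCII",
    b"\x1b(J": "JIS X 0201-1976 (Roman)",
    b"\x1b$@": "JIS X 0208-1978",
    b"\x1b$B": "JIS X 0208-1983",
}

def designations(data: bytes) -> list[str]:
    """Return the character sets designated in an ISO-2022-JP byte string,
    in the order in which their escape sequences occur."""
    found = []
    i = 0
    while i < len(data):
        seq = data[i:i + 3]  # every sequence in the table is 3 bytes long
        if seq in ESCAPES:
            found.append(ESCAPES[seq])
            i += 3
        else:
            i += 1
    return found

print(designations("日本 text".encode("iso2022_jp")))
# ['JIS X 0208-1983', 'ASCII']
```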
JIS X 0212-1990 is a newer, supplementary character set that defines about 6,000
additional characters, mostly Kanji. However, it is recommended not to use it in
ISO-2022-JP text messages, and current Shifted-JIS encoding cannot represent the
characters it defines. The Roman character set of JIS X 0201-1976 is the same as ASCII
except for the backslash and the tilde: the backslash is replaced by the Yen sign, and the
tilde by an overline. The JIS X 0208 character sets consist of Kanji, Hiragana, Katakana,
and some other symbols and characters; each character is represented by two 7-bit bytes.
Shifted-JIS encoding uses 8-bit bytes, and no escape sequences are needed to switch
between ASCII and Japanese. JIS X 0201 also defines half-width katakana, which Shifted-JIS
represents as single bytes; they are not frequently used, and ISO-2022-JP provides no
escape sequence for them.
Conversion criteria
To support Japanese, Internet Exchange for cc:Mail converts between the
Shift-JIS character set used in Japanese Post Offices (Codepage 932) and ISO-2022-JP, the
encoding used on the Internet as recommended by RFC 1468.
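The core conversion can be sketched with Python's `cp932` and `iso2022_jp` codecs. The function names are hypothetical and this is not the gateway's own code. Two differences are worth noting: Python's codec returns to ASCII with ESC ( B rather than ESC ( J, and Codepage 932 vendor extensions (for example circled digits) have no ISO-2022-JP equivalent, so encoding them raises `UnicodeEncodeError`.

```python
def sjis_to_jis(data: bytes) -> bytes:
    """cc:Mail -> Internet: Codepage 932 bytes to 7-bit ISO-2022-JP.

    Raises UnicodeDecodeError for invalid Shift-JIS input and
    UnicodeEncodeError for CP932-only characters."""
    return data.decode("cp932").encode("iso2022_jp")

def jis_to_sjis(data: bytes) -> bytes:
    """Internet -> cc:Mail: ISO-2022-JP bytes to Codepage 932."""
    return data.decode("iso2022_jp").encode("cp932")

outgoing = sjis_to_jis("日本語 text".encode("cp932"))
print(outgoing)               # 7-bit bytes with ESC $ B ... ESC ( B
print(jis_to_sjis(outgoing))  # the original Codepage 932 bytes
```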
Limitations of the Conversions
From cc:Mail to the Internet
Messages originating from a Japanese cc:Mail Post Office are Shifted-JIS
encoded when they are composed in cc:Mail, so they must be converted from Shifted-JIS to
JIS before they can be transmitted over the Internet. Any half-width Katakana in such a
message is first changed to full-width Katakana, and then the full conversion takes place.
Since not all 8-bit characters are Shifted-JIS encoded Japanese, they are checked against
the valid range of the Shifted-JIS encoding. However, some characters (for example, some
European characters) fall within the same coding range as Shifted-JIS, and such
overlapping characters may be mistakenly interpreted as Japanese. If any 8-bit
non-Japanese characters are detected in a message, either they are copied out for further
processing or a specified error message is issued. To produce a single, uniform encoding,
all Shifted-JIS encoded messages are converted into JIS encoded messages that use
JIS X 0208-1983 as their character set. Therefore, only the escape sequences ESC $ B and
ESC ( J appear in properly encoded outgoing messages.
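The outbound steps above (half-width to full-width Katakana, a validity check, then conversion) might look roughly like this in Python. `widen_katakana` and `outbound` are hypothetical names. The sketch assumes the widening can be done with Unicode NFKC normalisation restricted to the half-width Katakana block (U+FF61 to U+FF9F), and it relies on the strict `cp932` decoder to reject bytes outside the Shifted-JIS ranges.

```python
import re
import unicodedata

# Runs of half-width Katakana, punctuation and voiced-sound marks.
HALFWIDTH_KANA = re.compile("[\uff61-\uff9f]+")

def widen_katakana(text: str) -> str:
    """Replace half-width Katakana with full-width equivalents.

    NFKC is applied only to half-width runs, so a following voiced-sound
    mark (e.g. U+FF9E) combines with its kana and other text is untouched."""
    return HALFWIDTH_KANA.sub(lambda m: unicodedata.normalize("NFKC", m.group()), text)

def outbound(data: bytes) -> bytes:
    # Strict decoding raises UnicodeDecodeError on invalid Shift-JIS bytes.
    text = data.decode("cp932")
    return widen_katakana(text).encode("iso2022_jp")

print(widen_katakana("ｶﾞｲﾄﾞ abc"))  # ガイド abc
```

The strict decoder cannot catch the overlap problem described above: the Latin-1 bytes for "ét" (0xE9 0x74) happen to form a valid Shifted-JIS pair, so `b"\xe9t".decode("cp932")` silently yields a single Kanji.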
From the Internet to cc:Mail
All messages must be 7-bit when they are transmitted through the Internet,
so Japanese text arrives JIS encoded and must be converted to Shifted-JIS before it can be
read in cc:Mail. The converter examines the escape sequences to determine whether a
message uses JIS X 0208-1978 or JIS X 0208-1983, then removes all escape sequences and
converts the message into proper Shifted-JIS. It also checks for incomplete escape
sequences (sequences that have lost their ESC character) that indicate the presence of
JIS-encoded characters. Since every line must end in ASCII, followed by a carriage return
and newline, this scan is done on a per-line basis. If "$ B" and "( J", or
"$ @" and "( J", are both detected within a line, the text between the
pair is treated as JIS-encoded Japanese and converted into Shifted-JIS, and the broken
escape sequences are removed. Consequently, if a line legitimately contains "$ B"
and "( J" as plain ASCII characters, the text is erroneously treated as Japanese.
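A hypothetical sketch of the per-line heuristic: `repair_line` looks for "$B" or "$@" followed later on the same line by "(J", re-inserts the missing ESC characters, and decodes the result with Python's `iso2022_jp` codec. It assumes the line is 7-bit and handles only the pairs named above; the gateway's actual rules may differ. The last line shows the false positive just described.

```python
import re

# "$B" or "$@" whose ESC was lost, the JIS payload, then "(J" whose ESC was lost.
BROKEN_JIS = re.compile(r"\$([B@])(.*?)\(J")

def repair_line(line: str) -> str:
    """Re-insert missing ESC characters in one 7-bit line, then decode it.

    Lines that still contain an ESC are decoded without repair.
    Raises UnicodeDecodeError if a payload is not valid JIS X 0208."""
    if "\x1b" not in line:
        line = BROKEN_JIS.sub(
            lambda m: "\x1b$" + m.group(1) + m.group(2) + "\x1b(J", line)
    return line.encode("ascii").decode("iso2022_jp")

print(repair_line("Subject: $BF|K\\(J ok"))  # Subject: 日本 ok
print(repair_line("cost $B(J"))              # "cost ": the ASCII "$B(J" is swallowed
```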
Outbound (cc:Mail->Internet) messages
If "ISO-2022-JP" is selected as the local character set in the gateway
setup, the gateway assumes that all messages in the Internet queue are encoded in a
mixture of ASCII and Shift-JIS. Byte values of 128 and above trigger a Shift-JIS to
ISO-2022-JP conversion, instead of the quoted-printable (QP) encoding used when the
character set is ASCII or ISO-8859-X. NOTE: It is therefore not possible to export, from
the same Internet PO, a mixture of messages containing Japanese characters and messages
containing ISO-8859-X characters. At present the only solution for mixed Japanese/European
environments is to use two gateways, each serving one Internet PO (say, INTERNET-I for
messages whose 8-bit codes should be QP-encoded, and INTERNET-J for messages whose 8-bit
codes should be transcoded into 7-bit ISO-2022-JP).
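The restriction follows from the choice being made once per gateway rather than per message. The hypothetical `encode_outbound` function below sketches that choice: in ISO-2022-JP mode every 8-bit byte is assumed to be Shift-JIS; otherwise it is quoted-printable encoded with Python's `binascii.b2a_qp`. The same Latin-1 text therefore encodes cleanly in one mode and fails in the other.

```python
import binascii

def encode_outbound(body: bytes, local_charset: str) -> bytes:
    """Treat all 8-bit bytes according to one gateway-wide setting."""
    if local_charset.upper() == "ISO-2022-JP":
        # Every 8-bit byte is assumed to be Shift-JIS and transcoded to JIS.
        return body.decode("cp932").encode("iso2022_jp")
    # ASCII / ISO-8859-X: 8-bit bytes are quoted-printable encoded.
    return binascii.b2a_qp(body)

print(encode_outbound(b"caf\xe9", "ISO-8859-1"))  # b'caf=E9'
# encode_outbound(b"caf\xe9", "ISO-2022-JP") raises UnicodeDecodeError,
# because 0xE9 is read as the first byte of an incomplete Shift-JIS pair.
```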
Invalid Shift-JIS codes can be removed from the outbound message with the option:
InvalidCharacterMark=
or they can be mapped to an arbitrary string (e.g., *INVALID*) with:
InvalidCharacterMark=*INVALID*
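A minimal sketch of the effect of InvalidCharacterMark, assuming it applies to byte sequences that the Codepage 932 decoder rejects: the hypothetical `convert_with_mark` lets Python substitute U+FFFD for each invalid sequence, then swaps in the configured mark before converting to ISO-2022-JP. The mark itself must be representable in ISO-2022-JP.

```python
def convert_with_mark(data: bytes, mark: str = "") -> bytes:
    """Drop (mark="") or replace invalid Codepage 932 sequences, then
    convert the text to ISO-2022-JP."""
    # errors="replace" yields U+FFFD for each undecodable sequence; this
    # assumes valid input never decodes to U+FFFD itself.
    text = data.decode("cp932", errors="replace")
    return text.replace("\ufffd", mark).encode("iso2022_jp")

print(convert_with_mark(b"abc\x93"))               # b'abc'
print(convert_with_mark(b"abc\x93", "*INVALID*"))  # b'abc*INVALID*'
```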
Inbound (Internet->cc:Mail) messages
MIME messages are converted from 7-bit JIS to Shift-JIS if they carry the
"charset=ISO-2022-JP" parameter in the "Content-Type" header and the [Options]
section of IMA.INI contains the line:
ConvertJISinMIME=YES
The default is YES if the local character set is ISO-2022-JP, and NO otherwise. Non-MIME
inbound messages are converted only if IMA.INI (in the Windows directory) contains, under
the [Options] section, the line:
ScanForJIS=YES
The default value for this option is YES if the local character set is ISO-2022-JP, and
NO otherwise.
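The two rules above can be summarised in a short decision function. This is a sketch under stated assumptions: `wants_jis_conversion` and `default_options` are hypothetical, a message is treated as MIME when it has a MIME-Version header, and a non-MIME message is scanned for the JIS X 0208 escape sequences ESC $ B and ESC $ @.

```python
import email

def default_options(local_charset: str) -> dict:
    # Documented defaults: YES when the local character set is ISO-2022-JP.
    jp = local_charset.upper() == "ISO-2022-JP"
    return {"ConvertJISinMIME": jp, "ScanForJIS": jp}

def wants_jis_conversion(raw: bytes, options: dict) -> bool:
    msg = email.message_from_bytes(raw)
    if "MIME-Version" in msg:  # assumption: this header marks a MIME message
        # get_content_charset() returns the charset parameter in lower case.
        return options["ConvertJISinMIME"] and msg.get_content_charset() == "iso-2022-jp"
    # Non-MIME: look for JIS X 0208 designations anywhere in the message.
    return options["ScanForJIS"] and (b"\x1b$B" in raw or b"\x1b$@" in raw)
```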
In addition, if the line
RepairDamagedJIS=YES
is present, the gateway will attempt to recover JIS messages whose ESC characters are
missing (some mail transports, unfortunately, filter them out). This feature is heuristic
and may produce incorrect results in certain cases. It applies to both MIME and non-MIME
messages.
Headers
The Shift-JIS<->ISO-2022-JP conversion also affects the headers. Should this
cause problems, it is possible to disable it (in both directions) by adding to IMA.INI,
under the [Options] section, the following line:
DisableJISinHeaders=YES
The default value for this option is NO if the local character set is ISO-2022-JP; YES
otherwise.
By default, the automatic recovery of ISO-2022-JP escape sequences lacking the escape
character will not be attempted. It is possible to enable this feature by adding to
IMA.INI, under the [Options] section, the following line:
RepairDamagedJISinHeaders=YES
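All of the options described in the last two sections live under [Options] in IMA.INI, so their effective values can be computed with Python's `configparser`, falling back to the documented defaults. The `read_options` helper is hypothetical, and RepairDamagedJIS is assumed to default to NO, since the text only says it takes effect when the line is present.

```python
import configparser

def read_options(ini_text: str, local_charset: str) -> dict:
    """Effective [Options] values from IMA.INI text, with documented defaults."""
    jp = local_charset.upper() == "ISO-2022-JP"
    defaults = {
        "ConvertJISinMIME": jp,
        "ScanForJIS": jp,
        "RepairDamagedJIS": False,  # assumption: off unless the line is present
        "DisableJISinHeaders": not jp,
        "RepairDamagedJISinHeaders": False,
    }
    # strict=False tolerates duplicate keys; interpolation=None keeps "%" literal.
    parser = configparser.ConfigParser(strict=False, interpolation=None)
    parser.read_string(ini_text)
    # Option names are matched case-insensitively; section names are not.
    return {name: parser.getboolean("Options", name, fallback=default)
            for name, default in defaults.items()}

print(read_options("[Options]\nScanForJIS=NO\n", "ISO-2022-JP"))
```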
Using Japanese VIM
The recommended platform for Japanese language support is based on the
Japanese VIM library and Japanese Microsoft Windows (3.1, 95 or NT). The Japanese VIM is
not available for download; it is installed with the Japanese version of Lotus cc:Mail.
An update kit containing the latest versions of some of the DLLs for VIM 2.x can be
downloaded from:
http://www.lotus.co.jp/ftpmodul/214e.htm
Japanese cc:Mail Client version 6.x may be purchased from Lotus.
The local Post Office should be created with the Japanese version of ADMIN.
IMPORTANT: in order to use the Japanese VIM, the line:
VIMCharSet=CP932
must be added to IMA.INI under the [Gateway] section.
Using English VIM 2.x
A kit for the latest version 2.21 is available free from Lotus's ftp site:
ftp://ftp.ccmail.com/pub/comm/ccmail/dev_tools/vdlwin.zip
This version of VIM has been found to handle Japanese characters only on the following
platforms:
- Japanese Windows 3.1
- English Windows 3.1 with Unionway add-on
- English Windows95 with Unionway add-on
We discourage this configuration, and recommend the use of the Japanese VIM instead.
Published: August 1997