
This page contains Chinese characters encoded in UTF-8, so it may not display correctly if you do not have suitable fonts installed. It also includes an example of vertical text; as of this writing, this will only display correctly in Internet Explorer, as it relies on a proposed feature of CSS that has not yet become a part of the official standard.

Chinese Language Support

Now, I am going to tackle one of the more difficult areas of web page design.

Between the

<head>

and

</head>

tags on your web page, you can place one of the following meta tags:

<meta http-equiv="Content-Type" content="text/html; charset=big5">
<meta http-equiv="Content-Type" content="text/html; charset=gb2312">
<meta http-equiv="Content-Type" content="text/html; charset=shift-jis">
<meta http-equiv="Content-Type" content="text/html; charset=euc-kr">
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">

This indicates that groups of 8-bit bytes - pairs in the first four encodings, and anywhere from one to four bytes in UTF-8 - are in fact used for indicating a single character, in order to allow a larger character repertoire.

The encodings listed here are Big 5, used in Taiwan and Hong Kong for traditional Chinese; GB-2312, used in the People's Republic of China for simplified Chinese; Shift-JIS, used for Japanese; EUC-KR, used for Korean; and UTF-8, which serves for the entire Unicode character repertoire.
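
As a minimal sketch of where the declaration sits, here is a skeleton page that declares UTF-8. The title and body text are placeholders, and the shorter meta charset form mentioned in the comment is the HTML5 spelling, which the examples above do not use.

```html
<!DOCTYPE html>
<html>
<head>
<!-- Declare the encoding early, before any non-ASCII text such as the title. -->
<!-- In HTML5 the same declaration may be shortened to <meta charset="UTF-8">. -->
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<title>Placeholder title</title>
</head>
<body>
<!-- In UTF-8 each of these two Chinese characters is stored as three bytes. -->
<p>中文</p>
</body>
</html>
```

The declaration only tells the browser how to interpret the bytes; the file itself still has to be saved in the encoding it names.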

This is the first step. It may be enough in many cases, provided that one can prepare the web page in the character encoding specified. In the case of UTF-8, Notepad, included with Microsoft Windows, is now able to handle special characters and save files in UTF-8. For both Big 5 and GB-2312, there are utilities for Windows and even DOS.


But one may wish to get fancy.

If one is using UTF-8 as one's character set, and one's intent is to display Chinese text on one's web site, an additional complication arises.

Some simplified Chinese characters are different characters from their traditional Chinese counterparts; for example, 個 and 个 are the two versions of the classifier rendered as "piece" in Chinese-English pidgin.

In other cases, though, a simplified Chinese character and a traditional Chinese character differ only in how a part of the character is written. In this case, I had thought that the two characters were given the same codepoint, and were treated as stylistic variants.

However, it seems that I was mistaken. For example, 訃 is U+8A03, while 讣 is U+8BA3. There are, though, some fonts that display simplified characters as their traditional equivalents, and vice versa, and the technique I am about to describe might at least avoid having those fonts used by accident, if they include proper internal labelling.

Pairs where a component is written in a significantly different way, such as 語 and 语, are not unified in Unicode. I have now found out, though, that there are cases where a minor factor, such as the overshoot of a stroke, is visibly different between traditional and simplified characters, or, for that matter, in Japanese kanji, even though the codepoint is shared. I had hoped that the technique I will be describing would deal with that; such as in and , to use an example from an excellent overview of the issue by Richard Ishida. But, as you probably can see, the use of different fonts for the two versions of the same character was not triggered by the use of the lang attribute; at least, it wasn't in any of my browsers.

One way to ensure a character displays as intended would be to specify which font is used. However, this only works if the person viewing the page has exactly that font.

There is a way, however, to be more specific. One can use the lang attribute on a span tag in HTML to indicate which language, and hence which form of the characters, is intended.

Thus, place

<span lang="zh-CN"> ... </span>

around text you wish to display in simplified characters, and

<span lang="zh-TW"> ... </span>

around text that you wish to display in traditional characters.
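
A sketch of how this might look in practice, assuming a browser that chooses fonts by language. Note that HTML language tags join the language and the region with a hyphen, as in zh-CN. The character 骨 is only an illustrative choice, since its printed shape commonly differs between mainland and Taiwanese fonts, and the font names SimSun and PMingLiU are assumptions about what the reader has installed.

```html
<style>
 /* Optional: suggest a font per language. The font names are assumptions. */
 :lang(zh-CN) {font-family: SimSun, serif}
 :lang(zh-TW) {font-family: PMingLiU, serif}
</style>

<!-- The same codepoint, U+9AA8, marked up once for each region. -->
<p>Simplified style: <span lang="zh-CN">骨</span></p>
<p>Traditional style: <span lang="zh-TW">骨</span></p>
```

Whether the two lines actually render differently depends on the browser and on the fonts available.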


And now we're going to get really fancy.

學而時習之,不亦說乎?有朋自遠方來,不亦樂乎?人不知而不慍,不亦君子乎?

If your browser is Internet Explorer, the text above, the first paragraph of the Analects of Confucius, will have displayed in the traditional Chinese fashion of vertical columns going from right to left.

However, I noted one problem. I had to choose the height of the table cell carefully, because the browser will break a column one character early to avoid making a punctuation mark the first character of the next column.

This was achieved by using the span tag, along with some CSS.

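Between the </head> and <body> tags, I placed the following code (strictly speaking, a style block belongs inside the head section, though browsers tolerate it here):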

<style>
<!--
 .test {writing-mode:tb-rl; font-size:24pt; height:7cm}
-->
</style>

The height was specified in centimeters because my source for this technique advised that there apparently are problems if other units are used.

Then, I placed the text I wished to have render vertically inside a cell within a table, and then within span tags:

<table>
<tr>
<td>
<span class="test">
學而時習之,不亦說乎?有朋自遠方來,不亦樂乎?人不知而不慍,不亦君子乎?
</span>
</td>
</tr>
</table>
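
The tb-rl value used above is the older Internet Explorer spelling. The CSS Writing Modes specification calls the same layout vertical-rl, so a variant aimed at other browsers might look like the following sketch. The class name vertical is hypothetical, and whether the advice about centimeters still matters there is untested.

```html
<style>
 /* vertical-rl: characters run top to bottom, columns advance right to left. */
 .vertical {writing-mode: vertical-rl; font-size: 24pt; height: 7cm}
</style>

<!-- A div is used here because height has no effect on an inline span. -->
<div class="vertical">
學而時習之,不亦說乎?有朋自遠方來,不亦樂乎?人不知而不慍,不亦君子乎?
</div>
```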

The source for this technique did not use the span tag, but instead illustrated another feature of CSS. Styles can be defined for HTML elements, like p, pre, h1, and so on, or for classes, like .yourname, which apply to whatever is within <span class="yourname"> ... </span> tags. In addition, one can define a style for p.yourname, which then will only apply to those paragraphs which begin with <p class="yourname"> instead of just <p> - and, of course, other attributes besides the class can apply to the paragraph in both cases.
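
To illustrate the p.yourname form, here is a sketch that applies the vertical text style through a paragraph selector. It is a guess at the general shape of such a rule, not a copy of the source's code.

```html
<style>
 /* Applies only to <p class="test">, not to a plain <p> or to other tags with class test. */
 p.test {writing-mode:tb-rl; font-size:24pt; height:7cm}
</style>

<p class="test">學而時習之,不亦說乎?</p>
<p>This ordinary paragraph is not affected by the rule.</p>
```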

