Note to UNIX xterm users: XFree86 4 includes Unicode fonts, and xterm now supports UTF-8. If you log in through xdm, put

    LANG=en_US.UTF-8
    export LANG

at the top of your .xsession to turn on the UTF-8 support in xterm and several other programs. (You can use a locale other than en_US, of course, but make sure to add the .UTF-8 suffix.) You may also have to change

    xterm

to

    xterm -fn -misc-fixed-medium-r-semicondensed--13-120-75-75-c-60-iso10646-1

if your system's fonts have not been set up correctly. Some text-mode programs need their own configuration: for example, lynx has a separate "Display character set" option that has to be set to UTF-8.
See Markus Kuhn's UTF-8 and Unicode FAQ for more information.
The HTML source for the first pi is the two-byte UTF-8 sequence \317\200; the head of this page says

    <meta http-equiv="content-type" content="text/html; charset=utf-8">

to specify UTF-8 as the character set. The HTML source for the second pi is the numeric character reference &#960;.
If you see a double-dotted I and a block (or Euro) instead of a pi, your browser (or terminal) is misinterpreting bytes as ISO 8859-1 instead of UTF-8.
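The byte-level behavior described above can be checked directly. The following is a minimal Python sketch, not part of the original test page; the variable names are illustrative. It encodes π as UTF-8, prints the bytes in the octal notation used here, and then decodes the same bytes as ISO 8859-1 and as Windows-1252 to show where the double-dotted I, the block, and the Euro come from.

```python
# Encode the Greek small letter pi (U+03C0) as UTF-8.
pi = "\u03c0"
utf8 = pi.encode("utf-8")

# Octal escapes, matching the \317\200 notation used on this page.
octal = "".join(f"\\{b:03o}" for b in utf8)
print(octal)  # \317\200

# Misreading the two bytes as ISO 8859-1 yields I-with-diaeresis (U+00CF)
# followed by the C1 control character U+0080, often drawn as a block.
as_latin1 = utf8.decode("iso-8859-1")

# Windows-1252 maps byte \200 to the Euro sign (U+20AC) instead.
as_cp1252 = utf8.decode("cp1252")
print(as_cp1252)
```

Whether the second character shows up as a block or as a Euro therefore depends on which single-byte character set the software falls back to.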
Similarly, you should be able to paste the address bounce@π.cr.yp.to into your mail client, and compose mail involving that address, with the pi visible everywhere.
Summary of results reported so far:
Client | OS | Results |
---|---|---|
CDE Mail | Solaris 8 | Bad displays, 8859-1 everywhere |
KMail 1.2 | Linux | Bad displays, 8859-1 everywhere |
Netscape 4.75 | MacOS 8.5 | Mostly correct displays; 8859-1 in location bar |
Netscape 4.75 | Digital UNIX 4.0d | Mostly correct displays; 8859-1 in location bar |
Netscape 4.78 | Win98 | Mostly correct displays; 8859-1 in location bar |
Eudora 5.1 | WinME | Bad displays, 8859-1 everywhere |
Explorer 5 | MacOS 8.5 | Mostly correct displays; 8859-1 in location bar and input bar |
Opera 5.00 | Linux | Bad displays |
Opera 5.12 | Win98 | Bad displays |
Opera 5.12 | Win2000 | Bad displays |
Your browser should also be able to follow links to a host name that contains π, by converting the π to UTF-8: http://π.cr.yp.to and http://π.cr.yp.to/index.html.
Summary of results reported so far:
Browser | OS | Results |
---|---|---|
links 0.95 | Solaris 6 | Failure: BIND bug |
lynx 2.8.3rel.1 | Solaris 6 | Failure: BIND bug |
Netscape 4.75 | MacOS 8.5 | Success |
Netscape 4.75 | Solaris 6 | Failure: BIND bug |
Netscape 4.75 | Solaris 8 | Failure: BIND bug |
Netscape 4.75 | Digital UNIX 4.0d | Success |
Netscape 4.77 | Linux | Failure: BIND bug |
Netscape 4.78 | Win98 | Success |
Netscape 4.78 | Win2000P SP2 | Failure: internal bug? |
Mozilla 2001062815 | Win98 | Failure: internal bug? |
Explorer 5 | MacOS 8.5 | Success |
Explorer 5.1 | MacOS X | Success |
Explorer 5.50 | Win98 | Failure: \357 bug |
Explorer 5.50 | Win2000 | Failure: \357 bug |
Explorer 5.50 | Win2000P SP2 | Failure: \303 bug |
Konqueror 2.1.1 | Linux | Failure: internal bug? |
Opera 5.12 | Win98 | Failure: internal bug? |
Opera 5.12 | Win2000 | Failure: internal bug? |
    24 \317\200.cr.yp.to (correct)
     4 \045cf\04580.cr.yp.to (% encoding, not valid in DNS)
     4 \357\200.cr.yp.to (IE \357 bug)
     6 \303\257\342\202\254.cr.yp.to (IE \303 bug)
     2 \357\276\217\302\200.cr.yp.to (huh?)
     1 \201.cr.yp.to (huh?)
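Some of these octal-escaped names can be reproduced mechanically from π. The Python sketch below is an illustration, not something taken from the reports. It derives the correct form and the % form, and it tests one possible reading of the \303 form: that it is the \357 form misread as Windows-1252 and then encoded as UTF-8 a second time. That reading is an assumption based only on the bytes; the helper name `octal` is hypothetical.

```python
from urllib.parse import quote

# The correct form: pi (U+03C0) encoded as UTF-8, bytes \317\200.
correct = "\u03c0".encode("utf-8")

# Percent-encoding the UTF-8 bytes gives "%cf%80"; \045 is the "%" character.
percent = quote("\u03c0").lower()

# The \357 form exactly as logged: bytes \357\200.
ie_357 = bytes([0o357, 0o200])

# Assumption: the \303 form is the \357 form misread as Windows-1252
# and then encoded as UTF-8 a second time.
ie_303 = ie_357.decode("cp1252").encode("utf-8")


def octal(data: bytes) -> str:
    """Render bytes in the backslash-octal notation used in the list."""
    return "".join(f"\\{b:03o}" for b in data)


print(octal(correct))  # \317\200
print(percent)         # %cf%80
print(octal(ie_303))   # \303\257\342\202\254
```

The sketch does not explain where the \357 byte itself comes from, and it says nothing about the two entries marked "huh?".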
The following bug needs to be fixed: The UNIX BIND gethostbyname/res_*/dn_* client library deliberately rejects names with non-ASCII characters. Impact of this bug: Programs using the BIND client library incorrectly believe that π.cr.yp.to doesn't exist.
With some BIND versions you can work around this by putting

    options allow_special all

or

    options no-check-names

into /etc/resolv.conf. I've asked the BIND company to make this the default, but they've refused; apparently they don't care about international users.
In contrast, my djbdns client library has no problems with non-ASCII characters.
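Nothing in the DNS wire format stands in the way of such names: a name on the wire is a sequence of length-prefixed labels, and each label is an arbitrary string of bytes. The following Python sketch illustrates this by encoding π.cr.yp.to with a raw UTF-8 label. The function `encode_dns_name` is a hypothetical helper written for this illustration; it is not code from djbdns or BIND.

```python
def encode_dns_name(name: str) -> bytes:
    """Encode a dotted name in DNS wire format, using raw UTF-8 labels."""
    wire = bytearray()
    for label in name.rstrip(".").split("."):
        raw = label.encode("utf-8")
        # Each label is limited to 63 bytes and may not be empty.
        if not 1 <= len(raw) <= 63:
            raise ValueError(f"label length out of range: {label!r}")
        wire.append(len(raw))  # one length byte
        wire += raw            # then the label bytes, unmodified
    wire.append(0)             # the zero-length root label ends the name
    if len(wire) > 255:
        raise ValueError("name longer than 255 bytes")
    return bytes(wire)


wire = encode_dns_name("\u03c0.cr.yp.to")
print(wire)  # b'\x02\xcf\x80\x02cr\x02yp\x02to\x00'
```

The first label is just the two bytes \317\200 preceded by a length byte of 2, so a client library only has to pass the bytes through unchanged.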
You can also try djb-pibounce-π@cr.yp.to. This avoids DNS issues, such as the BIND bug described above.
Summary of MUA results reported so far:
Software | OS | Results |
---|---|---|
VM 6.43 | Linux | Success |
CDE Mail | Solaris 8 | Unclear report: success? |
Netscape 4.75 | MacOS 8.5 | Unclear report: success? |
Netscape 4.76 | Linux | Success |
KMail 1.2 | Linux | Unclear report: success? |
Mozilla | Windows | Failure: converts address to quoted-printable |
Eudora 5.1 | Windows | Success |
Outlook Express 5 | MacOS 8.5 | Failure: name rejected internally |
Outlook Express 5 | Windows | Failure: converts \317\200 to ASCII p when address is pasted |
The following bug needs to be fixed: The UNIX sendmail program throws away bytes \200 through \237 on input, because it uses those bytes for internal macros. Impact of this bug: sendmail will corrupt the address, changing \317\200 to \317.
I suggested to Eric Allman in February 1999 that he convert \200 to \377\240, ..., \237 to \377\277, and \377 to \377\377, and do the opposite conversion on output. He ignored the suggestion. Apparently, like the BIND company, he doesn't care about international users.
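The suggested conversion is simple to state in code. The Python sketch below follows the mapping described above; the function names `escape` and `unescape` and the error handling are assumptions made for illustration, and none of this is sendmail code.

```python
ESC = 0o377  # the escape byte, \377


def escape(data: bytes) -> bytes:
    """Input conversion: free up bytes \\200 through \\237 for internal use."""
    out = bytearray()
    for b in data:
        if 0o200 <= b <= 0o237:
            out += bytes([ESC, b + 0o040])  # \200 -> \377\240, ..., \237 -> \377\277
        elif b == ESC:
            out += bytes([ESC, ESC])        # \377 -> \377\377
        else:
            out.append(b)
    return bytes(out)


def unescape(data: bytes) -> bytes:
    """Output conversion: the exact inverse of escape()."""
    out = bytearray()
    i = 0
    while i < len(data):
        b = data[i]
        if b != ESC:
            out.append(b)
            i += 1
            continue
        if i + 1 >= len(data):
            raise ValueError("truncated escape sequence")
        nxt = data[i + 1]
        if nxt == ESC:
            out.append(ESC)
        elif 0o240 <= nxt <= 0o277:
            out.append(nxt - 0o040)
        else:
            raise ValueError("invalid escape sequence")
        i += 2
    return bytes(out)


escaped = escape(b"\317\200")
print(escaped)            # b'\xcf\xff\xa0'
print(unescape(escaped))  # b'\xcf\x80'
```

After the input conversion no byte in the range \200 through \237 remains in the data, so those values stay available for internal macros, and the output conversion restores the address exactly.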
I added the lines

    +π.cr.yp.to:131.193.178.181
    @π.cr.yp.to:131.193.178.181

to /etc/tinydns/root/data. My DNS software, djbdns, handles UTF-8 names without trouble.
Your DNS software should let you create UTF-8 domain names.
I added the line

    π.cr.yp.to

to /var/qmail/control/rcpthosts. I directed the mail appropriately by adding a similar entry to /var/qmail/control/virtualdomains. My mail software, qmail, handles UTF-8 names without trouble.
Your MTA should let you accept mail for UTF-8 domain names.