UTF-8/NLS dilemma

H.Merijn Brand h.m.brand at xs4all.nl
Thu Nov 23 10:12:15 EET 2006


On Thu, 23 Nov 2006 03:29:08 +0100, Miloslav Trmac <mitr at volny.cz> wrote:

> Per Hedeland wrote:
> >> While this adds a "feature" to tcsh, you are lying to the shell about
> >> its environment.
> > I don't think so - the *shell's* environment is US-ASCII, a.k.a. LANG=C
> > a.k.a. the default.
> No, it isn't.  The "environment" includes the set of allowed characters
> and their encoding.  If you send other characters to the shell, the
> shell has incorrect information about the environment.
> 
> > It has no business second-guessing why I typed
> > something that was octal 326
> tcsh needs to correctly display the character and move the cursor over
> it (even if the character is double-width).  This can't be done without
> interpreting the bytes as characters in a specific encoding.
> 
> >> But really, if you want the "beep" feature, the straightforward way to
> >> get it is to use the real locale settings, add a "beep" operation and
> >> bind the keys to it.  Handling invalid bytes by beeping IMHO belongs in
> >> the "bug-for-bug compatibility" category.
> > Nope - handling 8-bit characters as meta-commands was in tcsh long
> > before all this NLS nonsense!:-)
> Handling meta-commands as ESC x has been in tcsh for a long time as well.
> 
> Environments, assumptions and requirements change; it is not sufficient
> to be 8-bit clean nowadays, tcsh must be multibyte-aware and handle
> multiple-column characters.

This is vital for me in certain environments, and it is one of the reasons
why I run the latest tcsh in production.

> The assumption that 0x80 | x is a meta command or an extended character
> that can be just output to the tty is no longer valid.

Sooo right :)
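
To make that concrete, here is a minimal sketch (Python, purely for
illustration; the helper name `interpretations` is hypothetical and not part
of tcsh) of the three ways the byte from earlier in the thread, octal 326,
can be read. Which one is right depends entirely on the encoding in use.

```python
# The byte discussed earlier in the thread: octal 326 (0xD6).
BYTE = 0o326


def interpretations(byte):
    """Return three possible readings of a single high-bit byte."""
    raw = bytes([byte])
    try:
        as_utf8 = raw.decode("utf-8")
    except UnicodeDecodeError:
        # A lone lead byte is not a complete UTF-8 sequence.
        as_utf8 = None
    return {
        # Old-style "meta" reading: strip the high bit, treat as Meta-<key>.
        "meta_key": chr(byte & 0x7F),
        # Single-byte reading in ISO 8859-1.
        "latin1": raw.decode("latin-1"),
        # UTF-8 reading, or None when the byte alone is invalid.
        "utf8": as_utf8,
    }


print(interpretations(BYTE))
```

The same byte is Meta-V, a letter in ISO 8859-1, or an incomplete sequence in
UTF-8, so the shell cannot pick one without knowing the encoding.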

> The requirement to handle e.g. multibyte characters can be fulfilled
> only by assuming LC_CTYPE is correctly set and libc supports the used
> encoding, or by reimplementing the necessary libc functionality (which
> is rather impractical, considering that glibc currently supports 240
> different character encodings).

It is not only $LANG and/or $LC_ALL. It also depends on the font you chose.
It is very confusing to have an iso8859-1 xterm with an iso10646-1 font, or
vice versa. Things start to fail.
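
A small sketch of what "start to fail" looks like when the two sides disagree
about the encoding. This is illustrative Python; `shown_on` is a made-up
helper that only models the decoding step, not a real terminal or font.

```python
def shown_on(terminal_encoding, text, sent_encoding):
    """Model what a terminal would display.

    The program sends `text` encoded as `sent_encoding`; the terminal
    decodes the bytes as `terminal_encoding`. Undecodable bytes become
    U+FFFD, the replacement character.
    """
    data = text.encode(sent_encoding)
    return data.decode(terminal_encoding, errors="replace")


# UTF-8 output on an ISO 8859-1 terminal: one character turns into two.
as_latin1 = shown_on("latin-1", "\u00e9", "utf-8")

# ISO 8859-1 output on a UTF-8 terminal: the byte is not valid UTF-8.
as_utf8 = shown_on("utf-8", "\u00e9", "latin-1")

print(len(as_latin1), repr(as_latin1))
print(len(as_utf8), repr(as_utf8))
```

In the first case the shell believes it printed one column while the terminal
shows two, so cursor movement on the command line goes wrong from there on.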

Maybe there should be a check somewhere that a wide-char enabled tcsh is
actually running in a utf8 enabled xterm with an iso10646 (or other utf8
enabled) font.
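
The locale half of such a check could look roughly like the sketch below. It
is Python for illustration only, and all function names are hypothetical. It
follows the POSIX precedence (LC_ALL, then LC_CTYPE, then LANG) and only
inspects the codeset suffix of the locale name, so it is a heuristic; asking
libc via nl_langinfo(CODESET) would be more reliable. It cannot see the
terminal's font at all, which is exactly the part that is hard to check from
inside the shell.

```python
import os


def effective_ctype(env):
    """Return the locale name that governs LC_CTYPE.

    POSIX precedence: LC_ALL overrides LC_CTYPE, which overrides LANG.
    Empty values are treated as unset; the fallback is the "C" locale.
    """
    for name in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(name)
        if value:
            return value
    return "C"


def codeset_of(locale_name):
    """Extract the codeset from language[_territory][.codeset][@modifier]."""
    name = locale_name.split("@", 1)[0]
    if "." not in name:
        return None
    return name.split(".", 1)[1]


def looks_like_utf8(env):
    """Heuristic: does the effective LC_CTYPE name a UTF-8 codeset?"""
    codeset = codeset_of(effective_ctype(env))
    if codeset is None:
        return False
    # Accept both common spellings: "UTF-8" and "utf8".
    return codeset.lower().replace("-", "") == "utf8"


if __name__ == "__main__":
    print(looks_like_utf8(os.environ))
```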

I've chosen the static library approach,

-- 
H.Merijn Brand         Amsterdam Perl Mongers (http://amsterdam.pm.org/)
using & porting perl 5.6.2, 5.8.x, 5.9.x   on HP-UX 10.20, 11.00, 11.11,
& 11.23, SuSE 10.0 & 10.1, AIX 4.3 & 5.2, and Cygwin. http://qa.perl.org
http://mirrors.develooper.com/hpux/            http://www.test-smoke.org
                        http://www.goldmark.org/jeff/stupid-disclaimers/


