Re: New rules for literal characters in source code?

From:

Arne Vajhøj <arne@vajhoej.dk>

Newsgroups:

comp.lang.java.programmer

Date:

Sat, 19 Feb 2011 16:49:55 -0500

Message-ID:

<4d603aff$0$23752$14726298@news.sunsite.dk>

On 19-02-2011 16:36, BGB wrote:

On 2/19/2011 1:30 PM, Ian Pilcher wrote:

On 02/19/2011 02:23 PM, Stefan Ram wrote:

When you use a Windows-1252 editor to edit Java source and
then the Java process prints it to a Windows CP-850 console,
umlauts, like ???, will not be rendered correctly, because
the process will print the character ??? that has the code
in CP 850 that ??? has in Windows 1252.
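
The mismatch described above can be reproduced without a console by encoding a character with one charset and decoding the bytes with the other. This is only a minimal sketch: the class name and the `misread` helper are made up for illustration, ä is picked as an example umlaut, and it assumes the JDK ships the `IBM850` charset (standard JDK builds do, through the extended charsets).

```java
import java.nio.charset.Charset;

public class CodepageMismatch {
    // Encode text with one charset and decode the resulting bytes with another,
    // mimicking a writer and a reader that disagree about the code page.
    static String misread(String text, String writtenAs, String readAs) {
        byte[] bytes = text.getBytes(Charset.forName(writtenAs));
        return new String(bytes, Charset.forName(readAs));
    }

    public static void main(String[] args) {
        String umlaut = "\u00e4"; // ä, chosen as an example (assumption)
        String shown = misread(umlaut, "windows-1252", "IBM850");
        // ä is byte 0xE4 in Windows-1252; byte 0xE4 in CP 850 is õ (U+00F5).
        System.out.printf("U+%04X is displayed as U+%04X%n",
                (int) umlaut.charAt(0), (int) shown.charAt(0));
    }
}
```

Run as is, this should print `U+00E4 is displayed as U+00F5`, i.e. the ä shows up as õ.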

Windows still isn't using UTF-8?

most things in Windows are done one of two ways:
using ASCII and codepages;
using UTF-16.

granted, it wouldn't likely be all that difficult to write a UTF-8 ->
UTF-16 console printer, but it will involve the relevant parts of the
Win32 API.

so, the issue may not be so much Windows, but more what the particular
JVM does regarding console output.

most likely, it does the least effort thing, which is to directly emit
bytes, which in turn means ASCII.

if it really matters, there is always JNI and the ability to overload
the PrintStream class...

There is also UTF-8 support.

Even notepad can read and write UTF-8.

But the console is special. MS wanted it to be DOS compatible.
So it is typically CP-437 or CP-850.
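
If the console code page is known, one way to get readable output from Java is to wrap System.out in a PrintStream with an explicit encoding. A rough sketch follows; the class name and the `withEncoding` helper are hypothetical, and the `IBM850` default is only an assumption about the console, so check the actual code page (e.g. with `chcp`) first.

```java
import java.io.OutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

public class ConsoleEncoding {
    // Wrap a stream in an auto-flushing PrintStream that encodes characters
    // with the named charset instead of the platform default.
    static PrintStream withEncoding(OutputStream out, String charsetName) {
        try {
            return new PrintStream(out, true, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unsupported charset: " + charsetName, e);
        }
    }

    public static void main(String[] args) {
        // Assumption: the console uses CP 850 unless another name is passed in.
        String consoleCharset = args.length > 0 ? args[0] : "IBM850";
        PrintStream console = withEncoding(System.out, consoleCharset);
        console.println("\u00e4\u00f6\u00fc"); // äöü
    }
}
```

Whether the umlauts then render correctly still depends on the console really using that code page.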

Arne