Re: Read utf-8 char one by one

From: RedGrittyBrick <RedGrittyBrick@spamweary.invalid>
Newsgroups: comp.lang.java.programmer
Date: Thu, 28 Jan 2010 17:57:55 +0000
Message-ID: <4b61d025$0$2524$da0feed9@news.zen.co.uk>

PLEASE DON'T TOP-POST, PLEASE PUT YOUR REPLY AT THE BOTTOM, BELOW ANY
QUOTED TEXT. THANKS!

moonhkt wrote:

RedGrittyBrick wrote:

moonhkt wrote:

Hi all, I want to output the characters in the string one by one.
At the moment, codePointAt() just prints the code point values.

Why not use String's length() and charAt() methods?

I assume you can disregard characters outside Unicode's Basic
Multilingual Plane (BMP); if not, I think you'll have to check for
surrogate pairs. Characters outside the BMP are too big for a single
char, so Java stores each of them as two chars (a surrogate pair).
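If characters outside the BMP do matter, a surrogate-aware loop is needed. The following is a minimal sketch, not part of the original reply: the class and method names are made up for illustration, and it assumes Java 5 or later for codePointAt(), Character.charCount() and Character.toChars(). It walks the string one code point at a time and turns each code point back into a String, so U+1D11E comes out as one element instead of two halves:

```java
import java.util.ArrayList;
import java.util.List;

public class CodePointChars {
    // Splits s into one String per Unicode code point, keeping each
    // surrogate pair (e.g. the two chars of U+1D11E) together.
    static List<String> splitCodePoints(String s) {
        List<String> result = new ArrayList<String>();
        int i = 0;
        while (i < s.length()) {
            int cp = s.codePointAt(i);                      // whole code point starting at i
            result.add(new String(Character.toChars(cp)));  // back to a 1- or 2-char String
            i += Character.charCount(cp);                   // 2 for a surrogate pair, else 1
        }
        return result;
    }

    public static void main(String[] args) {
        String a = "\u00fc\u34d7Welcome to Rose India \uD834\uDD1E.";
        List<String> chars = splitCodePoints(a);
        System.out.printf("CodePoints in string is %d%n", chars.size());
        for (int k = 0; k < chars.size(); k++) {
            System.out.printf("Character[%d] is %s (U+%04X)%n",
                    k, chars.get(k), chars.get(k).codePointAt(0));
        }
    }
}
```

On the string in main() this yields 26 elements, matching codePointCount(), rather than the 27 chars that length() reports.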

-------------------------------------8<-----------------------------------
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

public class UnicodeChars {
   public static void main(String[] args)
       throws UnsupportedEncodingException {

     // I want console output in UTF-8
     PrintStream sysout = new PrintStream(System.out, true, "UTF-8");

     // \u00fc is LATIN SMALL LETTER U WITH DIAERESIS;
     // \u34d7 is a character in CJK Unified Ideographs Extension A.
     // \uD834\uDD1E are the surrogate pair for character U+1D11E.
     // U+1D11E is MUSICAL SYMBOL G CLEF;
     String a = "\u00fc\u34d7Welcome to Rose India \uD834\uDD1E.";

     int n = a.length();
     sysout.println("GIVEN STRING IS=" + a);
     sysout.printf("Length of string is %d%n", n);
     sysout.printf("CodePoints in string is %d%n",
         a.codePointCount(0, n));
     for (int i = 0; i < n; i++) {
       // charAt() returns one UTF-16 char, so the two halves of a
       // surrogate pair are printed as separate characters
       sysout.printf("Character[%d] is %s%n", i, a.charAt(i));
     }
   }
}

-------------------------------------8<-----------------------------------
GIVEN STRING IS=ü㓗Welcome to Rose India 𝄞.
Length of string is 27
CodePoints in string is 26
Character[0] is ü
Character[1] is 㓗
Character[2] is W
Character[3] is e

[...]

Character[23] is
Character[24] is ?
Character[25] is ?
Character[26] is .
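The ? shown for Character[24] and Character[25] is most likely not a display fault: each is one half of a surrogate pair, and Java's UTF-8 encoder replaces an unpaired surrogate with the replacement byte '?' (0x3f). A small sketch to check this, assuming Java 7 or later for StandardCharsets (the class and method names are illustrative only):

```java
import java.nio.charset.StandardCharsets;

public class LoneSurrogate {
    // Returns the UTF-8 encoding of s as space-separated lowercase hex octets.
    static String utf8Hex(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String pair = "\uD834\uDD1E";  // U+1D11E MUSICAL SYMBOL G CLEF
        System.out.println(utf8Hex(pair));                            // f0 9d 84 9e
        System.out.println(utf8Hex(String.valueOf(pair.charAt(0))));  // 3f, i.e. '?'
        System.out.println(utf8Hex(String.valueOf(pair.charAt(1))));  // 3f, i.e. '?'
    }
}
```

Encoded together, the pair becomes the four octets of U+1D11E; encoded one char at a time, each half degrades to '?'.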

Yes, this is what I want.

But my output is not the same as yours. You are correct.

I ran it in JCreator version 4.5.

I am using Eclipse. To display UTF-8 encoded Unicode characters written
to the console, I had to configure Eclipse. Perhaps you need to
configure JCreator so that you can display Unicode characters?
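While the console configuration is being sorted out, one way to confirm that the program itself is right is to print each code point in plain ASCII, so that nothing depends on the console's encoding. A rough sketch, assuming Java 7 or later for Character.getName(); the class name is hypothetical:

```java
public class DescribeCodePoints {
    // Describes one code point in plain ASCII, e.g.
    // "U+00FC LATIN SMALL LETTER U WITH DIAERESIS", so the text reads the
    // same whatever encoding the console is set to.
    static String describe(int cp) {
        String name = Character.getName(cp);  // Java 7+; null if cp is unassigned
        return String.format("U+%04X %s", cp, name == null ? "(unassigned)" : name);
    }

    public static void main(String[] args) {
        String a = "\u00fc\u34d7Welcome to Rose India \uD834\uDD1E.";
        int index = 0;
        // Advance by 1 or 2 chars so each surrogate pair is described once
        for (int i = 0; i < a.length(); i += Character.charCount(a.codePointAt(i))) {
            System.out.printf("Character[%d] is %s%n", index++, describe(a.codePointAt(i)));
        }
    }
}
```

If this prints the expected names in JCreator while the UTF-8 version still shows ?, the problem lies in the console setup rather than in the Java code.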

GIVEN STRING IS=???????elcome to Rose India ??.
Length of string is 27
CodePoints in string is 26
Character[0] is ???
Character[1] is ??
Character[2] is W
Character[3] is e

[...]

Character[23] is
Character[24] is ?
Character[25] is ?
Character[26] is .

You used Google Groups to post. It seems Google Groups uses
quoted-printable to encode non-ASCII characters.
E.g. ==E7=BE=B9?=EE=A2=ADelcome ...
I find it hard to fathom how that sequence of octets was derived.
AFAIK \u00fc\u34d7 should encode to octets c3 bc e3 93 97.

Perhaps Google Groups is hampering communication. As you seem to be a
user of Mozilla Firebird, have you tried using Mozilla Thunderbird to
read this newsgroup directly from your ISP's NNTP service?
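To check the octets, here is a small sketch (assuming Java 7 or later; the names are illustrative, and the decoder is a deliberately minimal hand-written one for a single line, not a complete RFC 2045 implementation). It dumps the UTF-8 encoding of the first two characters of the test string, \u00fc\u34d7, and decodes the quoted-printable specimen, starting from its first =E7, back into code points:

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

public class OctetCheck {
    // Returns the UTF-8 encoding of s as space-separated lowercase hex octets.
    static String utf8Hex(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    // Minimal quoted-printable decoder for one line of ASCII text: "=XX" becomes
    // the octet 0xXX and every other character becomes its own (ASCII) octet.
    // Soft line breaks and other RFC 2045 details are deliberately ignored.
    static byte[] decodeQuotedPrintable(String s) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '=' && i + 2 < s.length()
                    && Character.digit(s.charAt(i + 1), 16) >= 0
                    && Character.digit(s.charAt(i + 2), 16) >= 0) {
                out.write(Integer.parseInt(s.substring(i + 1, i + 3), 16));
                i += 2;  // skip the two hex digits just consumed
            } else {
                out.write(c);
            }
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // Expected UTF-8 octets for U+00FC U+34D7
        System.out.println(utf8Hex("\u00fc\u34d7"));  // c3 bc e3 93 97

        // Code points that the posted quoted-printable octets represent in UTF-8
        byte[] posted = decodeQuotedPrintable("=E7=BE=B9?=EE=A2=ADelcome");
        String decoded = new String(posted, StandardCharsets.UTF_8);
        for (int i = 0; i < decoded.length(); i += Character.charCount(decoded.codePointAt(i))) {
            System.out.printf("U+%04X%n", decoded.codePointAt(i));
        }
    }
}
```

Read as UTF-8, =E7=BE=B9 and =EE=A2=AD decode to U+7FB9 and U+E8AD (the latter in the Private Use Area), not to U+00FC and U+34D7, so the posted octets are not a UTF-8 encoding of the intended characters.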

I suspect your remaining problems are due to the configuration of
JCreator or your operating system.

--
RGB