Re: isupper and islower for wstring

From: James Kanze <james.kanze@gmail.com>
Newsgroups: microsoft.public.vc.mfc,microsoft.public.vc.language,comp.lang.c++
Date: Fri, 10 Dec 2010 07:26:11 -0800 (PST)
Message-ID: <29399c42-1684-4f7e-b997-7ff9cbc6252d@g26g2000vbi.googlegroups.com>

On Dec 10, 1:15 pm, Goran <goran.pu...@gmail.com> wrote:

On Dec 10, 11:43 am, James Kanze <james.ka...@gmail.com> wrote:

On Dec 9, 7:38 am, Rahul <rsharma.ch...@gmail.com> wrote:

I have a std::wstring and I want to find which characters are upper
case and which ones are lowercase. std::isupper and islower seem
to work on ASCII characters only, but I want to be able to find out all
kinds of uppercase and lowercase characters,
e.g. á is a "Latin small letter a with acute"
and Á is a "Latin capital letter A with acute".
Is there any function (MFC, Boost or any other library) which I can
use to find out the above said difference? My application is a native
VC++ program.

What's wrong with iswupper (in wctype.h)? (Like isupper, it is
locale dependent.) Or using the equivalent functionality in
<locale>?
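
For illustration, a minimal sketch of the <locale> route, one wchar_t at a time. The helper name classify_case is hypothetical and not from any library. Results for non-ASCII characters such as á and Á depend on the locale passed in and on the platform's tables, so only the ASCII behaviour can be relied on in the classic "C" locale.

```cpp
#include <locale>
#include <string>

// Hypothetical helper: returns one marker per wchar_t in `text`:
// 'U' for upper case, 'L' for lower case, '-' for anything else.
// Classification is delegated to the std::ctype<wchar_t> facet of `loc`,
// so the answer for non-ASCII characters depends on that locale.
std::string classify_case(const std::wstring& text,
                          const std::locale& loc = std::locale())
{
    std::string result;
    result.reserve(text.size());
    for (wchar_t ch : text) {
        if (std::isupper(ch, loc)) {
            result += 'U';
        } else if (std::islower(ch, loc)) {
            result += 'L';
        } else {
            result += '-';
        }
    }
    return result;
}
```

Note that this looks at each wchar_t in isolation, so it inherits the single code unit limitation discussed further down.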

If a character is outside the Basic Multilingual Plane, how do you plan
to put it in a wint_t? (I don't know if case matters for scripts
outside the BMP, but why wouldn't it?)

That is a general problem with all such functions which take
a single code point (even in UTF-32, although it probably only
affects very, very few characters with UTF-32).
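
To make the BMP issue concrete: where wchar_t is 16 bits (as with VC++), a character outside the BMP occupies two code units, a surrogate pair, and neither half means anything to iswupper on its own. The following is only a sketch of recombining the pairs before classifying; decode_utf16 is a hypothetical name, and unpaired surrogates are simply passed through rather than reported. Case does exist outside the BMP: U+10400 (DESERET CAPITAL LETTER LONG I) is encoded as the pair D801 DC00.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: turns UTF-16 code units into Unicode code points.
// A high surrogate (D800-DBFF) followed by a low surrogate (DC00-DFFF)
// is combined into one code point at or above U+10000.
std::vector<char32_t> decode_utf16(const std::u16string& in)
{
    std::vector<char32_t> out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t unit = in[i];
        bool is_high = unit >= 0xD800 && unit <= 0xDBFF;
        if (is_high && i + 1 < in.size()) {
            char32_t low = in[i + 1];
            if (low >= 0xDC00 && low <= 0xDFFF) {
                out.push_back(0x10000 + ((unit - 0xD800) << 10) + (low - 0xDC00));
                ++i;  // the low surrogate has been consumed as well
                continue;
            }
        }
        // BMP code unit, or an unpaired surrogate passed through unchanged.
        out.push_back(unit);
    }
    return out;
}
```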

That's why I proposed GetStringType or ICU.

GetStringType seems to have a fairly complicated interface; I'm
not too sure about ICU. But you're right. And the complicated
interface is due to the fact that the problem itself is more
complicated than it might appear at first glance. (Somewhere
floating around, I've got code which implements the functions in
ctype for UTF-8. Obviously, it takes two iterators to bytes,
rather than a single int, as argument. And the actual tables it
uses are generated from the UnicodeData.txt file. But one of
the things I learned while doing it is that some obvious
definitions, like isupper, are far from obvious once you leave
the usual Western European conventions. And that it still isn't
really correct, because it ignores composed characters, and only
treats single code points.)

--
James Kanze