Re: Problem with using char* to return string by reference

From: "Igor Tandetnik" <itandetnik@mvps.org>
Newsgroups: microsoft.public.vc.language
Date: Mon, 16 Jun 2008 08:11:41 -0400
Message-ID: <OJC3no6zIHA.5108@TK2MSFTNGP05.phx.gbl>

"Hendrik Schober" <Spamtrap@gmx.de> wrote in message
news:%23RqLMy4zIHA.4816@TK2MSFTNGP03.phx.gbl

  There is, however, one problem with all this:
  'std::basic_string<>' was not designed for multi-byte
  encodings. Therefore, when you put multi-byte encoded
  strings into it (and if we're talking Unicode, except
  for UTF-32 all encodings are multi-byte, since Unicode
  specifies more than 2^16 characters), you're on your own.
  (For example, 'wstring::size()' always gives you the number
  of 'wchar_t' objects in the string, which might be
  larger than the number of displayable characters.)

Note that there's no one-to-one correspondence between "displayable
characters" and Unicode codepoints, what with combining diacritics,
ligatures, control characters and such.
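
To illustrate the combining-diacritics case, here is a minimal sketch. It
assumes a C++11 compiler (for 'std::u32string' and the U"" literals); the
helper names are made up for this example. Both strings display as the same
single character, yet they hold different numbers of codepoints:

```cpp
#include <cassert>
#include <string>

// Precomposed form: the single codepoint U+00E9
// (LATIN SMALL LETTER E WITH ACUTE).
inline std::u32string precomposed_e_acute() { return U"\u00E9"; }

// Decomposed form: U+0065 ('e') followed by U+0301
// (COMBINING ACUTE ACCENT). It renders as the same character.
inline std::u32string decomposed_e_acute() { return U"e\u0301"; }

// Hypothetical demo helper: with UTF-32, size() counts codepoints,
// and the two spellings of the same displayed character disagree.
inline void demo_combining() {
    assert(precomposed_e_acute().size() == 1);
    assert(decomposed_e_acute().size() == 2);
    // The two strings also compare unequal unless they are normalized first.
    assert(precomposed_e_acute() != decomposed_e_acute());
}
```

So even a fixed-width encoding such as UTF-32 does not make 'size()' equal to
the number of characters a user sees.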

It can be argued that the size of the string in Unicode codepoints is
useless for all practical purposes. E.g. for memory allocation purposes
you want the size in bytes (or some other fixed-size units). For text
editing purposes you want the size in glyphs (which may be fewer than the
number of codepoints).
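
A rough sketch of how these counts diverge, assuming C++11 and
'std::u16string' as a stand-in for a 16-bit 'wchar_t' string (as on Windows).
The function names are made up for this example, and the codepoint counter
assumes well-formed UTF-16:

```cpp
#include <cstddef>
#include <string>

// Size of the character data in bytes (terminator excluded): this is
// the figure that matters for allocation and copying.
inline std::size_t size_in_bytes(const std::u16string& s) {
    return s.size() * sizeof(char16_t);
}

// Number of Unicode codepoints, assuming well-formed UTF-16.
// Each codepoint contributes exactly one unit that is NOT a low
// (trailing) surrogate, so count every unit outside 0xDC00..0xDFFF.
inline std::size_t count_code_points(const std::u16string& s) {
    std::size_t n = 0;
    for (char16_t u : s) {
        if (u < 0xDC00 || u > 0xDFFF) {
            ++n;
        }
    }
    return n;
}
```

For a string holding only U+1D11E (MUSICAL SYMBOL G CLEF), written as
u"\U0001D11E", 'size()' reports 2 code units, 'size_in_bytes' reports 4 where
'char16_t' is two bytes wide, and 'count_code_points' reports 1. A glyph count
is not shown: it depends on the font and the text-shaping engine, not on the
string alone.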
--
With best wishes,
Igor Tandetnik

With sufficient thrust, pigs fly just fine. However, this is not
necessarily a good idea. It is hard to be sure where they are going to
land, and it could be dangerous sitting under them as they fly
overhead. -- RFC 1925