Re: Fat String Class
On Jul 23, 8:40 pm, Alberto Ganesh Barbati <AlbertoBarb...@libero.it>
wrote:
Le Chaud Lapin ha scritto:
Second, the example I used comparing French "exasperation" to English
"exasperation" was poor. I was probably tired. I am again tired, so
no good examples come to mind right now, but my gut feeling is that
the class should have at least what I have been calling "locale", even
I guess "language" is a more appropriate term than "locale" here, unless
by "locale" you mean some other kind of contextual metadata, which is bound
to be more complex.
I finally had a chance today to look more at your ICU link:
http://www.icu-project.org/userguide/intro.html
It seems that locale includes language/country/script, and probably
others. I would probably grab as much information as possible.
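As a rough sketch of what "grabbing" those fields could look like, assuming a plain struct is enough: the names LocaleInfo and to_tag below are made up for illustration and are not ICU's API. The underscore-joined form only mimics the style of ICU locale identifiers.

```cpp
#include <string>

// Hypothetical bundle of the locale fields mentioned above.
// This is not an ICU type; it only illustrates carrying the data around.
struct LocaleInfo {
    std::string language;  // e.g. "de"
    std::string script;    // e.g. "Latn"; empty if unspecified
    std::string country;   // e.g. "DE"; empty if unspecified
};

// Joins the non-empty fields with '_' in language, script, country order.
std::string to_tag(const LocaleInfo& loc) {
    std::string tag = loc.language;
    if (!loc.script.empty())  tag += "_" + loc.script;
    if (!loc.country.empty()) tag += "_" + loc.country;
    return tag;
}
```

A string class could hold one such value per object, which is exactly the per-string granularity questioned later in this thread.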
though that might not be the correct term. All of you have warned
against putting intelligence in the string class. I wonder if this bit
of extra information would count as too much intelligence. While I
have not read enough about Unicode to know the path I will follow, I
will probably include this bit of information anyway.
<snip>
String<> s1 = "mein"; // German for English "mine"
String<> s2 = "mein"; // English, stolen from Chinese, for a type of noodle.
Ok, so... what about this:
"Let's eat spaghetti, a bratwurst and a crème brûlée"
is this English, Italian, German or French? You can't simply attach
metadata to the *whole* string, you have to consider substrings too. You
have two choices, either you can store metadata in a struct separate
from the textual data, or you store them in the textual data itself by
effectively introducing some form of tagging. XML (with xml:lang) and
Unicode language tags (see 16.9 in http://www.unicode.org/versions/Unicode5.0.0/ch16.pdf) fall into the
latter category. Despite the obvious added complexity in parsing and
traversing the string, the tag approach has a lot of advantages.
However, according to Unicode terminology, attached metadata are the
responsibility of a "higher level protocol" and so, IMHO, should not be
addressed by the basic container.
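To make the first choice concrete, here is a minimal sketch of metadata kept in a struct separate from the textual data. Everything in it (LanguageRun, AnnotatedText, tag_word, byte offsets as the unit) is an assumption for illustration, not something proposed in this thread or defined by Unicode.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical annotation: the half-open byte range [begin, end) of the
// text, plus the language code assigned to that range.
struct LanguageRun {
    std::size_t begin;
    std::size_t end;
    std::string language;
};

// The text itself stays a plain string; the metadata lives in a side table.
struct AnnotatedText {
    std::string text;
    std::vector<LanguageRun> runs;  // may leave parts of the text uncovered

    // Returns the language of the first run covering byte offset pos,
    // or fallback when no run covers it.
    std::string language_at(std::size_t pos, const std::string& fallback) const {
        for (const LanguageRun& r : runs) {
            if (pos >= r.begin && pos < r.end) return r.language;
        }
        return fallback;
    }
};

// Tags the first occurrence of word; returns false if word is not present.
bool tag_word(AnnotatedText& t, const std::string& word,
              const std::string& language) {
    const std::size_t pos = t.text.find(word);
    if (pos == std::string::npos) return false;
    t.runs.push_back({pos, pos + word.size(), language});
    return true;
}
```

With this layout the container never has to parse tags while traversing the text, but any edit to the text has to keep the offsets in the side table in sync, which is the cost the tagging approach avoids.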
This is a good example. My answer: I don't know. :) I have to learn
more about Unicode.
If there is some super-code that will include all Latin scripts, I
would use that for this example to represent the string. Then everything
would depend on what needed to be done. As everyone mentioned, this
will depend on the context in which the operations are to be
performed.
I will most likely fatten my String<> class sufficiently to allow the
programmer to insert semantic indicators into the string objects
themselves, at run-time, so that operator == will know what to do when
it is applied to two string objects.
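A minimal sketch of that idea, under stated assumptions: FatString is a hypothetical stand-in for String<>, the "semantic indicator" is reduced to a single language code, and the equality policy (text and code must both match) is just one possible choice, not necessarily the one intended here.

```cpp
#include <string>
#include <utility>

// Hypothetical stand-in for the String<> discussed in this thread.
class FatString {
public:
    FatString(std::string text, std::string language = "")
        : text_(std::move(text)), language_(std::move(language)) {}

    // The semantic indicator can be set or changed at run time.
    void set_language(std::string language) { language_ = std::move(language); }

    const std::string& text() const { return text_; }
    const std::string& language() const { return language_; }

    // Assumed policy: equal only if both the text and the language code match.
    friend bool operator==(const FatString& a, const FatString& b) {
        return a.text_ == b.text_ && a.language_ == b.language_;
    }
    friend bool operator!=(const FatString& a, const FatString& b) {
        return !(a == b);
    }

private:
    std::string text_;
    std::string language_;  // empty means "no indicator attached"
};
```

Under this policy German "mein" and English "mein" compare unequal even though the characters are identical. A looser policy that treats an untagged string as matching anything would make == non-transitive, so it would need care.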
-Le Chaud Lapin-