Re: std::string and case insensitive comparison
On Jul 20, 6:20 pm, Kai-Uwe Bux <jkherci...@gmx.net> wrote:
James Kanze wrote:
On Jul 20, 11:04 am, Kai-Uwe Bux <jkherci...@gmx.net> wrote:
[snip]
For the use in std::transform(), I would suggest a function object like
this:
[...]
I'd suggest extracting the ctype facet once up front, since
that's what std::tolower is going to do anyway.
That's a good idea. I did some benchmarking, and it appears
that pulling this step out of the inner loop really pays off.
I didn't want to stress it too much, because you do want to get
the code working first. But in this case, it will come up,
sooner or later. And once you're familiar with the way facets
work, it's just as natural, or even more so, to use them
directly.
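
A minimal sketch of what "extracting the ctype facet once up front" might look like. This is not the code posted earlier in the thread (that was snipped); the names to_lower_cached and lowered are made up for illustration. It assumes the locale that owns the facet outlives the function object.

```cpp
#include <algorithm>
#include <locale>
#include <string>

// Hypothetical function object: looks the ctype facet up once, in the
// constructor, instead of once per character inside the loop.
class to_lower_cached {
public:
    explicit to_lower_cached(const std::locale& loc = std::locale())
        : facet_(&std::use_facet<std::ctype<char> >(loc)) {}

    char operator()(char c) const { return facet_->tolower(c); }

private:
    // Non-owning: stays valid only while some locale holding this facet
    // is still alive.
    const std::ctype<char>* facet_;
};

// Hypothetical helper: returns a lower-cased copy of s.
inline std::string lowered(std::string s,
                           const std::locale& loc = std::locale::classic()) {
    std::transform(s.begin(), s.end(), s.begin(), to_lower_cached(loc));
    return s;
}
```

With the classic "C" locale, lowered("Hello") yields "hello"; other locales may map additional characters.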
This has a potential problem with the lifetime of the facet if
the user passes it a temporary locale, or changes the locale
while an instance of the class is alive. A perfectly robust
solution requires keeping a copy of the locale in the object as
well (which in turn makes copying it significantly more
expensive).
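
For the record, keeping a copy of the locale in the object could be sketched roughly as follows. The class name to_lower_owning is an assumption for illustration only. The stored std::locale keeps the facet alive, so passing a temporary locale is safe.

```cpp
#include <algorithm>
#include <locale>
#include <string>

// Hypothetical function object that owns a copy of the locale, so the
// cached facet cannot outlive the locale it came from.
class to_lower_owning {
public:
    explicit to_lower_owning(const std::locale& loc = std::locale())
        : loc_(loc),
          facet_(&std::use_facet<std::ctype<char> >(loc_)) {}

    char operator()(char c) const { return facet_->tolower(c); }

private:
    // Declared before facet_ so it is initialized first.
    std::locale loc_;
    // Points into a facet that loc_ (and every copy of it) keeps alive.
    const std::ctype<char>* facet_;
};
```

The compiler-generated copy constructor is enough here: the copied locale refers to the same facet, so the copied pointer stays valid. The price is one std::locale copy per copy of the function object.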
One can use a shared_ptr<> to circumvent this.
std::locale itself is more or less required to use the
equivalent of an intrusive shared_ptr for each of its facets.
It seems somehow aberrant to then be required to allocate the
std::locale itself dynamically, and use a shared_ptr for it.
But I suspect that if there is much copying going on, this might
be the fastest way to go.
I tested both versions, and they seem to be equally efficient
(in a single-threaded environment, that is).
A priori:
-- If you can be sure that at least one locale using the facet
will stay alive, the fastest solution is to not worry about
it. For an application specific to_lower, this is likely
the case; not many applications play with locales once
they've set the global locale in main. For a generic
to_lower, on the other hand, it's playing with fire, even if
you document the restriction.
-- Copying a to_lower object is probably a lot faster using the
shared_ptr, rather than copying a complete std::locale
object; the shared_ptr requires updating one counter,
copying locale requires updating a counter for each facet
the locale contains.
-- Creating a to_lower object is probably a lot faster using
the complete copy of the std::locale object, since that
doesn't entail any dynamic allocation, just updating
something like 7 counters. (I am assuming that on most
systems, a dynamic allocation will require significantly
more work.)
-- Various compiler optimizations may change the number of
copies, so any measurements you make may be invalidated with
the next release of the compiler:-).
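
The shared_ptr variant weighed in the list above might be sketched like this. It uses std::shared_ptr from C++11 as a stand-in for whatever shared_ptr was used in the thread, and the name to_lower_shared is hypothetical. Construction pays for one dynamic allocation; each copy then only touches the shared_ptr's reference count.

```cpp
#include <algorithm>
#include <locale>
#include <memory>
#include <string>

// Hypothetical function object: the locale lives on the heap and is
// shared between all copies of the object.
class to_lower_shared {
public:
    explicit to_lower_shared(const std::locale& loc = std::locale())
        : loc_(std::make_shared<std::locale>(loc)),
          facet_(&std::use_facet<std::ctype<char> >(*loc_)) {}

    char operator()(char c) const { return facet_->tolower(c); }

private:
    // Keeps the locale, and therefore the facet, alive for all copies.
    std::shared_ptr<const std::locale> loc_;
    const std::ctype<char>* facet_;
};
```

Whether this actually beats copying the std::locale directly depends on the implementation, so it is worth measuring on the target platform.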
[...]
In professional code, I agree that using <locale> is the way to
go. But <locale> seems to have been designed to be particularly
difficult to use. For a beginner, I'd suggest writing your own
function object with the tolower in <ctype.h>, and casting the
char to unsigned char. While less flexible than a solution based
on <locale>, it's an order of magnitude (or more) simpler to write
and understand.
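
A minimal sketch of that beginner version, with made-up names (to_lower_c, lowered_c). The cast to unsigned char matters because passing a negative char value to tolower is undefined behavior.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical function object wrapping the C library tolower.
struct to_lower_c {
    char operator()(char c) const {
        // Convert to unsigned char first: tolower requires a value
        // representable as unsigned char (or EOF).
        return static_cast<char>(
            std::tolower(static_cast<unsigned char>(c)));
    }
};

// Hypothetical helper: returns a lower-cased copy of s, using whatever
// C locale is currently in effect.
inline std::string lowered_c(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(), to_lower_c());
    return s;
}
```

Two strings can then be compared without regard to case with lowered_c(a) == lowered_c(b), at the cost of two temporary copies.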
Agreed. However, this question comes up regularly on this
group. Maybe it would be worthwhile to have a truly robust and
efficient solution in the archives or the FAQ. As of now, I
would vote for to_lower_a<>.
That's basically what I'm currently using:-).
I only recently got around to putting this into my library, and
I'm having problems with my provider (nothing new: I'm trying
to use ADSL, I'm five kilometers from the switch, and the line
is old, very noisy, and runs next to a high density electric rail
line), so I will not be updating the version on the web
immediately. I'm looking into alternatives, however, and hope
to update it sometime soon. (But these sorts of things always end
up taking more time than I imagine.)
--
James Kanze (Gabi Software) email: james.kanze@gmail.com
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34