Re: std iostreams design question, why not like java stream wrappers?
On Aug 27, 8:52 am, Joshua Maurice <joshuamaur...@gmail.com> wrote:
On Aug 26, 8:07 pm, Jerry Coffin <jerryvcof...@yahoo.com> wrote:
In article <fdb1cf7f-9851-49e1-90a4-7adb771fdad2
@o9g2000prg.googlegroups.com>, joshuamaur...@gmail.com says...
I've always found the C++ std iostreams interface to be
convoluted, not well documented, and most non-standard
uses to be far out of the reach of novice C++ programmers,
and dare I say most competent C++ programmers. (When's the
last time you've done anything with facets, locales,
etc.?)
Last week, though I realize I'm somewhat unusual in that respect.
Ok. Also, do you use wstream in any real code? Or wstring?
I don't, because I generally use UTF-8, and have my own
libraries, etc., for doing so. But talking around, I'd say that
wstring, wistream, etc. are fairly widespread---at least as
frequently as their narrow character equivalents.
And applications which don't use locale are really the
exception, although the use is generally limited to
std::locale::global( std::locale( "" ) ) ;
at the start of main. Unless the application doesn't do any
text handling.
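A minimal sketch of that start-up idiom, for illustration only. The function name installUserLocale and the fallback to the classic "C" locale are assumptions made here, not something from the thread. std::locale( "" ) can throw std::runtime_error if the environment names a locale the system doesn't have, and std::locale::global only affects streams created afterwards, so the standard streams are imbued explicitly.

```cpp
#include <iostream>
#include <locale>
#include <stdexcept>
#include <string>

// Hypothetical helper, meant to be called first thing in main().
// Returns the name of the locale that ended up installed.
std::string installUserLocale()
{
    try {
        // "" means: whatever the user's environment asks for.
        std::locale::global( std::locale( "" ) ) ;
    } catch ( std::runtime_error const& ) {
        // Assumption: fall back to the classic "C" locale if the
        // environment names a locale this system doesn't know.
        std::locale::global( std::locale::classic() ) ;
    }
    // Streams that already exist keep their old locale, so update
    // the standard ones by hand.
    std::cout.imbue( std::locale() ) ;
    std::cin.imbue( std::locale() ) ;
    return std::locale().name() ;
}
```

Which locale name comes back depends entirely on the user's environment.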
Formatting is divided between a locale and an iostream. A
locale contains all the details about things like how to
format numbers (including what characters to use for digit
grouping and such). The iostream mostly keeps track of flags
(e.g. set by manipulators) to decide what locale to use, and
how to use it.
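To illustrate that split, here is a small sketch in which the locale carries the punctuation rules and the stream merely holds the locale. The facet class ContinentalPunct and the function formatContinental are hypothetical names invented for this example. It uses a hand-written std::numpunct facet rather than a named locale, so it doesn't depend on which locales the system has installed.

```cpp
#include <locale>
#include <sstream>
#include <string>

// Hypothetical facet: '.' between groups of three digits, ',' as the
// decimal point (roughly the continental European convention).
class ContinentalPunct : public std::numpunct<char>
{
protected:
    char do_thousands_sep() const override { return '.' ; }
    char do_decimal_point() const override { return ',' ; }
    std::string do_grouping() const override { return "\3" ; }
} ;

// Formats an integer through a stream imbued with the facet above.
std::string formatContinental( long value )
{
    std::ostringstream out ;
    // The new locale is a copy of the classic one with our numpunct
    // swapped in; the locale takes ownership of the facet.
    out.imbue( std::locale( std::locale::classic(),
                            new ContinentalPunct ) ) ;
    out << value ;
    return out.str() ;
}
```

With this facet, formatContinental( 1234567 ) yields "1.234.567"; the operator<< code itself is unchanged.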
But only very basic formatting, and only formatting which the
standard library thought useful. You add another kind of
formatting. You can't add another kind of formatting, just
tweak existing formatting rules.
One of the most important principles of iostream is that the
formatting (and the sink/sources) should be user extendable.
It's easy to add additional formatting options---just overload
operator<< and operator>>. I've yet to see an application
anywhere that didn't do this.
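For readers who haven't done this, a minimal sketch of such an extension. The Point type, its "(x, y)" text format and the helper toText are all made up for the example. The extractor sets failbit on malformed input and leaves its target untouched, which is the usual convention.

```cpp
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical user type, written and read as "(x, y)".
struct Point { int x ; int y ; } ;

std::ostream& operator<<( std::ostream& dest, Point const& p )
{
    return dest << '(' << p.x << ", " << p.y << ')' ;
}

std::istream& operator>>( std::istream& source, Point& p )
{
    char open = 0, comma = 0, close = 0 ;
    int x = 0, y = 0 ;
    if ( source >> open >> x >> comma >> y >> close ) {
        if ( open == '(' && comma == ',' && close == ')' ) {
            p.x = x ;
            p.y = y ;
        } else {
            // Wrong punctuation: report failure, don't modify p.
            source.setstate( std::ios_base::failbit ) ;
        }
    }
    return source ;
}

// Convenience wrapper: the same operator<< works for any ostream.
std::string toText( Point const& p )
{
    std::ostringstream out ;
    out << p ;
    return out.str() ;
}
```

Once these two overloads exist, Point works with std::cout, file streams and string streams alike.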
Also, the std streams internationalization support is at
best pisspoor. The existence of locales and their meanings
are implementation defined. One cannot rely upon any of
the C++ standard locale + facet stuff for a portable
program.
Yes and no. The only piece that's implementation defined is
exactly what locales will exist (and what name will be given
to each).
So, basically entirely implementation defined, and as there is
no particular standard in use beyond the C++ standard, they're
basically worthless for portable code.
The most important one is "", which is portable. That handles
all of the real "locale" stuff. The only real problem I've
experienced has been encoding.
An awful lot of programs can get by quite nicely with just
using whatever locale the user wants, and C++ makes it
pretty easy to access that one -- an empty name gets it for
you.
Only if you enjoy serving English speakers.
We use it all the time for French, with no problem. Under Unix,
locale( "" ) means pick up the correct locale from the user's
environment variables.
The iostream library is woefully insufficient on its own for
anything but that. When you start making products in which
English is not the language of choice, iostreams become little
more than binary mode byte streams.
Having written extensive projects in both France and Germany,
using iostream for all our input and output, I can definitely
say that that's false. About the only place I've found the
standard deficient here in terms of provided functionality is in
formatting complex (where it imposes a comma between the real
and the imaginary part).
It's also entirely convoluted and complex, and doesn't
support simple things like changing from one encoding to
another.
I beg your pardon?
[...]
At least they didn't do like Java and decree that wide
characters were, and would always remain, 16 bits. A C++
implementation can get things right or wrong, but a Java
implementation is stuck with being wrong.
No. Java may have gotten it "wrong" quote unquote, but a wrong
answer now is infinitely better than "implementation defined".
How many real successful Java programs are there out there
making use of built-in support of Unicode? Lots. How many real
successful C++ programs are there out there making use of
wstring and wiostream? I would assume basically none.
I don't know. Their use seems rather common here.
Don't get me wrong: I'm not trying to defend iostreams as
being the perfect design, or anything like that -- but it
seems to me that the design is a bit better than you're
portraying it.
This is what I want:
1- A cleaner and simpler interface than iostreams wrapping
streambuf, and the relatively nasty interface of a streambuf
itself. When some standard library's implementations are 100x
slower than its printf library, I'd say that yea, it's pretty
complex.
With the exception of the encoding issues, the only problem I
see with the streambuf interface is the naming. The encoding
issues are poorly designed; what is needed is a filtering
streambuf which reads from a byte oriented streambuf, and
presents the interface of a wchar_t streambuf. (That's also how
I implement things in my own code... except that my filtering
streambuf still returns char, but guarantees legal UTF-8.)
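As a rough illustration of the filtering streambuf idea (not the actual code referred to above), here is a sketch of an input filter that pulls ISO 8859-1 bytes from another streambuf and presents UTF-8 to its reader. The names Latin1ToUtf8Buf and latin1ToUtf8 are invented for the example, and the choice of ISO 8859-1 as the source encoding is an assumption made to keep the conversion short.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <streambuf>
#include <string>

// Hypothetical input filter: reads ISO 8859-1 bytes from 'source',
// hands UTF-8 encoded chars to whoever reads from this streambuf.
class Latin1ToUtf8Buf : public std::streambuf
{
public:
    explicit Latin1ToUtf8Buf( std::streambuf* source )
        : source_( source )
    {
    }

protected:
    int_type underflow() override
    {
        if ( gptr() < egptr() )
            return traits_type::to_int_type( *gptr() ) ;
        int_type const next = source_->sbumpc() ;
        if ( traits_type::eq_int_type( next, traits_type::eof() ) )
            return traits_type::eof() ;
        unsigned char const byte = static_cast<unsigned char>(
            traits_type::to_char_type( next ) ) ;
        std::size_t length = 0 ;
        if ( byte < 0x80 ) {
            // ASCII range: identical in both encodings.
            buffer_[ length ++ ] = static_cast<char>( byte ) ;
        } else {
            // 0x80..0xFF becomes a two byte UTF-8 sequence.
            buffer_[ length ++ ] = static_cast<char>( 0xC0 | ( byte >> 6 ) ) ;
            buffer_[ length ++ ] = static_cast<char>( 0x80 | ( byte & 0x3F ) ) ;
        }
        setg( buffer_, buffer_, buffer_ + length ) ;
        return traits_type::to_int_type( *gptr() ) ;
    }

private:
    std::streambuf* source_ ;
    char buffer_[ 2 ] ;
} ;

// Usage: wrap the raw streambuf, then read through a plain istream.
std::string latin1ToUtf8( std::string const& latin1 )
{
    std::istringstream raw( latin1 ) ;
    Latin1ToUtf8Buf filter( raw.rdbuf() ) ;
    std::istream in( &filter ) ;
    std::string result ;
    char ch ;
    while ( in.get( ch ) )
        result += ch ;
    return result ;
}
```

The code reading from the istream never sees the original encoding; a filter producing wchar_t would follow the same shape with std::wstreambuf as the base class.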
As for the speed... people have been optimizing printf for 30
years now; when printf appeared, it was very important. The
best iostream implementations (e.g. the one by Dietmar Kühl)
beat printf/scanf in speed; if they haven't been widely adopted
in commercial libraries, it's because the current
implementations are felt to be "fast enough", and there's no
pressure for more speed from them (with a few exceptions).
I'd also say that it's convoluted given that it doesn't really
solve any problems. Sure, it correctly handles the systems end
of line, and it correctly uses the right code points for
converting an integer to string representation for the locale,
and cool stuff like a comma or a period for the thousand
separators and "1 to tenths" separator. However, the entire
iostream library is overkill if those are the only problems it
solves, hence convoluted.
You seem to be confusing the issues. The iostream library isn't
concerned about much of that. The iostream library provides:
-- a standard interface for data sinks and sources
(std::streambuf), along with two sample implementations
covering the most frequent cases (files and strings),
-- a standard interface for formatting, using the strategy
pattern to cleanly separate sinking and sourcing bytes from
the formatting---the formatting interface uses overloading
so that client code can extend it for types it's never heard
of,
-- a standard error handling strategy (which is rather
simplistic), and
-- a couple of wrapper classes to make it easier to set up the
most common cases (reading or writing from a file or a
string).
For all localization issues, it defers to <locale>, which is
overly complicated for what it does (which isn't really enough).
2- No virtual overhead if it's not required. I should not have
to go through a virtual function to write to a file or to a
string with a stringstream.
And the alternative is? The alternative is simply an
unacceptable amount of code bloat, with every single function
which does any I/O a template. Without the virtual functions,
iostream becomes almost unusable, like printf and company.
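A small sketch of what the virtual functions buy: one ordinary, non-template function that works with any sink, including one written long after the function was compiled. The names writeReport and CountingBuf are hypothetical, and the counting sink is only an illustration of a user-defined streambuf.

```cpp
#include <cstddef>
#include <ostream>
#include <sstream>
#include <streambuf>
#include <string>

// Compiled once. It neither knows nor cares where the characters
// end up; the streambuf's virtual functions decide that.
void writeReport( std::ostream& dest, int errors, int warnings )
{
    dest << errors << " error(s), " << warnings << " warning(s)\n" ;
}

// Hypothetical user-defined sink: discards the characters and just
// counts them. With no put area set, every character arrives via
// overflow().
class CountingBuf : public std::streambuf
{
public:
    CountingBuf() : count_( 0 ) {}
    std::size_t count() const { return count_ ; }

protected:
    int_type overflow( int_type ch ) override
    {
        if ( ! traits_type::eq_int_type( ch, traits_type::eof() ) )
            ++ count_ ;
        return traits_type::not_eof( ch ) ;
    }

private:
    std::size_t count_ ;
} ;
```

The same writeReport can be handed an std::ostringstream, an std::ofstream, or an std::ostream constructed on a CountingBuf, without being a template and without being recompiled.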
I'd like to be able to use it in performance critical aspects
of code without invoking the holy war of printf vs cout. Also
see point 1 for why it might be slow.
3- Actual support for changing between encodings like UTF 16 and UTF
32. Ex:
raw_ofstream out("some_file");
buffered_ostream<raw_ofstream> out_2(out);
out_encoder<buffered_ostream<raw_ofstream>> out_3(out_2, "UTF-16");
//you have some data in utf8 format, like in a utf8string
utf8string str;
out_3 << str;
//or perhaps some raw data which you know the encoding
//(Yes, a string literal may not be ASCII. I know.)
out_3.writeString("foo", "ASCII");
Or more likely, a helper class would exist:
ofstream_with_encoding /*for want of a better name*/ out
("some_file", "ASCII");
utf8str str;
out << str;
utf16str str2;
out << str2;
with whatever failure mechanism by default if it can't do the
encoding translation, be that by exception or setting the fail
bit, and of course it would be configurable on each iostream
like setw.
That's less obvious than you think, because of buffering. You
can change the encoding on the fly, e.g.:
std::cin.rdbuf()->imbue( localeWithNewEncoding ) ;
or
std::cin.imbue( std::locale( std::cin.getloc(),
newEncodingFacet ) ) ;
but there are serious practical limitations (which affect e.g.
the Java solution as well). Buffering and changing encodings on
the fly don't work well together.
--
James Kanze (GABI Software) email:james.kanze@gmail.com
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34