Re: C++ way to convert ASCII digits to Integer?
On May 31, 4:52 pm, "Alf P. Steinbach" <al...@start.no> wrote:
* osmium:
James Kanze wrote:
On May 29, 3:08 pm, Gerhard Fiedler <geli...@gmail.com> wrote:
James Kanze wrote:
(Given that ASCII is for all intents and purposes dead, it's
highly unlikely that they really want ASCII.)
I'm not sure, but I think in the USA there are quite a number
of programmers who don't think beyond ASCII when thinking of
text manipulation.
In just about every country, there are quite a number of
programmers who don't think:-). The fact remains that the
default encoding used by the system, even when configured for
the US, is not ASCII. Even if you're not "thinking" beyond
ASCII, your program must be capable of reading non-ASCII
characters (if only to recognize them and signal the error).
Is it your point that an ASCII compliant environment would
have to signal an error if the topmost bit in a byte was
something other than 0?
I think James is perhaps referring to routines like the isdigit
family.
Some of them take an int argument and have UB if the argument
value is outside 0...(unsigned char)(-1).
s/Some/All/
The standard says 0...UCHAR_MAX or EOF. But UCHAR_MAX and
(unsigned char)(-1) are, of course, guaranteed to be equal. And
EOF is guaranteed to be negative, so there can never be any
ambiguity between one of the characters and EOF.
So with most implementations you get UB if you simply pass a
char directly as the argument and that char is beyond the ASCII
range, because then it will be negative.
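To make that concrete, here's a small sketch of the usual work-around
(the cast to unsigned char); it assumes an implementation where plain
char is signed and uses '\xE9' as an example of a Latin-1 'é', which
would be negative on such an implementation:

    #include <cctype>
    #include <iostream>

    // Passing a plain char directly to std::isdigit is UB when the
    // value is negative.  Casting to unsigned char first guarantees
    // a value in 0...UCHAR_MAX, which is what the C functions require.
    bool isAsciiDigit( char ch )
    {
        return std::isdigit( static_cast<unsigned char>( ch ) ) != 0;
    }

    int main()
    {
        char ok     = '7';
        char eAcute = '\xE9';  // 'é' in Latin-1; negative if char is signed

        std::cout << std::boolalpha
                  << isAsciiDigit( ok )     << '\n'   // true
                  << isAsciiDigit( eAcute ) << '\n';  // false, and no UB
    }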
That is, of course, something that you always have to consider.
The "official" answer, in C++, is to use the corresponding
functions in <locale>. Which have been carefully designed to be
even more verbose than the C function with the cast, and to run
several orders of magnitude slower. (But other than that,
they're fine.)
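For what it's worth, a rough sketch of what the <locale> version looks
like: std::isdigit from <locale> is a template taking the character and
a locale, so no cast is needed (the default-constructed locale here is
just the current global locale):

    #include <locale>
    #include <iostream>

    int main()
    {
        std::locale loc;    // the current global locale
        char        ch = '5';

        // bool std::isdigit( charT, std::locale const& ), from <locale>
        if ( std::isdigit( ch, loc ) ) {
            std::cout << ch << " is a digit in this locale\n";
        }
    }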
After all, the people that wrote the standard were well
aware that there was no such thing as a seven-bit machine.
On the contrary, the seven-bit nature of ASCII was to
facilitate communication over e.g. serial links with a software
parity check, where each byte was effectively seven bits
(since one bit was used for parity).
I'm not sure what the original rationale was.  One mustn't
forget that at the time, six-bit codes were quite common.
Moving to seven bits probably seemed to be the minimum solution
to support both upper- and lower-case letters.  And given the
transmission speeds at the time (110 baud for a teletype), every
bit gained helped.  But the fact that you could put the
character with parity into an octet was probably a consideration
as well.
--
James Kanze (GABI Software) email:james.kanze@gmail.com
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34