Re: converting floating point types round off error ....

From: James Kanze <james.kanze@gmail.com>
Newsgroups: comp.lang.c++
Date: Mon, 1 Dec 2008 01:55:01 -0800 (PST)
Message-ID: <6efd5965-5a13-4ef0-a958-8f683e718779@y18g2000yqn.googlegroups.com>

On Dec 1, 4:11 am, forums...@hotmail.com wrote:

> On Oct 6, 10:06 am, James Kanze <james.ka...@gmail.com> wrote:

> > On Oct 6, 2:38 pm, ma740988 <ma740...@gmail.com> wrote:

> > > char* put_swapped_data(char* out, const float& source) {
> > >   *(float *)out = source;
> > >   Swap(*(float *)out);
> > >   return out + sizeof(float);
> > > }
> > > char* GetSwappedData(float& target, char* source) {
> > >   target = *(float *)source;
> > >   Swap(target);
> > >   return source + sizeof(float);
> > > }

> > As above, except these will only core dump 3 times in 4,
> > rather than 7 in 8.

> Curiosity question. How were you able to arrive at the ratios
> 3/4 (float), 7/8 (double)?

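Alignment considerations. On hardware that enforces alignment, a
float must be aligned on a multiple of four, and a double on a
multiple of eight. An arbitrary char* is therefore a legal address
for a float only one time in four, and for a double only one time
in eight.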

> > You can't take a char* and assign a float or a double through
> > it; there's no guarantee that it is a legal address for a
> > float or a double.

> What do you mean by 'no guarantee it is a legal address for a
> float or a double'?

Just that. The value in a char* may not be a legal address for
a float or a double. (Of course, if you're messing around with
reinterpret_cast, the value in a float* or a double* may not be
a legal address for a float or a double. Don't use
reinterpret_cast unless you really know what you're doing.)

> Now assume the problem is binary serialization. I suspect that
> converting to an unsigned integer large enough for a float or
> double, then playing games with bit shifting, might work?

That's the way I usually do it:-). Strictly speaking, it's not
100% portable; for starters, you're not even guaranteed that
such an unsigned integral type exists. (There is, in fact, at
least one platform where it doesn't.) And even if it does,
there's no guarantee concerning the format of a float. For
maximum portability, you should define your serialized floating
point format, and play games with frexp and ldexp to create it.
Something like:

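    // source is the value to write; dest is the output byte stream.
    bool isNeg = source < 0 ;
    if ( isNeg ) {
        source = - source ;
    }
    int exp ;
    if ( source == 0.0 ) {
        exp = 0 ;
    } else {
        // frexp: source == m * 2^exp, with m in [0.5, 1).  Scaling m
        // by 2^24 gives a 24-bit integer mantissa; adding 126 gives
        // the IEEE single-precision biased exponent.
        source = ldexp( frexp( source, &exp ), 24 ) ;
        exp += 126 ;
    }
    uint32_t mant = source ;
    // Big-endian: sign bit, 8 exponent bits, then the low 23 mantissa
    // bits (the leading 1 is implicit).  NaN, infinity and denormals
    // are not handled.
    dest.put( (isNeg ? 0x80 : 0x00) | exp >> 1 ) ;
    dest.put( ((exp << 7) & 0x80) | ((mant >> 16) & 0x7F) ) ;
    dest.put( mant >> 8 ) ;
    dest.put( mant ) ;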

and

    uint32_t tmp ;
    operator>>( tmp ) ; // shifts and or's...
    if ( *this ) {      // only if the read succeeded
        float f = 0.0 ;
        if ( (tmp & 0x7FFFFFFF) != 0 ) {
            // Restore the implicit leading 1, then scale by
            // 2^(biased exponent - 126 - 24).
            f = ldexp( ((tmp & 0x007FFFFF) | 0x00800000),
                       (int)((tmp & 0x7F800000) >> 23) - 126 - 24 ) ;
        }
        if ( (tmp & 0x80000000) != 0 ) {
            f = -f ;
        }
        dest = f ;
    }

(This produces the XDR representation of a float: an IEEE
single-precision value, in big-endian byte order.)

If your portability needs are limited to machines supporting
IEEE floating point, however, memcpy'ing the floating point
value into an unsigned integral type of the same size, then
shifting and or'ing, is sufficient, and may be slightly faster.
(At least on a Sparc, however, the above is not outrageously
slow.)

--
James Kanze (GABI Software) email:james.kanze@gmail.com
Conseils en informatique orientée objet/
                   Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34