Re: Big Endian - Little Endian
On 2 Sep., 17:28, James Kanze <james.ka...@gmail.com> wrote:
On Sep 2, 12:59 pm, peter koch <peter.koch.lar...@gmail.com> wrote:
On 2 Sep., 11:13, James Kanze <james.ka...@gmail.com> wrote:
On Sep 1, 10:51 pm, peter koch <peter.koch.lar...@gmail.com> wrote:
On 1 Sep., 22:24, James Kanze <james.ka...@gmail.com> wrote:
On Sep 1, 4:54 pm, Juha Nieminen <nos...@thanks.invalid> wrote:
James Kanze wrote:
but I can't imagine the need today, with 64 bits virtual
address space).
Clearly you have never needed to read/write humongous amounts
of data as fast as possible.
You'd be surprised:-).
In the good old days, when we had to fit the application into
64KB, just dumping the bits was a very efficient way to
implement overlays of data (and of code, for that matter, if the
system and the linker supported it). Today, however, the
address space of virtual memory is larger than the biggest disks
I can get my hands on, so the only reason I would explicitly
write to disk (as opposed to paging) is because I need to be
able to reread it later. Which means that it must have a
defined format, and just dumping the bits doesn't work.
I mostly agree, but there are exceptions, and they are not
THAT few. One apparent exception is databases: if you want
high performance and reliability, there is no way out of
writing data explicitly to disk and doing so in a binary
format.
Nobody said to not use a binary format. But a binary format is
not "unformatted". All of the high performance data bases I
know use a proprietary internal format, with little impact on
performance.
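To make "binary but formatted" concrete, here is a minimal sketch (the
function name is invented, not taken from any particular database): the
bytes written are dictated by the file format, not by the host's byte
order.

    #include <cstdint>
    #include <ostream>

    // Write a 32-bit value most-significant byte first.  The file
    // contents are the same whether the host is big- or little-endian.
    void writeUInt32BE(std::ostream& dest, std::uint32_t value)
    {
        dest.put(static_cast<char>((value >> 24) & 0xFF));
        dest.put(static_cast<char>((value >> 16) & 0xFF));
        dest.put(static_cast<char>((value >>  8) & 0xFF));
        dest.put(static_cast<char>( value        & 0xFF));
    }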
Of course, portability suffers, but you don't want to port to
exotic machines anyway.
You might, however, want to be able to reread the data with a
program compiled with a newer version of the compiler. Or with
a program compiled with different compiler options.
Certainly. This requires you to have some knowledge of the
CPU(s) you target, and you must hope for some sensible
strategy from your compiler vendor.
My experience has been that that is a rather vain hope:-). And
realistically, you do expect to use larger integers sooner or
later, especially where file systems are concerned. (The size
of the integers in a data base will in some cases depend on the
size of the largest disk you support.)
Or on the size of the database you plan to support (considering that
you might allow databases spanning more than one disk).
I do not believe that "plugging" the database right into
another kind of processor is needed. As an example, if the
next generation of Intel processor (or compiler) skipped
eight-bit characters entirely, I would consider this a new
architecture and would find it sensible to require an upgrade
from the users (e.g. doing a backup and restore rather than a
raw file copy).
Yep, and in order to update and migrate, you need a known and
specified format.
I've seen
byte order change from one version of the compiler to the next
(Microsoft, for PCs, not the most exotic machine around),
and all of the compilers I use have options which change the
size of some of the basic types, padding in structures, etc.
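As an illustration of the padding point, a sketch (the struct is
invented; the numbers printed depend entirely on the compiler and its
settings):

    #include <cstddef>
    #include <iostream>

    struct Record
    {
        char   tag;    // one byte...
        double value;  // ...but the compiler may insert 7 bytes of
                       // padding before this member, or 3, or none,
                       // depending on alignment settings (#pragma pack,
                       // /Zp, -fpack-struct, ...).
    };

    int main()
    {
        std::cout << sizeof(Record) << ' '
                  << offsetof(Record, value) << '\n';
        // Often prints "16 8", but a different compiler or different
        // options can legally print something else; a raw dump of
        // Record is therefore not a defined file format.
    }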
You can't in general write a structure in binary. What I am referring
to is, e.g., the layout of a B-tree page, an inode, or something like
that.
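Something along these lines, as a purely invented illustration: the
page header's on-disk layout is specified byte by byte, and the
encoding routine produces exactly those bytes, whatever the compiler
does with the in-memory struct.

    #include <cstdint>

    // Invented on-disk page header: every field has a documented width
    // and position, independent of any compiler's struct layout.
    struct PageHeader
    {
        std::uint32_t pageNo;     // bytes 0-3, little-endian on disk
        std::uint16_t keyCount;   // bytes 4-5, little-endian on disk
        std::uint16_t freeOffset; // bytes 6-7, little-endian on disk
    };

    void encodePageHeader(PageHeader const& h, unsigned char buf[8])
    {
        buf[0] = static_cast<unsigned char>( h.pageNo         & 0xFF);
        buf[1] = static_cast<unsigned char>((h.pageNo  >>  8) & 0xFF);
        buf[2] = static_cast<unsigned char>((h.pageNo  >> 16) & 0xFF);
        buf[3] = static_cast<unsigned char>((h.pageNo  >> 24) & 0xFF);
        buf[4] = static_cast<unsigned char>( h.keyCount       & 0xFF);
        buf[5] = static_cast<unsigned char>((h.keyCount >> 8) & 0xFF);
        buf[6] = static_cast<unsigned char>( h.freeOffset       & 0xFF);
        buf[7] = static_cast<unsigned char>((h.freeOffset >> 8) & 0xFF);
    }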
The idea that the OS might write the inodes by just dumping the
bits doesn't bother me, even if it is a struct. It's a very
special case. (People programming at that level generally have
to know exactly how the compiler lays things out anyway.) But
most programmers aren't accessing disks (or anything else) at
that level. I'd consider code to write inodes rather rare.
But then there is also the code from Oracle, Sybase, Ingres and all
the other database providers, as well as the people writing file
systems. I agree that this is quite a specialised domain, but most
probably there are others as well where it is an advantage to be able
to move data quickly between disk and C++ structures (PDF and JPEG
files cross my mind).
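JPEG makes a nice concrete case, since the standard fixes the format:
a file starts with the SOI marker 0xFF 0xD8, and marker segment
lengths are 16-bit big-endian. A sketch of reading that portably,
assembling the bytes explicitly so the host's byte order never enters
into it:

    #include <cstdint>
    #include <istream>

    // A JPEG stream must begin with the two-byte SOI marker.
    bool looksLikeJpeg(std::istream& src)
    {
        unsigned char soi[2];
        return src.read(reinterpret_cast<char*>(soi), 2)
            && soi[0] == 0xFF && soi[1] == 0xD8;
    }

    // Segment lengths in JPEG are 16-bit big-endian values.
    bool readUInt16BE(std::istream& src, std::uint16_t& value)
    {
        unsigned char bytes[2];
        if (!src.read(reinterpret_cast<char*>(bytes), 2))
            return false;
        value = static_cast<std::uint16_t>((bytes[0] << 8) | bytes[1]);
        return true;
    }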
Still, we both agree that unless you write huge amounts of data to
disk and can profit from a "fast" translation (often, disk I/O times
will dominate the picture), you should do a proper serialisation. And
if you do need to do the dirty stuff, you should be very, very careful
and be absolutely certain that you know what you are doing.
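If you do take the dirty route, you can at least pin the layout
assumptions down so that a change breaks the build rather than the
files. A sketch using static_assert (and note that it still assumes
the host's byte order matches the file's):

    #include <cstddef>
    #include <cstdint>
    #include <cstring>

    struct DiskRecord
    {
        std::uint32_t id;
        std::uint32_t flags;
    };

    // If a new compiler or new options change the layout, this stops
    // compiling instead of silently producing a different file format.
    static_assert(sizeof(DiskRecord) == 8, "unexpected padding");
    static_assert(offsetof(DiskRecord, flags) == 4, "unexpected layout");

    void fromDisk(unsigned char const* buf, DiskRecord& rec)
    {
        std::memcpy(&rec, buf, sizeof rec);  // byte order still assumed!
    }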
/Peter