Re: How to convert object to XML string and back again
On Dec 20, 3:03 pm, Rune Allnor <all...@tele.ntnu.no> wrote:
Maybe I am naive, but the below is completely typical
for data files in what used to be my field of work
(cut'n paste from page 21, http://acoustics.mit.edu/faculty/henrik/oases.pdf):
SAFARI-FIP case 3. Poroelastic.
N C A D J
30 30 1 0
5
0 0 0 0 0 0 0
0 1500 -999.9990 0 1 0# SVPcontinuous at z = 30 m
30 1480 -1490 0 0 1 0
100 -1 -1 0 0 0 0 0 # Cp<0 Cs<0 flag poro-elastic layer
1 2.E9 .001 2.65 9.E9 .4 2.E-9 1.E-5 3.13E8 5.14E9 .8 1.55 1.25
120 1800 600 0.1 0.2 2.0 0
50
0.1 120 41 40
1350 1E8
-1 1 950
0 5 20 1
20 80 12 10
0 120 12 20
40 70 6
XML would be a huge improvement on this state of affairs.
I even find XML encoding of PDF, Word or Excel files
appealing, as it allows me to parse such documents
for information far more efficiently than without XML.
Did I misunderstand something about XML? Or are there,
in your opinion, better ways of achieving the same
functionality? Preferably with C++, but that's not
essential.
You're right that XML is easier on your eyes than the dataset above.
But it is not easier on the eyes of the computer. The computer does
not care. The labels and the hierarchical structure are what make it
better, but only for human consumption. This is a subtlety that the
XML-with-C++ proponents do not seem to understand.
Let me illustrate more clearly by starting with a C++ object and
showing three methods of storing its state to disk. I will then explain
why the first two are actually identical, one merely being easier on
the eyes than the other, while the third, based on the *mindset* of
the XML proponents, is entirely different from the first two, even
though it looks like the second.
Our goal is to take a C++ object that represents a system that is
hugely hierarchical, containing a massive amount of state, and put it
to and from disk:
struct Airbus_A380 : public Airbus
{
    Fuselage fuselage;
    Empennage empennage;
    Wing wing_left;
    Wing wing_right;
    Engine E1, E2, E3, E4;
    // etc.
};
Method 1 [Straight Binary]:
Airbus_A380 my_airbus_A380;
Target t;
// Sends out the fuselage, empennage, wings, engines, etc. in raw binary.
t << my_airbus_A380;
This is, of course, the method that Boost Serialization uses with its
binary archives. The binary state of the object is simply serialized
to disk.
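To make Method 1 concrete, here is a minimal sketch of raw binary writing and reading. It is not Boost's implementation: FuelTank, write_raw, read_raw, save and load are all names invented for this illustration, and an in-memory byte buffer stands in for the disk file.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <type_traits>

// Hypothetical stand-in for one small part of the aircraft state.
struct FuelTank {
    std::uint32_t fuel_level;
    std::uint32_t capacity;
};

// Append the raw bytes of a trivially copyable value to a byte buffer.
template <typename T>
void write_raw(std::string& out, const T& value) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "raw copy needs a trivially copyable type");
    out.append(reinterpret_cast<const char*>(&value), sizeof(T));
}

// Read a value back from the buffer at `offset`, then advance the offset.
template <typename T>
T read_raw(const std::string& in, std::size_t& offset) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "raw copy needs a trivially copyable type");
    assert(offset + sizeof(T) <= in.size());
    T value;
    std::memcpy(&value, in.data() + offset, sizeof(T));
    offset += sizeof(T);
    return value;
}

// Members are written in a fixed order; no names or types reach the buffer.
std::string save(const FuelTank& tank) {
    std::string out;
    write_raw(out, tank.fuel_level);
    write_raw(out, tank.capacity);
    return out;
}

// Members are read back in the same fixed order.
FuelTank load(const std::string& in) {
    std::size_t offset = 0;
    FuelTank tank;
    tank.fuel_level = read_raw<std::uint32_t>(in, offset);
    tank.capacity = read_raw<std::uint32_t>(in, offset);
    return tank;
}
```

The buffer holds nothing but values in a fixed order, so only code that already knows that order can read it. Bytes written this way also depend on the byte order of the machine that wrote them.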
Method 2 [String Encoding of C++ Data Types]:
This method is very much like the straight binary method, except,
instead of sending say, 4 bytes for an int, you write out a string
describing the fact that what is being written has type int, then
write its value as a string:
[unsigned int : 15625]
Normally, only the type is written, not the actual name of the variable
(of course). If a structure like the Airbus above is written, these
elements become hierarchical. Unfortunately, if the only type name
information written to the file is for the 13 scalar types of C++, a
human would see the hierarchical structure, but still would not be
able to interpret the state as that of an Airbus_A380. The types
(scalar/vector/aggregate) must be labeled, by hand, by the programmer,
and written to disk:
[Fuel_Tank_Left : Fuel_Tank]
  [fuel_level : unsigned int : 15625]
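A sketch of how such labeled output could be produced is given here. TextTarget, type_name and save are hypothetical names invented for the illustration, only two scalar types are covered, and the two-space indentation is my own choice for showing the hierarchy.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>

struct FuelTank {
    unsigned int fuel_level;
};

// Map a scalar type to the string written to the file.
// Only the types used in this sketch are covered.
inline const char* type_name(unsigned int) { return "unsigned int"; }
inline const char* type_name(double) { return "double"; }

// Hypothetical text target: writes "[label : type]" for aggregates and
// "[label : type : value]" for scalars, indented by nesting depth.
class TextTarget {
public:
    void open(const std::string& label, const std::string& type) {
        indent();
        out_ << "[" << label << " : " << type << "]\n";
        ++depth_;
    }
    void close() {
        assert(depth_ > 0);
        --depth_;
    }
    template <typename T>
    void scalar(const std::string& label, const T& value) {
        indent();
        out_ << "[" << label << " : " << type_name(value)
             << " : " << value << "]\n";
    }
    std::string str() const { return out_.str(); }

private:
    void indent() { out_ << std::string(depth_ * 2, ' '); }
    std::ostringstream out_;
    std::size_t depth_ = 0;
};

// The labels "Fuel_Tank" and "fuel_level" are typed in by hand here;
// nothing in the language supplies them.
void save(TextTarget& target, const std::string& label, const FuelTank& tank) {
    target.open(label, "Fuel_Tank");
    target.scalar("fuel_level", tank.fuel_level);
    target.close();
}
```

Calling save(target, "Fuel_Tank_Left", tank) with a fuel level of 15625 yields the two lines shown above.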
Method 1 and Method 2 are almost equivalent. The difference is that a
human can make some sense of the Method 2 format, even without labels,
while the Method 1 format is unreadable. The "import" serialization
code written for Method 1 can even be reused to import data written
with Method 2, by simply discarding the labels.
Exporting, with inclusion of labels, however, is a different matter.
The programmer must explicitly write code to export the labels of all
variables. He could call the Fuel_Tank "Coca_Cola_Bottle" during
exportation, and the importation would still work. The problem is
that the computer doesn't care when it imports.
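The point about labels can be sketched as follows, assuming the "[label : type : value]" line format from Method 2. The names last_field, load and load_from_text are invented for this illustration.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

struct FuelTank {
    unsigned int fuel_level = 0;
};

// Return the text between the last " : " and the closing ']' of a line.
// Everything before it (label and type name) is thrown away unread.
std::string last_field(const std::string& line) {
    const std::size_t end = line.rfind(']');
    if (end == std::string::npos) return "";
    const std::size_t sep = line.rfind(" : ", end);
    if (sep == std::string::npos) return "";
    return line.substr(sep + 3, end - (sep + 3));
}

// Import relies only on the ORDER of the lines, exactly as raw binary
// import relies only on the order of the bytes.
FuelTank load(std::istream& in) {
    std::string line;
    std::getline(in, line);  // aggregate header: consumed, label ignored
    std::getline(in, line);  // scalar line: only the value is used
    FuelTank tank;
    // std::stoul throws if the value field is missing or not a number.
    tank.fuel_level = static_cast<unsigned int>(std::stoul(last_field(line)));
    return tank;
}

// Convenience wrapper so the import can be fed from a string.
FuelTank load_from_text(const std::string& text) {
    std::istringstream in(text);
    return load(in);
}
```

A file whose first line reads "[Coca_Cola_Bottle : Soft_Drink]" loads into the same FuelTank as one that reads "[Fuel_Tank_Left : Fuel_Tank]", because the import code never looks at the labels.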
So with every change to Airbus_A380, the programmer must touch the
serialization code, and this is what bothers the XML-to-object-to-XML
people so much. They want to eliminate this touching, and many try in
vain to find a breakthrough to do this, perplexed that it is trivial
in Java but seemingly impossible in C++. They do not realize that Java
and C++ are not really comparable as programming languages: one cannot
call Java a generic programming language any more than one could call
8080 assembly a generic programming language, because both presume a
specific execution environment. Java presumes the JVM, 8080 assembly
presumes the 8080 CPU, and C++ presumes pretty much any computer that
has a von Neumann architecture and sufficient RAM, and even a few
other architectures. So one must be specific when comparing
programming "languages", defining the target environment before
making the comparison. Pascal, and similar languages that do not
allow significant amounts of superfluous run-time state, could be
meaningfully compared to C++.
So if you are a C++ programmer, and you want the benefit of XML, it is
best to recognize specifically that which makes XML so appealing: the
human interpretability. Then STOP! LOL. :). Assume nothing more. Most
importantly, accept that it is only interpretable by humans, and not
by computers, on any level. Then you can use one of the following
options:
Option A:
Use Method 2, explicitly writing out the labels of the structures in
Boost-like serialization code, using a target archive that knows that a
long double should be written out with the string 'long double', for
example.
Option B:
Define a data structure in C++ which is an associative tree of
strings, each node of the tree a map from string-to-string or
string-to-list-of-string. The C++ code surrounding the object in RAM
will do lookups of string values, then interpret them at run-time.
The object containing the state can be trivially serialized to and from
disk, with hierarchical formatting and even comments, but rigidity of
form inside the program will quickly disintegrate with this method.
The code will be very messy, as the "raw" internal C++ code will be
doing run-time interpretation of strings, and it will be easy to add an
unknown string ....(then what?)...
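A rough sketch of Option B follows, with the string-to-list-of-string case left out. Node, write and fuel_level are hypothetical names, and the "name = value" output layout is only an assumption made for the example.

```cpp
#include <cstddef>
#include <map>
#include <memory>
#include <ostream>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical associative tree: each node maps names to string values
// and names to child nodes.
struct Node {
    std::map<std::string, std::string> values;
    std::map<std::string, std::unique_ptr<Node>> children;

    // Return the named child, creating it if it does not exist yet.
    Node& child(const std::string& name) {
        std::unique_ptr<Node>& slot = children[name];
        if (!slot) slot.reset(new Node);
        return *slot;
    }
};

// Serialization is trivial: walk the tree and print it with indentation.
void write(std::ostream& out, const Node& node, std::size_t depth = 0) {
    const std::string pad(depth * 2, ' ');
    for (const auto& kv : node.values)
        out << pad << kv.first << " = " << kv.second << "\n";
    for (const auto& kv : node.children) {
        out << pad << kv.first << ":\n";
        write(out, *kv.second, depth + 1);
    }
}

// The cost: every consumer looks up strings and parses them at run time.
// A missing key throws std::out_of_range; a non-numeric value makes
// std::stoul throw std::invalid_argument.
unsigned long fuel_level(const Node& aircraft) {
    const Node& tank = *aircraft.children.at("Fuel_Tank_Left");
    return std::stoul(tank.values.at("fuel_level"));
}
```

Nothing stops someone from storing a value under a misspelled key such as "fuel_levle". The tree accepts it silently, and the mistake only shows up when some lookup fails at run time.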
The XML proponents are not looking for either of these methods. They
want code to magically write itself, in a language that was designed
with the premise that the target architecture is mostly devoid of
superfluous run-time state.
-Le Chaud Lapin-
--
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]