Re: Fastest way to serialize arbitrary objects ???
On Apr 30, 3:01 am, Brian <c...@mailvault.com> wrote:
> On Apr 29, 8:04 pm, "Peter Olcott" <NoS...@OCR4Screen.com> wrote:
> > I think that I figured out a way that is pretty simple and
> > fast. I simply serialize everything to a single
> > std::vector<unsigned int>, and then write this out.
> > I provide a quick way to determine the exact size of every
> > sub-object so that I can allocate the single std::vector all
> > at once, and each sub-object knows how to append itself to
> > the single std::vector<unsigned int>.
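The scheme sketched above needs just two operations per sub-object: report its serialized size, and append its words to the shared vector. A minimal illustration of that pattern might look like the following; `Point`, `SerializedSize`, and `AppendTo` are hypothetical names chosen for the example, not anything from the thread:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sub-object illustrating the two operations the scheme
// needs: report its size in 32-bit words, and append itself.
struct Point {
    uint32_t x, y;

    // Number of 32-bit words this object occupies when serialized.
    std::size_t SerializedSize() const { return 2; }

    // Append this object's words to the shared buffer.
    void AppendTo(std::vector<uint32_t>& out) const {
        out.push_back(x);
        out.push_back(y);
    }
};

// Serialize a collection in one shot: sum the sizes, reserve once,
// then let each sub-object append itself.
std::vector<uint32_t> Serialize(const std::vector<Point>& points) {
    std::size_t total = 0;
    for (const Point& p : points) total += p.SerializedSize();
    std::vector<uint32_t> buffer;
    buffer.reserve(total);  // single allocation up front
    for (const Point& p : points) p.AppendTo(buffer);
    return buffer;
}
```

The up-front `reserve` is the point of knowing every sub-object's exact size: the vector never reallocates while the objects append themselves.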
> There are at least a couple of different ways to approach
> this. The way I do it is to count the size of the message
> and then begin marshalling the data. So I make two passes
> over the types involved. There are some positive aspects
> to counting the size before marshalling the data:
> 1. I don't waste time putting all of the data into a
>    buffer/vector only to find late in the process that
>    the length of the message exceeds the maximum message
>    length.
> 2. I don't have to have buffers as big as the maximum
>    message length.
> 3. The first parts of the message can be dispatched to
>    their destination without waiting for the whole message
>    to be marshalled. Say the message is 200,000 bytes and
>    the buffer is 16384 bytes. My approach frees the
>    first parts of the message to go on their merry way
>    without having to wait for the balance of the message
>    to be formatted.
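The two-pass approach described above can be sketched roughly as follows. The length limit, the 16384-byte buffer, and the `dispatch` callback are illustrative stand-ins (the buffer size comes from the example in the thread; everything else is assumed):

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

constexpr std::size_t kMaxMessageLength = 200000;  // assumed limit
constexpr std::size_t kBufferSize = 16384;         // small dispatch buffer

// First pass: compute the total encoded size without building anything.
std::size_t CountSize(const std::vector<int32_t>& values) {
    return values.size() * sizeof(int32_t);
}

// Second pass: marshal into a fixed-size buffer, handing each full
// buffer to `dispatch` immediately so early parts of the message can
// leave before the rest is formatted.
template <typename Dispatch>
void Marshal(const std::vector<int32_t>& values, Dispatch dispatch) {
    if (CountSize(values) > kMaxMessageLength)
        throw std::length_error("message exceeds maximum length");

    std::vector<uint8_t> buffer;
    buffer.reserve(kBufferSize);
    for (int32_t v : values) {
        // Encode each value little-endian, low byte first.
        for (int shift = 0; shift < 32; shift += 8)
            buffer.push_back(static_cast<uint8_t>(v >> shift));
        if (buffer.size() + sizeof(int32_t) > kBufferSize) {
            dispatch(buffer);   // flush early parts immediately
            buffer.clear();
        }
    }
    if (!buffer.empty()) dispatch(buffer);
}
```

Note how both claimed advantages fall out directly: the length check happens before any formatting work, and the buffer never needs to grow beyond `kBufferSize` regardless of the message size.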
Another advantage is that you can define a protocol which puts
the length of each object at its beginning. This can
considerably speed up skipping an object you're not interested
in.
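A minimal sketch of that length-prefix idea, under the assumption that objects are framed as a word count followed by payload words (the framing format and function names are illustrative, not from the thread):

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Frame one object as: [length in words][payload words...].
void AppendFramed(std::vector<uint32_t>& out,
                  const std::vector<uint32_t>& payload) {
    out.push_back(static_cast<uint32_t>(payload.size()));
    out.insert(out.end(), payload.begin(), payload.end());
}

// Skip one framed object without decoding it: read the length prefix
// and advance the cursor past the payload.
std::size_t SkipObject(const std::vector<uint32_t>& buf, std::size_t pos) {
    return pos + 1 + buf[pos];
}
```

Skipping is O(1) per object instead of O(size): a reader uninterested in an object jumps straight over it using the prefix, without touching the payload.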
> Those are the upsides of my approach. The downside is the
> two passes through the objects. There may be some upside to
> that downside, though: the first pass is a cursory counting
> pass, which may help cache-wise since the second pass
> follows immediately after the first.
Or if you have enough objects, it can hurt cache-wise by
ensuring that the first objects you visited and will write will
have been replaced in the cache by later objects:-). (Tuning
for cache behavior is incredibly tricky, and what is optimal for
one machine may be sub-optimal for another, even if the two
machines use the same basic architecture.)
--
James Kanze