Re: Questions about buffered streams

From:

Lew <lew@lewscanon.com>

Newsgroups:

comp.lang.java.programmer

Date:

Wed, 09 Jan 2008 10:50:59 -0500

Message-ID:

<7pOdndPAi7T0cBnanZ2dnUVZ_jidnZ2d@comcast.com>

Roedy Green wrote:

If you wrote a byte at a time, since the hardware works in 512-byte
sectors at a time, the OS would need to read the sector, plop your
byte into it and write the entire sector back. This would take at
least 2 disk rotations, perhaps 3. Even if you wrote your data 512
bytes at a time, when you went to write the next sector, its spot
would have just passed the head, so you would have to wait an entire
rotation for its spot to come round. If you wrote 131,072 bytes (still
less than 1 physical track) at a pop, you could do that all in one
rotation.

Christian wrote:

I doubt it is that simple with a modern OS. Discs have large caches that
buffer read/write operations, and the OS has a cache that does additional
buffering. Sure, these caches may be slower than your own buffer, which may
reside in the CPU's cache, but that doesn't mean you can measure or explain
the latency of writing single bytes by HDD rotation.

Let us not forget the effect of file systems. A journaling file system will
add more physical writes to the logical writes that Java requests, further
complicating matters. And we aren't even talking about RAID. As others have
pointed out, the same issues apply even when no disks are involved, as with
TCP/IP streams.

It is next to useless to talk about platters and heads and disk spin in a Java
context. Just about any IO Stream will behave better with larger chunks, up
to a point, even if it's only because of the CPU chip's own internal memory
cache. Memory accesses are striped, too.

The rule of thumb is that a write() carries overhead. The penalty of that
overhead is reduced with a larger payload - the Automated Teller Machine (ATM)
fee effect. The larger the transaction, the smaller the fee in proportion to it.

For just about all practical IO Streams, the write() overhead is large enough
to make a 64KB payload go much faster as one write than as 64K individual
one-byte writes. Disks, platters and heads are not even in that overhead any
more [1] - it's all OS, file-system and driver in-memory overhead and cache
accesses, both on the motherboard and in outboard devices.
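
To make the call-count side of this concrete, here is a minimal sketch. It
does not time anything and touches no disk: CountingSink is a made-up class
for this illustration that throws the data away and only counts how many
times write() reaches it. The 8192-byte buffer size is likewise just an
assumption picked for the demo.

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.OutputStream;

public class WriteCallCount {
    /** Hypothetical sink: discards data, counts calls and bytes received. */
    static final class CountingSink extends OutputStream {
        long calls;
        long bytes;

        @Override
        public void write(int b) {
            calls++;
            bytes++;
        }

        @Override
        public void write(byte[] buf, int off, int len) {
            calls++;
            bytes += len;
        }
    }

    /** Pushes 'total' bytes through 'out' one byte per call, then flushes. */
    static void writeOneByOne(OutputStream out, int total) throws IOException {
        for (int i = 0; i < total; i++) {
            out.write(i & 0xFF);
        }
        out.flush();
    }

    public static void main(String[] args) throws IOException {
        final int total = 64 * 1024;

        // Case 1: every one-byte write goes straight to the sink.
        CountingSink raw = new CountingSink();
        writeOneByOne(raw, total);

        // Case 2: same one-byte writes, but a buffer sits in front of the sink.
        CountingSink buffered = new CountingSink();
        writeOneByOne(new BufferedOutputStream(buffered, 8192), total);

        // Case 3: the whole payload handed over in a single call.
        CountingSink bulk = new CountingSink();
        bulk.write(new byte[total], 0, total);

        System.out.println("unbuffered: " + raw.calls + " calls, " + raw.bytes + " bytes");
        System.out.println("buffered:   " + buffered.calls + " calls, " + buffered.bytes + " bytes");
        System.out.println("bulk:       " + bulk.calls + " calls, " + bulk.bytes + " bytes");
    }
}
```

All three cases deliver the same 65,536 bytes; only the number of trips to
the underlying stream differs. With a real file or socket behind the sink,
each of those trips pays the fixed fee, which is the part worth measuring on
your own platform.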

[1] for the large category of applications not requiring guaranteed writes
(e.g., not RDBMSes).

--
Lew