Re: Lock variables between two threads
"Alex Blekhman" <tkfx.REMOVE@yahoo.com> wrote in message
news:OE7Tju6ZJHA.284@TK2MSFTNGP04.phx.gbl...
"Ben Voigt [C++ MVP]" wrote:
Yes, but can the CPU exchange single bytes with main memory (or with the L2/L3
cache shared between cores), or does it always transfer an entire word (or
more)? I guess the cache coherency protocol guarantees atomicity of the
read/modify/write of the cache line even for an instruction without a LOCK
prefix.
Actually, you cannot use the LOCK prefix with MOVx instructions. Only a
limited set of instructions can take a LOCK prefix. Also, according to the
Intel software developer's manual, reading or writing a byte from/to memory
is an atomic operation:
<quote>
7.1.1. Guaranteed Atomic Operations
The Intel386™, Intel486™, Pentium®, and P6 family processors guarantee
that the following basic memory operations will always be carried out
atomically:
• Reading or writing a byte.
• Reading or writing a word aligned on a 16-bit boundary.
• Reading or writing a doubleword aligned on a 32-bit boundary.
</quote>
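To make the LOCK/MOV distinction concrete, here is a small C++11 sketch (the function names are made up for illustration). A read-modify-write such as std::atomic::fetch_add is the kind of operation that needs a LOCK prefix on x86, while a release store to a one-byte flag is a plain write. The instruction choices mentioned in the comments are what mainstream x86 compilers usually generate, not something the language guarantees, so check your compiler's assembly output if it matters.

```cpp
#include <atomic>
#include <cstdint>
#include <thread>

// Two threads increment one shared counter. fetch_add is a read-modify-write,
// which x86 compilers typically emit as a LOCK-prefixed instruction
// (LOCK XADD, or LOCK ADD when the result is unused). A plain, unlocked
// increment from two cores could lose updates.
std::uint32_t count_with_two_threads(std::uint32_t per_thread) {
    std::atomic<std::uint32_t> counter{0};
    auto worker = [&counter, per_thread] {
        for (std::uint32_t i = 0; i < per_thread; ++i) {
            counter.fetch_add(1, std::memory_order_relaxed);
        }
    };
    std::thread t1(worker);
    std::thread t2(worker);
    t1.join();
    t2.join();
    return counter.load();
}

// One thread stores a value and then raises a one-byte flag; the calling
// thread waits for the flag and reads the value. The release store is a pure
// write, which x86 compilers typically emit as an ordinary MOV (no LOCK
// prefix, which MOV could not take anyway).
int publish_through_byte_flag(int value) {
    int payload = 0;
    std::atomic<std::uint8_t> ready{0};
    std::thread writer([&payload, &ready, value] {
        payload = value;                            // ordinary store
        ready.store(1, std::memory_order_release);  // single-byte publish
    });
    while (ready.load(std::memory_order_acquire) == 0) {
        std::this_thread::yield();  // spin until the flag is set
    }
    const int seen = payload;  // safe: the acquire load synchronized with the release store
    writer.join();
    return seen;
}
```

count_with_two_threads(100000) should always return 200000, and publish_through_byte_flag(42) returns 42.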
Alex
Sure.
I was just pointing out that changing a byte potentially requires:
1. Reading a cache line from main memory (since the memory controller
   interface is umpteen bytes wide and doesn't support smaller transfers).
2. Modifying one byte in the cache line.
3. Writing the cache line back to main memory.
And I was asking what keeps this sequence from stomping on other data in the
same machine word that a different core is writing at the same time. I think
the answer is "the cache coherency protocol".
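A rough way to exercise that scenario: four threads each hammer their own byte of one 4-byte struct, and afterwards none of the bytes has been clobbered by a neighbour's write. This is a C++11 sketch with hypothetical names. It shows the observable guarantee (the C++11 memory model treats each byte as a separate memory location) rather than the mechanism, which on x86 is the hardware byte store plus the cache coherency protocol. A passing run can't prove the absence of a problem; the guarantee itself comes from the language and hardware specifications.

```cpp
#include <cstdint>
#include <thread>

// Four adjacent bytes: they share one 32-bit word and one cache line.
// Since C++11 each byte is a separate "memory location", so different
// threads may write different bytes concurrently without a data race; the
// compiler and hardware must not turn a byte store into a wider
// read-modify-write that clobbers a neighbour.
struct FourBytes {
    unsigned char b[4];
};

// Thread i writes only b[i], many times, then leaves final_values[i] there.
// Returns the four bytes packed into one value, b[0] in the low byte.
std::uint32_t hammer_adjacent_bytes(int iterations) {
    FourBytes shared{};
    const unsigned char final_values[4] = {0x11, 0x22, 0x33, 0x44};

    auto writer = [&shared, &final_values, iterations](int index) {
        // volatile keeps the compiler from collapsing the loop into one store
        volatile unsigned char* slot = &shared.b[index];
        for (int i = 0; i < iterations; ++i) {
            *slot = static_cast<unsigned char>(i);
        }
        *slot = final_values[index];
    };

    std::thread t0(writer, 0), t1(writer, 1), t2(writer, 2), t3(writer, 3);
    t0.join();
    t1.join();
    t2.join();
    t3.join();

    return std::uint32_t(shared.b[0]) | std::uint32_t(shared.b[1]) << 8 |
           std::uint32_t(shared.b[2]) << 16 | std::uint32_t(shared.b[3]) << 24;
}
```

If any byte store were widened into an unsynchronized word-sized read/modify/write, the final value would sometimes differ from 0x44332211.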
I wonder if there's a bus lock taken between the "take ownership of the
cache line" and "modify the content" steps, to prevent a tug of war where
nothing ever actually gets accomplished (until one thread gets context
switched out).

I guess the moral of the story is that avoiding false sharing is at least as
important to performance as avoiding unnecessary LOCK prefixes (although
maybe only with a few cores, whereas the LOCK prefix gets worse with many
cores). Maybe someday LOCK will be changed from a bus-wide lock to snooping
plus an address-linked lock that only prevents other cores from modifying
the same address simultaneously. Maybe this is already true on NUMA?
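On the false-sharing point, here is a minimal sketch of the usual fix: give each thread's hot data its own cache line. It assumes 64-byte lines and uses hypothetical type and function names. To keep LOCK prefixes out of the comparison, each counter has a single writer and is updated with a relaxed load plus a relaxed store, which x86 compilers typically emit as plain MOVs, so any timing difference between the two layouts should come from cache-line traffic rather than from locked instructions.

```cpp
#include <atomic>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <thread>

// Assumption: 64-byte cache lines, typical of current x86 parts. C++17's
// std::hardware_destructive_interference_size is the portable spelling, but
// not every standard library ships it, so a constant is used here.
constexpr std::size_t kCacheLine = 64;

// Both counters usually share one cache line, so writes from two cores keep
// pulling the line back and forth between their caches (false sharing).
struct PackedCounters {
    std::atomic<std::uint64_t> a{0};
    std::atomic<std::uint64_t> b{0};
};

// Each counter starts on its own cache line, so the two writers never contend.
struct PaddedCounters {
    alignas(kCacheLine) std::atomic<std::uint64_t> a{0};
    alignas(kCacheLine) std::atomic<std::uint64_t> b{0};
};
static_assert(sizeof(PaddedCounters) >= 2 * kCacheLine, "counters must not share a line");

// Each counter has exactly one writer, so a relaxed load + store is enough
// (no read-modify-write, hence no LOCK prefix on x86).
inline void bump(std::atomic<std::uint64_t>& counter) {
    counter.store(counter.load(std::memory_order_relaxed) + 1, std::memory_order_relaxed);
}

// Runs two threads, one per counter, and returns the combined count.
// Wall-clock time for the run is written to `elapsed`.
template <typename Counters>
std::uint64_t run_counters(std::uint64_t iterations, std::chrono::nanoseconds& elapsed) {
    Counters c;
    const auto start = std::chrono::steady_clock::now();
    std::thread ta([&] { for (std::uint64_t i = 0; i < iterations; ++i) bump(c.a); });
    std::thread tb([&] { for (std::uint64_t i = 0; i < iterations; ++i) bump(c.b); });
    ta.join();
    tb.join();
    elapsed = std::chrono::duration_cast<std::chrono::nanoseconds>(
        std::chrono::steady_clock::now() - start);
    return c.a.load() + c.b.load();
}
```

Calling run_counters<PackedCounters> and run_counters<PaddedCounters> with the same iteration count gives the same total; the elapsed times are returned rather than asserted because the gap between the layouts depends on the machine and its core count.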