(no subject)
Matthew Dillon
dillon at apollo.backplane.com
Sat Jun 4 07:07:27 GMT 2005
Subject: Re: Possible instruction pipelining problem between HT's on the same die ?
References: <200506032057.j53KvOFw062012 at apollo.backplane.com> <20050604021812.GG594 at funkthat.com> <200506040257.j542veCm063487 at apollo.backplane.com> <42A11F4A.40502 at samsco.org>
:I would expect that putting the fence on the write side will solve the
:problem. As Stephen discussed, the writes will land in a store buffer
:for a period of time, during which a fence on the write CPU will flush
:it out and make it visible to the other CPUs. Doing a fence on the read
:CPU will have no effect on the store buffers of the write CPU and will
:be a waste of time.
As a way to reduce latency... but a fence on the write side does
not solve the reordering problem on the read side. When the write
side writes the FIFO entry and then updates the FIFO index, the
read side must be able to guarantee that the FIFO data it reads
is valid when it sees that the FIFO index has been updated. This
means that the read side cannot afford to allow the reads to be
reordered and thus must use some sort of fence.
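To make the ordering requirement concrete, here is a minimal sketch of a
single-writer, single-reader FIFO using C11 atomics. The names (struct
fifo, fifo_write, fifo_read, FIFO_SIZE) are made up for illustration and
are not the actual kernel code, and the sketch assumes the writer never
overruns the reader. What instruction, if any, each fence turns into is
up to the compiler and the target cpu.

```c
#include <stdatomic.h>

#define FIFO_SIZE 16            /* hypothetical size, power of two */

struct fifo {
    int         data[FIFO_SIZE];
    atomic_uint windex;         /* advanced only by the writer */
    unsigned    rindex;         /* private to the reader */
};

/* Writer: store the entry, then publish it by advancing the index. */
static void fifo_write(struct fifo *f, int value)
{
    unsigned w = atomic_load_explicit(&f->windex, memory_order_relaxed);

    f->data[w % FIFO_SIZE] = value;
    /* Write-side fence: the entry must be visible before the index. */
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&f->windex, w + 1, memory_order_relaxed);
}

/* Reader: returns 1 and fills *out if an entry was ready, else 0. */
static int fifo_read(struct fifo *f, int *out)
{
    unsigned w = atomic_load_explicit(&f->windex, memory_order_relaxed);

    if (f->rindex == w)
        return 0;               /* nothing published yet */
    /* Read-side fence: the data read may not be hoisted above the
     * index read, otherwise a stale entry could be returned. */
    atomic_thread_fence(memory_order_acquire);
    *out = f->data[f->rindex % FIFO_SIZE];
    f->rindex++;
    return 1;
}
```

Dropping either fence breaks the pairing: the write-side fence alone
says nothing about the order in which the reader's two loads are
performed.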
:Another thing to keep in mind is that there is no difference here
:between HT and non HT SMP protocol. While HT cores share execution
:units, they DO NOT share registers, store buffers, or cache (at least,
:not in a way that is visible outside of the low-level implementation of
:the chip).
:
:Scott
They do share the cache, but I see your point. I'm not sure about
store buffers but from the behavior I've observed I suspect that
store buffers either are not shared, or a logical cpu's store buffer
sniffing does not extend to the other logical cpu's entries.
In our case latency is not a big issue. These are almost universally
asynchronous messages flying between the cpus. What matters is the cycle
overhead on each side to send and process the message. Hence I was
trying to avoid the use of locked bus cycle instructions. I'll have
to run tests to check the relative expense of the *FENCE instructions
(when supported) versus doing a lock; addl $0,0(%esp) to fence the read.
At least I don't have to put the fence in the body of the processing
loop... I just have to put it after the read of the FIFO's write index
before the loop is entered.
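A rough sketch of that placement, again with C11 atomics and made-up
names (struct msg_ring, ring_drain, RING_SIZE), assuming a single reader
and a writer that publishes entries before advancing windex. The write
index is sampled once, a single fence follows, and the loop body then
runs without any fence.

```c
#include <stdatomic.h>

#define RING_SIZE 16            /* hypothetical size, power of two */

struct msg_ring {
    int         slot[RING_SIZE];
    atomic_uint windex;         /* advanced only by the writer */
    unsigned    rindex;         /* private to the reader */
};

/* Copy up to max pending entries into out[]; returns the count. */
static unsigned ring_drain(struct msg_ring *r, int *out, unsigned max)
{
    /* Sample the write index once for the whole batch. */
    unsigned w = atomic_load_explicit(&r->windex, memory_order_relaxed);
    unsigned n = 0;

    /* One fence per batch: every slot below w was published before
     * the index value we just read. */
    atomic_thread_fence(memory_order_acquire);

    while (r->rindex != w && n < max) {
        out[n++] = r->slot[r->rindex % RING_SIZE];  /* no fence here */
        r->rindex++;
    }
    return n;
}
```

Entries published after the index was sampled are simply picked up by
the next call, which pays for its own fence.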
It's a real shame that special instructions are required at all.
-Matt
Matthew Dillon
<dillon at backplane.com>