powerpc64/GENERIC64 use of dcbst vs. dcbf: is the dcbst use really okay? Anyone know?
Mark Millard
markmi at dsl-only.net
Tue Sep 23 05:35:58 UTC 2014
I keep trying to gather more evidence about what is going on with the PowerMac G5 boot-hang problem. (I'd like to have full memory with reliable boots some day...) In the process I look around and research what I see.
That can lead to questions like this one even when the question is not a plausible explanation of, or contributor to, the G5 hangs.
In this case, experiments had already shown that using dcbf did not change the boot-hang behavior. But I still had the question.
I gather from your explanation that the 4 places that use dcbst do not have issues with processors fetching instructions "without coherency being enforced". .__syncicache seemed to be for more general use than the others, and so seemed less likely to have special conditions that would always apply. That is part of what prompted me to ask: analyzing all of its uses to prove such properties was more than I wanted to take on.
Side notes:
For Apple's BootX-81: if/when the kernel returns, BootX-81 does its own restore of the openfirmware exception vectors before returning to openfirmware (returning -1). So Apple seems not to presume that openfirmware itself does so at that point. (More paranoia code?)
One point of paranoia Apple did not follow in BootX-81: each dcbf is immediately followed by the matching icbi, with nothing in between, and all of those pairs are issued before either a single "isync; sync; eieio" (going into the kernel) or a single "sync; mtmsr ...; isync" (on return from the kernel). So there is no sync between a dcbf and its icbi, unlike in the IBM sequence.
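To make the ordering point concrete, here is a minimal sketch that models an instruction stream and checks only the rule in IBM's excerpt (flush the line, sync, then icbi). Everything in it is hypothetical: the op encoding, the name icbi_waits_for_flush, and the two-line example streams. It says nothing about whether a particular CPU actually misbehaves without the sync.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a cache-maintenance instruction stream.
   OP_DC stands for dcbst or dcbf on a cache line, OP_IC for icbi. */
enum op_kind { OP_DC, OP_IC, OP_SYNC, OP_ISYNC, OP_EIEIO };

struct op {
    enum op_kind kind;
    int line;               /* cache-line index; -1 for barriers */
};

#define COUNT(a) (sizeof (a) / sizeof (a)[0])

/* Return 1 if every icbi of a line comes after a flush of that line
   AND after a sync issued following that flush (the "dcbst; sync; icbi"
   order in IBM's excerpt); return 0 otherwise. */
static int icbi_waits_for_flush(const struct op *ops, size_t n)
{
    assert(ops != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (ops[i].kind != OP_IC)
            continue;
        int flushed = 0, synced_after_flush = 0;
        for (size_t j = 0; j < i; j++) {
            if (ops[j].kind == OP_DC && ops[j].line == ops[i].line) {
                flushed = 1;
                synced_after_flush = 0;   /* a newer flush needs a newer sync */
            } else if (ops[j].kind == OP_SYNC && flushed) {
                synced_after_flush = 1;
            }
        }
        if (!synced_after_flush)
            return 0;
    }
    return 1;
}

/* Two lines, ordered as in IBM's uniprocessor sequence. */
static const struct op ibm_order[] = {
    { OP_DC, 0 }, { OP_DC, 1 }, { OP_SYNC, -1 },
    { OP_IC, 0 }, { OP_IC, 1 }, { OP_ISYNC, -1 },
};

/* Two lines, ordered as BootX-81 is described above on kernel entry:
   dcbf/icbi pairs, then a single "isync; sync; eieio". */
static const struct op bootx_entry_order[] = {
    { OP_DC, 0 }, { OP_IC, 0 }, { OP_DC, 1 }, { OP_IC, 1 },
    { OP_ISYNC, -1 }, { OP_SYNC, -1 }, { OP_EIEIO, -1 },
};
```

Calling icbi_waits_for_flush on ibm_order accepts it, while bootx_entry_order is rejected because its first icbi is issued before any sync.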
===
Mark Millard
markmi at dsl-only.net
On Sep 22, 2014, at 9:55 PM, Nathan Whitehorn <nwhitehorn at freebsd.org> wrote:
On 09/22/14 17:18, Mark Millard wrote:
> Does anyone know why the following is true in FreeBSD (10.1-BETA2, for example) for kernel/openfirmware transitions (in both directions) on powerpc64/GENERIC64? (Some other places are noted as well.) The issue is the use of dcbst vs. dcbf.
>
> (Below I quote from IBM's pem_64_bit_v3.0.2005jul2005.pdf.)
Yes, a mix is used, and deliberately so. dcbf does everything dcbst does (write the modified block back to memory) and also invalidates the block, but in general there is no need for an invalidation where dcbst is used in the kernel.
> Some context first...
>
> Apple's published BootX-81 always saves and restores the exception vectors when switching between openfirmware and the kernel: it maintains separate vectors for the two contexts. In addition, it carefully uses dcbf and icbi whether it is copying to the vector area at address 0 or to a save area, followed by isync. (And more, sync and eieio: Apple seems paranoid.)
There's a lot of paranoia that is useful to apply to this process. Open Firmware is supposed to restore its own exception vectors if it needs to (and in general it does). We could imagine doing that too, but it would probably involve a great deal of work implementing the OF callback infrastructure, which we don't currently support. Is there a reason you think the exception vectors are causing a problem?
-Nathan
> Apple used dcbf instead of dcbst.
>
> IBM writes of dcbst vs. dcbf:
>
>> Instruction caches, if they exist, are not required to be consistent with data caches, memory, or I/O data transfers. Software must use the appropriate cache management instructions to ensure that instruction caches are kept coherent when instructions are modified by the processor or by input data transfer. When a processor alters a memory location that may be contained in an instruction cache, software must ensure that updates to memory are visible to the instruction fetching mechanism. Although the instructions to enforce consistency vary among implementations, the following sequence for a uniprocessor system is typical:
>> 1. dcbst (update memory)
>> 2. sync (wait for update)
>> 3. icbi (invalidate copy in instruction cache)
>> 4. isync (perform context synchronization)
>> Note: Most operating systems will provide a system service for this function. These operations are necessary because the memory may be designated as write-back. Since instruction fetching may bypass the data cache, changes made to items in the data cache may not otherwise be reflected in memory until after the instruction fetch completes.
>> For implementations used in multiprocessor systems, variations on this sequence may be recommended. For example, in a multiprocessor system with a unified instruction/data cache (at any level), if instructions are fetched without coherency being enforced, the preceding instruction sequence is inadequate. Because the icbi instruction does not invalidate blocks in a unified cache, a dcbf instruction should be used instead of a dcbst instruction for this case.
>>
>
> Now the point, given that background information...
>
> FreeBSD's powerpc64/GENERIC64 seems to use a mix of dcbst and dcbf. The following use dcbst (unless patched at run time):
>
> 000000000086c1e8 <.agp_apple_unbind_page+0x60> dcbst r0,r0
>
> 000000000086c27c <.agp_apple_bind_page+0x64> dcbst r0,r0
>
> 00000000008b1b78 <.elf_reloc_internal+0x12c> dcbst r0,r30
>
> 00000000008bcd30 <.__syncicache+0x38> dcbst r0,r0
>
> The last one is used during the openfirmware/kernel transitions. The lines above come from "objdump -d --prefix-address /boot/kernel/kernel".
>
> Is the use of dcbst risky because of unified caches at any level on any of the processors that powerpc64/GENERIC64 is supposed to support?
>
>
> ===
> Mark Millard
> markmi at dsl-only.net
>
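For reference, a __syncicache-style routine following the IBM sequence quoted above can be sketched in C. This is a minimal sketch, not FreeBSD's actual __syncicache: the names sync_icache_range, line_base and lines_spanned and the use_dcbf switch are hypothetical, the cache-line size is passed in by the caller (real code must use the CPU's actual line size), and the inline assembly assumes GCC/Clang syntax on PowerPC. The use_dcbf flag shows the only difference between the two choices discussed in this thread: dcbf in place of dcbst, as IBM suggests for the multiprocessor unified-cache case. On other architectures the barrier part compiles to nothing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Round addr down to the start of its cache line (line: power of two). */
static uintptr_t line_base(uintptr_t addr, size_t line)
{
    assert(line != 0 && (line & (line - 1)) == 0);
    return addr & ~(uintptr_t)(line - 1);
}

/* Number of cache lines touched by the byte range [addr, addr + len). */
static size_t lines_spanned(uintptr_t addr, size_t len, size_t line)
{
    if (len == 0)
        return 0;
    return (size_t)((line_base(addr + len - 1, line) -
                     line_base(addr, line)) / line) + 1;
}

/* Make instructions just written to [from, from + len) visible to
   instruction fetch: dcbst (or dcbf), sync, icbi, then sync; isync. */
static void sync_icache_range(const void *from, size_t len, size_t line,
                              int use_dcbf)
{
    uintptr_t start = line_base((uintptr_t)from, line);
    size_t n = lines_spanned((uintptr_t)from, len, line);
#if defined(__powerpc__) || defined(__powerpc64__)
    size_t i;

    /* 1. Write modified data-cache lines back to memory
          (dcbf also invalidates them; dcbst does not). */
    for (i = 0; i < n; i++) {
        uintptr_t p = start + i * line;
        if (use_dcbf)
            __asm__ volatile ("dcbf 0,%0" : : "r"(p) : "memory");
        else
            __asm__ volatile ("dcbst 0,%0" : : "r"(p) : "memory");
    }
    /* 2. Wait for those writes to complete. */
    __asm__ volatile ("sync" : : : "memory");
    /* 3. Invalidate any stale copies in the instruction cache. */
    for (i = 0; i < n; i++)
        __asm__ volatile ("icbi 0,%0" : : "r"(start + i * line) : "memory");
    /* 4. Wait for the icbi's (an extra sync commonly added when several
          are issued), then context-synchronize with isync. */
    __asm__ volatile ("sync; isync" : : : "memory");
#else
    (void)start; (void)n; (void)use_dcbf;  /* not PowerPC: no-op here */
#endif
}
```

With 128-byte lines, a 129-byte range starting on a line boundary touches two lines, so each loop above would run twice.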
More information about the freebsd-ppc mailing list