Question about panic in brelse()

Christoph Mallon christoph.mallon at gmx.de
Tue Jan 20 02:43:27 PST 2009


Sadly I got no response about the panic problem so far. I investigated 
further and I came to the conclusion, that there are at least two 
problems/bugs in brelse(). Here are my findings. All lines refer to 
sys/kern/vfs_bio.c at r183754, which is the most recent version of this 
file.

Below is the KASSERT(), which triggers in bundirty() (which is called by 
brelse()):
KASSERT(bp->b_flags & B_REMFREE || bp->b_qindex == QUEUE_NONE,
     ("bundirty: buffer %p still on queue %d", bp, bp->b_qindex));

So this assertion triggers, if neither B_REMFREE is set nor the buffer 
is in QUEUE_NONE.

Now lets have a look at brelse() before the call to bundirty() happens. 
In line 1325 we find:
if (bp->b_flags & B_REMFREE)
     bremfreel(bp);

In bremfreel() B_REMFREE is cleared (line 684). So after this B_REMFREE 
is guaranteed to be clear and so the first part of the triggered 
KASSERT() will be false.

Now for the second part, i.e. QUEUE_NONE: At line 1331 starts an 
if-cascade. No matter which case of this cascade is entered, b_qindex is 
set and it especially in no case it is set to QUEUE_NONE. so after this 
if-cascade it is guaranteed that b_qindex is *not* QUEUE_NONE.

So when we arrive at line 1373ff at the problematic call to bundirty(), 
which contains the KASSERT(), it is guaranteed that neither B_REMFREE is 
set nor the buffer is in QUEUE_NONE and therefore the KASSERT() will be 
triggered. So we have a serious problem here: When the call is done, it 
*will* fail.

This is one part of the problem. I have no solution, I don't know what 
would be correct here. Further, I looked at r93707 when this call to 
bundirty() was added. The situation was basically the same back then, 
i.e. the buffer could not be in QUEUE_NONE (the check for B_REMFREE did 
not exist back then, so the KASSERT() was more strict). So this change 
seems to wrong all along or at least dubious. The commit log of r93707 
on the other hand claims that it solved a serious problem. If this was 
tested back then, it must have been done without INVARIANTS. It seems to 
be, that it was a hot fix - mind the MFC time of 1 day.


Now for the second problem. As stated in the original mail, there is a 
buffer for which a write attempt failed:
>   b_iocmd   = BIO_WRITE
>   b_ioflags = BIO_ERROR
>   b_error   = EIO
>   b_flags   = B_NOCACHE 

Because the error ist "just" an EIO, the buffer is re-dirtied and 
BIO_ERROR is cleared (line 1144ff). So writing the buffer should be 
tried again.
But later in line 1342ff it is determined that the buffer contains junk, 
because B_NOCACHE is set so B_INVAL is also set. This leads to entering 
the path in line 1373, which then triggers the KASSERT().
On the other hand, the buffer does not contain junk, because it should 
be retried to be written! It only should be not kept around after the 
write was successful (or is considered to be failed permamently). So I 
think testing B_NOCACHE here alone is to weak and the condition has to 
be modified. I propose this change:

@@ -1340,7 +1340,8 @@
  		}
  		TAILQ_INSERT_HEAD(&bufqueues[bp->b_qindex], bp, b_freelist);
  	/* buffers with junk contents */
-	} else if (bp->b_flags & (B_INVAL | B_NOCACHE | B_RELBUF) ||
+	} else if (bp->b_flags & (B_INVAL | B_RELBUF) ||
+	    ((bp->b_flags & (B_NOCACHE | B_DELWRI)) == B_NOCACHE)
  	    (bp->b_ioflags & BIO_ERROR)) {
  		bp->b_flags |= B_INVAL;
  		bp->b_xflags &= ~(BX_BKGRDWRITE | BX_ALTDATA);

This only enters the path to invalidate the buffer in case of B_NOCACHE 
being set, if B_DELWRI is *not* set.


Can somebody reconstruct my analysis and confirm/reject it?
Does my proposed solution for the second problem look sensible?
Are the two problems related, i.e. can the first problem only occur 
because the second problem exists?
What about the change in r93707 in general? Is it wrong? Does my 
proposal remove the cause for the change done in r93707 or are there 
other circumstances by which the situation can be triggered?


Hopefully somebody can shed some more light on this.

	Christoph


More information about the freebsd-hackers mailing list