gvirstor & UFS

Thu Mar 29 22:21:12 UTC 2007

On 30/03/2007 7:13 AM, Bruce Evans wrote:
> On Thu, 29 Mar 2007, Ivan Voras wrote:
> 
>> Bruce Evans wrote:
>>
>>> The following old patch may help.  vfs retries too hard after write
>>> errors.  Retrying after EIO is bad enough (since most parts of the
>>> kernel still expect the old treatment of not retrying), but retrying
>>> after a non-recoverable error is just a bug.
>>
>> I've tried the patch - it resulted in a panic :(
>>
>> g_vfs_done():virstor/foo[WRITE(offset=17353104384, length=131072)]error = 28
>> /bla: got error 28 while accessing file system
>> panic: softdep_deallocate_dependencies: unrecovered I/O error
>> cpuid=0
> 
> That is hard to fix.  The change to vfs_bio.c to not discard buffer contents
> after a write error (rev.1.196 of vfs_bio.c) may even have been triggered
> by this and similar panics in soft updates.  However, I think it is a bug
> for file systems to not be able to deal with i/o errors.  Rev.1.196 could
> have reasonably left the buffer alone instead of discarding it as before
> or clearing its error indicator and dirty flag as now, so that file system
> code could deal with the error a little later.  Then I think the above
> panic would still occur, since soft updates can't deal with the error.
> Soft updates is apparently depending on not even seeing the error.  But
> some errors are non-recoverable, so not seeing them is no solution.

Is this at all related to the problem of being unable to unmount a
filesystem after the device goes AWOL (e.g. removable media disconnected),
where forcing the unmount leads to a panic?

I haven't looked into this recently, but I recall stumbling across the
thread some years back and finding it interesting. I imagine this would
be particularly messy for any sort of embedded system dealing with
removable media (e.g. USB/Firewire devices): if an end-user unplugs a
device when you don't want them to, the whole system will collapse in a
heap...

From what I gather, this is probably related: the device goes walkabout
and generates I/O errors, but the filesystem keeps trying over and over
to flush any dirty buffers.
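
To make the "retry vs. give up" distinction concrete, here is a rough
userland sketch in C. It is not kernel code and not the patch discussed
above; the names write_error_is_retryable, flush_with_bounded_retry and
fake_write are made up for illustration, and the choice of which errors
count as retryable is my assumption. Error 28 in the log is ENOSPC, and
I am assuming ENXIO is what a vanished device would report.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical policy: is this write error worth retrying at all? */
bool
write_error_is_retryable(int error)
{
	switch (error) {
	case EIO:	/* possibly transient; retry, but only a few times */
		return (true);
	case ENOSPC:	/* error 28: no space left, retrying cannot help */
	case ENXIO:	/* assumed: device is gone (e.g. unplugged) */
	default:
		return (false);
	}
}

/* Callback standing in for "write this dirty buffer to the device". */
typedef int (*write_fn)(void *ctx);

/*
 * Hypothetical flush: one initial attempt plus at most max_retries
 * retries, and no retries at all after a non-recoverable error.
 * Returns 0 on success or the last error seen.
 */
int
flush_with_bounded_retry(write_fn do_write, void *ctx, int max_retries,
    int *attempts)
{
	int error, tries = 0;

	assert(max_retries >= 0);
	do {
		error = do_write(ctx);
		tries++;
	} while (error != 0 && write_error_is_retryable(error) &&
	    tries <= max_retries);

	if (attempts != NULL)
		*attempts = tries;
	return (error);
}

/* Fake device for experiments: always returns the error stored in ctx. */
int
fake_write(void *ctx)
{
	return (*(int *)ctx);
}
```

With a fake device that always fails with ENOSPC, the sketch gives up
after a single attempt and hands the error back to the caller, rather
than looping forever on the dirty buffer. What the caller (soft updates,
in the case above) should then do with that error is exactly the hard
part this sketch does not address.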

Forgive me if any of the above seems incorrect - I'm just going from memory!

--Antony