kernel: swap_pager: indefinite wait buffer - on 5.3-RELEASE-p5

Tue May 3 01:18:31 PDT 2005

Oliver Fromme wrote:
> Uwe Doering <gemini at geminix.org> wrote:
>  > Oliver Fromme wrote:
>  > > If they're really identical (i.e. the same size and same
>  > > geometry), then you can use dd(1) for duplication, like
>  > > this:
>  > > 
>  > > # dd if=/dev/ad0 of=/dev/ad1 bs=64k conv=noerror,sync
>  > > 
>  > > The "noerror,sync" part is important so the dd command will
>  > > not stop when it hits any bad spots on the source drive and
>  > > instead will fill the blocks with zeroes on the destination
>  > > drive.  Since it's only the swap partition, you shouldn't
>  > > lose any data.
>  > 
>  > I would like to point out that the conclusion you're drawing in the last 
>  > sentence is invalid IMHO.
> 
> I'm afraid I don't agree.
> 
>  > "indefinite wait buffer" messages at 
>  > apparently random block numbers just indicate that the pager was unable 
>  > to access the swap area (in its entirety!) when it wanted to.  It means 
>  > that the disk drive was either dead at that point in time or busy trying 
>  > to deal with a bad sector.
>  > 
>  > This sector could have been anywhere on the disk.  It just kept the disk 
>  > drive busy for long enough that the pager started to complain.
> 
> The OP specifically said that the swap_pager messages were
> the only kernel messages that he got.  That indicates that
> only the swap partition is affected, because otherwise
> there would have been other kernel messages indicating
> I/O errors from one of the filesystems on that disk.

Your assumption here is that the filesystem code would become impatient, 
too.  This is not the case.  The swap pager has a timeout built in (20 
seconds IIRC) after which it prints a warning message and continues 
waiting, but there is nothing like this in the filesystem code.

If the disk drive is dead or busy trying to deal with a bad sector in a 
filesystem, the kernel will wait silently and indefinitely until either 
the disk drive succeeds in recovering the sector, or it fails to do so. 
In the latter case the kernel would log an I/O error, but only when 
it hears back from the disk drive and not any earlier, in contrast to 
the swap pager.  That's why you often see only swap pager messages in 
case of a dying disk drive.
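
To illustrate the difference between the two waiting styles, here is a 
toy model in Python.  It is not kernel code and only a sketch: the 
function names, the threading.Event standing in for the disk drive, and 
the shortened timeout are all assumptions made for the illustration. 
One waiter wakes up after each timeout, logs a warning and keeps 
waiting, the other blocks without a timeout and therefore logs nothing 
while the "disk" is stalled.

```python
import threading


def wait_with_warnings(done, timeout, log):
    """Swap-pager-style wait (hypothetical model): each time `timeout`
    seconds pass without completion, log a warning and keep waiting."""
    while not done.wait(timeout):
        log.append("indefinite wait buffer")


def wait_silently(done, log):
    """Filesystem-style wait (hypothetical model): block with no timeout,
    so nothing is ever logged while the request is outstanding."""
    done.wait()


def simulate(stall, timeout):
    """Run both waiters against one 'disk' that answers after `stall` seconds.

    Returns the two message logs (pager_log, fs_log).
    """
    done = threading.Event()  # set when the simulated drive finally responds
    pager_log, fs_log = [], []
    waiters = [
        threading.Thread(target=wait_with_warnings, args=(done, timeout, pager_log)),
        threading.Thread(target=wait_silently, args=(done, fs_log)),
    ]
    for t in waiters:
        t.start()
    # The simulated drive is busy for `stall` seconds, then completes the I/O.
    threading.Timer(stall, done.set).start()
    for t in waiters:
        t.join()
    return pager_log, fs_log
```

Calling simulate(stall=0.5, timeout=0.1) leaves several warnings in the 
first log and none in the second, although both waiters were blocked on 
the same stalled device for the same time.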

I checked the kernel sources, but of course I could have missed the 
relevant lines.  In that case I would appreciate a pointer to the place 
at which the filesystem code generates a warning message comparable to 
that from the swap pager.

    Uwe
-- 
Uwe Doering         |  EscapeBox - Managed On-Demand UNIX Servers
gemini at geminix.org  |  http://www.escapebox.net