swap_pager complaints but not using swap

Sun Jan 25 04:44:22 PST 2009

     On Sat, 24 Jan 2009 14:48:39 +0000 Dieter <freebsd at sopwith.solgatos.com>
wrote:
>> >> AMD64  FreeBSD 7.0  2 GiB main memory
>> >>
>> >> My console says:
>> >>
>> >> login: swap_pager: indefinite wait buffer: bufobj: 0, blkno: 22, size: 4096
>> >> swap_pager: indefinite wait buffer: bufobj: 0, blkno: 22, size: 4096
>> >> swap_pager: indefinite wait buffer: bufobj: 0, blkno: 22, size: 4096
>> >> swap_pager: indefinite wait buffer: bufobj: 0, blkno: 22, size: 4096
>> >>
>> >> pstat -sk
>> >> Device          1K-blocks     Used    Avail Capacity
>> >> /dev/ad6s10       4590208       96  4590112     0%
>> >>
>> >> Wow, using a whole 96K of swap.  I don't see any disk related
>> >> complaints in dmesg.
>> >>
>> >> Is this something to worry about?
>> >
>> > Yes, the system was *trying* to do swap I/O and timing out while doing 
>> > so.
>
>> Isn't swapspace supposed to be on a 'b' partition?  Are you using swap 
>> on a slice 10?  How is that possible when the i386/amd64 BIOS can't see 
>> more than 4 primary partitions?
>> 
>> Kris, would you mind giving input to this?  How can there be a s10, and 
>> how can you add swapspace to a device that isn't a partition 'b' nor a 
>> file backed swapspace?  Those were the only two ways I thought were 
>> supported for swap.
>> 
>> Dieter, do my questions above sound like a correct interpretation of 
>> your disk setup?
>
>Traditionally swap used the b partition.  But then traditionally, there

     Even then, however, the swapdev line in the kernel config file made
it possible to change it and to add additional partitions of equal size
on other drives.

>weren't MBR style partitions, called "slices" in FreeBSD-land.
>
>I suspect that the computers Unix grew up on (PDP-7, PDP-11, VAX) had
>to boot from the beginning of the disk, so the a partition went there.
>The Alpha continues in this DEC tradition.  I was about to say that
>swap went next for speed, since the machines back then never had enough
>main memory, but those old disks didn't have a variable number of sectors
>on inner vs outer tracks, so the speed would have been the same across
>the platter.  So I'm not sure why swap was next.

     Speed was indeed the reason.  Swapping in UNIX dates further back than
virtual memory support.  Swapping made it possible to have more process
memory allocated than was available in real memory, but it was necessary
to swap out an entire process to swap in a process whose memory was located
at the same addresses as the one forced out.  That didn't necessarily mean
writing all of the first process's memory out, just enough to make room for
the one coming in.  The downside, of course, was that anything *not* written
to the swap area was vulnerable to damage by other processes unless some
sort of storage protection mechanism was available in the hardware a la
System/360 storage protection.  Of course, the same risk applied to any
other processes in memory (i.e., not swapped out already) at the time, too.
     Swap in 3BSD (the first VM UNIX system) and 4BSD was active for another
purpose, too.  Large memory moves/copies on those old machines could actually
be *slower* than writing to a disk and then reading the data back in at the
target memory location. (Yes, speechless horror is the correct reaction
here. :-)
     /usr was typically placed right after the swap area (i.e., partition d).
The idea was that / and /usr would be the most heavily accessed file systems,
so it made sense to have them surrounding the swap area to minimize head
movement distances and delays.  On systems with more than one disk drive
that did a lot of compiling, sorting, editing, or anything else with much
activity in /tmp, it was especially helpful to move /tmp to a separate drive
from / and /usr.  The manual used to give suggested partition configurations
for two- and three-drive systems for optimum speed.
>
>This machine has 2 GiB of main memory and almost never uses the swap
>partition, so I put swap at the slow end of the drive.  Yes I have
>swap on slice 10.  I use NetBSD's fdisk, as it handles more than
>4 slices nicely, unlike FreeBSD's fdisk.  As far as I know, the BIOS

     So NetBSD's fdisk understands logical partitions in an extended
partition?  Cool.  I wish we had it in FreeBSD.  It's really a pain to
have to shut FreeBSD down and boot a standalone program to change the layout
of a disk that has an EP. :-(  At least the FreeBSD kernel has no problem
understanding a disk like that.

>firmware doesn't need to know about swap.  I think the BIOS firmware
>just loads and runs the MBR, which in turn loads and runs the bootstrap
>in the selected slice (or loads and runs the MBR in a different disk if
>you want).
>
>I suppose I could put a BSD disklabel on slice 10 and set it up
>with the whole slice as the b partition.  But as far as I can tell
>FreeBSD is happy with /dev/ad6s10.  As I wrote in my previous

     It should be.

>message I suspect that the pager/swapper is competing for disk i/o.
>I forgot to ask if there is some sysctl or other knob to lengthen
>the timeout.  The real fix is to improve the i/o fairness, but I've
>been asking about this for 2-3 years and not getting anywhere.
>
     BSD UNIX introduced the disksort() routine into its kernel ages ago.
I know it was in 4.2BSD, but it may well have been there long before then.
disksort() was added to satisfy a maximum number of disk I/O requests with
a minimum of head movement and delay.  Basically, it sorts new requests
into queues for each drive such that the arm moves from request to request
in one direction through the disk, and then the next queue started is sorted
into the opposite sequence for the arm to move in the opposite direction.
The result is that the arm moves back and forth from the start to the end
of the disk and then back again, reading and writing as it goes, thus
minimizing the distance traveled for each request handled.  In FreeBSD,
I think there is also some sort of change to the algorithm that tends to
subprioritize or subdivide requests according to the amount of data to be
read/written in each request, but I don't know any of its details.  In
general, disksort() gives pretty good performance.
     I doubt that the current algorithm is the source of your problems, but
if it is, then perhaps moving swap to sit between the two most active file
systems on that drive could help.  You may wish to look carefully at the
disk I/O system in FreeBSD to see whether your idea of "fairness" could be
implemented without running afoul of the existing code structure and also
to get an idea as to whether what you want done would really be likely to
yield any performance improvement.


                                  Scott Bennett, Comm. ASMELG, CFIAG
**********************************************************************
* Internet:       bennett at cs.niu.edu                              *
*--------------------------------------------------------------------*
* "A well regulated and disciplined militia, is at all times a good  *
* objection to the introduction of that bane of all free governments *
* -- a standing army."                                               *
*    -- Gov. John Hancock, New York Journal, 28 January 1790         *
**********************************************************************