dumps freeze when invoked by amanda's 'sendbackup'

John Fox readbsd at mind.net
Thu Feb 26 11:13:16 PST 2004


Hello,

I've found some strange behavior on one of our FreeBSD 2.2.2 machines,
and am hoping that someone here may be able to shed some light on it
for us.

As mentioned, the machine is running FreeBSD 2.2.2 and AMANDA 2.4.0.

A while back, we noticed that dumps of the machine's '/usr' partition
seemed to freeze as soon as they were started. The 'dump' processes
appeared in 'ps' output, but they stayed there all day long (our backups
typically finish by around 10:00 AM at the latest).  At the same time,
dmesg showed us "hard errors" reading from /dev/sd0s1g (the partition
holding "/usr"). We eventually killed the dumps.

As a test, we then tried a manual dump.  I was able to successfully
dump that same file system over the network to a drive on another
machine.

It seemed weird that a manual dump went fine while the dumps
spawned by 'sendbackup' never got going at all.  However, as I mentioned,
we'd seen that drive error, so we removed all of that machine's sd0
partitions from AMANDA's disklist (the system has two other drives, and
backups went fine for them) until we could get the drive replaced,
which we did last night.

Replacing the drive went just fine, and once we'd verified everything
was up and running properly, I edited the disklist file and re-enabled
backups of the machine's '/usr'.

I got in this morning and found that again, the dumps are frozen.  

'dmesg' shows nothing except the boot-up messages.

I'm rather frustrated at this point, trying to understand how this could
be happening.

I have checked permissions on the device files: 'bin', the user
AMANDA runs as on that machine, is a member of the operator group,
which has read access to the device files, so it doesn't seem to be a
permissions problem.

I doubt it's a drive problem: would two different drives from two
different manufacturers really show the same behavior?  I suppose it's
possible, but it seems unlikely.

I should also mention that all drives in the system are 50-pin SCSI-2
drives, under the control of an Adaptec 2940 controller card.

So if it's not permissions, and not the drive (although I realize that I
haven't entirely ruled out either of these), might it be the
controller?  But if it's the controller, why are there no problems with
the other drives in the system?

Any thoughts would be most welcome.

-John
--
+---------------------------------------------------------------------------+
| John Fox <jjf @ mind.net>   |    System Administrator   | InfoStructure   |
+---------------------------------------------------------------------------+
| I used to trust the media to tell me the truth, tell us the truth         |
| But now I've seen the payoffs everywhere I look                           |
| Who can you trust when everyone's a crook?                                |
|             -- Queensryche, "Revolution Calling"                          |
+---------------------------------------------------------------------------+


More information about the freebsd-questions mailing list