dump hangs on 7.1
Mel Flynn
mel.flynn+fbsd.questions at mailing.thruhere.net
Sun Jul 12 23:56:19 UTC 2009
On Sunday 12 July 2009 13:20:49 Len Conrad wrote:
> At 04:04 PM 7/12/2009, you wrote:
> >On Sunday 12 July 2009 11:03:00 Len Conrad wrote:
> >> >On Friday 10 July 2009 08:29:01 Len Conrad wrote:
> >> >> FreeBSD 7.1-RELEASE #0: Thu Jan 1 14:37:25 UTC 2009
> >> >> root at logan.cse.buffalo.edu:/usr/obj/usr/src/sys/GENERIC i386
> >> >>
> >> >> CPU: Intel(R) Xeon(R) CPU E5420 @ 2.50GHz (2496.26-MHz
> >> >> 686-class CPU) Origin = "GenuineIntel" Id = 0x1067a Stepping = 10
> >> >> AMD Features=0x20100000<NX,LM>
> >> >> AMD Features2=0x1<LAHF>
> >> >> Cores per package: 4
> >> >> real memory = 3484745728 (3323 MB)
> >> >> avail memory = 3405537280 (3247 MB)
> >> >> ACPI APIC Table: <DELL PE_SC3 >
> >> >> FreeBSD/SMP: Multiprocessor System Detected: 4 CPUs
> >> >> cpu0 (BSP): APIC ID: 0
> >> >> cpu1 (AP): APIC ID: 1
> >> >> cpu2 (AP): APIC ID: 2
> >> >> cpu3 (AP): APIC ID: 3
> >> >>
> >> >>
> > >> >> /sbin/dump -0uanL -f - / | ssh dump_images at xxx.net dd of=/var/ftp/dump_images/mx1-root-test
> >> >>
> > >> >> dump has completed only once. Several other runs all got under
> > >> >> way: the target file is created and grows until the hang.
> >> >>
> > >> >> CTRL-C gets back to the shell, e.g.:
> >> >>
> >> >> DUMP: Date of this level 0 dump: Fri Jul 10 10:25:33 2009
> >> >> DUMP: Date of last level 0 dump: the epoch
> >> >> DUMP: Dumping snapshot of /dev/da0s1d (/usr) to standard output
> >> >> DUMP: mapping (Pass I) [regular files]
> >> >> DUMP: mapping (Pass II) [directories]
> >> >> DUMP: estimated 1713942 tape blocks.
> >> >> DUMP: dumping (Pass III) [directories]
> >> >> DUMP: dumping (Pass IV) [regular files]
> >> >> ^C DUMP: Interrupt received.
> > >> >> DUMP: Do you want to abort dump?: ("yes" or "no") Killed by signal 2.
> > >> >> DUMP: Broken pipe
> >> >> DUMP: The ENTIRE dump is aborted.
> >> >>
> > >> >> It always hangs in Pass IV.
> >> >
> > >> >What's the output of ps -auwwx | grep dump at the time of the dump?
> >>
> >> when the dump hangs:
> >>
> >> ps auxww | grep dump
> >>
> >> root 61360 0.0 0.0 3128 1168 p0 I+ 1:47PM 0:00.06
> >> /sbin/dump -0uanL -f - / (dump)
> >>
> >> root 61361 0.0 0.1 5560 2768 p0 I+ 1:47PM 0:03.65 ssh
> >> xxx at xxx.net dd of=/var/ftp/dump_images/mx1-root-test
> >>
> >> root 61364 0.0 0.0 3128 1528 p0 I+ 1:47PM 0:00.36 dump:
> >> /dev/da0s1a: pass 4: 92.66% done, finished in 0:00 at Sun Jul 12
> >> 13:47:52 2009 (dump)
> >
> >procstat -k 61364 please?
>
> I ran it again (different PID):
>
> procstat -k 67765
> PID TID COMM TDNAME KSTACK
> 67765 100159 dump - mi_switch sleepq_switch
> sleepq_catch_signals sleepq_wait_sig _sleep sbwait soreceive_generic
> soreceive soo_read dofileread kern_readv read syscall Xint0x80_syscall
It looks like it's waiting for ssh/dd to report back. Does the same happen
when you dump to a local file (on a different partition, obviously)? If it
doesn't, that rules out inter-process communication within dump itself.
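A minimal sketch of that local test, assuming a POSIX sh on the FreeBSD box.
It keeps the original flags so only the destination changes. The output path
is hypothetical: it sits under /usr because the log above shows /usr on
/dev/da0s1d, a different partition from / (/dev/da0s1a). The DRY_RUN switch
is a convention of this sketch, not a dump(8) option; by default the script
only prints the command.

```sh
#!/bin/sh
# Sketch: level 0 dump of / into a local file, with ssh/dd out of the path.
# OUTFILE is a hypothetical path on /usr (/dev/da0s1d in the log above),
# i.e. not on / itself. Its directory must already exist.
OUTFILE=${OUTFILE:-/usr/dumptest/mx1-root-test.dump}

# Print the command by default; DRY_RUN=0 (as root) actually runs it.
# DRY_RUN belongs to this sketch only, dump(8) knows nothing about it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Same flags as the failing ssh pipeline, so only the destination differs.
# A stall in Pass IV here would point at dump itself, not at ssh/dd.
run /sbin/dump -0uanL -f "$OUTFILE" /
```

Note that -u still records the run in /etc/dumpdates, exactly as the original
command does.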
FYI, I run this daily through periodic(8) on a few 7.1-STABLE machines and
-current, although I compress (with gzip, or bzip2 on faster CPUs) before
transfer. The only other difference is that I don't pass the -n flag to dump.
Dropping it is worth a try, though I doubt dump is stuck in soreceive because
it can't notify a human in the operator group.
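A sketch of that variant, for comparison: the same dump of /, without -n, and
compressed before it goes over ssh. The user, host, and remote path come from
the original command (the host name is elided there and stays elided); the
.gz suffix and the DRY_RUN switch are assumptions of this sketch. By default
it only prints the pipeline.

```sh
#!/bin/sh
# Sketch: dump / without -n, gzip the stream locally, write it remotely via dd.
# REMOTE and REMOTE_FILE follow the original command; host elided as "xxx.net".
# The .gz suffix is an assumption.
REMOTE=${REMOTE:-dump_images@xxx.net}
REMOTE_FILE=${REMOTE_FILE:-/var/ftp/dump_images/mx1-root-test.gz}

# Text of the pipeline that transfer() runs; keep the two in sync.
describe() {
    echo "/sbin/dump -0uaL -f - / | gzip -c | ssh $REMOTE dd of=$REMOTE_FILE"
}

# gzip -c compresses stdin to stdout; on a faster CPU, bzip2 -c can replace it.
transfer() {
    /sbin/dump -0uaL -f - / | gzip -c | ssh "$REMOTE" dd of="$REMOTE_FILE"
}

# Print by default; DRY_RUN=0 (as root) actually runs the transfer.
if [ "${DRY_RUN:-1}" = "1" ]; then
    describe
else
    transfer
fi
```

Compressing on the sending side also means less data crosses the ssh
connection, which changes the timing of the pipe dump is writing into.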
If you're comfortable doing so, you could grab a 7.2-RELEASE livefs CD to see
if this issue persists using the dump tools from there, though I don't know of
any particular fixes in this area.
> >Is the percentage always the same for the same disk?
>
> no, it varies widely.
>
> >If you kill dd on the other side, does dump notice it?
>
> yes, when I kill dd on the target, dump shows:
>
> DUMP: dumping (Pass IV) [regular files]
> Terminated
> DUMP: Broken pipe
> DUMP: The ENTIRE dump is aborted.
--
Mel
More information about the freebsd-questions
mailing list