vr0: watchdog timeout FreeBSD 6.1-p10 Crashing my backups

perikillo perikillo at gmail.com
Tue Oct 3 13:53:25 PDT 2006


  Hi people, I have read some mails about this problem, and it looks like
everyone affected was running some 5.X branch. I started using FreeBSD 6.1
some months ago. Yesterday I ran the buildworld process, and right now my
box is at FreeBSD 6.1-p10.

  This box runs a Bacula server with this NIC:

vr0: <VIA VT6102 Rhine II 10/100BaseTX> port 0xe400-0xe4ff mem
0xee022000-0xee0220ff at device 18.0 on pci0
vr0: Reserved 0x100 bytes for rid 0x10 type 4 at 0xe400
miibus0: <MII bus> on vr0
vr0: bpf attached
vr0: Ethernet address: 00:01:6c:2c:09:90
vr0: [MPSAFE]

  This NIC is integrated into the motherboard. I used this box with FreeBSD
5.4-pX for almost 1 year, running Bacula 1.38.5 without a problem.

  One full backup takes almost 140 GB of data.

Last week I lost one Full Backup job from one of my biggest servers, running
RH9 with approx. 80 GB of data. Bacula backed up just 35 GB and marked the
job -> Error

26-Sep 00:28 bacula-dir: MBXBDCB.2006-09-25_21.30.00 Fatal error: Network
error with FD during Backup: ERR=Operation timed out
26-Sep 00:28 bacula-dir: MBXBDCB.2006-09-25_21.30.00 Fatal error: No Job
status returned from FD.
26-Sep 00:28 bacula-dir: MBXBDCB.2006-09-25_21.30.00 Error: Bacula
1.38.11(28Jun06): 26-Sep-2006 00:28:48

FD termination status:  Error
SD termination status:  Error
Termination:            *** Backup Error ***

  I have no problem with the client; it is running our ERP software and
there are no complaints there.

This appeared on my FreeBSD console:

vr0: watchdog timeout

  I reset the server, and all the Differential backups have been working
fine. I did the buildworld yesterday and left my Bacula server ready to do a
full backup of all my clients, and whoops...

I lost 2 client jobs:

Client 1:

02-Oct 18:30 bacula-dir: Start Backup JobId 176, Job=PDC.2006-10-02_18.30.00
02-Oct 20:40 bacula-dir: PDC.2006-10-02_18.30.00 Fatal error: Network error
with FD during Backup: ERR=Operation timed out
02-Oct 20:40 bacula-dir: PDC.2006-10-02_18.30.00 Fatal error: No Job status
returned from FD.
02-Oct 20:40 bacula-dir: PDC.2006-10-02_18.30.00 Error: Bacula
1.38.11(28Jun06): 02-Oct-2006 20:40:11
  JobId:                  176
  Job:                    PDC.2006-10-02_18.30.00
  Backup Level:           Full
  Client:                   "PDC" Windows NT 4.0,MVS,NT 4.0.1381
  FileSet:                "PDC-FS" 2006-08-21 18:04:12
  Pool:                   "FullTape"
  Storage:                "LTO-1"
  Scheduled time:         02-Oct-2006 18:30:00
  Start time:             02-Oct-2006 18:30:06
  End time:               02-Oct-2006 20:40:11
  Elapsed time:           2 hours 10 mins 5 secs
  Priority:               11
  FD Files Written:       0
  SD Files Written:       0
  FD Bytes Written:       0 (0 B)
  SD Bytes Written:       0 (0 B)
  Rate:                   0.0 KB/s
  Software Compression:   None
  Volume name(s):         FullTape-0004
  Volume Session Id:      2
  Volume Session Time:    1159832414
  Last Volume Bytes:      38,857,830,949 (38.85 GB)
  Non-fatal FD errors:    0
  SD Errors:              0
  FD termination status:  Error
  SD termination status:  Error
  Termination:            *** Backup Error ***

Client 2:

02-Oct 21:30 bacula-dir: Start Backup JobId 178, Job=
MBXBDCB.2006-10-02_21.30.00
02-Oct 21:31 bacula-dir: MBXBDCB.2006-10-02_21.30.00 Warning: bnet.c:853
Could not connect to File daemon on 192.168.2.9:9102. ERR=Host is down
Retrying ...
02-Oct 21:37 bacula-dir: MBXBDCB.2006-10-02_21.30.00 Warning: bnet.c:853
Could not connect to File daemon on 192.168.2.9:9102. ERR=Host is down
Retrying ...
02-Oct 21:44 bacula-dir: MBXBDCB.2006-10-02_21.30.00 Warning: bnet.c:853
Could not connect to File daemon on 192.168.2.9:9102. ERR=Host is down
Retrying ...
02-Oct 21:51 bacula-dir: MBXBDCB.2006-10-02_21.30.00 Warning: bnet.c:853
Could not connect to File daemon on 192.168.2.9:9102. ERR=Host is down
Retrying ...
02-Oct 21:58 bacula-dir: MBXBDCB.2006-10-02_21.30.00 Warning: bnet.c:853
Could not connect to File daemon on 192.168.2.9:9102. ERR=Host is down
Retrying ...
02-Oct 22:04 bacula-dir: MBXBDCB.2006-10-02_21.30.00 Warning: bnet.c:853
Could not connect to File daemon on 192.168.2.9:9102. ERR=Host is down
Retrying ...
02-Oct 22:10 bacula-dir: MBXBDCB.2006-10-02_21.30.00 Fatal error: bnet.c:859
Unable to connect to File daemon on 192.168.2.9:9102. ERR=Host is down
02-Oct 22:10 bacula-dir: MBXBDCB.2006-10-02_21.30.00 Error: Bacula
1.38.11(28Jun06): 02-Oct-2006 22:10:03
  JobId:                  178
  Job:                    MBXBDCB.2006-10-02_21.30.00
  Backup Level:           Full
  Client:                 "MBXBDCB" i686-pc-linux-gnu,redhat,9
  FileSet:                "MBXBDCB-FS" 2006-08-21 23:00:02
  Pool:                   "FullTape"
  Storage:                "LTO-1"
  Scheduled time:         02-Oct-2006 21:30:00
  Start time:             02-Oct-2006 21:30:02
  End time:               02-Oct-2006 22:10:03
  Elapsed time:           40 mins 1 sec
  Priority:               13
  FD Files Written:       0
  SD Files Written:       0
  FD Bytes Written:       0 (0 B)
  SD Bytes Written:       0 (0 B)
  Rate:                   0.0 KB/s
  Software Compression:   None
  Volume name(s):
  Volume Session Id:      4
  Volume Session Time:    1159832414
  Last Volume Bytes:      38,857,830,949 (38.85 GB)
  Non-fatal FD errors:    0
  SD Errors:              0
  FD termination status:
  SD termination status:  Waiting on FD
  Termination:            *** Backup Error ***

My console again:

vr0: watchdog timeout

But my catalog backup completed successfully.

03-Oct 03:00 bacula-dir: Start Backup JobId 179, Job=
BackupCatalog.2006-10-03_03.00.00
03-Oct 03:03 bacula-dir: Bacula 1.38.11 (28Jun06): 03-Oct-2006 03:03:00
  JobId:                  179
  Job:                    BackupCatalog.2006-10-03_03.00.00
  Backup Level:           Full
  Client:                 "BACULA" i386-portbld-freebsd6.1,freebsd,
6.1-RELEASE-p3
  FileSet:                "CATALOG-FS" 2006-08-22 05:00:02
  Pool:                   "FullTape"
  Storage:                "LTO-1"
  Scheduled time:         03-Oct-2006 03:00:00
  Start time:             03-Oct-2006 03:00:50
  End time:               03-Oct-2006 03:03:00
  Elapsed time:           2 mins 10 secs
  Priority:               14
  FD Files Written:       7,646
  SD Files Written:       7,646
  FD Bytes Written:       360,432,688 (360.4 MB)
  SD Bytes Written:       361,320,457 (361.3 MB)
  Rate:                   2772.6 KB/s
  Software Compression:   None
  Volume name(s):         FullTape-0004
  Volume Session Id:      5
  Volume Session Time:    1159832414
  Last Volume Bytes:      39,219,629,264 (39.21 GB)
  Non-fatal FD errors:    0
  SD Errors:              0
  FD termination status:  OK
  SD termination status:  OK
  Termination:            Backup OK
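
To spot failed jobs without reading each report by hand, the summary block
that Bacula prints at the end of a job could be parsed with a few lines of
Python. This is only a minimal sketch based on the report layout pasted in
this mail; the function names (parse_job_summary, job_failed, bytes_written)
are made up for illustration and are not part of Bacula.

```python
# Sketch: pull "Field: value" pairs out of a Bacula job report.
# Assumes the indented summary layout shown in this mail (Bacula 1.38.11).

SAMPLE = """\
02-Oct 22:10 bacula-dir: MBXBDCB.2006-10-02_21.30.00 Error: Bacula
  JobId:                  178
  Backup Level:           Full
  FD Bytes Written:       0 (0 B)
  Volume name(s):
  Termination:            *** Backup Error ***
"""


def parse_job_summary(text):
    """Return a dict of the indented 'Field: value' lines in a report."""
    fields = {}
    for line in text.splitlines():
        # Summary fields are indented; log lines start at column 0.
        if not line.startswith(" ") or ":" not in line:
            continue
        key, _, value = line.partition(":")  # split on the first colon only
        fields[key.strip()] = value.strip()
    return fields


def job_failed(fields):
    """A job counts as failed unless Termination is exactly 'Backup OK'."""
    return fields.get("Termination") != "Backup OK"


def bytes_written(fields, key="FD Bytes Written"):
    """Convert a value such as '360,432,688 (360.4 MB)' to an integer."""
    raw = fields.get(key, "0").split(" ")[0]
    return int(raw.replace(",", "") or 0)


fields = parse_job_summary(SAMPLE)
print(fields["JobId"], job_failed(fields), bytes_written(fields))
```

Splitting on the first colon only keeps timestamps such as
"02-Oct-2006 18:30:06" intact in the value.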

I wasn't at that office. I noticed this in the morning, because when I
tried to access that server from the other building with PuTTY, I couldn't
connect at first. Then I thought "it's happened again :-("... I called my
friend there to unplug and replug the cable, and just with that I was able
to connect to that server.

   It looks like this NIC is having problems under high workload. There are
2 things I can do here:

1. Change the cable and try again.
2. Change the NIC and try again.

   What else can I do?

   I really hope someone can fix this problem. Thanks to all for your time.
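
While testing a new cable or NIC, it may help to count how often the
watchdog message shows up, so the two cases can be compared. The sketch
below scans log lines for the "vr0: watchdog timeout" message seen on my
console. The syslog-style line in the sample is an assumption about how the
message would look in a system log, and count_watchdog_timeouts is a
hypothetical helper name.

```python
# Sketch: count "<nic>: watchdog timeout" messages per interface.
import re
from collections import Counter

# Matches e.g. "vr0: watchdog timeout", with or without a syslog prefix.
WATCHDOG = re.compile(r"\b([a-z]+\d+): watchdog timeout")


def count_watchdog_timeouts(lines):
    """Return a Counter mapping interface name to number of timeouts."""
    counts = Counter()
    for line in lines:
        match = WATCHDOG.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Sample input; the first line's timestamp/host prefix is illustrative only.
SAMPLE_LINES = [
    "Oct  2 20:40:11 bacula kernel: vr0: watchdog timeout",
    "vr0: bpf attached",
    "vr0: watchdog timeout",
]

for nic, n in count_watchdog_timeouts(SAMPLE_LINES).items():
    print(f"{nic}: {n} watchdog timeout(s)")
```

The function accepts any iterable of lines, so an open log file or saved
dmesg output could be passed in place of the sample list.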

Part of my dmesg output:

Copyright (c) 1992-2006 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
        The Regents of the University of California. All rights reserved.
FreeBSD 6.1-RELEASE-p10 #5: Mon Oct  2 13:26:52 PDT 2006
    root at bacula.MBX.local:/usr/obj/usr/src/sys/BACULA
Preloaded elf kernel "/boot/kernel/kernel" at 0xc0a14000.
Preloaded elf module "/boot/kernel/acpi.ko" at 0xc0a14188.
Table 'FACP' at 0x1bff3040
Table 'APIC' at 0x1bff7dc0
MADT: Found table at 0x1bff7dc0
MP Configuration Table version 1.1 found at 0xc00f1400
APIC: Using the MADT enumerator.
MADT: Found CPU APIC ID 0 ACPI ID 0: enabled
ACPI APIC Table: <KM400  AWRDACPI>
Calibrating clock(s) ... i8254 clock: 1193181 Hz
CLK_USE_I8254_CALIBRATION not specified - using default frequency
Timecounter "i8254" frequency 1193182 Hz quality 0
Calibrating TSC clock ... TSC clock: 1600072446 Hz
CPU: AMD Duron(tm) processor (1600.07-MHz 686-class CPU)
  Origin = "AuthenticAMD"  Id = 0x681  Stepping = 1

Features=0x383fbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,MMX,FXSR,SSE>
  AMD Features=0xc0400800<SYSCALL,MMX+,3DNow+,3DNow>
Data TLB: 32 entries, fully associative
Instruction TLB: 16 entries, fully associative
L1 data cache: 64 kbytes, 64 bytes/line, 1 lines/tag, 2-way associative
L1 instruction cache: 64 kbytes, 64 bytes/line, 1 lines/tag, 2-way
associative
L2 internal cache: 64 kbytes, 64 bytes/line, 1 lines/tag, 8-way associative
real memory  = 469696512 (447 MB)
Physical memory chunk(s):
0x0000000000001000 - 0x000000000009efff, 647168 bytes (158 pages)
0x0000000000100000 - 0x00000000003fffff, 3145728 bytes (768 pages)
0x0000000000c25000 - 0x000000001b7d7fff, 448475136 bytes (109491 pages)
avail memory = 450490368 (429 MB)
bios32: Found BIOS32 Service Directory header at 0xc00fac70
bios32: Entry = 0xfb0f0 (c00fb0f0)  Rev = 0  Len = 1
pcibios: PCI BIOS entry at 0xf0000+0xb160
pnpbios: Found PnP BIOS data at 0xc00fbc20
pnpbios: Entry = f0000:bc50  Rev = 1.0
Other BIOS signatures found:
APIC: CPU 0 has ACPI ID 0
MADT: Found IO APIC ID 2, Interrupt 0 at 0xfec00000
ioapic0: Routing external 8259A's -> intpin 0

Greetings.

