smbfs crashes since approx. 10.1-RELEASE

Wed Oct 7 00:09:13 UTC 2015

On Monday, October 05, 2015 06:16:54 PM Rick Macklem wrote:
> Christian Kratzer wrote:
> > Hi,
> > 
> > I run a regular rsync job that runs from cron and copies stuff that gets
> > created on a Windows smbfs share.
> > 
> > Starting with about 10.1-RELEASE, the VM became unstable and started panicking.
> > 
> > I have narrowed the issue down to the aforementioned rsync job.
> > 
> > When I move the job to a different VM, the other VM starts crashing and
> > the VM without the job becomes stable again.
> > 
> > I have panics and crashinfos stored in /var/crash if anybody is interested:
> > 
> >      root@noc2:/var/crash # uname -a
> >      FreeBSD noc2.cksoft.de 10.2-RELEASE FreeBSD 10.2-RELEASE #0 r286666: Wed
> >      Aug 12 15:26:37 UTC 2015
> >      root@releng1.nyi.freebsd.org:/usr/obj/usr/src/sys/GENERIC  amd64
> >      root@noc2:/var/crash # freebsd-version -u
> >      10.2-RELEASE-p5
> >      root@noc2:/var/crash # freebsd-version -k
> >      10.2-RELEASE
> >      root@noc2:/var/crash #
> > 
> > This is what I have in /var/crash/core.txt.0
> > 
> >      Fatal trap 12: page fault while in kernel mode
> >      cpuid = 0; apic id = 00
> >      fault virtual address   = 0x20
> >      fault code              = supervisor read data, page not present
> >      instruction pointer     = 0x20:0xffffffff80996c7c
> >      stack pointer           = 0x28:0xfffffe003d6c0ac0
> >      frame pointer           = 0x28:0xfffffe003d6c0af0
> >      code segment            = base 0x0, limit 0xfffff, type 0x1b
> >                              = DPL 0, pres 1, long 1, def32 0, gran 1
> >      processor eflags        = resume, IOPL = 0
> >      current process         = 1349 (smbiod10)
> >      trap number             = 12
> >      panic: page fault
> >      cpuid = 0
> >      KDB: stack backtrace:
> >      #0 0xffffffff80984e30 at kdb_backtrace+0x60
> >      #1 0xffffffff809489e6 at vpanic+0x126
> >      #2 0xffffffff809488b3 at panic+0x43
> >      #3 0xffffffff80d4aadb at trap_fatal+0x36b
> >      #4 0xffffffff80d4addd at trap_pfault+0x2ed
> >      #5 0xffffffff80d4a47a at trap+0x47a
> >      #6 0xffffffff80d307f2 at calltrap+0x8
> >      #7 0xffffffff8092ebe0 at __mtx_unlock_sleep+0x60
> >      #8 0xffffffff8092eb69 at __mtx_unlock_flags+0x69
> >      #9 0xffffffff81a1b724 at smb_iod_thread+0xb4
> >      #10 0xffffffff8091244a at fork_exit+0x9a
> >      #11 0xffffffff80d30d2e at fork_trampoline+0xe
> >      Uptime: 2h43m55s
> >      Dumping 103 out of 999 MB: (CTRL-C to abort)
> >      ..16%..31%..47%..62%..78%..93%
> > 
> This crash is occurring when doing an mtx_unlock(&Giant). Unfortunately, I'm not
> conversant w.r.t. this code. I've cc'd jhb@ in case he has some insight.
> If you don't get any responses, I'd suggest reposting to freebsd-current@ with
> "crashes in mtx_unlock(&Giant)" in the subject line.
> 
> Btw John, the code does tsleep() in a loop before the mtx_unlock(&Giant). I do
> remember that was once allowed, but am not sure if it still is (i.e., a tsleep()
> call while holding Giant)?
> 
> Hopefully someone who knows what is special about Giant that might cause this will
> respond.
> 
> Good luck with it, rick

tsleep() with Giant held is still allowed.  However, this sort of panic usually means
you unlocked a mutex you didn't hold (on a kernel without INVARIANTS; with INVARIANTS
enabled you would have hit an assertion failure earlier).

I don't see anything obviously wrong in smb_iod_thread(), however.

If you have the crashdump, can you please run this in kgdb:

frame 9
p (struct mtx *)c
p *(struct mtx *)c

-- 
John Baldwin