Reclaiming "dirty buffers" after seeing "fsync: giving up on dirty..." / Unplugging USB while copy in progress
    Konstantin Belousov 
    kib at freebsd.org
       
    Fri Aug 16 19:00:00 UTC 2019
    
    
  
On Fri, Aug 16, 2019 at 10:16:05AM -0700, Shrikanth Kamath wrote:
> How do “lingering” dirty buffers get reclaimed? In the function
> vop_stdfsync there is logic to retry but eventually fail after “maxretry”
> and print “fsync: giving up on dirty (error “ while returning the error. In
> a scenario where a USB stick is plugged in and a large file (> 1.5G) is
> being copied to it from the host filesystem, the USB device is abruptly
> removed. I see the fsync function retrying a number of times before
> returning with the error below:
> 
> fsync: giving up on dirty 0xfffff8058091d1d8: tag devfs, type VCHR
> 
>     usecount 1, writecount 0, refcount 1070 mountedhere 0xfffff805808af800
> 
>     flags (VI_DOOMED|VI_ACTIVE)
> 
>     v_object 0xfffff807a6efe948 ref 0 pages 1069 cleanbuf 893 dirtybuf 174
> 
>     lock type devfs: EXCL by thread 0xfffff8009aebb560 (pid 6463, chassisd,
> tid 100270)
> 
> What is eventually happening is there are other processes that start
> appearing to be stuck waiting in “flswai” state (including the copy
> operation to the USB stick).
> 
> # ps jaux -o mwchan -o command | grep flswai
>  6423         1    6423  6423    0 Ds    -     0:02.35 /usr/sbin/eventd
> 0.0  0.0  744768   12916 06:22   flswai   /usr/sbin/eventd -r -s -A
>  6463   6428    6427  6427    0 D     -     8:25.69 /usr/sbin/chassi   0.0
>  0.1  862940   56472 06:22   flswai   /usr/sbin/chassisd -N
> 19753 19195 19753  6453    1 D+   u0     0:01.08 cp junos-vmhost-   0.0
>  0.0    8164    2968 12:13   flswai   cp
> junos-vmhost-install-mx-x86-64-19.3I-14062-TB-130172-_cd-builder.tgz /mnt/
> 
> Looking at the code, this seems to be coming from the “bwillwrite” function
> (sys/kern/vfs_bio.c) where it explains it will block prior to “…locking of
> any vnodes we attempt to avoid the situation where a locked vnode prevents
> the various system daemons from flushing related buffers…” How do the
> dirty buffers in this scenario get reclaimed?
> 
> The dmesg log is from a Juniper device running stable/11 (closer to
> 11.1ish) based Junos.
> 
> Jul 23 12:06:31.740  da0 at umass-sim0 bus 0 scbus3 target 0 lun 0
> 
> Jul 23 12:06:31.740  da0: <USBFlash USBFlashDrive 0100> s/n
> AA04012700046751 detached
> Jul 23 12:06:31.740  g_vfs_done():da0p1[WRITE(offset=272711680,
> length=65536)]error = 6
> ...
> 
> Jul 23 12:06:31.943   g_vfs_done():da0p1[WRITE(offset=277626880,
> length=65536)]error = 6
> ...
> 
> Jul 23 12:06:31.992   g_vfs_done():da0p1[WRITE(offset=281624576,
> length=65536)]error = 6
> ...
> 
> Jul 23 12:06:32.144   g_vfs_done():da0p1[WRITE(offset=285687808,
> length=65536)]error = 6
> Jul 23 12:06:32.144  (da0:umass-sim0:0:0:0): Periph destroyed
> 
> Jul 23 12:06:32.144  umass0: detached
> 
> Jul 23 12:06:36.672  fsync: giving up on dirty 0xfffff8058091d1d8: tag
> devfs, type VCHR
> Jul 23 12:06:36.672      usecount 1, writecount 0, refcount 1070
> mountedhere 0xfffff805808af800
> Jul 23 12:06:36.672      flags (VI_DOOMED|VI_ACTIVE)
> 
> Jul 23 12:06:36.672      v_object 0xfffff807a6efe948 ref 0 pages 1069
> cleanbuf 893 dirtybuf 174

What I describe below applies to HEAD and might be absent in 11.

After the I/O finishes, whatever the result, brelse(9) is eventually
called on the buffer.  There, if the I/O finished with an error and the
error is ENXIO, which is taken to mean that the device went away, the
buffer is marked B_INVAL and truncated.  The normal flow in brelse()
then returns the buffer to the freelist.

A large unsolved issue is that if the buffer was used by UFS with
softupdates and there are unfinished dependencies hanging from the
buffer, the system detects that and panics.  You should not use SU on
a USB stick anyway.
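
For illustration only, the decision described above can be sketched as a
small userspace model.  This is not the kernel's brelse(); the struct,
the enum and the name model_brelse() are hypothetical stand-ins, and the
"redirty on any other error" branch is an assumption about the general
failed-write path rather than something stated in this message.  Note
that the "error = 6" in the quoted g_vfs_done() lines is ENXIO.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical outcomes of releasing a buffer after a write attempt. */
enum model_outcome {
	MODEL_CLEAN,     /* write succeeded, buffer released normally */
	MODEL_REDIRTIED, /* write failed, buffer stays dirty for a retry */
	MODEL_FREED,     /* device gone: invalidated, truncated, freed */
	MODEL_PANIC      /* device gone with unfinished SU dependencies */
};

/* Simplified stand-in for a buffer; not the kernel's struct buf. */
struct model_buf {
	int    error;   /* 0 on success, otherwise an errno value */
	bool   invalid; /* models B_INVAL */
	bool   dirty;   /* still holds data that was never written */
	size_t size;    /* bytes of data held by the buffer */
	int    ndeps;   /* outstanding soft-updates dependencies */
};

static enum model_outcome
model_brelse(struct model_buf *bp)
{
	assert(bp != NULL);

	if (bp->error == 0) {
		bp->dirty = false;
		return (MODEL_CLEAN);
	}
	if (bp->error != ENXIO) {
		/* Assumption: other write errors keep the buffer dirty. */
		bp->dirty = true;
		return (MODEL_REDIRTIED);
	}
	/* ENXIO: the device is believed to have gone away. */
	if (bp->ndeps > 0)
		return (MODEL_PANIC); /* the unsolved softupdates case */
	bp->invalid = true;  /* mark as B_INVAL */
	bp->dirty = false;   /* the data is discarded, not retried */
	bp->size = 0;        /* truncate */
	return (MODEL_FREED);
}
```

In this model a dirty buffer that fails with ENXIO and has no
dependencies comes back invalid, empty and no longer dirty, which is the
reclaim path described above; any other error leaves it dirty.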
    
    
More information about the freebsd-hackers
mailing list