Random truncated files on USB hard disk with timeouts; how to debug?

Arrigo Marchiori ardovm at yahoo.it
Wed Oct 19 13:12:52 UTC 2016


Hello Poul-Henning,

On Wed, Oct 19, 2016 at 11:26:22AM +0000, Poul-Henning Kamp wrote:

> --------
> In message <20161019080005.GD93031 at nuvolo>, Arrigo Marchiori writes:
> 
> >> If the drive has bad power supply, that may not happen.
> >
> >Yes, I understand. But, forgive me for insisting: there is an
> >inconsistency that is _at filesystem level_ and _temporary_, and this
> >really puzzles me.
> 
> Because the drive returns wrong data every so often and when
> power is better returns correct data ?
> 
> End-to-End arguments in system design applies here:
> 
> Either you trust your drive, or you check everything it tells you
> (ie: RAID with parity, ZFS or similar).

OK, but I cannot understand why read() simply returns zero bytes. If
``bad'' data were received from a USB read operation, I would expect
it to make no sense to the kernel, not to show up as an empty file.

While fiddling with a funny file, I found that read(2) and mmap(2)
behave differently. cat(1), which uses read(2), shows an empty file,
while cp(1), which uses mmap(2), is able to read its contents. The
file was in fact /usr/src/usr.bin/clang/clang/clang.1, the source of
the clang(1) manual page. Renaming it with mv(1) does not alter the
``readability'' of the file.

# mv clang.1 a
# truss cat a
[snip]
openat(AT_FDCWD,"a",O_RDONLY,00)                 = 3 (0x3)
fstat(1,{ mode=crw--w---- ,inode=146,size=0,blksize=4096 }) = 0 (0x0)
__sysctl(0x7fffffffe5e0,0x2,0x7fffffffe5c4,0x7fffffffe5c8,0x0,0x0) = 0 (0x0)
__sysctl(0x7fffffffe5e0,0x2,0x7fffffffe5c4,0x7fffffffe5c8,0x0,0x0) = 0 (0x0)
mmap(0x0,2097152,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_ANON,-1,0x0) = 34374418432 (0x800e00000)
read(3,0x800e16000,4096)                         = 0 (0x0)
close(3)                                         = 0 (0x0)
[snip]


# truss cp a b
[snip]
stat("b",0x7fffffffe9d8)                         ERR#2 'No such file or directory'
lstat("a",{ mode=-rw-r--r-- ,inode=6510202,size=16993,blksize=32768 }) = 0 (0x0)
umask(0x1ff)                                     = 18 (0x12)
umask(0x12)                                      = 511 (0x1ff)
mmap(0x0,2097152,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_ANON,-1,0x0) = 34374418432 (0x800e00000)
fstatat(AT_FDCWD,"a",{ mode=-rw-r--r-- ,inode=6510202,size=16993,blksize=32768 },0x0) = 0 (0x0)
stat("b",0x7fffffffea50)                         ERR#2 'No such file or directory'
openat(AT_FDCWD,"a",O_RDONLY,00)                 = 3 (0x3)
openat(AT_FDCWD,"b",O_WRONLY|O_CREAT|O_TRUNC,0100644) = 4 (0x4)
mmap(0x0,16993,PROT_READ,MAP_SHARED,3,0x0)       = 34366304256 (0x800643000)
write(4,".\\" $FreeBSD: stable/11/usr.bin"...,16993) = 16993 (0x4261)
munmap(0x800643000,16993)                        = 0 (0x0)
close(4)                                         = 0 (0x0)
close(3)                                         = 0 (0x0)
[snip]


Please also consider that these commands are repeatable (on the same
file): cat always sees an empty file, cp always succeeds.

# cp a c
# cat a
# cat c
[data]

I think this also narrows the problem down to read operations: the
file was installed successfully by yesterday's buildworld, and only
today did it start to behave ``funny''.
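
To check other suspect files without running cat(1) and cp(1) by hand,
the two read paths can be compared directly. The following is only a
minimal sketch in Python: the function name compare_read_and_mmap is
made up for illustration, and the 4096-byte chunk size simply mirrors
the read() call in the cat trace above.

```python
import mmap
import os
import sys


def compare_read_and_mmap(path):
    """Return (stat_size, bytes_via_read, bytes_via_mmap) for one file.

    Hypothetical helper: it reads the same file descriptor once with
    read(2) and once through a shared read-only mapping.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        size = os.fstat(fd).st_size

        # Path 1: plain read(2) calls until EOF, as cat(1) does.
        chunks = []
        while True:
            chunk = os.read(fd, 4096)
            if not chunk:
                break
            chunks.append(chunk)
        via_read = b"".join(chunks)

        # Path 2: a read-only shared mapping, as cp(1) uses in the trace.
        # A zero-length file cannot be mapped, so skip it.
        via_mmap = b""
        if size > 0:
            with mmap.mmap(fd, size, access=mmap.ACCESS_READ) as mapping:
                via_mmap = mapping[:]
    finally:
        os.close(fd)
    return size, via_read, via_mmap


if __name__ == "__main__":
    for name in sys.argv[1:]:
        size, via_read, via_mmap = compare_read_and_mmap(name)
        same = len(via_read) == size and via_read == via_mmap
        verdict = "OK" if same else "MISMATCH"
        print(f"{name}: stat={size} read={len(via_read)} "
              f"mmap={len(via_mmap)} {verdict}")
```

On a healthy file all three sizes should agree. If the behaviour
matches the traces above, a file like ``a'' would presumably be
reported as a MISMATCH, with read=0 while stat and mmap show 16993.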

Best regards,
-- 
rigo

http://rigo.altervista.org


More information about the freebsd-fs mailing list