Problems with gjournal, or something else?
Artem Kuchin
matrix at itlegion.ru
Mon Oct 29 18:42:54 PDT 2007
I am experiencing a very weird problem with a filesystem,
and it seems to be related to gjournal.
It is FreeBSD 7-BETA1.
RAID controller: 3ware 7500x
Device driver: twe
SMP enabled (Pentium D)
RAID level: mirror
I have created the following partitions:
twed1s1a  <none>  1100MB   *
twed1s1b  swap    1024MB   SWAP
twed1s1d  <none>  5120MB   *
twed1s1e  <none>  30720MB  *
twed1s1f  <none>  261GB    *
I rebooted, just in case something was cached.
Then I ran:
newfs -J -b 8192 -f 1024 -g 50000 -h 20 -i 40960 /dev/twed1s1f
gjournal load
gjournal label -f /dev/twed1s1f
tunefs -J enable -n disable /dev/twed1s1f
mount -o noatime /dev/twed1s1f.journal /NEW/suit
osiris# tunefs -p /dev/twed1s1f
tunefs: ACLs: (-a) disabled
tunefs: MAC multilabel: (-l) disabled
tunefs: soft updates: (-n) disabled
tunefs: gjournal: (-J) enabled
tunefs: maximum blocks per file in a cylinder group: (-e) 1024
tunefs: average file size: (-f) 50000
tunefs: average number of files in a directory: (-s) 20
tunefs: minimum percentage of free space: (-m) 8%
tunefs: optimization preference: (-o) time
tunefs: volume label: (-L)
# newfs command for /dev/twed1s1f (/dev/twed1s1f)
newfs -O 2 -a 16 -b 8192 -d 8192 -e 1024 -f 1024 -g 50000 -h 20 -m 8 -o time -s 273771329 /dev/twed1s1f
Then I started a huge, long copy of about 200GB of data from the old RAID 5 array.
Some time later I found the machine practically frozen, because the log file was filling up
with errors like these:
Oct 28 22:18:42 osiris kernel: g_vfs_done():twed1s1f.journal[WRITE(offset=279275085824, length=131072)]error = 5
Oct 28 22:18:42 osiris kernel: g_vfs_done():twed1s1f.journal[WRITE(offset=279278362624, length=131072)]error = 5
Oct 28 22:18:42 osiris kernel: g_vfs_done():twed1s1f.journal[WRITE(offset=279272857600, length=131072)]error = 5
Oct 28 22:18:42 osiris kernel: g_vfs_done():twed1s1f.journal[WRITE(offset=279278493696, length=131072)]error = 5
Oct 28 22:18:42 osiris kernel: g_vfs_done():twed1s1f.journal[WRITE(offset=279275216896, length=131072)]error = 5
Oct 28 22:18:42 osiris kernel: g_vfs_done():twed1s1f.journal[WRITE(offset=279278624768, length=131072)]error = 5
Oct 28 22:18:42 osiris kernel: g_vfs_done():twed1s1f.journal[WRITE(offset=279272988672, length=131072)]error = 5
Oct 28 22:18:42 osiris kernel: g_vfs_done():twed1s1f.journal[WRITE(offset=279275347968, length=131072)]error = 5
Oct 28 22:18:42 osiris kernel: g_vfs_done():twed1s1f.journal[WRITE(offset=279278755840, length=131072)]error = 5
Oct 28 22:18:42 osiris kernel: g_vfs_done():twed1s1f.journal[WRITE(offset=279273119744, length=131072)]error = 5
Oct 28 22:18:42 osiris kernel: g_vfs_done():twed1s1f.journal[WRITE(offset=279278886912, length=131072)]error = 5
Oct 28 22:18:42 osiris kernel: g_vfs_done():twed1s1f.journal[WRITE(offset=279275479040, length=131072)]error = 5
Oct 28 22:18:42 osiris kernel: g_vfs_done():twed1s1f.journal[WRITE(offset=279279017984, length=131072)]error = 5
Oct 28 22:18:42 osiris kernel: g_vfs_done():twed1s1f.journal[WRITE(offset=279273250816, length=131072)]error = 5
Oct 28 22:18:42 osiris kernel: g_vfs_done():twed1s1f.journal[WRITE(offset=279279149056, length=131072)]error = 5
Oct 28 22:18:42 osiris kernel: g_vfs_done():twed1s1f.journal[WRITE(offset=279275610112, length=131072)]error = 5
Oct 28 22:18:42 osiris kernel: g_vfs_done():twed1s1f.journal[WRITE(offset=279279280128, length=131072)]error = 5
Oct 28 22:18:42 osiris kernel: g_vfs_done():twed1s1f.journal[WRITE(offset=279273381888, length=131072)]error = 5
Oct 28 22:18:42 osiris kernel: g_vfs_done():twed1s1f.journal[WRITE(offset=279279411200, length=131072)]error = 5
Oct 28 22:18:42 osiris kernel: g_vfs_done():twed1s1f.journal[WRITE(offset=279275741184, length=131072)]error = 5
Since error 5 is EIO, I started a verify on the controller; everything was OK.
I ran
cat /dev/random > /NEW/suit/aaa.dat
to fill the whole filesystem with a huge file: OK.
I also ran
dd if=/dev/twed1s1f of=/dev/null bs=1M
to read the whole partition: OK.
Then I re-ran newfs on this filesystem without -J, unloaded gjournal and repeated the same copy. It took several hours
and went just fine.
So it is not a hardware problem; it seems to be related to gjournal.
One more weird thing happened here: gjournal complained that BIO_FLUSH is not supported by the driver.
However, AFAIK twe works via the SCSI subsystem, and the author of gjournal said somewhere that he
had implemented BIO_FLUSH for SCSI. He specifically mentioned that he had tested twe and twa
and that they both support BIO_FLUSH.
Also, I think the offset values in the error messages are out of range for this filesystem.
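A quick way to check that suspicion is to compare the failing offsets with the filesystem size. This is only a rough sketch: it assumes the "-s 273771329" value in the newfs line above counts 1024-byte fragments (only under that reading does it match the 261GB partition) and that the shell does 64-bit arithmetic; the variable names are just for illustration.

```sh
#!/bin/sh
# Rough check: do the failing g_vfs_done() writes fall inside the filesystem?
# ASSUMPTION: the "-s 273771329" value in the newfs line above counts
# 1024-byte fragments (-f 1024); only then does it match the 261GB partition.
# Needs a shell whose $(( )) arithmetic is 64-bit.

frag_size=1024
fs_frags=273771329
fs_bytes=$((fs_frags * frag_size))

# Lowest and highest offsets seen in the log excerpt, and the write length.
min_off=279272857600
max_off=279279411200
io_len=131072

# One byte past the end of the highest failing write.
last_end=$((max_off + io_len))

echo "filesystem size:      $fs_bytes bytes"
echo "lowest failing write: $min_off"
echo "end of highest write: $last_end"

if [ "$last_end" -le "$fs_bytes" ]; then
    echo "all failing writes are inside the filesystem"
else
    echo "some failing writes go past the end of the filesystem"
fi

# How far before the end of the filesystem the failing writes sit, in MiB.
echo "window: $(( (fs_bytes - min_off) / 1048576 )) to $(( (fs_bytes - last_end) / 1048576 )) MiB before the end"
```

Under that assumption the failing writes are inside the filesystem as newfs sized it, but all within roughly its last 1GB. Since gjournal keeps the journal on the same partition here, the twed1s1f.journal provider is smaller than twed1s1f itself, so comparing the offsets with the provider size reported by diskinfo -v /dev/twed1s1f.journal may be the more telling check.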
The controller has 64MB of cache on board, and the author of gjournal said in some
discussion that if BIO_FLUSH support is missing and the controller cache is larger than
gjournal's cache, there might be problems. I could not find any specific value for
the gjournal cache size. So the problem may be related to this issue (something gets messed up),
but I am not sure.
Any ideas, anyone?
--
Regards,
Artem
More information about the freebsd-current mailing list