gjournal questions

Kevin Kramer kramer at centtech.com
Thu Aug 31 19:01:36 UTC 2006


Pavel,

I'm running 6.1-STABLE, with kernel and world rebuilt as of 8/28 @ 2pm CST
with these patches applied:

gjournal6_20060808.patch
vfs_subr.c.3.patch

The backend RAID presents 4 LUNs, configured as follows:
da1 - 8G
da2 - ~897G
da3 - 8G
da4 - ~897G

da2 and da4 were partitioned in FreeBSD, then we ran the following:

gjournal label -v /dev/da2 /dev/da1
gjournal label -v /dev/da4 /dev/da3
newfs -U -L "scr09" /dev/da2.journal
newfs -U -L "scr10" /dev/da4.journal

So there is one 8G journal for each data device.

Now that the server is under load, I'm seeing "NFS server not responding"
messages on my clients. The messages coincide with the gjournal suspend/copy
operation, and my clients either hang or get "no such file or directory".

We copied 137G to /scr10 and the copy just finished. Could this be
leftover writes from the journal still being flushed?

Here is the time correlation, first from the server's kernel log:

Aug 31 13:55:24 donkey kernel: GEOM_JOURNAL[1]: Starting copy of journal.
Aug 31 13:55:24 donkey kernel: GEOM_JOURNAL[1]: Switch time of da4: 0.002798s
Aug 31 13:55:24 donkey kernel: GEOM_JOURNAL[1]: Entire switch time: 14.030198s
Aug 31 13:55:24 donkey kernel: GEOM_JOURNAL[1]: Data has been copied.
Aug 31 13:55:33 donkey kernel: GEOM_JOURNAL[1]: Entire switch time: 0.000013s
Aug 31 13:55:44 donkey kernel: GEOM_JOURNAL[1]: Entire switch time: 0.000013s
Aug 31 13:56:04 donkey kernel: GEOM_JOURNAL[1]: Msync time of /scr09: 0.000010s
Aug 31 13:56:04 donkey kernel: GEOM_JOURNAL[1]: Sync time of /scr09: 0.000009s
Aug 31 13:56:04 donkey kernel: GEOM_JOURNAL[1]: Suspend time of /scr09: 0.000007s
Aug 31 13:56:04 donkey kernel: GEOM_JOURNAL[1]: Starting copy of journal.
Aug 31 13:56:04 donkey kernel: GEOM_JOURNAL[1]: Switch time of da2: 0.002302s
Aug 31 13:56:04 donkey kernel: GEOM_JOURNAL[1]: Data has been copied.
Aug 31 13:56:04 donkey kernel: GEOM_JOURNAL[1]: Msync time of /scr10: 0.029769s
Aug 31 13:56:04 donkey kernel: GEOM_JOURNAL[1]: Sync time of /scr10: 0.035259s
Aug 31 13:56:04 donkey kernel: GEOM_JOURNAL[1]: Suspend time of /scr10: 10.109732s
Aug 31 13:56:04 donkey kernel: GEOM_JOURNAL[1]: Starting copy of journal.
Aug 31 13:56:04 donkey kernel: GEOM_JOURNAL[1]: Switch time of da4: 0.002756s
Aug 31 13:56:04 donkey kernel: GEOM_JOURNAL[1]: Entire switch time: 10.182759s
Aug 31 13:56:04 donkey kernel: GEOM_JOURNAL[1]: Data has been copied.
Aug 31 13:56:14 donkey kernel: GEOM_JOURNAL[1]: Entire switch time: 0.000012s
Aug 31 13:56:24 donkey kernel: GEOM_JOURNAL[1]: Entire switch time: 0.000011s
Aug 31 13:56:46 donkey kernel: GEOM_JOURNAL[1]: Msync time of /scr09: 0.000010s
Aug 31 13:56:46 donkey kernel: GEOM_JOURNAL[1]: Sync time of /scr09: 0.000009s
Aug 31 13:56:46 donkey kernel: GEOM_JOURNAL[1]: Suspend time of /scr09: 0.000007s
Aug 31 13:56:46 donkey kernel: GEOM_JOURNAL[1]: Starting copy of journal.
Aug 31 13:56:46 donkey kernel: GEOM_JOURNAL[1]: Switch time of da2: 0.002364s
Aug 31 13:56:46 donkey kernel: GEOM_JOURNAL[1]: Data has been copied.
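
To pick the slow operations out of a log like the one above, something
along these lines might help. It is only a sketch: the function name
slow_journal_events and the 1-second default threshold are made up for
illustration, and it assumes each log entry sits on a single line (as it
would in the log file itself, not wrapped by a mail client).

```shell
# Hypothetical helper: print GEOM_JOURNAL timing lines whose duration
# exceeds a threshold given in seconds (default 1).
# Assumes one log entry per line, with the duration as the last field.
slow_journal_events() {
    threshold="${1:-1}"
    awk -v t="$threshold" '
        /GEOM_JOURNAL/ && / time/ {
            d = $NF              # e.g. "14.030198s"
            sub(/s$/, "", d)     # drop the trailing unit
            if (d + 0 > t + 0) print
        }'
}
```

Usage would be something like `slow_journal_events 1 < /var/log/messages`,
assuming the kernel messages are logged to that file on the server.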

And from the syslog server, the messages logged by the clients:

Aug 31 13:55:23 <user.notice> bowltest4 kernel: nfs: server donkey not responding, still trying
Aug 31 13:55:23 <user.notice> bowltest4 kernel: nfs: server donkey OK
Aug 31 13:55:23 <user.notice> laybox32 kernel: nfs: server donkey OK
Aug 31 13:55:29 <user.notice> b-115-4 kernel: nfs: server donkey not responding, still trying
Aug 31 13:55:29 <user.notice> b-115-4 kernel: nfs: server donkey OK
Aug 31 13:55:56 <user.notice> b-116-16 kernel: nfs: server donkey not responding, still trying
Aug 31 13:55:56 <user.notice> b-204-40 kernel: nfs: server donkey not responding, still trying
Aug 31 13:55:57 <user.notice> b-116-16 kernel: nfs: server donkey OK
Aug 31 13:55:57 <user.notice> lic2 kernel: nfs: server donkey not responding, still trying
Aug 31 13:55:57 <user.notice> b-204-40 kernel: nfs: server donkey OK
Aug 31 13:55:57 <user.notice> lic2 kernel: nfs: server donkey OK
Aug 31 13:55:57 <user.notice> laybox29 kernel: nfs: server donkey not responding, still trying
Aug 31 13:55:57 <user.notice> laybox26 kernel: nfs: server donkey not responding, still trying
Aug 31 13:55:58 <user.notice> laybox19 kernel: nfs: server donkey not responding, still trying
Aug 31 13:55:58 <user.notice> laybox37 kernel: nfs: server donkey not responding, still trying
Aug 31 13:56:00 <user.notice> laybox19 kernel: nfs: server donkey OK
Aug 31 13:56:00 <user.notice> laybox26 kernel: nfs: server donkey OK
Aug 31 13:56:00 <user.notice> laybox37 kernel: nfs: server donkey OK
Aug 31 13:56:00 <user.notice> laybox29 kernel: nfs: server donkey OK
Aug 31 13:56:05 <daemon.info> ws-119-8 amd[2640]: file server donkey20.centtech.com, type nfs, state not responding
Aug 31 13:56:05 <daemon.info> ws-119-8 amd[2640]: file server donkey20.centtech.com, type nfs, state ok
Aug 31 13:56:36 <user.notice> b-116-17 kernel: nfs: server donkey not responding, still trying
Aug 31 13:56:36 <user.notice> b-116-17 kernel: nfs: server donkey OK
Aug 31 13:56:40 <user.notice> b-210-17 kernel: nfs: server donkey not responding, still trying
Aug 31 13:56:41 <user.notice> b-204-41 kernel: nfs: server donkey not responding, still trying
Aug 31 13:56:41 <user.notice> laybox17 kernel: nfs: server donkey not responding, still trying
Aug 31 13:56:44 <user.notice> b-204-38 kernel: nfs: server donkey not responding, still trying
Aug 31 13:56:44 <user.notice> b-204-38 kernel: nfs: server donkey OK
Aug 31 13:56:44 <user.notice> bowltest3 kernel: nfs: server donkey not responding, still trying
Aug 31 13:56:46 <user.notice> b-210-17 kernel: nfs: server donkey OK
Aug 31 13:56:46 <user.notice> laybox17 kernel: nfs: server donkey OK
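
To line the client complaints up against the journal switches, the "not
responding" messages can be counted per minute. Again just a sketch: the
function name nfs_stalls_per_minute is hypothetical, and it assumes the
syslog format shown above (time of day in the third field) with one entry
per line.

```shell
# Hypothetical helper: count "not responding" messages per HH:MM bucket.
# Assumes syslog-style lines where field 3 is the time as HH:MM:SS.
nfs_stalls_per_minute() {
    awk '/not responding/ {
        minute = substr($3, 1, 5)   # keep HH:MM, drop the seconds
        count[minute]++
    }
    END { for (m in count) print m, count[m] }' | sort
}
```

Fed the syslog server's file on stdin, it prints one "HH:MM count" line per
minute, which can then be compared by eye with the times of the long
"Entire switch time" entries on the server.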

Are the journal devices not large enough? Is there a formula for sizing
them? Also, can I umount the data device, remove journaling, and mount it
as a regular device? What are the steps for that? Thanks, and sorry for
the long-winded posting.


------------------------------

Kevin Kramer
Sr. Systems Administrator
512.418.5725
Centaur Technology, Inc.
www.centtech.com



More information about the freebsd-stable mailing list