g_vfs_done error third part--PLEASE HELP!

Fri May 16 12:43:23 UTC 2008

Willy Offermans wrote:
> Hello Roland and FreeBSD friends,
> 
> I'm sorry to have been so quiet for a while, but I was away on vacation.
> Now that I'm back, I would like to solve this issue.
> 
> 
> On Mon, Apr 21, 2008 at 10:10:47PM +0200, Roland Smith wrote:
>> On Mon, Apr 21, 2008 at 09:04:03PM +0200, Willy Offermans wrote:
>>> Dear FreeBSD friends,
>>>
>>> It is already the third time that I report this error. Can someone help
>>> me in solving this issue?
>> Probably the reason that you hear so little is that you provide so
>> little information. Most of us are not clairvoyant.
>>  
>>> Over and over again, and always after heavy disk I/O, I see the following
>>> errors in the log files. If I force ar0s1g to unmount, the machine
>>> spontaneously reboots. Nothing seems to be seriously damaged by this,
>>> but I cannot afford something bad happening to this
>>> production machine.
>> Why would you force an unmount?
> 
> Otherwise the device keeps reporting that it is unavailable and cannot be
> unmounted:
> 
> sun# umount /share/
> umount: unmount of /share failed: Resource temporarily unavailable
> 
>>> Apr 18 20:02:19 sun kernel: g_vfs_done():ar0s1g[WRITE(offset=290725068800, length=4096)]error = 5
>>>
>>> I have no clue what the errors mean, since offsets of 290725068800,
>>> 290725072896, and 290725074944 seem to be ridiculous. Does anybody 
>>> have a clue what is going on?
>> For starters, how big is ar0s1g? If the offset is in bytes, it is around
>> 270 GB, which is not that unusual in this day and age.
> 
> I have to admit that I was a bit confused by an offset value of
> 290725068800. No unit is indicated, so I assumed that it was in
> sectors, but it is probably simply bytes, in which case the number
> does indeed make sense.
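
To make the unit question concrete, here is a minimal sketch that pulls the fields out of the g_vfs_done line quoted above and converts the offset both ways. The regular expression and the helper name parse_g_vfs_done are hypothetical, written only to match the message format shown in this thread; the 512-byte sector size is taken from the bsdlabel output further down.

```python
import re

# Log message copied from the report above.
LOG_LINE = ("g_vfs_done():ar0s1g[WRITE(offset=290725068800, length=4096)]"
            "error = 5")

# Hypothetical pattern, written only for the message format shown here.
PATTERN = re.compile(
    r"g_vfs_done\(\):(?P<dev>\w+)\[(?P<op>\w+)\(offset=(?P<offset>\d+), "
    r"length=(?P<length>\d+)\)\]error = (?P<errno>\d+)"
)

SECTOR_SIZE = 512  # bytes/sector, as reported by bsdlabel in this thread


def parse_g_vfs_done(line):
    """Return device, operation, offset, length and errno from one message."""
    m = PATTERN.search(line)
    if m is None:
        raise ValueError("not a g_vfs_done message: %r" % line)
    return {
        "dev": m.group("dev"),
        "op": m.group("op"),
        "offset": int(m.group("offset")),
        "length": int(m.group("length")),
        "errno": int(m.group("errno")),
    }


info = parse_g_vfs_done(LOG_LINE)

# Reading 1: the offset is a byte count.
offset_gib = info["offset"] / 2**30

# Reading 2: the offset is a count of 512-byte sectors.
offset_if_sectors_tib = info["offset"] * SECTOR_SIZE / 2**40

print("as bytes:   %.2f GiB" % offset_gib)
print("as sectors: %.1f TiB" % offset_if_sectors_tib)
```

Read as bytes, the offset comes to roughly 270 GiB, which agrees with Roland's estimate. Read as sectors, it would be far larger than any disk in this machine, which is why the number looked ridiculous at first.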
>>> I'm using FreeBSD 7.0, but found the error being reported before with
>>> previous versions of FreeBSD. I can and will provide more details on
>>> demand.
>> What does 'df' say?
> 
> Filesystem  1K-blocks     Used     Avail Capacity  Mounted on
> /dev/ar0s1a  20308398   230438  18453290     1%    /
> devfs               1        1         0   100%    /dev
> /dev/ar0s1d  21321454  3814482  15801256    19%    /usr
> /dev/ar0s1e  50777034  5331686  41383186    11%    /var
> /dev/ar0s1f 101554150 18813760  74616058    20%    /home
> /dev/ar0s1g 274977824 34564876 218414724    14%    /share
> 
> Pretty normal, I would say.
> 
>> Did you notice any file corruption in the filesystem on ar0s1g?
> 
> No, the two disks are brand new and I did not encounter any noticeable
> file corruption. However, I assume that nowadays bad sectors on hard disks
> are handled by the hardware and need no user interaction to correct.
> But maybe I'm totally wrong.
> 
>> Unmount the filesystem and run fsck(8) on it. Does it report any errors?
> 
> sun# fsck /dev/ar0s1g 
> ** /dev/ar0s1g
> ** Last Mounted on /share
> ** Phase 1 - Check Blocks and Sizes
> INCORRECT BLOCK COUNT I=34788357 (272 should be 264)
> CORRECT? [yn] y
> 
> INCORRECT BLOCK COUNT I=34789217 (296 should be 288)
> CORRECT? [yn] y
> 
> ** Phase 2 - Check Pathnames
> ** Phase 3 - Check Connectivity
> ** Phase 4 - Check Reference Counts
> ** Phase 5 - Check Cyl groups
> FREE BLK COUNT(S) WRONG IN SUPERBLK
> SALVAGE? [yn] y
> 
> SUMMARY INFORMATION BAD
> SALVAGE? [yn] y
> 
> BLK(S) MISSING IN BIT MAPS
> SALVAGE? [yn] y
> 
> 182863 files, 17282440 used, 120206472 free (12448 frags, 15024253
> blocks, 0.0% fragmentation)
> 
> ***** FILE SYSTEM MARKED CLEAN *****
> 
> ***** FILE SYSTEM WAS MODIFIED *****
> 
> The usual stuff, I would say.

No, any form of filesystem corruption is not usual.

> 
>>> Any hints are very much appreciated.
>> Did you manage to create a partition larger than the disk is (using
>> newfs's -s switch)? In that case it could be that you're trying to write
>> past the end of the device.
> 
> No, look at the following output:
> 
> sun# bsdlabel -A /dev/ar0s1
> # /dev/ar0s1:
> type: unknown
> disk: amnesiac
> label: 
> flags:
> bytes/sector: 512
> sectors/track: 63
> tracks/cylinder: 255
> sectors/cylinder: 16065
> cylinders: 60799
> sectors/unit: 976751937
> rpm: 3600
> interleave: 1
> trackskew: 0
> cylinderskew: 0
> headswitch: 0           # milliseconds
> track-to-track seek: 0  # milliseconds
> drivedata: 0 
> 
> 8 partitions:
> #        size   offset    fstype   [fsize bsize bps/cpg]
>   a: 41943040        0    4.2BSD        0     0     0 
>   b:  8388608 41943040      swap                    
>   c: 976751937        0    unused        0     0         # "raw" part, don't edit
>   d: 44040192 50331648    4.2BSD     2048 16384 28552 
>   e: 104857600 94371840    4.2BSD     2048 16384 28552 
>   f: 209715200 199229440    4.2BSD     2048 16384 28552 
>   g: 567807297 408944640    4.2BSD     2048 16384 28552 
> 
> /dev/ar0s1g starts after 408944640*512/1024/1024=199680MB
> 
> 
> So I have to conclude that the write error message does make sense and
> that something seems to be wrong with the disks. The next question is:
> what can I do about it? Should I return the disks to the shop and ask
> for new ones?
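
Roland's question about writing past the end of the device can be checked directly from the label. The following is only a sketch: it assumes the offset in the g_vfs_done message is a byte count relative to the start of ar0s1g, and all variable names are made up for illustration. The numbers are copied from the bsdlabel output and the log line in this thread.

```python
SECTOR_SIZE = 512               # bytes/sector, from the label
G_START_SECTORS = 408944640     # "offset" column for partition g
G_SIZE_SECTORS = 567807297      # "size" column for partition g
SLICE_SECTORS = 976751937       # sectors/unit for ar0s1

FAILED_OFFSET = 290725068800    # offset from the g_vfs_done message
FAILED_LENGTH = 4096            # length from the g_vfs_done message


def sectors_to_mib(sectors):
    """Convert a sector count to MiB using the label's sector size."""
    return sectors * SECTOR_SIZE / 2**20


# Same arithmetic as 408944640*512/1024/1024 above.
g_start_mib = sectors_to_mib(G_START_SECTORS)   # 199680.0

# Size of partition g in bytes, for comparison with the logged offset.
g_size_bytes = G_SIZE_SECTORS * SECTOR_SIZE

# Partition g should end exactly where the slice ends.
g_ends_at_slice_end = G_START_SECTORS + G_SIZE_SECTORS == SLICE_SECTORS

# Assumption: the logged offset is relative to the start of ar0s1g.
write_fits = FAILED_OFFSET + FAILED_LENGTH <= g_size_bytes

print("g starts at      : %.0f MiB" % g_start_mib)
print("g size           : %d bytes" % g_size_bytes)
print("end of failed I/O: %d bytes" % (FAILED_OFFSET + FAILED_LENGTH))
print("g ends at slice end:", g_ends_at_slice_end)
print("write lies inside g:", write_fits)
```

Comparing the last two byte counts is a quick way to confirm or rule out the past-the-end hypothesis before blaming the hardware.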

#define EIO             5               /* Input/output error */

At least one of your disks is toast.
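
For anyone who wants to look up such a number without digging through errno.h, a small sketch using Python's standard errno module does the same job. The symbolic names come from the platform's C library, so the message text may differ slightly between systems.

```python
import errno
import os

ERROR_FROM_LOG = 5  # the "error = 5" part of the g_vfs_done message

# Symbolic name for the number, e.g. "EIO".
name = errno.errorcode[ERROR_FROM_LOG]

# Human-readable text supplied by the C library.
message = os.strerror(ERROR_FROM_LOG)

print("%d -> %s: %s" % (ERROR_FROM_LOG, name, message))
```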

Kris