thr3ads.net - freebsd stable - g_vfs_done error third part--PLEASE HELP! [Apr 2008]

Willy Offermans

2008-Apr-21 19:03 UTC

g_vfs_done error third part--PLEASE HELP!

Dear FreeBSD friends,

It is already the third time that I report this error. Can someone help
me in solving this issue?

Over and over again and always after heavy disk I/O I see the following
errors in the log files. If I force ar0s1g to unmount the machine
spontaneously reboots. Nothing seriously seems to be damaged by this
act, but anyway I cannot afford something bad happening to this
production machine.

Currently the error is the following:

<snip>
...
Apr 21 19:44:36 sun kernel: g_vfs_done():ar0s1g[WRITE(offset=290725074944, length=2048)]error = 5
Apr 21 19:45:07 sun kernel: g_vfs_done():ar0s1g[WRITE(offset=290725074944, length=2048)]error = 5
Apr 21 19:45:38 sun kernel: g_vfs_done():ar0s1g[WRITE(offset=290725074944, length=2048)]error = 5
...
</snip>

Before that, the error looked like this:

<snip>
...
Apr 18 20:00:15 sun kernel: g_vfs_done():ar0s1g[WRITE(offset=290725072896, length=2048)]error = 5
Apr 18 20:00:46 sun kernel: g_vfs_done():ar0s1g[WRITE(offset=290725068800, length=4096)]error = 5
Apr 18 20:00:46 sun kernel: g_vfs_done():ar0s1g[WRITE(offset=290725072896, length=2048)]error = 5
Apr 18 20:01:17 sun kernel: g_vfs_done():ar0s1g[WRITE(offset=290725068800, length=4096)]error = 5
Apr 18 20:01:17 sun kernel: g_vfs_done():ar0s1g[WRITE(offset=290725072896, length=2048)]error = 5
Apr 18 20:01:48 sun kernel: g_vfs_done():ar0s1g[WRITE(offset=290725068800, length=4096)]error = 5
Apr 18 20:01:48 sun kernel: g_vfs_done():ar0s1g[WRITE(offset=290725072896, length=2048)]error = 5
Apr 18 20:02:19 sun kernel: g_vfs_done():ar0s1g[WRITE(offset=290725068800, length=4096)]error = 5
...
</snip>

I have no clue what the errors mean, since offsets of 290725068800,
290725072896, and 290725074944 seem to be ridiculous. Does anybody 
have a clue what is going on?
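
For anyone who wants to summarise log entries like these, here is a minimal
sketch in Python. It is only an illustration: the regular expression is
written against the exact line format shown in this message, the sample
lines are copied from the logs above (rejoined onto single lines), and the
names LOG, PATTERN and parse are made up for the example. On FreeBSD,
error 5 is EIO (input/output error).

```python
import re

# Sample entries copied from the logs in this message, one entry per line.
# The second line is a repeat, as in the original log.
LOG = """\
Apr 21 19:44:36 sun kernel: g_vfs_done():ar0s1g[WRITE(offset=290725074944, length=2048)]error = 5
Apr 21 19:45:07 sun kernel: g_vfs_done():ar0s1g[WRITE(offset=290725074944, length=2048)]error = 5
Apr 18 20:00:15 sun kernel: g_vfs_done():ar0s1g[WRITE(offset=290725072896, length=2048)]error = 5
Apr 18 20:00:46 sun kernel: g_vfs_done():ar0s1g[WRITE(offset=290725068800, length=4096)]error = 5
"""

# Matches the g_vfs_done() part of a line in the format shown above.
PATTERN = re.compile(
    r"g_vfs_done\(\):(?P<dev>\w+)\[(?P<op>\w+)"
    r"\(offset=(?P<offset>\d+),\s*length=(?P<length>\d+)\)\]"
    r"error = (?P<err>\d+)"
)


def parse(text):
    """Return sorted, de-duplicated (offset, length, error) tuples."""
    found = {
        (int(m["offset"]), int(m["length"]), int(m["err"]))
        for m in PATTERN.finditer(text)
    }
    return sorted(found)


records = parse(LOG)
for offset, length, err in records:
    print(f"offset={offset} end={offset + length} length={length} error={err}")

# True when each failed write starts exactly where the previous one ends.
contiguous = all(a[0] + a[1] == b[0] for a, b in zip(records, records[1:]))
print("contiguous:", contiguous)
```

With these sample lines the three distinct writes turn out to be adjacent,
so the same small region (8192 bytes in total) is being retried over and
over rather than many unrelated locations failing.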

I'm using FreeBSD 7.0, but found the error being reported before with
previous versions of FreeBSD. I can and will provide more details on
demand.

Any hints are very much appreciated.


-- 
Met vriendelijke groeten,
With kind regards,
Mit freundlichen Gruessen,
De jrus wah,

Willy

*************************************
W.K. Offermans
Home:   +31 45 544 49 44
Mobile: +31 653 27 16 23
e-mail: Willy@Offermans.Rompen.nl

                                       Powered by ....

                                            (__)
                                         \\\'',)
                                           \/  \ ^
                                           .\._/_)

                                       www.FreeBSD.org

Roland Smith

2008-Apr-21 20:10 UTC

On Mon, Apr 21, 2008 at 09:04:03PM +0200, Willy Offermans wrote:
> Dear FreeBSD friends,
> 
> It is already the third time that I report this error. Can someone help
> me in solving this issue?

Probably the reason that you hear so little is that you provide so
little information. Most of us are not clairvoyant.

> Over and over again and always after heavy disk I/O I see the following
> errors in the log files. If I force ar0s1g to unmount the machine
> spontaneously reboots. Nothing seriously seems to be damaged by this
> act, but anyway I cannot afford something bad happening to this
> production machine.

Why would you force an unmount?

> Apr 18 20:02:19 sun kernel: g_vfs_done():ar0s1g[WRITE(offset=290725068800, length=4096)]error = 5
> 
> I have no clue what the errors mean, since offsets of 290725068800,
> 290725072896, and 290725074944 seem to be ridiculous. Does anybody 
> have a clue what is going on?

For starters, how big is ar0s1g? If the offset is in bytes, it is around
270 GB, which is not that unusual in this day and age.

> I'm using FreeBSD 7.0, but found the error being reported before with
> previous versions of FreeBSD. I can and will provide more details on
> demand.

What does 'df' say?

Did you notice any file corruption in the filesystem on ar0s1g?

Unmount the filesystem and run fsck(8) on it. Does it report any errors?

> Any hints are very much appreciated.

Did you manage to create a partition larger than the disk is (using
newfs's -s switch)? In that case it could be that you're trying to write
past the end of the device.

Roland
-- 
R.F.Smith                                   http://www.xs4all.nl/~rsmith/
[plain text _non-HTML_ PGP/GnuPG encrypted/signed email much appreciated]
pgp: 1A2B 477F 9970 BA3C 2914  B7CE 1277 EFB0 C321 A725 (KeyID: C321A725)

Jan Mikkelsen

2008-Apr-21 23:00 UTC

Hi Willy,

You seem to have emailed me directly as well as posting to the list.

The bad offsets are probably because you have filesystem corruption, and the
actual event that caused it was probably not reported (or is at least not
reported by these errors).

Basic question: Do you have a hardware problem?

- Do you have ECC memory? If not, have you run memtest?
- Are your disks reliable, or is one corrupting data?

Less basic questions:  What is the corruption, and what the cause?  That
might require a little more work and dropping into the debugger.

You could also try reconfiguring to use gmirror instead of ar to see if that
improves things (ie: it could be an ar bug).

Regards,

Jan.



Toomas Aas

2008-Apr-25 05:19 UTC

Willy Offermans wrote:
> It is already the third time that I report this error. Can someone help
> me in solving this issue?
> 
> Apr 21 19:44:36 sun kernel: g_vfs_done():ar0s1g[WRITE(offset=290725074944, length=2048)]error = 5
> Apr 21 19:45:07 sun kernel: g_vfs_done():ar0s1g[WRITE(offset=290725074944, length=2048)]error = 5
> Apr 21 19:45:38 sun kernel: g_vfs_done():ar0s1g[WRITE(offset=290725074944, length=2048)]error = 5
> ...

I can only tell you that I had similar problems with FreeBSD 6.3 and ICH7R 
based RAID. Since I couldn't figure out how to solve them, I discarded the 
BIOS-based RAID and instead set up gmirror. It's been running this way for 
a year now and been rock solid.

--
Toomas Aas

... One way to be happy ever after is not to be after too much.

Willy Offermans

2008-May-16 12:12 UTC

Hello Roland and FreeBSD friends,

I'm sorry to have been so quiet for a while, but I was away on vacation.
Now that I'm back, I would like to solve this issue.


On Mon, Apr 21, 2008 at 10:10:47PM +0200, Roland Smith wrote:
> On Mon, Apr 21, 2008 at 09:04:03PM +0200, Willy Offermans wrote:
> > Dear FreeBSD friends,
> > 
> > It is already the third time that I report this error. Can someone help
> > me in solving this issue?
> 
> Probably the reason that you hear so little is that you provide so
> little information. Most of us are not clairvoyant.
>  
> > Over and over again and always after heavy disk I/O I see the following
> > errors in the log files. If I force ar0s1g to unmount the machine
> > spontaneously reboots. Nothing seriously seems to be damaged by this
> > act, but anyway I cannot afford something bad happening to this
> > production machine.
> 
> Why would you force an unmount?

Otherwise the device keeps reporting that it is unavailable and cannot be
unmounted:

sun# umount /share/
umount: unmount of /share failed: Resource temporarily unavailable
> 
> > Apr 18 20:02:19 sun kernel: g_vfs_done():ar0s1g[WRITE(offset=290725068800, length=4096)]error = 5
> > 
> > I have no clue what the errors mean, since offsets of 290725068800,
> > 290725072896, and 290725074944 seem to be ridiculous. Does anybody 
> > have a clue what is going on?
> 
> For starters, how big is ar0s1g? If the offset is in bytes, it is around
> 270 GB, which is not that unusual in this day and age.
I have to admit that I was a bit confused by an offset value of
290725068800. There is no indication of a unit, so I assumed that it
was in sectors, but it is probably simply in bytes, and then the number
does indeed make sense.
> 
> > I'm using FreeBSD 7.0, but found the error being reported before with
> > previous versions of FreeBSD. I can and will provide more details on
> > demand.
> 
> What does 'df' say?
Filesystem  1K-blocks     Used     Avail Capacity  Mounted on
/dev/ar0s1a  20308398   230438  18453290     1%    /
devfs               1        1         0   100%    /dev
/dev/ar0s1d  21321454  3814482  15801256    19%    /usr
/dev/ar0s1e  50777034  5331686  41383186    11%    /var
/dev/ar0s1f 101554150 18813760  74616058    20%    /home
/dev/ar0s1g 274977824 34564876 218414724    14%    /share

Pretty normal, I would say.
> 
> Did you notice any file corruption in the filesystem on ar0s1g?
No. The two disks are brand new and I did not encounter any noticeable
file corruption. However, I assume that nowadays bad sectors on hard
disks are handled by the hardware and do not need any user interaction
to correct them. But maybe I'm totally wrong.
> 
> Unmount the filesystem and run fsck(8) on it. Does it report any errors?
sun# fsck /dev/ar0s1g 
** /dev/ar0s1g
** Last Mounted on /share
** Phase 1 - Check Blocks and Sizes
INCORRECT BLOCK COUNT I=34788357 (272 should be 264)
CORRECT? [yn] y

INCORRECT BLOCK COUNT I=34789217 (296 should be 288)
CORRECT? [yn] y

** Phase 2 - Check Pathnames
** Phase 3 - Check Connectivity
** Phase 4 - Check Reference Counts
** Phase 5 - Check Cyl groups
FREE BLK COUNT(S) WRONG IN SUPERBLK
SALVAGE? [yn] y

SUMMARY INFORMATION BAD
SALVAGE? [yn] y

BLK(S) MISSING IN BIT MAPS
SALVAGE? [yn] y

182863 files, 17282440 used, 120206472 free (12448 frags, 15024253 blocks, 0.0% fragmentation)

***** FILE SYSTEM MARKED CLEAN *****

***** FILE SYSTEM WAS MODIFIED *****

The usual stuff I would say.
> 
> > Any hints are very much appreciated.
> 
> Did you manage to create a partition larger than the disk is (using
> newfs's -s switch)? In that case it could be that you're trying to write
> past the end of the device.

No, look at the following output:

sun# bsdlabel -A /dev/ar0s1
# /dev/ar0s1:
type: unknown
disk: amnesiac
label: 
flags:
bytes/sector: 512
sectors/track: 63
tracks/cylinder: 255
sectors/cylinder: 16065
cylinders: 60799
sectors/unit: 976751937
rpm: 3600
interleave: 1
trackskew: 0
cylinderskew: 0
headswitch: 0           # milliseconds
track-to-track seek: 0  # milliseconds
drivedata: 0 

8 partitions:
#        size   offset    fstype   [fsize bsize bps/cpg]
  a: 41943040        0    4.2BSD        0     0     0 
  b:  8388608 41943040      swap                    
  c: 976751937        0    unused        0     0         # "raw" part, don't edit
  d: 44040192 50331648    4.2BSD     2048 16384 28552 
  e: 104857600 94371840    4.2BSD     2048 16384 28552 
  f: 209715200 199229440    4.2BSD     2048 16384 28552 
  g: 567807297 408944640    4.2BSD     2048 16384 28552 

/dev/ar0s1g starts after 408944640*512/1024/1024=199680MB
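
The same kind of arithmetic can be used to compare the failing offsets with
the size of partition g. The following is a rough sketch in Python, not a
diagnosis. It takes the sector size and partition sizes from the bsdlabel
output above and the offsets and lengths from the g_vfs_done log lines, and
it assumes that the logged offsets are byte offsets counted from the start
of ar0s1g (the thread suggests bytes, but nothing here confirms what the
offset is relative to). The constant and variable names are made up for
the example.

```python
SECTOR = 512                    # bytes/sector, from bsdlabel
G_START_SECTORS = 408944640     # offset of partition g within ar0s1
G_SIZE_SECTORS = 567807297      # size of partition g
SLICE_SECTORS = 976751937       # size of partition c (the whole slice)

# (offset, length) pairs from the g_vfs_done log lines.
# ASSUMPTION: byte offsets relative to the start of ar0s1g.
FAILED_WRITES = [
    (290725068800, 4096),
    (290725072896, 2048),
    (290725074944, 2048),
]

# Partition g should end exactly where the slice ends.
g_ends_at_slice_end = G_START_SECTORS + G_SIZE_SECTORS == SLICE_SECTORS
print("g ends at end of slice:", g_ends_at_slice_end)

g_size_bytes = G_SIZE_SECTORS * SECTOR
print("size of g in bytes:", g_size_bytes)
print("size of g in GiB:  ", round(g_size_bytes / 2**30, 2))

# For each failed write, how far its last byte lies beyond the end of g.
# A positive value means the write would not fit inside the partition.
overshoots = []
for offset, length in FAILED_WRITES:
    overshoot = offset + length - g_size_bytes
    overshoots.append(overshoot)
    print(f"offset={offset} ({offset / 2**30:.2f} GiB) overshoot={overshoot} bytes")
```

If the assumption about the offsets holds, every one of these writes starts
a few megabytes beyond the end of partition g as bsdlabel reports it, which
would be worth checking against the size the filesystem thinks it has
before concluding anything about the disks themselves.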


So I have to conclude that the write error message does make sense and
that something seems to be wrong with the disks. The next question is
what can I do about it? Should I return the disks to the shop and ask
for new ones?

However other people that I have contacted and who had a similar
problem before have solved it by using software raid setup instead of a
hardware raid setup. This seems to indicate that there is some bug in
the FreeBSD code.

Another peculiarity that I have to mention is the following. If I use
sysinstall and if I try to ``Label allocated disk partitions'', I
cannot see the partitions on ar0. However the partitions can be
visualised by bsdlabel as shown above.

What is going on and what should I do?
> 
> Roland
> -- 
> R.F.Smith                                   http://www.xs4all.nl/~rsmith/
> [plain text _non-HTML_ PGP/GnuPG encrypted/signed email much appreciated]
> pgp: 1A2B 477F 9970 BA3C 2914  B7CE 1277 EFB0 C321 A725 (KeyID: C321A725)


-- 
Met vriendelijke groeten,
With kind regards,
Mit freundlichen Gruessen,
De jrus wah,

Willy

