Dear FreeBSD friends,

It is already the third time that I report this error. Can someone help
me in solving this issue?

Over and over again and always after heavy disk I/O I see the following
errors in the log files. If I force ar0s1g to unmount the machine
spontaneously reboots. Nothing seriously seems to be damaged by this
act, but anyway I cannot afford something bad happening to this
production machine.

Currently the error is the following:

<snip>
...
Apr 21 19:44:36 sun kernel: g_vfs_done():ar0s1g[WRITE(offset=290725074944, length=2048)]error = 5
Apr 21 19:45:07 sun kernel: g_vfs_done():ar0s1g[WRITE(offset=290725074944, length=2048)]error = 5
Apr 21 19:45:38 sun kernel: g_vfs_done():ar0s1g[WRITE(offset=290725074944, length=2048)]error = 5
...
</snip>

before the error appeared like:

<snip>
...
Apr 18 20:00:15 sun kernel: g_vfs_done():ar0s1g[WRITE(offset=290725072896, length=2048)]error = 5
Apr 18 20:00:46 sun kernel: g_vfs_done():ar0s1g[WRITE(offset=290725068800, length=4096)]error = 5
Apr 18 20:00:46 sun kernel: g_vfs_done():ar0s1g[WRITE(offset=290725072896, length=2048)]error = 5
Apr 18 20:01:17 sun kernel: g_vfs_done():ar0s1g[WRITE(offset=290725068800, length=4096)]error = 5
Apr 18 20:01:17 sun kernel: g_vfs_done():ar0s1g[WRITE(offset=290725072896, length=2048)]error = 5
Apr 18 20:01:48 sun kernel: g_vfs_done():ar0s1g[WRITE(offset=290725068800, length=4096)]error = 5
Apr 18 20:01:48 sun kernel: g_vfs_done():ar0s1g[WRITE(offset=290725072896, length=2048)]error = 5
Apr 18 20:02:19 sun kernel: g_vfs_done():ar0s1g[WRITE(offset=290725068800, length=4096)]error = 5
...
</snip>

I have no clue what the errors mean, since offsets of 290725068800,
290725072896, and 290725074944 seem to be ridiculous. Does anybody
have a clue what is going on?

I'm using FreeBSD 7.0, but found the error being reported before with
previous versions of FreeBSD. I can and will provide more details on
demand.

Any hints are very much appreciated.

--
Met vriendelijke groeten,
With kind regards,
Mit freundlichen Gruessen,
De jrus wah,

Willy

*************************************
W.K. Offermans
Home: +31 45 544 49 44
Mobile: +31 653 27 16 23
e-mail: Willy@Offermans.Rompen.nl

Powered by ....

(__)
\\\'',)
\/ \ ^
.\._/_)

www.FreeBSD.org
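For readers who want to work with the log lines above, here is a minimal sketch that splits one g_vfs_done() line into its fields and looks up the symbolic name of the error number. The regular expression and the field names are this sketch's own, not anything defined by the kernel. The name lookup uses the errno table of whatever machine runs the script, and on FreeBSD and Linux error 5 is EIO (input/output error).

```python
import errno
import re

# Pattern for the g_vfs_done() lines quoted above. The group names are
# labels chosen for this sketch, not kernel terminology.
LINE_RE = re.compile(
    r"g_vfs_done\(\):(?P<provider>\w+)"
    r"\[(?P<op>\w+)\(offset=(?P<offset>\d+), length=(?P<length>\d+)\)\]"
    r"error = (?P<error>\d+)"
)


def parse_g_vfs_done(line):
    """Return the fields of one g_vfs_done() log line, or None if no match."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    code = int(m.group("error"))
    return {
        "provider": m.group("provider"),
        "op": m.group("op"),
        "offset": int(m.group("offset")),
        "length": int(m.group("length")),
        "error": code,
        # Symbolic name taken from the local errno table.
        "errno_name": errno.errorcode.get(code, "unknown"),
    }


sample = ("Apr 21 19:44:36 sun kernel: g_vfs_done():ar0s1g"
          "[WRITE(offset=290725074944, length=2048)]error = 5")
print(parse_g_vfs_done(sample))
```

The error name only says that the write failed with an I/O error. It does not say which layer (filesystem, RAID driver, or disk) rejected the request.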
On Mon, Apr 21, 2008 at 09:04:03PM +0200, Willy Offermans wrote:
> Dear FreeBSD friends,
>
> It is already the third time that I report this error. Can someone help
> me in solving this issue?

Probably the reason that you hear so little is that you provide so
little information. Most of us are not clairvoyant.

> Over and over again and always after heavy disk I/O I see the following
> errors in the log files. If I force ar0s1g to unmount the machine
> spontaneously reboots. Nothing seriously seems to be damaged by this
> act, but anyway I cannot afford something bad happening to this
> production machine.

Why would you force an unmount?

> Apr 18 20:02:19 sun kernel: g_vfs_done():ar0s1g[WRITE(offset=290725068800, length=4096)]error = 5
>
> I have no clue what the errors mean, since offsets of 290725068800,
> 290725072896, and 290725074944 seem to be ridiculous. Does anybody
> have a clue what is going on?

For starters, how big is ar0s1g? If the offset is in bytes, it is around
270 GB, which is not that unusual in this day and age.

> I'm using FreeBSD 7.0, but found the error being reported before with
> previous versions of FreeBSD. I can and will provide more details on
> demand.

What does 'df' say?

Did you notice any file corruption in the filesystem on ar0s1g?

Unmount the filesystem and run fsck(8) on it. Does it report any errors?

> Any hints are very much appreciated.

Did you manage to create a partition larger than the disk is (using
newfs's -s switch)? In that case it could be that you're trying to write
past the end of the device.

Roland
--
R.F.Smith http://www.xs4all.nl/~rsmith/
[plain text _non-HTML_ PGP/GnuPG encrypted/signed email much appreciated]
pgp: 1A2B 477F 9970 BA3C 2914 B7CE 1277 EFB0 C321 A725 (KeyID: C321A725)
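Roland's "around 270 GB" can be checked with a few lines of arithmetic. The sketch below converts the three logged offsets to GiB twice, once reading them as bytes and once as sectors. The 512-byte sector size is an assumption at this point in the thread, and the helper name to_gib is made up for the illustration.

```python
SECTOR_SIZE = 512  # bytes per sector; assumed here, not yet confirmed in the thread

# The three offsets that appear in the quoted log lines.
offsets = [290725068800, 290725072896, 290725074944]


def to_gib(n_bytes):
    """Convert a byte count to GiB (2**30 bytes)."""
    return n_bytes / 2**30


for off in offsets:
    if_bytes = to_gib(off)
    if_sectors = to_gib(off * SECTOR_SIZE)
    print(f"{off}: {if_bytes:.2f} GiB if bytes, {if_sectors:.0f} GiB if sectors")
```

Read as bytes, the offsets sit a little under 271 GiB, which is plausible for a partition of a few hundred GB. Read as sectors, they would sit well beyond 100 TiB, which is presumably why they looked ridiculous at first.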
Hi Willy,

You seem to have emailed me directly as well as posting to the list.

The bad offsets are probably because you have filesystem corruption, and
the actual event that caused it was probably not reported (or is at least
not reported by these errors).

Basic question: Do you have a hardware problem?
- Do you have ECC memory? If not, have you run memtest?
- Are your disks reliable, or is one corrupting data?

Less basic questions: What is the corruption, and what the cause?
That might require a little more work and dropping into the debugger.

You could also try reconfiguring to use gmirror instead of ar to see if that
improves things (ie: it could be an ar bug).

Regards,
Jan.

> -----Original Message-----
> From: Willy Offermans [mailto:Willy@Offermans.Rompen.nl]
> Sent: Tuesday, 22 April 2008 5:04 AM
> To: freebsd-stable@FreeBSD.ORG
> Subject: g_vfs_done error third part--PLEASE HELP!
Willy Offermans wrote:
> It is already the third time that I report this error. Can someone help
> me in solving this issue?
>
> Apr 21 19:44:36 sun kernel: g_vfs_done():ar0s1g[WRITE(offset=290725074944, length=2048)]error = 5
> Apr 21 19:45:07 sun kernel: g_vfs_done():ar0s1g[WRITE(offset=290725074944, length=2048)]error = 5
> Apr 21 19:45:38 sun kernel: g_vfs_done():ar0s1g[WRITE(offset=290725074944, length=2048)]error = 5
> ...

I can only tell you that I had similar problems with FreeBSD 6.3 and
ICH7R based RAID. Since I couldn't figure out how to solve them, I
discarded the BIOS-based RAID and instead set up gmirror. It's been
running this way for a year now and been rock solid.

--
Toomas Aas

... One way to be happy ever after is not to be after too much.
Hello Roland and FreeBSD friends,

I'm sorry to have been so quiet for a while; I was away on vacation.
Now that I'm back, I would like to solve this issue.

On Mon, Apr 21, 2008 at 10:10:47PM +0200, Roland Smith wrote:
> On Mon, Apr 21, 2008 at 09:04:03PM +0200, Willy Offermans wrote:
> > Dear FreeBSD friends,
> >
> > It is already the third time that I report this error. Can someone help
> > me in solving this issue?
>
> Probably the reason that you hear so little is that you provide so
> little information. Most of us are not clairvoyant.
>
> > Over and over again and always after heavy disk I/O I see the following
> > errors in the log files. If I force ar0s1g to unmount the machine
> > spontaneously reboots. Nothing seriously seems to be damaged by this
> > act, but anyway I cannot afford something bad happening to this
> > production machine.
>
> Why would you force an unmount?

Otherwise the device keeps reporting that it is unavailable and cannot
be unmounted:

sun# umount /share/
umount: unmount of /share failed: Resource temporarily unavailable

> > Apr 18 20:02:19 sun kernel: g_vfs_done():ar0s1g[WRITE(offset=290725068800, length=4096)]error = 5
> >
> > I have no clue what the errors mean, since offsets of 290725068800,
> > 290725072896, and 290725074944 seem to be ridiculous. Does anybody
> > have a clue what is going on?
>
> For starters, how big is ar0s1g? If the offset is in bytes, it is around
> 270 GB, which is not that unusual in this day and age.

I have to admit that I was a bit confused by an offset value of
290725068800. There is no indication of a unit, so I assumed that it was
in sectors, but it is probably simply bytes, and then the number does
indeed make sense.

> > I'm using FreeBSD 7.0, but found the error being reported before with
> > previous versions of FreeBSD. I can and will provide more details on
> > demand.
>
> What does 'df' say?

Filesystem   1K-blocks     Used     Avail Capacity  Mounted on
/dev/ar0s1a   20308398   230438  18453290     1%    /
devfs                1        1         0   100%    /dev
/dev/ar0s1d   21321454  3814482  15801256    19%    /usr
/dev/ar0s1e   50777034  5331686  41383186    11%    /var
/dev/ar0s1f  101554150 18813760  74616058    20%    /home
/dev/ar0s1g  274977824 34564876 218414724    14%    /share

Pretty normal, I would say.

> Did you notice any file corruption in the filesystem on ar0s1g?

No, the two disks are brand new and I did not encounter any noticeable
file corruption. However, I assume that nowadays bad sectors on a hard
disk are handled by the hardware and do not need any user interaction to
correct them. But maybe I'm totally wrong.

> Unmount the filesystem and run fsck(8) on it. Does it report any errors?

sun# fsck /dev/ar0s1g
** /dev/ar0s1g
** Last Mounted on /share
** Phase 1 - Check Blocks and Sizes
INCORRECT BLOCK COUNT I=34788357 (272 should be 264)
CORRECT? [yn] y

INCORRECT BLOCK COUNT I=34789217 (296 should be 288)
CORRECT? [yn] y

** Phase 2 - Check Pathnames
** Phase 3 - Check Connectivity
** Phase 4 - Check Reference Counts
** Phase 5 - Check Cyl groups
FREE BLK COUNT(S) WRONG IN SUPERBLK
SALVAGE? [yn] y

SUMMARY INFORMATION BAD
SALVAGE? [yn] y

BLK(S) MISSING IN BIT MAPS
SALVAGE? [yn] y

182863 files, 17282440 used, 120206472 free (12448 frags, 15024253 blocks, 0.0% fragmentation)

***** FILE SYSTEM MARKED CLEAN *****

***** FILE SYSTEM WAS MODIFIED *****

The usual stuff, I would say.

> > Any hints are very much appreciated.
>
> Did you manage to create a partition larger than the disk is (using
> newfs's -s switch)? In that case it could be that you're trying to write
> past the end of the device.

No, look at the following output:

sun# bsdlabel -A /dev/ar0s1
# /dev/ar0s1:
type: unknown
disk: amnesiac
label:
flags:
bytes/sector: 512
sectors/track: 63
tracks/cylinder: 255
sectors/cylinder: 16065
cylinders: 60799
sectors/unit: 976751937
rpm: 3600
interleave: 1
trackskew: 0
cylinderskew: 0
headswitch: 0           # milliseconds
track-to-track seek: 0  # milliseconds
drivedata: 0

8 partitions:
#        size    offset    fstype   [fsize bsize bps/cpg]
  a:  41943040         0    4.2BSD        0     0     0
  b:   8388608  41943040      swap
  c: 976751937         0    unused        0     0         # "raw" part, don't edit
  d:  44040192  50331648    4.2BSD     2048 16384 28552
  e: 104857600  94371840    4.2BSD     2048 16384 28552
  f: 209715200 199229440    4.2BSD     2048 16384 28552
  g: 567807297 408944640    4.2BSD     2048 16384 28552

/dev/ar0s1g starts after 408944640*512/1024/1024=199680MB

So I have to conclude that the write error message does make sense and
that something seems to be wrong with the disks. The next question is:
what can I do about it? Should I return the disks to the shop and ask
for new ones? However, other people whom I have contacted and who had a
similar problem before have solved it by using a software RAID setup
instead of a hardware RAID setup. This seems to indicate that there is
some bug in the FreeBSD code.

Another peculiarity that I have to mention is the following. If I use
sysinstall and try to ``Label allocated disk partitions'', I cannot see
the partitions on ar0. However, the partitions can be displayed by
bsdlabel as shown above.

What is going on and what should I do?

--
Met vriendelijke groeten,
With kind regards,
Mit freundlichen Gruessen,
De jrus wah,

Willy
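The bsdlabel numbers above allow a direct check of Roland's question about writing past the end of the device. The sketch below is only an illustration and rests on one assumption that the thread does not confirm: that the offsets printed by g_vfs_done() are byte offsets counted from the start of ar0s1g. All constants are copied from the bsdlabel output and the log lines, and the variable names are made up for the example.

```python
BYTES_PER_SECTOR = 512        # "bytes/sector" in the bsdlabel output
G_OFFSET_SECTORS = 408944640  # start of partition g within ar0s1
G_SIZE_SECTORS = 567807297    # size of partition g
SLICE_SECTORS = 976751937     # "sectors/unit", same as partition c

# (offset, length) pairs taken from the logged WRITE errors.
logged = [(290725068800, 4096), (290725072896, 2048), (290725074944, 2048)]

# Same calculation as in the message: where g starts, in MB.
g_start_mb = G_OFFSET_SECTORS * BYTES_PER_SECTOR // 2**20
g_size_bytes = G_SIZE_SECTORS * BYTES_PER_SECTOR
g_end_sector = G_OFFSET_SECTORS + G_SIZE_SECTORS

print("g starts at", g_start_mb, "MB into the slice")
print("g ends at sector", g_end_sector, "and the slice has", SLICE_SECTORS)
print("g size in bytes:", g_size_bytes)

for offset, length in logged:
    # Positive means the write would end beyond the last byte of g,
    # but only if the offset really is relative to the start of g.
    overshoot = offset + length - g_size_bytes
    print(f"offset {offset}: write ends {overshoot} bytes past the end of g")
```

Under that assumption the label itself is consistent, since g ends exactly where the slice ends, yet each logged write would end a few MB beyond the last byte of g. That is the situation Roland asked about. If the offsets are measured from some other origin, this comparison does not apply.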