I would like to get some help diagnosing permanent errors on my files. The machine in question has 12 1TB disks connected to an Areca RAID card. I installed OpenSolaris build 134 and, according to zpool history, created a pool with

$ zpool create bigraid raidz2 c4t0d0 c4t0d1 c4t0d2 c4t0d3 c4t0d4 c4t0d5 c4t0d6 c4t0d7 c4t1d0 c4t1d1 c4t1d2 c4t1d3

I then backed up 806G of files to the machine and had the backup program verify the files. It failed. The check is still running, but so far it has found 4 files where the checksum of the backup copy doesn't match the checksum of the original file. Zpool status shows problems:

$ sudo zpool status -v
  pool: bigraid
 state: DEGRADED
status: One or more devices are faulted in response to IO failures.
action: Make sure the affected devices are connected, then run 'zpool clear'.
   see: http://www.sun.com/msg/ZFS-8000-HC
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        bigraid     DEGRADED     0     0   536
          raidz2-0  DEGRADED     0     0 3.14K
            c4t0d0  ONLINE       0     0     0
            c4t0d1  ONLINE       0     0     0
            c4t0d2  ONLINE       0     0     0
            c4t0d3  ONLINE       0     0     0
            c4t0d4  ONLINE       0     0     0
            c4t0d5  ONLINE       0     0     0
            c4t0d6  ONLINE       0     0     0
            c4t0d7  ONLINE       0     0     0
            c4t1d0  ONLINE       0     0     0
            c4t1d1  ONLINE       0     0     0
            c4t1d2  ONLINE       0     0     0
            c4t1d3  DEGRADED     0     0     0  too many errors

errors: Permanent errors have been detected in the following files:

        <metadata>:<0x18>
        <metadata>:<0x3a>

So it appears that one of the disks is bad, but if only one disk failed, how would a raidz2 pool develop permanent errors? The numbers in the CKSUM column keep growing, but is that just because the backup verification is tickling the errors as it runs?

Previous postings on permanent errors said to look at fmdump -eV, but that output has 437543 lines, and I don't really know how to interpret what I see. I did check the vdev_path with

$ fmdump -eV | grep vdev_path | sort | uniq -c

to see if only certain disks were implicated, but every disk in the array is listed, albeit with different frequencies:

   2189    vdev_path = /dev/dsk/c4t0d0s0
   1077    vdev_path = /dev/dsk/c4t0d1s0
   1077    vdev_path = /dev/dsk/c4t0d2s0
   1097    vdev_path = /dev/dsk/c4t0d3s0
     25    vdev_path = /dev/dsk/c4t0d4s0
     25    vdev_path = /dev/dsk/c4t0d5s0
     20    vdev_path = /dev/dsk/c4t0d6s0
   1072    vdev_path = /dev/dsk/c4t0d7s0
   1092    vdev_path = /dev/dsk/c4t1d0s0
   2222    vdev_path = /dev/dsk/c4t1d1s0
   2221    vdev_path = /dev/dsk/c4t1d2s0
   1149    vdev_path = /dev/dsk/c4t1d3s0

What should I make of this? All the disks are bad? That seems unlikely. I found another thread

http://opensolaris.org/jive/thread.jspa?messageID=399988

where it finally came down to bad memory, so I'll test that. Any other suggestions?
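In case it helps narrow things down, I also thought about breaking the ereports down by type rather than by device. Something along these lines ought to do it (I'm assuming each ereport in the fmdump -eV output carries a "class = ereport...." line, which is what I see in mine; the rest is the same sort/uniq trick as above):

$ fmdump -eV | grep class | sort | uniq -c

That should at least show whether the bulk of those 437543 events are checksum ereports or I/O errors at the device level.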
On 04/ 4/10 10:00 AM, Willard Korfhage wrote:
> What should I make of this? All the disks are bad? That seems
> unlikely. I found another thread
>
> http://opensolaris.org/jive/thread.jspa?messageID=399988
>
> where it finally came down to bad memory, so I'll test that. Any
> other suggestions?

It could be the CPU. I had a very bizarre case where the CPU would sometimes miscalculate the checksums of certain files, mostly when it was also busy doing other things. Probably the cache. Days of running memtest and SUNWvts didn't result in any errors because this was a weirdly pattern-sensitive problem.

However, I too am of the opinion that you shouldn't even think of running zfs without ECC memory (lots of threads about that!) and that this is far, far more likely to be your problem, but I wouldn't count on diagnostics finding it, either. Of course it could be the controller too.

For laughs, the CPU calculating bad checksums was discussed in

http://opensolaris.org/jive/message.jspa?messageID=469108

(see the last message in the thread). If you are seriously contemplating using a system with non-ECC RAM, check out the Google research mentioned in

http://opensolaris.org/jive/thread.jspa?messageID=423770
http://www.cs.toronto.edu/%7Ebianca/papers/sigmetrics09.pdf

Cheers
--
Frank
Yeah, this morning I concluded I really should be running ECC RAM. I sometimes wonder why people don't run ECC RAM more often. I remember a decade ago, when RAM was much, much less dense, people fretted about alpha particles randomly flipping bits, but that concern seems to have died down. I know, of course, there is some added expense, but browsing on Newegg, the additional RAM cost is pretty minimal: I see 2GB ECC sticks going for about $12 more than similar non-ECC sticks. It's the motherboards that can handle ECC that are the expensive part. Now I've got to find a good motherboard for a file server.
Looks like it was RAM. I ran memtest86+ 4.00, and it found no problems. However, I removed 2 of the 3 sticks of RAM, ran a backup, and had no errors. I'm running more extensive tests, but it looks like that was it. A new motherboard, CPU and ECC RAM are on the way to me now.
On Sun, Apr 04, 2010 at 11:46:16PM -0700, Willard Korfhage wrote:
> Looks like it was RAM. I ran memtest86+ 4.00, and it found no problems.

Then why do you suspect the RAM?

Especially with 12 disks, another likely candidate is an overloaded power supply. While there may be problems showing up in RAM, they may only happen under the combined load of disk, CPU and memory activity that brings the system into marginal power conditions. Sometimes it may be just one rail that is out of bounds while other devices are unaffected. If memtest didn't find any problems without the disk and CPU load, that tends to support this hypothesis.

So the memory may not be "bad" per se, though it's still not ECC and therefore not "good" either :-) Perhaps you can still find a good use for it elsewhere.

> However, I removed 2 of the 3 sticks of RAM, ran a backup, and had no
> errors. I'm running more extensive tests, but it looks like that was
> it. A new motherboard, CPU and ECC RAM are on the way to me now.

Switching to ECC is a good thing, but be prepared for possible continued issues (with different detection, thanks to ECC) if the root cause is the PSU. In fact, ECC memory may draw marginally more power and could make the problem worse (the new CPU and motherboard could go either way, depending on your choices).

--
Dan.
It certainly has symptoms that match a marginal power supply, but I measured the power consumption some time ago and found it comfortably within the power supply's capacity. I've also wondered whether the RAM is fine and there is just some kind of flaky interaction between the RAM configuration I had and the motherboard.
On Mon, Apr 5, 2010 at 9:39 PM, Willard Korfhage <opensolaris at familyk.org> wrote:

> It certainly has symptoms that match a marginal power supply, but I
> measured the power consumption some time ago and found it comfortably
> within the power supply's capacity. I've also wondered whether the RAM
> is fine and there is just some kind of flaky interaction between the
> RAM configuration I had and the motherboard.

I think the confusion is that you said you ran memtest86+ and the memory tested just fine. Did you remove some memory before running memtest86+ and narrow it down to a certain stick being bad, or something? Your post makes it sound as though you found that all of the RAM is working perfectly fine, i.e., that it's not the problem.

Also, a low power draw doesn't mean much of anything. The power supply could just be dying. Load wouldn't really matter in that scenario (although a high load will generally help it out the door a bit quicker due to higher heat, etc.).

--Tim
On Mon, Apr 05, 2010 at 09:46:58PM -0500, Tim Cook wrote:
> On Mon, Apr 5, 2010 at 9:39 PM, Willard Korfhage <opensolaris at familyk.org> wrote:
>
> > It certainly has symptoms that match a marginal power supply, but I
> > measured the power consumption some time ago and found it comfortably
> > within the power supply's capacity. I've also wondered whether the RAM
> > is fine and there is just some kind of flaky interaction between the
> > RAM configuration I had and the motherboard.
>
> I think the confusion is that you said you ran memtest86+ and the memory
> tested just fine. Did you remove some memory before running memtest86+
> and narrow it down to a certain stick being bad, or something? Your post
> makes it sound as though you found that all of the RAM is working
> perfectly fine.

Exactly.

> Also, a low power draw doesn't mean much of anything. The power supply
> could just be dying.

Or just one part of it could be overloaded (like a particular 5v or 12v rail that happens to be shared between too many drives and the m/b), even if the overall draw at the wall is less than the total rating. Sometimes, just moving plugs around can help, or at least show that a better PSU is warranted.

--
Dan.
Memtest didn't show any errors, but between Frank, early in the thread, saying that he had found memory errors memtest didn't catch, and the removal of DIMMs apparently fixing the problem, I jumped too quickly to the conclusion that it was the memory. Certainly there are other explanations.

I see that I have a spare Corsair 620W power supply that I could try; the one in there now is also a Corsair of some wattage. If I recall correctly, the steady-state power draw is between 150 and 200 watts.

By the way, I see that one of the disks is now listed as degraded (too many errors). Is there a good way to identify exactly which physical disk it is?
On Mon, Apr 05, 2010 at 09:35:21PM -0700, Willard Korfhage wrote:
> By the way, I see that one of the disks is now listed as degraded (too
> many errors). Is there a good way to identify exactly which physical
> disk it is?

It's hidden in iostat -E, of all places.

--
Dan.
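A rough example, from memory, so double-check it against your build (c4t1d3 here is just the device your zpool status output flagged):

$ iostat -En c4t1d3

-E prints the per-device error counters along with vendor, product, revision and serial number; -n makes iostat use the cXtYdZ names, so the operand above matches the name zpool uses.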
On Tue, Apr 6, 2010 at 12:24 AM, Daniel Carosone <dan at geek.com.au> wrote:

> On Mon, Apr 05, 2010 at 09:35:21PM -0700, Willard Korfhage wrote:
> > By the way, I see that one of the disks is now listed as degraded (too
> > many errors). Is there a good way to identify exactly which physical
> > disk it is?
>
> It's hidden in iostat -E, of all places.

I think he wants to know how to identify which physical drive maps to the dev ID in Solaris. The only way I can think of is to run something like dd against the drive to light up the activity LED.

--Tim
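Something like this, roughly (untested from here; c4t1d3s0 is just the slice your earlier fmdump output showed for the suspect disk, and a plain read like this is harmless to the pool):

$ dd if=/dev/rdsk/c4t1d3s0 of=/dev/null bs=1024k

Let it run for a bit, note which activity LED stays solidly lit, then Ctrl-C it.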
On Tue, Apr 06, 2010 at 12:29:35AM -0500, Tim Cook wrote:
> On Tue, Apr 6, 2010 at 12:24 AM, Daniel Carosone <dan at geek.com.au> wrote:
>
> > On Mon, Apr 05, 2010 at 09:35:21PM -0700, Willard Korfhage wrote:
> > > By the way, I see that one of the disks is now listed as degraded (too
> > > many errors). Is there a good way to identify exactly which physical
> > > disk it is?
> >
> > It's hidden in iostat -E, of all places.
>
> I think he wants to know how to identify which physical drive maps to
> the dev ID in Solaris. The only way I can think of is to run something
> like dd against the drive to light up the activity LED.

Or look at the serial numbers printed in iostat -E.

--
Dan.
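For the whole array in one go, something along these lines (assuming the controller actually passes the drives' serial numbers through, which not every RAID card does):

$ iostat -En | egrep 'c4t|Serial'

That should pair each cXtYdZ name with its Serial No: field, which you can then match against the stickers on the drives.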
On Tue, Apr 6, 2010 at 12:47 AM, Daniel Carosone <dan at geek.com.au> wrote:

> On Tue, Apr 06, 2010 at 12:29:35AM -0500, Tim Cook wrote:
> > On Tue, Apr 6, 2010 at 12:24 AM, Daniel Carosone <dan at geek.com.au> wrote:
> >
> > > On Mon, Apr 05, 2010 at 09:35:21PM -0700, Willard Korfhage wrote:
> > > > By the way, I see that one of the disks is now listed as degraded
> > > > (too many errors). Is there a good way to identify exactly which
> > > > physical disk it is?
> > >
> > > It's hidden in iostat -E, of all places.
> >
> > I think he wants to know how to identify which physical drive maps to
> > the dev ID in Solaris. The only way I can think of is to run something
> > like dd against the drive to light up the activity LED.
>
> Or look at the serial numbers printed in iostat -E.

And then what? Cross your fingers and hope you pull the right drive on the first go? I don't know of any drives that come from the factory in a hot-swap bay with the serial number printed on the front of the caddy.

--Tim
Yes, I was hoping to find the serial numbers. Unfortunately, iostat -E doesn't show any serial numbers for the disks attached to the Areca RAID card.
On 6/04/10 11:47 PM, Willard Korfhage wrote:
> Yes, I was hoping to find the serial numbers. Unfortunately, iostat -E
> doesn't show any serial numbers for the disks attached to the Areca
> RAID card.

You'll need to reboot and go into the card BIOS to get that information.

James C. McPherson
--
Senior Software Engineer, Solaris
Oracle
http://www.jmcp.homeunix.com/blog
Willard Korfhage wrote:
> Yes, I was hoping to find the serial numbers. Unfortunately, iostat -E
> doesn't show any serial numbers for the disks attached to the Areca
> RAID card.

Does Areca provide any Solaris tools that will show you the drive info?

If you are using the Areca in JBOD mode, smartctl will frequently show serial numbers that iostat -E will not (iostat appears to be really stupid about getting serial numbers compared to just about any other tool out there).

--
Carson
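A sketch of what I'd try first; I haven't tested this against an Areca, and the right -d option (or whether you need one at all) depends on how the card presents the disks to Solaris, so treat it as a starting point rather than a recipe. The device path is just one of the disks from your earlier zpool output:

$ smartctl -i /dev/rdsk/c4t1d3s0

The -i output includes the model and serial number if the inquiry gets through to the disk; if it doesn't, try adding -d sat, or look up the Areca-specific device type in the smartmontools documentation.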