I've sent this to the driver list as well, but since the zfs folks tend to be intimately involved with the marvell driver stack, I figured I'd give you guys a shot too. Does anyone happen to know if there was a driver change with build 126? I had a pool made of two 5+1 raidz vdevs. I moved all the data off temporarily, changed it to one 10+2 raidz2 vdev, and am in the process of moving all the data back. I've had two drives "fail" in the last 3 hours that had been running fine for over a year and presented absolutely no issues moving the data out of the original zpool. My first inclination is that this is a driver issue. I'm currently running 2x Marvell SAT2-MV8 SATA controllers: 6 disks on the first controller, 7 on the second (one hot spare).

zpool status

  pool: fserv
 state: DEGRADED
status: One or more devices has been removed by the administrator.
        Sufficient replicas exist for the pool to continue functioning in a
        degraded state.
action: Online the device using 'zpool online' or replace the device with
        'zpool replace'.
 scrub: resilver completed after 1h38m with 0 errors on Sun Nov  1 18:42:16 2009
config:

        NAME          STATE     READ WRITE CKSUM
        fserv         DEGRADED     0     0     0
          raidz2-0    DEGRADED     0     0     0
            c8t0d0    ONLINE       0     0     0
            c8t1d0    ONLINE       0     0     0
            spare-2   DEGRADED     0     0 2.83M
              c8t2d0  REMOVED      0     0     0
              c7t6d0  ONLINE       0     0     0  35.6G resilvered
            c8t3d0    ONLINE       0     0     0
            c8t4d0    ONLINE       0     0     0
            c8t5d0    ONLINE       0     0     0
            c7t0d0    ONLINE       0     0     0
            c7t1d0    ONLINE       0     0     0
            c7t2d0    ONLINE       0     0     0
            c7t3d0    ONLINE       0     0     0
            c7t4d0    REMOVED      0     0     0
            c7t5d0    ONLINE       0     0     0
        spares
          c7t6d0      INUSE     currently in use

Nov  1 16:21:34 fserv sata: [ID 801593 kern.warning] WARNING: /pci@7c,0/pci10de,376@a/pci1033,125@0,1/pci11ab,11ab@6:
Nov  1 16:21:34 fserv   SATA device at port 2 - device failed
Nov  1 16:21:34 fserv scsi: [ID 107833 kern.warning] WARNING: /pci@7c,0/pci10de,376@a/pci1033,125@0,1/pci11ab,11ab@6/disk@2,0 (sd26):
Nov  1 16:21:34 fserv   Command failed to complete...Device is gone
Nov  1 16:21:34 fserv scsi: [ID 107833 kern.warning] WARNING: /pci@7c,0/pci10de,376@a/pci1033,125@0,1/pci11ab,11ab@6/disk@2,0 (sd26):
Nov  1 16:21:34 fserv   drive offline
Nov  1 16:21:34 fserv scsi: [ID 107833 kern.warning] WARNING: /pci@7c,0/pci10de,376@a/pci1033,125@0,1/pci11ab,11ab@6/disk@2,0 (sd26):
Nov  1 16:21:34 fserv   SYNCHRONIZE CACHE command failed (5)
[the same "drive offline" warning for sd26 then repeats four times at 16:21:34 and four times at 16:21:40]
Nov  1 17:03:38 fserv marvell88sx: [ID 268337 kern.warning] WARNING: marvell88sx2: device on port 4 failed to reset
Nov  1 17:04:08 fserv sata: [ID 801593 kern.warning] WARNING: /pci@7c,0/pci10de,376@a/pci1033,125@0/pci11ab,11ab@4:
Nov  1 17:04:08 fserv   SATA device at port 4 - device failed
Nov  1 17:04:08 fserv scsi: [ID 107833 kern.warning] WARNING: /pci@7c,0/pci10de,376@a/pci1033,125@0/pci11ab,11ab@4/disk@4,0 (sd30):
Nov  1 17:04:08 fserv   Command failed to complete...Device is gone
Nov  1 17:04:08 fserv scsi: [ID 107833 kern.warning] WARNING: /pci@7c,0/pci10de,376@a/pci1033,125@0/pci11ab,11ab@4/disk@4,0 (sd30):
Nov  1 17:04:08 fserv   drive offline
Nov  1 17:04:09 fserv scsi: [ID 107833 kern.warning] WARNING: /pci@7c,0/pci10de,376@a/pci1033,125@0/pci11ab,11ab@4/disk@4,0 (sd30):
Nov  1 17:04:09 fserv   SYNCHRONIZE CACHE command failed (5)
[the same "drive offline" warning for sd30 then repeats four times at 17:04:09]
Nov  1 18:31:59 fserv scsi: [ID 107833 kern.warning] WARNING: /pci@7c,0/pci10de,376@a/pci1033,125@0,1/pci11ab,11ab@6/disk@2,0 (sd26):
Nov  1 18:31:59 fserv   SYNCHRONIZE CACHE command failed (5)
[the same SYNCHRONIZE CACHE failure for sd26 repeats at 18:32:11, 18:35:00, 18:35:12, 18:35:21, and 18:38:36]
Nov  1 21:06:31 fserv pcplusmp: [ID 805372 kern.info] pcplusmp: ide (ata) instance 2 irq 0xe vector 0x44 ioapic 0x4 intin 0xe is bound to cpu 3
Nov  1 21:06:31 fserv pcplusmp: [ID 805372 kern.info] pcplusmp: ide (ata) instance 3 irq 0xf vector 0x44 ioapic 0x4 intin 0xf is bound to cpu 0
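P.S. In case it helps anyone spot a pattern, the error telemetry behind these messages can be pulled with the standard tools (nothing exotic here; the device name below is just one of mine as an example):

  fmdump -eV                 (raw FMA error reports, with timestamps and device paths)
  iostat -En c8t2d0          (per-device soft/hard/transport error counters)
  cfgadm -al                 (current attachment state of each SATA port)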
I have the same card and may have seen the same problem. Yesterday I upgraded to b126 and started to migrate all my data to an 8-disk raidz2 connected to such a card, and suddenly ZFS reported checksum errors. I thought the drives were faulty, but you suggest the problem could have been the driver? I also noticed that one of the drives had resilvered a small amount, just like yours.

I am now back on b125 and there are no checksum errors. So, is there a bug in the new b126 driver?
On Mon, Nov 2, 2009 at 6:34 AM, Orvar Korvar <knatte_fnatte_tjatte at yahoo.com> wrote:

> I have the same card and may have seen the same problem. Yesterday I
> upgraded to b126 and started to migrate all my data to an 8-disk raidz2
> connected to such a card, and suddenly ZFS reported checksum errors. I
> thought the drives were faulty, but you suggest the problem could have
> been the driver? I also noticed that one of the drives had resilvered a
> small amount, just like yours.
>
> I am now back on b125 and there are no checksum errors. So, is there a
> bug in the new b126 driver?

Can any of you Sun folks comment on this?

--Tim
No one has noticed this?
> Nov 1 16:21:34 fserv   Command failed to complete...Device is gone
> Nov 1 17:04:08 fserv   Command failed to complete...Device is gone

Kinda looks like a drive firmware or cable issue... if it were a driver issue, you would more likely see a lost command or a reset for phase resync.

> driver change with build 126?

Not for the SATA framework, but for HBAs there is:
http://hub.opensolaris.org/bin/view/Community+Group+on/2009093001

Rob
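P.S. If you want to rule the driver binary itself in or out, a quick sketch (assuming you still have a b125 boot environment; the BE name here is an example, mounted at /mnt) is to compare the two modules directly:

  modinfo | grep -i marvell
  beadm mount snv_125 /mnt
  digest -a md5 /kernel/drv/amd64/marvell88sx /mnt/kernel/drv/amd64/marvell88sx

Identical checksums would point away from marvell88sx itself and toward something else in the stack (or the hardware).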
Right now I do not dare to use builds later than 125, because the problem showed up in b126. Maybe a coincidence, maybe not. But I think it is best not to use b126 or later until someone has confirmed there are no driver changes.

So, to confirm: there are no driver changes in b126 for marvell88sx2, right? So I should be able to use b126 and later safely?
On Fri, Nov 6, 2009 at 2:10 PM, Orvar Korvar <knatte_fnatte_tjatte at yahoo.com> wrote:

> Right now I do not dare to use builds later than 125, because the problem
> showed up in b126. Maybe a coincidence, maybe not. But I think it is best
> not to use b126 or later until someone has confirmed there are no driver
> changes.
>
> So, to confirm: there are no driver changes in b126 for marvell88sx2,
> right? So I should be able to use b126 and later safely?

Let me know what your results are if you decide to upgrade. I've already replaced both drives that were having issues. I'll do cables later, but I'm still having a hard time believing my cables magically went bad right when I upgraded to build 126. The new drives, a different brand and model, have the same issues the old drives did. And from what I can tell, I'm getting checksum errors through the roof on the replace as well...

  pool: fserv
 state: DEGRADED
status: One or more devices could not be opened.  Sufficient replicas exist
        for the pool to continue functioning in a degraded state.
action: Attach the missing device and online it using 'zpool online'.
   see: http://www.sun.com/msg/ZFS-8000-2Q
 scrub: resilver in progress for 0h34m, 22.60% done, 1h57m to go
config:

        NAME                        STATE     READ WRITE CKSUM
        fserv                       DEGRADED     0     0     0
          raidz2-0                  DEGRADED     0     0     0
            c8t0d0                  ONLINE       0     0     0
            c8t1d0                  ONLINE       0     0     0
            spare-2                 DEGRADED     0     0     0
              14340903866396142118  UNAVAIL      0     0     0  was /dev/dsk/c8t2d0s0
              c7t6d0                ONLINE       0     0     0
            c8t3d0                  REMOVED      0     0     0
            c8t4d0                  ONLINE       0     0     0
            c8t5d0                  ONLINE       0     0     0
            c7t0d0                  ONLINE       0     0     0
            c7t1d0                  ONLINE       0     0     0
            c7t2d0                  ONLINE       0     0     0
            c7t3d0                  ONLINE       0     0     0
            replacing-10            DEGRADED     0     0  816K
              15401866802517339500  FAULTED      0     0     0  was /dev/dsk/c7t4d0s0/old
              c7t4d0                ONLINE       0     0     0  52.3G resilvered
            c7t5d0                  ONLINE       0     0     0
        spares
          c7t6d0                    INUSE     currently in use

--Tim
Ok, so you changed drives and you still see errors? Are the drives brand new or used? What kind of drives, which brand? 2TB? And if you reboot into an earlier build such as b125, you don't see any errors, right?

Right now I am running b125. I don't dare run b126 if your observation is correct. Could you just rip the drivers out of b125? I could post the drivers here for you, if you tell me which files you need. Then you can see whether it is the drivers causing the problem or not.
On Sat, Nov 7, 2009 at 4:27 AM, Orvar Korvar <knatte_fnatte_tjatte at yahoo.com> wrote:

> Ok, so you changed drives and you still see errors? Are the drives brand
> new or used? What kind of drives, which brand? 2TB? And if you reboot into
> an earlier build such as b125, you don't see any errors, right?

Brand new. I've tried both 1TB Hitachi and 1.5TB Seagate (not the "bad" ones).

I can't boot into an older version because the last version I had was b118, which doesn't have zfs version 19 support. I've been looking to see if there's a way to downgrade via IPS, but that's turned up a lot of nothing.

> Right now I am running b125. I don't dare run b126 if your observation is
> correct. Could you just rip the drivers out of b125? I could post the
> drivers here for you, if you tell me which files you need. Then you can
> see whether it is the drivers causing the problem or not.

It's tough to say what exactly is causing the problems. I would imagine ripping something like sd from the older version would break more than it would fix.

--Tim
Hi Tim and all,

I believe you are saying that marvell88sx2 driver error messages started in build 126, along with new disk errors in RAIDZ pools.

Is this correct? If so, please send me the following information:

1. Hardware you are running

2. If you are also seeing new disk errors in your RAIDZ pools, include your zpool status output.

I'm not the right person to be diagnosing driver-level issues, but I will investigate.

Thanks,

Cindy

----- Original Message -----
From: Tim Cook <tim at cook.ms>
Date: Saturday, November 7, 2009 10:08 am
Subject: Re: [zfs-discuss] marvell88sx2 driver build126
To: Orvar Korvar <knatte_fnatte_tjatte at yahoo.com>
Cc: zfs-discuss at opensolaris.org

> Brand new. I've tried both 1TB Hitachi and 1.5TB Seagate (not the "bad"
> ones).
>
> I can't boot into an older version because the last version I had was
> b118, which doesn't have zfs version 19 support. I've been looking to see
> if there's a way to downgrade via IPS, but that's turned up a lot of
> nothing.
>
> It's tough to say what exactly is causing the problems. I would imagine
> ripping something like sd from the older version would break more than
> it would fix.
>
> --Tim
On Sat, Nov 7, 2009 at 12:02 PM, Cindy Swearingen <Cindy.Swearingen at sun.com> wrote:

> Hi Tim and all,
>
> I believe you are saying that marvell88sx2 driver error messages started
> in build 126, along with new disk errors in RAIDZ pools.
>
> Is this correct? If so, please send me the following information:

Yes.

> 1. Hardware you are running

Motherboard: Supermicro MBD-H8DAE-2-O
2 x AMD Opteron 22xx CPUs (I forget the exact model; they're 2010MHz)
8GB Crucial ECC DDR2 memory
2 x Supermicro AOC-SAT2-MV8 SATA adapters
Supermicro SC932T-R760B case with a 15x SATA passthrough backplane

I also have an nvidia video card in it, but I'm not sure of the model, and doubt it has any role in this troubleshooting.

> 2. If you are also seeing new disk errors in your RAIDZ pools, include
> your zpool status output.

Well, I can give you a current one, but I've done about a hundred things troubleshooting, so it isn't representative of what the issues were a few days ago. I'm still trying to figure out why it's choking on any drive I put into c8t2d0. It's stopped generating errors on c7t4d0, but I haven't changed a thing with that slot outside of stopping the zpool replace and restarting it a few times... which is also extremely odd to me.

r00t@fserv:~$ zpool status
  pool: fserv
 state: DEGRADED
status: One or more devices could not be opened.  Sufficient replicas exist
        for the pool to continue functioning in a degraded state.
action: Attach the missing device and online it using 'zpool online'.
   see: http://www.sun.com/msg/ZFS-8000-2Q
 scrub: resilver completed after 2h53m with 0 errors on Fri Nov  6 22:09:08 2009
config:

        NAME                        STATE     READ WRITE CKSUM
        fserv                       DEGRADED     0     0     0
          raidz2-0                  DEGRADED     0     0     0
            c8t0d0                  ONLINE       0     0     0
            c8t1d0                  ONLINE       0     0     0
            spare-2                 DEGRADED     0     0     0
              14340903866396142118  UNAVAIL      0     0     0  was /dev/dsk/c8t2d0s0
              c7t6d0                ONLINE       0     0     0
            c8t3d0                  ONLINE       0     0     0  2.68G resilvered
            c8t4d0                  ONLINE       0     0     0
            c8t5d0                  ONLINE       0     0     0
            c7t0d0                  ONLINE       0     0     0
            c7t1d0                  ONLINE       0     0     0
            c7t2d0                  ONLINE       0     0     0
            c7t3d0                  ONLINE       0     0     0
            c7t4d0                  ONLINE       0     0     0  231G resilvered
            c7t5d0                  ONLINE       0     0     0
        spares
          c7t6d0                    INUSE     currently in use

errors: No known data errors
I saw the same checksum error problem when I booted into b126. I haven't dared try b126 again; I use b125 now, without problems. Here is my hardware:

Intel Q9450 + P45 Gigabyte EP45-DS3P motherboard + ATI 4850

I have the same AOC SATA controller card, and some Samsung Spinpoint F1 1TB drives. Brand new.
"I can''t boot into an older version because the last version I had was b118 which doesn''t have zfs version 19 support. I''ve been looking to see if there''s a way to downgrade via IPS but that''s turned up a lot of nothing." If someone can tell me which files are needed for the driver I can extract them from my b125 and post them here for you, so you can try out. Then we can know if the problem is in b126 drivers or not. If b125 drivers work, we know the problem is in b126. Otherwise there might be some other problem. Another solution could be that you install SCXE b125. There are links to that DVD b125. And from SCXE you can upgrade to later Opensolaris builds. I think. Is it possible to upgrade to a specific build via IPS? When I use the Update Manager, I always upgrade to the latest build. Can I target, say, bXXX? Or is the only way to get bXXX, by installing SXCE? -- This message posted from opensolaris.org
> "I can''t boot into an older version because the last > version I had was b118 which doesn''t have zfs version > 19 support. I''ve been looking to see if there''s a > way to downgrade via IPS but that''s turned up a lot > of nothing." > > If someone can tell me which files are needed for the > driver I can extract them from my b125 and post them > here for you, so you can try out. Then we can know if > the problem is in b126 drivers or not. If b125 > drivers work, we know the problem is in b126. > Otherwise there might be some other problem. > > Another solution could be that you install SCXE b125. > There are links to that DVD b125. And from SCXE you > can upgrade to later Opensolaris builds. I think. > > Is it possible to upgrade to a specific build via > IPS? When I use the Update Manager, I always upgrade > to the latest build. Can I target, say, bXXX? Or is > the only way to get bXXX, by installing SXCE?Here are some notes i stole from the list earlier. I think they might be on a wiki somewhere now, but it seems relatively easy to upgrade to a specific version: Starting from OpenSolaris 2009.06 (snv_111b) active BE. 1) beadm create snv_111b-dev 2) beadm activate snv_111b-dev 3) reboot 4) pkg set-authority -O http://pkg.opensolaris.org/dev opensolaris.org 5) pkg install SUNWipkg 6) pkg list ''entire*'' 7) beadm create snv_118 8) beadm mount snv_118 /mnt 9) pkg -R /mnt refresh 10) pkg -R /mnt install entire at 0.5.11-0.118 11) bootadm update-archive -R /mnt 12) beadm umount snv_118 13) beadm activate snv_118 14) reboot Now you have a snv_118 development environment. -- This message posted from opensolaris.org
Great! So if I want another build, for instance b125, I just change step 10?

10) pkg -R /mnt install entire@0.5.11-0.125

Yes?

What is this "0.5.11" thing? Should that be changed too, if I try to install b125? Like "0.5.12-0.125"?
On Sun, Nov 8, 2009 at 9:47 AM, Orvar Korvar <knatte_fnatte_tjatte at yahoo.com> wrote:

> Great! So if I want another build, for instance b125, I just change step 10?
>
> 10) pkg -R /mnt install entire@0.5.11-0.125
>
> Yes?
>
> What is this "0.5.11" thing? Should that be changed too, if I try to
> install b125? Like "0.5.12-0.125"?

No. That's the SunOS version number, and you should always use 0.5.11 for anything in OpenSolaris today. Solaris 10 = SunOS 5.10, OpenSolaris = SunOS 5.11, Solaris 9 = SunOS 5.9, etc.

http://en.wikipedia.org/wiki/Solaris_%28operating_system%29

--Tim
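P.S. Putting it together, the whole sequence for b125 would look like this (an untested sketch based on the steps above, assuming the dev repository still publishes that build):

  beadm create snv_125
  beadm mount snv_125 /mnt
  pkg -R /mnt refresh
  pkg -R /mnt install entire@0.5.11-0.125
  bootadm update-archive -R /mnt
  beadm umount snv_125
  beadm activate snv_125

Then reboot into the new BE.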
I think you can work out the files for the driver by looking here:

http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/pkgdefs/SUNWmv88sx/prototype_i386

So the 32-bit driver is:

kernel/drv/marvell88sx

And the 64-bit driver is:

kernel/drv/amd64/marvell88sx

It's a pity that the marvell driver is not open source. For the sata drivers that are open source (ahci, nv_sata, si3124), you can see the history of all the changes to the source code, all cross-referenced to the bug numbers, using OpenGrok:

http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/common/io/sata/adapters/

Regards
Nigel Smith
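P.S. If you do get hold of the b125 binary, one way to try it on a b126 system (a sketch only; back up the originals first, and the source path below is a placeholder) would be:

  cp /kernel/drv/amd64/marvell88sx /kernel/drv/amd64/marvell88sx.b126
  cp /path/to/b125/marvell88sx /kernel/drv/amd64/marvell88sx
  bootadm update-archive
  reboot

The 32-bit module under /kernel/drv would need the same treatment on a 32-bit kernel.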
Ok, here I have attached the 64-bit variant. You can try it if you wish and see if the checksum errors disappear.

[attachment: marvell88sx, application/octet-stream, 100256 bytes]
This is from build 125.
Hi,

I can't find any bug-related issues with marvell88sx2 in b126.

I looked over Dave Hollister's shoulder while he searched for marvell in his webrevs of this putback and nothing came up:

> > driver change with build 126?
>
> Not for the SATA framework, but for HBAs there is:
> http://hub.opensolaris.org/bin/view/Community+Group+on/2009093001

I will find a thumper, load build 125, create a raidz pool, and upgrade to b126.

I'll also send the error messages that Tim provided to someone who works in the driver group.

Thanks,

Cindy

On 11/07/09 14:33, Orvar Korvar wrote:
> I saw the same checksum error problem when I booted into b126. I haven't
> dared try b126 again; I use b125 now, without problems. Here is my hardware:
> Intel Q9450 + P45 Gigabyte EP45-DS3P motherboard + ATI 4850
> I have the same AOC SATA controller card, and some Samsung Spinpoint F1
> 1TB drives. Brand new.
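P.S. In case it is useful, here is the rough shape of the repro I have in mind, as a sketch (pool and disk names below are just examples, not the thumper's real devices):

  (on build 125)
  zpool create tank raidz2 c1t0d0 c1t1d0 c1t2d0 c1t3d0 c1t4d0
  (populate with data, then upgrade the BE to b126 and reboot)
  zpool scrub tank
  zpool status -v tank

Any CKSUM counts that show up only under b126 would confirm what you are both seeing.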
On Mon, Nov 9, 2009 at 2:51 PM, Cindy Swearingen <Cindy.Swearingen at sun.com> wrote:

> Hi,
>
> I can't find any bug-related issues with marvell88sx2 in b126.
>
> I looked over Dave Hollister's shoulder while he searched for marvell in
> his webrevs of this putback and nothing came up:
>
> > > driver change with build 126?
> >
> > Not for the SATA framework, but for HBAs there is:
> > http://hub.opensolaris.org/bin/view/Community+Group+on/2009093001
>
> I will find a thumper, load build 125, create a raidz pool, and upgrade
> to b126.
>
> I'll also send the error messages that Tim provided to someone who works
> in the driver group.
>
> Thanks,
>
> Cindy

I tried the build 125 driver and it didn't make a difference. The odd part I've just noticed is that it's port 4 on both cards that has been giving me issues. I guess it's possible it's just a coincidence/bad luck. I've grabbed the b125 ISO from genunix and am going to try booting off the livecd to see if it produces different results.

--Tim
Does this mean that there are no driver changes in marvell88sx2 between b125 and b126? If there are no driver changes, then it means we were both extremely unlucky with our drives, because we both had checksum errors? And my discs were brand new.

How probable is this? Something is weird here. What is your opinion on this? Should we agree that there was a hardware error, and it was just a coincidence?
Hi Orvar,

Correct, I don't see any marvell88sx2 driver changes between b125 and b126.

So far, only you and Tim are reporting these issues. Generally, we see bugs filed by the internal test teams if they see similar problems.

I will try to reproduce the RAIDZ checksum errors separately from the marvell88sx2 issue.

Thanks,

Cindy

On 11/10/09 02:25, Orvar Korvar wrote:
> Does this mean that there are no driver changes in marvell88sx2 between
> b125 and b126? If there are no driver changes, then it means we were both
> extremely unlucky with our drives, because we both had checksum errors?
> And my discs were brand new.
>
> How probable is this? Something is weird here. What is your opinion on
> this? Should we agree that there was a hardware error, and it was just a
> coincidence?
On Nov 10, 2009, at 1:25 AM, Orvar Korvar wrote:

> Does this mean that there are no driver changes in marvell88sx2 between
> b125 and b126? If there are no driver changes, then it means we were both
> extremely unlucky with our drives, because we both had checksum errors?
> And my discs were brand new.

There are other drivers in the software stack that may have changed.
 -- richard

> How probable is this? Something is weird here. What is your opinion on
> this? Should we agree that there was a hardware error, and it was just a
> coincidence?
On Tue, Nov 10, 2009 at 10:55 AM, Richard Elling <richard.elling at gmail.com> wrote:

> On Nov 10, 2009, at 1:25 AM, Orvar Korvar wrote:
>
>> Does this mean that there are no driver changes in marvell88sx2 between
>> b125 and b126? If there are no driver changes, then it means we were both
>> extremely unlucky with our drives, because we both had checksum errors?
>> And my discs were brand new.
>
> There are other drivers in the software stack that may have changed.
>  -- richard
>
>> How probable is this? Something is weird here. What is your opinion on
>> this? Should we agree that there was a hardware error, and it was just a
>> coincidence?

So... I do appear to have reached somewhat of a truce with the system and b126 at the moment. I'm now going through and replacing the last of my old Maxtor 300GB drives with brand-new Hitachi 1TB drives. One thing I'm noticing is a lot of checksum errors being generated during the resilver. Is this normal? Furthermore, since I see "no known data errors", is it safe to assume it's all being corrected and I'm not losing any data? I still do have a separate copy of this data on a box at work that should be completely consistent... but I will need to re-purpose that storage soon, and will be without a known-good backup for a while (I know, I know). I'd rather do a fresh zfs send/receive than find out 6 months from now that I lost something.

  pool: fserv
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
 scrub: resilver in progress for 0h8m, 0.89% done, 15h14m to go
config:

        NAME                        STATE     READ WRITE CKSUM
        fserv                       DEGRADED     0     0     0
          raidz2-0                  DEGRADED     0     0     0
            c8t0d0                  ONLINE       0     0     0
            c8t1d0                  ONLINE       0     0     0
            c8t2d0                  ONLINE       0     0     0
            c8t3d0                  ONLINE       0     0     0
            c8t4d0                  ONLINE       0     0     0
            c8t5d0                  ONLINE       0     0     0
            c7t0d0                  ONLINE       0     0     0
            c7t1d0                  ONLINE       0     0     0
            c7t2d0                  ONLINE       0     0     0
            replacing-9             DEGRADED     0     0  161K
              14274451003165180679  FAULTED      0     0     0  was /dev/dsk/c7t3d0s0/old
              c7t3d0                ONLINE       0     0     0  2.05G resilvered
            c7t4d0                  ONLINE       0     0     0
            c7t5d0                  ONLINE       0     0     0
        spares
          c7t6d0                    AVAIL

errors: No known data errors

--Tim
On Tue, Nov 10, 2009 at 5:15 PM, Tim Cook <tim at cook.ms> wrote:

> One thing I'm noticing is a lot of checksum errors being generated during
> the resilver. Is this normal? Furthermore, since I see "no known data
> errors", is it safe to assume it's all being corrected and I'm not losing
> any data?

Anyone? It's up to 7.35M checksum errors, and it's rebuilding extremely slowly (as evidenced by the 10-hour estimate). The errors are only showing on the "replacing-9" line, not the individual drive.

  pool: fserv
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
 scrub: resilver in progress for 6h56m, 39.61% done, 10h34m to go
config:

        NAME                        STATE     READ WRITE CKSUM
        fserv                       DEGRADED     0     0     0
          raidz2-0                  DEGRADED     0     0     0
            c8t0d0                  ONLINE       0     0     0
            c8t1d0                  ONLINE       0     0     0
            c8t2d0                  ONLINE       0     0     0
            c8t3d0                  ONLINE       0     0     0
            c8t4d0                  ONLINE       0     0     0
            c8t5d0                  ONLINE       0     0     0
            c7t0d0                  ONLINE       0     0     0
            c7t1d0                  ONLINE       0     0     0
            c7t2d0                  ONLINE       0     0     0
            replacing-9             DEGRADED     0     0 7.35M
              14274451003165180679  FAULTED      0     0     0  was /dev/dsk/c7t3d0s0/old
              c7t3d0                ONLINE       0     0     0  91.9G resilvered
            c7t4d0                  ONLINE       0     0     0
            c7t5d0                  ONLINE       0     0     0
        spares
          c7t6d0                    AVAIL

errors: No known data errors

--Tim
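P.S. While I wait, I'm keeping an eye on it with the standard commands (the pool name is mine from above):

  zpool iostat -v fserv 5    (per-vdev I/O, refreshed every 5 seconds)
  zpool status -v fserv      (resilver progress, plus any files with unrecoverable errors)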
Other drivers in the stack? Which drivers? And have any of them been changed between b125 and b126?
From: rwalists at washdcmail.com (2009-Nov-11 13:24 UTC)
On Nov 11, 2009, at 12:01 AM, Tim Cook wrote:

> On Tue, Nov 10, 2009 at 5:15 PM, Tim Cook <tim at cook.ms> wrote:
>> One thing I'm noticing is a lot of checksum errors being generated
>> during the resilver. Is this normal?
>
> Anyone? It's up to 7.35M checksum errors, and it's rebuilding extremely
> slowly (as evidenced by the 10-hour estimate). The errors are only showing
> on the "replacing-9" line, not the individual drive.

I've only replaced a drive once, but it didn't show any checksum errors during the resilver. This was a 2TB WD Green drive in a mirror pool that had started to show write errors. It was attached to a Supermicro AOC-SAT2-MV8.

Good luck,
Ware
On Wed, Nov 11, 2009 at 3:38 AM, Orvar Korvar <knatte_fnatte_tjatte at yahoo.com> wrote:

> Other drivers in the stack? Which drivers? And have any of them been
> changed between b125 and b126?

Looks like the sd driver, for one:

http://dlc.sun.com/osol/on/downloads/b126/on-changelog-b126.html

--Tim
The checksum errors are fixed in build 128 with:

6807339 spurious checksum errors when replacing a vdev

No, you're not losing any data due to this.

- Eric
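P.S. Once on b128 or later, a reasonable sanity check (the pool name is taken from earlier in the thread) is:

  zpool clear fserv          (reset the error counters)
  zpool scrub fserv          (re-read every block and verify it against its checksum)
  zpool status -v fserv      (should finish with 0 errors and "No known data errors")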
So he did actually hit a bug? But the bug is not dangerous, as it doesn't destroy data?

But I did not replace any devices and still it showed checksum errors. I think I did a zfs send | zfs receive? I don't remember. But I just copied things back and forth, and the checksum errors showed up. So what does that mean?