Hi Folks,

I have a 3510 disk array connected to a T2000 server running:

SunOS 5.10 Generic_118833-33 sun4v sparc SUNW,Sun-Fire-T200

12 disks (300G each) are exported from the array and ZFS is used to manage them (raidz with one hot spare).

A few days ago we had a disk failure. The first problem was that the hot spare didn't kick in automatically, so I ran zpool replace manually and resilvering started.

It's now the 4th day and it's still not finished. Also, when I ran zpool status yesterday it was 56% complete; today it's only 13%.

This is our production machine and we really can't afford for this service to be slow any longer. Could someone please shed some light on why resilvering takes so long (and restarts)?

Is there a patch we can apply to fix it?

Many thanks for any info,

Maciej
On Dec 25, 2007 3:19 AM, Maciej Olchowik <m.olchowik at ed.ac.uk> wrote:

> Hi Folks,
>
> I have 3510 disk array connected to T2000 server running:
> SunOS 5.10 Generic_118833-33 sun4v sparc SUNW,Sun-Fire-T200
> 12 disks (300G each) is exported from array and ZFS is used
> to manage them (raidz with one hot spare).
>
> Few days ago we had a disk failure. First problem was that hot
> spare hasn't automatically kicked in, so I have run zpool replace
> manually - resilvering started.
>
> It's now 4th day and it's still not finished. Also when I run zpool
> status yesterday it was 56% complete, today it's only 13%.
>
> This is our production machine and we really can't afford this
> service to be slow any longer. Please could someone shed some
> light on why resilvering takes so long (and restarts) ?
>
> Is there a patch we can apply to fix it ?
>
> many thanks for any info,
>
> Maciej

Do you have snapshots taking place (like in a cron job) during the resilver process? If so, you may be hitting a bug where the resilver restarts from the beginning whenever a new snapshot is taken. If you disable the snapshots during the resilver then it should complete to 100%.

-Eric
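One way to check for scheduled snapshots is to filter each user's crontab for active `zfs snapshot` entries. The following is only a rough sketch: the helper name `snapshot_jobs` is made up for illustration, and it assumes the snapshots are created by a plain `zfs snapshot` command line in cron rather than from inside a wrapper script, which this filter would not catch.

```shell
#!/bin/sh
# snapshot_jobs (hypothetical helper): read crontab text on stdin and print
# the active entries that call "zfs snapshot". Lines starting with '#' are
# treated as commented out and skipped.
snapshot_jobs() {
    grep -v '^#' | grep 'zfs snapshot'
}

# Usage, run as the user whose crontab you want to inspect (e.g. root):
#   crontab -l | snapshot_jobs
```

If this prints anything, commenting those entries out until the resilver reaches 100% would be the thing to try.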
Hi,

> Do you have snapshots taking place (like in a cron job) during the
> resilver process? If so, you may be hitting a bug that the resilver
> will restart from the beginning whenever a new snapshot occurs. If
> you disable the snapshots during the resilver then it should complete
> to 100%.

No, I don't have snapshots taking place. I found that when I query the pool with "zpool status" it restarts the resilvering process, strange ...

Anyway, after ~10 days resilvering has finally completed to 100%:

resilver completed with 0 errors on Wed Jan 2 12:46:10 2008

The filesystem is still slow, however. When I try to run zpool iostat it takes a few hours to produce output; same with zfs create. I can't even post the output of "zpool status -v" as it takes that long to complete.

We have 11 disks (+1 hot spare) in a raidZ config. Why is the filesystem so slow even now that the hot spare has replaced the faulty disk?

thanks,

Maciej