Morning,

c7t5000CCA221F4EC54d0 is a 2T disk, how can it resilver 5.63T of it?

This is actually an old capture of the status output, it got to nearly 10T before deciding that there was an error and not completing, reseat disk and it's doing it all again.

It's happened on another pool as well. I'm looking at a load average of around 40 on the box currently, just sitting there churning disk IO.

OS is snv_134 on x86.

# zpool status -x
  pool: content4
 state: DEGRADED
status: One or more devices could not be opened.  Sufficient replicas exist for
        the pool to continue functioning in a degraded state.
action: Attach the missing device and online it using 'zpool online'.
   see: http://www.sun.com/msg/ZFS-8000-2Q
 scrub: resilver in progress for 147h39m, 100.00% done, 0h0m to go
config:

        NAME                           STATE     READ WRITE CKSUM
        content4                       DEGRADED     0     0     0
          raidz2-0                     DEGRADED     0     0     0
            c7t5000CCA221DE1E1Dd0      ONLINE       0     0     0
            c7t5000CCA221DE17BFd0      ONLINE       0     0     0
            c7t5000CCA221DE2229d0      ONLINE       0     0     0
            replacing-3                DEGRADED     0     0     0
              c7t5000CCA221DE0CC7d0    UNAVAIL      0     0     0  cannot open
              c7t5000CCA221F4EC54d0    ONLINE       0     0     0  5.63T resilvered
            c7t5000CCA221DE200Ad0      ONLINE       0     0     0
            c7t5000CCA221DDFE6Ed0      ONLINE       0     0     0
            c7t5000CCA221DE0103d0      ONLINE       0     0     0
            c7t5000CCA221DE00C9d0      ONLINE       0     0     0
            c7t5000CCA221DE0D2Dd0      ONLINE       0     0     0
            c7t5000CCA221DE189Cd0      ONLINE       0     0     0
            c7t5000CCA221DE18A7d0      ONLINE       0     0     0
            c7t5000CCA221DE2A47d0      ONLINE       0     0     0
            c7t5000CCA221DE1E48d0      ONLINE       0     0     0
            c7t5000CCA221DE18A1d0      ONLINE       0     0     0
            c7t5000CCA221DE18A2d0      ONLINE       0     0     0
            c7t5000CCA221DE2A3Ed0      ONLINE       0     0     0
            c7t5000CCA221DE2A42d0      ONLINE       0     0     0
            c7t5000CCA221DE2225d0      UNAVAIL      0     0     0  cannot open
            c7t5000CCA221DE28A3d0      ONLINE       0     0     0
            c7t5000CCA221DE2A46d0      ONLINE       0     0     0
            c7t5000CCA221DE0789d0      ONLINE       0     0     0
            c7t5000CCA221DE221Dd0      ONLINE       0     0     0
            c7t5000CCA221DE054Fd0      ONLINE       0     0     0
            c7t5000CCA221DE2EBEd0      ONLINE       0     0     0

errors: No known data errors

--
Tom // www.portfast.co.uk // hosted services, domains, virtual machines, consultancy
On Fri, 17 Sep 2010, Tom Bird wrote:

> Morning,
>
> c7t5000CCA221F4EC54d0 is a 2T disk, how can it resilver 5.63T of it?
>
> This is actually an old capture of the status output, it got to nearly 10T
> before deciding that there was an error and not completing, reseat disk and
> it's doing it all again.

You have twice as many big slow drives in a raidz2 than any sane person would recommend. It looks like you either have drives which are too weak to sustain resilvering a failed disk, or a chassis which is not strong enough.

Your only option seems to be to also replace c7t5000CCA221DE2225d0 and hope for the best. Expect the replacement to take a very long time.

It is wise to restart the pool from scratch with multiple vdevs comprised of fewer devices.

Bob
--
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
Bob Friesenhahn wrote:
> On Fri, 17 Sep 2010, Tom Bird wrote:
>
>> Morning,
>>
>> c7t5000CCA221F4EC54d0 is a 2T disk, how can it resilver 5.63T of it?
>>
>> This is actually an old capture of the status output, it got to nearly
>> 10T before deciding that there was an error and not completing, reseat
>> disk and it's doing it all again.
>
> You have twice as many big slow drives in a raidz2 than any sane person
> would recommend. It looks like you either have drives which are too
> weak to sustain resilvering a failed disk, or a chassis which is not
> strong enough.

The drives and the chassis are fine, what I am questioning is how can it be "resilvering" more data to a device than the capacity of the device?

> Your only option seems to be to also replace c7t5000CCA221DE2225d0 and
> hope for the best. Expect the replacement to take a very long time.
>
> It is wise to restart the pool from scratch with multiple vdevs
> comprised of fewer devices.

This stuff should just work, if it only rewrote the <2T that was meant to be on the drive the rebuild would take a day or so.

--
Tom // www.portfast.co.uk // hosted services, domains, virtual machines, consultancy
On 09/18/10 04:28 AM, Tom Bird wrote:
> Bob Friesenhahn wrote:
>> On Fri, 17 Sep 2010, Tom Bird wrote:
>>
>>> Morning,
>>>
>>> c7t5000CCA221F4EC54d0 is a 2T disk, how can it resilver 5.63T of it?
>>>
>>> This is actually an old capture of the status output, it got to
>>> nearly 10T before deciding that there was an error and not
>>> completing, reseat disk and it's doing it all again.
>>
>> You have twice as many big slow drives in a raidz2 than any sane
>> person would recommend. It looks like you either have drives which
>> are too weak to sustain resilvering a failed disk, or a chassis which
>> is not strong enough.
>
> The drives and the chassis are fine, what I am questioning is how can
> it be "resilvering" more data to a device than the capacity of the
> device?
>

Is the pool in use? If so, data will be changing while the resilver is running.

With such a ridiculously wide vdev and large drives, the resilver will take a very, very long time to complete. If the pool is sufficiently busy, it may never complete.

>> Your only option seems to be to also replace c7t5000CCA221DE2225d0
>> and hope for the best. Expect the replacement to take a very long time.
>>
>> It is wise to restart the pool from scratch with multiple vdevs
>> comprised of fewer devices.
>
> This stuff should just work, if it only rewrote the <2T that was meant
> to be on the drive the rebuild would take a day or so.
>

Bob's comments about the pool design are correct; you have a disaster waiting to happen.

--
Ian.
> From: zfs-discuss-bounces at opensolaris.org [mailto:zfs-discuss-
> bounces at opensolaris.org] On Behalf Of Tom Bird
>

We recently had a long discussion on this list about resilver times versus raid types. In the end, the conclusion was: the resilver code is very inefficient for raidzN. Someday it may be better optimized, but until that day comes, you really need to break your giant raidzN into smaller vdevs.

3 vdevs of 7-disk raidz are preferable to a 21-disk raidz3.

If you want this resilver to complete, you should do everything you can: (a) stop taking snapshots, (b) don't scrub, (c) stop all the IO you possibly can. And be patient.

Most people in your situation find it faster to "zfs send" to some other storage, and then destroy & recreate the pool.

I know it stinks. But that's what you're facing.
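As a rough illustration of the layout suggested above (several narrow raidz vdevs instead of one very wide one), here is a minimal Python sketch that splits a list of disks into fixed-size groups and assembles the matching `zpool create` command line. The pool name `tank`, the disk names, and the helper `build_zpool_create` are made up for the example and are not from the thread. The script only builds and prints the command; it does not run it.

```python
from typing import List


def build_zpool_create(pool: str, disks: List[str],
                       disks_per_vdev: int = 7,
                       raid_level: str = "raidz") -> str:
    """Return a `zpool create` command with one raidz vdev per disk group.

    Hypothetical helper for illustration only; nothing is executed.
    """
    if disks_per_vdev <= 0:
        raise ValueError("disks_per_vdev must be positive")
    if not disks or len(disks) % disks_per_vdev != 0:
        raise ValueError("disk count must be a non-zero multiple of disks_per_vdev")

    parts = ["zpool", "create", pool]
    for start in range(0, len(disks), disks_per_vdev):
        # Each repetition of the raid keyword starts a new top-level vdev.
        parts.append(raid_level)
        parts.extend(disks[start:start + disks_per_vdev])
    return " ".join(parts)


if __name__ == "__main__":
    # 21 hypothetical disks, grouped into three 7-disk raidz vdevs.
    example_disks = [f"c1t{n}d0" for n in range(21)]
    print(build_zpool_create("tank", example_disks))
```

Passing `raid_level="raidz2"` gives double parity per vdev at the cost of one more disk of capacity in each group; which trade-off is right depends on the hardware and is not something the sketch decides.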
Hi all

one of our systems just developed something remotely similar:

s06:~# zpool status
  pool: atlashome
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
 scrub: resilver in progress for 67h18m, 100.00% done, 0h0m to go
config:

        NAME              STATE     READ WRITE CKSUM
        atlashome         DEGRADED     0     0     0
          raidz2-0        DEGRADED     0     0     0
            c0t0d0        ONLINE       0     0     0
            c1t0d0        ONLINE       0     0     0
            c5t0d0        ONLINE       0     0     0
            replacing-3   DEGRADED     0     0     0
              c7t0d0s0/o  FAULTED      0     0     0  corrupted data
              c7t0d0      ONLINE       0     0     0  678G resilvered

[...]

It has been 100% done for more than a day now. The system is running fully patched Solaris 10 (patchref from September 10th or 13th I believe).

Has someone an idea how it is possible to resilver 678G of data on a 500G drive?

s06:~# iostat -En c7t0d0
c7t0d0           Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: ATA      Product: HITACHI HDS7250S Revision: AV0A Serial No:
Size: 500.11GB <500107861504 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 197 Predictive Failure Analysis: 0

I still have to upgrade the zpool version, but wanted to wait for the resilver to complete.

Any ideas?

Cheers

Carsten
On 09/18/10 06:47 PM, Carsten Aulbert wrote:
> Hi all
>
> one of our systems just developed something remotely similar:
>
>
> s06:~# zpool status
>   pool: atlashome
>  state: DEGRADED
> status: One or more devices is currently being resilvered.  The pool will
>         continue to function, possibly in a degraded state.
> action: Wait for the resilver to complete.
>  scrub: resilver in progress for 67h18m, 100.00% done, 0h0m to go
> config:
>
>         NAME              STATE     READ WRITE CKSUM
>         atlashome         DEGRADED     0     0     0
>           raidz2-0        DEGRADED     0     0     0
>             c0t0d0        ONLINE       0     0     0
>             c1t0d0        ONLINE       0     0     0
>             c5t0d0        ONLINE       0     0     0
>             replacing-3   DEGRADED     0     0     0
>               c7t0d0s0/o  FAULTED      0     0     0  corrupted data
>               c7t0d0      ONLINE       0     0     0  678G resilvered
>
> [...]
>
> It has been 100% done for more than a day now. The system is running fully
> patched Solaris 10 (patchref from September 10th or 13th I believe).
>
> Has someone an idea how it is possible to resilver 678G of data on a 500G
> drive?
>

I see this all the time on a troublesome Thumper. I believe this happens because the data in the pool is continuously changing.

--
Ian.
Hi

On Saturday 18 September 2010 10:02:42 Ian Collins wrote:
>
> I see this all the time on a troublesome Thumper. I believe this
> happens because the data in the pool is continuously changing.

Ah ok, that may be, there is one particular active user on this box right now.

Interesting, I've never seen this in the past.

Is there really an end to this and do I just have to wait?

Cheers

Carsten
On 09/18/10 08:58 PM, Carsten Aulbert wrote:
> Hi
>
> On Saturday 18 September 2010 10:02:42 Ian Collins wrote:
>
>> I see this all the time on a troublesome Thumper. I believe this
>> happens because the data in the pool is continuously changing.
>>
> Ah ok, that may be, there is one particular active user on this box right now.
>
> Interesting, I've never seen this in the past.
>
> Is there really an end to this and do I just have to wait?
>
>

Oh yes, the last one I had was 100% done for about 40 hours!

--
Ian.
On 18/09/10 09:02, Ian Collins wrote:
> On 09/18/10 06:47 PM, Carsten Aulbert wrote:

>> Has someone an idea how it is possible to resilver 678G of data on a 500G
>> drive?
>
> I see this all the time on a troublesome Thumper. I believe this happens
> because the data in the pool is continuously changing.

In my case, other than an hourly snapshot, the data is not significantly changing.

It'd be nice to see a response other than "you're doing it wrong", rebuilding 5x the data on a drive relative to its capacity is clearly erratic behaviour, I am curious as to what is actually happening.

All said and done though, we will have to live with snv_134's bugs from now on, or perhaps I could try Sol 10.

Tom
On Sat, Sep 18, 2010 at 7:01 PM, Tom Bird <tom at marmot.org.uk> wrote:
> All said and done though, we will have to live with snv_134's bugs from now
> on, or perhaps I could try Sol 10.
>

or OpenIllumos. Or Nexenta. Or FreeBSD. Or <insert osol distro name>.

--
O< ascii ribbon campaign - stop html mail - www.asciiribbon.org
On 18/09/10 13:06, Edho P Arief wrote:
> On Sat, Sep 18, 2010 at 7:01 PM, Tom Bird <tom at marmot.org.uk> wrote:
>> All said and done though, we will have to live with snv_134's bugs from now
>> on, or perhaps I could try Sol 10.
>
> or OpenIllumos. Or Nexenta. Or FreeBSD. Or <insert osol distro name>.

... none of which will receive ZFS code updates unless Oracle deigns to bestow them upon the community, this or ZFS dev is taken over by said community, in which case we end up with diverging code bases that would be a Sisyphean task to try and merge.

Tom
But all of which have newer code, today, than onnv-134.

On 18 September 2010 22:20, Tom Bird <tom at marmot.org.uk> wrote:
> On 18/09/10 13:06, Edho P Arief wrote:
>
>> On Sat, Sep 18, 2010 at 7:01 PM, Tom Bird <tom at marmot.org.uk> wrote:
>>
>>> All said and done though, we will have to live with snv_134's bugs from
>>> now on, or perhaps I could try Sol 10.
>>>
>>
>> or OpenIllumos. Or Nexenta. Or FreeBSD. Or <insert osol distro name>.
>>
>
> ... none of which will receive ZFS code updates unless Oracle deigns to
> bestow them upon the community, this or ZFS dev is taken over by said
> community, in which case we end up with diverging code bases that would be a
> Sisyphean task to try and merge.
>
> Tom
>
> _______________________________________________
> zfs-discuss mailing list
> zfs-discuss at opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
>
Tom Bird wrote:
> On 18/09/10 09:02, Ian Collins wrote:
>
>
> In my case, other than an hourly snapshot, the data is not significantly
> changing.
>
> It'd be nice to see a response other than "you're doing it wrong",
> rebuilding 5x the data on a drive relative to its capacity is clearly
> erratic behaviour, I am curious as to what is actually happening.
>
> All said and done though, we will have to live with snv_134's bugs from
> now on, or perhaps I could try Sol 10.
>
> Tom

It sounds like you're hitting '6891824 7410 NAS head "continually resilvering" following HDD replacement'. If you stop taking and destroying snapshots you should see the resilver finish.

Thanks,
George
On 09/19/10 12:01 AM, Tom Bird wrote:
> On 18/09/10 09:02, Ian Collins wrote:
>> On 09/18/10 06:47 PM, Carsten Aulbert wrote:
>
>>> Has someone an idea how it is possible to resilver 678G of data on a
>>> 500G drive?
>>
>> I see this all the time on a troublesome Thumper. I believe this happens
>> because the data in the pool is continuously changing.
>
> In my case, other than an hourly snapshot, the data is not
> significantly changing.
>
> It'd be nice to see a response other than "you're doing it wrong",
> rebuilding 5x the data on a drive relative to its capacity is clearly
> erratic behaviour, I am curious as to what is actually happening.
>

The ridiculous pool design isn't helping!

--
Ian.
Hi,

> The drives and the chassis are fine, what I am questioning is how can it
> be "resilvering" more data to a device than the capacity of the device?

If data on the pool has changed during the resilver, the resilver counter will not update accordingly, and it will show 100% for as long as it needs to catch up.

Yours
Markus Kovero
On 19 September, 2010 - Markus Kovero sent me these 0,5K bytes:

> Hi,
>
> > The drives and the chassis are fine, what I am questioning is how can it
> > be "resilvering" more data to a device than the capacity of the device?
>
> If data on the pool has changed during the resilver, the resilver counter
> will not update accordingly, and it will show 100% for as long as it needs
> to catch up.

I believe this was fixed recently, by displaying how many blocks it has checked vs how many to check...

/Tomas
--
Tomas Ögren, stric at acc.umu.se, http://www.acc.umu.se/~stric/
|- Student at Computing Science, University of Umeå
`- Sysadmin at {cs,acc}.umu.se
On 18/09/10 15:25, George Wilson wrote:
> Tom Bird wrote:

>> In my case, other than an hourly snapshot, the data is not
>> significantly changing.
>>
>> It'd be nice to see a response other than "you're doing it wrong",
>> rebuilding 5x the data on a drive relative to its capacity is clearly
>> erratic behaviour, I am curious as to what is actually happening.

> It sounds like you're hitting '6891824 7410 NAS head "continually
> resilvering" following HDD replacement'. If you stop taking and
> destroying snapshots you should see the resilver finish.

George, I think you've won the prize.

I suspended the snapshots last night and this morning one pool had completed, one left to go.

Thanks,
Tom
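For anyone wanting to spot this symptom automatically, the following is a minimal Python sketch that reads `zpool status` text in the format pasted earlier in this thread and reports the elapsed resilver time, the percentage done, and the amount resilvered per device. It assumes the older `scrub: resilver in progress for ...` line shown above (other releases may print this differently) and assumes the K/M/G/T suffixes are powers of 1024. The names `parse_resilver` and `SAMPLE` are invented for the example.

```python
import re
from typing import Dict, Optional, Tuple

# Assumption: zpool's size suffixes are binary multiples.
UNITS = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3, "T": 1024 ** 4}

SCRUB_RE = re.compile(r"resilver in progress for (\d+)h(\d+)m, ([\d.]+)% done")
RESILVERED_RE = re.compile(r"([\d.]+)([KMGT])\s+resilvered")

# Excerpt of the status output quoted in the thread.
SAMPLE = """\
 scrub: resilver in progress for 67h18m, 100.00% done, 0h0m to go
config:
            replacing-3   DEGRADED     0     0     0
              c7t0d0s0/o  FAULTED      0     0     0  corrupted data
              c7t0d0      ONLINE       0     0     0  678G resilvered
"""


def parse_resilver(status_text: str) -> Tuple[Optional[dict], Dict[str, float]]:
    """Return (progress, resilvered bytes per device) from zpool status text."""
    progress = None
    devices: Dict[str, float] = {}
    for line in status_text.splitlines():
        scrub = SCRUB_RE.search(line)
        if scrub:
            hours, minutes, percent = scrub.groups()
            progress = {
                "elapsed_minutes": int(hours) * 60 + int(minutes),
                "percent_done": float(percent),
            }
        resilvered = RESILVERED_RE.search(line)
        if resilvered:
            # The device name is the first whitespace-separated field.
            name = line.split()[0]
            amount, unit = resilvered.groups()
            devices[name] = float(amount) * UNITS[unit]
    return progress, devices


if __name__ == "__main__":
    progress, devices = parse_resilver(SAMPLE)
    capacity_bytes = 500107861504  # from the iostat -En output in the thread
    for name, done in devices.items():
        if done > capacity_bytes:
            print(f"{name}: resilvered {done / capacity_bytes:.2f}x its capacity")
    print(progress)
```

A ratio above 1 together with a long stretch at 100% matches what was described here: the resilver keeps revisiting data while snapshots are taken and destroyed. The sketch only reports the numbers; it does not diagnose the cause.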