Hi list,

Here's my case :

  pool: mypool
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
 scrub: resilver in progress for 147h19m, 100.00% done, 0h0m to go
config:

        NAME              STATE     READ WRITE CKSUM
        filerbackup13     DEGRADED     0     0     0
          raidz2          DEGRADED     0     0     0
            c0t8d0        ONLINE       0     0     0
            replacing     DEGRADED     0     0     0
              c0t9d0      OFFLINE      0     0     0
              c0t23d0     ONLINE       0     0     0  454G resilvered
            c0t10d0       ONLINE       0     0     0
            c0t11d0       ONLINE       0     0     0
            c0t12d0       ONLINE       0     0     0
            c0t13d0       ONLINE       0     0     0
            c0t14d0       ONLINE       0     0     0
            c0t15d0       ONLINE       0     0     0
            c0t16d0       ONLINE       0     0     0
            c0t17d0       ONLINE       0     0     0
            c0t18d0       ONLINE       0     0     0
            c0t19d0       ONLINE       0     0     0
            c0t20d0       ONLINE       0     0     0
            c0t21d0       ONLINE       0     0     0
            c0t22d0       ONLINE       0     0     0

After launching the replace command, I had to offline c0t9d0 because it
was generating too many warnings and slowing down I/O.

Now the replace seems to be finished, but zpool status still displays
"replacing", and according to the scrub status the resilver seems to
continue?

Any idea how to clarify this situation?

Thanks.

--
Francois
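For reference, the replace/offline sequence described above would typically
look something like the following; the exact commands are an assumption (only
the pool and device names come from the status output):

    # Presumed sequence (not quoted from the post): replace the failing disk,
    # then take it offline once its errors start hurting I/O.
    zpool replace mypool c0t9d0 c0t23d0   # resilver c0t9d0's data onto c0t23d0
    zpool offline mypool c0t9d0           # stop issuing I/O to the noisy source disk
    zpool status -v mypool                # the "replacing" vdev shows resilver progress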
> After launching the replace command, I had to offline c0t9d0 because
> it was generating too many warnings and slowing down I/O.
>
> Now the replace seems to be finished, but zpool status still displays
> "replacing", and according to the scrub status the resilver seems to
> continue?
>
> Any idea how to clarify this situation?

I've seen this happen before, and the resilvering (or scrub) finished
after a while - an hour or so. Watching iostat -xd showed high I/O
traffic (without much coming from the users).

- What sort of drives are you using?
- How long has the pool been at '100% done' while still resilvering?

Vennlige hilsener / Best regards

roy
--
Roy Sigurd Karlsbakk
(+47) 97542685
roy at karlsbakk.net
http://blogg.karlsbakk.net/
--
In all pedagogy it is essential that the curriculum be presented
intelligibly. It is an elementary imperative for all pedagogues to avoid
excessive use of idioms of foreign origin. In most cases, adequate and
relevant synonyms exist in Norwegian.
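For example, something like the following (the interval and count are
arbitrary choices, not from the post) shows whether the disks are still
busy with resilver traffic:

    # Extended per-device statistics, sampled every 10 seconds, 6 samples.
    # Sustained high actv / %b on the raidz2 members with little user load
    # usually means the resilver is still running despite "100.00% done".
    iostat -xd 10 6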
If you have one zpool consisting of only one large raidz2, then you have
a slow raid. To reach high speed, you need at most 8 drives in each
raidz2. So one of the reasons it takes so long is that you have too many
drives in your raidz2. Everything would be much faster if you split your
zpool into two raidz2 vdevs, each consisting of 7 or 8 drives. Then it
would be fast.

--
This message posted from opensolaris.org
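A minimal sketch of that layout, using the disk names from the status
output purely as placeholders, would be two 8-disk raidz2 top-level vdevs
instead of one 15-wide vdev:

    # Hypothetical pool creation with two smaller raidz2 vdevs; ZFS stripes
    # across the two vdevs, and each vdev resilvers independently.
    zpool create mypool \
        raidz2 c0t8d0  c0t9d0  c0t10d0 c0t11d0 c0t12d0 c0t13d0 c0t14d0 c0t15d0 \
        raidz2 c0t16d0 c0t17d0 c0t18d0 c0t19d0 c0t20d0 c0t21d0 c0t22d0 c0t23d0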
----- Original Message -----
> If you have one zpool consisting of only one large raidz2, then you
> have a slow raid. To reach high speed, you need at most 8 drives in
> each raidz2. So one of the reasons it takes so long is that you have
> too many drives in your raidz2. Everything would be much faster if you
> split your zpool into two raidz2 vdevs, each consisting of 7 or 8
> drives. Then it would be fast.

Keeping the VDEVs small is one thing, but this is about resilvering
taking far more time than reported. The same applies to scrubbing at
times.

Would it be hard to rewrite the reporting mechanisms in ZFS to report
something more realistic than just a first guess? ZFS scrub reports
tremendous times at the start, but slows down after it has worked its
way through the metadata. What ZFS is doing when the system still scrubs
after 100 hours at 100% is beyond my knowledge.

Vennlige hilsener / Best regards

roy
--
Roy Sigurd Karlsbakk
(+47) 97542685
roy at karlsbakk.net
http://blogg.karlsbakk.net/
--
In all pedagogy it is essential that the curriculum be presented
intelligibly. It is an elementary imperative for all pedagogues to avoid
excessive use of idioms of foreign origin. In most cases, adequate and
relevant synonyms exist in Norwegian.
On 05 July, 2010 - Roy Sigurd Karlsbakk sent me these 1,9K bytes:

> ----- Original Message -----
> > If you have one zpool consisting of only one large raidz2, then you
> > have a slow raid. To reach high speed, you need at most 8 drives in
> > each raidz2. So one of the reasons it takes so long is that you have
> > too many drives in your raidz2. Everything would be much faster if
> > you split your zpool into two raidz2 vdevs, each consisting of 7 or
> > 8 drives. Then it would be fast.
>
> Keeping the VDEVs small is one thing, but this is about resilvering
> taking far more time than reported. The same applies to scrubbing at
> times.
>
> Would it be hard to rewrite the reporting mechanisms in ZFS to report
> something more realistic than just a first guess? ZFS scrub reports
> tremendous times at the start, but slows down after it has worked its
> way through the metadata. What ZFS is doing when the system still
> scrubs after 100 hours at 100% is beyond my knowledge.

I believe it's something like this:

* When starting, it notes the number of blocks to visit
* .. visiting blocks ..
* .. adding more data (which will then be beyond the original 100%)
  .. and visiting blocks ..
* .. reaching the initial "last block", which has since gotten lots of
  new friends.

http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6899970

/Tomas
--
Tomas Ögren, stric at acc.umu.se, http://www.acc.umu.se/~stric/
|- Student at Computing Science, University of Umeå
`- Sysadmin at {cs,acc}.umu.se
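A toy illustration of that effect (invented numbers, not the actual ZFS
code): the block count is fixed when the scan starts, so anything written
afterwards pushes the real work past the recorded total while the
displayed figure stops at 100%.

    # Illustrative only - made-up numbers, not ZFS internals.
    total_at_start=1000000          # blocks known when the resilver began
    examined=1180000                # blocks visited so far; the pool kept growing
    pct=$((examined * 100 / total_at_start))
    [ "$pct" -gt 100 ] && pct=100   # the status output never shows more than 100%
    echo "${pct}% done"             # prints "100% done" while work remains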
On 07/ 6/10 02:21 AM, Francois wrote:
> Hi list,
>
> Here's my case :
>
>   pool: mypool
>  state: DEGRADED
> status: One or more devices is currently being resilvered.  The pool will
>         continue to function, possibly in a degraded state.
> action: Wait for the resilver to complete.
>  scrub: resilver in progress for 147h19m, 100.00% done, 0h0m to go
> config:
>
<snip>
>
> After launching the replace command, I had to offline c0t9d0 because
> it was generating too many warnings and slowing down I/O.
>
> Now the replace seems to be finished, but zpool status still displays
> "replacing", and according to the scrub status the resilver seems to
> continue?
>

As others have noted, your wide raidz2 will be slow to resilver.

As for the reported progress, I see this all the time with an x4500. The
resilver is often 100% done for over half of the real resilver time
(which is normally >100 hours for a 500G drive in an 8-drive raidz).
This box is a backup server, so there is a fair amount of churn, which I
assume confuses the reporting.

--
Ian.