Bryan Horstmann-Allen
2010-Nov-19 14:57 UTC
[zfs-discuss] Replacing log devices takes ages
Disclaimer: Solaris 10 U8.

I had an SSD die this morning and am in the process of replacing the 1GB
partition which was part of a log mirror. The SSDs do nothing else.

The resilver has been running for ~30m, and suggests it will finish sometime
before Elvis returns from Andromeda, though perhaps only just barely (we'll
probably have to run to the airport to meet him at security).

 scrub: resilver in progress for 0h25m, 3.15% done, 13h8m to go
 scrub: resilver in progress for 0h26m, 3.17% done, 13h36m to go
 scrub: resilver in progress for 0h27m, 3.18% done, 14h4m to go
 scrub: resilver in progress for 0h28m, 3.19% done, 14h32m to go
 scrub: resilver in progress for 0h29m, 3.20% done, 15h0m to go
 scrub: resilver in progress for 0h30m, 3.23% done, 15h25m to go
 scrub: resilver in progress for 0h31m, 3.25% done, 15h50m to go
 scrub: resilver in progress for 0h32m, 3.30% done, 16h7m to go
 scrub: resilver in progress for 0h33m, 3.34% done, 16h24m to go
 scrub: resilver in progress for 0h35m, 3.37% done, 16h43m to go
 scrub: resilver in progress for 0h36m, 3.39% done, 17h5m to go

According to zpool iostat -v, the log contains ~900k of data on it.
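The status lines above show the estimate growing faster than the clock. As a rough sketch (not a statement about how zpool status computes its figure), the Python below parses those lines and compares two extrapolations: one from cumulative progress since the resilver started, and one from the progress made only between the first and last samples shown. The function names are made up for illustration, and the input is just the eleven lines pasted from the post.

```python
import re

# Status lines copied from the post above.
STATUS = """\
scrub: resilver in progress for 0h25m, 3.15% done, 13h8m to go
scrub: resilver in progress for 0h26m, 3.17% done, 13h36m to go
scrub: resilver in progress for 0h27m, 3.18% done, 14h4m to go
scrub: resilver in progress for 0h28m, 3.19% done, 14h32m to go
scrub: resilver in progress for 0h29m, 3.20% done, 15h0m to go
scrub: resilver in progress for 0h30m, 3.23% done, 15h25m to go
scrub: resilver in progress for 0h31m, 3.25% done, 15h50m to go
scrub: resilver in progress for 0h32m, 3.30% done, 16h7m to go
scrub: resilver in progress for 0h33m, 3.34% done, 16h24m to go
scrub: resilver in progress for 0h35m, 3.37% done, 16h43m to go
scrub: resilver in progress for 0h36m, 3.39% done, 17h5m to go
"""

LINE = re.compile(r"for (\d+)h(\d+)m, ([\d.]+)% done, (\d+)h(\d+)m to go")


def parse(text):
    """Return (elapsed_min, percent_done, reported_remaining_min) tuples."""
    samples = []
    for m in LINE.finditer(text):
        eh, em, pct, rh, rm = m.groups()
        samples.append((int(eh) * 60 + int(em), float(pct), int(rh) * 60 + int(rm)))
    return samples


def cumulative_eta(elapsed_min, pct):
    """Minutes remaining if the average rate since the start continues."""
    return elapsed_min * (100.0 - pct) / pct


def recent_eta(samples):
    """Minutes remaining if the rate between the first and last sample continues."""
    (t0, p0, _), (t1, p1, _) = samples[0], samples[-1]
    rate = (p1 - p0) / (t1 - t0)  # percent per minute over the sampled window
    return (100.0 - p1) / rate


samples = parse(STATUS)
elapsed, pct, reported = samples[-1]
print("reported remaining (min):   ", reported)
print("cumulative estimate (min):  ", round(cumulative_eta(elapsed, pct)))
print("recent-rate estimate (min): ", round(recent_eta(samples)))
```

For the last sample the cumulative extrapolation comes out at about 1026 minutes, close to the reported 17h5m, while the rate over the 11 minutes shown (0.24% of progress) extrapolates to roughly 4428 minutes, or about 74 hours. The elapsed times are truncated to whole minutes, so both figures are approximate.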
The disks are not particularly busy (c0t3d0 is the replacing disk):

# iostat -xne c0t3d0 c0t5d0 5
                    extended device statistics       ---- errors ---
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b s/w h/w trn tot device
    0.2    0.1    0.6    5.0  0.0  0.0    0.0    5.7   0   0   0   0   0   0 c0t3d0
    5.3   52.3   68.2 1694.1  0.0  0.2    0.0    4.2   0   2   2   0   0   2 c0t5d0
                    extended device statistics       ---- errors ---
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b s/w h/w trn tot device
    3.0  112.6    0.9 6064.0  0.0  0.1    0.0    0.8   0   9   0   0   0   0 c0t3d0
    6.4  118.8   39.5 6519.7  0.0  0.0    0.0    0.3   0   3   2   0   0   2 c0t5d0
                    extended device statistics       ---- errors ---
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b s/w h/w trn tot device
    1.0   50.2    0.3 5068.8  0.0  1.4    0.0   27.5   0   6   0   0   0   0 c0t3d0
   36.0   61.8  534.1 5921.6  0.0  0.5    0.0    5.5   0   6   2   0   0   2 c0t5d0
                    extended device statistics       ---- errors ---
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b s/w h/w trn tot device
    0.0   58.0    0.0 1590.4  0.0  0.0    0.0    0.8   0   3   0   0   0   0 c0t3d0
   39.2   67.0  651.3 1884.9  0.0  0.0    0.0    0.5   0   3   2   0   0   2 c0t5d0
                    extended device statistics       ---- errors ---
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b s/w h/w trn tot device
    0.0   23.4    0.0  678.3  0.0  0.0    0.0    0.4   0   1   0   0   0   0 c0t3d0
   11.8   30.6  135.0 1025.4  0.0  0.0    0.0    0.3   0   1   2   0   0   2 c0t5d0
                    extended device statistics       ---- errors ---
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b s/w h/w trn tot device
    0.0   20.2    0.0 1045.0  0.0  0.0    0.0    1.2   0   1   0   0   0   0 c0t3d0
   14.8   25.8  131.9 1335.7  0.0  0.0    0.0    0.4   0   1   2   0   0   2 c0t5d0
                    extended device statistics       ---- errors ---
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b s/w h/w trn tot device
    0.0   33.0    0.0 2029.6  0.0  0.1    0.0    1.9   0   2   0   0   0   0 c0t3d0
    1.8   37.6   37.9 2107.0  0.0  0.0    0.0    0.6   0   1   2   0   0   2 c0t5d0
                    extended device statistics       ---- errors ---
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b s/w h/w trn tot device
    0.0   21.2    0.0  797.6  0.0  0.0    0.0    0.7   0   1   0   0   0   0 c0t3d0
   12.2   22.8  111.9  823.2  0.0  0.0    0.0    0.4   0   1   2   0   0   2 c0t5d0

My question is twofold:

Why do log mirrors need to resilver at all?

Why does this seem like it's going to take a full day, if I'm lucky?

(If the answer is: Shut up and upgrade, that's fine.)

Cheers.
-- 
bdha
cyberpunk is dead. long live cyberpunk.
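To put a number on "not particularly busy", the iostat samples in the post above can be averaged per device. The Python below is a minimal sketch that takes the data rows as pasted and computes the mean kw/s for each disk. It skips the first sample on the assumption that, as is usual for iostat, the first report is a since-boot summary rather than a 5-second interval. The function name and the 1 GB comparison are illustrative only.

```python
from collections import defaultdict

# Data rows copied from the iostat output in the post (headers dropped).
IOSTAT_ROWS = """\
0.2 0.1 0.6 5.0 0.0 0.0 0.0 5.7 0 0 0 0 0 0 c0t3d0
5.3 52.3 68.2 1694.1 0.0 0.2 0.0 4.2 0 2 2 0 0 2 c0t5d0
3.0 112.6 0.9 6064.0 0.0 0.1 0.0 0.8 0 9 0 0 0 0 c0t3d0
6.4 118.8 39.5 6519.7 0.0 0.0 0.0 0.3 0 3 2 0 0 2 c0t5d0
1.0 50.2 0.3 5068.8 0.0 1.4 0.0 27.5 0 6 0 0 0 0 c0t3d0
36.0 61.8 534.1 5921.6 0.0 0.5 0.0 5.5 0 6 2 0 0 2 c0t5d0
0.0 58.0 0.0 1590.4 0.0 0.0 0.0 0.8 0 3 0 0 0 0 c0t3d0
39.2 67.0 651.3 1884.9 0.0 0.0 0.0 0.5 0 3 2 0 0 2 c0t5d0
0.0 23.4 0.0 678.3 0.0 0.0 0.0 0.4 0 1 0 0 0 0 c0t3d0
11.8 30.6 135.0 1025.4 0.0 0.0 0.0 0.3 0 1 2 0 0 2 c0t5d0
0.0 20.2 0.0 1045.0 0.0 0.0 0.0 1.2 0 1 0 0 0 0 c0t3d0
14.8 25.8 131.9 1335.7 0.0 0.0 0.0 0.4 0 1 2 0 0 2 c0t5d0
0.0 33.0 0.0 2029.6 0.0 0.1 0.0 1.9 0 2 0 0 0 0 c0t3d0
1.8 37.6 37.9 2107.0 0.0 0.0 0.0 0.6 0 1 2 0 0 2 c0t5d0
0.0 21.2 0.0 797.6 0.0 0.0 0.0 0.7 0 1 0 0 0 0 c0t3d0
12.2 22.8 111.9 823.2 0.0 0.0 0.0 0.4 0 1 2 0 0 2 c0t5d0
"""

KW_PER_S = 3   # column index of kw/s in `iostat -xne` output
DEVICE = 14    # column index of the device name


def mean_write_kb_per_s(rows, skip_first=True):
    """Mean kw/s per device; optionally drop each device's first sample."""
    per_dev = defaultdict(list)
    for line in rows.splitlines():
        fields = line.split()
        if len(fields) != 15:
            continue  # ignore anything that is not a full data row
        per_dev[fields[DEVICE]].append(float(fields[KW_PER_S]))
    means = {}
    for dev, vals in per_dev.items():
        if skip_first:
            vals = vals[1:]  # assumption: first report is the since-boot summary
        means[dev] = sum(vals) / len(vals)
    return means


means = mean_write_kb_per_s(IOSTAT_ROWS)
for dev, kb in sorted(means.items()):
    secs_per_gb = 1024 * 1024 / kb  # time to write 1 GB at this average rate
    print(f"{dev}: {kb:.1f} KB/s written on average, "
          f"{secs_per_gb / 60:.1f} min per GB at that rate")
```

Over the seven interval samples, c0t3d0 averages about 2.4 MB/s of writes and c0t5d0 about 2.7 MB/s. At the c0t3d0 rate a full 1 GB would be written in around seven minutes, so raw device throughput does not look like the limiting factor here. iostat alone cannot tell whether those writes are resilver traffic or ordinary log writes.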
I'm not sure that leaving the ZIL enabled whilst replacing the log devices is a good idea?

Also - I had no idea Elvis was coming back tomorrow! Sweet. ;-)

---
W. A. Khushil Dep - khushil.dep at gmail.com - 07905374843
Visit my blog at http://www.khushil.com/