Since I got my zfs pool working under solaris (I talked on this list last
week about moving it from linux & bsd to solaris, and the pain that was),
I'm seeing very good reads, but nada for writes.

Reads:

root at shebop:/data/dvds# rsync -aP young_frankenstein.iso /tmp
sending incremental file list
young_frankenstein.iso
^C1032421376  20%   86.23MB/s    0:00:44

Writes:

root at shebop:/data/dvds# rsync -aP /tmp/young_frankenstein.iso yf.iso
sending incremental file list
young_frankenstein.iso
^C 68976640   6%    2.50MB/s    0:06:42

This is pretty typical of what I'm seeing.

root at shebop:/data/dvds# zpool status -v
  pool: datapool
 state: ONLINE
status: The pool is formatted using an older on-disk format. The pool can
        still be used, but some features are unavailable.
action: Upgrade the pool using 'zpool upgrade'. Once this is done, the
        pool will no longer be accessible on older software versions.
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        datapool    ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            c2d0s0  ONLINE       0     0     0
            c3d0s0  ONLINE       0     0     0
            c4d0s0  ONLINE       0     0     0
            c6d0s0  ONLINE       0     0     0
            c5d0s0  ONLINE       0     0     0

errors: No known data errors

  pool: syspool
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        syspool     ONLINE       0     0     0
          c0d1s0    ONLINE       0     0     0

errors: No known data errors

(This is while running an rsync from a remote machine to a ZFS filesystem)

root at shebop:/data/dvds# iostat -xn 5
                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
   11.1    4.8  395.8  275.9  5.8  0.1  364.7    4.3   2   5 c0d1
    9.8   10.9  514.3  346.4  6.8  1.4  329.7   66.7  68  70 c5d0
    9.8   10.9  516.6  346.4  6.7  1.4  323.1   66.2  67  70 c6d0
    9.7   10.9  491.3  346.3  6.7  1.4  324.7   67.2  67  70 c3d0
    9.8   10.9  519.9  346.3  6.8  1.4  326.7   67.2  68  71 c4d0
    9.8   11.0  493.5  346.6  3.6  0.8  175.3   37.9  38  41 c2d0
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 c0t0d0
                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 c0d1
   64.6   12.6 8207.4  382.1 32.8  2.0  424.7   25.9 100 100 c5d0
   62.2   12.2 7203.2  370.1 27.9  2.0  375.1   26.7  99 100 c6d0
   53.2   11.8 5973.9  390.2 25.9  2.0  398.8   30.5  98  99 c3d0
   49.4   10.6 5398.2  389.8 30.2  2.0  503.7   33.3  99 100 c4d0
   45.2   12.8 5431.4  337.0 14.3  1.0  247.3   17.9  52  52 c2d0
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 c0t0d0

Any ideas?

Paul

Oh, for the record, the drives are 1.5TB SATA, in a 4+1 raidz-1 config. All
the drives are on the same LSI 150-6 PCI controller card, and the M/B is a
generic something or other with a triple-core, and 2GB RAM.

Paul

3:34pm, Paul Archer wrote:

> Since I got my zfs pool working under solaris (I talked on this list last
> week about moving it from linux & bsd to solaris, and the pain that was),
> I'm seeing very good reads, but nada for writes.
>

This controller card, you have turned off any raid functionality, yes? ZFS
has total control of all discs, by itself? No hw raid intervening?
-- 
This message posted from opensolaris.org
paul at paularcher.org
2009-Sep-26 19:05 UTC
[zfs-discuss] extremely slow writes (with good reads)
> This controller card, you have turned off any raid functionality, yes? ZFS
> has total control of all discs, by itself? No hw raid intervening?
> --
> This message posted from opensolaris.org
>

Yes, it's an LSI 150-6, with the BIOS turned off, which turns it into a
dumb SATA card.

Paul
So, after *much* wrangling, I managed to take one of my drives offline,
relabel/repartition it (because I saw that the first sector was 34, not
256, and realized there could be an alignment issue), and get it back into
the pool.

Problem is that while it's back, the performance is horrible. It's
resilvering at about (according to iostat) 3.5MB/sec. And at some point, I
was zeroing out the drive (with 'dd if=/dev/zero of=/dev/dsk/c7d0'), and
iostat showed me that the drive was only writing at around 3.5MB/sec.
*And* it showed reads of about the same 3.5MB/sec even during the dd.

This same hardware and even the same zpool have been run under linux with
zfs-fuse and BSD, and with BSD at least, performance was much better. A
complete resilver under BSD took 6 hours. Right now zpool is estimating
this resilver to take 36.

Could this be a driver problem? Something to do with the fact that this is
a very old SATA card (LSI 150-6)?

This is driving me crazy. I finally got my zpool working under Solaris so
I'd have some stability, and I've got no performance.

Paul Archer

Friday, Paul Archer wrote:

> Since I got my zfs pool working under solaris (I talked on this list last
> week about moving it from linux & bsd to solaris, and the pain that was),
> I'm seeing very good reads, but nada for writes.
>
> Reads:
>
> root at shebop:/data/dvds# rsync -aP young_frankenstein.iso /tmp
> sending incremental file list
> young_frankenstein.iso
> ^C1032421376  20%   86.23MB/s    0:00:44
>
> Writes:
>
> root at shebop:/data/dvds# rsync -aP /tmp/young_frankenstein.iso yf.iso
> sending incremental file list
> young_frankenstein.iso
> ^C 68976640   6%    2.50MB/s    0:06:42
>
> This is pretty typical of what I'm seeing.
>
> root at shebop:/data/dvds# zpool status -v
>   pool: datapool
>  state: ONLINE
> status: The pool is formatted using an older on-disk format. The pool can
>         still be used, but some features are unavailable.
> action: Upgrade the pool using 'zpool upgrade'. Once this is done, the
>         pool will no longer be accessible on older software versions.
>  scrub: none requested
> config:
>
>         NAME        STATE     READ WRITE CKSUM
>         datapool    ONLINE       0     0     0
>           raidz1    ONLINE       0     0     0
>             c2d0s0  ONLINE       0     0     0
>             c3d0s0  ONLINE       0     0     0
>             c4d0s0  ONLINE       0     0     0
>             c6d0s0  ONLINE       0     0     0
>             c5d0s0  ONLINE       0     0     0
>
> errors: No known data errors
>
>   pool: syspool
>  state: ONLINE
>  scrub: none requested
> config:
>
>         NAME        STATE     READ WRITE CKSUM
>         syspool     ONLINE       0     0     0
>           c0d1s0    ONLINE       0     0     0
>
> errors: No known data errors
>
> (This is while running an rsync from a remote machine to a ZFS filesystem)
> root at shebop:/data/dvds# iostat -xn 5
>                     extended device statistics
>     r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
>    11.1    4.8  395.8  275.9  5.8  0.1  364.7    4.3   2   5 c0d1
>     9.8   10.9  514.3  346.4  6.8  1.4  329.7   66.7  68  70 c5d0
>     9.8   10.9  516.6  346.4  6.7  1.4  323.1   66.2  67  70 c6d0
>     9.7   10.9  491.3  346.3  6.7  1.4  324.7   67.2  67  70 c3d0
>     9.8   10.9  519.9  346.3  6.8  1.4  326.7   67.2  68  71 c4d0
>     9.8   11.0  493.5  346.6  3.6  0.8  175.3   37.9  38  41 c2d0
>     0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 c0t0d0
>                     extended device statistics
>     r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
>     0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 c0d1
>    64.6   12.6 8207.4  382.1 32.8  2.0  424.7   25.9 100 100 c5d0
>    62.2   12.2 7203.2  370.1 27.9  2.0  375.1   26.7  99 100 c6d0
>    53.2   11.8 5973.9  390.2 25.9  2.0  398.8   30.5  98  99 c3d0
>    49.4   10.6 5398.2  389.8 30.2  2.0  503.7   33.3  99 100 c4d0
>    45.2   12.8 5431.4  337.0 14.3  1.0  247.3   17.9  52  52 c2d0
>     0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 c0t0d0
>
> Any ideas?
>
> Paul
>

On Sep 27, 2009, at 3:19 AM, Paul Archer <paul at paularcher.org> wrote:

> So, after *much* wrangling, I managed to take one of my drives
> offline, relabel/repartition it (because I saw that the first sector
> was 34, not 256, and realized there could be an alignment issue),
> and get it back into the pool.
>
> Problem is that while it's back, the performance is horrible. It's
> resilvering at about (according to iostat) 3.5MB/sec. And at some
> point, I was zeroing out the drive (with 'dd if=/dev/zero
> of=/dev/dsk/c7d0'), and iostat showed me that the drive was only
> writing at around 3.5MB/sec. *And* it showed reads of about the same
> 3.5MB/sec even during the dd.
>
> This same hardware and even the same zpool have been run under linux
> with zfs-fuse and BSD, and with BSD at least, performance was much
> better. A complete resilver under BSD took 6 hours. Right now zpool
> is estimating this resilver to take 36.
>
> Could this be a driver problem? Something to do with the fact that
> this is a very old SATA card (LSI 150-6)?
>
> This is driving me crazy. I finally got my zpool working under
> Solaris so I'd have some stability, and I've got no performance.

It appears your controller is preventing ZFS from enabling write cache.

I'm not familiar with that model. You will need to find a way to enable
the drives' write cache manually.

-Ross
>> Problem is that while it's back, the performance is horrible. It's
>> resilvering at about (according to iostat) 3.5MB/sec. And at some point, I
>> was zeroing out the drive (with 'dd if=/dev/zero of=/dev/dsk/c7d0'), and
>> iostat showed me that the drive was only writing at around 3.5MB/sec. *And*
>> it showed reads of about the same 3.5MB/sec even during the dd.
>>
>> This same hardware and even the same zpool have been run under linux with
>> zfs-fuse and BSD, and with BSD at least, performance was much better. A
>> complete resilver under BSD took 6 hours. Right now zpool is estimating
>> this resilver to take 36.
>>
>> Could this be a driver problem? Something to do with the fact that this is
>> a very old SATA card (LSI 150-6)?
>>
>> This is driving me crazy. I finally got my zpool working under Solaris so
>> I'd have some stability, and I've got no performance.
>>
> It appears your controller is preventing ZFS from enabling write cache.
>
> I'm not familiar with that model. You will need to find a way to enable the
> drives' write cache manually.
>

My controller, while normally a full RAID controller, has had its BIOS
turned off, so it's acting as a simple SATA controller. Plus, I'm seeing
this same slow performance with dd, not just with ZFS. And I wouldn't think
that write caching would make a difference with using dd (especially
writing in from /dev/zero).

The other thing that's weird is the writes. I am seeing writes in that
3.5MB/sec range during the resilver, *and* I was seeing the same thing
during the dd.

This is from the resilver, but again, the dd was similar. c7d0 is the
device in question:

    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0  238.0    0.0  476.0  0.0  1.0    0.0    4.1   0  99 c12d1
   30.8   37.8 3302.4 3407.2 14.1  2.0  206.0   29.2 100 100 c7d0
   80.4    0.0 3417.6    0.0  0.3  0.3    3.3    3.2   8  14 c8d0
   80.4    0.0 3417.6    0.0  0.3  0.3    3.4    3.2   9  14 c9d0
   80.6    0.0 3417.6    0.0  0.3  0.3    3.4    3.2   9  14 c10d0
   80.6    0.0 3417.6    0.0  0.3  0.3    3.3    3.1   9  14 c11d0
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 c12t0d0

Paul Archer
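A side note on the dd numbers above: they come from dd's default 512-byte
blocks against the buffered block device, which can understate or muddy
what the drive and controller can actually stream. A sketch of a cleaner
raw-device test - the device name is the one from this thread, the block
size and count are arbitrary, and writing to the raw slice is destructive,
so it is only reasonable on a disk that is about to be zeroed or
resilvered anyway:

# write test (destroys data on c7d0s0)
dd if=/dev/zero of=/dev/rdsk/c7d0s0 bs=1024k count=1024

# read test
dd if=/dev/rdsk/c7d0s0 of=/dev/null bs=1024k count=1024

Timing the run, or watching 'iostat -xn 5' in another window, gives the
sustained rate.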
On Sep 27, 2009, at 11:49 AM, Paul Archer <paul at paularcher.org> wrote:

>>> Problem is that while it's back, the performance is horrible. It's
>>> resilvering at about (according to iostat) 3.5MB/sec. And at some
>>> point, I was zeroing out the drive (with 'dd if=/dev/zero
>>> of=/dev/dsk/c7d0'), and iostat showed me that the drive was only
>>> writing at around 3.5MB/sec. *And* it showed reads of about the same
>>> 3.5MB/sec even during the dd.
>>>
>>> This same hardware and even the same zpool have been run under
>>> linux with zfs-fuse and BSD, and with BSD at least, performance
>>> was much better. A complete resilver under BSD took 6 hours. Right
>>> now zpool is estimating this resilver to take 36.
>>>
>>> Could this be a driver problem? Something to do with the fact that
>>> this is a very old SATA card (LSI 150-6)?
>>>
>>> This is driving me crazy. I finally got my zpool working under
>>> Solaris so I'd have some stability, and I've got no performance.
>>
>> It appears your controller is preventing ZFS from enabling write cache.
>>
>> I'm not familiar with that model. You will need to find a way to
>> enable the drives' write cache manually.
>>
> My controller, while normally a full RAID controller, has had its
> BIOS turned off, so it's acting as a simple SATA controller. Plus,
> I'm seeing this same slow performance with dd, not just with ZFS.
> And I wouldn't think that write caching would make a difference with
> using dd (especially writing in from /dev/zero).

I don't think you got what I said. Because the controller normally runs
as a RAID controller, it controls the SATA drives' on-board write cache,
and it may not allow the OS to enable/disable that cache.

Using 'dd' to the raw disk will also show the same poor performance if
the HD on-board write cache is disabled.

> The other thing that's weird is the writes. I am seeing writes in
> that 3.5MB/sec range during the resilver, *and* I was seeing the
> same thing during the dd.

Was the 'dd' to the raw disk? Either way, it shows the HDs aren't set up
properly.

> This is from the resilver, but again, the dd was similar. c7d0 is
> the device in question:
>
>     r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
>     0.0  238.0    0.0  476.0  0.0  1.0    0.0    4.1   0  99 c12d1
>    30.8   37.8 3302.4 3407.2 14.1  2.0  206.0   29.2 100 100 c7d0
>    80.4    0.0 3417.6    0.0  0.3  0.3    3.3    3.2   8  14 c8d0
>    80.4    0.0 3417.6    0.0  0.3  0.3    3.4    3.2   9  14 c9d0
>    80.6    0.0 3417.6    0.0  0.3  0.3    3.4    3.2   9  14 c10d0
>    80.6    0.0 3417.6    0.0  0.3  0.3    3.3    3.1   9  14 c11d0
>     0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 c12t0d0

Try using 'format -e' on the drives, go into 'cache' then 'write-cache'
and display the current state. You can try to manually enable it from
there.

-Ross
>> My controller, while normally a full RAID controller, has had its BIOS
>> turned off, so it's acting as a simple SATA controller. Plus, I'm seeing
>> this same slow performance with dd, not just with ZFS. And I wouldn't
>> think that write caching would make a difference with using dd
>> (especially writing in from /dev/zero).
>
> I don't think you got what I said. Because the controller normally runs as
> a RAID controller, it controls the SATA drives' on-board write cache, and
> it may not allow the OS to enable/disable that cache.
>

I see what you're saying. I just think that with the BIOS turned off, this
card is essentially acting like a dumb SATA controller, and therefore not
doing anything with the drives' cache.

> Using 'dd' to the raw disk will also show the same poor performance if the
> HD on-board write cache is disabled.
>
>> The other thing that's weird is the writes. I am seeing writes in that
>> 3.5MB/sec range during the resilver, *and* I was seeing the same thing
>> during the dd.
>
> Was the 'dd' to the raw disk? Either way, it shows the HDs aren't set up
> properly.
>
>> This is from the resilver, but again, the dd was similar. c7d0 is the
>> device in question:
>>
>>     r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
>>     0.0  238.0    0.0  476.0  0.0  1.0    0.0    4.1   0  99 c12d1
>>    30.8   37.8 3302.4 3407.2 14.1  2.0  206.0   29.2 100 100 c7d0
>>    80.4    0.0 3417.6    0.0  0.3  0.3    3.3    3.2   8  14 c8d0
>>    80.4    0.0 3417.6    0.0  0.3  0.3    3.4    3.2   9  14 c9d0
>>    80.6    0.0 3417.6    0.0  0.3  0.3    3.4    3.2   9  14 c10d0
>>    80.6    0.0 3417.6    0.0  0.3  0.3    3.3    3.1   9  14 c11d0
>>     0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 c12t0d0
>
> Try using 'format -e' on the drives, go into 'cache' then 'write-cache'
> and display the current state. You can try to manually enable it from
> there.
>

I tried this, but the 'cache' menu item didn't show up. The man page says
it only works for SCSI disks. Do you know of any other way to get/set those
parameters?

Paul
On Sep 27, 2009, at 1:44 PM, Paul Archer <paul at paularcher.org> wrote:

>>> My controller, while normally a full RAID controller, has had its
>>> BIOS turned off, so it's acting as a simple SATA controller. Plus,
>>> I'm seeing this same slow performance with dd, not just with ZFS.
>>> And I wouldn't think that write caching would make a difference
>>> with using dd (especially writing in from /dev/zero).
>>
>> I don't think you got what I said. Because the controller normally
>> runs as a RAID controller, it controls the SATA drives' on-board
>> write cache, and it may not allow the OS to enable/disable that cache.
>>
> I see what you're saying. I just think that with the BIOS turned
> off, this card is essentially acting like a dumb SATA controller,
> and therefore not doing anything with the drives' cache.

You are probably right that the controller doesn't do anything - it
neither enables nor disables the drives' cache - so whatever they were set
to before it was switched to JBOD mode is what they are now.

>> Using 'dd' to the raw disk will also show the same poor performance
>> if the HD on-board write cache is disabled.
>>
>>> The other thing that's weird is the writes. I am seeing writes in
>>> that 3.5MB/sec range during the resilver, *and* I was seeing the
>>> same thing during the dd.
>>
>> Was the 'dd' to the raw disk? Either way, it shows the HDs aren't
>> set up properly.
>>
>>> This is from the resilver, but again, the dd was similar. c7d0 is
>>> the device in question:
>>>
>>>     r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
>>>     0.0  238.0    0.0  476.0  0.0  1.0    0.0    4.1   0  99 c12d1
>>>    30.8   37.8 3302.4 3407.2 14.1  2.0  206.0   29.2 100 100 c7d0
>>>    80.4    0.0 3417.6    0.0  0.3  0.3    3.3    3.2   8  14 c8d0
>>>    80.4    0.0 3417.6    0.0  0.3  0.3    3.4    3.2   9  14 c9d0
>>>    80.6    0.0 3417.6    0.0  0.3  0.3    3.4    3.2   9  14 c10d0
>>>    80.6    0.0 3417.6    0.0  0.3  0.3    3.3    3.1   9  14 c11d0
>>>     0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 c12t0d0
>>
>> Try using 'format -e' on the drives, go into 'cache' then 'write-cache'
>> and display the current state. You can try to manually enable it from
>> there.
>>
>
> I tried this, but the 'cache' menu item didn't show up. The man page
> says it only works for SCSI disks. Do you know of any other way to
> get/set those parameters?

Hmm, I thought SATA under Solaris behaves like SCSI? I use RAID
controllers that export each disk as a single-disk raid0, which uses the
SCSI command set and so abstracts the SATA disks, so I have no way to
verify this myself.

-Ross
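For reference, the write-cache check Ross describes looks roughly like the
following on a disk where the cache menu does appear. This is a sketch from
memory - the exact prompts and menu names (e.g. 'write_cache') can differ
between Solaris releases and disk/driver types, and disk 1 is only an
example:

# format -e
Specify disk (enter its number): 1
format> cache
cache> write_cache
write_cache> display
Write Cache is disabled
write_cache> enable
write_cache> quit
cache> quit
format> quit

As the exchange above shows, the menu may not be offered at all when the
disk isn't presented via the SCSI command set, which is what Paul is
running into here.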
Richard Elling
2009-Sep-27 20:19 UTC
[zfs-discuss] extremely slow writes (with good reads)
On Sep 27, 2009, at 8:49 AM, Paul Archer wrote:

>>> Problem is that while it's back, the performance is horrible. It's
>>> resilvering at about (according to iostat) 3.5MB/sec. And at some
>>> point, I was zeroing out the drive (with 'dd if=/dev/zero
>>> of=/dev/dsk/c7d0'), and iostat showed me that the drive was only
>>> writing at around 3.5MB/sec. *And* it showed reads of about the same
>>> 3.5MB/sec even during the dd.
>>>
>>> This same hardware and even the same zpool have been run under
>>> linux with zfs-fuse and BSD, and with BSD at least, performance
>>> was much better. A complete resilver under BSD took 6 hours. Right
>>> now zpool is estimating this resilver to take 36.
>>>
>>> Could this be a driver problem? Something to do with the fact that
>>> this is a very old SATA card (LSI 150-6)?
>>>
>>> This is driving me crazy. I finally got my zpool working under
>>> Solaris so I'd have some stability, and I've got no performance.
>>
>> It appears your controller is preventing ZFS from enabling write cache.
>>
>> I'm not familiar with that model. You will need to find a way to
>> enable the drives' write cache manually.
>>
> My controller, while normally a full RAID controller, has had its
> BIOS turned off, so it's acting as a simple SATA controller. Plus,
> I'm seeing this same slow performance with dd, not just with ZFS.
> And I wouldn't think that write caching would make a difference with
> using dd (especially writing in from /dev/zero).
>
> The other thing that's weird is the writes. I am seeing writes in
> that 3.5MB/sec range during the resilver, *and* I was seeing the
> same thing during the dd.
> This is from the resilver, but again, the dd was similar. c7d0 is
> the device in question:
>
>     r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
>     0.0  238.0    0.0  476.0  0.0  1.0    0.0    4.1   0  99 c12d1
>    30.8   37.8 3302.4 3407.2 14.1  2.0  206.0   29.2 100 100 c7d0

This is the bottleneck. 29.2 ms average service time is slow. As you can
see, this causes a backup in the queue, which is seeing an average service
time of 206 ms.

The problem could be the disk itself or anything in the path to that disk,
including software. But first, look for hardware issues via
        iostat -E
        fmadm faulty
        fmdump -eV
 -- richard

>    80.4    0.0 3417.6    0.0  0.3  0.3    3.3    3.2   8  14 c8d0
>    80.4    0.0 3417.6    0.0  0.3  0.3    3.4    3.2   9  14 c9d0
>    80.6    0.0 3417.6    0.0  0.3  0.3    3.4    3.2   9  14 c10d0
>    80.6    0.0 3417.6    0.0  0.3  0.3    3.3    3.1   9  14 c11d0
>     0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 c12t0d0
>
> Paul Archer
> _______________________________________________
> zfs-discuss mailing list
> zfs-discuss at opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
1:19pm, Richard Elling wrote:

>> The other thing that's weird is the writes. I am seeing writes in that
>> 3.5MB/sec range during the resilver, *and* I was seeing the same thing
>> during the dd.
>> This is from the resilver, but again, the dd was similar. c7d0 is the
>> device in question:
>>
>>     r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
>>     0.0  238.0    0.0  476.0  0.0  1.0    0.0    4.1   0  99 c12d1
>>    30.8   37.8 3302.4 3407.2 14.1  2.0  206.0   29.2 100 100 c7d0
>
> This is the bottleneck. 29.2 ms average service time is slow.
> As you can see, this causes a backup in the queue, which is
> seeing an average service time of 206 ms.
>
> The problem could be the disk itself or anything in the path
> to that disk, including software. But first, look for hardware
> issues via
>         iostat -E
>         fmadm faulty
>         fmdump -eV
>

I don't see anything in the output of these commands except for the ZFS
errors from when I was trying to get the disk online and resilvered.

I estimate another 10-15 hours before this disk is finished resilvering and
the zpool is OK again. At that time, I'm going to switch some hardware out.
(I've got a newer and higher-end LSI card that I hadn't used before because
it's PCI-X, and won't fit on my current motherboard.)

I'll report back what I get with it tomorrow or the next day, depending on
the timing of the resilver.

Paul Archer
Yesterday, Paul Archer wrote:

> I estimate another 10-15 hours before this disk is finished resilvering
> and the zpool is OK again. At that time, I'm going to switch some hardware
> out. (I've got a newer and higher-end LSI card that I hadn't used before
> because it's PCI-X, and won't fit on my current motherboard.)
>
> I'll report back what I get with it tomorrow or the next day, depending on
> the timing of the resilver.
>
> Paul Archer
>

And the hits just keep coming...

The resilver finished last night, so I rebooted the box, as I had just
upgraded to the latest Dev build. Not only did the upgrade fail (love that
instant rollback!), but now the zpool won't come online:

root at shebop:~# zpool import
  pool: datapool
    id: 3410059226836265661
 state: UNAVAIL
status: The pool is formatted using an older on-disk version.
action: The pool cannot be imported due to damaged devices or data.
config:

        datapool     UNAVAIL  insufficient replicas
          raidz1     UNAVAIL  corrupted data
            c7d0     ONLINE
            c8d0s0   ONLINE
            c9d0s0   ONLINE
            c11d0s0  ONLINE
            c10d0s0  ONLINE

I've tried renaming /etc/zfs/zpool.cache and rebooting, but no joy.

Is it OK to scream and tear my hair out now?

Paul

PS I don't suppose there's an RFE out there for "give useful data when a
pool is unavailable." Or even better, "allow a pool to be imported (but no
filesystems mounted) so it *can be fixed*."
8:30am, Paul Archer wrote:

> And the hits just keep coming...
> The resilver finished last night, so I rebooted the box, as I had just
> upgraded to the latest Dev build. Not only did the upgrade fail (love that
> instant rollback!), but now the zpool won't come online:
>
> root at shebop:~# zpool import
>   pool: datapool
>     id: 3410059226836265661
>  state: UNAVAIL
> status: The pool is formatted using an older on-disk version.
> action: The pool cannot be imported due to damaged devices or data.
> config:
>
>         datapool     UNAVAIL  insufficient replicas
>           raidz1     UNAVAIL  corrupted data
>             c7d0     ONLINE
>             c8d0s0   ONLINE
>             c9d0s0   ONLINE
>             c11d0s0  ONLINE
>             c10d0s0  ONLINE
>
> I've tried renaming /etc/zfs/zpool.cache and rebooting, but no joy.
> Is it OK to scream and tear my hair out now?
>

A little more research came up with this:

root at shebop:~# zdb -l /dev/dsk/c7d0
--------------------------------------------
LABEL 0
--------------------------------------------
failed to unpack label 0
--------------------------------------------
LABEL 1
--------------------------------------------
failed to unpack label 1
--------------------------------------------
LABEL 2
--------------------------------------------
failed to unpack label 2
--------------------------------------------
LABEL 3
--------------------------------------------
failed to unpack label 3

While 'zdb -l /dev/dsk/c7d0s0' shows normal labels. So the new question is:
how do I tell ZFS to use c7d0s0 instead of c7d0? I can't do a 'zpool
replace' because the zpool isn't online.

Paul
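A quick way to compare labels across all five disks at this point - a small
sketch using the device names from this thread, which simply counts how
many of the four labels fail to unpack on each s0 slice (0 means all labels
are readable):

for d in c7d0 c8d0 c9d0 c10d0 c11d0; do
        printf '%s: ' "$d"
        zdb -l /dev/dsk/${d}s0 | grep -c 'failed to unpack'
done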
Victor Latushkin
2009-Sep-28 15:56 UTC
[zfs-discuss] extremely slow writes (with good reads)
On 28.09.09 18:09, Paul Archer wrote:
> 8:30am, Paul Archer wrote:
>
>> And the hits just keep coming...
>> The resilver finished last night, so I rebooted the box, as I had just
>> upgraded to the latest Dev build. Not only did the upgrade fail (love
>> that instant rollback!), but now the zpool won't come online:
>>
>> root at shebop:~# zpool import
>>   pool: datapool
>>     id: 3410059226836265661
>>  state: UNAVAIL
>> status: The pool is formatted using an older on-disk version.
>> action: The pool cannot be imported due to damaged devices or data.
>> config:
>>
>>         datapool     UNAVAIL  insufficient replicas
>>           raidz1     UNAVAIL  corrupted data
>>             c7d0     ONLINE
>>             c8d0s0   ONLINE
>>             c9d0s0   ONLINE
>>             c11d0s0  ONLINE
>>             c10d0s0  ONLINE
>>
>> I've tried renaming /etc/zfs/zpool.cache and rebooting, but no joy.
>> Is it OK to scream and tear my hair out now?
>>
>
> A little more research came up with this:
>
> root at shebop:~# zdb -l /dev/dsk/c7d0
> --------------------------------------------
> LABEL 0
> --------------------------------------------
> failed to unpack label 0
> --------------------------------------------
> LABEL 1
> --------------------------------------------
> failed to unpack label 1
> --------------------------------------------
> LABEL 2
> --------------------------------------------
> failed to unpack label 2
> --------------------------------------------
> LABEL 3
> --------------------------------------------
> failed to unpack label 3
>
> While 'zdb -l /dev/dsk/c7d0s0' shows normal labels. So the new question
> is: how do I tell ZFS to use c7d0s0 instead of c7d0? I can't do a 'zpool
> replace' because the zpool isn't online.

ZFS actually uses c7d0s0 and not c7d0 - it shortens the output to c7d0 when
it controls the entire disk. Before the upgrade it looked like this:

        NAME        STATE     READ WRITE CKSUM
        datapool    ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            c2d0s0  ONLINE       0     0     0
            c3d0s0  ONLINE       0     0     0
            c4d0s0  ONLINE       0     0     0
            c6d0s0  ONLINE       0     0     0
            c5d0s0  ONLINE       0     0     0

I guess something happened to the labeling of disk c7d0 (used to be c2d0)
before, during or after the upgrade.

It would be nice to see what zdb -l shows for this disk and some other disk
too. Output of 'prtvtoc /dev/rdsk/cXdYs0' can be helpful as well.

victor
7:56pm, Victor Latushkin wrote:

>> While 'zdb -l /dev/dsk/c7d0s0' shows normal labels. So the new question
>> is: how do I tell ZFS to use c7d0s0 instead of c7d0? I can't do a 'zpool
>> replace' because the zpool isn't online.
>
> ZFS actually uses c7d0s0 and not c7d0 - it shortens the output to c7d0
> when it controls the entire disk. Before the upgrade it looked like this:
>
>         NAME        STATE     READ WRITE CKSUM
>         datapool    ONLINE       0     0     0
>           raidz1    ONLINE       0     0     0
>             c2d0s0  ONLINE       0     0     0
>             c3d0s0  ONLINE       0     0     0
>             c4d0s0  ONLINE       0     0     0
>             c6d0s0  ONLINE       0     0     0
>             c5d0s0  ONLINE       0     0     0
>
> I guess something happened to the labeling of disk c7d0 (used to be c2d0)
> before, during or after the upgrade.
>
> It would be nice to see what zdb -l shows for this disk and some other
> disk too. Output of 'prtvtoc /dev/rdsk/cXdYs0' can be helpful as well.
>

This is from c7d0:

--------------------------------------------
LABEL 0
--------------------------------------------
    version=13
    name='datapool'
    state=0
    txg=233478
    pool_guid=3410059226836265661
    hostid=519305
    hostname='shebop'
    top_guid=7679950824008134671
    guid=17458733222130700355
    vdev_tree
        type='raidz'
        id=0
        guid=7679950824008134671
        nparity=1
        metaslab_array=23
        metaslab_shift=32
        ashift=9
        asize=7501485178880
        is_log=0
        children[0]
                type='disk'
                id=0
                guid=17458733222130700355
                path='/dev/dsk/c7d0s0'
                devid='id1,cmdk at ASAMSUNG_HD154UI=S1Y6J1KS742049/a'
                phys_path='/pci at 0,0/pci10de,3f3 at 4/pci8086,308 at 7/pci-ide at 0/ide at 1/cmdk at 0,0:a'
                whole_disk=1
                DTL=588
        children[1]
                type='disk'
                id=1
                guid=4735756507338772729
                path='/dev/dsk/c8d0s0'
                devid='id1,cmdk at ASAMSUNG_HD154UI=S1Y6J1KS742050/a'
                phys_path='/pci at 0,0/pci10de,3f3 at 4/pci8086,308 at 7/pci-ide at 1/ide at 0/cmdk at 0,0:a'
                whole_disk=0
                DTL=467
        children[2]
                type='disk'
                id=2
                guid=10113358996255761229
                path='/dev/dsk/c9d0s0'
                devid='id1,cmdk at ASAMSUNG_HD154UI=S1Y6J1KS742059/a'
                phys_path='/pci at 0,0/pci10de,3f3 at 4/pci8086,308 at 7/pci-ide at 1/ide at 1/cmdk at 0,0:a'
                whole_disk=0
                DTL=573
        children[3]
                type='disk'
                id=3
                guid=11460855531791764612
                path='/dev/dsk/c11d0s0'
                devid='id1,cmdk at ASAMSUNG_HD154UI=S1Y6J1KS742048/a'
                phys_path='/pci at 0,0/pci10de,3f3 at 4/pci8086,308 at 7/pci-ide at 2/ide at 1/cmdk at 0,0:a'
                whole_disk=0
                DTL=571
        children[4]
                type='disk'
                id=4
                guid=14986691153111294171
                path='/dev/dsk/c10d0s0'
                devid='id1,cmdk at AST31500341AS=____________9VS0TTWF/a'
                phys_path='/pci at 0,0/pci10de,3f3 at 4/pci8086,308 at 7/pci-ide at 2/ide at 0/cmdk at 0,0:a'
                whole_disk=0
                DTL=473

Labels 1-3 are identical.

The other disks in the pool give identical results (except for the guids,
which match with what's above).

c8d0 - c11d0 are identical, so I didn't include that output below:

root at shebop:/tmp# prtvtoc /dev/rdsk/c7d0s0
* /dev/rdsk/c7d0s0 partition map
*
* Dimensions:
*     512 bytes/sector
*  2930264064 sectors
*  2930263997 accessible sectors
*
* Flags:
*   1: unmountable
*  10: read-only
*
* Unallocated space:
*       First     Sector    Last
*       Sector     Count    Sector
*          34       222       255
*
*                          First     Sector    Last
* Partition  Tag  Flags    Sector     Count    Sector  Mount Directory
       0      4    00        256 2930247391 2930247646
       8     11    00 2930247647     16384 2930264030
root at shebop:/tmp#
root at shebop:/tmp# prtvtoc /dev/rdsk/c8d0s0
* /dev/rdsk/c8d0s0 partition map
*
* Dimensions:
*     512 bytes/sector
*  2930264064 sectors
*  2930277101 accessible sectors
*
* Flags:
*   1: unmountable
*  10: read-only
*
*                          First     Sector    Last
* Partition  Tag  Flags    Sector     Count    Sector  Mount Directory
       0     17    00         34 2930277101 2930277134

Thanks for the help!

Paul Archer
Victor Latushkin
2009-Sep-28 17:11 UTC
[zfs-discuss] extremely slow writes (with good reads)
Paul,

Thanks for the additional data; please see comments inline.

Paul Archer wrote:
> 7:56pm, Victor Latushkin wrote:
>
>>> While 'zdb -l /dev/dsk/c7d0s0' shows normal labels. So the new
>>> question is: how do I tell ZFS to use c7d0s0 instead of c7d0? I can't
>>> do a 'zpool replace' because the zpool isn't online.
>>
>> ZFS actually uses c7d0s0 and not c7d0 - it shortens the output to c7d0
>> when it controls the entire disk. Before the upgrade it looked like this:
>>
>>         NAME        STATE     READ WRITE CKSUM
>>         datapool    ONLINE       0     0     0
>>           raidz1    ONLINE       0     0     0
>>             c2d0s0  ONLINE       0     0     0
>>             c3d0s0  ONLINE       0     0     0
>>             c4d0s0  ONLINE       0     0     0
>>             c6d0s0  ONLINE       0     0     0
>>             c5d0s0  ONLINE       0     0     0
>>
>> I guess something happened to the labeling of disk c7d0 (used to be
>> c2d0) before, during or after the upgrade.
>>
>> It would be nice to see what zdb -l shows for this disk and some
>> other disk too. Output of 'prtvtoc /dev/rdsk/cXdYs0' can be helpful
>> as well.
>>
>
> This is from c7d0:
>
> --------------------------------------------
> LABEL 0
> --------------------------------------------
>     version=13
>     name='datapool'
>     state=0
>     txg=233478
>     pool_guid=3410059226836265661
>     hostid=519305
>     hostname='shebop'
>     top_guid=7679950824008134671
>     guid=17458733222130700355
>     vdev_tree
>         type='raidz'
>         id=0
>         guid=7679950824008134671
>         nparity=1
>         metaslab_array=23
>         metaslab_shift=32
>         ashift=9
>         asize=7501485178880
>         is_log=0
>         children[0]
>                 type='disk'
>                 id=0
>                 guid=17458733222130700355
>                 path='/dev/dsk/c7d0s0'
>                 devid='id1,cmdk at ASAMSUNG_HD154UI=S1Y6J1KS742049/a'
>                 phys_path='/pci at 0,0/pci10de,3f3 at 4/pci8086,308 at 7/pci-ide at 0/ide at 1/cmdk at 0,0:a'
>                 whole_disk=1

This is why ZFS does not show s0 in the zpool output for c7d0 - it controls
the entire disk. I guess initially it was the other way around - it is
unlikely that you specified the disks differently at creation time, and the
earlier output suggests it was the other way.
So something happened before the last system reboot that most likely
relabeled your c7d0 disk, and the configuration in the labels was updated.

>                 DTL=588
>         children[1]
>                 type='disk'
>                 id=1
>                 guid=4735756507338772729
>                 path='/dev/dsk/c8d0s0'
>                 devid='id1,cmdk at ASAMSUNG_HD154UI=S1Y6J1KS742050/a'
>                 phys_path='/pci at 0,0/pci10de,3f3 at 4/pci8086,308 at 7/pci-ide at 1/ide at 0/cmdk at 0,0:a'
>                 whole_disk=0

All the other disks have whole_disk=0, so there's an s0 in the zpool output
for those disks.

>                 DTL=467
>         children[2]
>                 type='disk'
>                 id=2
>                 guid=10113358996255761229
>                 path='/dev/dsk/c9d0s0'
>                 devid='id1,cmdk at ASAMSUNG_HD154UI=S1Y6J1KS742059/a'
>                 phys_path='/pci at 0,0/pci10de,3f3 at 4/pci8086,308 at 7/pci-ide at 1/ide at 1/cmdk at 0,0:a'
>                 whole_disk=0
>                 DTL=573
>         children[3]
>                 type='disk'
>                 id=3
>                 guid=11460855531791764612
>                 path='/dev/dsk/c11d0s0'
>                 devid='id1,cmdk at ASAMSUNG_HD154UI=S1Y6J1KS742048/a'
>                 phys_path='/pci at 0,0/pci10de,3f3 at 4/pci8086,308 at 7/pci-ide at 2/ide at 1/cmdk at 0,0:a'
>                 whole_disk=0
>                 DTL=571
>         children[4]
>                 type='disk'
>                 id=4
>                 guid=14986691153111294171
>                 path='/dev/dsk/c10d0s0'
>                 devid='id1,cmdk at AST31500341AS=____________9VS0TTWF/a'
>                 phys_path='/pci at 0,0/pci10de,3f3 at 4/pci8086,308 at 7/pci-ide at 2/ide at 0/cmdk at 0,0:a'
>                 whole_disk=0
>                 DTL=473
>
> Labels 1-3 are identical.
>
> The other disks in the pool give identical results (except for the
> guids, which match with what's above).

Ok, then let's look at the vtoc - probably we can find something
interesting there.

> c8d0 - c11d0 are identical, so I didn't include that output below:

This is expected. So let's look for the differences:

> root at shebop:/tmp# prtvtoc /dev/rdsk/c7d0s0
> * /dev/rdsk/c7d0s0 partition map
> *
> * Unallocated space:
> *       First     Sector    Last
> *       Sector     Count    Sector
> *          34       222       255
> *
> *                          First     Sector    Last
> * Partition  Tag  Flags    Sector     Count    Sector  Mount Directory
>        0      4    00        256 2930247391 2930247646
>        8     11    00 2930247647     16384 2930264030
> root at shebop:/tmp#
> root at shebop:/tmp# prtvtoc /dev/rdsk/c8d0s0
> * /dev/rdsk/c8d0s0 partition map
> *
> *                          First     Sector    Last
> * Partition  Tag  Flags    Sector     Count    Sector  Mount Directory
>        0     17    00         34 2930277101 2930277134
>

Now you can clearly see the difference between the two: four disks have
only one partition, whereas c7d0 has two, and c7d0s0 is smaller than the
others.

Let's do a little bit of math. The asize of our RAID-Z vdev is
7501485178880; since that is the asize of the smallest child vdev
multiplied by 5, we can calculate the expected size of the smallest child:
7501485178880 / 5 = 1500297035776 bytes, or 5723179 blocks of 256KB.

Why is it interesting to have the size in terms of 256KB blocks? Each label
is 256KB, so the four labels take 1MB, and ZFS currently rounds the slice
size down to the nearest 256KB boundary for further calculations. To the
usable size we need to add the two labels at the front, the two labels at
the back, and the 3.5MB reserved area right after the front labels - 4.5MB
in total, or 18 256KB blocks. So the expected slice size is 5723179 + 18 =
5723197 256KB blocks.

Let's check the actual slice sizes:

c8d0s0: 2930277101 / 512 = 5723197.462890625 - exactly the 5723197 256KB
blocks we need.
c7d0s0: 2930247391 / 512 = 5723139.435546875 - or 5723139 256KB blocks,
which is 58 256KB blocks less than needed.

And this is the reason why RAID-Z is unhappy.

How to get out of this? There are two ways:

1. Restore the original labeling on the c7d0 disk
2. Remove (physically or logically) disk c7d0

Both ways have their pros and cons.

If you want to try logical removal of c7d0, you can do something like this:

        cfgadm -al

Find disk c7d0 in the output (it may look similar to c7::dsk/c7d0), then do

        cfgadm -c unconfigure c7::dsk/c7d0

If dynamic reconfiguration is not supported for your disks, you can
temporarily remove (or move out of /dev/dsk and /dev/rdsk) these two
symbolic links:

        /dev/dsk/c7d0s0
        /dev/rdsk/c7d0s0

Then do 'zpool import' to see what pools are available for import and
whether your RAID-Z is happier. To configure the disk back you can do
'cfgadm -c configure <disk>'.

Cheers,
Victor

PS. Do you have an idea how it could happen that c7d0 got relabeled?
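For anyone following along, Victor's block-count arithmetic is easy to
re-check with bc; the input numbers are taken straight from the label and
prtvtoc output quoted above:

$ echo '7501485178880 / 5' | bc            # asize of the smallest child, bytes
1500297035776
$ echo '1500297035776 / (256 * 1024)' | bc # the same, in 256KB blocks
5723179
$ echo '2930277101 / 512' | bc             # c8d0s0 sectors -> 256KB blocks
5723197
$ echo '2930247391 / 512' | bc             # c7d0s0 sectors -> 256KB blocks
5723139
$ echo '5723197 - 5723139' | bc            # shortfall, in 256KB blocks
58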
In light of all the trouble I've been having with this zpool, I bought a
2TB drive, and I'm going to move all my data over to it, then destroy the
pool and start over.

Before I do that, what is the best way on an x86 system to format/label
the disks?

Thanks,

Paul
Robert Milkowski
2009-Sep-29 02:21 UTC
[zfs-discuss] extremely slow writes (with good reads)
Paul Archer wrote:
> In light of all the trouble I've been having with this zpool, I bought
> a 2TB drive, and I'm going to move all my data over to it, then
> destroy the pool and start over.
>
> Before I do that, what is the best way on an x86 system to
> format/label the disks?
>

If the entire disk is going to be dedicated to one zfs pool, then don't
bother with manual labeling - when creating the pool, provide a disk name
without a slice name (so, for example, c0d0 instead of c0d0s0), and zfs
will automatically put an EFI label on it, with s0 representing the entire
disk (minus a small reserved area).

-- 
Robert Milkowski
http://milek.blogspot.com
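As a concrete sketch of what Robert describes, using the disk names that
appear earlier in this thread (they will likely enumerate differently on a
new controller, so adjust accordingly):

# whole-disk vdevs: zpool writes the EFI label and uses s0 by itself
zpool create datapool raidz c7d0 c8d0 c9d0 c10d0 c11d0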
Cool.

FWIW, there appears to be an issue with the LSI 150-6 card I was using. I
grabbed an old server m/b from work, and put a newer PCI-X LSI card in it,
and I'm getting write speeds of about 60-70MB/sec, which is about 40x the
write speed I was seeing with the old card.

Paul

Tomorrow, Robert Milkowski wrote:

> Paul Archer wrote:
>> In light of all the trouble I've been having with this zpool, I bought a
>> 2TB drive, and I'm going to move all my data over to it, then destroy the
>> pool and start over.
>>
>> Before I do that, what is the best way on an x86 system to format/label
>> the disks?
>>
>
> If the entire disk is going to be dedicated to one zfs pool, then don't
> bother with manual labeling - when creating the pool, provide a disk name
> without a slice name (so, for example, c0d0 instead of c0d0s0), and zfs
> will automatically put an EFI label on it, with s0 representing the entire
> disk (minus a small reserved area).
>
> --
> Robert Milkowski
> http://milek.blogspot.com
>
11:04pm, Paul Archer wrote:

> Cool.
> FWIW, there appears to be an issue with the LSI 150-6 card I was using. I
> grabbed an old server m/b from work, and put a newer PCI-X LSI card in it,
> and I'm getting write speeds of about 60-70MB/sec, which is about 40x the
> write speed I was seeing with the old card.
>
> Paul

Small correction: I was seeing writes in the 60-70MB/sec range because I
was writing to a single 2TB drive (on its own pool). When I tried writing
back to the primary (4+1 raid-z) pool, I was getting between 100-120MB/sec.
(That's for sequential writes, anyway.)

paul