Hey guys,

I am new to ZFS and have been playing around with it for a few days. I am trying to improve the performance of an iSCSI storage backend by putting the ZIL/log on an SSD.

Below are the steps I followed:

# format < /dev/null
Searching for disks...

The device does not support mode page 3 or page 4,
or the reported geometry info is invalid.
WARNING: Disk geometry is based on capacity data.

The current rpm value 0 is invalid, adjusting it to 3600
done

c8t3d0: configured with capacity of 465.74GB


AVAILABLE DISK SELECTIONS:
       0. c8t0d0 <DEFAULT cyl 60798 alt 2 hd 255 sec 63>
          /pci@0,0/pci8086,3476@1f,2/disk@0,0
       1. c8t1d0 <ATA-ANS9010_NNNN2N2N-_109-3.56GB>
          /pci@0,0/pci8086,3476@1f,2/disk@1,0
       2. c8t2d0 <ATA-WDC WD5002ABYS-0-3B02-465.76GB>
          /pci@0,0/pci8086,3476@1f,2/disk@2,0
       3. c8t3d0 <ATA-WDCWD5001ABYS-0-1D01 cyl 60799 alt 2 hd 255 sec 63>
          /pci@0,0/pci8086,3476@1f,2/disk@3,0
       4. c8t4d0 <ATA-TS8GSSD25S-S-6-7.45GB>
          /pci@0,0/pci8086,3476@1f,2/disk@4,0
       5. c8t5d0 <ATA-WDC WD5002ABYS-0-3B03-465.76GB>
          /pci@0,0/pci8086,3476@1f,2/disk@5,0
       6. c9t9d0 <IFT-S16E-R1130-364Q-4.55TB>
          /iscsi/disk@0000iqn.2002-10.com.infortrend%3Araid.sn7755270.0010001,0
       7. c9t13d0 <IFT-S16E-R1130-364Q-4.55TB>
          /iscsi/disk@0001iqn.2002-10.com.infortrend%3Araid.sn7755270.0010001,0
       8. c9t15d0 <IFT-S16E-R1130-364Q-4.55TB>
          /iscsi/disk@0002iqn.2002-10.com.infortrend%3Araid.sn7755270.0010001,0
       9. c9t16d0 <IFT-S16E-R1130-364Q-4.55TB>
          /iscsi/disk@0003iqn.2002-10.com.infortrend%3Araid.sn7755270.0010001,0
Specify disk (enter its number):

# zpool create iftraid0 c9t9d0 log c8t4d0
# zfs create iftraid0/fs

# zpool status iftraid0
  pool: iftraid0
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        iftraid0    ONLINE       0     0     0
          c9t9d0    ONLINE       0     0     0
        logs        ONLINE       0     0     0
          c8t4d0    ONLINE       0     0     0

errors: No known data errors

Now when I run dd to create a big file on /iftraid0/fs and watch `iostat -xnz 2`, I don't see any stats for c8t4d0, nor does the write performance improve.

I have not formatted either c9t9d0 or c8t4d0. What am I missing?

TIA
Dushyanth
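A quick way to check whether the log vdev is seeing any I/O at all, without having to pick the slog's cXtYdZ name out of iostat output, is zpool iostat with its per-vdev breakdown. A minimal sketch, run as root, assuming the pool name from the commands above:

# Per-vdev I/O statistics every 2 seconds while the test runs;
# the "logs" section shows whether c8t4d0 receives any writes.
zpool iostat -v iftraid0 2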
On 18/11/2009, at 7:33 AM, Dushyanth wrote:

> Now when I run dd to create a big file on /iftraid0/fs and watch `iostat -xnz 2`, I don't see any stats for c8t4d0, nor does the write performance improve.
>
> I have not formatted either c9t9d0 or c8t4d0. What am I missing?

Last I checked, iSCSI volumes go direct to the primary storage and not via the slog device.

Can anybody confirm whether that is the case, whether there is a mechanism/tunable to force the traffic via the slog, and whether there is any benefit/point in doing so for most cases?

cheers,
James
Which OS and release? The behaviour has changed over time.
 -- richard

On Nov 17, 2009, at 1:33 PM, Dushyanth wrote:

> Hey guys,
>
> I am new to ZFS and have been playing around with it for a few days. I am trying to improve the performance of an iSCSI storage backend by putting the ZIL/log on an SSD.
>
> [format listing, zpool create and zpool status output quoted in full above; snipped]
>
> Now when I run dd to create a big file on /iftraid0/fs and watch `iostat -xnz 2`, I don't see any stats for c8t4d0, nor does the write performance improve.
>
> I have not formatted either c9t9d0 or c8t4d0. What am I missing?
>
> TIA
> Dushyanth
Oops - the most important bit of info was missed - it's OpenSolaris 2009.06:

# uname -a
SunOS m1-sv-ZFS-1 5.11 snv_111b i86pc i386 i86pc Solaris

TIA
Dushyanth
I ran a quick test to confirm James's theory - and there is more weirdness.

# Mirror pool with two 500GB SATA disks - no log device

root@m1-sv-ZFS-1:~# zpool create pool1 mirror c8t5d0 c8t2d0
root@m1-sv-ZFS-1:~# zfs create pool1/fs
root@m1-sv-ZFS-1:~# cd /pool1/fs
root@m1-sv-ZFS-1:/pool1/fs# time dd if=/dev/zero of=bigfile.55 bs=4096 count=1000000
1000000+0 records in
1000000+0 records out
4096000000 bytes (4.1 GB) copied, 41.2861 s, 99.2 MB/s

real    0m41.290s
user    0m0.634s
sys     0m12.907s

# Mirror pool with two 500GB locally attached SATA disks - with log device

root@m1-sv-ZFS-1:~# zpool add pool1 log c8t4d0
root@m1-sv-ZFS-1:~# zpool status pool1
  pool: pool1
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        pool1       ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            c8t5d0  ONLINE       0     0     0
            c8t2d0  ONLINE       0     0     0
        logs        ONLINE       0     0     0
          c8t4d0    ONLINE       0     0     0

root@m1-sv-ZFS-1:/pool1/fs# time dd if=/dev/zero of=bigfile.55 bs=4096 count=1000000
1000000+0 records in
1000000+0 records out
4096000000 bytes (4.1 GB) copied, 51.0463 s, 80.2 MB/s

real    0m51.107s
user    0m0.639s
sys     0m12.817s

root@m1-sv-ZFS-1:/pool1/fs# time dd if=/dev/zero of=bigfile.66 bs=4096 count=1000000
1000000+0 records in
1000000+0 records out
4096000000 bytes (4.1 GB) copied, 52.0175 s, 78.7 MB/s

real    0m52.022s
user    0m0.641s
sys     0m12.780s

Performance dropped for some reason, and there is still no I/O activity on the log device (c8t4d0 - Transcend 8GB SSD).

TIA
Dushyanth
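One thing worth noting about the test above: dd issues plain asynchronous writes, so the slog is never in the write path regardless of how fast it is. If the dd in use is GNU dd (the "bytes copied" summary line suggests it is), oflag=dsync opens the output file with O_DSYNC and makes every write synchronous, which should finally generate traffic on the log device. A sketch under those assumptions; the /usr/gnu/bin path and the file name are illustrative:

# Force synchronous writes so each block is committed through the ZIL/slog.
# oflag=dsync is a GNU dd option; the legacy Solaris /usr/bin/dd lacks it.
/usr/gnu/bin/dd if=/dev/zero of=/pool1/fs/syncfile bs=4096 count=100000 oflag=dsync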
I am sorry that I don't have any links, but here is what I observe on my system: dd does not do sync writes, so the ZIL is not used. iSCSI traffic does sync writes (as of 2009.06, but not 2008.05), so if you repeat your test using an iSCSI target served from your system, you should see log activity. The same goes for NFS. I see no ZIL activity using rsync, as an example of a network file transfer that does not require sync.

Scott
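For anyone wanting to reproduce that locally: one way to get sync-write traffic onto the pool is to carve out a zvol, export it over iSCSI from the OpenSolaris box itself, and run the load against that LUN from an initiator. A sketch using the legacy iSCSI target (the COMSTAR sbdadm/stmfadm route is the alternative); the volume name and size are arbitrary, and the iscsitgt service must be installed and enabled:

# Create a 10 GB zvol and share it via the legacy iSCSI target daemon.
zfs create -V 10g pool1/tgtvol
zfs set shareiscsi=on pool1/tgtvol
iscsitadm list target        # confirm the target was created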
On Nov 17, 2009, at 2:50 PM, Scott Meilicke wrote:

> I am sorry that I don't have any links, but here is what I observe on my system: dd does not do sync writes, so the ZIL is not used. iSCSI traffic does sync writes (as of 2009.06, but not 2008.05), so if you repeat your test using an iSCSI target served from your system, you should see log activity. The same goes for NFS. I see no ZIL activity using rsync, as an example of a network file transfer that does not require sync.

This is correct. Normal, local file I/O will be nicely cached.
 -- richard
>>>>> "d" == Dushyanth <dushyanth.h at directi.com> writes:d> Performance dropped for some reason the SSD''s black-box-filesystem is fragmented? Do the slog-less test again and see if it''s still fast. -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 304 bytes Desc: not available URL: <http://mail.opensolaris.org/pipermail/zfs-discuss/attachments/20091117/0ff2aa1a/attachment.bin>
Hi,

Thanks for all the inputs. I did run some postmark tests without the slog and with it, and did not see any performance benefit on the iSCSI volume. I will repeat them and post the results here.

Also, please note that the Solaris box is the initiator and the target is an Infortrend S16E-R1130 iSCSI array.

Thanks
Dushyanth
> The SSD's black-box filesystem is fragmented?

Not very sure - it's a Transcend TS8GSSD25S 2.5" SLC SSD that I could find in our store immediately. I also have an ACARD ANS-9010 DRAM device (http://bit.ly/3cQ4fK) that I am experimenting with. The Intel X25-E should arrive soon.

Are there any other recommended devices for a slog?

I will repeat the tests and update.

Thanks
Dushyanth
Just to clarify: does iSCSI traffic from a Solaris iSCSI initiator to a third-party target go through the ZIL?
On Nov 18, 2009, at 2:20 AM, Dushyanth wrote:

> Just to clarify: does iSCSI traffic from a Solaris iSCSI initiator to a third-party target go through the ZIL?

ZFS doesn't know (or care) what is behind a block device. So if you configure your pool to use iSCSI devices, then it will use them.

To measure ZIL activity, use zilstat.
http://www.richardelling.com/Home/scripts-and-programs-1/zilstat
 -- richard
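For reference, zilstat is a ksh/DTrace script, so it needs to run as root (or with the appropriate DTrace privileges) while the workload is active; the exact option syntax varies between versions, so check its usage output. An indicative invocation:

# Sample ZIL activity while the dd/postmark/iSCSI load is running.
# The arguments here are intended as "10-second intervals, 6 samples".
chmod +x zilstat
./zilstat 10 6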
I second the use of zilstat - very useful, especially if you don't want to mess around with adding a log device and then having to destroy the pool if you don't want the log device any longer.

On Nov 18, 2009, at 2:20 AM, Dushyanth wrote:

> Just to clarify: does iSCSI traffic from a Solaris iSCSI initiator to a third-party target go through the ZIL?

It depends on whether the application requires a sync or not. dd does not, but databases (in general) do. As Richard said, ZFS treats the iSCSI volume just like any other vdev, so the fact that it is an iSCSI volume has nothing to do with ZFS's ZIL usage.

-Scott
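If it is unclear whether a given workload issues sync writes at all, a quick DTrace one-liner can show which processes are making fsync()-style calls; on Solaris the underlying system call is usually fdsync, so adjust the probe name if it does not match on your build. A sketch, run as root:

# Count fsync()-style calls per process for 10 seconds, then exit.
dtrace -n 'syscall::fdsync:entry { @[execname] = count(); } tick-10s { exit(0); }'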
Scott Meilicke wrote:

> I second the use of zilstat - very useful, especially if you don't want to mess around with adding a log device and then having to destroy the pool if you don't want the log device any longer.

Log devices can be removed as of zpool version 19.

--
Darren J Moffat
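In other words, on a build whose pool version is 19 or later, the slog can be detached without destroying the pool. Something like the following, with the pool and device names taken from the earlier test:

# List the pool versions this build supports, then drop the log vdev.
zpool upgrade -v
zpool remove pool1 c8t4d0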
Darren J Moffat wrote:

> Scott Meilicke wrote:
>> I second the use of zilstat - very useful, especially if you don't want to mess around with adding a log device and then having to destroy the pool if you don't want the log device any longer.
>
> Log devices can be removed as of zpool version 19.

So there is no way to get rid of a ZIL/log device on Solaris 10 U8 (ZFS pool version 15)?

--
Michael
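To see where a given pool actually stands, the on-disk version is exposed as a pool property; a quick check (the pool name here is illustrative):

# Show the current on-disk version of the pool.
zpool get version pool1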
Thanks a lot. This clears up many of the doubts I had.

I was actually trying to improve the performance of our email storage. We are using dovecot as the LDA on a set of RHEL boxes, and the email volume seems to be saturating the write throughput of our Infortrend iSCSI SAN.

So it looks like a mail-storage-server type of workload will not benefit from a separate ZIL/log device, as shown by zilstat and a postmark workload.

I am guessing the Sun 7310/7410 will behave the same way with this type of workload. Do the Logzillas/Readzillas do anything special on these boxes?

TIA
Dushyanth

------Original Message------
From: Richard Elling
To: Dushyanth
Cc: zfs-discuss@opensolaris.org
Subject: Re: [zfs-discuss] ZFS ZIL/log on SSD weirdness
Sent: Nov 18, 2009 21:55

[...]

Sent from BlackBerry
On Nov 19, 2009, at 10:28 AM, Dushyanth Harinath wrote:

> Thanks a lot. This clears up many of the doubts I had.
>
> I was actually trying to improve the performance of our email storage. We are using dovecot as the LDA on a set of RHEL boxes, and the email volume seems to be saturating the write throughput of our Infortrend iSCSI SAN.

I'm not familiar with dovecot, but I've done 500k users on a system with just a few FC connections. The mail workload can be very stressful to storage. Of course, we had plenty of HDDs -- enough to handle the large IOPS requirement. I'd love to revisit this with SSDs instead :-)

> So it looks like a mail-storage-server type of workload will not benefit from a separate ZIL/log device, as shown by zilstat and a postmark workload.
>
> I am guessing the Sun 7310/7410 will behave the same way with this type of workload. Do the Logzillas/Readzillas do anything special on these boxes?

In general, the performance will be limited by the storage devices.
 -- richard