Charles Baker
2009-Aug-04 13:57 UTC
[zfs-discuss] Pool iscsi /zfs performance in opensolaris 0906
> My testing has shown some serious problems with the
> iSCSI implementation for OpenSolaris.
>
> I set up a VMware vSphere 4 box with RAID 10
> direct-attached storage and 3 virtual machines:
> - OpenSolaris 2009.06 (snv_111b) running 64-bit
> - CentOS 5.3 x64 (ran yum update)
> - Ubuntu Server 9.04 x64 (ran apt-get upgrade)
>
> I gave each virtual machine 2 GB of RAM and a 32 GB drive, and
> set up a 16 GB iSCSI target on each (the two Linux VMs
> used iSCSI Enterprise Target 0.4.16 with blockio).
> VMware Tools was installed on each. No tuning was
> done on any of the operating systems.
>
> I ran two tests for write performance - one on the
> server itself and one from my Mac connected via a
> Gigabit (MTU of 1500) iSCSI connection using
> globalSAN's latest initiator.
>
> Here's what I used on the servers:
> time dd if=/dev/zero of=/root/testfile bs=1048576k count=4
> and on the Mac OS with the iSCSI-connected drive
> (formatted with GPT / Mac OS Extended journaled):
> time dd if=/dev/zero of=/Volumes/test/testfile bs=1048576k count=4
>
> The results were very interesting (all calculations
> using 1 MB = 1,048,576 bytes).
>
> For OpenSolaris, the local write performance averaged
> 86 MB/s. I turned on lzjb compression for rpool (zfs
> set compression=lzjb rpool) and it went up to 414
> MB/s (since I'm writing zeros). The average
> performance via iSCSI was an abysmal 16 MB/s (even
> with compression turned on - with it off, 13 MB/s).
>
> For CentOS (ext3), local write performance averaged
> 141 MB/s. iSCSI performance was 78 MB/s (almost as
> fast as local ZFS performance on the OpenSolaris
> server when compression was turned off).
>
> Ubuntu Server (ext4) had 150 MB/s for the local
> write. iSCSI performance averaged 80 MB/s.
>
> One of the main differences between the three virtual
> machines was that the iSCSI targets on the Linux
> machines used partitions with no file system. On
> OpenSolaris, the iSCSI target created sits on top of
> ZFS. That creates a lot of overhead (although you do
> get some great features).
>
> Since all the virtual machines were connected to the
> same switch (with the same MTU), had the same amount
> of RAM, used default configurations for the operating
> systems, and sat on the same RAID 10 storage, I'd say
> it was a pretty level playing field.
>
> While jumbo frames will help iSCSI performance, they
> won't overcome inherent limitations of the iSCSI
> target's implementation.

cross-posting with zfs-discuss.
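Because dd from /dev/zero mostly measures the compressor once lzjb is enabled, a fairer variant of the same test writes incompressible data. A minimal sketch, assuming the paths and the pool name "tank" are only illustrative (generating the source file from /dev/urandom is slow, but it only has to be done once):

  # build ~4 GB of incompressible test data on a pool other than the one under test
  dd if=/dev/urandom of=/var/tmp/random.bin bs=65536 count=65536

  # then time copying it onto the filesystem or iSCSI volume being measured
  time dd if=/var/tmp/random.bin of=/tank/testfile bs=1048576

The same copy-from-a-random-file approach works from the Mac side against the iSCSI-mounted volume.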
erik.ableson
2009-Aug-04 14:40 UTC
[zfs-discuss] Pool iscsi /zfs performance in opensolaris 0906
You're running into the same problem I had with 2009.06: they have "corrected" a bug where the iSCSI target prior to 2009.06 didn't completely honor SCSI sync commands issued by the initiator.

Some background:

Discussion: http://opensolaris.org/jive/thread.jspa?messageID=388492
"Corrected bug": http://bugs.opensolaris.org/view_bug.do?bug_id=6770534

The upshot is that unless you have an SSD (or other high-speed dedicated device) attached as a ZIL (or slog) on 2009.06, you won't see anywhere near the local-speed performance that the storage is capable of, since you're forcing individual transactions all the way down to disk and back up before moving on to the next SCSI block command.

This iSCSI performance profile is currently specific to 2009.06 and does not occur on 2008.11. As a stopgap (since I don't have a budget for SSDs right now) I'm keeping my production servers on 2008.11 (taking into account the additional potential risk, but these are machines with battery-backed SAS cards in a conditioned data center). These machines are serving up iSCSI to ESX 3.5 and ESX 4 servers.

For my freewheeling home use, where everything gets tried, crashed, patched and put back together with baling twine (and is backed up elsewhere...), I've mounted a RAM disk of 1 GB which is attached to the pool as a ZIL, and you see the performance run in cycles where the ZIL loads up to saturation, flushes out to disk and keeps going. I did write a script to regularly dd the RAM disk device out to a file so that I can recreate it with the appropriate signature if I have to reboot the osol box. This is used with the GlobalSAN initiator on OS X as well as various Windows and Linux machines, physical and VM.

Assuming this is a test system that you're playing with, you can destroy the pool with impunity, and you don't have an SSD lying around to test with, try the following:

ramdiskadm -a slog 2g (or whatever size you can manage reasonably with the available physical RAM - try "vmstat 1 2" to determine available memory)
zpool add <poolname> log /dev/ramdisk/slog

If you want to perhaps reuse the slog later (RAM disks are not preserved over reboot), write the slog volume out to disk and dump it back in after restarting:

dd if=/dev/ramdisk/slog of=/root/slog.dd

All of the above assumes that you are not doing this stuff against rpool. I think that attaching a volatile log device to your boot pool would result in a machine that can't mount the root zfs volume.

It's easiest to monitor from the Mac (I find), so try your test again with Activity Monitor showing network traffic and you'll see that it goes to a wire-speed ceiling while it's filling up the ZIL; once it's saturated your traffic will drop to near nothing, and then pick up again after a few seconds. If you don't saturate the ZIL you'll see continuous-speed data transfer.

Cheers,

Erik

On 4 août 09, at 15:57, Charles Baker wrote:

>> My testing has shown some serious problems with the
>> iSCSI implementation for OpenSolaris.
>> [...]
>> While jumbo frames will help iSCSI performance, they
>> won't overcome inherent limitations of the iSCSI
>> target's implementation.
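Pulled together, the stopgap Erik describes amounts to the following. This is a sketch only, assuming a throwaway test pool named "tank"; a RAM-disk slog means any in-flight synchronous writes are lost if the box crashes or loses power:

  # create a 2 GB RAM disk and attach it to the pool as a separate log device
  ramdiskadm -a slog 2g
  zpool add tank log /dev/ramdisk/slog

  # check that the log device shows up
  zpool status tank

  # optionally keep a copy of the device contents so it can be recreated after a reboot
  dd if=/dev/ramdisk/slog of=/root/slog.dd bs=1048576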
Ross Walker
2009-Aug-04 15:15 UTC
[zfs-discuss] Pool iscsi /zfs performance in opensolaris 0906
On Tue, Aug 4, 2009 at 10:40 AM, erik.ableson <eableson at mac.com> wrote:

> You're running into the same problem I had with 2009.06 as they have
> "corrected" a bug where the iSCSI target prior to 2009.06 didn't completely
> honor SCSI sync commands issued by the initiator.
> Some background:
> Discussion: http://opensolaris.org/jive/thread.jspa?messageID=388492
> "corrected bug": http://bugs.opensolaris.org/view_bug.do?bug_id=6770534

But this MUST happen. If it doesn't then you are playing Russian
Roulette with your data, as a kernel panic can cause a loss of up to
1/8 of the size of your system's RAM (the ZFS lazy write cache) of your
iSCSI target's data!

> The upshot is that unless you have an SSD (or other high speed dedicated
> device) attached as a ZIL (or slog) on 2009.06 you won't see anywhere near
> the local speed performance that the storage is capable of since you're
> forcing individual transactions all the way down to disk and back up before
> moving onto the next SCSI block command.

Actually I recommend using a controller with an NVRAM cache on it, say
256MB-512MB (or more).

This is much faster than an SSD and has the advantage that the ZIL is
striped across the pool, making ZIL reads much faster! You don't need
to use the hardware RAID; export the drives as JBOD or individual RAID0
LUNs and make a zpool out of them (see the sketch after this message).

> This iSCSI performance profile is currently specific to 2009.06 and does not
> occur on 2008.11. As a stopgap (since I don't have a budget for SSDs right
> now) I'm keeping my production servers on 2008.11 (taking into account the
> additional potential risk, but these are machines with battery backed SAS
> cards in a conditioned data center). These machines are serving up iSCSI to
> ESX 3.5 and ESX 4 servers.

God I hope not. Tick-tock, eventually you will corrupt your iSCSI data
with that setup; it's not a matter of if, it's a matter of when.

> For my freewheeling home use where everything gets tried, crashed, patched
> and put back together with baling twine (and is backed up elsewhere...) I've
> mounted a RAM disk of 1 GB which is attached to the pool as a ZIL [...]
> dd if=/dev/ramdisk/slog of=/root/slog.dd

You might as well use a ramdisk ZIL in production too with 2008.11 ZVOLs.

> All of the above assumes that you are not doing this stuff against rpool. I
> think that attaching a volatile log device to your boot pool would result in
> a machine that can't mount the root zfs volume.

I think you can re-create the ramdisk and do a replace to bring the
pool online.

Just don't do it with your rpool or you will be in a world of hurt.

> It's easiest to monitor from the Mac (I find) so try your test again with
> the Activity Monitor showing network traffic and you'll see that it goes to
> a wire speed ceiling while it's filling up the ZIL and once it's saturated
> your traffic will drop to near nothing, and then pick up again after a few
> seconds. If you don't saturate the ZIL you'll see continuous speed data
> transfer.

I also use a network activity monitor for a quick estimate of throughput
while running. Works well on Mac, Windows (Task Manager) or Linux
(a variety: GUI sysstat, ntop, etc).

-Ross
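The NVRAM-plus-JBOD layout Ross describes would be built roughly like this. A sketch only: the pool name and cXtYdZ device names are made up, and each device is assumed to be a single-disk RAID0 LUN or pass-through disk exported by the battery-backed controller:

  # build the pool directly on the individual LUNs; the controller's
  # battery-backed cache absorbs the synchronous (ZIL) writes
  zpool create tank mirror c1t0d0 c1t1d0 mirror c1t2d0 c1t3d0 \
                    mirror c1t4d0 c1t5d0

  zpool status tank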
Ross Walker
2009-Aug-04 15:21 UTC
[zfs-discuss] Pool iscsi /zfs performance in opensolaris 0906
On Tue, Aug 4, 2009 at 9:57 AM, Charles Baker <no-reply at opensolaris.org> wrote:

>> My testing has shown some serious problems with the
>> iSCSI implementation for OpenSolaris.
>> [...]
>> While jumbo frames will help iSCSI performance, they
>> won't overcome inherent limitations of the iSCSI
>> target's implementation.

If you want to host your VMs from Solaris (Open or not), use NFS right
now, as the iSCSI implementation is still quite a bit immature and
won't perform nearly as well as the Linux implementation. Until
COMSTAR stabilizes and replaces iscsitgt I would hold off on iSCSI on
Solaris.

-Ross
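For reference, exporting a ZFS dataset as an NFS datastore is a single property change. A sketch with a hypothetical pool/dataset name; ESX (or any other client) would then mount it as an ordinary NFS export:

  # create a dataset for VM storage and export it over NFS
  zfs create tank/vmstore
  zfs set sharenfs=on tank/vmstore    # or an option string such as 'rw=@192.168.1.0/24' to restrict clients

  # confirm the export is active
  share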
Scott Meilicke
2009-Aug-04 15:35 UTC
[zfs-discuss] Pool iscsi /zfs performance in opensolaris 0906
This has been a very enlightening thread for me, and explains a lot of the performance data I have collected on both 2008.11 and 2009.06, which mirrors the experiences here. Thanks to you all.

NFS perf tuning, here I come...

-Scott
Ross Walker
2009-Aug-04 15:50 UTC
[zfs-discuss] Pool iscsi /zfs performance in opensolaris 0906
On Tue, Aug 4, 2009 at 11:21 AM, Ross Walker <rswwalker at gmail.com> wrote:

> On Tue, Aug 4, 2009 at 9:57 AM, Charles Baker <no-reply at opensolaris.org> wrote:
>>> My testing has shown some serious problems with the
>>> iSCSI implementation for OpenSolaris.
>>> [...]
>
> If you want to host your VMs from Solaris (Open or not), use NFS right
> now, as the iSCSI implementation is still quite a bit immature and
> won't perform nearly as well as the Linux implementation. Until
> COMSTAR stabilizes and replaces iscsitgt I would hold off on iSCSI on
> Solaris.

This sounds crazy, but I was wondering if someone has tried running
Linux iSCSI from within a domU in Xen on OpenSolaris 2009.06 to a ZVOL
on dom0. Of course the zpool still needs an NVRAM or SSD ZIL to perform
well, but if the Xen dom0 is stable and the Crossbow networking works
well, this could allow the best of both worlds.

-Ross
Bob Friesenhahn
2009-Aug-04 17:35 UTC
[zfs-discuss] Pool iscsi /zfs performance in opensolaris 0906
On Tue, 4 Aug 2009, Ross Walker wrote:
>
> But this MUST happen. If it doesn't then you are playing Russian
> Roulette with your data, as a kernel panic can cause a loss of up to
> 1/8 of the size of your system's RAM (the ZFS lazy write cache) of
> your iSCSI target's data!

The actual risk (with recent zfs) seems to be 7/8ths of RAM (not 1/8),
sufficient data to accomplish 5 seconds of 100% write, or up to 30
seconds of aggregation time. On a large memory system with high
performance I/O, this can represent a huge amount (gigabytes) of data.

> Actually I recommend using a controller with an NVRAM cache on it, say
> 256MB-512MB (or more).
>
> This is much faster than an SSD and has the advantage that the ZIL is
> striped across the pool, making ZIL reads much faster!

Are you sure that it is faster than an SSD? The data is indeed pushed
closer to the disks, but there may be considerably more latency
associated with getting that data into the controller NVRAM cache than
there is into a dedicated slog SSD.

Remember that only synchronous writes go into the slog but all writes
must pass through the controller's NVRAM, and so synchronous writes may
need to wait for other I/Os to make it to the controller NVRAM cache
before their turn comes. There may also be read requests queued in the
same I/O channel which are queued before the synchronous write request.

Tests done by others show a considerable NFS write speed advantage when
using a dedicated slog SSD rather than a controller's NVRAM cache. The
slog SSD is a dedicated-function device so there is minimal access
latency.

Bob
--
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
Ross Walker
2009-Aug-04 22:36 UTC
[zfs-discuss] Pool iscsi /zfs performance in opensolaris 0906
On Aug 4, 2009, at 1:35 PM, Bob Friesenhahn <bfriesen at simple.dallas.tx.us> wrote:

> On Tue, 4 Aug 2009, Ross Walker wrote:
>>
>> But this MUST happen. If it doesn't then you are playing Russian
>> Roulette with your data, as a kernel panic can cause a loss of up to
>> 1/8 of the size of your system's RAM (the ZFS lazy write cache) of
>> your iSCSI target's data!
>
> The actual risk (with recent zfs) seems to be 7/8ths of RAM (not 1/8),
> sufficient data to accomplish 5 seconds of 100% write, or up to 30
> seconds of aggregation time. On a large memory system with high
> performance I/O, this can represent a huge amount (gigabytes) of data.

Yikes! Worse than I thought.

>> Actually I recommend using a controller with an NVRAM cache on it,
>> say 256MB-512MB (or more).
>>
>> This is much faster than an SSD and has the advantage that the ZIL is
>> striped across the pool, making ZIL reads much faster!
>
> Are you sure that it is faster than an SSD? The data is indeed pushed
> closer to the disks, but there may be considerably more latency
> associated with getting that data into the controller NVRAM cache
> than there is into a dedicated slog SSD.

I don't see how, as the SSD is behind a controller; it still must make
it to the controller.

> Remember that only synchronous writes go into the slog but all writes
> must pass through the controller's NVRAM, and so synchronous writes
> may need to wait for other I/Os to make it to the controller NVRAM
> cache before their turn comes. There may also be read requests queued
> in the same I/O channel which are queued before the synchronous write
> request.

Well, the duplexing benefit you mention does hold true. That's a
complex real-world scenario that would be hard to benchmark in
production.

> Tests done by others show a considerable NFS write speed advantage
> when using a dedicated slog SSD rather than a controller's NVRAM
> cache.

I get pretty good NFS write speeds with NVRAM (40MB/s 4k sequential
write). It's a Dell PERC 6/E with 512MB onboard.

> The slog SSD is a dedicated-function device so there is minimal
> access latency.

There is still bus and controller plus SSD latency. I suppose one could
use a pair of disks as an slog mirror, enable NVRAM just for those and
let the others do write-through with their disk caches enabled, and
there: a dedicated slog device with NVRAM speed.

It would be even better to have a pair of SSDs behind the NVRAM, but
it's hard to find compatible SSDs for these controllers; Dell currently
doesn't even support SSDs in their RAID products :-(

-Ross
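Adding a mirrored slog pair of the kind Ross describes is a one-liner; a sketch only, with a hypothetical pool name and device names (the two devices here are assumed to be the disks or SSDs sitting behind the NVRAM-enabled controller):

  # attach two devices to the pool as a mirrored separate log
  zpool add tank log mirror c2t0d0 c2t1d0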
Carson Gaspar
2009-Aug-05 00:36 UTC
[zfs-discuss] Pool iscsi /zfs performance in opensolaris 0906
Ross Walker wrote:

> I get pretty good NFS write speeds with NVRAM (40MB/s 4k sequential
> write). It's a Dell PERC 6/E with 512MB onboard.
...
> there, dedicated slog device with NVRAM speed. It would be even better
> to have a pair of SSDs behind the NVRAM, but it's hard to find
> compatible SSDs for these controllers; Dell currently doesn't even
> support SSDs in their RAID products :-(

Isn't the PERC 6/E just a re-branded LSI? LSI added SSD support recently.

--
Carson
Mr liu
2009-Aug-05 01:10 UTC
[zfs-discuss] Pool iscsi /zfs performance in opensolaris 0906
What shall I do? My server does not support SSDs. Go back to using 2008.11?
James Lever
2009-Aug-05 01:18 UTC
[zfs-discuss] Pool iscsi /zfs performance in opensolaris 0906
On 05/08/2009, at 10:36 AM, Carson Gaspar wrote:

> Isn't the PERC 6/E just a re-branded LSI? LSI added SSD support
> recently.

Yep, it's a MegaRAID device.

I have been using one with a Samsung SSD in RAID0 mode (to avail myself
of the cache) recently with great success.

cheers,
James
Ross Walker
2009-Aug-05 01:34 UTC
[zfs-discuss] Pool iscsi /zfs performance in opensolaris 0906
On Aug 4, 2009, at 8:36 PM, Carson Gaspar <carson at taltos.org> wrote:

> Ross Walker wrote:
>
>> I get pretty good NFS write speeds with NVRAM (40MB/s 4k sequential
>> write). It's a Dell PERC 6/E with 512MB onboard.
> ...
>> there, dedicated slog device with NVRAM speed. It would be even
>> better to have a pair of SSDs behind the NVRAM, but it's hard to
>> find compatible SSDs for these controllers; Dell currently doesn't
>> even support SSDs in their RAID products :-(
>
> Isn't the PERC 6/E just a re-branded LSI? LSI added SSD support
> recently.

Yes, but the LSI support for SSDs is on later controllers.

-Ross
Ross Walker
2009-Aug-05 01:36 UTC
[zfs-discuss] Pool iscsi /zfs performance in opensolaris 0906
On Aug 4, 2009, at 9:18 PM, James Lever <j at jamver.id.au> wrote:

> On 05/08/2009, at 10:36 AM, Carson Gaspar wrote:
>
>> Isn't the PERC 6/E just a re-branded LSI? LSI added SSD support
>> recently.
>
> Yep, it's a MegaRAID device.
>
> I have been using one with a Samsung SSD in RAID0 mode (to avail
> myself of the cache) recently with great success.

Which model?

-Ross
James Lever
2009-Aug-05 01:37 UTC
[zfs-discuss] Pool iscsi /zfs performance in opensolaris 0906
On 05/08/2009, at 11:36 AM, Ross Walker wrote:

> Which model?

PERC 6/E w/512MB BBWC.
Ross Walker
2009-Aug-05 01:41 UTC
[zfs-discuss] Pool iscsi /zfs performance in opensolaris 0906
On Aug 4, 2009, at 9:37 PM, James Lever <j at jamver.id.au> wrote:

> On 05/08/2009, at 11:36 AM, Ross Walker wrote:
>
>> Which model?
>
> PERC 6/E w/512MB BBWC.

Really? You know, I tried flashing mine with LSI's firmware and while
it seemed to take, it still didn't recognize my Mtrons.

What is your recipe for these?

-Ross
Carson Gaspar
2009-Aug-05 01:55 UTC
[zfs-discuss] Pool iscsi /zfs performance in opensolaris 0906
Ross Walker wrote:
> On Aug 4, 2009, at 8:36 PM, Carson Gaspar <carson at taltos.org> wrote:
>
>> Isn't the PERC 6/E just a re-branded LSI? LSI added SSD support recently.
>
> Yes, but the LSI support of SSDs is on later controllers.

Please cite your source for that statement.

The PERC 6/E is an LSI 1078. The LSI web site has firmware updates that
explicitly reference SSDs for that chip. See:

http://www.lsi.com/DistributionSystem/AssetDocument/11.0.1-0013_SAS_FW_Image_APP-1.40.42-0615.txt
http://www.lsi.com/DistributionSystem/AssetDocument/11.0.1-0008_SAS_FW_Image_APP-1.40.32-0580.txt

Yes, they _say_ they're only for the LSI 8[78]xx cards, but they should
work with _any_ 1078-based controller. To quote the above:

"Command syntax: MegaCli -adpfwflash -f SAS1078_FW_Image.rom -a0"

--
Carson
James Lever
2009-Aug-05 02:17 UTC
[zfs-discuss] Pool iscsi /zfs performance in opensolaris 0906
On 05/08/2009, at 11:41 AM, Ross Walker wrote:

> What is your recipe for these?

There wasn't one! ;)

The drive I'm using is a Dell-badged Samsung MCCOE50G5MPQ-0VAD3.

cheers,
James
Ross Walker
2009-Aug-05 02:19 UTC
[zfs-discuss] Pool iscsi /zfs performance in opensolaris 0906
On Aug 4, 2009, at 9:55 PM, Carson Gaspar <carson at taltos.org> wrote:

> Ross Walker wrote:
>> On Aug 4, 2009, at 8:36 PM, Carson Gaspar <carson at taltos.org> wrote:
>
>>> Isn't the PERC 6/E just a re-branded LSI? LSI added SSD support
>>> recently.
>> Yes, but the LSI support of SSDs is on later controllers.
>
> Please cite your source for that statement.
>
> The PERC 6/E is an LSI 1078. The LSI web site has firmware updates
> that explicitly reference SSDs for that chip. See:
>
> http://www.lsi.com/DistributionSystem/AssetDocument/11.0.1-0013_SAS_FW_Image_APP-1.40.42-0615.txt
> http://www.lsi.com/DistributionSystem/AssetDocument/11.0.1-0008_SAS_FW_Image_APP-1.40.32-0580.txt
>
> Yes, they _say_ they're only for the LSI 8[78]xx cards, but they
> should work with _any_ 1078-based controller. To quote the above:
>
> "Command syntax: MegaCli -adpfwflash -f SAS1078_FW_Image.rom -a0"

I tried that and while it looked like it took (the BIOS reported the
LSI firmware version) it still didn't recognize my SSDs after the
reboot.

-Ross
Bob Friesenhahn
2009-Aug-05 02:22 UTC
[zfs-discuss] Pool iscsi /zfs performance in opensolaris 0906
On Tue, 4 Aug 2009, Ross Walker wrote:
>> Are you sure that it is faster than an SSD? The data is indeed pushed
>> closer to the disks, but there may be considerably more latency
>> associated with getting that data into the controller NVRAM cache
>> than there is into a dedicated slog SSD.
>
> I don't see how, as the SSD is behind a controller; it still must make
> it to the controller.

If you take a look at 'iostat -x' output you will see that the system
knows about a queue for each device. If it was any other way, then a
slow device would slow down access to all of the other devices. If
there is concern about lack of bandwidth (PCI-E?) to the controller,
then you can use a separate controller for the SSDs.

> Well, the duplexing benefit you mention does hold true. That's a
> complex real-world scenario that would be hard to benchmark in
> production.

But easy to see the effects of.

>> Tests done by others show a considerable NFS write speed advantage
>> when using a dedicated slog SSD rather than a controller's NVRAM
>> cache.
>
> I get pretty good NFS write speeds with NVRAM (40MB/s 4k sequential
> write). It's a Dell PERC 6/E with 512MB onboard.

I get 47.9 MB/s (60.7 MB/s peak) here too (also with 512MB NVRAM), but
that is not very good when the network is good for 100 MB/s. With an
SSD, some other folks here are getting essentially network speed.

> There is still bus and controller plus SSD latency. I suppose one
> could use a pair of disks as an slog mirror, enable NVRAM just for
> those and let the others do write-through with their disk caches

But this encounters the problem that when the NVRAM becomes full then
you hit the wall of synchronous disk write performance. With the SSD
slog, the write log can be quite large and disk writes are then done in
a much more efficient ordered fashion, similar to non-sync writes.

Bob
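For reference, the per-device queues Bob mentions show up in the extended device statistics; one common way to watch them (the interval is just an example):

  # extended per-device statistics with descriptive device names, 5-second intervals
  iostat -xn 5
  # the wait/actv columns show the queue depth per device, and
  # wsvc_t/asvc_t the time requests spend waiting versus being serviced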
Ross Walker
2009-Aug-05 02:23 UTC
[zfs-discuss] Pool iscsi /zfs performance in opensolaris 0906
On Aug 4, 2009, at 10:17 PM, James Lever <j at jamver.id.au> wrote:

> On 05/08/2009, at 11:41 AM, Ross Walker wrote:
>
>> What is your recipe for these?
>
> There wasn't one! ;)
>
> The drive I'm using is a Dell-badged Samsung MCCOE50G5MPQ-0VAD3.

So the key is the drive needs to have the Dell badging to work?

I called my rep about getting a Dell-badged SSD and he told me they
didn't support those in MD series enclosures and they were therefore
unavailable.

Maybe it's time for a new account rep.

-Ross
Ross Walker
2009-Aug-05 02:41 UTC
[zfs-discuss] Pool iscsi /zfs performance in opensolaris 0906
On Aug 4, 2009, at 10:22 PM, Bob Friesenhahn <bfriesen at simple.dallas.tx.us> wrote:

> On Tue, 4 Aug 2009, Ross Walker wrote:
>>> Are you sure that it is faster than an SSD? The data is indeed
>>> pushed closer to the disks, but there may be considerably more
>>> latency associated with getting that data into the controller
>>> NVRAM cache than there is into a dedicated slog SSD.
>>
>> I don't see how, as the SSD is behind a controller; it still must
>> make it to the controller.
>
> If you take a look at 'iostat -x' output you will see that the
> system knows about a queue for each device. If it was any other
> way, then a slow device would slow down access to all of the other
> devices. If there is concern about lack of bandwidth (PCI-E?) to
> the controller, then you can use a separate controller for the SSDs.

It's not bandwidth. Though with a lot of mirrors that does become a
concern.

>> Well, the duplexing benefit you mention does hold true. That's a
>> complex real-world scenario that would be hard to benchmark in
>> production.
>
> But easy to see the effects of.

I actually meant to say: hard to bench out of production.

>>> Tests done by others show a considerable NFS write speed advantage
>>> when using a dedicated slog SSD rather than a controller's NVRAM
>>> cache.
>>
>> I get pretty good NFS write speeds with NVRAM (40MB/s 4k sequential
>> write). It's a Dell PERC 6/E with 512MB onboard.
>
> I get 47.9 MB/s (60.7 MB/s peak) here too (also with 512MB NVRAM),
> but that is not very good when the network is good for 100 MB/s.
> With an SSD, some other folks here are getting essentially network
> speed.

In testing with RAM disks I was only able to get a max of around
60MB/s with 4k block sizes, with 4 outstanding.

I can do 64k blocks now and get around 115MB/s.

>> There is still bus and controller plus SSD latency. I suppose one
>> could use a pair of disks as an slog mirror, enable NVRAM just for
>> those and let the others do write-through with their disk caches
>
> But this encounters the problem that when the NVRAM becomes full
> then you hit the wall of synchronous disk write performance. With
> the SSD slog, the write log can be quite large and disk writes are
> then done in a much more efficient ordered fashion, similar to
> non-sync writes.

Yes, you have a point there.

So, what SSD disks do you use?

-Ross
Henrik Johansen
2009-Aug-05 06:49 UTC
[zfs-discuss] Pool iscsi /zfs performance in opensolaris 0906
Ross Walker wrote:

> On Aug 4, 2009, at 8:36 PM, Carson Gaspar <carson at taltos.org> wrote:
>
>> Ross Walker wrote:
>>
>>> I get pretty good NFS write speeds with NVRAM (40MB/s 4k sequential
>>> write). It's a Dell PERC 6/E with 512MB onboard.
>> ...
>>> there, dedicated slog device with NVRAM speed. It would be even
>>> better to have a pair of SSDs behind the NVRAM, but it's hard to
>>> find compatible SSDs for these controllers; Dell currently doesn't
>>> even support SSDs in their RAID products :-(
>>
>> Isn't the PERC 6/E just a re-branded LSI? LSI added SSD support
>> recently.
>
> Yes, but the LSI support of SSDs is on later controllers.

Sure that's not just a firmware issue?

My PERC 6/E seems to support SSDs:

# ./MegaCli -AdpAllInfo -a2 | grep -i ssd
Enable Copyback to SSD on SMART Error : No
Enable SSD Patrol Read                : No
Allow SSD SAS/SATA Mix in VD          : No
Allow HDD/SSD Mix in VD               : No

Controller info:

Versions
================
Product Name    : PERC 6/E Adapter
Serial No       : xxxxxxxxxxxxxxxx
FW Package Build: 6.0.3-0002

Mfg. Data
================
Mfg. Date       : 06/08/07
Rework Date     : 06/08/07
Revision No     :
Battery FRU     : N/A

Image Versions in Flash:
================
FW Version         : 1.11.82-0473
BIOS Version       : NT13-2
WebBIOS Version    : 1.1-32-e_11-Rel
Ctrl-R Version     : 1.01-010B
Boot Block Version : 1.00.00.01-0008

I currently have 2 x Intel X25-E (32 GB) as dedicated slogs and 1 x
Intel X25-M (80 GB) for the L2ARC behind a PERC 6/i on my Dell R905
testbox.

So far there have been no problems with them.

--
Med venlig hilsen / Best Regards

Henrik Johansen
henrik at scannet.dk
ScanNet Group A/S
Henrik Johansen
2009-Aug-05 07:09 UTC
[zfs-discuss] Pool iscsi /zfs performance in opensolaris 0906
Ross Walker wrote:
> [...]
> In testing with RAM disks I was only able to get a max of around
> 60MB/s with 4k block sizes, with 4 outstanding.
>
> I can do 64k blocks now and get around 115MB/s.

I just ran some filebench microbenchmarks against my 10 Gbit testbox,
which is a Dell R905 with 4 x 2.5 GHz AMD quad-core CPUs and 64 GB RAM.

My current pool is comprised of 7 mirror vdevs (SATA disks), 2 Intel
X25-E as slogs and 1 Intel X25-M for the L2ARC.

The pool is an MD1000 array attached to a PERC 6/E using 2 SAS cables.
The NICs are ixgbe based.

Here are the numbers:

Randomwrite benchmark - via 10Gbit NFS:
IO Summary: 4483228 ops, 73981.2 ops/s, (0/73981 r/w) 578.0mb/s,
44us cpu/op, 0.0ms latency

Randomread benchmark - via 10Gbit NFS:
IO Summary: 7663903 ops, 126467.4 ops/s, (126467/0 r/w) 988.0mb/s,
5us cpu/op, 0.0ms latency

The real question is if these numbers can be trusted - I am currently
preparing new test runs with other software to be able to do a
comparison.

--
Henrik
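For anyone wanting to reproduce this kind of run, the stock filebench random I/O personalities can be driven roughly as below. This is a sketch only: the mount point, file size and thread count are made-up values, and the exact workload parameters vary between filebench versions:

  # run the bundled randomwrite workload against an NFS-mounted ZFS dataset
  filebench
  filebench> load randomwrite
  filebench> set $dir=/mnt/nfs-test
  filebench> set $filesize=10g
  filebench> set $nthreads=16
  filebench> run 60

The randomread workload is loaded and run the same way, and each run ends with the "IO Summary" line quoted above.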
Henrik Johansen
2009-Aug-05 07:35 UTC
[zfs-discuss] Pool iscsi /zfs performance in opensolaris 0906
Ross Walker wrote:

> On Aug 4, 2009, at 10:17 PM, James Lever <j at jamver.id.au> wrote:
>
>> On 05/08/2009, at 11:41 AM, Ross Walker wrote:
>>
>>> What is your recipe for these?
>>
>> There wasn't one! ;)
>>
>> The drive I'm using is a Dell-badged Samsung MCCOE50G5MPQ-0VAD3.
>
> So the key is the drive needs to have the Dell badging to work?
>
> I called my rep about getting a Dell-badged SSD and he told me they
> didn't support those in MD series enclosures and they were therefore
> unavailable.

If the Dell-branded SSDs are Samsungs then you might want to search the
archives - if I remember correctly there were mentions of
less-than-desired performance using them, but I cannot recall the
details.

> Maybe it's time for a new account rep.

--
Henrik
Ross Walker
2009-Aug-05 13:10 UTC
[zfs-discuss] Pool iscsi /zfs performance in opensolaris 0906
On Aug 5, 2009, at 2:49 AM, Henrik Johansen <henrik at scannet.dk> wrote:

> Ross Walker wrote:
>> Yes, but the LSI support of SSDs is on later controllers.
>
> Sure that's not just a firmware issue?
>
> My PERC 6/E seems to support SSDs:
> [...]
>
> I currently have 2 x Intel X25-E (32 GB) as dedicated slogs and 1 x
> Intel X25-M (80 GB) for the L2ARC behind a PERC 6/i on my Dell R905
> testbox.
>
> So far there have been no problems with them.

Really?

Now you have my interest.

Two questions: did you get the X25 from Dell? Are you using it with a
hot-swap carrier?

Knowing that these will work would be great news.

-Ross
Ross Walker
2009-Aug-05 13:14 UTC
[zfs-discuss] Pool iscsi /zfs performance in opensolaris 0906
On Aug 5, 2009, at 3:09 AM, Henrik Johansen <henrik at scannet.dk> wrote:

> [...]
>
> Here are the numbers:
>
> Randomwrite benchmark - via 10Gbit NFS:
> IO Summary: 4483228 ops, 73981.2 ops/s, (0/73981 r/w) 578.0mb/s,
> 44us cpu/op, 0.0ms latency
>
> Randomread benchmark - via 10Gbit NFS:
> IO Summary: 7663903 ops, 126467.4 ops/s, (126467/0 r/w) 988.0mb/s,
> 5us cpu/op, 0.0ms latency
>
> The real question is if these numbers can be trusted - I am currently
> preparing new test runs with other software to be able to do a
> comparison.

Yes, you need to make sure it is sync I/O, as NFS clients can still
choose to use async and work out of their own cache.

-Ross
Henrik Johansen
2009-Aug-05 13:24 UTC
[zfs-discuss] Pool iscsi /zfs performance in opensolaris 0906
Ross Walker wrote:

> On Aug 5, 2009, at 3:09 AM, Henrik Johansen <henrik at scannet.dk> wrote:
>
>> [...]
>> The real question is if these numbers can be trusted - I am currently
>> preparing new test runs with other software to be able to do a
>> comparison.
>
> Yes, you need to make sure it is sync I/O, as NFS clients can still
> choose to use async and work out of their own cache.

Quick snippet from zpool iostat:

  mirror     1.12G   695G      0      0      0      0
    c8t12d0      -      -      0      0      0      0
    c8t13d0      -      -      0      0      0      0
  c7t2d0        4K  29.0G      0  1.56K      0   200M
  c7t3d0        4K  29.0G      0  1.58K      0   202M

The disks on c7 are both Intel X25-E...

--
Henrik
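The per-vdev view above comes from zpool iostat's verbose mode; something along these lines (pool name assumed) prints capacity plus read/write operations and bandwidth for every vdev and leaf device at each interval:

  # verbose per-device statistics, refreshed every 5 seconds
  zpool iostat -v tank 5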
Henrik Johansen
2009-Aug-05 13:31 UTC
[zfs-discuss] Pool iscsi /zfs performance in opensolaris 0906
Ross Walker wrote:

> On Aug 5, 2009, at 2:49 AM, Henrik Johansen <henrik at scannet.dk> wrote:
>
>> [...]
>> I currently have 2 x Intel X25-E (32 GB) as dedicated slogs and 1 x
>> Intel X25-M (80 GB) for the L2ARC behind a PERC 6/i on my Dell R905
>> testbox.
>>
>> So far there have been no problems with them.
>
> Really?
>
> Now you have my interest.
>
> Two questions: did you get the X25 from Dell? Are you using it with a
> hot-swap carrier?
>
> Knowing that these will work would be great news.

Those disks are not from Dell, as they were incapable of delivering
Intel SSDs. Just out of curiosity - do they have to be from Dell?

I have tested the Intel SSDs on various Dell servers - they work
out-of-the-box with both their 2.5" and 3.5" trays (the 3.5" trays do
require a SATA interposer, which is included with all SATA disks
ordered from them).

--
Henrik
Joseph L. Casale
2009-Aug-05 20:11 UTC
[zfs-discuss] Pool iscsi /zfs performance in opensolaris 0906
> Quick snippet from zpool iostat:
>
>   mirror     1.12G   695G      0      0      0      0
>     c8t12d0      -      -      0      0      0      0
>     c8t13d0      -      -      0      0      0      0
>   c7t2d0        4K  29.0G      0  1.56K      0   200M
>   c7t3d0        4K  29.0G      0  1.58K      0   202M
>
> The disks on c7 are both Intel X25-E...

Henrik,
So the SATA discs are in the MD1000 behind the PERC 6/E, and how have
you configured/attached the 2 SSD slogs and the L2ARC drive? If I
understand you, you have used 14 of the 15 slots in the MD, so I assume
you have the 3 SSDs in the R905 - what controller are they running on?

Thanks!
jlc
Henrik Johansen
2009-Aug-05 20:20 UTC
[zfs-discuss] Pool iscsi /zfs performance in opensolaris 0906
Joseph L. Casale wrote:

>> Quick snippet from zpool iostat:
>> [...]
>> The disks on c7 are both Intel X25-E...
>
> Henrik,
> So the SATA discs are in the MD1000 behind the PERC 6/E, and how have
> you configured/attached the 2 SSD slogs and the L2ARC drive? If I
> understand you, you have used 14 of the 15 slots in the MD, so I
> assume you have the 3 SSDs in the R905 - what controller are they
> running on?

The internal PERC 6/i controller - but I've had them on the PERC 6/E
during other test runs, since I have a couple of spare MD1000s at hand.

Both controllers work well with the SSDs.

--
Henrik
Stephen Green
2009-Aug-07 00:03 UTC
[zfs-discuss] Pool iscsi /zfs performance in opensolaris 0906
erik.ableson wrote:

> You're running into the same problem I had with 2009.06 as they have
> "corrected" a bug where the iSCSI target prior to 2009.06 didn't
> completely honor SCSI sync commands issued by the initiator.

I think I've hit the same thing. I'm using an iSCSI volume as the
target for Time Machine backups for my new MacBook Pro using the
GlobalSAN initiator. Running against an iSCSI volume on my ZFS pool,
with both the Mac and the Solaris box on gigE, I was seeing the Time
Machine backup (of 90GB of data) running at about 600-700KB (yes, KB)
per second. This would mean a backup time on the order of
(optimistically) 45 hours, so I decided to give your suggestion a go.

> For my freewheeling home use where everything gets tried, crashed,
> patched and put back together with baling twine (and is backed up
> elsewhere...) I've mounted a RAM disk of 1 GB which is attached to the
> pool as a ZIL and you see the performance run in cycles where the ZIL
> loads up to saturation, flushes out to disk and keeps going. I did
> write a script to regularly dd the RAM disk device out to a file so
> that I can recreate it with the appropriate signature if I have to
> reboot the osol box. This is used with the GlobalSAN initiator on OS X
> as well as various Windows and Linux machines, physical and VM.
>
> Assuming this is a test system that you're playing with, you can
> destroy the pool with impunity, and you don't have an SSD lying around
> to test with, try the following:
>
> ramdiskadm -a slog 2g (or whatever size you can manage reasonably with
> the available physical RAM - try "vmstat 1 2" to determine available
> memory)
> zpool add <poolname> log /dev/ramdisk/slog

I used a 2GB RAM disk (the machine has 12GB of RAM) and this jumped the
backup up to somewhere between 18-40MB/s, which means that I'm only a
couple of hours away from finishing my backup. This is, as far as I can
tell, magic (since I started this message nearly 10GB of data have been
transferred, when it took from 6am this morning to get to 20GB).

The transfer speed drops like crazy when the write to disk happens, but
it jumps right back up afterwards.

> If you want to perhaps reuse the slog later (RAM disks are not
> preserved over reboot) write the slog volume out to disk and dump it
> back in after restarting.
> dd if=/dev/ramdisk/slog of=/root/slog.dd

Now my only question is: what do I do when it's done? If I reboot and
the RAM disk disappears, will my tank be dead? Or will it just continue
without the slog? I realize that I'm probably totally boned if the
system crashes, so I'm copying off the stuff that I really care about
to another pool (the Mac's already been backed up to a USB drive).

Have I meddled in the affairs of wizards? Is ZFS subtle and quick to
anger?

Steve
--
Stephen Green
http://blogs.sun.com/searchguy
erik.ableson
2009-Aug-07 07:31 UTC
[zfs-discuss] Pool iscsi /zfs performance in opensolaris 0906
On 7 août 09, at 02:03, Stephen Green wrote:

> I used a 2GB RAM disk (the machine has 12GB of RAM) and this jumped
> the backup up to somewhere between 18-40MB/s, which means that I'm
> only a couple of hours away from finishing my backup. This is, as
> far as I can tell, magic (since I started this message nearly 10GB
> of data have been transferred, when it took from 6am this morning to
> get to 20GB.)
>
> The transfer speed drops like crazy when the write to disk happens,
> but it jumps right back up afterwards.
>
>> If you want to perhaps reuse the slog later (ram disks are not
>> preserved over reboot) write the slog volume out to disk and dump
>> it back in after restarting.
>> dd if=/dev/ramdisk/slog of=/root/slog.dd
>
> Now my only question is: what do I do when it's done? If I reboot
> and the RAM disk disappears, will my tank be dead? Or will it just
> continue without the slog? I realize that I'm probably totally
> boned if the system crashes, so I'm copying off the stuff that I
> really care about to another pool (the Mac's already been backed up
> to a USB drive.)
>
> Have I meddled in the affairs of wizards? Is ZFS subtle and quick
> to anger?

You have a number of options to preserve the current state of affairs
and be able to reboot the OpenSolaris server if required.

The absolute safest bet would be the following, but the resilvering will
take a while before you'll be able to shut down:

create a file of the same size as the ramdisk on the rpool volume
replace the ramdisk slog with the 2G file (zpool replace <poolname> /dev/ramdisk/slog /root/slogtemp)
wait for the resilver/replacement operation to run its course
reboot
create a new ramdisk (same size, as always)
replace the file slog with the newly created ramdisk

If your machine reboots unexpectedly things are a little dicier, but you
should still be able to get things back online. If you did a dump of the
ramdisk via dd to a file it should contain the correct signature and be
recognized by ZFS. There are no guarantees about the state of the data,
since if there was anything actively in use on the ramdisk when it
stopped you'll lose data, and I'm not sure how the pool will deal with
this. But in a pinch, you should be able to either replace the missing
ramdisk device with the dd file copy of the ramdisk (make a copy first,
just in case), or mount a new ramdisk, dd the contents of the file back
to the device, and then import the pool.

Cheers,

Erik
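Spelled out as commands, the safe sequence above might look like the
following sketch. The pool name is a placeholder, and using mkfile to
create the backing file is an assumption, since the steps only say to
"create a file of the same size":

mkfile 2g /root/slogtemp                               # backing file, same size as the ramdisk
zpool replace tank /dev/ramdisk/slog /root/slogtemp    # migrate the slog onto the file
zpool status tank                                      # wait for the replace/resilver to finish
# ...reboot...
ramdiskadm -a slog 2g                                  # recreate the ramdisk
zpool replace tank /root/slogtemp /dev/ramdisk/slog    # move the slog back to RAM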
Stephen Green
2009-Aug-07 15:20 UTC
[zfs-discuss] Pool iscsi /zfs performance in opensolaris 0906
erik.ableson wrote:
> On 7 août 09, at 02:03, Stephen Green wrote:

Man, that looks so nice I think I'll change my mail client to do dates
in French :-)

>> Now my only question is: what do I do when it's done? If I reboot
>> and the RAM disk disappears, will my tank be dead? Or will it just
>> continue without the slog? I realize that I'm probably totally boned
>> if the system crashes, so I'm copying off the stuff that I really care
>> about to another pool (the Mac's already been backed up to a USB drive.)
>
> You have a number of options to preserve the current state of affairs
> and be able to reboot the OpenSolaris server if required.
>
> The absolute safest bet would be the following, but the resilvering will
> take a while before you'll be able to shut down:
>
> create a file of the same size as the ramdisk on the rpool volume
> replace the ramdisk slog with the 2G file (zpool replace <poolname>
> /dev/ramdisk/slog /root/slogtemp)
> wait for the resilver/replacement operation to run its course
> reboot
> create a new ramdisk (same size, as always)
> replace the file slog with the newly created ramdisk

Would having a slog as a file on a different pool provide anywhere near
the same improvement that I saw by adding a RAM disk? Would it affect
the typical performance (i.e., reading and writing files in my editor)
adversely? That is, could I move the slog to a file and then just leave
it there so that I don't have trouble across reboots? I could then just
use the RAM disk when big things happened on the MacBook.

> If your machine reboots unexpectedly things are a little dicier, but you
> should still be able to get things back online. If you did a dump of
> the ramdisk via dd to a file it should contain the correct signature and
> be recognized by ZFS. There are no guarantees about the state of the
> data, since if there was anything actively in use on the ramdisk when
> it stopped you'll lose data, and I'm not sure how the pool will deal
> with this. But in a pinch, you should be able to either replace the
> missing ramdisk device with the dd file copy of the ramdisk (make a
> copy first, just in case), or mount a new ramdisk, dd the contents of
> the file back to the device, and then import the pool.

So, I take it that if I just do a shutdown, the slog will be emptied
appropriately to the pool, but then at startup the slog device will be
missing and the system won't be able to import that pool. If I dd the
ramdisk to a file, I suppose that I should use a file on my rpool, right?

Thanks for the advice. I think it might be time to convince the wife
that I need to buy an SSD. Anyone have recommendations for a reasonably
priced SSD for a home box?

Steve
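For the reboot question, Erik's dd trick runs in both directions; a
rough sketch of the restore path, with the names from his example and
the pool name as a placeholder (and, as he notes, with no guarantee
about any data that was in flight on the ramdisk):

dd if=/dev/ramdisk/slog of=/root/slog.dd      # before shutdown: capture the slog image
# ...reboot...
ramdiskadm -a slog 2g                         # recreate a ramdisk of the same size
dd if=/root/slog.dd of=/dev/ramdisk/slog      # put the image back
zpool import tank                             # then import the pool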
Stephen Green
2009-Aug-07 15:37 UTC
[zfs-discuss] Pool iscsi /zfs performance in opensolaris 0906
Stephen Green wrote:
> Thanks for the advice. I think it might be time to convince the wife
> that I need to buy an SSD. Anyone have recommendations for a reasonably
> priced SSD for a home box?

For example, does anyone know if something like:

http://www.newegg.com/Product/Product.aspx?Item=N82E16820227436

manufacturer's homepage:

http://www.ocztechnology.com/products/solid_state_drives/ocz_minipci_express_ssd-sata_

would work in OpenSolaris? It (apparently) just looks like a SATA disk
on the PCIe bus, and the package that they ship it in doesn't look big
enough to have a driver disk in it (and the manufacturer doesn't provide
drivers on their Web site.)

Compatibility aside, would a 16GB SSD on a SATA port be a good solution
to my problem? My box is a bit shy on SATA ports, but I've got lots of
PCI ports. Should I get two? It's only $60, so not such a troublesome
sell to my wife.

Steve
Scott Meilicke
2009-Aug-07 16:47 UTC
[zfs-discuss] Pool iscsi /zfs performance in opensolaris 0906
Note - this has a mini PCIe interface, not PCIe.

I had the 64GB version in a Dell Mini 9. While it was great for its
small size, low power and low heat characteristics (no fan on the Mini
9!), it was only faster than the striped SATA drives in my Mac Pro when
it came to random reads. Everything else was slower, sometimes by a lot,
as measured by XBench. Unfortunately I no longer have the numbers to
share. I see the sustained writes listed as up to 25 MB/s, and bursts up
to 51 MB/s.

That said, I have read of people having good luck with fast CF cards (no
ref, sorry). So maybe this will be just fine :)

-Scott
-- 
This message posted from opensolaris.org
Stephen Green
2009-Aug-07 17:06 UTC
[zfs-discuss] Pool iscsi /zfs performance in opensolaris 0906
Scott Meilicke wrote:
> Note - this has a mini PCIe interface, not PCIe.

Well, that's an *excellent* point. I guess that lets that one out. It
turns out I do have an open SATA port, so I might just go for a disk
that has a SATA interface, since that should just work.

> I had the 64GB version in a Dell Mini 9. While it was great for its
> small size, low power and low heat characteristics (no fan on the
> Mini 9!), it was only faster than the striped SATA drives in my Mac
> Pro when it came to random reads. Everything else was slower,
> sometimes by a lot, as measured by XBench. Unfortunately I no longer
> have the numbers to share. I see the sustained writes listed as up to
> 25 MB/s, and bursts up to 51 MB/s.

Hmmm.

> That said, I have read of people having good luck with fast CF cards
> (no ref, sorry). So maybe this will be just fine :)

That doohickey that took 4 CF cards that I saw on the list earlier this
week was kind of interesting.

Oh, and for those following along at home, the re-silvering of the slog
to a file is proceeding well. 72% done in 25 minutes.

Steve
Stephen Green
2009-Aug-07 18:38 UTC
[zfs-discuss] Pool iscsi /zfs performance in opensolaris 0906
Stephen Green wrote:
> Oh, and for those following along at home, the re-silvering of the slog
> to a file is proceeding well. 72% done in 25 minutes.

And, for the purposes of the archives, the re-silver finished in 34
minutes and I successfully removed the RAM disk. Thanks, Erik, for the
eminently followable instructions.

Also, I got my wife to agree to a new SSD, so I presume that I can
simply do the re-silver with the new drive when it arrives. Can I
replace a log with a larger one? Can I partition the SSD (looks like
I'll be getting a 32GB one) and use half for cache and half for log?
Even if I can, should I?

Steve
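On splitting the SSD: one device can serve both roles by giving ZFS two
slices, though log and cache then compete for the same device, so
whether it is worth doing is debatable. A sketch, with a hypothetical
device name and the slices pre-created with format(1M):

zpool add tank log c9t0d0s0     # first slice as the slog
zpool add tank cache c9t0d0s1   # second slice as L2ARC
zpool iostat -v tank 5          # confirm both show up and take traffic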
Stephen Green
2009-Aug-07 23:41 UTC
[zfs-discuss] Pool iscsi /zfs performance in opensolaris 0906
Stephen Green wrote:
> Also, I got my wife to agree to a new SSD, so I presume that I can
> simply do the re-silver with the new drive when it arrives.

And the last thing for today, I ended up getting:

http://www.newegg.com/Product/Product.aspx?Item=N82E16820609330

which is 16GB and should be sufficient to my needs. I'll let you know
how it works out. Suggestions as to pre/post installation IO tests
welcome.

Steve
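One low-effort pre/post comparison, assuming the pool is named tank:
time the same large write to the iSCSI volume before and after the SSD
goes in, and watch the device-level numbers on the server while it runs:

iostat -xnz 5            # per-device throughput and service times; the SSD appears here once added
zpool iostat -v tank 5   # per-vdev view of where the writes land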
Stephen Green
2009-Aug-12 14:07 UTC
[zfs-discuss] Pool iscsi /zfs performance in opensolaris 0906
Stephen Green wrote:
> I'll let you know how it works out. Suggestions as to pre/post
> installation IO tests welcome.

The installation went off without a hitch (modulo a bad few seconds
after reboot.) Story here:

http://blogs.sun.com/searchguy/entry/homebrew_hybrid_storage_pool

I've got one problem that I need to deal with, but I'm going to put that
in a separate thread.

Steve
-- 
Stephen Green                      // Stephen.Green at sun.com
Principal Investigator             \\ http://blogs.sun.com/searchguy
Advanced Search Technologies Group // Voice: +1 781-442-0926
Sun Microsystems Labs              \\ Fax: +1 781-442-1692