Chris Greer
2008-Nov-22 17:41 UTC
[zfs-discuss] Performance bake off vxfs/ufs/zfs need some help
So to give a little background on this, we have been benchmarking Oracle RAC on Linux vs. Oracle on Solaris. In the Solaris test, we are using vxvm and vxfs. We noticed that the same Oracle TPC benchmark at roughly the same transaction rate was causing twice as many disk I/Os to the back-end DMX4-1500.

So we concluded that either Oracle behaves very differently in RAC, or our filesystems are the culprit. This testing is wrapping up (it all gets dismantled Monday), so we took the time to run a simulated disk I/O test with an 8K I/O size.

vxvm with vxfs we achieved 2387 IOPS
vxvm with ufs we achieved 4447 IOPS
ufs on disk devices we achieved 4540 IOPS
zfs we achieved 1232 IOPS

The only ZFS tuning we have done is setting "set zfs:zfs_nocache=1" in /etc/system and changing the recordsize to 8K to match the test.

I think the files we are using in the test were created before we changed the recordsize, so I deleted them, recreated them, and have started the other test... but does anyone have any other ideas?

This is my first experience with ZFS on a commercial RAID array and so far it's not that great.

For those interested, we are using the iorate command from EMC for the benchmark. For the different tests, we have 13 LUNs presented. Each one is its own volume and filesystem with a single file on those filesystems. We are running 13 iorate processes in parallel (there is no CPU bottleneck in this either).

For ZFS, we put all those LUNs in a pool with no redundancy, created 13 filesystems, and are still running 13 iorate processes.

We are running Solaris 10 U6.
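For reference, the ZFS layout described above corresponds roughly to the following commands (a sketch only; the pool name, filesystem names, and device names are made up for illustration):

   # one pool striped across the 13 LUNs, no redundancy
   zpool create tpcpool c2t0d0 c2t1d0 c2t2d0 ... c2t12d0

   # 8K records to match the benchmark I/O size (inherited by child filesystems)
   zfs set recordsize=8k tpcpool

   # one filesystem per iorate process
   zfs create tpcpool/fs01
   ...
   zfs create tpcpool/fs13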
Chris Greer
2008-Nov-22 18:11 UTC
[zfs-discuss] Performance bake off vxfs/ufs/zfs need some help
That should be "set zfs:zfs_nocacheflush=1" in the post above... that was my typo in the post.
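For anyone reproducing this: the corrected tunable goes in /etc/system and takes effect after a reboot. It tells ZFS to stop issuing cache-flush commands to the storage, which is generally only appropriate when the array (as with a DMX) has a non-volatile write cache:

   set zfs:zfs_nocacheflush=1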
Chris Greer
2008-Nov-22 18:13 UTC
[zfs-discuss] Performance bake off vxfs/ufs/zfs need some help
ZFS with the datafiles recreated after the recordsize change was 3079 IOPS, so now we are at least in the ballpark.
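The recreation step matters because recordsize only applies to blocks written after the property is changed; existing files keep their original block size. A quick way to confirm the property on a given filesystem (names here are illustrative):

   zfs get recordsize tpcpool/fs01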
Dale Ghent
2008-Nov-22 18:47 UTC
[zfs-discuss] Performance bake off vxfs/ufs/zfs need some help
Are you putting your archive and redo logs on a separate zpool (not just a different ZFS filesystem within the same pool as your data files)?

Are you using direct I/O at all in any of the config scenarios you listed?

/dale
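If anyone wants to try the separation Dale is asking about, a minimal sketch (pool, device, and mount-point names are made up) would be a second pool dedicated to redo/archive logs, and for the UFS runs direct I/O can be forced at mount time:

   # dedicated pool for redo/archive logs
   zpool create redopool c3t0d0 c3t1d0

   # UFS mounted with direct I/O
   mount -F ufs -o forcedirectio /dev/dsk/c2t0d0s6 /u02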
Todd Stansell
2008-Nov-22 18:58 UTC
[zfs-discuss] Performance bake off vxfs/ufs/zfs need some help
> For those interested, we are using the iorate command from EMC for the
> benchmark. For the different tests, we have 13 LUNs presented. Each one
> is its own volume and filesystem with a single file on those filesystems.
> We are running 13 iorate processes in parallel (there is no CPU
> bottleneck in this either).
>
> For ZFS, we put all those LUNs in a pool with no redundancy, created 13
> filesystems, and are still running 13 iorate processes.

This doesn't seem like an apples-to-apples comparison, unless I'm misunderstanding. If you put all of those LUNs in a single pool for ZFS, you should similarly put all of them in a single volume for vxvm.

Todd
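For the comparison Todd is suggesting, the rough vxvm equivalent of the 13-LUN ZFS pool would be a single striped volume across all 13 disks; something like the following, where the disk group, volume name, size, and stripe unit are only examples:

   vxassist -g testdg make testvol 500g layout=stripe ncol=13 stripeunit=64k
   mkfs -F vxfs /dev/vx/rdsk/testdg/testvol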
Chris Greer
2008-Nov-22 20:07 UTC
[zfs-discuss] Performance bake off vxfs/ufs/zfs need some help
Right now we are not using Oracle... we are using iorate, so we don't have separate logs. When the testing was with Oracle, the logs were separate. This test represents the 13 data LUNs that we had during those tests.

The reason it wasn't striped with vxvm is that the original comparison test was vxvm + vxfs compared to Oracle RAC on Linux with OCFS. On the Linux side we don't have a volume manager, so the database has to do the striping across the separate datafiles. The only way I could mimic that with ZFS would be to create 13 separate zpools, and that sounded pretty painful.

Again, the thing that led us down this path was that the Oracle RAC on Linux accomplished slightly more transactions but only required half the I/Os to the array to do so. The Sun test actually bottlenecked on the back-end disk and had plenty of CPU left on the host. So if the I/O bottleneck is actually the vxfs filesystem causing more I/O to the back end, and we can fix that with a different filesystem, then the Sun box may beat the Linux RAC. But our initial testing has shown that vxfs isn't all it's cracked up to be with respect to databases (yes, we tried the database edition too, and the performance actually got slightly worse).
Bob Friesenhahn
2008-Nov-22 21:44 UTC
[zfs-discuss] Performance bake off vxfs/ufs/zfs need some help
On Sat, 22 Nov 2008, Chris Greer wrote:

> ZFS with the datafiles recreated after the recordsize change was 3079 IOPS,
> so now we are at least in the ballpark.

ZFS is optimized for fast bulk data storage and data integrity, and not so much for transactions. It seems that adding a non-volatile hardware cache device can help quite a lot, but you may need to use OpenSolaris to take full advantage of it.

It is important to consider how fast things will be a month or two from now, so it may be necessary to run the benchmark for quite some time in order to see how performance degrades.

The 3079 IOPS is probably the limit of what your current hardware can do with ZFS. I see a bit over 3100 here for random synchronous writes using 12 disks (arranged as six mirror pairs) and 8 writers.

Bob
======================================
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
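Bob's "non-volatile hardware cache device" refers to a separate ZFS intent log (slog) device. On releases that support log devices, adding one looks roughly like this (pool and device names are made up; the device should be something fast and non-volatile such as an NVRAM card or SSD):

   zpool add tpcpool log c4t0d0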
Richard Elling
2008-Nov-23 04:35 UTC
[zfs-discuss] Performance bake off vxfs/ufs/zfs need some help
Chris Greer wrote:

> Right now we are not using Oracle... we are using iorate, so we don't have
> separate logs. When the testing was with Oracle, the logs were separate.
> This test represents the 13 data LUNs that we had during those tests.
>
> The reason it wasn't striped with vxvm is that the original comparison test
> was vxvm + vxfs compared to Oracle RAC on Linux with OCFS.

You can't use ZFS directly for Oracle RAC, so perhaps you should test those things which might work for your application?
 -- richard
Mike Gerdts
2008-Nov-23 16:43 UTC
[zfs-discuss] Performance bake off vxfs/ufs/zfs need some help
On Sat, Nov 22, 2008 at 11:41 AM, Chris Greer <pcgreer at fedex.com> wrote:

> vxvm with vxfs we achieved 2387 IOPS

In this combination you should be using ODM, which comes as part of the Storage Foundation for Oracle or Storage Foundation for Oracle RAC products. It makes the database files on vxfs behave much as if they lived on raw devices, and it tends to allow a much higher transaction rate with fewer physical I/Os and less kernel (%sys) utilization. The concept is similar to, but different from, direct I/O. This behavior is hard, if not impossible, to test without Oracle in the mix, because (AFAIK) Oracle is the only thing that knows how to make use of the ODM interface.

> vxvm with ufs we achieved 4447 IOPS
> ufs on disk devices we achieved 4540 IOPS
> zfs we achieved 1232 IOPS

When you say RAC, I assume you mean multi-instance (clustered) databases. None of these are cluster file systems, and as such they are worthless for multi-instance Oracle databases, which require a shared file system.

On Linux, you say that you were using OCFS. Were you really using OCFS, or were the databases actually in ASM? Oracle's recommendation (last I knew) was to have executables on OCFS and the databases in ASM.

Have you tried ASM on Solaris? It should give you a lot of the benefits you would expect from ZFS (pooled storage, incremental backups, and (I think) efficient snapshots). It will only work for Oracle database files (and indexes, etc.) and should work for clustered storage as well.

-- 
Mike Gerdts
http://mgerdts.blogspot.com/
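For what it's worth, a rough way to check whether Veritas ODM is even in play on a Solaris host (the package name is standard, but the mount point and library name are typical of Storage Foundation with Oracle 10g and may differ by version):

   pkginfo -l VRTSodm                    # is the Veritas ODM package installed?
   mount | grep odm                      # is the ODM pseudo filesystem mounted?
   ls -lL $ORACLE_HOME/lib/libodm10.so   # is Oracle linked to the real ODM library or the stub?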
Tomer Gurantz
2008-Nov-24 06:46 UTC
[zfs-discuss] Performance bake off vxfs/ufs/zfs need some help
I would add that you didn't mention what, if any, optimizations you made with vxfs. Specifically, a default vxfs file system will have a file system block size of 1k, 2k, 4k, or 8k, depending on the file system size. Since you are using Oracle, you should always set the file system block size to 8k, regardless of the file system size, because of Oracle's I/O patterns. (You would do this with the vxfs mkfs option "-o bsize=8192".)

Also, the "odm" point that Mike makes is important, as vxfs is an ODM-compliant file system. Before Oracle's ODM, people would often use vxfs with its Quick I/O feature, which enables individual files to be accessed directly as raw devices (again, different in subtle ways from direct I/O). See the Storage Foundation for Oracle documentation on Symantec's website.

And as Mike mentions, Oracle RAC would normally mean multiple Oracle instances on different servers writing to the same shared database, which implies you would be using the Clustered Volume Manager (CVM) and Clustered File System (CFS) - that is, vxvm and vxfs plus the ability to allow concurrent access from multiple hosts (which of course is an additional license, aka $$$).

Cheers,
Tomer
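For completeness, a sketch of the mkfs invocation Tomer is describing (the volume path is made up); running fstyp -v against the raw volume afterwards should report the resulting bsize:

   mkfs -F vxfs -o bsize=8192 /dev/vx/rdsk/testdg/testvol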