I have the following layout:

A 490 with 8 1.8GHz CPUs and 16G mem, and 6 6140s with 2 FC controllers,
using the A1 and B1 controller ports at 4Gbps. Each controller has 2G NVRAM.

On the 6140s I set up one RAID0 LUN per SAS disk with a 16K segment size.

On the 490 I created a zpool with 8 4+1 raidz1s.

I am getting zpool IO of only 125MB/s with zfs:zfs_nocacheflush = 1 in
/etc/system.

Is there a way I can improve the performance? I'd like to get 1GB/sec IO.

Currently each LUN is set up with A1 as primary and B1 as secondary, or vice
versa.

I also have write cache enabled according to CAM.

--
Asif Iqbal
PGP Key: 0xE62693C5 KeyServer: pgp.mit.edu
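For reference, a minimal sketch of the tuning and pool layout described
above; the pool name and LUN device names are hypothetical placeholders, not
the actual devices:

    # /etc/system tuning already in place (takes effect after a reboot)
    set zfs:zfs_nocacheflush = 1

    # One of the eight 4+1 raidz1 vdevs; the other seven are added the same way
    zpool create tank raidz1 c4t0d0 c4t1d0 c4t2d0 c4t3d0 c4t4d0
    zpool add    tank raidz1 c4t5d0 c4t6d0 c4t7d0 c4t8d0 c4t9d0
    # ... repeated for the remaining vdevs

    # Watch pool throughput while loading it
    zpool iostat tank 10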
You have a 6140 with SAS drives?! When did this happen?

On Nov 17, 2007 12:30 AM, Asif Iqbal <vadud3 at gmail.com> wrote:
> On the 6140s I set up one RAID0 LUN per SAS disk with a 16K segment size.
>
> On the 490 I created a zpool with 8 4+1 raidz1s.
>
> I am getting zpool IO of only 125MB/s with zfs:zfs_nocacheflush = 1 in
> /etc/system.
>
> Is there a way I can improve the performance? I'd like to get 1GB/sec IO.
On Nov 17, 2007 9:12 AM, Louwtjie Burger <zabermeister at gmail.com> wrote:
> You have a 6140 with SAS drives?! When did this happen?

OOPS! I meant FC-AL.

--
Asif Iqbal
PGP Key: 0xE62693C5 KeyServer: pgp.mit.edu
(Including storage-discuss)

I have 6 6140s with 96 disks, 64 of which are Seagate ST3300007FC
(300GB - 10000 RPM FC-AL).

I created 16k segment size RAID0 LUNs, one per FC-AL disk. Then I created a
zpool with 8 4+1 raidz1 vdevs out of those single-disk LUNs. I also set
zfs_nocacheflush to 1 to take advantage of the 2G NVRAM cache of the
controllers.

I am using one port per controller; the rest of them are down (not in use).
Each controller port speed is 4Gbps.

All LUNs have one controller as primary and the second one as secondary.

I am getting only 125MB/s according to the zpool IO. I should get ~512MB/s
per IO.

Also, is it possible to get 2GB/s IO by using the leftover ports of the
controllers? Is it also possible to get 4GB/s IO by aggregating the
controllers (with 8 ports total)?

--
Asif Iqbal
PGP Key: 0xE62693C5 KeyServer: pgp.mit.edu
Torrey McMahon
2007-Nov-17 19:55 UTC
[zfs-discuss] [storage-discuss] zpool io to 6140 is really slow
Have you tried disabling the zil cache flushing?

http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide#Cache_Flushes

Asif Iqbal wrote:
> I am getting only 125MB/s according to the zpool IO.
>
> I should get ~512MB/s per IO.
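For completeness, a sketch of how that tunable is usually set and checked on
Solaris 10 of that era (the OP already has the /etc/system line in place, so
this only makes the suggestion concrete):

    # Persistent setting, takes effect after a reboot
    echo 'set zfs:zfs_nocacheflush = 1' >> /etc/system

    # Inspect the live value on a running kernel
    echo 'zfs_nocacheflush/D' | mdb -k

    # Flip it on the fly (only sensible when the array cache is battery-backed)
    echo 'zfs_nocacheflush/W0t1' | mdb -kw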
Asif Iqbal
2007-Nov-17 22:00 UTC
[zfs-discuss] [perf-discuss] zpool io to 6140 is really slow
Looks like the max IO write I get is ~194MB/s on controller c6, which is
where the zpool is.

On Nov 17, 2007 3:29 PM, adrian cockcroft <adrian.cockcroft at gmail.com> wrote:
> What do you get from iostat? Try something like
>
> % iostat -xnMCez 10 10
>
> (extended, named, Mbyte, controller, errors, nonzero, interval 10
> secs, 10 measurements)
>
> Post the results and you may get more commentary...
>
> Adrian

--
Asif Iqbal
PGP Key: 0xE62693C5 KeyServer: pgp.mit.edu

[Attachment: iostat.txt --
http://mail.opensolaris.org/pipermail/zfs-discuss/attachments/20071117/b6757b15/attachment.txt]
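As a rough illustration (not taken from the scrubbed attachment), the
per-controller throughput can be pulled out of that iostat output with
something like the sketch below; it assumes the standard -xnM column layout,
where Mr/s and Mw/s are the 3rd and 4th columns and the controller name
(here c6) appears in the device column of the -C aggregate rows:

    # Sum read + write MB/s for the c6 controller rows
    iostat -xnMCez 10 10 | awk '$NF == "c6" { print $3 + $4, "MB/s total on", $NF }'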
Asif Iqbal
2007-Nov-18 01:19 UTC
[zfs-discuss] [storage-discuss] zpool io to 6140 is really slow
On Nov 17, 2007 2:55 PM, Torrey McMahon <tmcmahon2 at yahoo.com> wrote:
> Have you tried disabling the zil cache flushing?

I already have zfs_nocacheflush set to 1 to take advantage of the NVRAM of
the raid controllers:

set zfs:zfs_nocacheflush = 1

> http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide#Cache_Flushes

--
Asif Iqbal
PGP Key: 0xE62693C5 KeyServer: pgp.mit.edu
Louwtjie Burger
2007-Nov-19 06:43 UTC
[zfs-discuss] [storage-discuss] zpool io to 6140 is really slow
On Nov 17, 2007 9:40 PM, Asif Iqbal <vadud3 at gmail.com> wrote:
> I have 6 6140s with 96 disks. Out of which 64 of them are Seagate
> ST3300007FC (300GB - 10000 RPM FC-AL)

Those disks are 2Gb disks, so the tray will operate at 2Gb.

> I created 16k seg size raid0 luns using single fcal disks.

You "showed" the single disks as LUNs to the host... if I understand
correctly.

Q: Why 16K?

> Then created a zpool with 8 4+1 raidz1 using those luns

What is the layout here? Inside 1 tray, over multiple trays?

> Also set the zfs nocacheflush to 1 to take advantage of the 2G NVRAM
> cache of the controllers.
>
> I am using one port per controller. Rest of them are down (not in
> use). Each controller port speed is 4Gbps.

The 6140 is asymmetric and as such the second controller will be
available in fail-over mode; it is not actively used for load
balancing.

You need to hook up more FC links to the primary controller that has
the active LUNs assigned; that is the only way to easily get more
IOPs.

> All luns have one controller as primary and second one as secondary
>
> I am getting only 125MB/s according to the zpool IO.

Seems a tad low, how are you testing?

> I should get ~ 512MB/s per IO.

Hmmm, how did you get to this total? Keep in mind that your tray is
sitting at 2Gb and your extensions to the CSM trays are all single
channel... you will get a 2Gb ceiling. Also have a look at
http://en.wikipedia.org/wiki/Fibre_Channel#History

At first glance, and not knowing the exact setup, I would say that you
will not get more than 200MB/s (if that much).

Any reason why you are not using the RAID controller to do the work for you?

> Also is it possible to get 2GB/s IO by using the leftover ports of the
> controllers?
>
> Is it also possible to get 4GB/s IO by aggregating the controllers (w/
> 8 ports total)?
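If MPxIO is enabled on the host, the active/standby ownership of each LUN
path can be checked directly; a sketch (the device name is a placeholder):

    # List all multipathed logical units and their path counts
    mpathadm list lu

    # Show which paths are active vs. standby for one LUN
    mpathadm show lu /dev/rdsk/c4t600A0B800011223300000A0000000000d0s2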
Asif Iqbal wrote:
> A 490 with 8 1.8GHz CPUs and 16G mem, and 6 6140s with 2 FC controllers,
> using the A1 and B1 controller ports at 4Gbps. Each controller has 2G NVRAM.
>
> On the 490 I created a zpool with 8 4+1 raidz1s.
>
> I am getting zpool IO of only 125MB/s with zfs:zfs_nocacheflush = 1 in
> /etc/system.
>
> Is there a way I can improve the performance? I'd like to get 1GB/sec IO.

I don't believe a V490 is capable of driving 1 GByte/s of I/O.
The V490 has two schizos and the schizo is not a full speed
bridge. For more information see Section 1.2 of:
http://www.sun.com/processors/manuals/External_Schizo_PRM.pdf
 -- richard
Asif Iqbal
2007-Nov-20 06:16 UTC
[zfs-discuss] [storage-discuss] zpool io to 6140 is really slow
On Nov 19, 2007 1:43 AM, Louwtjie Burger <zabermeister at gmail.com> wrote:
> Those disks are 2Gb disks, so the tray will operate at 2Gb.

That is still 256MB/s. I am getting about 194MB/s.

> You "showed" the single disks as LUNs to the host... if I understand
> correctly.

Yes.

> Q: Why 16K?

To avoid segment crossing. It will mainly be used for an Oracle DB whose
block size is 16K.

> What is the layout here? Inside 1 tray, over multiple trays?

Over multiple trays.

> The 6140 is asymmetric and as such the second controller will be
> available in fail-over mode, it is not actively used for load
> balancing.

So there is no way to create an aggregated channel off of both controllers?

> You need to hook up more FC links to the primary controller that has
> the active LUNs assigned, that is the only way to easily get more
> IOPs.

If I add a second loop by adding another non-active port, I may have to
rebuild the FS, no?

> Hmmm, how did you get to this total? Keeping in mind that your tray is
> sitting at 2Gb and your extensions to the CSM trays are all single
> channel... you will get a 2Gb ceiling.

Even for the OS IO? So the controller NVRAM does not help increase the IO
for the OS?

> At first glance and not knowing the exact setup I would say that you
> will not get more than 200MB/s (if that much).

I am getting 194MB/s. Hmm, my 490 has 16G memory. I really thought I could
benefit some from OS and controller RAM, at least for Oracle IO.

> Any reason why you are not using the RAID controller to do the work for
> you?

They are RAID0 LUNs, so the RAID controller is in use. I get higher IO from
a zpool built on RAID0 LUNs of single disks than from a RAID5-type LUN, or
from RAID0 across multiple disks as one LUN with a zpool on top.

--
Asif Iqbal
PGP Key: 0xE62693C5 KeyServer: pgp.mit.edu
On Nov 19, 2007 11:47 PM, Richard Elling <Richard.Elling at sun.com> wrote:
> I don't believe a V490 is capable of driving 1 GByte/s of I/O.

Well, I am getting ~190MB/s right now. I am surely not hitting anywhere
close to that ceiling.

> The V490 has two schizos and the schizo is not a full speed
> bridge. For more information see Section 1.2 of:
> http://www.sun.com/processors/manuals/External_Schizo_PRM.pdf

--
Asif Iqbal
PGP Key: 0xE62693C5 KeyServer: pgp.mit.edu
Chad Mynhier
2007-Nov-20 12:01 UTC
[zfs-discuss] [perf-discuss] [storage-discuss] zpool io to 6140 is really slow
On 11/20/07, Asif Iqbal <vadud3 at gmail.com> wrote:
> > Those disks are 2Gb disks, so the tray will operate at 2Gb.
>
> That is still 256MB/s. I am getting about 194MB/s

2Gb fibre channel is going to max out at a data transmission rate
around 200MB/s rather than the 256MB/s that you'd expect. Fibre
channel uses an 8-bit/10-bit encoding, so it transmits 8 bits of data
in 10 bits on the wire. So while 256MB/s is being transmitted on the
connection itself, only 200MB/s of that is the data that you're
transmitting.

Chad Mynhier
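Spelling that arithmetic out as a rough back-of-the-envelope check (framing
and protocol overhead are ignored here):

    # 2Gb FC: ~2048 Mbit/s nominal on the wire; 8b/10b encoding means 10 line
    # bits carry 8 data bits, i.e. 10 line bits per payload byte.
    echo 'scale=1; 2048 / 10' | bc    # ~204.8 -> roughly 200 MB/s of payload
    echo 'scale=1; 2048 / 8'  | bc    # 256 MB/s if the encoding is ignored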
Asif Iqbal
2007-Nov-20 14:56 UTC
[zfs-discuss] [storage-discuss] zpool io to 6140 is really slow
On Nov 20, 2007 1:48 AM, Louwtjie Burger <zabermeister at gmail.com> wrote:
> > That is still 256MB/s. I am getting about 194MB/s
>
> No, I don't think you can take 2Gbit / 8 bits per byte and say 256MB is
> what you should get...
> Someone with far more FC knowledge can comment here. There must be
> some overhead in transporting data (as with regular SCSI) ... in the
> same way ULTRA 320MB SCSI never yields close to 320 MB/s ... even
> though it might seem so.
>
> > Adding a second loop by adding another non-active port I may have to
> > rebuild the FS, no?
>
> No. Use MPxIO to help you out here ... Solaris will see the same LUNs
> on each of the 2, 3 or 4 ports on the primary controller ... but with
> multi-pathing switched on will only give you 1 vhci LUN to work with.
>
> What I would do is export the zpool(s). Hook up more links to the
> primary and enable scsi_vhci. Reboot and look for the new cX vhci
> devices.
>
> zpool import should rebuild the pools from the multipath devices just fine.
>
> Interesting test though.
>
> > I am getting 194MB/s. Hmm, my 490 has 16G memory. I really thought I
> > could benefit some from OS and controller RAM, at least for Oracle IO
>
> Close to 200MB seems good from 1 x 2Gb.

Should I not gain a lot of performance (I am not getting any) from the
2 x 2GB of NVRAM on my raid controllers?

> Something else to try ... when creating hardware LUNs, one can assign
> the LUN to either controller A or B (as preferred or owner). By doing
> assignments one can use the secondary controller ... you are going to
> then "stripe" over controllers .. as one way of looking at it.
>
> PS: Is this a direct connection? Switched fabric?

--
Asif Iqbal
PGP Key: 0xE62693C5 KeyServer: pgp.mit.edu
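A sketch of the sequence Louwtjie describes, assuming Solaris 10 with MPxIO
managed through stmsboot; the pool name is a placeholder:

    # 1. Export the pool before changing the paths
    zpool export tank

    # 2. Cable the extra links to the primary controller, then enable MPxIO
    #    (stmsboot updates the device paths and prompts for a reboot)
    stmsboot -e

    # 3. After the reboot each LUN appears once under a scsi_vhci device
    #    name; re-import the pool from the multipathed devices
    zpool import tank

    # Sanity check: every LUN should now show more than one path
    mpathadm list lu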
Asif Iqbal
2007-Nov-20 14:58 UTC
[zfs-discuss] [perf-discuss] [storage-discuss] zpool io to 6140 is really slow
On Nov 20, 2007 7:01 AM, Chad Mynhier <cmynhier at gmail.com> wrote:
> 2Gb fibre channel is going to max out at a data transmission rate
> around 200MB/s rather than the 256MB/s that you'd expect. Fibre
> channel uses an 8-bit/10-bit encoding, so it transmits 8 bits of data
> in 10 bits on the wire. So while 256MB/s is being transmitted on the
> connection itself, only 200MB/s of that is the data that you're
> transmitting.

But I am running 4Gb fibre channel with 4GB of NVRAM in front of 6 trays of
300GB FC 10K RPM (2Gb/s) disks.

So I should get "a lot" more than ~200MB/s. Shouldn't I?

--
Asif Iqbal
PGP Key: 0xE62693C5 KeyServer: pgp.mit.edu
Chad Mynhier
2007-Nov-20 15:06 UTC
[zfs-discuss] [perf-discuss] [storage-discuss] zpool io to 6140 is really slow
On 11/20/07, Asif Iqbal <vadud3 at gmail.com> wrote:
> But I am running 4Gb fibre channel with 4GB of NVRAM in front of 6 trays
> of 300GB FC 10K RPM (2Gb/s) disks.
>
> So I should get "a lot" more than ~200MB/s. Shouldn't I?

Here I'm relying on what Louwtjie said above, that the tray itself is
going to be limited to 2Gb/s because of the 2Gb/s FC disks.

Chad Mynhier
What kind of workload are you running? If you are doing these measurements
with some sort of "write as fast as possible" microbenchmark, once the 4 GB
of nvram is full, you will be limited by backend performance (FC disks and
their interconnect) rather than the host / controller bus.

Since, best case, 4 Gbit FC can transfer 4 GBytes of data in about 10
seconds, you will fill it up, even with the backend writing out data as
fast as it can, in about 20 seconds. Once the nvram is full, you will only
see the backend (e.g. 2 Gbit) rate.

The reason these controller buffers are useful with real applications is
that they smooth the bursts of writes that real applications tend to
generate, thus reducing the latency of those writes and improving
performance. They will then "catch up" during periods when few writes are
being issued. But a typical microbenchmark that pumps out a steady stream
of writes won't see this benefit.

Drew Wilson

Asif Iqbal wrote:
> But I am running 4Gb fibre channel with 4GB of NVRAM in front of 6 trays
> of 300GB FC 10K RPM (2Gb/s) disks.
>
> So I should get "a lot" more than ~200MB/s. Shouldn't I?
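The fill-time arithmetic behind that, as a rough sketch using the nominal
rates discussed in this thread (encoding and protocol overhead ignored):

    # Host side: ~400 MB/s in over 4Gb FC; backend: ~200 MB/s out over the
    # 2Gb drive loop. The cache fills at the difference, so 4 GB of NVRAM
    # absorbs a sustained burst for roughly:
    echo 'scale=0; 4096 / (400 - 200)' | bc    # ~20 seconds
    # after which sustained writes drop to the ~200 MB/s backend rate.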
Andrew Wilson
2007-Nov-20 16:06 UTC
Re: [perf-discuss] [storage-discuss] zpool io to 6140 is really slow
And, just to add one more point: since pretty much everything the host
writes to the controller eventually has to make it out to the disk drives,
the long-term average write rate cannot exceed the rate at which the
backend disk subsystem can absorb the writes, regardless of the workload.
(An exception is if the controller can combine some overlapping writes.)

Basically, just like putting water into a reservoir at twice the rate it is
being withdrawn, the reservoir will eventually overflow! At least in this
case the controller can limit the input from the host and avoid an actual
data overflow situation.

Drew
Asif Iqbal
2007-Nov-20 20:08 UTC
[zfs-discuss] [perf-discuss] [storage-discuss] zpool io to 6140 is really slow
On Nov 20, 2007 10:40 AM, Andrew Wilson <Andrew.W.Wilson at sun.com> wrote:
> What kind of workload are you running? If you are doing these
> measurements with some sort of "write as fast as possible" microbenchmark,
> once the 4 GB of nvram is full, you will be limited by backend performance
> (FC disks and their interconnect) rather than the host / controller bus.

An Oracle database with a 16K block size... populating the database as fast
as I can.

--
Asif Iqbal
PGP Key: 0xE62693C5 KeyServer: pgp.mit.edu
Asif Iqbal wrote:
> On Nov 19, 2007 11:47 PM, Richard Elling <Richard.Elling at sun.com> wrote:
>> I don't believe a V490 is capable of driving 1 GByte/s of I/O.
>
> Well, I am getting ~190MB/s right now. I am surely not hitting anywhere
> close to that ceiling.
>
>> The V490 has two schizos and the schizo is not a full speed
>> bridge. For more information see Section 1.2 of:
>> http://www.sun.com/processors/manuals/External_Schizo_PRM.pdf

[err - see Section 1.3]

You will notice from Table 1-1, the read bandwidth limit for a schizo PCI
leaf is 204 MBytes/s. With two schizos, you can expect to max out at
816 MBytes/s or less, depending on resource contention. It makes no
difference that a 4 Gbps FC card could read 400 MBytes/s; the best you can
do for the card is 204 MBytes/s. 1 GBytes/s of read throughput will not be
attainable with a V490.
 -- richard
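A quick sketch of where the 816 MBytes/s ceiling comes from, assuming each
Schizo bridge drives two PCI leaves:

    # 2 Schizo bridges x 2 PCI leaves x ~204 MB/s read limit per leaf
    echo '2 * 2 * 204' | bc    # = 816 MB/s aggregate, at best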
Asif Iqbal
2007-Nov-21 16:20 UTC
[zfs-discuss] [perf-discuss] [storage-discuss] zpool io to 6140 is really slow
On Nov 21, 2007 10:37 AM, Lion, Oren-P64304 <Oren.Lion at gdc4s.com> wrote:
> I recently tweaked Oracle (8K blocks, log_buffer gt 2M) on a Solaris
> AMD64 system for max performance on a Sun 6140 with one tray of 73 GB
> 15K RPM drives.

Oracle here is set up with a 16K block size and a 2G log buffer. I am using
a test pool with RAID0 of 6 10K RPM FC disks (2 from each of 3 trays). I
played with 16K and 32K segment sizes only. I will try other sizes and post
the performance here.

Thanks for sharing the dtrace result as well. Excellent data!

> Definitely needed to place the datafiles and redo logs
> on isolated RAID groups. Wasn't sure how many blocks Oracle batches for
> IO. Used dtrace's bitesize script to generate the distributions below.
> Based on the dtrace output, and after testing multiple segment sizes,
> finally settled on Segment Size (stripe size) 256K for both datafiles
> and redo logs.
>
> Also observed performance boost by using forcedirectio and noatime on
> the 6140 mount points and observed smoother performance by using 2M
> pagesize (MPSS) by adding the lines below to Oracle's .profile (and
> verified with pmap -s [ORACLE PID] | grep 2M).
>
> Oracle MPSS .profile
> LD_PRELOAD=$LD_PRELOAD:mpss.so.1
> MPSSHEAP=2M
> MPSSSTACK=2M
> export LD_PRELOAD MPSSHEAP MPSSSTACK
> MPSSERRFILE=~/mpsserr
> export MPSSERRFILE
>
> Here's the final 6140 config:
> Oracle datafiles  => 12 drives RAID 10, Segment Size 256
> Oracle redo log A =>  2 drives RAID 0,  Segment Size 256
> Oracle redo log B =>  2 drives RAID 0,  Segment Size 256
>
> ./bitesize.d
>
>  1452  ora_dbw2_prf02\0
>
>            value  ------------- Distribution ------------- count
>            16384 |                                         0
>            32768 |@@@@@@@@@@@@@@@@@@@@                     1
>            65536 |                                         0
>           131072 |@@@@@@@@@@@@@@@@@@@@                     1
>           262144 |                                         0
>
>  1454  ora_dbw3_prf02\0
>
>            value  ------------- Distribution ------------- count
>             4096 |                                         0
>             8192 |@@@@@@@@@@@@@@@@@@@@@@@                  4
>            16384 |@@@@@@                                   1
>            32768 |@@@@@@                                   1
>            65536 |                                         0
>           131072 |@@@@@@                                   1
>           262144 |                                         0
>
>  1448  ora_dbw0_prf02\0
>
>            value  ------------- Distribution ------------- count
>             4096 |                                         0
>             8192 |@@@@@@@@@@@@@@@@@@@@@@                   5
>            16384 |@@@@@@@@@@@@@                            3
>            32768 |                                         0
>            65536 |                                         0
>           131072 |@@@@                                     1
>           262144 |                                         0
>
>  1450  ora_dbw1_prf02\0
>
>            value  ------------- Distribution ------------- count
>            65536 |                                         0
>           131072 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 2
>           262144 |                                         0
>
>  1458  ora_ckpt_prf02\0
>
>            value  ------------- Distribution ------------- count
>             8192 |                                         0
>            16384 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 43
>            32768 |                                         0
>
>  1456  ora_lgwr_prf02\0
>
>            value  ------------- Distribution ------------- count
>              256 |                                         0
>              512 |@@@@@@@@                                 24
>             1024 |@@@@                                     12
>             2048 |@@@@@                                    15
>             4096 |@@@@@                                    14
>             8192 |                                         0
>            16384 |                                         1
>            32768 |@                                        4
>            65536 |                                         0
>           131072 |@                                        4
>           262144 |@@                                       6
>           524288 |@@@@@@@@@@@@@@                           42
>          1048576 |                                         0

--
Asif Iqbal
PGP Key: 0xE62693C5 KeyServer: pgp.mit.edu
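For reference, a sketch of how the forcedirectio/noatime options Oren
mentions are typically applied; this assumes UFS on the 6140 LUNs (ZFS has
no forcedirectio mount option), and the device and mount point names are
placeholders:

    # /etc/vfstab entry for an Oracle datafile filesystem
    # device to mount    device to fsck      mount point   FS  pass  boot  options
    /dev/dsk/c4t0d0s6  /dev/rdsk/c4t0d0s6  /u01/oradata  ufs  2  yes  forcedirectio,noatime

    # Or mount it by hand with the same options
    mount -F ufs -o forcedirectio,noatime /dev/dsk/c4t0d0s6 /u01/oradata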