hi all,

currently having trouble with sustained write performance with my setup...

ms server 2003 / ms iscsi initiator 2.08 w/ intel e1000g nic, directly connected to snv_101 w/ intel e1000g nic.

basically, given enough time, the sustained write behavior is perfectly periodic. if i copy a large file to the iscsi target, iostat reports 10 seconds or so of -no- writes to disk, just small reads... then 2-3 seconds of disk-maxed writes, during which time windows reports the write performance dropping to zero (disk queues maxed).

so iostat will report something like this for each of my zpool disks (with iostat -xtc 1):

1s:  %b 0
2s:  %b 0
3s:  %b 0
4s:  %b 0
5s:  %b 0
6s:  %b 0
7s:  %b 0
8s:  %b 0
9s:  %b 0
10s: %b 0
11s: %b 100
12s: %b 100
13s: %b 100
14s: %b 0
15s: %b 0

it looks like solaris hangs out caching the writes and not actually committing them to disk... when the cache gets flushed, the iscsitgt (or whatever) just stops accepting writes.

this is happening across controllers and zpools. also, a test copy of a 10gb file from one zpool to another (not iscsi) yielded similar iostat results: 10 seconds of big reads from the source zpool, 2-3 seconds of big writes to the target zpool (the target zpool is 5x bigger than the source zpool).

anyone got any ideas? point me in the right direction?

thanks,

milosz
--
This message posted from opensolaris.org
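for reference, the same cadence can also be watched at the pool level rather than per disk; a minimal sketch, with the pool name "tank" standing in for the real pool:

    # pool-wide bandwidth and ops, sampled once per second; with the behavior
    # described above, the writes should show up as short periodic bursts
    zpool iostat tank 1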
Rob at Logan.com
2008-Dec-09 01:19 UTC
[zfs-discuss] zfs & iscsi sustained write performance
> (with iostat -xtc 1)

it sure would be nice to know if actv > 0, so we would know if the lun was busy because its queue is full or just slow (svc_t > 200).

for tracking errors, try `iostat -xcen 1` and `iostat -E`.

Rob
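for example, a minimal sketch of what these report (standard Solaris iostat flags; the exact output layout can vary a little between releases):

    # extended per-device stats with error counters and descriptive (cXtYdZ) names, once per second
    iostat -xcen 1

    # cumulative soft/hard/transport error counts per device since boot
    iostat -E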
On Mon, Dec 8, 2008 at 3:09 PM, milosz <mewash at gmail.com> wrote:
> basically, given enough time, the sustained write behavior is
> perfectly periodic. if i copy a large file to the iscsi target,
> iostat reports 10 seconds or so of -no- writes to disk, just small
> reads... then 2-3 seconds of disk-maxed writes, during which time
> windows reports the write performance dropping to zero (disk queues
> maxed).

Are you running compression? I see this behavior with heavy loads and GZIP compression enabled.

What does 'zfs get compression' say?

--
Brent Jones
brent at servuhome.net
compression is off across the board.

svc_t is only maxed during the periods of heavy write activity (2-3 seconds every 10 or so seconds)... otherwise the disks are basically idling.
--
This message posted from opensolaris.org
Bob Friesenhahn
2008-Dec-09 02:37 UTC
[zfs-discuss] zfs & iscsi sustained write performance
On Mon, 8 Dec 2008, milosz wrote:
> compression is off across the board.
>
> svc_t is only maxed during the periods of heavy write activity (2-3
> seconds every 10 or so seconds)... otherwise disks are basically
> idling.

Check for some hardware anomaly which might impact disks 11, 12, and 13 but not the other disks. For example, perhaps they share a cable, share the same controller, or there is some other common point which is slow or producing recoverable errors.

Bob
======================================
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
my apologies... 11s, 12s, and 13s in my earlier message represent the number of seconds into a read/write period, not disk numbers. so, 11 seconds into a period, %b suddenly jumps to 100 after having been 0 for the first 10.
--
This message posted from opensolaris.org
Roch Bourbonnais
2009-Jan-03 15:24 UTC
[zfs-discuss] zfs & iscsi sustained write performance
On Dec 9, 2008, at 03:16, Brent Jones wrote:
> On Mon, Dec 8, 2008 at 3:09 PM, milosz <mewash at gmail.com> wrote:
>> basically, given enough time, the sustained write behavior is
>> perfectly periodic. if i copy a large file to the iscsi target,
>> iostat reports 10 seconds or so of -no- writes to disk, just small
>> reads... then 2-3 seconds of disk-maxed writes, during which time
>> windows reports the write performance dropping to zero (disk queues
>> maxed).

This looks consistent with being limited by network factors. The disks are idling while the next ZFS transaction group is being formed.

What is less clear is why the windows write performance drops to zero. One possible explanation is that during the write bursts the small reads are being starved, preventing progress on the Initiator side.

-r
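one way to see the transaction-group rhythm Roch describes is to timestamp each pool sync as it starts; a rough sketch using the fbt provider (spa_sync is an internal ZFS function, so this probe is an unstable interface and the name may differ between builds):

    # print the wall-clock time every time a txg sync begins
    dtrace -n 'fbt::spa_sync:entry { printf("%Y\n", walltimestamp); }'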
> What is less clear is why windows write performance drops to zero.

Perhaps the tweak for Nagle's Algorithm in Windows would be in order?

http://blogs.sun.com/constantin/entry/x4500_solaris_zfs_iscsi_perfect
--
This message posted from opensolaris.org
thanks for your responses, guys...

the nagle's tweak is the first thing i did, actually.

not sure what the network limiting factors could be here... there's no switch, jumbo frames are on... maybe it's the e1000g driver? it's been wonky since build 94 or so. even during the write bursts i'm only getting 60% of gigabit on average.
--
This message posted from opensolaris.org
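a couple of quick checks on the solaris side might narrow down whether e1000g itself is the bottleneck; a sketch, where the instance name e1000g0 is a guess for whichever interface carries the iscsi traffic:

    # per-second interface packet/error/collision counts
    netstat -i -I e1000g0 1

    # raw driver kstats (look for rx/tx error, no-buffer, and interrupt counters), once per second
    kstat -m e1000g 1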
Roch Bourbonnais
2009-Jan-12 14:09 UTC
[zfs-discuss] zfs & iscsi sustained write performance
On Jan 4, 2009, at 21:09, milosz wrote:
> thanks for your responses, guys...
>
> the nagle's tweak is the first thing i did, actually.
>
> not sure what the network limiting factors could be here... there's
> no switch, jumbo frames are on... maybe it's the e1000g driver?
> it's been wonky since build 94 or so. even during the write bursts
> i'm only getting 60% of gigabit on average.

How about the tcp window size (particularly tcp_recv_hiwat on the receive side), and whether or not some CPU is saturated (particularly the interrupt CPU on the receive side; check with mpstat 1)?

There is also some magic incantation to allow a bigger transfer size in iscsi (blaise should have the details).

Can you verify the single-connection throughput using one of iperf, uperf, or netperf?

-r
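for reference, a sketch of checking and raising the receive window on the receiving box (the 400000-byte value is only an example, not a recommendation, and changes made with ndd do not persist across reboots):

    # current default tcp receive buffer (bytes)
    ndd -get /dev/tcp tcp_recv_hiwat

    # raise it for new connections, e.g. to roughly 400 KB
    ndd -set /dev/tcp tcp_recv_hiwat 400000

    # watch per-cpu interrupt and syscall load while a copy is running
    mpstat 1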
Roch Bourbonnais wrote:
> There is also some magic incantation to allow a bigger transfer size
> in iscsi (blaise should have the details).

For Solaris, the value can be set on either the iSCSI target or initiator, replacing 65536 (64K) with a value of one's choosing.

[ target ]
    iscsitadm modify target --maxrecv 65536 <target-IQN>

[ initiator ]
    iscsiadm modify target-param -p maxrecvdataseglen=65536 <target-IQN>

Jim
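a small follow-up that may be useful: on a Solaris initiator you can check the per-target parameters that are actually in effect with something like the following (the IQN is a placeholder):

    # lists the target parameters, including the max receive data segment length
    iscsiadm list target-param -v <target-IQN>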
iperf test coming out fine, actually...

iperf -s -w 64k

iperf -c -w 64k -t 900 -i 5

[ ID] Interval           Transfer     Bandwidth
[  5] 0.0-899.9 sec      81.1 GBytes  774 Mbits/sec

totally steady. i could probably implement some tweaks to improve it, but if i were getting a steady 77% of gigabit i'd be very happy.

not seeing any cpu saturation with mpstat... nothing unusual other than low activity while zfs commits writes to disk (ostensibly this is when the transfer rate troughs)...
--
This message posted from opensolaris.org
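to see whether the single stream is window-limited, a couple of variants worth trying (the target address is a placeholder; -w sets the socket buffer, -P runs parallel streams):

    iperf -c <target-ip> -w 256k -t 60 -i 5
    iperf -c <target-ip> -w 64k -P 4 -t 60 -i 5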
milosz writes:
> iperf test coming out fine, actually...
>
> [ ID] Interval           Transfer     Bandwidth
> [  5] 0.0-899.9 sec      81.1 GBytes  774 Mbits/sec
>
> totally steady. i could probably implement some tweaks to improve it,
> but if i were getting a steady 77% of gigabit i'd be very happy.

So you're trying to get from 60% to 77%. IIRC you had some small amount of reads going on. If you can find out where those come from and eliminate them, that could help.

Did we cover maxrecvdataseglen also? I've seen this help throughput using the solaris initiator:

    iscsiadm list target | grep ^Target | awk '{print $2}' | while read x ; do
        iscsiadm modify target-param -p maxrecvdataseglen=65536 $x
    done

-r
sorry, that 60% statement was misleading... i will VERY OCCASIONALLY get a spike to 60%, but i'm averaging more like 15%, with the throughput often dropping to zero for several seconds at a time.

that iperf test more or less demonstrates it isn't a network problem, no?

also, i have been using the microsoft iscsi initiator... i will try doing a solaris-solaris test later.
--
This message posted from opensolaris.org
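in case it saves someone a lookup, a rough sketch of wiring up a second solaris box as the initiator for that test (the target address, IQN, and resulting disk device are placeholders; the raw-device dd is destructive, so only point it at a scratch LUN):

    # on the initiator: enable sendtargets discovery against the target box
    iscsiadm add discovery-address <target-ip>:3260
    iscsiadm modify discovery --sendtargets enable

    # create device nodes for the discovered LUNs
    devfsadm -i iscsi

    # simple sequential-write test against the raw LUN (writes roughly 10 GB, destroying its contents)
    dd if=/dev/zero of=/dev/rdsk/<cXtYdZ>s0 bs=128k count=80000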