I was playing with a Gigabyte i-RAM card and found out it works great
to improve overall performance when there are a lot of writes of small
files over NFS to such a ZFS pool.

However, I noted a frequent situation in periods of long writes over
NFS of small files. Here's a snippet of iostat during that period.
sd15/sd16 are two iscsi targets, and sd17 is the iRAM card (2GB).

                    extended device statistics
device       r/s    w/s   kr/s   kw/s  wait  actv  svc_t  %w  %b
sd15         0.0    0.0    0.0    0.0   0.0   0.0    0.0   0   0
sd16         0.0    0.0    0.0    0.0   0.0   0.0    0.0   0   0
sd17         0.0    0.0    0.0    0.0   3.0   1.0    0.0 100 100
                    extended device statistics
device       r/s    w/s   kr/s   kw/s  wait  actv  svc_t  %w  %b
sd15         0.0    0.0    0.0    0.0   0.0   0.0    0.0   0   0
sd16         0.0    0.0    0.0    0.0   0.0   0.0    0.0   0   0
sd17         0.0    0.0    0.0    0.0   3.0   1.0    0.0 100 100
                    extended device statistics
device       r/s    w/s   kr/s   kw/s  wait  actv  svc_t  %w  %b
sd15         0.0    0.0    0.0    0.0   0.0   0.0    0.0   0   0
sd16         0.0    0.0    0.0    0.0   0.0   0.0    0.0   0   0
sd17         0.0    0.0    0.0    0.0   3.0   1.0    0.0 100 100
                    extended device statistics
device       r/s    w/s   kr/s   kw/s  wait  actv  svc_t  %w  %b
sd15         0.0    0.0    0.0    0.0   0.0   0.0    0.0   0   0
sd16         0.0    0.0    0.0    0.0   0.0   0.0    0.0   0   0
sd17         0.0    0.0    0.0    0.0   3.0   1.0    0.0 100 100
                    extended device statistics
device       r/s    w/s   kr/s   kw/s  wait  actv  svc_t  %w  %b
sd15         0.0    0.0    0.0    0.0   0.0   0.0    0.0   0   0
sd16         0.0    0.0    0.0    0.0   0.0   0.0    0.0   0   0
sd17         0.0    0.0    0.0    0.0   3.0   1.0    0.0 100 100
                    extended device statistics
device       r/s    w/s   kr/s   kw/s  wait  actv  svc_t  %w  %b
sd15         0.0    0.0    0.0    0.0   0.0   0.0    0.0   0   0
sd16         0.0    0.0    0.0    0.0   0.0   0.0    0.0   0   0
sd17         0.0    0.0    0.0    0.0   3.0   1.0    0.0 100 100
                    extended device statistics
device       r/s    w/s   kr/s   kw/s  wait  actv  svc_t  %w  %b
sd15         0.0    0.0    0.0    0.0   0.0   0.0    0.0   0   0
sd16         0.0    0.0    0.0    0.0   0.0   0.0    0.0   0   0
sd17         0.0    0.0    0.0    0.0   3.0   1.0    0.0 100 100
                    extended device statistics
device       r/s    w/s   kr/s   kw/s  wait  actv  svc_t  %w  %b
sd15         0.0    0.0    0.0    0.0   0.0   0.0    0.0   0   0
sd16         0.0    0.0    0.0    0.0   0.0   0.0    0.0   0   0
sd17         0.0    0.0    0.0    0.0   3.0   1.0    0.0 100 100
                    extended device statistics
device       r/s    w/s   kr/s   kw/s  wait  actv  svc_t  %w  %b
sd15         0.0    0.0    0.0    0.0   0.0   0.0    0.0   0   0
sd16         0.0    0.0    0.0    0.0   0.0   0.0    0.0   0   0
sd17         0.0    0.0    0.0    0.0   3.0   1.0    0.0 100 100

During this time no operations can occur. I've attached the iRAM disk
via a 3124 card. I've never seen a svc_t time of 0 together with a fully
waiting and busy disk. Any clue what this might mean?
I would expect such iostat output from a device which can handle
only a single queued I/O to the device (e.g. the IDE driver) and an I/O
is stuck. There are 3 more I/Os in the wait queue waiting for the
active I/O to complete. The %w and %b are measured as the percent
of time during which an I/O was in the queue. The svc_t is 0 because
the I/O is not finished.

By default, most of the drivers will retry I/Os which don't seem to
finish, but the retry interval is often on the order of 60 seconds.
If a retry succeeds, then no message is logged to syslog, so you
might not see any messages. But just to be sure, what does
fmdump (and fmdump -e) say about the system? Are messages
logged in /var/adm/messages?
 -- richard

Joe Little wrote:
> I was playing with a Gigabyte i-RAM card and found out it works great
> to improve overall performance when there are a lot of writes of small
> files over NFS to such a ZFS pool.
>
> However, I noted a frequent situation in periods of long writes over
> NFS of small files. Here's a snippet of iostat during that period.
> sd15/sd16 are two iscsi targets, and sd17 is the iRAM card (2GB)
>
> [iostat output]
>
> During this time no operations can occur. I've attached the iRAM disk
> via a 3124 card. I've never seen a svc_t time of 0, and full wait and
> busy disk. Any clue what this might mean?
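For reference, the checks suggested above could be run roughly as follows.
These are standard Solaris commands, but the grep pattern is only an
illustrative guess at strings relevant to this particular setup:

    # list fault events and error telemetry recorded by FMA
    fmdump
    fmdump -e

    # look for recent driver or SATA complaints in the system log
    tail -200 /var/adm/messages | egrep -i 'sata|si3124|sd17'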
On Nov 26, 2007 7:00 PM, Richard Elling <Richard.Elling at sun.com> wrote:
> I would expect such iostat output from a device which can handle
> only a single queued I/O to the device (e.g. the IDE driver) and an I/O
> is stuck. There are 3 more I/Os in the wait queue waiting for the
> active I/O to complete. The %w and %b are measured as the percent
> of time during which an I/O was in the queue. The svc_t is 0 because
> the I/O is not finished.
>
> By default, most of the drivers will retry I/Os which don't seem to
> finish, but the retry interval is often on the order of 60 seconds.
> If a retry succeeds, then no message is logged to syslog, so you
> might not see any messages. But just to be sure, what does
> fmdump (and fmdump -e) say about the system? Are messages
> logged in /var/adm/messages?

Nothing with fmdump or /var/adm/messages. Your answer explains why it's
60 seconds or so. What's sad is that this is a ramdisk, so to speak,
albeit connected via SATA-I to the sil3124. Any way to isolate this
further? Any way to limit I/O timeouts for a drive? This is just two
sticks of RAM... ms would be fine :)

> -- richard
>
> Joe Little wrote:
> > I was playing with a Gigabyte i-RAM card and found out it works great
> > to improve overall performance when there are a lot of writes of small
> > files over NFS to such a ZFS pool.
> >
> > However, I noted a frequent situation in periods of long writes over
> > NFS of small files. Here's a snippet of iostat during that period.
> > sd15/sd16 are two iscsi targets, and sd17 is the iRAM card (2GB)
> >
> > [iostat output]
> >
> > During this time no operations can occur. I've attached the iRAM disk
> > via a 3124 card. I've never seen a svc_t time of 0, and full wait and
> > busy disk. Any clue what this might mean?
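One way to dig a little further on the device side, as a sketch rather than
a known fix for this problem, is to look at the per-device error counters
and at how the controller ports are populated; both commands are standard
on Solaris/OpenSolaris:

    iostat -En     # per-device soft/hard/transport error counters
    cfgadm -al     # SATA ports and what is attached to each of them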
On Nov 26, 2007 8:41 PM, Joe Little <jmlittle at gmail.com> wrote:
> I was playing with a Gigabyte i-RAM card and found out it works great
> to improve overall performance when there are a lot of writes of small
> files over NFS to such a ZFS pool.
>
> However, I noted a frequent situation in periods of long writes over
> NFS of small files. Here's a snippet of iostat during that period.
> sd15/sd16 are two iscsi targets, and sd17 is the iRAM card (2GB)
>
> [iostat output]
>
> During this time no operations can occur. I've attached the iRAM disk
> via a 3124 card. I've never seen a svc_t time of 0, and full wait and
> busy disk. Any clue what this might mean?

This sounds like 6566207: si3124 driver loses interrupts. I have
observed similar behavior as a result of this bug. Upgrading to build
71 or later should fix things.

Chris
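A quick way to check whether a machine is already on a build containing
that fix is to look at the release string; the snv_70 value below is only
an example, not taken from this system:

    $ cat /etc/release
          Solaris Express Community Edition snv_70 X86
          ...
    $ uname -v
    snv_70

If the reported build number is below 71, the si3124 interrupt fix would
not yet be in place.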
Joe Little wrote:
> On Nov 26, 2007 7:00 PM, Richard Elling <Richard.Elling at sun.com> wrote:
>> I would expect such iostat output from a device which can handle
>> only a single queued I/O to the device (e.g. the IDE driver) and an I/O
>> is stuck. There are 3 more I/Os in the wait queue waiting for the
>> active I/O to complete. The %w and %b are measured as the percent
>> of time during which an I/O was in the queue. The svc_t is 0 because
>> the I/O is not finished.
>>
>> By default, most of the drivers will retry I/Os which don't seem to
>> finish, but the retry interval is often on the order of 60 seconds.
>> If a retry succeeds, then no message is logged to syslog, so you
>> might not see any messages. But just to be sure, what does
>> fmdump (and fmdump -e) say about the system? Are messages
>> logged in /var/adm/messages?
>
> Nothing with fmdump or /var/adm/messages. Your answer explains why it's
> 60 seconds or so. What's sad is that this is a ramdisk, so to speak,
> albeit connected via SATA-I to the sil3124. Any way to isolate this
> further? Any way to limit I/O timeouts for a drive? This is just two
> sticks of RAM... ms would be fine :)

I suspect a bug in the driver or firmware. It might be difficult to
identify if it is in the firmware.

A pretty good white paper on storage stack timeout tuning is available
at BigAdmin:
http://www.sun.com/bigadmin/features/hub_articles/tuning_sfs.jsp
But it won't directly apply to your case because you aren't using the
ssd driver. I'd wager the cmdk driver is being used for your case
and I'm not familiar with its internals. prtconf -D will show which
driver(s) are in use.
 -- richard
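For what it's worth, confirming the driver binding and, if the device does
turn out to be handled by sd rather than cmdk, shortening the per-command
timeout might look roughly like this. sd_io_time is a real sd tunable
(default 60 seconds), but the 10-second value is only an illustration and
should be checked against the tuning documentation before use:

    # show which driver claims each disk node
    prtconf -D | egrep -i 'disk|sd|cmdk'

    # /etc/system entry to shorten the sd command timeout; takes effect
    # after a reboot and affects every device handled by sd, not just one
    set sd:sd_io_time = 10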
On Nov 26, 2007 7:57 PM, Richard Elling <Richard.Elling at sun.com> wrote:
> Joe Little wrote:
> > On Nov 26, 2007 7:00 PM, Richard Elling <Richard.Elling at sun.com> wrote:
> >> I would expect such iostat output from a device which can handle
> >> only a single queued I/O to the device (e.g. the IDE driver) and an I/O
> >> is stuck. There are 3 more I/Os in the wait queue waiting for the
> >> active I/O to complete. The %w and %b are measured as the percent
> >> of time during which an I/O was in the queue. The svc_t is 0 because
> >> the I/O is not finished.
> >>
> >> By default, most of the drivers will retry I/Os which don't seem to
> >> finish, but the retry interval is often on the order of 60 seconds.
> >> If a retry succeeds, then no message is logged to syslog, so you
> >> might not see any messages. But just to be sure, what does
> >> fmdump (and fmdump -e) say about the system? Are messages
> >> logged in /var/adm/messages?
> >
> > Nothing with fmdump or /var/adm/messages. Your answer explains why it's
> > 60 seconds or so. What's sad is that this is a ramdisk, so to speak,
> > albeit connected via SATA-I to the sil3124. Any way to isolate this
> > further? Any way to limit I/O timeouts for a drive? This is just two
> > sticks of RAM... ms would be fine :)
>
> I suspect a bug in the driver or firmware. It might be difficult to
> identify if it is in the firmware.
>
> A pretty good white paper on storage stack timeout tuning is available
> at BigAdmin:
> http://www.sun.com/bigadmin/features/hub_articles/tuning_sfs.jsp
> But it won't directly apply to your case because you aren't using the
> ssd driver. I'd wager the cmdk driver is being used for your case
> and I'm not familiar with its internals. prtconf -D will show which
> driver(s) are in use.
> -- richard

The previous message listed the si3124 bug. It's the si3124 driver that
is in use, and thus sd* (as seen in the iostat output).