Hi,

I have a 3-node (SunFire V890) VCS cluster running Solaris 10 u4 with LUNs from some Sun 6130, 6140 and IBM 8100 arrays. It has been working well, but one of the nodes started having trouble running ZFS commands this Tuesday, 2/19. Any ZFS command, e.g. 'zpool import', can take hours to complete. Sometimes it takes 4-5 minutes; run it again and it can take 60 minutes. The other 2 nodes, which share the same set of LUNs, are still normal so far: the same commands take 5-10 seconds or less. I haven't noticed any error messages from the arrays or SAN switches, and other than the HBAs and switch ports, the nodes are virtually identical. (Other commands like cfgadm, format, ... seem normal, so I suspect the culprit is related to ZFS. I opened a case with Sun, but that route seems to take forever for this kind of issue and I haven't got an answer yet.)

The host is not down or crashed. I rebooted it once today, but I'm not sure the reboot fixed anything: 'zpool import' can still take minutes rather than seconds to complete. I still need to create some test LUNs and pools for more tests. Everything except ZFS still seems normal. Most zfs commands also drive CPU load up until they complete, as seen in vmstat, mpstat, or top. This has been causing us trouble because our home-grown VCS ZFS agent considers the zpool dead after some consecutive failures in probing the pool ('zpool status' takes forever to complete).

Has anyone had the same problem, or does anyone know what the cause or fix might be?

Thanks.
Max Holm

This message posted from opensolaris.org
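On the agent side of this, one possible mitigation is to keep a slow probe distinct from a failed one, so that a hung 'zpool status' does not count toward the consecutive-failure limit. The following is only a rough sketch, not the agent described above: it assumes Python 3 is available on the node, the names `probe` and `PoolMonitor` are made up for illustration, and treating a non-zero exit status as "faulted" is an assumption (a real agent would more likely parse the command output).

```python
import subprocess


def probe(cmd, timeout_s):
    """Run cmd and classify the result.

    Returns 'online' on exit status 0, 'faulted' on any other exit
    status, and 'unknown' if the command did not finish in timeout_s.
    """
    try:
        result = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        # Slow is not the same as dead: report it separately.
        return "unknown"
    return "online" if result.returncode == 0 else "faulted"


class PoolMonitor:
    """Counts consecutive 'faulted' probes; timeouts leave the count alone."""

    def __init__(self, max_failures=3):
        self.max_failures = max_failures
        self.failures = 0

    def record(self, state):
        """Update the counter and return True once the pool is considered dead."""
        if state == "faulted":
            self.failures += 1
        elif state == "online":
            self.failures = 0
        # 'unknown' (timeout) neither increments nor resets the counter.
        return self.failures >= self.max_failures
```

It would be called along the lines of `monitor.record(probe(["zpool", "status", "tank"], 30))`, where `tank` is a placeholder pool name and 30 seconds is an arbitrary limit. One caveat: if the command is blocked inside the kernel, killing it after the timeout may itself not return promptly, so this does not fully protect the agent from a hang.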
Prabahar Jeyaram
2008-Feb-21 22:32 UTC
[zfs-discuss] ZFS commands sudden slow down, cpu spiked
Hi Max,

You might be hitting bug 6513209 (a contributor to the 'zpool import' delay). An official patch is coming soon; it is currently in T-Patch state.

You should be able to get the T-Patch through your support channel.

--
Prabahar.

_______________________________________________
zfs-discuss mailing list
zfs-discuss at opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Thanks. But when this happens, any ZFS command takes forever to complete, not just 'zpool import', though it might have been triggered by the import action(?).

By the way, the symptoms were first noticed not long after adding LUNs from an IBM DS8100 array, but there are no error messages or complaints about the LUNs on the OS or the array. And now I suspect the symptoms are showing up on the 2nd node of this 3-node cluster.

max
Thomas R. Stevenson
2008-Feb-25 19:55 UTC
[zfs-discuss] ZFS commands sudden slow down, cpu spiked
We are having the same problem. We were told that patch 127728-06 should fix our problem once it is released. Is this the same T-patch you are talking about?

--
Lead System Software Engineer
WSU/C&IT/CS/ECS
Tom's info:
http://tom.cc.wayne.edu/wiki/User:Tom
http://ThomasRStevenson.blogspot.com
http://www.linkedin.com/in/thomasrstevenson

"A common mistake that people make when trying to design something completely foolproof was to underestimate the ingenuity of complete fools." Douglas Adams / Mostly Harmless
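Once a patch such as 127728-06 is released and applied, each node could be checked for it by scanning the output of `showrev -p`. Below is a small sketch of that check, assuming Python 3 and assuming each installed patch appears on its own line starting with `Patch: NNNNNN-RR`. The helper names are made up for illustration, and whether this patch actually contains the fix for the problem in this thread is not confirmed here.

```python
import re

# Match only the leading "Patch:" field of each line, so patch IDs that
# appear later on the line (e.g. in an Obsoletes list) are ignored.
PATCH_LINE = re.compile(r"^Patch:\s+(\d{6})-(\d{2})\b", re.MULTILINE)


def installed_revision(showrev_output, patch_base):
    """Return the highest installed revision of patch_base, or None."""
    revisions = [
        int(rev)
        for base, rev in PATCH_LINE.findall(showrev_output)
        if base == patch_base
    ]
    return max(revisions) if revisions else None


def has_patch(showrev_output, patch_id):
    """True if patch_id (e.g. '127728-06') or a later revision is installed."""
    base, min_rev = patch_id.split("-")
    revision = installed_revision(showrev_output, base)
    return revision is not None and revision >= int(min_rev)
```

The text to scan could come from something like `subprocess.run(["showrev", "-p"], capture_output=True, text=True).stdout`, run on each of the three nodes so that their patch levels can be compared.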