Hi all- We've been experiencing a very strange problem for two days now. We have three clients (Linux boxes) connected to a ZFS box (Nexenta) via iSCSI. Every few seconds (seemingly at random), iostat shows the clients going from a normal 80K+ IOPS to zero. It lasts up to a few seconds and then things are fine again. When that happens, I/O on the local disks stops too, even on totally unrelated ones. How can that be? All three clients show the same pattern, and everything was fine prior to Sunday. Nothing has changed on either the clients or the server. Neither the ZFS box nor the network is anywhere close to saturated. We don't even know where to start... any advice? Ian -- This message posted from opensolaris.org
To add to that... iostat on the client boxes shows the connection always at around 98% util, topping out at 100% whenever it hangs. The same clients are connected to another ZFS server with much lower specs and fewer, slower disks; it performs much better and rarely gets past 5% util. They share the same network.
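For anyone wanting to timestamp these hangs on the Linux clients rather than eyeball iostat, here is a minimal sketch. It assumes the standard `/proc/diskstats` field layout (reads completed, writes completed, I/Os in flight, ms spent doing I/O); the function names and the one-second interval are illustrative choices, not something from this thread. It flags a device as stalled when it completed no I/O during the interval while requests were still in flight.

```python
import time


def parse_diskstats(text):
    """Map device name -> (reads completed, writes completed, in flight, ms doing I/O)."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 14:
            continue  # skip blank or malformed lines
        stats[parts[2]] = (int(parts[3]), int(parts[7]),
                           int(parts[11]), int(parts[12]))
    return stats


def rates(prev, curr, interval_s):
    """Return device -> (IOPS, %util, in flight) between two snapshots."""
    out = {}
    for dev, (r1, w1, inflight, t1) in curr.items():
        if dev not in prev:
            continue  # device appeared between snapshots
        r0, w0, _, t0 = prev[dev]
        iops = ((r1 - r0) + (w1 - w0)) / interval_s
        util = min(100.0, 100.0 * (t1 - t0) / (interval_s * 1000.0))
        out[dev] = (iops, util, inflight)
    return out


def stalled(rate_map):
    """Devices with zero completions in the interval but requests still queued."""
    return sorted(dev for dev, (iops, _, inflight) in rate_map.items()
                  if iops == 0 and inflight > 0)


def watch(interval_s=1.0, path="/proc/diskstats"):
    """Print a timestamped line whenever any device looks stalled (runs until interrupted)."""
    with open(path) as f:
        prev = parse_diskstats(f.read())
    while True:
        time.sleep(interval_s)
        with open(path) as f:
            curr = parse_diskstats(f.read())
        hung = stalled(rates(prev, curr, interval_s))
        if hung:
            print(time.strftime("%H:%M:%S"), "stalled:", ", ".join(hung))
        prev = curr
```

Calling `watch()` on each client and comparing the timestamps would show whether the iSCSI devices and the local disks stall at exactly the same moments, which is the odd part of the symptom described above.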
Ian, Did you enable DeDup? Rocky -----Original Message----- From: zfs-discuss-bounces at opensolaris.org [mailto:zfs-discuss-bounces at opensolaris.org] On Behalf Of Ian D Sent: Tuesday, July 26, 2011 7:52 AM To: zfs-discuss at opensolaris.org Subject: [zfs-discuss] Entire client hangs every few seconds _______________________________________________ zfs-discuss mailing list zfs-discuss at opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
No dedup. The hiccups started around 2am on Sunday, while (obviously) nobody was interacting with either the clients or the server. It's been running for months (as is) without any problem. My guess is that it's a defective hard drive that, instead of failing outright, just stutters. Or maybe it's the cache. We disabled the SLOG with no effect, but we haven't tried disabling the L2ARC yet.
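One way to test the stuttering-drive guess on the storage server is to compare per-disk service times. The sketch below assumes the column order printed by `iostat -xn` on Solaris-derived systems (`r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device`); the function names, the 5x-median factor and the 20 ms floor are arbitrary illustrative thresholds, not recommended values.

```python
from statistics import median


def parse_iostat_xn(text):
    """Map device name -> asvc_t (average active service time, ms)."""
    result = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 11:
            continue  # banner lines and anything with an unexpected shape
        try:
            values = [float(p) for p in parts[:10]]
        except ValueError:
            continue  # the column header line is not numeric
        result[parts[10]] = values[7]  # asvc_t is the 8th column
    return result


def slow_disks(asvc, factor=5.0, floor_ms=20.0):
    """Devices whose asvc_t is far above the median of all devices."""
    if not asvc:
        return []
    limit = max(floor_ms, factor * median(asvc.values()))
    return sorted(dev for dev, ms in asvc.items() if ms > limit)
```

Feeding it the text of one `iostat -xn` sample taken during a hang, and another taken while things are normal, should show whether a single disk stands out only during the stalls.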
Garrett D'Amore
2011-Jul-26 19:27 UTC
[zfs-discuss] Entire client hangs every few seconds
This is actually a recently identified problem, and a fix for it is in the 3.1 version, which should be available any minute now, if it isn't already. The problem has to do with some allocations which are sleeping, so jobs in the ZFS subsystem get backed up behind some other work. If you have adequate system memory, you are less likely to see this problem, I think. - Garrett On Tue, 2011-07-26 at 08:29 -0700, Rocky Shek wrote: > Ian, > > Did you enable DeDup? > > Rocky
Hi Garrett- Is it something that could happen at any time on a system that has been working fine for a while? That system has 256G of RAM, so I think "adequate" is not a concern here :) We'll try 3.1 as soon as we can download it. Ian
Are the "disk active" lights typically ON when this happens?