Jeff Bonwick
2006-Aug-17 06:38 UTC
[zfs-discuss] Re: system unresponsive after issuing a zpool attach
> And it started replacement/resilvering... after few minutes system became unavailable. Reboot only gives me a few minutes, then resilvering make system unresponsive.
>
> Is there any workaround or patch for this problem???

Argh, sorry -- the problem is that we don't do aggressive enough scrub/resilver throttling. The effect is most pronounced on 32-bit or low-memory systems. We're working on it.

One thing you might try is reducing txg_time to 1 second (the default is 5 seconds) by saying this: "echo txg_time/W1 | mdb -kw".

Let me describe what's happening, and why this may help.

When we kick off a scrub (same code path as resilver, so I'll use the term generically), we traverse the entire block tree looking for blocks that need scrubbing. The tree traversal itself is single-threaded, but the work it generates is not -- each time we find a block that needs scrubbing, we schedule an async I/O to do it. As you've discovered, we can generate work faster than the I/O subsystem can process it. To avoid overloading the disks, we throttle I/O downstream, but we don't (yet) have an upstream throttle. If we discover blocks really fast, we can end up scheduling lots of I/O -- and sitting on lots of memory -- before the downstream throttle kicks in.

The reason this relates to txg_time is that every time we sync a transaction group, we suspend the scrub thread and wait for all pending scrub I/Os to complete. This ensures that we won't asynchronously scrub a block that was freed and reallocated in a future txg; when coupled with the COW nature of ZFS, this allows us to run scrubs entirely independent of all filesystem-level structure (e.g. directories) and locking rules. This little trick makes the scrubbing algorithms *much* simpler.

The key point is that each spa_sync() throttles the scrub to zero. By lowering txg_time from 5 to 1, you're cutting down the maximum number of pending scrub I/Os by roughly 5x.

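As a rough illustration of that arithmetic, here is a toy Python model. It is not ZFS code: the function name and the discovery/completion rates are hypothetical, and the model ignores the downstream throttle and the time the sync itself takes. It only assumes that the traversal schedules scrub I/Os faster than the disks retire them, and that every txg_time seconds the backlog is drained to zero.

```python
def peak_pending_scrub_ios(txg_time_s, discovered_per_s, completed_per_s, run_s=60):
    """Toy model: peak backlog of scrub I/Os, advanced in 1-second ticks.

    All rates are made-up illustration values, not measurements.
    """
    pending = 0
    peak = 0
    for second in range(1, run_s + 1):
        # The traversal schedules new async I/Os; the disks retire some of them.
        pending += discovered_per_s
        pending = max(0, pending - completed_per_s)
        peak = max(peak, pending)
        if second % txg_time_s == 0:
            # Stand-in for spa_sync(): the scrub thread is suspended until
            # every pending scrub I/O has completed, so the backlog hits zero.
            pending = 0
    return peak


# Hypothetical rates: 20,000 blocks/s discovered, 5,000 blocks/s completed.
for txg_time in (5, 1):
    peak = peak_pending_scrub_ios(txg_time, 20_000, 5_000)
    print(f"txg_time={txg_time}s -> peak pending scrub I/Os: {peak}")
```

With these made-up rates the backlog peaks at 75000 I/Os for txg_time=5 and 15000 for txg_time=1, which is the 5x reduction described above. Real numbers depend on the pool, the disks, and how quickly the traversal finds blocks.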
The unresponsiveness you're seeing is a threshold effect; I'm hoping that by running spa_sync() more often, we can get you below that threshold. Please let me know if this works for you.

Jeff