Gareth Bult
2008-Jan-09  18:20 UTC
[Xen-users] XEN server stalling .. problem spotted - solution required
Ok, I've been chasing this for many days. I have a server running 10 instances that periodically freezes, then sometimes "comes back". I tried many things to spot the problem and finally found it by accident.

It's a little frustrating, as typically the Dom0 and one (or two) instances "go" and the rest carry on, and there is diddly squat when it comes to logging information or error messages.

I'm now running this in the Dom0:

    watch "cat /proc/meminfo"

I watch the "Dirty" figure increase, and occasionally decrease.

In an instance (this is just an easy way to reproduce it quickly) do:

    dd if=/dev/zero of=/tmp/bigfile bs=1M count=1000

Watch "Dirty" rise and at some point you'll see "Writeback" cut in. All looks good. Give it a few seconds and your "watch" of /proc/meminfo will freeze. On my system "Dirty" will at this point be reading about 500M and "Writeback" will have gone down to zero. "xm list" in another session will confirm that you have a major problem (it will hang).

For some reason pdflush is not working properly!!! Run "sync" in another shell and the machine instantly jumps back to life!

My setup:

    Stock Ubuntu Xen 3.1 kernel
    File-backed Xen instances, typically 5GB with 1GB swap
    Two dual-core 2.8GHz Xeons (4 cores in total) with 6GB RAM
    Twin 500GB SATA HDDs (software RAID1)

To my way of thinking (!), when it runs out of memory it should force a sync (or similar), and it's not; it's just sitting there. If I wait for the dirty_expire_centisecs timer to expire, I may get some life back: some instances will survive and some will have hung.
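For anyone trying to reproduce this, the full /proc/meminfo dump is noisy when only two fields matter. Below is a minimal sketch of a narrower view. The function names dirty_writeback and vm_dirty_tunables are made up for illustration and are not part of Xen or any distribution. It assumes a 2.6-era kernel that exposes the usual writeback tunables under /proc/sys/vm, and it only reads values; it does not change anything.

```shell
#!/bin/sh
# Hypothetical helper (not from the original post): print only the Dirty and
# Writeback lines from a meminfo-format file, defaulting to /proc/meminfo.
dirty_writeback() {
    awk '$1 == "Dirty:" || $1 == "Writeback:" { print $1, $2, $3 }' \
        "${1:-/proc/meminfo}"
}

# Hypothetical helper: print the writeback tunables that govern when pdflush
# starts flushing dirty pages. Files that are missing or unreadable are skipped.
vm_dirty_tunables() {
    for name in dirty_background_ratio dirty_ratio \
                dirty_expire_centisecs dirty_writeback_centisecs; do
        f="/proc/sys/vm/$name"
        [ -r "$f" ] && printf '%s = %s\n' "$name" "$(cat "$f")"
    done
    return 0
}
```

After sourcing the file in the Dom0, something like "while sleep 1; do dirty_writeback; done" gives a two-line view of the same figures, and vm_dirty_tunables shows the thresholds in force when the stall happens. Whether this loop keeps updating during a stall any better than "watch" does is untested.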
Here's a working "meminfo":

    MemTotal:       860160 kB
    MemFree:         22340 kB
    Buffers:         49372 kB
    Cached:         498416 kB
    SwapCached:      15096 kB
    Active:          92452 kB
    Inactive:       491840 kB
    SwapTotal:     4194288 kB
    SwapFree:      4136916 kB
    Dirty:            3684 kB
    Writeback:           0 kB
    AnonPages:       29104 kB
    Mapped:          13840 kB
    Slab:            45088 kB
    SReclaimable:    25304 kB
    SUnreclaim:      19784 kB
    PageTables:       2440 kB
    NFS_Unstable:        0 kB
    Bounce:              0 kB
    CommitLimit:   4624368 kB
    Committed_AS:   362012 kB
    VmallocTotal: 34359738367 kB
    VmallocUsed:      3144 kB
    VmallocChunk: 34359735183 kB

Here's one where "xm list" hangs, but my "watch" is still updating the /proc/meminfo display:

    MemTotal:       860160 kB
    MemFree:         13756 kB
    Buffers:         53656 kB
    Cached:         502420 kB
    SwapCached:      14812 kB
    Active:          84356 kB
    Inactive:       507624 kB
    SwapTotal:     4194288 kB
    SwapFree:      4136900 kB
    Dirty:          213096 kB
    Writeback:           0 kB
    AnonPages:       28832 kB
    Mapped:          13924 kB
    Slab:            45988 kB
    SReclaimable:    25728 kB
    SUnreclaim:      20260 kB
    PageTables:       2456 kB
    NFS_Unstable:        0 kB
    Bounce:              0 kB
    CommitLimit:   4624368 kB
    Committed_AS:   361796 kB
    VmallocTotal: 34359738367 kB
    VmallocUsed:      3144 kB
    VmallocChunk: 34359735183 kB

Here's a frozen one:

    MemTotal:       860160 kB
    MemFree:         15840 kB
    Buffers:          2208 kB
    Cached:         533048 kB
    SwapCached:       7956 kB
    Active:          49992 kB
    Inactive:       519916 kB
    SwapTotal:     4194288 kB
    SwapFree:      4136916 kB
    Dirty:          505112 kB
    Writeback:        3456 kB
    AnonPages:       34676 kB
    Mapped:          14436 kB
    Slab:            64508 kB
    SReclaimable:    18624 kB
    SUnreclaim:      45884 kB
    PageTables:       2588 kB
    NFS_Unstable:        0 kB
    Bounce:              0 kB
    CommitLimit:   4624368 kB
    Committed_AS:   368064 kB
    VmallocTotal: 34359738367 kB
    VmallocUsed:      3144 kB
    VmallocChunk: 34359735183 kB

Help!!!

Gareth.

_______________________________________________
Xen-users mailing list
Xen-users@lists.xensource.com
http://lists.xensource.com/xen-users
