Richard Eames
2002-Sep-06 03:01 UTC
Huge amount of used inode handlers reported by sar -v (inode-sz)
Any help with this problem would be very much appreciated (even pointers like "it's not 7.3 or ext3, look somewhere else").

I've seen a similar post to ext3-users, but since that one received no reply, and I'm not convinced it's an ext3 problem (it only appears on our 7.3 hosts), I'm CCing the valhalla list.

We have the same problem on ALL our Red Hat 7.3 machines (various dual-processor Dell 2400, 2500 and 2600 machines with RAID cards). There is no consistent time when it starts (except that it's always outside business hours, 9-5), but once it does, the machine eventually dies. At first we thought it was our amanda backups, but they're not always running when it starts. We've locked the machines down as far as we can afford and patched one to the latest RPM and kernel versions available from Red Hat, but no joy. Once it starts, file opens become unreliable, and syslog and other processes that rely on sockets start to behave in strange ways. At one stage, on our amanda host, the amanda backup kept going long after everything else had stopped working, until it needed to rename the log files, and then it died too (unlike syslog, amanda keeps its log file open the entire time until the end).

I've read through the ext3 and valhalla archives and found only one email concerning this problem (no reply), and I've looked through the Red Hat errata, Google, etc. I've also checked the proc filesystem and can't find any large numbers in inode-nr etc.

The only way I've found to get rid of the problem is a reboot.

Here's a copy of the sar output from one host. Note the interesting dentunusd values at one stage.
00:01:01  dentunusd file-sz %file-sz   inode-sz super-sz %super-sz dquot-sz %dquot-sz rtsig-sz %rtsig-sz
01:01:01     351831     506     0.24 4252446106        0      0.00        0      0.00        2      0.20
01:06:01     351832     469     0.22 4252446106        0      0.00        0      0.00        2      0.20
01:11:01     351832     507     0.24 4252446106        0      0.00        0      0.00        2      0.20
01:16:01     351833     507     0.24 4252446106        0      0.00        0      0.00        2      0.20
01:21:01     351834     467     0.22 4252446106        0      0.00        0      0.00        2      0.20
01:26:01     351835     508     0.24 4252446106        0      0.00        0      0.00        2      0.20
01:31:01 4294965457     461     0.22 4251971351        0      0.00        0      0.00        2      0.20
01:36:01 4294965457     460     0.22 4251971351        0      0.00        0      0.00        2      0.20
01:41:01 4294965461     459     0.22 4251971356        0      0.00        0      0.00        2      0.20
*
* deleted to save bandwidth
*
03:36:01 4294966740     509     0.24 4251971696        0      0.00        0      0.00        2      0.20
03:41:01 4294966741     508     0.24 4251971696        0      0.00        0      0.00        2      0.20
03:46:01 4294965710     468     0.22 4251971527        0      0.00        0      0.00        2      0.20
03:51:01 4294965736     507     0.24 4251971527        0      0.00        0      0.00        2      0.20
03:56:01 4294965752     509     0.24 4251971539        0      0.00        0      0.00        2      0.20
04:01:00 4294965763     508     0.24 4251971546        0      0.00        0      0.00        2      0.20
04:06:00     227450     470     0.22 4252135348        0      0.00        0      0.00        2      0.20
04:11:00     227935     470     0.22 4251950501        0      0.00        0      0.00        2      0.20
04:16:01     203080     472     0.23 4251887721

And another host (note how fast it happens; it's not a gradual build-up):
00:01:00  dentunusd file-sz %file-sz   inode-sz super-sz %super-sz dquot-sz %dquot-sz rtsig-sz %rtsig-sz
*
* boring stuff edited out
*
04:50:59      64932     992     0.95      61428        0      0.00        0      0.00        2      0.20
04:55:59      64947     992     0.95      61442        0      0.00        0      0.00        2      0.20
05:01:01      64970     992     0.95      61461        0      0.00        0      0.00        2      0.20
05:06:01      65098     983     0.94      61312        0      0.00        0      0.00        2      0.20
05:11:01      65121     983     0.94      61314        0      0.00        0      0.00        2      0.20
05:16:01         68     977     0.93 4294960298        0      0.00        0      0.00        3      0.29
05:21:01        622     992     0.95 4294960717        0      0.00        0      0.00        2      0.20
05:26:01       1252    1153     1.10 4294961116        0      0.00        0      0.00        1      0.10
05:31:01       1500    1175     1.12 4294961376        0      0.00        0      0.00        1      0.10
05:36:01       1499    1160     1.11 4294961380        0      0.00        0      0.00        1      0.10
05:41:01       1500    1160     1.11 4294961380        0      0.00        0      0.00        1      0.10
05:46:01       1503    1175     1.12 4294961376        0      0.00        0      0.00        1         0

One very strange thing: the Average line from sar for this last host is

Average:       9306     842     0.80       4298        0      0.00        0      0.00        1      0.10

But given that fewer than 50% of the inode-sz values in the sar file are below 4 billion, I'm a little perplexed by this line.
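One possible reading of that Average line, offered only as an unconfirmed assumption about how sar computes it: if the samples are summed in a 32-bit accumulator, or treated as signed values, each wrapped inode-sz sample behaves like a small negative number and cancels much of the positive ones, so the average comes out small rather than in the billions. The Python sketch below uses only the twelve rows shown for the second host; the edited-out rows are missing, so it is not expected to reproduce 4298. `to_signed32` and `MASK32` are names introduced here for illustration.

```python
MASK32 = 0xFFFFFFFF  # assumed width of sar's counters (not confirmed in the thread)


def to_signed32(value):
    """Reinterpret a 32-bit unsigned value as a two's-complement signed one."""
    value &= MASK32
    return value - (1 << 32) if value & 0x80000000 else value


# inode-sz column of the twelve rows shown for the second host.
inode_sz = [61428, 61442, 61461, 61312, 61314,
            4294960298, 4294960717, 4294961116, 4294961376,
            4294961380, 4294961380, 4294961376]

naive_mean = sum(inode_sz) / len(inode_sz)      # exact unsigned average
wrapped_sum = sum(inode_sz) & MASK32            # what a 32-bit accumulator would hold
signed_mean = sum(to_signed32(v) for v in inode_sz) / len(inode_sz)

print(f"unsigned mean:      {naive_mean:,.0f}")
print(f"32-bit wrapped sum: {wrapped_sum:,}")
print(f"signed mean:        {signed_mean:,.0f}")
```

With these rows the 32-bit wrapped sum and the signed sum agree, because the true signed total is a small positive number; either way the result is tens of thousands, not billions.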
Stephen C. Tweedie
2002-Sep-06 10:06 UTC
Re: Huge amount of used inode handlers reported by sar -v (inode-sz)
Hi,

On Fri, Sep 06, 2002 at 12:31:30PM +0930, Richard Eames wrote:
> Here's a copy of the sar output from one host. Note the interesting
> dentunusd values at one stage.
> 00:01:01 dentunusd file-sz %file-sz inode-sz super-sz %super-sz
> dquot-sz %dquot-sz rtsig-sz %rtsig-sz
> 01:01:01 351831 506 0.24 4252446106 0 0.00
> 01:31:01 4294965457 461 0.22 4251971351 0 0.00
> 0 0.00 2 0.20

OK, these counts look like they are going negative: 4 billion (2^32) is where 32-bit unsigned integers wrap round to if you try to decrement them below zero. I can't tell from this whether it's likely to be a kernel accounting problem or just sar misinterpreting things, so could you please open a Bugzilla report for this?

Thanks,
 Stephen
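Stephen's point can be checked with a little arithmetic. The Python sketch below is an illustration, not a diagnosis: it assumes the sar columns are 32-bit unsigned values and that dentunusd and inode-sz are derived from /proc/sys/fs/dentry-state and /proc/sys/fs/inode-nr. It shows a zero counter wrapping to 2^32 - 1 on decrement, reads two values from the first excerpt back as signed numbers, and prints the raw /proc counters so they can be compared with what sar reports. `as_signed32` and `read_proc_counters` are helper names introduced here for illustration.

```python
from pathlib import Path

MASK32 = 0xFFFFFFFF  # assumption: sar printed these fields as 32-bit unsigned


def as_signed32(value):
    """Reinterpret a 32-bit unsigned value as two's-complement signed."""
    value &= MASK32
    return value - (1 << 32) if value & 0x80000000 else value


def read_proc_counters(name, root="/proc/sys/fs"):
    """Return the integer fields of <root>/<name>, or None if the file is absent."""
    path = Path(root) / name
    if not path.is_file():
        return None
    return [int(field) for field in path.read_text().split()]


# Decrementing a 32-bit unsigned counter that is already zero wraps it.
wrapped = (0 - 1) & MASK32
print(wrapped)  # 4294967295

# Two values from the first sar excerpt, read back as signed numbers:
# prints "dentunusd: 4294965457 -> -1839" and "inode-sz: 4252446106 -> -42521190".
for label, value in [("dentunusd", 4294965457), ("inode-sz", 4252446106)]:
    print(f"{label}: {value} -> {as_signed32(value)}")

# Raw kernel counters, bypassing sar. inode-nr holds nr_inodes and
# nr_free_inodes; dentry-state begins with nr_dentry and nr_unused.
for name in ("inode-nr", "dentry-state"):
    print(name, read_proc_counters(name))
```

If the two raw inode-nr fields are each modest (as Richard found) but the second exceeds the first, a subtraction of the free count from the total would go negative and show up as a number near 4 billion. That is a guess about sar's internals, not something the thread confirms.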
Karl F. Larsen
2002-Sep-06 11:36 UTC
Re: Huge amount of used inode handlers reported by sar -v (inode-sz)
Richard, this sounds like a kernel bug. I'm not sure whether it's what you're checking, but your use of the file system is a bit heavy, and you might be up against uncharted effects. I would suggest switching to ext2 and seeing whether the problem stops; if it does, that would point to ext3 as the cause. Also, if you are not used to the Red Hat Bugzilla service, I can help you get started.

On Fri, 6 Sep 2002, Richard Eames wrote:
> Any help with this problem would be very much appreciated (even "it's not
> 7.3 or ext3 pointers, look somewhere else").
--
72, Karl K5DI