thr3ads.net - Ext3 users - update: sys_ftruncate call lasting 17 hours on ext3 filesystem from mutt [Nov 2002]

Neal McBurnett

2002-Nov-11 19:35 UTC

update: sys_ftruncate call lasting 17 hours on ext3 filesystem from mutt

In August I reported a problem with the sys_ftruncate call that caused
me to reboot my machine.  I didn't see any responses to it then on the
ext3 list, and the problem is now recurring, so I thought I'd try
again.  I don't think I've rebooted since the last problem.
 
In the last few days it hasn't taken as long as 17 hours, but it has
sometimes taken unusual and uncomfortable amounts of time (many
minutes at least).  Normally, with my 266 MB $MAIL, it only takes a
few seconds to update the file, since mutt is clever enough to only
write the tail end of the file starting with the first change.
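
The update pattern described above (rewrite the file from the first
changed byte onward, then cut it to its new length) can be sketched in C
roughly as follows.  This is only an illustration under stated
assumptions, not mutt's actual code: rewrite_tail() and file_size() are
hypothetical helper names, and the timing wrapper is there just to show
how long the process stays inside the ftruncate() call itself.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>      /* open() flags, for callers of rewrite_tail() */
#include <sys/stat.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: current size of the open file, or -1 on error. */
off_t file_size(int fd)
{
    struct stat st;
    return fstat(fd, &st) == 0 ? st.st_size : (off_t)-1;
}

/*
 * Hypothetical helper: overwrite the file from `offset` with `len` bytes
 * of `tail`, then truncate the file to offset + len.
 * Returns the seconds spent inside ftruncate(), or -1.0 on any error.
 */
double rewrite_tail(int fd, off_t offset, const char *tail, size_t len)
{
    struct timespec t0, t1;
    size_t done = 0;

    assert(tail != NULL);

    if (lseek(fd, offset, SEEK_SET) == (off_t)-1)
        return -1.0;

    /* write() may be partial, so loop until the whole tail is written. */
    while (done < len) {
        ssize_t n = write(fd, tail + done, len - done);
        if (n < 0)
            return -1.0;
        done += (size_t)n;
    }

    /* Time only the truncate step, the call that stalled in this report. */
    if (clock_gettime(CLOCK_MONOTONIC, &t0) != 0)
        return -1.0;
    if (ftruncate(fd, offset + (off_t)len) != 0)
        return -1.0;
    if (clock_gettime(CLOCK_MONOTONIC, &t1) != 0)
        return -1.0;

    return (double)(t1.tv_sec - t0.tv_sec)
         + (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
}
```

A caller would open the mailbox with O_RDWR, pass the offset of the first
changed byte and the new tail, and compare the returned time against the
few seconds that the whole update normally takes.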

It doesn't seem like a mutt bug, since the whole episode takes place
inside a single system call, and the problem only showed up after
upgrading to Red Hat 7.3 and ext3.

Any clues?  Anything to test out before rebooting and fscking again?

Neal McBurnett                 http://bcn.boulder.co.us/~neal/
GPG/PGP signed and/or sealed mail encouraged.  Keyid: 2C9EBA60


----- Forwarded message from Neal McBurnett <neal@bcn.boulder.co.us> -----

From: Neal McBurnett <neal@bcn.boulder.co.us>
To: ext3-users@redhat.com
Subject: sys_ftruncate call lasting 17 hours on ext3 filesystem from mutt
Date: Thu, 15 Aug 2002 10:26:08 -0600

Several times recently my "mutt" email program has looped for
hours at a time in the middle of a sys_ftruncate call.  This happens
when I use the "$" command to write changes out to my mailbox.  It
does eventually return from the call and everything seems to have
worked ok.  But in the meantime the CPU is pegged, $MAIL
is locked so I can't receive new mail, and signals to the program
(like kill -9) don't take effect for hours.  Once it was 17 hours,
once 3, etc.

The problem showed up shortly after upgrading from Red Hat 7.1 and
converting the file systems to ext3.  I'm running Red Hat 7.3, kernel
2.4.18-3, mutt-1.2.5.1-1.

Strace didn't help at all, but thanks to a tip from Kevin Fenzi
I learned how to use sysrq to find out where the process was, viz:

 18:03:10 kernel: mutt          R current   1024  8893   7929  (NOTLB)
 18:03:10 kernel: Call Trace: [<c0127061>] truncate_list_pages [kernel] 0x79
 18:03:10 kernel: [<c01271ff>] truncate_inode_pages [kernel] 0x3b 
 18:03:10 kernel: [<c0124f2e>] vmtruncate [kernel] 0x96 
 18:03:10 kernel: [<c01491f0>] inode_setattr [kernel] 0x24 
 18:03:10 kernel: [<d401f963>] ext3_setattr [ext3] 0x1c3 
 18:03:10 kernel: [<d401d810>] ext3_get_block [ext3] 0x0 
 18:03:10 kernel: [<c01281db>] do_generic_file_read [kernel] 0x2c3 
 18:03:10 kernel: [<c0149359>] notify_change [kernel] 0x5d 
 18:03:10 kernel: [<c012a2aa>] generic_file_write [kernel] 0x5c2 
 18:03:10 kernel: [<c01348ce>] do_truncate [kernel] 0x46 
 18:03:10 kernel: [<c0134bd1>] sys_ftruncate [kernel] 0x12d
 18:03:10 kernel: [<c01085f7>] system_call [kernel] 0x33 

I noticed that an fsck hadn't been done for months, so I did one
with this result, indicating some sort of problem with $MAIL:

 13:25:31 fsck: /var:  
 13:25:31 fsck: Truncating orphaned inode 44891 (uid=6265, gid=6265, mode=0100600, size=175526062)
 13:25:36 fsck: /var has gone 69 days without being checked, check forced. 
 13:25:43 fsck: /var: 1057/104040 files (24.0% non-contiguous), 281356/415768 blocks

The file in question is large:

  44891 -rw-------    1 neal     neal     175694250 Aug 13 13:50 /var/mail/neal

I haven't seen the problem in the last day, but I've had successful
days in the past also.

I would hope that even in the face of a file system problem, the
kernel shouldn't take so long to do a system call.  Any ideas?

Is there a bug-tracking system (bugzilla?) for ext3 or the kernel?

Thanks,

Neal McBurnett                 http://bcn.boulder.co.us/~neal/
GPG/PGP signed and/or sealed mail encouraged.  Keyid: 2C9EBA60

_______________________________________________
Ext3-users mailing list
Ext3-users@redhat.com
https://listman.redhat.com/mailman/listinfo/ext3-users

----- End forwarded message -----

Stephen C. Tweedie

2002-Nov-15 17:12 UTC

Re: update: sys_ftruncate call lasting 17 hours on ext3 filesystem from mutt

Hi,

On Mon, Nov 11, 2002 at 12:35:02PM -0700, Neal McBurnett wrote:
> In August I reported a problem with the sys_ftruncate call that caused
> me to reboot my machine.  I didn't see any responses to it then on the
> ext3 list, and the problem is now recurring, so I thought I'd try
> again.  I don't think I've rebooted since the last problem.
>  
> In the last few days it hasn't taken as long as 17 hours, but it has
> sometimes taken unusual and uncomfortable amounts of time (many
> minutes at least).

Please have a look at

  https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=77669

as this sounds like the same problem.  The user in that case reports
that it happens with ext2 too.  It seems that the core VM truncate code
in that kernel is livelocking in some situations.

--Stephen
