Martin Steigerwald
2011-Jan-23 12:06 UTC
[regression?] hang with 2.6.37 on a BTRFS test machine
Not reported on bugzilla.kernel.org yet as its so vague. Hi! I have something I do not have the slightest clue on how to track down: Occasionally with 2.6.37 plain vanilla my ThinkPad T23 freezes in such a way, that when I type a command at the prompt, nothing will happen anymore. Same goes for login into a tty. Even dmesg is not executed anymore. My KDE session doesn''t react to clicks anymore, except for window movement. Most of the time the mouse pointer is still movable, but sometimes it is frozen as well. But the latter might be a different problem with X.org. My theory is that at least I/O related kernel functions do not return anymore at that time. Nothing is in the logs after a reboot. Problem now is: This happens rather irregularily. That kernel run for days without problems. But today two times it hung after the first time it hung. One time it was on TTY 1 instead of in X.org. So when this happens it seems to persist a while. But I now booted with 2.6.36 and the machine is as stable as if nothing has happened at all. OTOH a 2.6.37 with recursive mtime patches from Jan Kara - that hung on the T23 as well - runs just fine on my T42 for nine days (minus hibernation times) already. Now it could be something hardware related, the T23 is different from the T42. Also the T23 has a fan that should be replaced or oiled - but it still works right up to now. Also with 2.6.36 or earlier kernels the machine never hung that way so far. Or it could be something BTRFS related. The T23 is using BTRFS on /home for months already and for / since beginning this year or end of last year while the T42 is still with Ext4 - until there is a robust fsck for BTRFS and BTRFS is marked stable in the kernel at last. A bisection is too much effort for me - especially as it could take days till the problems shows up. Some sysrq stuff doesn''t work without serial or netconsole since a call to dmesg does not return anymore when it happens. Having netconsole and serial running for days for just in case it happens is also quite some effort, since I have to remember to reinitialize it after each hibernation cycle to have a permanent setup. Since I have no other idea on how to track this with reasonable effort - I spent tons of time with trying to help debugging various, partly intermittent, kernel problems in the last months and I have enough of that ATM - I will stick to 2.6.36 for the time being and see whether the T23 is really stable with that one. I still wanted to report my - admittedly vague - findings in case it rings a bell for someone. I will monitor 2.6.37.y updates in case they contain BTRFS and or T23 hardware related fixes. Last but not least: I have no seen BTRFS related errors or other oddities in syslog. The following looks quite normal to me, I did a btrfs filesystem balance some time ago and had some hard switch off cold reboots due to the issue mentioned: artin@deepdance:~> zgrep -i btrfs /var/log/syslog* /var/log/syslog.1:Jan 17 20:43:06 localhost kernel: Btrfs loaded /var/log/syslog.1:Jan 23 12:01:12 localhost kernel: Btrfs loaded /var/log/syslog.1:Jan 23 12:01:12 localhost kernel: btrfs: unlinked 3 orphans /var/log/syslog.1:Jan 23 12:09:36 localhost kernel: Btrfs loaded /var/log/syslog.1:Jan 23 12:09:36 localhost kernel: btrfs: unlinked 2 orphans /var/log/syslog.1:Jan 23 12:20:52 localhost kernel: Btrfs loaded /var/log/syslog.1:Jan 23 12:20:52 localhost kernel: btrfs: unlinked 18 orphans /var/log/syslog.2.gz:Jan 14 12:32:16 localhost kernel: Btrfs loaded /var/log/syslog.2.gz:Jan 14 12:32:16 localhost kernel: btrfs: unlinked 116 orphans /var/log/syslog.2.gz:Jan 14 12:40:02 localhost kernel: Btrfs loaded /var/log/syslog.2.gz:Jan 14 14:44:28 localhost kernel: Btrfs loaded /var/log/syslog.2.gz:Jan 16 18:14:06 localhost kernel: Btrfs loaded /var/log/syslog.2.gz:Jan 16 18:14:06 localhost kernel: btrfs: unlinked 2 orphans /var/log/syslog.2.gz:Jan 16 18:26:46 localhost kernel: Btrfs loaded /var/log/syslog.2.gz:Jan 16 18:26:46 localhost kernel: btrfs: unlinked 2 orphans /var/log/syslog.2.gz:Jan 16 18:41:11 localhost kernel: Btrfs loaded /var/log/syslog.2.gz:Jan 16 18:41:11 localhost kernel: btrfs: unlinked 3 orphans /var/log/syslog.2.gz:Jan 16 18:53:47 localhost kernel: Btrfs loaded /var/log/syslog.2.gz:Jan 16 18:53:47 localhost kernel: btrfs: unlinked 2 orphans /var/log/syslog.2.gz:Jan 16 19:00:14 localhost kernel: Btrfs loaded /var/log/syslog.2.gz:Jan 16 19:00:14 localhost kernel: btrfs: unlinked 2 orphans /var/log/syslog.2.gz:Jan 16 19:05:53 localhost kernel: Btrfs loaded /var/log/syslog.2.gz:Jan 16 19:05:54 localhost kernel: btrfs: unlinked 2 orphans /var/log/syslog.7.gz:Dec 9 19:05:38 localhost kernel: Btrfs loaded /var/log/syslog.7.gz:Dec 9 19:05:38 localhost kernel: btrfs: unlinked 2 orphans /var/log/syslog.7.gz:Dec 9 19:05:38 localhost kernel: btrfs: truncated 1 orphans /var/log/syslog.7.gz:Dec 9 20:54:10 localhost kernel: btrfs: relocating block group 10900996096 flags 36 /var/log/syslog.7.gz:Dec 9 20:55:02 localhost kernel: btrfs: found 2036 extents /var/log/syslog.7.gz:Dec 9 20:55:02 localhost kernel: btrfs: relocating block group 10766778368 flags 36 /var/log/syslog.7.gz:Dec 9 20:56:15 localhost kernel: btrfs: found 10432 extents /var/log/syslog.7.gz:Dec 9 20:56:16 localhost kernel: btrfs: relocating block group 9693036544 flags 1 /var/log/syslog.7.gz:Dec 9 20:56:26 localhost kernel: btrfs: found 155 extents /var/log/syslog.7.gz:Dec 9 20:56:31 localhost kernel: btrfs: found 154 extents /var/log/syslog.7.gz:Dec 9 20:56:31 localhost kernel: btrfs: relocating block group 8619294720 flags 1 /var/log/syslog.7.gz:Dec 9 20:57:37 localhost kernel: btrfs: found 5027 extents /var/log/syslog.7.gz:Dec 9 20:57:49 localhost kernel: btrfs: found 5027 extents /var/log/syslog.7.gz:Dec 9 20:57:51 localhost kernel: btrfs: relocating block group 7545552896 flags 1 /var/log/syslog.7.gz:Dec 9 20:59:48 localhost kernel: btrfs: found 12851 extents /var/log/syslog.7.gz:Dec 9 21:00:28 localhost kernel: btrfs: found 12846 extents /var/log/syslog.7.gz:Dec 9 21:00:29 localhost kernel: btrfs: relocating block group 6471811072 flags 1 /var/log/syslog.7.gz:Dec 9 21:01:59 localhost kernel: btrfs: found 46780 extents /var/log/syslog.7.gz:Dec 9 21:03:21 localhost kernel: btrfs: found 46780 extents /var/log/syslog.7.gz:Dec 9 21:03:26 localhost kernel: btrfs: relocating block group 5398069248 flags 1 /var/log/syslog.7.gz:Dec 9 21:04:34 localhost kernel: btrfs: found 19425 extents /var/log/syslog.7.gz:Dec 9 21:05:15 localhost kernel: btrfs: found 19425 extents /var/log/syslog.7.gz:Dec 9 21:05:16 localhost kernel: btrfs: relocating block group 4324327424 flags 1 /var/log/syslog.7.gz:Dec 9 21:06:33 localhost kernel: btrfs: found 26059 extents /var/log/syslog.7.gz:Dec 9 21:07:22 localhost kernel: btrfs: found 26059 extents /var/log/syslog.7.gz:Dec 9 21:07:25 localhost kernel: btrfs: relocating block group 3250585600 flags 1 /var/log/syslog.7.gz:Dec 9 21:08:47 localhost kernel: btrfs: found 15614 extents /var/log/syslog.7.gz:Dec 9 21:09:29 localhost kernel: btrfs: found 15614 extents /var/log/syslog.7.gz:Dec 9 21:09:31 localhost kernel: btrfs: relocating block group 2176843776 flags 1 /var/log/syslog.7.gz:Dec 9 21:10:58 localhost kernel: btrfs: found 4275 extents /var/log/syslog.7.gz:Dec 9 21:11:34 localhost kernel: btrfs: found 4275 extents /var/log/syslog.7.gz:Dec 9 21:11:35 localhost kernel: btrfs: relocating block group 1103101952 flags 1 /var/log/syslog.7.gz:Dec 9 21:13:01 localhost kernel: btrfs: found 7296 extents /var/log/syslog.7.gz:Dec 9 21:13:35 localhost kernel: btrfs: found 7296 extents /var/log/syslog.7.gz:Dec 9 21:13:36 localhost kernel: btrfs: relocating block group 29360128 flags 36 /var/log/syslog.7.gz:Dec 9 21:16:53 localhost kernel: btrfs: found 78308 extents /var/log/syslog.7.gz:Dec 9 21:16:54 localhost kernel: btrfs: relocating block group 20971520 flags 34 /var/log/syslog.7.gz:Dec 9 21:16:54 localhost kernel: btrfs: found 1 extents /var/log/syslog.7.gz:Dec 9 21:16:56 localhost kernel: btrfs: relocating block group 12582912 flags 1 /var/log/syslog.7.gz:Dec 9 21:16:57 localhost kernel: btrfs: found 85 extents /var/log/syslog.7.gz:Dec 9 21:16:59 localhost kernel: btrfs: found 85 extents /var/log/syslog.7.gz:Dec 9 21:17:01 localhost kernel: btrfs: relocating block group 4194304 flags 4 Hmmm, one thing there is: Jan 13 13:27:15 localhost kernel: INFO: task dpkg:21162 blocked for more than 120 seconds. Jan 13 13:27:15 localhost kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Jan 13 13:27:15 localhost kernel: dpkg D ef891b0c 0 21162 2334 0x00000000 Jan 13 13:27:15 localhost kernel: d2e4be7c 00000082 ef8918cc ef891b0c ef8918a0 ef891b0c ef891b10 1b8b88c5 Jan 13 13:27:15 localhost kernel: 00018796 ef8918a0 7fffffff d2e4bf4c 7fffffff d2e4bee4 c13c4495 cb7862ac Jan 13 13:27:15 localhost kernel: d2e4be98 c102ab56 cb7862ac c1514fa0 d2e4beb4 c102d740 ee303140 d2e4bec0 Jan 13 13:27:15 localhost kernel: Call Trace: Jan 13 13:27:15 localhost kernel: [<c13c4495>] schedule_timeout+0x175/0x280 Jan 13 13:27:15 localhost kernel: [<c102ab56>] ? __dequeue_entity+0x26/0x50 Jan 13 13:27:15 localhost kernel: [<c102d740>] ? set_next_entity+0xc0/0xe0 Jan 13 13:27:15 localhost kernel: [<c10017b3>] ? __switch_to+0xa3/0x160 Jan 13 13:27:15 localhost kernel: [<c1030132>] ? T.1277+0x32/0xa0 Jan 13 13:27:15 localhost kernel: [<c13c394c>] ? schedule+0x1fc/0x5d0 Jan 13 13:27:15 localhost kernel: [<c13c4006>] wait_for_common+0xa6/0x160 Jan 13 13:27:15 localhost kernel: [<c10316a0>] ? default_wake_function+0x0/0x20 Jan 13 13:27:15 localhost kernel: [<c13c4197>] wait_for_completion+0x17/0x20 Jan 13 13:27:15 localhost kernel: [<c10f8d52>] sync_inodes_sb+0x92/0x1a0 Jan 13 13:27:15 localhost kernel: [<c111f230>] ? dquot_quota_sync+0x0/0x310 Jan 13 13:27:15 localhost kernel: [<c10fd4f7>] __sync_filesystem+0x87/0xa0 Jan 13 13:27:15 localhost kernel: [<c10fd52f>] sync_one_sb+0x1f/0x30 Jan 13 13:27:15 localhost kernel: [<c10d9e6b>] iterate_supers+0x7b/0xf0 Jan 13 13:27:15 localhost kernel: [<c10fd510>] ? sync_one_sb+0x0/0x30 Jan 13 13:27:15 localhost kernel: [<c10fd57c>] sys_sync+0x3c/0x60 Jan 13 13:27:15 localhost kernel: [<c1002f23>] sysenter_do_call+0x12/0x2c Jan 13 13:33:38 localhost init: Trying to re-exec init There is only one occurance of it: martin@deepdance:~> zgrep -i "blocked" /var/log/syslog* /var/log/syslog.2.gz:Jan 13 13:27:15 localhost kernel: INFO: task dpkg:21162 blocked for more than 120 seconds. martin@deepdance:~#1> I put the logs aside in case anyone wants to have a closer look. Thanks, -- Martin ''Helios'' Steigerwald - http://www.Lichtvoll.de GPG: 03B0 0D6C 0040 0710 4AFA B82F 991B EAAC A599 84C7
Helmut Hullen
2011-Jan-23 13:13 UTC
Re: [regression?] hang with 2.6.37 on a BTRFS test machine
Hallo, Martin, Du meintest am 23.01.11:> I have something I do not have the slightest clue on how to track > down:> Occasionally with 2.6.37 plain vanilla my ThinkPad T23 freezes in > such a way, that when I type a command at the prompt, nothing will > happen anymore. Same goes for login into a tty. Even dmesg is not > executed anymore. My KDE session doesn''t react to clicks anymore, > except for window movement. Most of the time the mouse pointer is > still movable, but sometimes it is frozen as well. But the latter > might be a different problem with X.org.Maybe it''s the same problem ... I have (among my machines) a Thinkpad T22 on which I have run btrfs with kernel 2.6.37-rc4. Sometimes it hung, too. But it''s an old machine, with old RAM, with some old disks. It has hung some time ago also without any btrfs partition and with older kernel(s). But I''ll test again. Viele Gruesse! Helmut -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Martin Steigerwald
2011-Jan-23 13:44 UTC
Re: [regression?] hang with 2.6.37 on a BTRFS test machine
CCs restored. Its usual to keep CCs on Linux kernel related mailing lists. Am Sunday 23 January 2011 schrieb Helmut Hullen:> Hallo, Martin, > > Du meintest am 23.01.11: > > I have something I do not have the slightest clue on how to track > > down: > > > > Occasionally with 2.6.37 plain vanilla my ThinkPad T23 freezes in > > such a way, that when I type a command at the prompt, nothing will > > happen anymore. Same goes for login into a tty. Even dmesg is not > > executed anymore. My KDE session doesn''t react to clicks anymore, > > except for window movement. Most of the time the mouse pointer is > > still movable, but sometimes it is frozen as well. But the latter > > might be a different problem with X.org. > > Maybe it''s the same problem ... I have (among my machines) a Thinkpad > T22 on which I have run btrfs with kernel 2.6.37-rc4. Sometimes it > hung, too. But it''s an old machine, with old RAM, with some old disks.Is the hung comparable to what I have seen? I.e. mouse pointer and stuff still works, but when entering a command the shell does hang? I can''t completely rule out a hardware related problem as well, but then with 2.6.36 the machine now just works again as if nothing happened - its mostly running at 733 MHz instead of 1.13 GHz and the fan still expels a steady stream of lukewarm air. I think I have to run it with 2.6.36 at least a week to see whether its stable with that one. Its not important to run it with 2.6.37 so thats not a problem. Its just my Amarok machine. Maybe it would be a good idea when you compare with 2.6.36 as well...> It has hung some time ago also without any btrfs partition and with > older kernel(s). > > But I''ll test again.I will also continue to test with a 2.6.37 on a workstation at work, but there only some scratch partition uses BTRFS. My T23 has / and /home on it. Ciao, -- Martin ''Helios'' Steigerwald - http://www.Lichtvoll.de GPG: 03B0 0D6C 0040 0710 4AFA B82F 991B EAAC A599 84C7
Helmut Hullen
2011-Jan-23 14:13 UTC
Re: [regression?] hang with 2.6.37 on a BTRFS test machine
Hallo, Martin, Du meintest am 23.01.11:>> Maybe it''s the same problem ... I have (among my machines) a >> Thinkpad T22 on which I have run btrfs with kernel 2.6.37-rc4. >> Sometimes it hung, too. But it''s an old machine, with old RAM, with >> some old disks.> Is the hung comparable to what I have seen? I.e. mouse pointer and > stuff still works, but when entering a command the shell does hang?"What''s a mouse pointer"? I run this machine as a pure CLI machine. If I remember correct the last crashes: nothing worked. No keyboard, no LAN connections. Only the "power down" button ...> I will also continue to test with a 2.6.37 on a workstation at work, > but there only some scratch partition uses BTRFS. My T23 has / and > /home on it.On my machines I''ve deleted btrfs for "working" directories, I''m only testing on devices with no valuable data. There''s still a big problem with "ENOSPC". Viele Gruesse! Helmut -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html