thr3ads.net - Btrfs devel - 3.11.5 kernel infinite loop [Nov 2013]


Russell Coker

2013-Nov-06 01:52 UTC

3.11.5 kernel infinite loop

I have a system running the Debian package of 3.11.5 with an Amd Opteron 1212 
processor (2*64bit cores), 8G of RAM, and an Intel 120G SSD for the root and 
home subvols.  It has a RAID-1 array of 2*3TB disks for bulk storage (movies 
etc) but that probably isn't relevant to this problem.

On the root filesystem I have cron jobs making daily snapshots of / and /home 
and additional snapshots of /home every 15 minutes.  At midnight a cron job 
removes older snapshots.  For the last 8 days the system has been reliably 
hanging at about 5 minutes after midnight and the subvol removal cron job is 
the only thing that has happened then.

So it seems clear to me that on my system 3.11.5 has a crash a few minutes 
after removing ~98 subvols at the same time.
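The "~98" figure is consistent with the schedule described above. A /home snapshot every 15 minutes gives 24 × 4 = 96 per day, and the daily snapshots of / and /home add 2 more, for 98. So if the midnight job keeps a fixed number of days, each run removes roughly one day's worth at once. Below is a minimal Python sketch of that arithmetic and of an age-based pruning pass. It assumes a hypothetical keep-for-one-day policy and uses illustrative snapshot names, because the actual cron scripts and retention rules were not posted. The helper only builds `btrfs subvolume delete` command lines and does not run them.

```python
from datetime import datetime, timedelta

# Hypothetical schedule modelled on the post: /home every 15 minutes,
# plus one daily snapshot each of / and /home.
INTERVAL = timedelta(minutes=15)
DAILY_SUBVOLS = 2


def snapshots_per_day(interval: timedelta = INTERVAL,
                      daily: int = DAILY_SUBVOLS) -> int:
    """Snapshots created in 24 hours: 96 frequent + 2 daily = 98."""
    frequent = timedelta(days=1) // interval  # timedelta // timedelta -> int
    return frequent + daily


def expired(snapshots: dict[str, datetime], now: datetime,
            keep: timedelta) -> list[str]:
    """Names of snapshots taken before now - keep, oldest first.

    Assumed (hypothetical) policy: anything older than `keep` is removed.
    """
    cutoff = now - keep
    old = [name for name, taken in snapshots.items() if taken < cutoff]
    return sorted(old, key=lambda name: snapshots[name])


def delete_commands(paths: list[str]) -> list[list[str]]:
    """One `btrfs subvolume delete` argv per snapshot; nothing is executed."""
    return [["btrfs", "subvolume", "delete", p] for p in paths]


if __name__ == "__main__":
    # One day of hypothetical snapshots from 2013-11-04, pruned at midnight
    # two days later with a one-day retention window.
    start = datetime(2013, 11, 4)
    snaps = {f"home-{start + i * INTERVAL:%Y%m%d-%H%M}": start + i * INTERVAL
             for i in range(96)}
    snaps["root-daily-20131104"] = start
    snaps["home-daily-20131104"] = start
    old = expired(snaps, now=datetime(2013, 11, 6), keep=timedelta(days=1))
    print(snapshots_per_day(), len(old))  # 98 98
```

Running it prints `98 98`: the number of snapshots the schedule creates per day, and the number a single prune removes under the assumed one-day policy.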

Last night I watched it happen and deleted a few dozen extra subvols to test
whether it would repeat.  That wasn't such a good idea and I rebooted the
system many times before giving up and booting 3.10.11 which is now working
correctly.

When running 3.11.5 I was seeing kernel log messages such as the following
shortly after boot.  Then after that it got into a state where an ssh session
didn't work and the X login prompt didn't even flash its cursor.  In that
state it could still forward packets (the system in question is an ethernet
bridge which I use to connect my workstation to the Internet) but couldn't do
much else.  The NFS server processes locked and sshd wouldn't complete the
login process for new connection attempts.

[   68.056003] BUG: soft lockup - CPU#0 stuck for 22s! [btrfs-cleaner:270]     
[   68.144004] BUG: soft lockup - CPU#1 stuck for 22s! [btrfs-transacti:271]

Prior to the lockup those two kernel processes had used most CPU time.  I'm
not sure whether prior to the lockup they were in some sort of CPU loop or
whether they were just reading a lot of data from a fast SSD and acting
correctly.

As an aside I ordered a replacement server last week when I wasn't sure if
this was a hardware or a software problem.  This will allow me to test some
things in more detail on the old server after the new one is running, however
I don't own a spare SSD so if it's an SSD-specific issue then I have limited
ability to test.

-- 
My Main Blog         http://etbe.coker.com.au/
My Documents Blog    http://doc.coker.com.au/
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Duncan

2013-Nov-06 12:37 UTC


Re: 3.11.5 kernel infinite loop

Russell Coker posted on Wed, 06 Nov 2013 12:52:38 +1100 as excerpted:
> I have a system running the Debian package of 3.11.5 with an Amd Opteron
> 1212 processor (2*64bit cores), 8G of RAM, and an Intel 120G SSD for the
> root and home subvols.  It has a RAID-1 array of 2*3TB disks for bulk
> storage (movies etc) but that probably isn't relevant to this problem.
> 
> On the root filesystem I have cron jobs making daily snapshots of / and
> /home and additional snapshots of /home every 15 minutes.  At midnight a
> cron job removes older snapshots.  For the last 8 days the system has
> been reliably hanging at about 5 minutes after midnight and the subvol
> removal cron job is the only thing that has happened then.

I believe there's a btrfs-critical stable-series patch in 3.11.6 that
you're probably missing with 3.11.5.  (There were unfortunately some
crossed signals and the patch was skipped for a couple of weeks after it
should have gone in, but it's in now.)

Yes... Just checked the 3.11.6 changelog:

Josef Bacik (1):
      Btrfs: use right root when checking for hash collision


Note that there's another critical patch in-flight, patching a bug
triggered by btrfs balance on filesystems with pre-allocated files (like
systemd does with its journal and various torrent clients do with their
downloads).  But this one is currently being held up because stable rules
require it to be in current mainline first, and 3.12 is out, but the
two-week 3.13 commit window that would normally be open now is suspended
for a week, as Linus is traveling without a reliable net connection.  So
the patch can't hit mainline, and thus won't hit stable unless an
exception is made, until after Linus' vacation, when the commit window
opens and the patch is accepted.

See previous discussion here on this list for it, or simply don't do any
balances if you're running systemd or with any other pre-allocated-file
apps such as torrent clients running, until after you get that patch.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman


Chris Samuel

2013-Nov-09 01:22 UTC


Re: 3.11.5 kernel infinite loop

On Wed, 6 Nov 2013 12:37:32 PM Duncan wrote:
> Note that there's another critical patch in-flight, patching a bug
> triggered by btrfs balance on filesystems with pre-allocated files (like
> systemd does with its journal and various torrent clients do with their
> downloads).  But this one is currently being held up because stable rules
> require it to be in current mainline first, and 3.12 is out, but the
> two-week 3.13 commit window that would normally be open now is suspended
> for a week, as Linus is traveling without a reliable net connection.  So
> the patch can't hit mainline, and thus won't hit stable unless an
> exception is made, until after Linus' vacation, when the commit window
> opens and the patch is accepted.

Greg K-H has said he'll accept stable patches that haven't hit the
mainline during this period.

cheers!
Chris
-- 
 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC

This email may come with a PGP signature as a file. Do not panic.
For more info see: http://en.wikipedia.org/wiki/OpenPGP
