thr3ads.net - freebsd stable - shutdown -r / shutdown -h / reboot all hang and don't cleanly dismount [Jun 2013]

Adam Strohl

2013-Jun-19 11:35 UTC

shutdown -r / shutdown -h / reboot all hang and don't cleanly dismount

Hello -STABLE@,

So I've seen this situation seemingly randomly on a number of both 
physical 9.1 boxes as well as VMs for I would say 6-9 months at least. 
  I finally have a physical box here that reproduces it consistently 
that I can reboot easily (ie; not a production/client server).

No matter what I do:

reboot
shutdown -p
shutdown -r

This specific server will stop at "All buffers synced" and not actually
power down or reboot.  KB input seems to be ignored.  This server is a
ZFS NAS (with GMIRROR for boot blocks) but the other boxes which show 
this are using GMIRRORs for root/swap/boot (no ZFS).

Here is what happens on the console: http://i.imgur.com/1H8JMyB.jpg

When I reset the server it appears that disks were not dismounted 
cleanly ... on this ZFS box it comes back quick because ZFS is good like 
that but on the other servers with GMIRROR roots rebuilding the GMIRROR 
and fscking at the same time is murder on the disk/performance until it 
finishes.

Another interesting thing is that this particular server runs slapd 
(OpenLDAP) which, when it comes back up, has a "corrupted" DB (easily 
fixed with db_recover, but still).  This might be because FS commits 
aren't happening at the end.   I can even manually stop slapd (service 
slapd stop) then run sync(8) (I assume this does something for ZFS too) 
and it still comes back as hosed if I reboot shortly after.  If I 
start/stop slapd it's fine.  So I feel like there is an FS/dismount 
thing going on here.

Additional information: I also have some boxes which will reboot (ie; 
they don't freeze like some do at the end) but they don't dismount 
cleanly either and have to rebuild both GMIRROR and fsck.  This might be 
a different issue, too.

Anyone have any thoughts?  Let me know if I can provide more details etc.

-- 
Adam Strohl
http://www.ateamsystems.com/

Jeremy Chadwick

2013-Jun-19 12:21 UTC


On Wed, Jun 19, 2013 at 06:35:57PM +0700, Adam Strohl wrote:
> Hello -STABLE@,
> 
> So I've seen this situation seemingly randomly on a number of both
> physical 9.1 boxes as well as VMs for I would say 6-9 months at
> least.  I finally have a physical box here that reproduces it
> consistently that I can reboot easily (ie; not a production/client
> server).
> 
> No matter what I do:
> 
> reboot
> shutdown -p
> shutdown -r
> 
> This specific server will stop at "All buffers synced" and not
> actually power down or reboot.  KB input seems to be ignored.  This
> server is a ZFS NAS (with GMIRROR for boot blocks) but the other
> boxes which show this are using GMIRRORs for root/swap/boot (no
> ZFS).
> 
> Here is what happens on the console: http://i.imgur.com/1H8JMyB.jpg
> 
> When I reset the server it appears that disks were not dismounted
> cleanly ... on this ZFS box it comes back quick because ZFS is good
> like that but on the other servers with GMIRROR roots rebuilding the
> GMIRROR and fscking at the same time is murder on the
> disk/performance until it finishes.

1. You mention "as well as VMs".  Anything under a "virtual machine" or
under a hypervisor is going to be very, very, **VERY** different than
bare metal.  So I hope the issues you're talking about above are on bare
metal -- I will assume so.

2. We need to know what version of "9.1" you're using, i.e. 9.1-RELEASE.
If you use stable/9 (RELENG_9) we need to see uname -a output (you can
hide the machine name if you want).

3. Can we please have dmesg from this machine?  The controller and some
other hardware details matter.

4. Does "sysctl hw.usb.no_shutdown_wait=1" help you?

5. Does "sysctl hw.acpi.handle_reboot=1" help you?

6. Does "sysctl hw.acpi.disable_on_reboot=1" help you?

7. If none of the above helps, can you please boot verbose mode and then
when the system "locks up" on "shutdown -r now" take a picture of the
VGA console?

8. Does the machine run moused(8) (check the process list please, do not
rely on rc.conf) ?
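
The details requested in items 2, 3 and 8 above can be gathered in one pass. The following is only a sketch: the function names and the report layout are made up for illustration and do not come from this thread. It checks the live process list for moused rather than rc.conf, as item 8 asks.

```sh
#!/bin/sh
# Sketch: gather uname, dmesg and the moused check into one report.
# has_moused and collect_report are illustrative names, not from the thread.

# Reads a process listing on stdin, one command name per line
# (for example the output of `ps -axo comm`).
# Succeeds if a process named moused appears in it.
has_moused() {
    grep -Eq '(^|/)moused$'
}

# Prints the report to stdout. Run it on the affected host.
collect_report() {
    echo "== uname -a =="
    uname -a
    echo "== dmesg =="
    dmesg
    echo "== moused =="
    if ps -axo comm | has_moused; then
        echo "moused is running"
    else
        echo "moused is not running"
    fi
}
```

Running `collect_report > report.txt` and editing out the hostname before posting would cover the request. On a machine that has been up for a long time the boot messages may have scrolled out of the dmesg buffer; FreeBSD keeps a boot-time copy in /var/run/dmesg.boot.
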
> Another interesting thing is that this particular server runs slapd
> (OpenLDAP) which, when it comes back up, has a "corrupted" DB
> (easily fixed with db_recover, but still).  This might be because FS
> commits aren't happening at the end.   I can even manually stop
> slapd (service slapd stop) then run sync(8) (I assume this does
> something for ZFS too) and it still comes back as hosed if I reboot
> shortly after.  If I start/stop slapd it's fine.  So I feel like
> there is an FS/dismount thing going on here.

sync(8) does not do what you think it does.  Please read (not skim) this
entire thread starting here:

http://lists.freebsd.org/pipermail/freebsd-fs/2013-April/thread.html#16982
http://lists.freebsd.org/pipermail/freebsd-fs/2013-April/016982.html

Your problem is related to unclean shutdown; fix that and your issues go
away.
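
One way to confirm after each test reboot whether the previous shutdown was clean is to look for the kernel's UFS warning in the boot messages. This is a minimal sketch, assuming the usual "WARNING: <mountpoint> was not properly dismounted" wording; the function name is illustrative, and ZFS pools do not print this message, so it only helps on the UFS/GMIRROR boxes.

```sh
#!/bin/sh
# Sketch: list filesystems the kernel flagged as uncleanly dismounted.
# unclean_mounts is an illustrative name, not from the thread.
# It reads boot messages on stdin and prints the mount point taken from
# each UFS "was not properly dismounted" warning, one per line.
unclean_mounts() {
    sed -n 's/^WARNING: \(.*\) was not properly dismounted$/\1/p'
}
```

For example, `unclean_mounts < /var/run/dmesg.boot` (or `dmesg | unclean_mounts`) prints nothing when every UFS filesystem was dismounted cleanly.
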
> Additional information: I also have some boxes which will reboot
> (ie; they don't freeze like some do at the end) but they don't
> dismount cleanly either and have to rebuild both GMIRROR and fsck.
> This might be a different issue, too.

Every issue needs to be handled/treated separately.

-- 
| Jeremy Chadwick                                   jdc at koitsu.org |
| UNIX Systems Administrator                http://jdc.koitsu.org/ |
| Making life hard for others since 1977.             PGP 4BD6C0CB |

Steven Hartland

2013-Jun-19 12:23 UTC


OS version?
----- Original Message ----- 
From: "Adam Strohl" <adams-freebsd at ateamsystems.com>
To: <freebsd-stable at freebsd.org>
Sent: Wednesday, June 19, 2013 12:35 PM
Subject: shutdown -r / shutdown -h / reboot all hang and don't cleanly
dismount

> Hello -STABLE@,
> 
> So I've seen this situation seemingly randomly on a number of both 
> physical 9.1 boxes as well as VMs for I would say 6-9 months at least. 
>  I finally have a physical box here that reproduces it consistently 
> that I can reboot easily (ie; not a production/client server).
> 
> No matter what I do:
> 
> reboot
> shutdown -p
> shutdown -r
> 
> This specific server will stop at "All buffers synced" and not actually
> power down or reboot.  KB input seems to be ignored.  This server is a
> ZFS NAS (with GMIRROR for boot blocks) but the other boxes which show 
> this are using GMIRRORs for root/swap/boot (no ZFS).
> 
> Here is what happens on the console: http://i.imgur.com/1H8JMyB.jpg
> 
> When I reset the server it appears that disks were not dismounted 
> cleanly ... on this ZFS box it comes back quick because ZFS is good like 
> that but on the other servers with GMIRROR roots rebuilding the GMIRROR 
> and fscking at the same time is murder on the disk/performance until it 
> finishes.
> 
> Another interesting thing is that this particular server runs slapd 
> (OpenLDAP) which, when it comes back up, has a "corrupted" DB (easily
> fixed with db_recover, but still).  This might be because FS commits
> aren't happening at the end.   I can even manually stop slapd (service 
> slapd stop) then run sync(8) (I assume this does something for ZFS too) 
> and it still comes back as hosed if I reboot shortly after.  If I 
> start/stop slapd it's fine.  So I feel like there is an FS/dismount 
> thing going on here.
> 
> Additional information: I also have some boxes which will reboot (ie; 
> they don't freeze like some do at the end) but they don't dismount 
> cleanly either and have to rebuild both GMIRROR and fsck.  This might be 
> a different issue, too.
> 
> Anyone have any thoughts?  Let me know if I can provide more details etc.
> 
> -- 
> Adam Strohl
> http://www.ateamsystems.com/

Richard Tector

2013-Jun-20 18:22 UTC


On 19/06/2013 12:35, Adam Strohl wrote:
> Hello -STABLE@,
>
> So I've seen this situation seemingly randomly on a number of both
> physical 9.1 boxes as well as VMs for I would say 6-9 months at least.
>   I finally have a physical box here that reproduces it consistently
> that I can reboot easily (ie; not a production/client server).
>
> No matter what I do:
>
> reboot
> shutdown -p
> shutdown -r
>
> This specific server will stop at "All buffers synced" and not actually
> power down or reboot.  KB input seems to be ignored.  This server is a
> ZFS NAS (with GMIRROR for boot blocks) but the other boxes which show
> this are using GMIRRORs for root/swap/boot (no ZFS).
>
> Here is what happens on the console: http://i.imgur.com/1H8JMyB.jpg
>
Hi,

Just to add a 'me too'. I see this on two different boxes, both 
currently running recentish 9.1-STABLE, and it has definitely been an 
issue for me since at least 9.0-RELEASE.

One of the boxes is a Dell R210 II with a single WD HDD - dmesg: 
http://daniel.thekeelecentre.com/dmesg.txt
I've tried booting/rebooting without the USB KVM dongle attached too.
Notes - does not run moused and no OpenLDAP.

The second host I have the issue with is a home-build using a Tyan 
Toledo i3210W (S5211) and two Seagate HDDs - dmesg: 
http://daniel.thekeelecentre.com/dmesg-daffy.txt (yes, a disk has 
failed, but the reboot issue pre-dated this).
Note - does not run moused, but did run slapd. I saw the same DB 
corruption as the OP.

I can play with the latter box as it is no longer in use and will try 
the following suggestions from Jeremy later this evening:
     4. Does "sysctl hw.usb.no_shutdown_wait=1" help you?
     5. Does "sysctl hw.acpi.handle_reboot=1" help you?
     6. Does "sysctl hw.acpi.disable_on_reboot=1" help you?
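
To tell which of the three settings (if any) makes a difference, it helps to change only one per reboot attempt. A rough sketch of that follows; apply_candidate and DRY_RUN are names invented here, not part of the thread or of FreeBSD. Values set with sysctl at runtime are lost on the next boot, so each attempt starts from the defaults again.

```sh
#!/bin/sh
# Sketch: apply exactly one of the suggested tunables per reboot attempt.
# apply_candidate and DRY_RUN are illustrative names, not from the thread.

CANDIDATES="hw.usb.no_shutdown_wait hw.acpi.handle_reboot hw.acpi.disable_on_reboot"

# apply_candidate N: set the Nth tunable (1-based) to 1.
# With DRY_RUN=1 the command is printed instead of executed.
apply_candidate() {
    n=$1
    i=0
    for name in $CANDIDATES; do
        i=$((i + 1))
        if [ "$i" -eq "$n" ]; then
            if [ "${DRY_RUN:-0}" = "1" ]; then
                echo "sysctl ${name}=1"
            else
                sysctl "${name}=1"
            fi
            return 0
        fi
    done
    echo "no candidate number $n" >&2
    return 1
}
```

For instance, `DRY_RUN=1 apply_candidate 2` prints `sysctl hw.acpi.handle_reboot=1`. Without DRY_RUN, run it as root, then issue "shutdown -r now" and note whether the box still stops at "All buffers synced".
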

Regards,

Richard
