Willem Jan Withagen
2012-Nov-21 09:55 UTC
Some new hardware with 9.1 does not reboot easily
Hi,

I'm building some new hardware for a customer, and given that 9.1 is
just around the corner, I installed 9.1-STABLE (svn from last night).

The trouble is that a reboot takes forever.
The same goes for shutdown -r now.

What happens is:
    services get killed
    we end with all buffers synced.

Then the system sits idle for 30 seconds or more.
After a while I press CTRL-ALT-DEL to see what happens, which gets me:

    shared obj libpcre.so.1 not found, required by postfix
    Writing entropy file
    Terminated.
    Init some process would not die; ps axl advised

And then I get the standard shutdown kernel messages again, but this
time with timeouts:

    Waiting (max 60 sec) for system process 'vnlru' to stop ... timed out
    Waiting (max 60 sec) for system process 'bufdaemon' to stop ... timed out
    Waiting (max 60 sec) for system process 'syncer' to stop ... timed out

And then it reboots.

Now why is that?
Rebooting it from the 9.0-RELEASE memory stick seems to work just fine.

Thanks,
--WjW

The hardware:
    Supermicro motherboard: X9SRi-3F (Sandy Bridge)
    AMI BIOS 1.0a
    E5-1260 Xeon
    64 GB ECC memory
    LSI disk controller:
    mps0 at pci0:8:0:0: class=0x010700 card=0x30201000 chip=0x00721000 rev=0x03 hdr=0x00
        vendor     = 'LSI Logic / Symbios Logic'
        device     = 'SAS2008 PCI-Express Fusion-MPT SAS-2 [Falcon]'
        class      = mass storage
        subclass   = SAS

    4* Seagate ATA (2 on LSI, 2 on motherboard)
    4* Seagate SAS (all on LSI)
    2* Intel SSD (connected to ATA on motherboard)

The system is completely ZFS, with a 4-way mirror on the ATA Seagates
as zfsboot.
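The timeout lines in the message above follow a fixed pattern, so if the console output can be captured as text (for example over a serial console), the stuck kernel processes can be listed mechanically. The following is a minimal Python sketch, assuming the message wording is as retyped in this thread; the function name `timed_out_processes` and the sample text are illustrative and not part of any FreeBSD tool.

```python
import re

# Matches lines such as:
#   Waiting (max 60 sec) for system process 'vnlru' to stop ... timed out
# The exact wording is an assumption based on the messages retyped in this thread.
WAIT_RE = re.compile(
    r"Waiting \(max (?P<limit>\d+) sec\w*\) for system process "
    r"[`'](?P<name>[^']+)' to stop\s*\.\.\.\s*(?P<result>.+)"
)


def timed_out_processes(console_text):
    """Return the names of system processes whose shutdown wait timed out."""
    stuck = []
    for line in console_text.splitlines():
        match = WAIT_RE.search(line)
        if match and match.group("result").strip() == "timed out":
            stuck.append(match.group("name"))
    return stuck


# Illustrative input built from two of the lines reported above.
sample = """\
Writing entropy file
Waiting (max 60 sec) for system process 'vnlru' to stop ... timed out
Waiting (max 60 sec) for system process 'syncer' to stop ... timed out
"""
print(timed_out_processes(sample))
```

This only summarizes which waits expired; it says nothing about why they did, which still needs a stack trace from the debugger.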
on 21/11/2012 11:55 Willem Jan Withagen said the following:
> Hi,
>
> I'm building some new hardware for a customer, and given that 9.1 is
> about to be around the corner, I installed 9.1-stable.
>
> svn from last night....
>
> Trouble is that a reboot takes forever...
> Same with shutdown -r now...
>
> What happens is:
> services get killed
> we end with all buffers synced.
>
> Then the system is idle for like 30 secs (or more)

At this stage try to enter ddb and run ps in it.
Maybe you could spot something interesting / obvious.

> And then after a while I go CTRL-ALT-DEL to see what happens.....
> Which gets me:
>
> shared obj libpcre.so.1 not found, required by postfix
> Writing entropy file
> Terminated.
> Init some process would not die; ps axl advised
>
> And then I get the std shutdown kernel messages again, but then with
> time-outs.
> Waiting (max 60 sec) for system process 'vnlru' to stop ... timed out
> Waiting (max 60 sec) for system process 'bufdaemon' to stop ... timed out
> Waiting (max 60 sec) for system process 'syncer' to stop ... timed out
>
> And then it reboots....
>
> Now why is that?
> Rebooting it on the 9.0-RELEASE memory stick seems to work just fine.

--
Andriy Gapon
Willem Jan Withagen
2012-Nov-21 18:08 UTC
Some new hardware with 9.1 does not reboot easily
On 2012-11-21 19:05, Andriy Gapon wrote:
> on 21/11/2012 19:48 Willem Jan Withagen said the following:
>> On 2012-11-21 18:27, Willem Jan Withagen wrote:
>>> On 2012-11-21 18:11, Andriy Gapon wrote:
>>>> on 21/11/2012 19:09 Andriy Gapon said the following:
>>>>> on 21/11/2012 19:06 Willem Jan Withagen said the following:
>>>>>> Nothing that stands out for me, but then I'm not into FreeBSD kernels.
>>>>>> But there are certainly no more userspace processes running other than
>>>>>> reboot.....
>>>>>>
>>>>>> Certainly no postfix that could complain about a missing libpcre.so.1.
>>>>>> That seems to be something that should have been flushed from the
>>>>>> print_buffer before.
>>>>>>
>>>>>> What I do see is a huge number of ZFS threads....
>>>>>>
>>>>>> Rebooting from DDB is instantaneous...
>>>>>>
>>>>>> So I'm not certain what to look for further?
>>>>>
>>>>> Perhaps share the output if you are able to capture it...
>>>>
>>>> State of the init process should be more interesting.
>>>> You can switch to it (using thread <id>) and capture its stack trace ('bt').
>>>
>>> The box is not on a serial connection.
>>> So capturing will be a picture with an iPhone and retyping it.
>>>
>>> init process should be 1, right?
>>> I'll give it a shot
>>
>> Just private since it includes an image of the bt...
>>
>> Init is there, its state is 'RLs', but it does not have threads and
>> thread 1 does not work.
>> But 'bt 1' does the trick.
>>
>> It seems to be waiting/working in the ZFS code to get things unmounted.
>
> Yeah, oops, this is a known ZFS deadlock in zfs_freebsd_reclaim -> zfs_zget path.
> I may commit my fix for it to head on the next weekend.
> You may share this information with the list.

Any chance of getting this back into 9.1?
Preferably before 9.1-RELEASE, but otherwise real soon after that.

I'm the perfect test guinea pig: it happens every time I reboot.

--WjW

>
>> Disk situation:
>> 4* SATA Seagate 1T (2 on Sandy Bridge, 2 on LSI 2008)
>> 4* SAS Seagate 600GB/15K, all on LSI 2008
>> 2* Intel SSD 540 200GB, both on SATA-3 on Sandy Bridge
>>
>> ZFS config
>> zfsboot = 50GB 4-way mirror on 4* SATA
>>     2*2GB cache on both SSDs
>> sataraid = remainder of SATA disks in raidz
>>     2*1GB log on SSDs
>>     2*50GB cache on SSDs
>> sasraid = full-disk raidz of SAS disks
>>     2*1GB log on SSDs
>>     2*100GB cache on SSDs
>>
>> www# zpool status -v
>>   pool: sasraid
>>  state: ONLINE
>>   scan: none requested
>> config:
>>
>>         NAME                  STATE     READ WRITE CKSUM
>>         sasraid               ONLINE       0     0     0
>>           raidz1-0            ONLINE       0     0     0
>>             gpt/sasraid0      ONLINE       0     0     0
>>             gpt/sasraid1      ONLINE       0     0     0
>>             gpt/sasraid2      ONLINE       0     0     0
>>             gpt/sasraid3      ONLINE       0     0     0
>>         logs
>>           gpt/log-sasraid0    ONLINE       0     0     0
>>           gpt/log-sasraid1    ONLINE       0     0     0
>>         cache
>>           ada0p5              ONLINE       0     0     0
>>           ada1p5              ONLINE       0     0     0
>>
>> errors: No known data errors
>>
>>   pool: sataraid
>>  state: ONLINE
>>   scan: none requested
>> config:
>>
>>         NAME                  STATE     READ WRITE CKSUM
>>         sataraid              ONLINE       0     0     0
>>           raidz1-0            ONLINE       0     0     0
>>             gpt/sataraid0     ONLINE       0     0     0
>>             gpt/sataraid1     ONLINE       0     0     0
>>             gpt/sataraid2     ONLINE       0     0     0
>>             gpt/sataraid3     ONLINE       0     0     0
>>         logs
>>           gpt/log-sataraid0   ONLINE       0     0     0
>>           gpt/log-sataraid1   ONLINE       0     0     0
>>         cache
>>           ada0p3              ONLINE       0     0     0
>>           ada1p3              ONLINE       0     0     0
>>
>> errors: No known data errors
>>
>>   pool: zfsboot
>>  state: ONLINE
>>   scan: resilvered 513M in 0h0m with 0 errors on Tue Nov 20 13:41:00 2012
>> config:
>>
>>         NAME                  STATE     READ WRITE CKSUM
>>         zfsboot               ONLINE       0     0     0
>>           mirror-0            ONLINE       0     0     0
>>             ada2p3            ONLINE       0     0     0
>>             ada3p3            ONLINE       0     0     0
>>             da3p3             ONLINE       0     0     0
>>             da2p3             ONLINE       0     0     0
>>         cache
>>           ada0p1              ONLINE       0     0     0
>>           ada1p1              ONLINE       0     0     0
>>
>> errors: No known data errors
>>
>> --WjW
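All three pools in the quoted `zpool status -v` output report ONLINE, which can be checked at a glance rather than by eye. Below is a minimal Python sketch that scans such output, including mail-quoted copies with leading `>` markers, and lists any config row whose state is not ONLINE. It assumes the five-column layout shown above; the function name `non_online_devices` is illustrative and not part of ZFS.

```python
def non_online_devices(zpool_status_text):
    """Return (pool, device, state) for each config row whose state is not ONLINE."""
    problems = []
    pool = None
    in_config = False
    for raw in zpool_status_text.splitlines():
        # Drop mail quote markers and indentation; columns are whitespace-separated.
        fields = raw.lstrip("> \t").split()
        if not fields:
            continue
        if fields[0] == "pool:" and len(fields) > 1:
            pool, in_config = fields[1], False
        elif fields[0] == "config:":
            in_config = True
        elif fields[0] == "errors:":
            in_config = False
        elif in_config and len(fields) >= 5 and fields[0] != "NAME":
            # Rows look like: NAME STATE READ WRITE CKSUM
            device, state = fields[0], fields[1]
            if state != "ONLINE":
                problems.append((pool, device, state))
    return problems


# Abbreviated excerpt of the zfsboot pool as quoted in this thread.
quoted = """\
>> pool: zfsboot
>> state: ONLINE
>> config:
>>
>> NAME STATE READ WRITE CKSUM
>> zfsboot ONLINE 0 0 0
>> mirror-0 ONLINE 0 0 0
>> ada2p3 ONLINE 0 0 0
>> cache
>> ada0p1 ONLINE 0 0 0
>>
>> errors: No known data errors
"""
print(non_online_devices(quoted))
```

Section labels such as `logs` and `cache` have no state column and are skipped. An empty result only means that no device is reported as degraded; it does not rule out the shutdown deadlock discussed in this thread.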
Thus spake Andriy Gapon <avg at FreeBSD.org>:

> on 29/11/2012 17:16 Willem Jan Withagen said the following:
>> Would that mean that the regular checkout of stable/9 contains enough
>> code to allow "painless" rebooting...
>
> Not yet...

Has this been resolved? I still see a hang on reboot/shutdown on my box
(zfs root on USB thumb drive), but I am not sure if the problem is
related.

Julian
Willem Jan Withagen
2013-Jan-11 19:38 UTC
Some new hardware with 9.1 does not reboot easily
On 2013-01-07 18:06, Julian Stecklina wrote:
> Thus spake Andriy Gapon <avg at FreeBSD.org>:
>
>> on 29/11/2012 17:16 Willem Jan Withagen said the following:
>>> Would that mean that the regular checkout of stable/9 contains enough
>>> code to allow "painless" rebooting...
>>
>> Not yet...
>
> Has this been resolved? I still see a hang on reboot/shutdown on my box
> (zfs root on USB thumb drive), but I am not sure if the problem is
> related.

Could very well be.

I again have the same problem as I reported before, with the full and
new 9.1 code.
But I have not yet had time to build a system to test with.

My other 9.1 box is my ZFS-only fileserver, and I do not want to fiddle
too much with it.

A reboot workaround that works for me:
    reboot -n
    shutdown -n now

The manual pages say that this option should not be used.
But I have not yet found any bad effects, perhaps because I only have
ZFS file systems.

--WjW
on 11/01/2013 21:38 Willem Jan Withagen said the following:
> On 2013-01-07 18:06, Julian Stecklina wrote:
>> Has this been resolved? I still see a hang on reboot/shutdown on my box
>> (zfs root on USB thumb drive), but I am not sure if the problem is
>> related.
>
> Could very well be.
>
> I have again the same problem as I reported before with the full and new
> 9.1 code.
> But did not have time yet to build a system to test with.
>
> My other 9.1 box is my ZFS only fileserver. And I do not want to fiddle
> too much with it.

Have you guys tried to obtain a stack trace of the stuck reboot thread
(in ddb)?
Not all hangs in reboot have the same cause...

--
Andriy Gapon