Steven Hartland
2011-Oct-20 17:34 UTC
zfs parition probing causing long delay at BTX loader
Installing a new machine here which has 10+ disks we're seeing BTX loader take 50+ seconds to enumerate the disks. After doing some digging I found the following thread on the forums which hinted that r198420 maybe the cause. http://forums.freebsd.org/showthread.php?t=12705 A quick change to zfs.c reverting the change to support 128 partitions back to 4 and BTX completes instantly like it used to. svn commit which introduced this delay is:- http://svnweb.freebsd.org/base?view=revision&revision=198420 the specific file in that changeset:- http://svnweb.freebsd.org/base/head/sys/boot/zfs/zfs.c?r1=198420&r2=198419&pathrev=198420 So the questions are:- 1. Can this be optimised so it doesn't have to test all of the possible 128 GPT partitions? 2. If a optimisation isn't possible or is too complex to achieve would it be better to have the partitions defined as an option which can be increased if needed as I suspect 99.99% if not 100% of users won't be making use of more than 4 partitions even with GPT, such as what the attached patch against 8.2-RELEASE achieves. Regards Steve ===============================================This e.mail is private and confidential between Multiplay (UK) Ltd. and the person or entity to whom it is addressed. In the event of misdirection, the recipient is prohibited from using, copying, printing or otherwise disseminating it or any information contained in it. In the event of misdirection, illegible or incomplete transmission please telephone +44 845 868 1337 or return the E.mail to postmaster@multiplay.co.uk. -------------- next part -------------- --- sys/boot/zfs/zfs.c.orig 2011-10-20 18:15:29.966685430 +0000 +++ sys/boot/zfs/zfs.c 2011-10-20 18:18:22.291033636 +0000 @@ -45,6 +45,12 @@ #include "zfsimpl.c" +/* + * For GPT this should be 128 but leads to 50+ second delay in BTX loader so + * we use the original 4 pre r198420 by default for the boot process + */ +#define ZFS_MAX_SLICES 4 + static int zfs_open(const char *path, struct open_file *f); static int zfs_write(struct open_file *f, void *buf, size_t size, size_t *resid); static int zfs_close(struct open_file *f); @@ -415,7 +421,7 @@ if (vdev_probe(vdev_read, (void*) (uintptr_t) fd, 0)) close(fd); - for (slice = 1; slice <= 128; slice++) { + for (slice = 1; slice <= ZFS_MAX_SLICES; slice++) { sprintf(devname, "disk%dp%d:", unit, slice); fd = open(devname, O_RDONLY); if (fd == -1) {
On 10/20/2011 07:23 PM, Steven Hartland wrote:> Installing a new machine here which has 10+ disks > we're seeing BTX loader take 50+ seconds to enumerate > the disks.I am running 8-STABLE. On my system with 22 disks, it took much longer than a minute (maybe 5 minutes... not sure, but overall boot was about 7 minutes). While this time is passing, I can watch the leds on the disks blink in order, many times in a loop. My IO card is a LSI SATA/SAS 9211-8i 6Gb/s. After I upgraded the firmware to version 11, it seems to take much less time, but I didn't time it. And watching the LEDs last time I rebooted, I don't notice them all blinking the same way. Instead, all were solid for a second or two after the long wait, and then only the root disks. So if you have the same card, I suggest you update the firmware. (I updated for stability rather than boot speed, and it seemed stable until it froze today, after 2 weeks)> After doing some digging I found the following thread > on the forums which hinted that r198420 maybe the > cause. > http://forums.freebsd.org/showthread.php?t=12705 > > A quick change to zfs.c reverting the change to > support 128 partitions back to 4 and BTX completes > instantly like it used to. > > svn commit which introduced this delay is:- > http://svnweb.freebsd.org/base?view=revision&revision=198420 > > the specific file in that changeset:- > http://svnweb.freebsd.org/base/head/sys/boot/zfs/zfs.c?r1=198420&r2=198419&pathrev=198420 > > > So the questions are:- > > 1. Can this be optimised so it doesn't have to test all > of the possible 128 GPT partitions? > > 2. If a optimisation isn't possible or is too complex to > achieve would it be better to have the partitions defined > as an option which can be increased if needed as I suspect > 99.99% if not 100% of users won't be making use of more > than 4 partitions even with GPT, such as what the attached > patch against 8.2-RELEASE achieves. > > Regards > Steve > > ===============================================> This e.mail is private and confidential between Multiplay (UK) Ltd. > and the person or entity to whom it is addressed. In the event of > misdirection, the recipient is prohibited from using, copying, > printing or otherwise disseminating it or any information contained in > it. > In the event of misdirection, illegible or incomplete transmission > please telephone +44 845 868 1337 > or return the E.mail to postmaster@multiplay.co.uk. > > > _______________________________________________ > freebsd-stable@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-stable > To unsubscribe, send any mail to "freebsd-stable-unsubscribe@freebsd.org"-- -------------------------------------------- Peter Maloney Brockmann Consult Max-Planck-Str. 2 21502 Geesthacht Germany Tel: +49 4152 889 300 Fax: +49 4152 889 333 E-mail: peter.maloney@brockmann-consult.de Internet: http://www.brockmann-consult.de --------------------------------------------
First post failed, because the attachment was too big. (Or maybe you got a copy anyway since you are also in the To) Here it is again: On 10/21/2011 01:32 PM, Peter Maloney wrote:> On 10/21/2011 01:04 PM, Steven Hartland wrote: >> ----- Original Message ----- From: "Peter Maloney" >> <peter.maloney@brockmann-consult.de> >> To: <freebsd-stable@freebsd.org> >> Sent: Friday, October 21, 2011 11:17 AM >> Subject: Re: zfs parition probing causing long delay at BTX loader >> >> >>> On 10/20/2011 07:23 PM, Steven Hartland wrote: >>>> Installing a new machine here which has 10+ disks >>>> we're seeing BTX loader take 50+ seconds to enumerate >>>> the disks. >>> I am running 8-STABLE. On my system with 22 disks, it took much longer >>> than a minute (maybe 5 minutes... not sure, but overall boot was >>> about 7 >>> minutes). While this time is passing, I can watch the leds on the disks >>> blink in order, many times in a loop. >>> >>> My IO card is a LSI SATA/SAS 9211-8i 6Gb/s. >> >> We are indeed using that 3 x 9211-8i's per chassis. >>> After I upgraded the firmware to version 11, it seems to take much less >>> time, but I didn't time it. And watching the LEDs last time I rebooted, >>> I don't notice them all blinking the same way. Instead, all were solid >>> for a second or two after the long wait, and then only the root disks. >> >> We are already running fw v11.00.00.00 but thanks for the heads up. >> > Are you running the IT or IR firmware version? I am running the IR one. > > And by the way, here is my uname -a: > # uname -a > FreeBSD bcnas1.bc.local 8.2-STABLE FreeBSD 8.2-STABLE #0: Thu Sep 29 > 15:06:03 CEST 2011 > root@bcnas1.bc.local:/usr/obj/usr/src/sys/GENERIC amd64 > > And I installed 8-STABLE using cvsup using this date filter in my > cvsup file: > *default date=2011.09.27.00.00.00 > > And I remember one other thing I did since the firmware upgrade. I > booted off a Linux hard disk which I had to put in the first hard disk > bay, or it wouldn't boot from it. So I moved the root disk somewhere > else. FreeBSD still boots, so I left it where I moved it. I don't know > if that changed the boot time. > >>> So if you have the same card, I suggest you update the firmware. (I >>> updated for stability rather than boot speed, and it seemed stable >>> until >>> it froze today, after 2 weeks) >> >> Do you have any information about the hang? > I decided to rename some of my replication snapshots to fill in gaps > from daily snapshots (since I wasn't always doing them daily)... just > so I could delete old replication snapshots. > > So I wrote a bash script to take the first replication snapshot per > day and rename it > (need to get bash from ports or hope it runs in sh): > > for day in {4..16}; do > if [ "$day" -lt 10 ]; then > day="0$day" > fi > > firstSnapshotOfDay=$(zfs list -o name -t snapshot -r tank | grep > -E "^tank@replication-201110${day}" | head -n1) > > if [ "$firstSnapshotOfDay" = "" ]; then > continue > fi > > time=$(echo ${firstSnapshotOfDay} | cut -d'-' -f2) > hour=${time:8:2} > minute=${time:10:2} > second=${time:12:2} > > echo "=============" > echo $day $firstSnapshotOfDay $time $hour $minute $second > echo zfs rename -r "${firstSnapshotOfDay}" > tank@daily-2011-10-${day}T${hour}:${minute}:${second} > done > > And then I took the output from it, and started running it. > For example: > > zfs rename -r tank@replication-20111004111436 > tank@daily-2011-10-04T11:14:36 > > I ran 8 of the commands like the above, which took about 1 second > each. Then the 9th command froze. > > root@bcnas1:~/bin/zfs/snapshots# zfs rename -r > tank@replication-20111013000000 tank@daily-2011-10-13T00:00:00 > ^C > load: 0.13 cmd: zfs 70731 [tx->tx_sync_done_cv)] 489.40r 0.00u 0.07s > 0% 1760k > load: 0.01 cmd: zfs 70731 [tx->tx_sync_done_cv)] 638.13r 0.00u 0.07s > 0% 1328k > > I then tried other things in other windows (using screen). Anything > involving zpool or zfs would hang like this: > > root@bcnas1:~/bin/rsync# zpool status > ^C^C > load: 0.12 cmd: zpool 87352 [spa_namespace_lock] 479.77r 0.00u 0.00s > 0% 1628k > load: 0.01 cmd: zpool 87352 [spa_namespace_lock] 616.65r 0.00u 0.00s > 0% 1288k > > Other attempts to read from the tank zpool worked fine. But maybe it > was only reading from arc and l2arc. The system has 48 GB of memory. > And my NFS mounts stopped working. NFS requests would just timeout. > > top, gstat, etc. all show an idle system. I had a "zpool iostat 5" > running in another window, which was not frozen, but also just looked > normal and idle. > > It reminds me of when I was using 8.2-RELEASE, and zfs destroy on a > snapshot caused a kernel panic. > > I also tried kill -9 on the above "zfs rename" process, which didn't > work (normal for processes waiting for kernel calls). > > Then I did "shutdown -r now" which got me to a screen (attached a > photo... unfortunately I don't know how the serial console or KVM over > IP works so can't get a proper log) saying that shutdown terminated > abnormally, going to single user mode (never got a prompt though), and > "some processes would not die; ps axl advised". I don't know which > processes hung (58465 I guess) . If ctrl+t is showing the process id, > then it wasn't one of the commands above. I think it might have been > nfsd. (I have also found that "/etc/rc.d/nfsd restart" causes nfs to > stop and never come back, and use lots of cpu; here is my post about > it: http://forums.freebsd.org/showthread.php?t=26727) > > I looked through the logs, but don't see anything interesting. Here > is the log from my reboot until not including the next boot: > > Oct 21 11:39:59 bcnas1 shutdown: reboot by peter: > Oct 21 11:40:01 bcnas1 nmbd[64827]: [2011/10/21 11:40:01.422014, 0] > nmbd/nmbd_browsesync.c:585(collect_all_workgroup_names_from_wins_server) > Oct 21 11:40:01 bcnas1 nmbd[64827]: > collect_all_workgroup_names_from_wins_server: > Oct 21 11:40:01 bcnas1 nmbd[64827]: Cannot find my workgroup > ARBEITSGRUPPE on subnet UNICAST_SUBNET. > Oct 21 11:40:01 bcnas1 nmbd[64827]: [2011/10/21 11:40:01.522809, 0] > nmbd/nmbd_browsesync.c:585(collect_all_workgroup_names_from_wins_server) > Oct 21 11:40:01 bcnas1 nmbd[64827]: > collect_all_workgroup_names_from_wins_server: > Oct 21 11:40:01 bcnas1 nmbd[64827]: Cannot find my workgroup > ARBEITSGRUPPE on subnet UNICAST_SUBNET. > Oct 21 11:40:01 bcnas1 ntpd[75976]: ntpd exiting on signal 15 > Oct 21 11:40:01 bcnas1 nmbd[64827]: [2011/10/21 11:40:01.857859, 0] > nmbd/nmbd.c:71(terminate) > Oct 21 11:40:01 bcnas1 nmbd[64827]: Got SIGTERM: going down... > Oct 21 11:40:31 bcnas1 rc.shutdown: 30 second watchdog timeout > expired. Shutdown terminated. > Oct 21 11:40:31 bcnas1 init: /bin/sh on /etc/rc.shutdown terminated > abnormally, going to single user mode > Oct 21 11:40:31 bcnas1 syslogd: exiting on signal 15 > Oct 21 11:47:39 bcnas1 syslogd: kernel boot file is /boot/kernel/kernel > > Looking at the screenshot, maybe 58465 is the process id that wouldn't > die. I could not find that number in /var/log/messages. > > > Any suggestions on where I can find useful information? > > I plan to keep searching for something that shows error messages. On > every other panic, I would find SCSI errors, mps driver messages, etc. > >> >> Regards >> Steve >> This e.mail is private and confidential between Multiplay (UK) Ltd. >> and the person or entity to whom it is addressed. In the event of >> misdirection, the recipient is prohibited from using, copying, >> printing or otherwise disseminating it or any information contained >> in it. >> >> In the event of misdirection, illegible or incomplete transmission >> please telephone +44 845 868 1337 or return the E.mail to >> postmaster@multiplay.co.uk. >> > > > -- > > -------------------------------------------- > Peter Maloney > Brockmann Consult > Max-Planck-Str. 2 > 21502 Geesthacht > Germany > Tel: +49 4152 889 300 > Fax: +49 4152 889 333 > E-mail: peter.maloney@brockmann-consult.de > Internet: http://www.brockmann-consult.de > ---------------------------------------------- -------------------------------------------- Peter Maloney Brockmann Consult Max-Planck-Str. 2 21502 Geesthacht Germany Tel: +49 4152 889 300 Fax: +49 4152 889 333 E-mail: peter.maloney@brockmann-consult.de Internet: http://www.brockmann-consult.de --------------------------------------------
Steven Hartland
2011-Oct-21 15:59 UTC
zfs parition probing causing long delay at BTX loader
----- Original Message ----- From: "Mark Saad" <nonesuch@longcount.org>>> Yes after the beastie menu we also have a noticable pause >> before the kernel start booting, just white square top left hand >> corner, not tracked that one down yet, any ideas? > > Steve the change you need is in HEAD not sure if it was MFC'd to 9-STABLE yet > > http://svnweb.freebsd.org/base/head/sys/amd64/amd64/machdep.c?view=markup&pathrev=224516 > > There is a slow memory check done early in the kernel load that is > mostly useless. DragonFlyBSD removed it completely. FreeBSD has a > loader tunable to disable. I have a few boxes with 128G of ram and it > takes about 2mins to test the ram .Thanks Mark that removed the 45 seconds delay after beastie so in total boot is now 1 1/2 minutes quicker, result :) Regards Steve ===============================================This e.mail is private and confidential between Multiplay (UK) Ltd. and the person or entity to whom it is addressed. In the event of misdirection, the recipient is prohibited from using, copying, printing or otherwise disseminating it or any information contained in it. In the event of misdirection, illegible or incomplete transmission please telephone +44 845 868 1337 or return the E.mail to postmaster@multiplay.co.uk.