On 09/27/2016 08:57, Borja Marcos wrote:
>
>> On 27 Sep 2016, at 15:48, Jan Henrik Sylvester <me at janh.de> wrote:
>>
>> On 09/27/2016 12:16, Borja Marcos wrote:
>>> I have noticed that the GENERIC kernel in 11-STABLE includes the
>>> PCI_HP option, and the hotplug bits seem to be present in the
>>> kernel, but I don't see any userland support for it.
>>>
>>> Is it somewhat complete and in that case am I missing something?
>>
>> I do not know what kind of userland support you mean. I just tried:
>>
>> Plugging in my USB 3.0 ExpressCard while 11.0 is running, the
>> controller was detected and I was able to use USB devices with it.
>> Great.
>
> Thanks :)
>
> I was hoping (and I assume it's the ultimate goal of the project) to
> be able to hot plug PCIe devices such as NVMe drives.
>
> On Solaris you can replace them provided you power them off
> previously (there's a command for that, "hotplug").
>
> On FreeBSD I've tried using devctl but powering off, disabling a
> device and enabling it again has led to a panic.
>
> Interestingly, I disabled nvme0 using devctl and "nvmecontrol
> devlist" didn't find any nvme controllers despite having 10
> controllers and 10 drives. However, the ZFS pool of 10 NVMe drives
> was working happily. Degraded of course, with one NVMe missing.

To my knowledge, all the necessary PCIe-layer code is present. However, that's just one layer: Many drivers will likely need changes in order to cope with surprise removal of their devices.

For that reason, HotPlug needs a lot of testing on a variety of platforms. The FreeBSD developer base is much smaller than its user base, of course, so the variety of our testing is rather limited. You can help immensely by giving us detailed bug reports, either on a mailing list or in Bugzilla. For a panic, the panic messages and stack trace of the current thread will be very helpful. Complete crashinfo(8) output would be great.

The most relevant userland tool is devctl, followed closely by devinfo and pciconf.
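For anyone who wants to send the crashinfo(8) output mentioned here: a minimal /etc/rc.conf sketch, assuming a swap partition large enough to hold the kernel dump. On many installs some of these values are already set, or are already the defaults in /etc/defaults/rc.conf, so treat this as a starting point rather than the definitive setup.

```sh
# /etc/rc.conf fragment (sketch).
# On panic the kernel writes a dump to the swap device chosen by "AUTO";
# at the next boot savecore(8) copies it into $dumpdir, and crashinfo(8)
# then writes a text summary there (core.txt.N) suitable for a bug report.
dumpdev="AUTO"
dumpdir="/var/crash"
crashinfo_enable="YES"
```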
In the case of Jan's USB 3.0 ExpressCard, it's possible that one or all of the USB controller drivers (xhci, ehci, uhci) didn't cope with the surprise removal of the controller.

Before removing the card, try detaching the driver(s) with "devctl detach xhciN". There might be more than one device. Use "pciconf -lc" to find the HotPlug-capable pcib devices (bridges). Use devinfo to find which one is your ExpressCard slot and find all the devices attached to it. Then use devctl to detach the devices. There could be a tree of devices; in that case, you can usually start at the level immediately under pcibN; you don't need to detach every device from the bottom up. Once all the devices are detached, you should be able to remove the card safely.

Eric
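A minimal sketch of the detach step described above, written as a POSIX sh script. It assumes you have already picked out the device names (for example xhci0) by reading "pciconf -lc" and devinfo by hand; the script name detach-slot.sh, the detach_devices function and the DRY_RUN variable are hypothetical, not part of FreeBSD. The only real command it runs is "devctl detach".

```sh
#!/bin/sh
# detach-slot.sh (hypothetical name): detach the drivers directly under a
# HotPlug-capable pcibN before the card is pulled. Device names must be
# supplied by hand after inspecting "pciconf -lc" and "devinfo"; this sketch
# does not discover them. Set DRY_RUN=1 to print the commands instead.

detach_devices() {
    for dev in "$@"; do
        if [ "${DRY_RUN:-0}" = 1 ]; then
            # Preview only: show what would be run.
            echo "devctl detach $dev"
        elif ! devctl detach "$dev"; then
            # Stop at the first failure: the card is not safe to remove yet.
            echo "failed to detach $dev; do not remove the card" >&2
            return 1
        fi
    done
}

# Only act when device names were given on the command line.
if [ "$#" -gt 0 ]; then
    detach_devices "$@"
fi
```

Usage: "DRY_RUN=1 sh detach-slot.sh xhci0" to preview, then "sh detach-slot.sh xhci0" as root before pulling the card. Stopping at the first failed detach is deliberate: a device that refuses to detach is exactly the case where a surprise removal is likely to panic.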
On 09/27/2016 17:51, Eric van Gyzen wrote:
> In the case of Jan's USB 3.0 ExpressCard, it's possible that one or all
> of the USB controller drivers (xhci, ehci, uhci) didn't cope with the
> surprise removal of the controller. Before removing the card, try
> detaching the driver(s) with "devctl detach xhciN". There might be more
> than one device. Use "pciconf -lc" to find the HotPlug-capable pcib
> devices (bridges). Use devinfo to find which one is your ExpressCard
> slot and find all the devices attached to it. Then use devctl to detach
> the devices. There could be a tree of devices; in that case, you can
> usually start at the level immediately under pcibN; you don't need to
> detach every device from the bottom up. Once all the devices are
> detached, you should be able to remove the card safely.

When I run "devctl detach xhci0" before removing the USB 3.0 ExpressCard, there is no panic: the device is detached properly, and I can reconnect it later.

However, the mechanism holding the ExpressCard is not completely reliable. The third time I inserted the card, it did not stay in and was ejected immediately, before I could run devctl detach, and I got a panic. Because the card does not hold firmly, I often pulled it out together with the USB device on 10.3-RELEASE and never got a panic there, so I guess this is a regression in how the USB driver deals with the sudden loss of the device.

The panic message is below. I guess I should take this discussion to freebsd-usb@ (CCed).
Thanks,
Jan Henrik

Fatal trap 9: general protection fault while in kernel mode
cpuid = 1; acpic id = 01
instruction pointer     = 0x20:0xffffffff80b1549c
stack pointer           = 0x28:0xfffffe022f62ca00
frame pointer           = 0x28:0xfffffe022f62ca70
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 14 (usbus1)
trap number             = 9
panic: general protection fault
cpuid = 1
KDB: stack backtrace:
#0 0xffffffff80b24077 at kdb_backtrace+0x67
#1 0xffffffff80ad93e2 at vpanic+0x182
#2 0xffffffff80ad9253 at panic+0x43
#3 0xffffffff80fa0d31 at trap_fatal+0x351
#4 0xffffffff80fa09c8 at trap+0x768
#5 0xffffffff80f84141 at calltrap+0x8
#6 0xffffffff808f2f63 at usb_detach_device+0xf3
#7 0xffffffff808f1d5b at usb_unconfigure+0x2b
#8 0xffffffff808f5623 at usb_free_device+0x103
#9 0xffffffff808f58b1 at usb_bus_detach+0x161
#10 0xffffffff80903e95 at usb_process+0x125
#11 0xffffffff80a90055 at fork_exit+0x85
#12 0xffffffff80f8467e at fork_trampoline+0xe
Uptime: 18m27s
Automatic reboot in 15 seconds - press a key on the console to abort
> On 27 Sep 2016, at 17:51, Eric van Gyzen <vangyzen at FreeBSD.org> wrote:
>
>
> To my knowledge, all the necessary PCIe-layer code is present. However,
> that's just one layer: Many drivers will likely need changes in order
> to cope with surprise removal of their devices.

Thank you very much, that's what I needed to know :) I saw that the bits were indeed present, but I was wondering whether I should expect it to work or not.

> For that reason, HotPlug needs a lot of testing on a variety of
> platforms. The FreeBSD developer base is much smaller than its user
> base, of course, so the variety of our testing is rather limited. You
> can help immensely by giving us detailed bug reports, either on a
> mailing list or in Bugzilla. For a panic, the panic messages and stack
> trace of the current thread will be very helpful. Complete crashinfo(8)
> output would be great.

Of course. Unfortunately, due to poor timing and a DOA server last month, this server is in a countdown to go into production tomorrow running Solaris, but I'll try to get whatever I can today.

Thanks!

Borja.