I am running bsnmpd with a basic snmpd.config (only community and location changed).

When there is a problem with an HDD and the disk disappears from the ATA channel (e.g. the disk is physically removed), bsnmpd always dumps core:

kernel: pid 1188 (bsnmpd), uid 0: exited on signal 11 (core dumped)

I have seen this for a long time on all releases of the 7.x and 8.x branches (i386 and amd64). I have not tested 9.x.

Is it a known bug, or should I file a PR?

Miroslav Lachman
On Sun, Sep 09, 2012 at 11:56:55PM +0200, Miroslav Lachman wrote:
> I am running bsnmpd with a basic snmpd.config (only community and location
> changed).
>
> When there is a problem with an HDD and the disk disappears from the ATA
> channel (e.g. the disk is physically removed), bsnmpd always dumps core:
>
> kernel: pid 1188 (bsnmpd), uid 0: exited on signal 11 (core dumped)
>
> I have seen this for a long time on all releases of the 7.x and 8.x
> branches (i386 and amd64). I have not tested 9.x.
>
> Is it a known bug, or should I file a PR?

Do you happen to run bsnmp-ucd too? If you do, what version is it? In bsnmp-ucd-0.3.5 I introduced a bug that led to a bsnmpd crash on disk detach. It has been fixed (thanks to Brian Somers) in 0.3.6.

-- 
Mikolaj Golub
Mikolaj Golub wrote:
> On Sun, Sep 09, 2012 at 11:56:55PM +0200, Miroslav Lachman wrote:
>> I am running bsnmpd with a basic snmpd.config (only community and location
>> changed).
>>
>> When there is a problem with an HDD and the disk disappears from the ATA
>> channel (e.g. the disk is physically removed), bsnmpd always dumps core:
>>
>> kernel: pid 1188 (bsnmpd), uid 0: exited on signal 11 (core dumped)
>>
>> I have seen this for a long time on all releases of the 7.x and 8.x
>> branches (i386 and amd64). I have not tested 9.x.
>>
>> Is it a known bug, or should I file a PR?
>
> Do you happen to run bsnmp-ucd too? If you do, what version is it?
> In bsnmp-ucd-0.3.5 I introduced a bug that led to a bsnmpd crash on
> disk detach. It has been fixed (thanks to Brian Somers) in 0.3.6.

No, I never installed bsnmp-ucd. We are using plain bsnmpd from base without any additional modules.
It is used by MRTG only, for network traffic. Nothing else.

Miroslav Lachman
On Mon, Sep 10, 2012 at 04:46:15PM +0200, Miroslav Lachman wrote:
> Mikolaj Golub wrote:
> > On Sun, Sep 09, 2012 at 11:56:55PM +0200, Miroslav Lachman wrote:
> >> I am running bsnmpd with a basic snmpd.config (only community and location
> >> changed).
> >>
> >> When there is a problem with an HDD and the disk disappears from the ATA
> >> channel (e.g. the disk is physically removed), bsnmpd always dumps core:
> >>
> >> kernel: pid 1188 (bsnmpd), uid 0: exited on signal 11 (core dumped)
> >>
> >> I have seen this for a long time on all releases of the 7.x and 8.x
> >> branches (i386 and amd64). I have not tested 9.x.
> >>
> >> Is it a known bug, or should I file a PR?
> >
> > Do you happen to run bsnmp-ucd too? If you do, what version is it?
> > In bsnmp-ucd-0.3.5 I introduced a bug that led to a bsnmpd crash on
> > disk detach. It has been fixed (thanks to Brian Somers) in 0.3.6.
>
> No, I never installed bsnmp-ucd. We are using plain bsnmpd from base
> without any additional modules.
> It is used by MRTG only, for network traffic. Nothing else.

Then a backtrace might be useful. Load the core file:

    gdb /usr/sbin/bsnmpd /path/to/bsnmpd.core

and at the gdb prompt run:

    bt

-- 
Mikolaj Golub
On Sun, Sep 09, 2012 at 11:56:55PM +0200, Miroslav Lachman wrote:
> I am running bsnmpd with a basic snmpd.config (only community and location
> changed).
>
> When there is a problem with an HDD and the disk disappears from the ATA
> channel (e.g. the disk is physically removed), bsnmpd always dumps core:
>
> kernel: pid 1188 (bsnmpd), uid 0: exited on signal 11 (core dumped)
>
> I have seen this for a long time on all releases of the 7.x and 8.x
> branches (i386 and amd64). I have not tested 9.x.

OK, I was able to reproduce this under qemu by running

    atacontrol detach ata1

It crashes in the snmp_hostres module, in refresh_device_tbl -> refresh_disk_storage_tbl -> disk_OS_get_ATA_disks, when traversing the device_map list and dereferencing map->entry_p, which is NULL here.

The device_map table is used for consistent device table indexing. refresh_device_tbl(), the refresh routine for hrDeviceTable, checks the list of available devices and calls device_entry_delete() for devices that are gone. It does not remove the entry from the device_map table, but just sets entry_p to NULL for it (to preserve index reuse by another device). Then refresh_disk_storage_tbl() is called, which in turn calls

    disk_OS_get_ATA_disks();
    disk_OS_get_MD_disks();
    disk_OS_get_disks();

and it crashes in disk_OS_get_ATA_disks() when the removed map entry is dereferenced.

I am attaching the patch that fixes the issue for me.

I was wondering why the issue was not observed after md device removal, as disk_OS_get_MD_disks() does the same thing. It turned out that hostres simply does not see md devices, so this function is currently useless. hostres gets devices from devinfo(3), which does not return md devices.

disk_OS_get_disks() uses the kern.disks sysctl to get the list of disks, and uses device_map differently, so it is not affected.

-- 
Mikolaj Golub

Attachment (scrubbed by the list archive):
Name: hostres_diskstorage_tbl.c.skip.patch
Type: text/x-diff
Size: 940 bytes
URL: http://lists.freebsd.org/pipermail/freebsd-stable/attachments/20120915/71997662/hostres_diskstorage_tbl.c.skip.bin
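The failure mode described in the message above can be illustrated with a minimal sketch. This is not the attached patch (its contents are not preserved in the archive) and not the real hostres code: the structures and the functions count_ata_disks() and detach_device() are simplified, hypothetical stand-ins. The only point it makes is that a map entry whose entry_p has been set to NULL must be skipped before it is dereferenced.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-ins for the hostres structures. */
struct device_entry {
	const char *name;		/* device name, e.g. "ad0" */
};

struct device_map_entry {
	int index;			/* table index, kept stable across refreshes */
	struct device_entry *entry_p;	/* NULL once the device is gone */
	struct device_map_entry *next;	/* next entry in the map list */
};

/*
 * Mark a device as gone but keep its slot in the map, mirroring the
 * behaviour described for device_entry_delete() in the thread.
 */
static void
detach_device(struct device_map_entry *map)
{
	assert(map != NULL);
	map->entry_p = NULL;
}

/*
 * Walk the map and count entries whose device name starts with "ad".
 * Entries for detached devices (entry_p == NULL) are skipped; without
 * that check the strncmp() call below would dereference NULL.
 */
static int
count_ata_disks(const struct device_map_entry *head)
{
	const struct device_map_entry *map;
	int n = 0;

	for (map = head; map != NULL; map = map->next) {
		if (map->entry_p == NULL)
			continue;	/* device detached, nothing to inspect */
		if (strncmp(map->entry_p->name, "ad", 2) == 0)
			n++;
	}
	return (n);
}
```

Calling detach_device() on one entry and then count_ata_disks() on the list head walks past the emptied slot instead of crashing, while the slot itself (and its index) stays in the list.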
On 15.09.2012 16:50, Mikolaj Golub wrote:
> I am attaching the patch that fixes the issue for me.
>
> I was wondering why the issue was not observed after md device
> removal, as disk_OS_get_MD_disks() does the same thing. It turned
> out that hostres simply does not see md devices, so this function is
> currently useless. hostres gets devices from devinfo(3), which does
> not return md devices.
>
> disk_OS_get_disks() uses the kern.disks sysctl to get the list of disks,
> and uses device_map differently, so it is not affected.

I also have a big patch to the hostres module, but it is not finished yet. I should probably commit the part related to the disk subsystem. That part has been rewritten to be GEOM-aware.

-- 
WBR, Andrey V. Elsukov
Mikolaj Golub wrote:
> On Sun, Sep 09, 2012 at 11:56:55PM +0200, Miroslav Lachman wrote:
>> I am running bsnmpd with a basic snmpd.config (only community and location
>> changed).
>>
>> When there is a problem with an HDD and the disk disappears from the ATA
>> channel (e.g. the disk is physically removed), bsnmpd always dumps core:
>>
>> kernel: pid 1188 (bsnmpd), uid 0: exited on signal 11 (core dumped)
>>
>> I have seen this for a long time on all releases of the 7.x and 8.x
>> branches (i386 and amd64). I have not tested 9.x.
>
> OK, I was able to reproduce this under qemu by running
>
> atacontrol detach ata1

[...]

> and it crashes in disk_OS_get_ATA_disks() when the removed map entry
> is dereferenced.
>
> I am attaching the patch that fixes the issue for me.

I am glad to read that you found the bug!
The fix (patch) seems trivial. Will it be committed / MFCed? :)

Thank you for your work on this problem!

Miroslav Lachman
On Sun, Sep 16, 2012 at 05:56:22PM +0400, Andrey V. Elsukov wrote:
> On 15.09.2012 16:50, Mikolaj Golub wrote:
> > I am attaching the patch that fixes the issue for me.
> >
> > I was wondering why the issue was not observed after md device
> > removal, as disk_OS_get_MD_disks() does the same thing. It turned
> > out that hostres simply does not see md devices, so this function is
> > currently useless. hostres gets devices from devinfo(3), which does
> > not return md devices.
> >
> > disk_OS_get_disks() uses the kern.disks sysctl to get the list of disks,
> > and uses device_map differently, so it is not affected.
>
> I also have a big patch to the hostres module, but it is not finished
> yet. I should probably commit the part related to the disk
> subsystem. That part has been rewritten to be GEOM-aware.

Wonderful! And, as I understand it, it will solve this problem too? Then I think there is no need to commit my patch, unless you are not planning to merge your work to stable/[78] (where any fix for this problem is highly desirable).

-- 
Mikolaj Golub
On Sun, Sep 16, 2012 at 07:07:20PM +0200, Miroslav Lachman wrote:
> I am glad to read that you found the bug!
> The fix (patch) seems trivial. Will it be committed / MFCed? :)

Andrey told me that he was not sure when he would be able to commit his work, so I have just committed my fix. I am going to MFC it.

-- 
Mikolaj Golub