Gareth Wyn Roberts
2015-Apr-12 17:57 UTC
msk msk0 watchdog timeout freeze hang lock stop problem
I've run into problems using the msk device: initially it works well enough to set up DHCP etc., but it stops/freezes as soon as any appreciable network traffic occurs. There are several threads describing similar symptoms over the past two years or more. I've been following several false leads but have finally found a solution (at least it solves my problem).

I'm running a standard FreeBSD 10.1-RELEASE and the NIC is detected as:

mskc0: <Marvell Yukon 88E8057 Gigabit Ethernet> mem 0xfa000000-0xfa003fff irq 19 at device 0.0 on pci6
msk0: <Marvell Technology Group Ltd. Yukon Ultra 2 Id 0xba Rev 0x00> on mskc0
msk0: Ethernet address: 00:13:77:e9:df:eb
miibus0: <MII bus> on msk0
e1000phy0: <Marvell 88E1149 Gigabit PHY> PHY 0 on miibus0
e1000phy0: none, 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-master, 1000baseT-FDX, 1000baseT-FDX-master, auto, auto-flow

The network worked when using the i386 release, but failed for the amd64 release (as reported previously), which prompted me to disable 64-bit DMA (the patch for this is attached below). This worked for the first kernel built but mysteriously failed when another, unrelated part of the kernel was changed (a usb driver) and the kernel recompiled. So identical msk driver code worked in one kernel but not the second! This suggested that alignment differences between the two kernels were causing the msk driver to fail. Others have reported varying behaviour depending on different circumstances.

It transpires that changing just one value in the if_mskreg.h file solved all my problems. Subsequently I have not been able to make it fail under heavy network traffic in either 32-bit or 64-bit mode.
I'm working on 10.1-RELEASE source, i.e. if_msk.c revision 262524 and if_mskreg.h revision 264442.
Here's the patch to if_mskreg.h

--- if_mskreg.h-orig	2014-11-11 20:02:58.000000000 +0000
+++ if_mskreg.h	2015-04-12 18:47:20.000000000 +0100
@@ -2179,9 +2179,11 @@
  * At first I guessed 8 bytes, the size of a single descriptor, would be
  * required alignment constraints. But, it seems that Yukon II have 4096
  * bytes boundary alignment constraints.
+ * And it seems that the DMA status region for the Yukon Ultra 2 (88E8057)
+ * requires 8192 byte alignment to prevent locking.
  */
 #define MSK_RING_ALIGN 4096
-#define MSK_STAT_ALIGN 4096
+#define MSK_STAT_ALIGN 8192

The patches to both files which also implement a MSK_64BIT_DMA_DISABLE flag are attached. Perhaps the developers would consider committing these as it may be useful for future debugging.

Gareth.

-------------- next part --------------
A non-text attachment was scrubbed...
Name: if_mskreg.h.rev264442.diff
Type: text/x-patch
Size: 603 bytes
Desc: if_mskreg.h.rev264442.diff
URL: <http://lists.freebsd.org/pipermail/freebsd-stable/attachments/20150412/b581a021/attachment.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: if_msk.c.rev262524.dma.diff
Type: text/x-patch
Size: 3748 bytes
Desc: if_msk.c.rev262524.dma.diff
URL: <http://lists.freebsd.org/pipermail/freebsd-stable/attachments/20150412/b581a021/attachment-0001.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: if_mskreg.h.rev264442.dma.diff
Type: text/x-patch
Size: 1533 bytes
Desc: if_mskreg.h.rev264442.dma.diff
URL: <http://lists.freebsd.org/pipermail/freebsd-stable/attachments/20150412/b581a021/attachment-0002.bin>
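[Editorial illustration, not part of the original posting.] The observation that the same driver code worked in one kernel build and failed in the next is at least consistent with the alignment hypothesis: a buffer that is only guaranteed to be 4096-byte aligned may or may not also land on an 8192-byte boundary, depending on where the allocator happens to place it. The sketch below only shows that arithmetic. The helper names and the example address are made up for illustration and do not come from the msk(4) sources.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Return true if addr is a multiple of align; align must be a power of two. */
static bool
is_aligned(uint64_t addr, uint64_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return ((addr & (align - 1)) == 0);
}

/* Round addr up to the next multiple of align; align must be a power of two. */
static uint64_t
align_up(uint64_t addr, uint64_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return ((addr + align - 1) & ~(align - 1));
}
```

For a hypothetical address such as 0x1f431000, is_aligned(addr, 4096) holds but is_aligned(addr, 8192) does not, while align_up(addr, 8192) gives 0x1f432000, which satisfies both. Every 8192-aligned address is also 4096-aligned, so raising MSK_STAT_ALIGN only narrows the set of addresses the allocator may return.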
Hi!

> I've run into problems using the msk device
[...]
> I'm working on 10.1-RELEASE source, i.e. if_msk.c revision 262524 and if_mskreg.h revision 264442.
>
> Here's the patch to if_mskreg.h
[...]

Thanks for the suggested fix. There are five PRs, all describing similar things:

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=197887
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=197002
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=189404
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=186872
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=166727

I added some pointers to your posting; maybe someone can test it?

--
pi at opsec.eu +49 171 3101372 5 years to go !
Yonghyeon PYUN
2015-Apr-13 08:13 UTC
msk msk0 watchdog timeout freeze hang lock stop problem
On Sun, Apr 12, 2015 at 05:57:34PM +0000, Gareth Wyn Roberts wrote:
> I've run into problems using the msk device where initially it works well enough to set DHCP etc. but stops/freezes as soon as any appreciable network traffic occurs. There are several threads describing similar symptoms over the past two years or more. I've been following several false leads but have finally found a solution (at least it solves my problem).
>
> I'm running a standard FreeBSD 10.1-RELEASE and the NIC is detected as:
>
> mskc0: <Marvell Yukon 88E8057 Gigabit Ethernet> mem 0xfa000000-0xfa003fff irq 19 at device 0.0 on pci6
> msk0: <Marvell Technology Group Ltd. Yukon Ultra 2 Id 0xba Rev 0x00> on mskc0
> msk0: Ethernet address: 00:13:77:e9:df:eb
> miibus0: <MII bus> on msk0
> e1000phy0: <Marvell 88E1149 Gigabit PHY> PHY 0 on miibus0
> e1000phy0: none, 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-master, 1000baseT-FDX, 1000baseT-FDX-master, auto, auto-flow
>
> The network worked when using the i386 release, but failed for the amd64 release (as reported previously) which prompted me to disable 64-bit DMA (the patch for this is attached below). This worked for the first kernel built but mysteriously failed when another unrelated part of the kernel was changed (a usb driver) and the kernel recompiled. So identical msk driver code worked in one kernel but not the second! This suggested that alignment differences between the two kernels were causing the msk driver to fail. Others have reported varying behaviour depending on different circumstances.
>
> It transpires that changing just one value in the if_mskreg.h file solved all my problems. Subsequently I have not been able to make it fail under heavy network traffic in either 32-bit or 64-bit mode.
> I'm working on 10.1-RELEASE source, i.e. if_msk.c revision 262524 and if_mskreg.h revision 264442.

Thanks for letting me know your findings. I really appreciate that.
I recall that the alignment requirement of status LEs (List Elements in Marvell terms) is 2048 and the maximum size of the status LEs is 4096 bytes. (The actual alignment seems to be a much lower value, like 32 or 64 bytes, but an alignment of 2048 was chosen to avoid silicon bugs.) Later experiments showed that some variants of Yukon II require 4096-byte alignment, and I changed the alignment to 4096 in the past. Your finding seems to indicate that msk(4) needs 8192-byte alignment for status LEs. However, this does not explain how and why the same code works well in 8.x/9.x. In addition, it's not common to require an alignment greater than PAGE_SIZE on x86, given that the maximum size of the DMA buffer is 4096 bytes. I have to check whether there was a change in bus_dma(9) between 8.x/9.x and 10.x, but that will take some time as I have little spare time. You can probably verify that the DMA address of the status LEs meets the following requirements on both i386 and amd64:

 - Alignment is 4096.
 - Number of DMA segments is 1.
 - DMA segment base address plus DMA segment size does not cross a PAGE_SIZE boundary.

> Here's the patch to if_mskreg.h
> --- if_mskreg.h-orig 2014-11-11 20:02:58.000000000 +0000
> +++ if_mskreg.h 2015-04-12 18:47:20.000000000 +0100
> @@ -2179,9 +2179,11 @@
>   * At first I guessed 8 bytes, the size of a single descriptor, would be
>   * required alignment constraints. But, it seems that Yukon II have 4096
>   * bytes boundary alignment constraints.
> + * And it seems that the DMA status region for the Yukon Ultra 2 (88E8057)
> + * requires 8192 byte alignment to prevent locking.
>   */
>  #define MSK_RING_ALIGN 4096
> -#define MSK_STAT_ALIGN 4096
> +#define MSK_STAT_ALIGN 8192
>
> The patches to both files which also implement a MSK_64BIT_DMA_DISABLE flag are attached. Perhaps the developers would consider committing these as it may be useful for future debugging.
>

If you have more than 4GB of memory installed and disable 64-bit DMA addressing, msk(4) will use bounce buffers.
Passing packets through bounce buffers involves a copy operation, which is costly. You can check the hw.busdma sysctl node to see whether any drivers are using bounce buffers. And if you want to disable 64-bit DMA on 64-bit architectures, add '#undef MSK_64BIT_DMA' just below the BUS_SPACE_MAXADDR check in if_mskreg.h.
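[Editorial illustration, not part of the original posting.] The three requirements listed above for the status LE DMA address can be written as one small predicate. This is a minimal userland sketch under stated assumptions: struct dma_seg is a hypothetical stand-in for a DMA segment, STAT_ALIGN mirrors the pre-patch MSK_STAT_ALIGN of 4096, PAGE_BYTES assumes the 4096-byte x86 page size, and "does not cross a PAGE_SIZE boundary" is read as "the first and last byte of the segment lie in the same page".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed values for illustration: pre-patch MSK_STAT_ALIGN and x86 PAGE_SIZE. */
#define STAT_ALIGN	4096u
#define PAGE_BYTES	4096u

/* Hypothetical stand-in for one DMA segment: base address and length. */
struct dma_seg {
	uint64_t addr;
	uint64_t len;
};

/* True if the bytes [addr, addr + len) span more than one page. */
static bool
crosses_page(uint64_t addr, uint64_t len)
{
	if (len == 0)
		return (false);
	return ((addr / PAGE_BYTES) != ((addr + len - 1) / PAGE_BYTES));
}

/* Check the three requirements: one segment, aligned, no page crossing. */
static bool
status_le_dma_ok(const struct dma_seg *segs, int nsegs)
{
	assert(segs != NULL);
	if (nsegs != 1)
		return (false);
	if (segs[0].addr % STAT_ALIGN != 0)
		return (false);
	return (!crosses_page(segs[0].addr, segs[0].len));
}
```

In the driver itself, the segment array and count would be the values handed to the bus_dmamap_load(9) callback (the ds_addr and ds_len fields of bus_dma_segment_t); printing them there on both i386 and amd64 and applying a check like this is one way to carry out the verification suggested above.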