Jaeyong Yoo
2013-May-28 12:14 UTC
Very strange behavior between balloon driver and network driver (Arndale)
Hello,

I'm testing Xen on the Arndale board and I found some very strange behavior: when I create and destroy domUs repeatedly, the network driver suddenly stops working. I tried both the on-board and a USB-dongle network device, and both behave similarly. The only difference is that with the on-board device I get the following error messages:

[ 709.900000] asix 1-3.2.4:1.0 eth0: asix_rx_fixup() Bad Header Length 0x85555, offset 4
[ 709.910000] asix 1-3.2.4:1.0 eth0: asix_rx_fixup() Bad Header Length 0x0, offset 4
[ 709.920000] asix 1-3.2.4:1.0 eth0: asix_rx_fixup() Bad Header Length 0x0, offset 4
[ 709.930000] asix 1-3.2.4:1.0 eth0: asix_rx_fixup() Bad Header Length 0x105555, offset 4

while the USB-dongle device does not print any message.

I investigated the problem and found that it disappears when I comment out the following lines in drivers/xen/balloon.c, in the function free_xenballooned_pages:

    //if (current_credit())
    //    schedule_delayed_work(&balloon_worker, 0);

balloon_worker performs increase_reservation and decrease_reservation, which use the XENMEM hypercalls. I think that while domain pages are being mapped and unmapped by these hypercalls, the network driver's memory somehow gets corrupted.

Currently, the most suspicious function is create_p2m_entries in Xen (xen/arch/arm/p2m.c). When this function is called with the INSERT op, it maps the corresponding pte (page table entry) to the given mfn without checking whether the pte is already taken by someone else. I actually observed a pte's base address being overwritten with some other value, and I'm not sure this behavior is OK. If it is not, I think it can corrupt other memory (for instance, that of the innocent network driver).

Do you have any ideas, or have you seen a similar bug (the network error)?

Best,
Jaeyong

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
Ian Campbell
2013-May-28 13:10 UTC
Re: Very strange behavior between balloon driver and network driver (Arndale)
On Tue, 2013-05-28 at 12:14 +0000, Jaeyong Yoo wrote:
> Hello,
> I'm testing xen in Arndale board and I found very strange behavior:
>
> When I'm creating/destroying domUs consecutively, suddenly network
> driver does not work.
> I tried both on-board and USB-dongle network devices but they show the
> similar behavior.

I expect that the ballooning up and down is counteracting the effect of the 1:1 workaround, which is required on the Arndale due to the lack of an IOMMU driver.

As a workaround I would recommend using dom0_mem=<something> and disabling autoballooning.

Of course if, as a Samsung developer, you are able to gain access to the documentation necessary to write an IOMMU driver for the Arndale platform, that would be great!

Ian.
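One plausible reading of Ian's explanation can be modelled in a few lines. This is a minimal sketch, assuming a toy machine of 8 frames, a dom0 that boots mapped 1:1 (gfn == mfn), and a device that, lacking an IOMMU, DMAs to the machine frame whose number equals the guest frame the driver handed it. All names (`struct model`, `balloon_out`, `balloon_in`, `xen_alloc_any`, `dma_hits_own_page`) are hypothetical and do not correspond to kernel or Xen functions. After dom0 balloons a frame out, another domain takes it, and dom0 balloons back in, the 1:1 invariant is gone and the device's DMA lands in someone else's frame, which would be consistent with the garbage headers seen by asix_rx_fixup().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NR_FRAMES 8
#define INVALID_MFN UINT64_MAX

/* Hypothetical model: dom0's gfn->mfn map plus Xen's pool of free machine frames. */
struct model {
    uint64_t p2m[NR_FRAMES];  /* dom0: gfn -> mfn */
    bool mfn_free[NR_FRAMES]; /* machine frames currently free in Xen */
};

/* Boot: dom0 is mapped 1:1 (gfn == mfn) and no frame is free. */
void model_boot_1to1(struct model *m)
{
    for (uint64_t i = 0; i < NR_FRAMES; i++) {
        m->p2m[i] = i;
        m->mfn_free[i] = false;
    }
}

/* decrease_reservation: dom0 hands the frame behind gfn back to Xen. */
void balloon_out(struct model *m, uint64_t gfn)
{
    assert(gfn < NR_FRAMES && m->p2m[gfn] != INVALID_MFN);
    m->mfn_free[m->p2m[gfn]] = true;
    m->p2m[gfn] = INVALID_MFN;
}

/* Xen hands out the lowest-numbered free frame (e.g. to a new domU). */
uint64_t xen_alloc_any(struct model *m)
{
    for (uint64_t mfn = 0; mfn < NR_FRAMES; mfn++) {
        if (m->mfn_free[mfn]) {
            m->mfn_free[mfn] = false;
            return mfn;
        }
    }
    return INVALID_MFN;
}

/* increase_reservation: dom0 gets *some* free frame, not necessarily mfn == gfn. */
void balloon_in(struct model *m, uint64_t gfn)
{
    assert(gfn < NR_FRAMES);
    m->p2m[gfn] = xen_alloc_any(m);
}

/* Without an IOMMU, DMA to bus address gfn really hits machine frame gfn. */
bool dma_hits_own_page(const struct model *m, uint64_t gfn)
{
    return m->p2m[gfn] == gfn;
}
```

For example: after booting, balloon out gfns 2 and 5, let another domain allocate (it gets frame 2), then balloon gfn 2 back in. Dom0's gfn 2 is now backed by mfn 5, so DMA aimed at "gfn 2" writes into the other domain's frame.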
Jaeyong Yoo
2013-May-29 11:22 UTC
Re: Very strange behavior between balloon driver and network driver (Arndale)
> I expect that the ballooning up and down is counteracting the effect of
> the 1:1 workaround which is required on the Arndale due to lack of an
> IOMMU driver.
>
> As a workaround I would recommend using dom0_mem=<something> and
> disabling autoballooning.

After applying both of those, I still see the network error. Autoballooning is still used in privcmd, via the IOCTL_PRIVCMD_MMAPBATCH_V2 ioctl, when mapping a foreign range (in order to deliver the domU kernel binary into domU memory).

I applied them as follows:
- put dom0_mem=256M in the dom0 kernel boot arguments.
- create the domU with the config autoballoon="off"

> Of course if, as a Samsung developer, you are able to gain access to the
> documentation necessary to write an IOMMU driver for the Arndale
> platform that would be great!

AFAIK the Arndale board has a sysMMU, but it is placed between the CPU and the GPU for performance purposes. So I think it cannot be used for virtualization.

> Ian.

Jaeyong
Ian Campbell
2013-May-29 11:32 UTC
Re: Very strange behavior between balloon driver and network driver (Arndale)
On Wed, 2013-05-29 at 11:22 +0000, Jaeyong Yoo wrote:
> > I expect that the ballooning up and down is counteracting the effect of
> > the 1:1 workaround which is required on the Arndale due to lack of an
> > IOMMU driver.
> >
> > As a workaround I would recommend using dom0_mem=<something> and
> > disabling autoballooning.
>
> After I applied the above two things, I still see the network error.
> Autoballooning is still used in privcmd with ioctl
> IOCTL_PRIVCMD_MMAPBATCH_V2 when mapping foreign range (in order to
> deliver the domU kernel binary to domU memory).
>
> I applied as follows:
> - put dom0_mem=256M in dom0 kernel booting arg.

This needs to be a Xen argument, not a dom0 kernel argument.

That said, I thought that privcmd mmapbatch put back the original page; perhaps I am mistaken here though.

> - create domU with config autoballoon="off"

You put this in /etc/xen/xl.conf, not the guest config, correct?

> > Of course if, as a Samsung developer, you are able to gain access to the
> > documentation necessary to write an IOMMU driver for the Arndale
> > platform that would be great!
>
> AFAIK Arndale board has sysMMU but it is placed between CPU and GPU for
> performance purpose. So, I think it is not be able to use it for
> virtualization.

Oh, that's interesting and unfortunate :-(

Ian.
Jaeyong Yoo
2013-May-29 11:57 UTC
Re: Very strange behavior between balloon driver and network driver (Arndale)
> > This needs to be a Xen argument, not a dom0 kernel argument.

Ah! Got it.

> > That said I thought that privcmd mmapbatch put back the original page,
> > perhaps I am mistaken here though.
> >
> > > - create domU with config autoballoon="off"
> >
> > You put this in /etc/xen/xl.conf, not the guest config, correct?

Yes.

I guess the current workaround is to comment out the following two lines in drivers/xen/balloon.c (in free_xenballooned_pages). It is very nasty and introduces a memory leak, but with the leak I can test Xen on Arndale for much longer than before the network error appears :)

    //if (current_credit())
    //    schedule_delayed_work(&balloon_worker, 0);

Jaeyong
Ian Campbell
2013-May-29 13:51 UTC
Re: Very strange behavior between balloon driver and network driver (Arndale)
On Wed, 2013-05-29 at 11:57 +0000, Jaeyong Yoo wrote:
> > This needs to be a Xen argument, not a dom0 kernel argument.
> Ah! got it.
>
> > That said I thought that privcmd mmapbatch put back the original page,
> > perhaps I am mistaken here though.
> >
> > > - create domU with config autoballoon="off"
> >
> > You put this in /etc/xen/xl.conf, not the guest config, correct?
>
> Yes.
>
> I guess the current work-around solution would be commenting out
> the following two lines, very nasty and introduces memory leak though.
> Memory leak gives more time than network error for testing xen on Arndale :)

The kernel should reuse ballooned-out pages where possible rather than ballooning out more, so the leak is probably limited. Not ideal though.

I suspect the real answer is that the 1:1 workaround needs to be cleverer when dom0 decreases and increases its reservation...
Stefano Stabellini
2013-May-29 13:55 UTC
Re: Very strange behavior between balloon driver and network driver (Arndale)
On Wed, 29 May 2013, Ian Campbell wrote:
> > > Of course if, as a Samsung developer, you are able to gain access to the
> > > documentation necessary to write an IOMMU driver for the Arndale
> > > platform that would be great!
> >
> > AFAIK Arndale board has sysMMU but it is placed between CPU and GPU for
> > performance purpose. So, I think it is not be able to use it for
> > virtualization.
>
> Oh, that's interesting and unfortunate :-(

Am I right that it is not possible to use the sysMMU on the Arndale board to translate DMA accesses from the network card? FYI, we wouldn't need two translation levels, just one.
Jaeyong Yoo
2013-May-30 03:40 UTC
Re: Very strange behavior between balloon driver and network driver (Arndale)
> > Oh, that's interesting and unfortunate :-(
>
> Am I getting this right that it not possible to use the sysMMU on the
> Arndale board to translate DMA accesses from the network card?
> FYI we wouldn't need two translation levels, just one.

I'm not completely sure about this because I'm a software guy, but I think it is not possible.

Jaeyong
Jaeyong Yoo
2013-May-30 03:44 UTC
Re: Very strange behavior between balloon driver and network driver (Arndale)
> > I guess the current work-around solution would be commenting out
> > the following two lines, very nasty and introduces memory leak though.
> > Memory leak gives more time than network error for testing xen on Arndale :)
>
> The kernel should reuse ballooned out pages where possible rather than
> creating more so the leak is probably limited. Not ideal though.
>
> I suspect the real answer is that the 1:1 workaround needs to be
> cleverer when dom0 decreases and increases its reservation...

I agree.

Jaeyong
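The thread leaves open what a "cleverer" 1:1 workaround would look like. Purely as an illustration, and not a design proposed by anyone in the thread, here is a minimal sketch of one possible policy: when dom0 increases its reservation for a given guest frame, accept only the machine frame with the same number, and fail if that frame now belongs to someone else. The model (8 frames, `struct model`, `balloon_out`, `balloon_in_exact`) is hypothetical and ignores how Xen would actually reserve or look up specific machine frames.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NR_FRAMES 8
#define INVALID_MFN UINT64_MAX

/* Hypothetical model: dom0's gfn->mfn map and Xen's pool of free machine frames. */
struct model {
    uint64_t p2m[NR_FRAMES];
    bool mfn_free[NR_FRAMES];
};

/* Boot: dom0 is mapped 1:1 (gfn == mfn) and no frame is free. */
void model_boot_1to1(struct model *m)
{
    for (uint64_t i = 0; i < NR_FRAMES; i++) {
        m->p2m[i] = i;
        m->mfn_free[i] = false;
    }
}

/* decrease_reservation: dom0 hands the frame behind gfn back to Xen. */
void balloon_out(struct model *m, uint64_t gfn)
{
    assert(gfn < NR_FRAMES && m->p2m[gfn] != INVALID_MFN);
    m->mfn_free[m->p2m[gfn]] = true;
    m->p2m[gfn] = INVALID_MFN;
}

/* Hypothetical 1:1-aware increase: accept only mfn == gfn, otherwise fail. */
bool balloon_in_exact(struct model *m, uint64_t gfn)
{
    assert(gfn < NR_FRAMES);
    if (!m->mfn_free[gfn])
        return false; /* frame now owned elsewhere: refuse rather than break 1:1 */
    m->mfn_free[gfn] = false;
    m->p2m[gfn] = gfn;
    return true;
}
```

The trade-off in this policy is that a refused increase leaves dom0 permanently short of that frame, which looks a lot like the bounded leak of the current workaround. An alternative policy with a similar effect would be to never balloon out frames that back 1:1 DMA in the first place.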